MarginaliaSearch/code/index/index-forward
Viktor Lofgren dcb43a3308 (slop) Introduce table concept to keep track of positions and simplify closing
The most common error when dealing with Slop columns is that they can fall out of sync with each other if the programmer accidentally does a conditional read and forgets to skip.

The second most common error is forgetting to close one of the columns in a reader or writer.

To deal with both cases, a new class SlopTable is added that keeps track of the lifecycle of all slop columns and performs a check when closing them that they are in sync.
2024-07-27 13:47:47 +02:00

Forward Index

The forward index contains a mapping from document id to various forms of document metadata.

In practice, the forward index consists of two files, an id file and a data file.

The id file contains a sorted list of document ids. The data file contains one fixed-size record per document id, stored in the same order as the id file, so the position of an id in the id file also locates its record in the data file.

Each record contains a binary-encoded DocumentMetadata object, as well as an HtmlFeatures bitmask.
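As a rough illustration of how a lookup against this layout can work, here is a minimal sketch that models the two files as in-memory arrays. The class name `ForwardIndexSketch`, the two-longs-per-record layout and the offset constants are assumptions made for the example, not the actual on-disk format or API of this module.

```java
import java.util.Arrays;

/** Hypothetical in-memory model of the id file and the data file. */
final class ForwardIndexSketch {
    // Assumed record layout for illustration: two longs per document.
    static final int ENTRY_SIZE = 2;
    static final int METADATA_OFFSET = 0;
    static final int FEATURES_OFFSET = 1;

    private final long[] ids;  // stands in for the id file, sorted ascending
    private final long[] data; // stands in for the data file, ENTRY_SIZE longs per id

    ForwardIndexSketch(long[] sortedIds, long[] data) {
        if (data.length != sortedIds.length * ENTRY_SIZE) {
            throw new IllegalArgumentException("data must hold one record per id");
        }
        this.ids = sortedIds;
        this.data = data;
    }

    /** Returns the encoded metadata for docId, or 0 if the id is absent. */
    long getMetadata(long docId) {
        int idx = Arrays.binarySearch(ids, docId);
        if (idx < 0) return 0L;
        // The position in the id array doubles as the record number.
        return data[idx * ENTRY_SIZE + METADATA_OFFSET];
    }

    /** Returns the HTML features bitmask for docId, or 0 if the id is absent. */
    int getHtmlFeatures(long docId) {
        int idx = Arrays.binarySearch(ids, docId);
        if (idx < 0) return 0;
        return (int) data[idx * ENTRY_SIZE + FEATURES_OFFSET];
    }

    public static void main(String[] args) {
        ForwardIndexSketch index = new ForwardIndexSketch(
                new long[] { 10, 20, 30 },
                new long[] { 0xA, 1, 0xB, 2, 0xC, 4 });
        System.out.println(Long.toHexString(index.getMetadata(20)));
        System.out.println(index.getHtmlFeatures(30));
    }
}
```

Because the records are fixed-size, a binary search over the sorted ids is enough to find a record; no separate offset table is needed.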

Unlike the reverse index, the forward index is not split into two tiers. The data is kept in the same order as in the source data, and the set of document IDs is assumed to fit in memory, so the index is relatively easy to construct.
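To show why holding the ids in memory keeps construction simple, here is a hedged sketch of one possible build procedure. The names `ForwardIndexBuildSketch`, `SourceEntry` and `Built`, and the two-longs-per-record layout, are hypothetical; the sketch sorts the ids itself, which is a no-op if the source is already ordered by document id.

```java
import java.util.Arrays;

/** Hypothetical construction of the id and data arrays from source entries. */
final class ForwardIndexBuildSketch {
    /** Assumed shape of one source entry; not the real journal format. */
    record SourceEntry(long docId, long metadata, int htmlFeatures) {}

    /** The two outputs, standing in for the id file and the data file. */
    record Built(long[] ids, long[] data) {}

    // Assumed record layout for illustration: two longs per document.
    static final int ENTRY_SIZE = 2;

    static Built build(SourceEntry[] source) {
        // Step 1: collect every distinct document id in memory and sort it.
        long[] ids = Arrays.stream(source)
                .mapToLong(SourceEntry::docId)
                .distinct()
                .sorted()
                .toArray();

        // Step 2: allocate one fixed-size record per id.
        long[] data = new long[ids.length * ENTRY_SIZE];

        // Step 3: a single pass over the source, placing each record
        // at the slot given by its id's position in the sorted array.
        for (SourceEntry entry : source) {
            int slot = Arrays.binarySearch(ids, entry.docId());
            data[slot * ENTRY_SIZE] = entry.metadata();
            data[slot * ENTRY_SIZE + 1] = entry.htmlFeatures();
        }
        return new Built(ids, data);
    }
}
```

If a document id occurs more than once in the source, this sketch keeps the last entry seen; how the real converter treats duplicates is not stated here.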

Central Classes