# Reverse Index

The reverse index contains a mapping from word to document id.

There are two tiers of this index:

* A priority index, which only indexes terms that are flagged with priority flags<sup>1</sup>.
* A full index, which indexes all terms.

The full index also provides access to term-level metadata, while the priority index is
a binary index that only records which documents contain a specific word.

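To make the difference between the two tiers concrete, here is a minimal in-memory sketch. The class `TwoTierIndexSketch`, its methods, and the use of a single `long` of metadata per posting are hypothetical illustrations, not the API of `FullReverseIndexReader` or `PrioReverseIndexReader`; word and document ids are assumed to be plain `long` values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch of the two tiers; not Marginalia's actual reader classes.
public class TwoTierIndexSketch {
    // Full tier posting list: parallel arrays of sorted doc ids and per-posting term metadata
    record FullPostings(long[] docIds, long[] termMeta) {}

    // Priority tier: word id -> sorted doc ids and nothing else (a binary index)
    private final Map<Long, long[]> priority = new HashMap<>();
    // Full tier: word id -> sorted doc ids plus term-level metadata
    private final Map<Long, FullPostings> full = new HashMap<>();

    void addPriority(long wordId, long... sortedDocIds) {
        priority.put(wordId, sortedDocIds);
    }

    void addFull(long wordId, long[] sortedDocIds, long[] termMeta) {
        full.put(wordId, new FullPostings(sortedDocIds, termMeta));
    }

    // The priority tier can only answer "does this document have the word?"
    boolean priorityContains(long wordId, long docId) {
        long[] docs = priority.get(wordId);
        return docs != null && Arrays.binarySearch(docs, docId) >= 0;
    }

    // The full tier also yields the term metadata for that document, if present
    OptionalLong fullTermMeta(long wordId, long docId) {
        FullPostings postings = full.get(wordId);
        if (postings == null) return OptionalLong.empty();
        int i = Arrays.binarySearch(postings.docIds(), docId);
        return i >= 0 ? OptionalLong.of(postings.termMeta()[i]) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        var index = new TwoTierIndexSketch();
        index.addPriority(7L, 100L, 205L);
        index.addFull(7L, new long[] {100L, 150L, 205L}, new long[] {0b011L, 0b001L, 0b111L});
        System.out.println(index.priorityContains(7L, 205L));         // true
        System.out.println(index.priorityContains(7L, 150L));         // false
        System.out.println(index.fullTermMeta(7L, 150L).getAsLong()); // 1
    }
}
```

In this toy data, document 150 contains word 7 but the term was not flagged as a priority term, so only the full tier knows about it.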
The priority index is also compressed, while the full index at this point is not.

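This README does not say which compression scheme the priority index uses. Purely as an illustration, the sketch below shows one common way to compress a sorted list of document ids: store the gaps between consecutive ids as variable-length integers (7 bits per byte). `DeltaVarintSketch` is a hypothetical name, and the scheme is an assumption rather than a description of Marginalia's format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch: delta + varint coding of a sorted, non-negative doc id list.
public class DeltaVarintSketch {
    static byte[] encode(long[] sortedDocIds) {
        var out = new ByteArrayOutputStream();
        long prev = 0;
        for (long id : sortedDocIds) {
            long delta = id - prev; // non-negative because the input is sorted
            prev = id;
            // Emit 7 bits per byte; a set high bit means "more bytes follow"
            while ((delta & ~0x7FL) != 0) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
        }
        return out.toByteArray();
    }

    static long[] decode(byte[] data, int count) {
        long[] ids = new long[count];
        int pos = 0;
        long prev = 0;
        for (int i = 0; i < count; i++) {
            long delta = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta; // undo the delta coding
            ids[i] = prev;
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] docs = {1_000_000L, 1_000_003L, 1_000_010L, 1_200_000L};
        byte[] packed = encode(docs);
        System.out.println(packed.length + " bytes instead of " + docs.length * Long.BYTES); // 8 bytes instead of 32
        System.out.println(Arrays.equals(docs, decode(packed, docs.length)));                // true
    }
}
```

Gap coding works well here because postings are kept sorted, so consecutive ids tend to be close together and most gaps fit in one or two bytes.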
[1] See WordFlags in [common/model](../../common/model/) and
KeywordMetadata in [converting-process/ft-keyword-extraction](../../processes/converting-process/ft-keyword-extraction).

## Construction

The reverse index is constructed by first building a series of preindexes.
Each preindex consists of a Segment and a Documents object. The segment records
which word identifiers are present and how many of each there are, and the
documents object records which documents each word can be found in.

*Figure: Memory layout illustrations*

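A toy version of building one preindex from a page of postings might look like the following. It assumes each posting is a `(wordId, docId)` pair of `long` values with no duplicate pairs, and it keeps everything in memory; the `Preindex` record and its three arrays are hypothetical stand-ins for the Segment and Documents objects, not their real layout.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: build a toy preindex from one in-memory page of postings.
public class PreindexSketch {
    // wordIds:   distinct word ids, sorted                       (segment)
    // counts:    number of postings per word, parallel to wordIds (segment)
    // documents: doc ids grouped by word in wordIds order, sorted within each word
    record Preindex(long[] wordIds, int[] counts, long[] documents) {}

    static Preindex build(long[] postingWords, long[] postingDocs) {
        int n = postingWords.length;

        // Sort posting positions by (wordId, docId)
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.<Integer>comparingLong(i -> postingWords[i])
                .thenComparingLong(i -> postingDocs[i]));

        long[] words = new long[n];
        int[] counts = new int[n];
        long[] documents = new long[n];
        int segments = 0;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            documents[k] = postingDocs[i];
            // Start a new segment whenever the word id changes
            if (segments == 0 || words[segments - 1] != postingWords[i]) {
                words[segments++] = postingWords[i];
            }
            counts[segments - 1]++;
        }
        return new Preindex(Arrays.copyOf(words, segments),
                            Arrays.copyOf(counts, segments),
                            documents);
    }

    public static void main(String[] args) {
        Preindex p = build(new long[] {5, 2, 5, 2, 9},
                           new long[] {10, 11, 3, 10, 11});
        System.out.println(Arrays.toString(p.wordIds()));   // [2, 5, 9]
        System.out.println(Arrays.toString(p.counts()));    // [2, 2, 1]
        System.out.println(Arrays.toString(p.documents())); // [10, 11, 3, 10, 11]
    }
}
```

Because the word ids and each word's document run are sorted, the counts alone are enough to locate any word's documents: word `i` starts at the sum of `counts[0..i)`.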
The complete preindex data would typically not fit in RAM, so the index journal
is paged, and a preindex small enough to fit in memory is constructed for each
page. These preindexes are then merged successively. Merging sorted arrays is a
very fast operation that does not require additional RAM.

*Figure: Illustration of successively merged preindex files*

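Because each preindex keeps its word ids sorted and each word's documents sorted, two preindexes can be merged in a single forward pass with a two-pointer merge, as sketched below. This is an illustration under stated assumptions: the arrays live on the heap for brevity (a disk-backed version would read and write the same sequential runs), the inputs are assumed to come from journal pages with disjoint documents so no `(wordId, docId)` pair appears twice, and `mergeAll` is a simple left fold standing in for whatever merge order the real code uses. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: merge preindexes in one forward pass over sorted arrays.
public class PreindexMergeSketch {
    record Preindex(long[] wordIds, int[] counts, long[] documents) {}

    static Preindex merge(Preindex a, Preindex b) {
        int na = a.wordIds().length, nb = b.wordIds().length;
        long[] words = new long[na + nb];
        int[] counts = new int[na + nb];
        long[] docs = new long[a.documents().length + b.documents().length];
        int ia = 0, ib = 0; // segment cursors into a and b
        int da = 0, db = 0; // document cursors into a and b
        int w = 0, d = 0;   // output cursors

        while (ia < na || ib < nb) {
            // Take the smaller word id; take from both sides when they are equal
            boolean takeA = ib >= nb || (ia < na && a.wordIds()[ia] <= b.wordIds()[ib]);
            boolean takeB = ia >= na || (ib < nb && b.wordIds()[ib] <= a.wordIds()[ia]);
            long word = takeA ? a.wordIds()[ia] : b.wordIds()[ib];
            int endA = da + (takeA ? a.counts()[ia++] : 0);
            int endB = db + (takeB ? b.counts()[ib++] : 0);

            // Two-pointer merge of this word's sorted document runs
            int start = d;
            while (da < endA && db < endB) {
                docs[d++] = a.documents()[da] <= b.documents()[db]
                        ? a.documents()[da++] : b.documents()[db++];
            }
            while (da < endA) docs[d++] = a.documents()[da++];
            while (db < endB) docs[d++] = b.documents()[db++];

            words[w] = word;
            counts[w++] = d - start;
        }
        return new Preindex(Arrays.copyOf(words, w), Arrays.copyOf(counts, w), docs);
    }

    // Successive merging: fold each page's preindex into the running result
    static Preindex mergeAll(List<Preindex> parts) {
        Preindex acc = new Preindex(new long[0], new int[0], new long[0]);
        for (Preindex part : parts) acc = merge(acc, part);
        return acc;
    }

    public static void main(String[] args) {
        var page1 = new Preindex(new long[] {2, 5}, new int[] {2, 1}, new long[] {10, 11, 3});
        var page2 = new Preindex(new long[] {5, 9}, new int[] {1, 1}, new long[] {7, 4});
        Preindex merged = mergeAll(List.of(page1, page2));
        System.out.println(Arrays.toString(merged.wordIds()));   // [2, 5, 9]
        System.out.println(Arrays.toString(merged.counts()));    // [2, 2, 1]
        System.out.println(Arrays.toString(merged.documents())); // [10, 11, 3, 7, 4]
    }
}
```

Each input element is read once and each output element written once, in order, which is why this kind of merge is cheap and lends itself to streaming.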
Once everything has been merged into one large preindex, indexes are added to the
preindex data to form a finalized reverse index.

*Figure: Illustration of the data layout of the finalized index*

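Beyond the illustration, this README does not spell out the finalized layout; the See Also list points to `libraries/btree`. The sketch below substitutes plain binary search over a sorted word array plus an offsets table, only to show how a lookup structure on top of the merged preindex data turns a word id into its range of documents. `FinalizedIndexSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: a lookup structure over merged preindex data.
// Binary search over sorted arrays stands in for whatever index the real code uses.
public class FinalizedIndexSketch {
    private final long[] wordIds;   // sorted, distinct word ids
    private final int[] offsets;    // word i's documents are documents[offsets[i], offsets[i + 1])
    private final long[] documents; // doc ids grouped by word, sorted within each word

    FinalizedIndexSketch(long[] wordIds, int[] counts, long[] documents) {
        this.wordIds = wordIds;
        this.documents = documents;
        this.offsets = new int[counts.length + 1];
        // Prefix sums of the per-word counts give each word's start offset
        for (int i = 0; i < counts.length; i++) {
            offsets[i + 1] = offsets[i] + counts[i];
        }
    }

    // All documents containing the word, or an empty array if the word is unknown
    long[] documentsFor(long wordId) {
        int i = Arrays.binarySearch(wordIds, wordId);
        if (i < 0) return new long[0];
        return Arrays.copyOfRange(documents, offsets[i], offsets[i + 1]);
    }

    // Membership test that searches only within the word's own sorted range
    boolean contains(long wordId, long docId) {
        int i = Arrays.binarySearch(wordIds, wordId);
        return i >= 0 && Arrays.binarySearch(documents, offsets[i], offsets[i + 1], docId) >= 0;
    }

    public static void main(String[] args) {
        var index = new FinalizedIndexSketch(new long[] {2, 5, 9}, new int[] {2, 2, 1},
                                             new long[] {10, 11, 3, 7, 4});
        System.out.println(Arrays.toString(index.documentsFor(5))); // [3, 7]
        System.out.println(index.contains(2, 11));                  // true
        System.out.println(index.contains(9, 7));                   // false
    }
}
```

A tree-shaped index over the same sorted data serves the same purpose while touching fewer pages per lookup when the arrays are large and stored on disk.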
## Central Classes

Full index:

* [FullPreindex](java/nu/marginalia/index/construction/full/FullPreindex.java) holds the intermediate reverse index state.
* [FullIndexConstructor](java/nu/marginalia/index/construction/full/FullIndexConstructor.java) constructs the index.
* [FullReverseIndexReader](java/nu/marginalia/index/FullReverseIndexReader.java) interrogates the index.

Priority index:

* [PrioPreindex](java/nu/marginalia/index/construction/prio/PrioPreindex.java) holds the intermediate reverse index state.
* [PrioIndexConstructor](java/nu/marginalia/index/construction/prio/PrioIndexConstructor.java) constructs the index.
* [PrioReverseIndexReader](java/nu/marginalia/index/PrioReverseIndexReader.java) interrogates the index.

## See Also

* [index-journal](../index-journal)
* [index-forward](../index-forward)
* [libraries/btree](../../libraries/btree)
* [libraries/array](../../libraries/array)