# Reverse Index

The reverse index maps each word to the ids of the documents that contain it.

There are two tiers of this index:

* A priority index, which only indexes terms that are flagged with priority flags<sup>1</sup>.
* A full index, which indexes all terms.

The full index also provides access to term-level metadata, while the priority index is a binary index that only records which documents contain a specific word.
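To make the difference between the tiers concrete, here is a minimal in-memory sketch. It assumes, for illustration only, that term-level metadata fits in a single `long` per (word, document) pair. The class `TierSketch` and its methods are hypothetical and do not correspond to the module's actual classes or file layout.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two tiers; not the module's real types or on-disk format.
final class TierSketch {
    // Full tier entry: a document id plus term-level metadata (assumed here to fit in one long).
    record FullPosting(long docId, long termMetadata) {}

    private final Map<Long, FullPosting[]> full = new TreeMap<>();
    private final Map<Long, long[]> prio = new TreeMap<>();

    void putFull(long wordId, FullPosting... postings) {
        FullPosting[] sorted = postings.clone();
        Arrays.sort(sorted, (x, y) -> Long.compare(x.docId(), y.docId()));
        full.put(wordId, sorted);
    }

    void putPrio(long wordId, long... docIds) {
        long[] sorted = docIds.clone();
        Arrays.sort(sorted);
        prio.put(wordId, sorted);
    }

    // Full tier: returns the term's metadata in a document, or null if the pair is absent.
    Long metadata(long wordId, long docId) {
        FullPosting[] postings = full.get(wordId);
        if (postings == null) return null;
        for (FullPosting p : postings) { // linear scan for brevity
            if (p.docId() == docId) return p.termMetadata();
        }
        return null;
    }

    // Priority tier: can only answer whether a document contains the word.
    boolean prioContains(long wordId, long docId) {
        long[] docIds = prio.get(wordId);
        return docIds != null && Arrays.binarySearch(docIds, docId) >= 0;
    }
}
```

The full tier can answer "what do we know about this term in this document?", while the priority tier can only answer "does this document have the word at all?".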

The priority index is also compressed, while the full index at this point is not.
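This README does not say which compression scheme the priority index uses. As an illustration of why a binary, sorted list of document ids compresses well, the sketch below applies delta encoding followed by variable-length (varint) encoding, a common technique for posting lists. `DeltaVarintCodec` is a hypothetical name, and this is not necessarily the scheme the priority index implements.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical codec: delta + varint (7 bits per byte) encoding of a sorted id list.
// A common posting-list technique; not necessarily what the priority index uses.
final class DeltaVarintCodec {
    static byte[] encode(long[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long id : sortedIds) {
            long delta = id - prev; // gap to the previous id
            if (delta < 0) throw new IllegalArgumentException("ids must be sorted and non-negative");
            prev = id;
            // Emit 7 bits at a time; the high bit marks that more bytes follow.
            while (delta >= 0x80) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
        }
        return out.toByteArray();
    }

    static long[] decode(byte[] data, int count) {
        long[] ids = new long[count];
        int pos = 0;
        long prev = 0;
        for (int i = 0; i < count; i++) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta; // undo the delta encoding
            ids[i] = prev;
        }
        return ids;
    }
}
```

Because the ids are sorted, the gaps between them are small, and small gaps take one or two bytes instead of eight. The sketch stores no length header, so the caller has to keep the count alongside the encoded bytes.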

[1] See WordFlags in [common/model](../../common/model/) and KeywordMetadata in [converting-process/ft-keyword-extraction](../../processes/converting-process/ft-keyword-extraction).

## Construction

The reverse index is constructed by first building a series of preindexes. Each preindex consists of a Segment and a Documents object. The segment records which word identifiers are present and how many of each there are, and the documents object records in which documents each word can be found.
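The following sketch shows one way a single page of journal entries could be turned into a preindex in memory, using a counting pass followed by a placement pass. The class `PreindexSketch`, its field names, and the representation of a journal page as two parallel arrays are assumptions for illustration; the real Segment and Documents objects are file-backed and presumably differ in detail.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory preindex built from one page of (wordId, docId) journal entries.
final class PreindexSketch {
    final long[] wordIds;   // "segment": distinct word ids, ascending
    final int[] counts;     // "segment": number of entries per word
    final long[] documents; // "documents": doc ids, one sorted contiguous run per word

    private PreindexSketch(long[] wordIds, int[] counts, long[] documents) {
        this.wordIds = wordIds;
        this.counts = counts;
        this.documents = documents;
    }

    static PreindexSketch build(long[] entryWordIds, long[] entryDocIds) {
        if (entryWordIds.length != entryDocIds.length) throw new IllegalArgumentException("length mismatch");

        // Pass 1: count entries per word id (TreeMap keeps the word ids sorted).
        TreeMap<Long, Integer> countByWord = new TreeMap<>();
        for (long w : entryWordIds) countByWord.merge(w, 1, Integer::sum);

        long[] wordIds = new long[countByWord.size()];
        int[] counts = new int[wordIds.length];
        int[] offsets = new int[wordIds.length + 1]; // prefix sums: start of each word's run
        int i = 0;
        for (Map.Entry<Long, Integer> e : countByWord.entrySet()) {
            wordIds[i] = e.getKey();
            counts[i] = e.getValue();
            offsets[i + 1] = offsets[i] + counts[i];
            i++;
        }

        // Pass 2: place each doc id into its word's run.
        long[] documents = new long[entryDocIds.length];
        int[] cursor = Arrays.copyOf(offsets, wordIds.length);
        for (int k = 0; k < entryWordIds.length; k++) {
            int slot = Arrays.binarySearch(wordIds, entryWordIds[k]);
            documents[cursor[slot]++] = entryDocIds[k];
        }
        // Sort each run so that runs from different preindexes can later be merged.
        for (int w = 0; w < wordIds.length; w++) Arrays.sort(documents, offsets[w], offsets[w + 1]);

        return new PreindexSketch(wordIds, counts, documents);
    }
}
```

For example, the entries (5, 30), (2, 10), (5, 20), (9, 40), (2, 11) produce the word ids `[2, 5, 9]`, the counts `[2, 2, 1]`, and the documents `[10, 11, 20, 30, 40]`.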

![Memory layout illustrations](s3://crabby-images/03249/0324902bb360c7adb3fc35cf7c17add486a87fcf)

The complete index data would typically not fit in RAM, so the index journal is processed in pages, each of which is turned into a preindex small enough to fit in memory. These preindexes are then merged. Merging sorted arrays is a fast, linear-time operation that does not require additional RAM.
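Merging two preindexes amounts to merging their sorted word ids and, for each word, merging the two sorted runs of document ids. The sketch below is a hypothetical in-memory version (`Preindex` and `PreindexMerger` are illustrative names, not the module's classes). Every input and output array is traversed strictly front to back, which is the property that lets a file-based implementation stream through the data instead of holding it in RAM.

```java
import java.util.Arrays;

// Hypothetical preindex shape: ascending word ids, entry count per word,
// and doc ids stored as one sorted run per word, in word id order.
record Preindex(long[] wordIds, int[] counts, long[] documents) {}

final class PreindexMerger {
    static Preindex merge(Preindex a, Preindex b) {
        int na = a.wordIds().length, nb = b.wordIds().length;
        long[] words = new long[na + nb];
        int[] counts = new int[na + nb];
        long[] docs = new long[a.documents().length + b.documents().length];

        int ia = 0, ib = 0; // cursors into the word arrays
        int da = 0, db = 0; // cursors into the document arrays
        int nw = 0, nd = 0; // words and docs written so far
        while (ia < na || ib < nb) {
            // Pick the smallest current word id; it may occur in one or both inputs.
            long word = (ib >= nb || (ia < na && a.wordIds()[ia] <= b.wordIds()[ib]))
                    ? a.wordIds()[ia] : b.wordIds()[ib];
            int ca = (ia < na && a.wordIds()[ia] == word) ? a.counts()[ia++] : 0;
            int cb = (ib < nb && b.wordIds()[ib] == word) ? b.counts()[ib++] : 0;

            // Merge the two sorted doc-id runs for this word.
            int endA = da + ca, endB = db + cb;
            while (da < endA || db < endB) {
                if (db >= endB || (da < endA && a.documents()[da] <= b.documents()[db])) {
                    docs[nd++] = a.documents()[da++];
                } else {
                    docs[nd++] = b.documents()[db++];
                }
            }
            words[nw] = word;
            counts[nw] = ca + cb;
            nw++;
        }
        return new Preindex(Arrays.copyOf(words, nw), Arrays.copyOf(counts, nw), docs);
    }
}
```

Merging many preindexes can be done pairwise and repeatedly until one remains. The sketch does not deduplicate document ids that occur under the same word in both inputs.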

![Illustration of successively merged preindex files](s3://crabby-images/bfeef/bfeef782db95923f84a62e0a698e614b86c6dd2d)

Once the preindexes have been merged into one large preindex, indexes are added to the preindex data to form a finalized reverse index.
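Finalization turns the merged preindex into something that can be searched. Judging from the See Also links, the real module uses the B-tree library for this; the sketch below swaps in a simpler structure, a binary search over the sorted word ids plus a prefix-sum offsets array, to show the lookup path. `FinalizedIndexSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical finalized index: merged preindex data plus a lookup structure.
// Binary search over sorted word ids stands in for the B-tree the real module presumably uses.
final class FinalizedIndexSketch {
    private final long[] wordIds;   // ascending, from the merged segment
    private final int[] offsets;    // word i's run is documents[offsets[i] .. offsets[i + 1])
    private final long[] documents; // sorted doc-id runs, one per word

    FinalizedIndexSketch(long[] wordIds, int[] counts, long[] documents) {
        if (wordIds.length != counts.length) throw new IllegalArgumentException("length mismatch");
        this.wordIds = wordIds;
        this.documents = documents;
        this.offsets = new int[wordIds.length + 1];
        for (int i = 0; i < counts.length; i++) offsets[i + 1] = offsets[i] + counts[i];
    }

    // Returns the sorted doc ids for a word, or an empty array if the word is absent.
    long[] documentsFor(long wordId) {
        int i = Arrays.binarySearch(wordIds, wordId);
        if (i < 0) return new long[0];
        return Arrays.copyOfRange(documents, offsets[i], offsets[i + 1]);
    }

    // Checks whether a document contains a word by binary searching within the word's run.
    boolean contains(long wordId, long docId) {
        int i = Arrays.binarySearch(wordIds, wordId);
        return i >= 0 && Arrays.binarySearch(documents, offsets[i], offsets[i + 1], docId) >= 0;
    }
}
```

Because each word's run is sorted, membership tests inside a run are also a binary search rather than a scan.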

![Illustration of the data layout of the finalized index](s3://crabby-images/48b20/48b204b473f44a5e17fcdc8635c1f7b4e59e9aeb)

## Central Classes

Full index:
* [FullPreindex](java/nu/marginalia/index/construction/full/FullPreindex.java) holds the intermediate reverse index state.
* [FullIndexConstructor](java/nu/marginalia/index/construction/full/FullIndexConstructor.java) constructs the index.
* [FullReverseIndexReader](java/nu/marginalia/index/FullReverseIndexReader.java) queries the index.

Prio index:
* [PrioPreindex](java/nu/marginalia/index/construction/prio/PrioPreindex.java) holds the intermediate reverse index state.
* [PrioIndexConstructor](java/nu/marginalia/index/construction/prio/PrioIndexConstructor.java) constructs the index.
* [PrioReverseIndexReader](java/nu/marginalia/index/PrioReverseIndexReader.java) queries the index.

## See Also

* [index-journal](../index-journal)
* [index-forward](../index-forward)
* [libraries/btree](../../libraries/btree)
* [libraries/array](../../libraries/array)