mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-23 13:09:00 +00:00


Viktor Lofgren 90a2d4ae38 (index) Fix partial buffer writing in PrioDocIdsTransformer Ensure all data is written to writeChannel by looping until the buffer is fully drained. This prevents potential data loss during the close operation and maintains data integrity.		2024-09-29 17:53:40 +02:00
java/nu/marginalia/index	(index) Fix partial buffer writing in PrioDocIdsTransformer	2024-09-29 17:53:40 +02:00
test/nu/marginalia/index	(index) Fix write offset calculation in PrioDocIdsTransformer	2024-09-29 17:20:29 +02:00
build.gradle	(slop) Break slop out into its own repository	2024-08-13 09:50:05 +02:00
index.svg	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00
merging.svg	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00
preindex.svg	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00
readme.md	(doc) Fix outdated links in documentation	2024-09-22 13:56:17 +02:00

readme.md

Reverse Index

The reverse index contains a mapping from word to document id.

There are two tiers of this index.

A priority index which only indexes terms that are flagged with priority flags¹.
A full index that indexes all terms.

The full index also provides access to term-level metadata, while the priority index is a binary index that only records which documents contain a specific word.

The priority index is also compressed, while the full index at this point is not.

¹ See WordFlags in common/model and KeywordMetadata in converting-process/ft-keyword-extraction.
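
As a rough illustration of the difference between the two tiers, the sketch below contrasts a binary membership lookup with a lookup that also returns term-level metadata. The class name `TwoTierSketch`, both method names, and the parallel-array layout are assumptions made for this example; they are not taken from the project's code.

```java
import java.util.Arrays;
import java.util.OptionalLong;

// Hypothetical, simplified stand-ins for the two tiers; not the project's actual classes.
final class TwoTierSketch {

    // Priority tier: a binary index. For one word, the only question it answers
    // is whether a document id is present in that word's sorted document list.
    static boolean prioContains(long[] sortedDocIds, long docId) {
        return Arrays.binarySearch(sortedDocIds, docId) >= 0;
    }

    // Full tier: each document id has a term-level metadata value stored at the
    // same position in a parallel array (an assumed layout for illustration).
    static OptionalLong fullMetadata(long[] sortedDocIds, long[] metadata, long docId) {
        int idx = Arrays.binarySearch(sortedDocIds, docId);
        return idx >= 0 ? OptionalLong.of(metadata[idx]) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        long[] docs = {3, 7, 42};          // sorted document ids for one word
        long[] meta = {0x1L, 0x10L, 0x4L}; // made-up metadata values
        System.out.println(prioContains(docs, 7));
        System.out.println(fullMetadata(docs, meta, 42));
    }
}
```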

Construction

The reverse index is constructed by first building a series of preindexes. Each preindex consists of a Segment and a Documents object. The segment records which word identifiers are present and how many, and the documents object records which documents each word can be found in.
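
One way to picture the Segment and Documents split is as a sorted array of word ids with a count per word, next to a flat array holding the document ids for every word in the same order. The sketch below follows that picture. It assumes that "how many" means the number of documents per word, and the class `PreindexSketch` and its fields are hypothetical names rather than the project's actual layout.

```java
import java.util.Arrays;

// Hypothetical in-memory model of a preindex; names and layout are assumptions.
final class PreindexSketch {
    // "Segment": sorted word ids, and how many documents each word has.
    private final long[] wordIds;
    private final int[] counts;
    // "Documents": document ids for all words, concatenated in word order.
    private final long[] docIds;

    PreindexSketch(long[] wordIds, int[] counts, long[] docIds) {
        this.wordIds = wordIds;
        this.counts = counts;
        this.docIds = docIds;
    }

    // Returns the document ids recorded for a word, or an empty array if absent.
    long[] documentsFor(long wordId) {
        int idx = Arrays.binarySearch(wordIds, wordId);
        if (idx < 0) {
            return new long[0];
        }
        // The start offset is the sum of the counts of all preceding words.
        int start = 0;
        for (int i = 0; i < idx; i++) {
            start += counts[i];
        }
        return Arrays.copyOfRange(docIds, start, start + counts[idx]);
    }

    public static void main(String[] args) {
        PreindexSketch pre = new PreindexSketch(
                new long[]{10, 20, 30},
                new int[]{2, 1, 3},
                new long[]{1, 5, 2, 3, 4, 9});
        System.out.println(Arrays.toString(pre.documentsFor(30)));
    }
}
```

A real implementation would more likely store precomputed offsets than sum the counts on every lookup; the loop is kept here only to make the relationship between counts and positions explicit.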

The data would typically not fit in RAM, so the index journal is paged: each preindex is constructed small enough to fit in memory, and the preindexes are then merged. Merging sorted arrays is a very fast operation that does not require additional RAM.
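
The merge step relies on the standard two-pointer merge of sorted arrays, sketched below on plain `long` arrays. The class and method names are hypothetical. For simplicity the sketch allocates its output in memory; the point it illustrates is that each input is read once, front to back, with no scratch space beyond the output, which is what makes the operation suitable for output written sequentially to disk.

```java
import java.util.Arrays;

// Minimal two-pointer merge of sorted arrays; an illustration, not the project's code.
final class SortedMergeSketch {

    // Merges two ascending arrays into one ascending array in linear time.
    static long[] merge(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, k = 0;
        // Repeatedly take the smaller head element of the two inputs.
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        // Copy whatever remains of the input that was not exhausted.
        while (i < a.length) {
            out[k++] = a[i++];
        }
        while (j < b.length) {
            out[k++] = b[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] merged = merge(new long[]{1, 4, 9}, new long[]{2, 4, 10});
        System.out.println(Arrays.toString(merged));
    }
}
```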

Once the preindexes have been merged into one large preindex, indexes are added to its data to form the finalized reverse index.

Central Classes

Full index:

FullPreindex: intermediate reverse index state.
FullIndexConstructor: constructs the index.
FullReverseIndexReader: interrogates the index.

Prio index:

PrioPreindex: intermediate reverse index state.
PrioIndexConstructor: constructs the index.
PrioIndexReader: interrogates the index.
