MarginaliaSearch/code/index/index-forward
readme.md (doc) Correct dead links and stale information in the docs 2024-09-13 11:01:05 +02:00

Forward Index

The forward index contains a mapping from document id to various forms of document metadata.

In practice, the forward index consists of two files, an id file and a data file.

The id file contains a sorted list of document ids. The data file contains one fixed-size record of metadata per document id, stored in the same order as the id file, so the position of an id in the id file also gives the position of its record in the data file.

Each record contains a binary encoded DocumentMetadata object, as well as an HtmlFeatures bitmask.
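
The lookup this layout allows can be sketched roughly as below. This is an illustration and not the project's actual reader: the class name `ForwardIndexLookupSketch`, the two-longs-per-record layout, and the offsets are assumptions made for the example, and both files are modelled as plain in-memory arrays. The idea is a binary search over the sorted ids, followed by a direct offset calculation into the fixed-size records.

```java
import java.util.Arrays;

/** Hypothetical sketch of a forward index lookup; not the real Marginalia classes. */
final class ForwardIndexLookupSketch {
    // Assumed record layout for illustration: two longs per document.
    static final int ENTRY_SIZE = 2;
    static final int METADATA_OFFSET = 0; // encoded document metadata
    static final int FEATURES_OFFSET = 1; // HTML features bitmask

    private final long[] ids;  // stands in for the id file: sorted document ids
    private final long[] data; // stands in for the data file: records in id order

    ForwardIndexLookupSketch(long[] sortedIds, long[] data) {
        if (data.length != sortedIds.length * ENTRY_SIZE) {
            throw new IllegalArgumentException("data must hold exactly one record per id");
        }
        this.ids = sortedIds;
        this.data = data;
    }

    /** Position of docId in the id array, or -1 if the id is not indexed. */
    int recordIndex(long docId) {
        int pos = Arrays.binarySearch(ids, docId);
        return pos >= 0 ? pos : -1;
    }

    /** Encoded metadata for docId, or 0 if the id is not indexed. */
    long getMetadata(long docId) {
        int idx = recordIndex(docId);
        return idx < 0 ? 0L : data[idx * ENTRY_SIZE + METADATA_OFFSET];
    }

    /** Features bitmask for docId, or 0 if the id is not indexed. */
    long getFeatures(long docId) {
        int idx = recordIndex(docId);
        return idx < 0 ? 0L : data[idx * ENTRY_SIZE + FEATURES_OFFSET];
    }
}
```

Because every record has the same size, no per-document offset table is needed; the record index is the only thing the id file has to provide.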

Unlike the reverse index, the forward index is not split into two tiers. The data is in the same order as it is in the source data, and the set of document ids is assumed to be small enough to fit in memory, so the index is relatively easy to construct.
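
As a rough illustration of why holding the ids in memory keeps construction simple, the sketch below builds both arrays in two passes. It is not the project's actual converter: the class name `ForwardIndexBuildSketch`, the `{docId, metadata, features}` row format for the source, and the two-longs-per-record layout are all assumptions made for the example. The sketch sorts the ids itself, so it does not depend on the order of the source rows.

```java
import java.util.Arrays;

/** Hypothetical sketch of forward index construction; not the real Marginalia converter. */
final class ForwardIndexBuildSketch {
    // Assumed record layout for illustration: two longs per document.
    static final int ENTRY_SIZE = 2;

    final long[] ids;  // would be written out as the id file
    final long[] data; // would be written out as the data file

    /** Each source row is assumed to be {docId, metadata, features}. */
    ForwardIndexBuildSketch(long[][] source) {
        // Pass 1: collect every document id in memory, sorted and de-duplicated.
        ids = Arrays.stream(source)
                .mapToLong(row -> row[0])
                .sorted()
                .distinct()
                .toArray();

        data = new long[ids.length * ENTRY_SIZE];

        // Pass 2: place each record in the slot given by its id's sorted position.
        for (long[] row : source) {
            int idx = Arrays.binarySearch(ids, row[0]);
            data[idx * ENTRY_SIZE] = row[1];     // encoded metadata
            data[idx * ENTRY_SIZE + 1] = row[2]; // features bitmask
        }
    }
}
```

If the same document id appears more than once in the source, the later row overwrites the earlier one in this sketch; whether that is the desired policy is left open here.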

Central Classes