MarginaliaSearch/code/index/index-journal
Viktor Lofgren aebb2652e8 (wip) Extract and encode spans data
Refactoring keyword extraction to extract spans information.

Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions.

This is a bit of a katamari damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact.  Will push as-is to get back to being able to do more isolated work.
2024-07-27 11:44:13 +02:00
java/nu/marginalia/index/journal (wip) Extract and encode spans data 2024-07-27 11:44:13 +02:00
build.gradle (wip) Extract and encode spans data 2024-07-27 11:44:13 +02:00
readme.md (docs) Begin un-fucking the docs after refactoring 2024-02-27 21:22:21 +01:00

Index Journal

The index journal contains a list of entries with keywords and keyword metadata per document.

This journal is written by processes/loading-process and read when constructing the forward and reverse indices.

The journal format consists of a file header followed by a zstd-compressed list of entries. Each entry contains a header with document-level data and a data section with keyword-level data.
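As a rough illustration of the layout described above, the following sketch writes and reads a file header followed by entries, each with a document-level header and a keyword-level data section. This is not the actual journal format: the class `JournalLayoutSketch`, the `Entry` record, and all field names and widths are assumptions made for the example, and the zstd compression step is left out so the sketch only needs the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the layout; names and field widths are assumptions,
// and the real journal additionally zstd-compresses the entry list.
final class JournalLayoutSketch {

    // One entry per document: document-level fields plus parallel keyword arrays.
    record Entry(long documentId, long documentMeta, long[] keywordIds, long[] keywordMeta) {}

    static byte[] write(List<Entry> entries) {
        var bytes = new ByteArrayOutputStream();
        try (var out = new DataOutputStream(bytes)) {
            // File header: in this sketch, just the number of entries.
            out.writeInt(entries.size());
            for (Entry e : entries) {
                // Entry header: document-level data and the size of the data section.
                out.writeLong(e.documentId());
                out.writeLong(e.documentMeta());
                out.writeInt(e.keywordIds().length);
                // Data section: keyword-level data, one (id, metadata) pair per keyword.
                for (int i = 0; i < e.keywordIds().length; i++) {
                    out.writeLong(e.keywordIds()[i]);
                    out.writeLong(e.keywordMeta()[i]);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static List<Entry> read(byte[] data) {
        try (var in = new DataInputStream(new ByteArrayInputStream(data))) {
            int entryCount = in.readInt();
            List<Entry> entries = new ArrayList<>(entryCount);
            for (int n = 0; n < entryCount; n++) {
                long documentId = in.readLong();
                long documentMeta = in.readLong();
                int keywordCount = in.readInt();
                long[] ids = new long[keywordCount];
                long[] meta = new long[keywordCount];
                for (int i = 0; i < keywordCount; i++) {
                    ids[i] = in.readLong();
                    meta[i] = in.readLong();
                }
                entries.add(new Entry(documentId, documentMeta, ids, meta));
            }
            return entries;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Storing the keyword count in the entry header lets a reader skip or bulk-read the data section without scanning for a terminator, which is one plausible reason to separate the two parts.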

The journal data may be split into multiple files, and the journal writers and readers are designed to handle this transparently via their Paging implementation.
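The idea of transparent paging can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's Paging implementation: the class `PagedJournalSketch`, the `journal-page-NNNN.dat` naming scheme, and the use of text lines as records are all hypothetical. The writer starts a new file once the current one is full, and the reader walks the files in order so callers see a single sequence.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical paging sketch; file names and record format are assumptions.
final class PagedJournalSketch {

    // Zero-padded page numbers keep lexicographic and numeric order the same
    // (up to 9999 pages in this sketch).
    static Path pageFile(Path dir, int page) {
        return dir.resolve(String.format("journal-page-%04d.dat", page));
    }

    // Writer side: start a new page file every maxPerPage records (maxPerPage > 0).
    static void writeAll(Path dir, List<String> records, int maxPerPage) {
        try {
            for (int start = 0, page = 0; start < records.size(); start += maxPerPage, page++) {
                int end = Math.min(start + maxPerPage, records.size());
                Files.write(pageFile(dir, page), records.subList(start, end));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reader side: callers get one list and never see the page boundaries.
    static List<String> readAll(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> pages = files
                    .filter(p -> p.getFileName().toString().matches("journal-page-\\d{4}\\.dat"))
                    .sorted()
                    .toList();
            List<String> records = new ArrayList<>();
            for (Path page : pages) {
                records.addAll(Files.readAllLines(page));
            }
            return records;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

A real reader would more likely stream entries page by page instead of loading everything into memory; the list is used here only to keep the sketch short.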

Central Classes

Model

I/O