MarginaliaSearch/code/libraries/term-frequency-dict
Viktor Lofgren b6d365bacd (index) Clean up data model
This change set cleans up the data model for term-level data, which used to contain a number of fields holding document-level metadata. That duplication meant a larger memory footprint and worse memory locality.

The ranking code is also modified to accept CompiledQueryLong and CqDataInts, which contain only the term metadata and frequency information needed for ranking, instead of SearchResultKeywordScores. This is again an effort to improve memory locality.
2024-04-15 16:04:07 +02:00
java/nu/marginalia (ngram) Grab titles separately when extracting ngrams from wiki data 2024-04-13 19:34:16 +02:00
test/nu/marginalia/segmentation (index) Clean up data model 2024-04-15 16:04:07 +02:00
build.gradle (ngram) Use simple blocking pool instead of FJP; split on underscores in article names. 2024-04-13 17:07:23 +02:00
readme.md Clean up documentation and rename domain-links to link-graph 2024-02-28 11:40:39 +01:00

Term Frequency Dictionary

This dictionary is used by various parts of the system to evaluate, for example, the TF-IDF score of a keyword.
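
As a rough illustration of how a term frequency dictionary feeds into a TF-IDF score, here is a minimal, self-contained sketch. It does not use the library's real classes: `TfIdfSketch`, its methods, and the smoothed IDF formula `ln((1 + N) / (1 + df))` are all assumptions made for the example, and the actual implementation may weight terms differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a term frequency dictionary; not the library's API.
class TfIdfSketch {
    // Number of documents each term appears in (document frequency).
    private final Map<String, Integer> docFreq = new HashMap<>();
    // Total number of documents the dictionary was built from.
    private final int docCount;

    TfIdfSketch(int docCount) {
        this.docCount = docCount;
    }

    void put(String term, int documentsContainingTerm) {
        docFreq.put(term, documentsContainingTerm);
    }

    // Terms missing from the dictionary are treated as having frequency 0.
    int getDocFreq(String term) {
        return docFreq.getOrDefault(term, 0);
    }

    // Smoothed inverse document frequency (an assumed formula):
    // 0 for a term found in every document, larger for rarer terms.
    double idf(String term) {
        return Math.log((1.0 + docCount) / (1.0 + getDocFreq(term)));
    }

    // TF-IDF of a term that occurs countInDoc times in a document of docLength words.
    double tfIdf(String term, int countInDoc, int docLength) {
        if (docLength <= 0) {
            return 0.0;
        }
        double tf = (double) countInDoc / docLength;
        return tf * idf(term);
    }

    public static void main(String[] args) {
        TfIdfSketch dict = new TfIdfSketch(1000);
        dict.put("the", 1000);       // appears in every document
        dict.put("marginalia", 10);  // appears in few documents

        System.out.println("the:        " + dict.tfIdf("the", 5, 100));
        System.out.println("marginalia: " + dict.tfIdf("marginalia", 5, 100));
    }
}
```

With the same in-document count, the rare term scores higher than the ubiquitous one, which is the property a ranking function relies on when it consults the dictionary.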

Central Classes