Commit Graph

17 Commits

Author SHA1 Message Date
Viktor Lofgren
4489b21528 (ranking) Cleanup 2024-04-24 14:44:39 +02:00
Viktor Lofgren
f623b37577 (ranking) Suppress NaNs in ranking output 2024-04-24 14:44:39 +02:00
Viktor Lofgren
f4a2fea451 (ranking, bugfix) Use bm25NgramWeight and not full weight for bM25N 2024-04-24 14:44:39 +02:00
Viktor Lofgren
0dcca0cb83 (index) Fix TCF bug where the ngram terms would be considered instead of the regular ones due to a logical derp 2024-04-24 14:44:39 +02:00
Viktor Lofgren
b80a83339b (qs) Additional info in query debug UI 2024-04-24 14:44:39 +02:00
Viktor Lofgren
eb74d08f2a (qs) Additional info in query debug UI 2024-04-24 14:44:39 +02:00
Viktor Lofgren
462aa9af26 (query) Update ranking parameters with new variables for bm25 ngrams and tcf mutual jaccard
The change also makes it so that as long as the values are defaults, they don't need to be sent over the wire and decoded.
2024-04-24 14:44:39 +02:00
Viktor Lofgren
44b33798f3 (index) Clean up jaccard index term code and down-tune the parameter's importance a bit 2024-04-24 14:44:39 +02:00
Viktor Lofgren
2f0b648fad (index) Add jaccard index term to boost results based on term overlap 2024-04-24 14:44:39 +02:00
Viktor Lofgren
cb4b824a85 (index) Split ngram and regular keyword bm25 calculation and add ngram score as a bonus 2024-04-24 14:44:39 +02:00
Viktor Lofgren
6efc0f21fe (index) Clean up data model
The change set cleans up the data model for the term-level data.  This used to contain a bunch of fields with document-level metadata.  This data-duplication means a larger memory footprint and worse memory locality.

The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking.  This is again an effort to improve memory locality.
2024-04-24 14:44:39 +02:00
Viktor Lofgren
6cba6aef3b (minor) Remove dead code 2024-04-24 14:44:38 +02:00
Viktor Lofgren
a3a6d6292b (qs, index) New query model integrated with index service.
Seems to work, tests are green and initial testing finds no errors.  Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-24 14:44:38 +02:00
Viktor Lofgren
9f1649636e Clean up documentation and rename domain-links to link-graph 2024-02-28 11:40:39 +01:00
Viktor Lofgren
823ca73a3f (domain-ranking) Fix a crash during ranking when the edges of the similarity graph don't quite match the vertices of the link graph. 2024-02-27 21:22:17 +01:00
Viktor Lofgren
fc00701a1e (index) Experimental refactoring of the indexing functionality 2024-02-25 11:05:10 +01:00
Viktor Lofgren
1d34224416 (refac) Remove src/main from all source code paths.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.

While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules.  Which you'll do a lot, because it's *modul*ar.  The src/main/java convention makes a lot of sense for a non-modular project though.  This ain't that.
2024-02-23 16:13:40 +01:00