Commit Graph

16 Commits

Author SHA1 Message Date
Viktor Lofgren
cb4b824a85 (index) Split ngram and regular keyword bm25 calculation and add ngram score as a bonus 2024-04-24 14:44:39 +02:00
Viktor Lofgren
155be1078d (index) Fix priority search terms
This functionality fell into disrepair some while ago.  It's supposed to allow non-mandatory search terms that boost the ranking if they are present in the document.
2024-04-24 14:44:39 +02:00
Viktor Lofgren
6efc0f21fe (index) Clean up data model
The change set cleans up the data model for the term-level data.  This used to contain a bunch of fields with document-level metadata.  This data-duplication means a larger memory footprint and worse memory locality.

The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking.  This is again an effort to improve memory locality.
2024-04-24 14:44:39 +02:00
Viktor Lofgren
8bf7d090fd (qs) Clean up parsing code using new record matching 2024-04-24 14:44:38 +02:00
Viktor Lofgren
4fb86ac692 (search) Fix outdated assumptions about the results
We no longer break the query into "sets" of search terms and need to adapt the code to not use this assumption.

For the API service, we'll simulate the old behavior to keep the API stable.

For the search service, we'll introduce a new way of calculating positions through tree aggregation.
2024-04-24 14:44:38 +02:00
Viktor Lofgren
e3316a3672 (index) Clean up new index query code 2024-04-24 14:44:38 +02:00
Viktor Lofgren
a3a6d6292b (qs, index) New query model integrated with index service.
Seems to work, tests are green and initial testing finds no errors.  Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-24 14:44:38 +02:00
Viktor Lofgren
fe8d583fdd (sys) Upgrade to JDK22
This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.
2024-03-21 14:27:13 +01:00
Viktor Lofgren
46423612e3 (refac) Merge service-discovery and service modules
Also adds a few tests to the server/client code.
2024-03-03 10:49:23 +01:00
Viktor Lofgren
427f3e922f (index) Retire count operation, clean up index code. 2024-02-27 21:22:17 +01:00
Viktor Lofgren
9429bf5c45 (index) Clean up 2024-02-27 21:22:17 +01:00
Viktor Lofgren
fc00701a1e (index) Experimental refactoring of the indexing functionality 2024-02-25 11:05:10 +01:00
Viktor Lofgren
1d34224416 (refac) Remove src/main from all source code paths.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.

While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules.  Which you'll do a lot, because it's *modul*ar.  The src/main/java convention makes a lot of sense for a non-modular project though.  This ain't that.
2024-02-23 16:13:40 +01:00
Viktor Lofgren
f8e7f75831 Move index to top level of code 2024-02-22 18:01:35 +01:00
Viktor Lofgren
085137ca63 * Extract the index functionality 2024-02-22 17:31:25 +01:00
Viktor Lofgren
3fd2a83184 * Extract the search-query function 2024-02-22 15:27:39 +01:00