Viktor Lofgren
7641a02f31
(query) Update ranking parameters with new variables for bm25 ngrams and tcf mutual jaccard
The change also makes it so that as long as the values are defaults, they don't need to be sent over the wire and decoded.
2024-04-18 10:36:15 +02:00
Viktor Lofgren
d64bd227cf
(index) Clean up jaccard index term code and down-tune the parameter's importance a bit
2024-04-17 17:40:16 +02:00
Viktor Lofgren
c5ab0a9054
(index) Add jaccard index term to boost results based on term overlap
2024-04-17 16:50:26 +02:00
Viktor Lofgren
dac948973d
(index) Remove position overlap check, coherences will do the work instead
2024-04-17 14:20:01 +02:00
Viktor Lofgren
9d008d1d6f
(index) Omit absent terms from coherence checks
2024-04-17 14:12:16 +02:00
Viktor Lofgren
f52457213e
(index) Split ngram and regular keyword bm25 calculation and add ngram score as a bonus
2024-04-17 14:05:02 +02:00
Viktor Lofgren
af8ff8ce99
(index) Improve recall for small queries
Partially reverse the previous commit and add a query head for the priority index when there are few query interpretations.
2024-04-16 22:51:03 +02:00
Viktor Lofgren
7fa3e86e64
(index) Remove dead code
Since the performance fix in 3359f72239 had a huge positive impact without reducing result quality, it's possible to remove the QueryBranchWalker and associated code.
2024-04-16 19:59:27 +02:00
Viktor Lofgren
3359f72239
(index) Experimental performance regression fix
2024-04-16 19:48:14 +02:00
Viktor Lofgren
41fa154aa6
(test) Fix broken test
2024-04-16 19:48:14 +02:00
Viktor Lofgren
deaba0152d
(index) Explicitly free LongQueryBuffers
2024-04-16 19:23:00 +02:00
Viktor Lofgren
feaef6093e
(index) Fix term coherence evaluation
The code was incorrectly using the documentId instead of the combined id, resulting in almost all result sets being incorrectly seen as zero.
2024-04-16 18:07:43 +02:00
Viktor Lofgren
078fa4fdd0
(valuation) Impose stronger constraints on locality of terms
Clean up logic a bit
2024-04-16 17:22:58 +02:00
Viktor Lofgren
2dc77a0638
(valuation) Impose stronger constraints on locality of terms
2024-04-16 17:15:21 +02:00
Viktor Lofgren
599e719ad4
(index) Fix priority search terms
This functionality fell into disrepair some while ago. It's supposed to allow non-mandatory search terms that boost the ranking if they are present in the document.
2024-04-15 16:44:08 +02:00
Viktor Lofgren
b6d365bacd
(index) Clean up data model
The change set cleans up the data model for the term-level data. This used to contain a bunch of fields with document-level metadata. This data-duplication means a larger memory footprint and worse memory locality.
The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking. This is again an effort to improve memory locality.
2024-04-15 16:04:07 +02:00
Viktor Lofgren
65e3caf402
(index) Clean up the code
2024-04-11 18:50:21 +02:00
Viktor Lofgren
fcdc843c15
(search) Fix outdated assumptions about the results
We no longer break the query into "sets" of search terms and need to adapt the code to not use this assumption.
For the API service, we'll simulate the old behavior to keep the API stable.
For the search service, we'll introduce a new way of calculating positions through tree aggregation.
2024-04-07 12:09:44 +02:00
Viktor Lofgren
dbdcf459a7
(minor) Remove dead code
2024-04-06 16:27:16 +02:00
Viktor Lofgren
ef25d60666
(index) Add origin trace information for index readers
This used to be supported by the system but got lost in refactoring at some point.
2024-04-06 13:28:14 +02:00
Viktor Lofgren
ae7c760772
(index) Clean up new index query code
2024-04-05 13:30:49 +02:00
Viktor Lofgren
81815f3e0a
(qs, index) New query model integrated with index service.
Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-04 20:17:58 +02:00
Viktor Lofgren
002afca1c5
(sys) Upgrade to JDK22
This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.
2024-03-21 14:33:27 +01:00
Viktor Lofgren
fe8d583fdd
(sys) Upgrade to JDK22
This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.
2024-03-21 14:27:13 +01:00
Viktor Lofgren
46423612e3
(refac) Merge service-discovery and service modules
Also adds a few tests to the server/client code.
2024-03-03 10:49:23 +01:00
Viktor Lofgren
9a045a0588
(index) Clean up index code
2024-02-28 13:09:47 +01:00
Viktor Lofgren
d78e9e715f
(misc) Fix broken tests
2024-02-28 12:12:43 +01:00
Viktor Lofgren
9f1649636e
Clean up documentation and rename domain-links to link-graph
2024-02-28 11:40:39 +01:00
Viktor Lofgren
99a6e56e99
(index-client) Increase thread count in index client
This should be a fair bit larger than the number of index nodes
2024-02-27 22:00:29 +01:00
Viktor Lofgren
e696fd9e92
(docs) Begin un-fucking the docs after refactoring
2024-02-27 21:22:21 +01:00
Viktor Lofgren
eaf836dc66
(service/grpc) Reduce thread count
Netty and gRPC by default spawn an incredible number of threads on high-core CPUs, which amounts to a fair bit of RAM usage.
Add custom executors that throttle this behavior.
2024-02-27 21:22:21 +01:00
Viktor Lofgren
67aa20ea2c
(array) Attempting to debug strange errors
2024-02-27 21:22:18 +01:00
Viktor Lofgren
1a51ec2d69
(index) Index optimization
2024-02-27 21:22:17 +01:00
Viktor Lofgren
3eb0800742
(index) Improve granularity of candidate queue polling
2024-02-27 21:22:17 +01:00
Viktor Lofgren
427f3e922f
(index) Retire count operation, clean up index code.
2024-02-27 21:22:17 +01:00
Viktor Lofgren
823ca73a3f
(domain-ranking) Fix a crash during ranking when the edges of the similarity graph don't quite match the vertices of the link graph.
2024-02-27 21:22:17 +01:00
Viktor Lofgren
7fc0d4d786
(index) Observability for query execution queues
2024-02-27 21:22:17 +01:00
Viktor Lofgren
b8e336e809
(index) Reduce time allocation a bit
2024-02-27 21:22:17 +01:00
Viktor Lofgren
9429bf5c45
(index) Clean up
2024-02-27 21:22:17 +01:00
Viktor Lofgren
fc00701a1e
(index) Experimental refactoring of the indexing functionality
2024-02-25 11:05:10 +01:00
Viktor Lofgren
1d34224416
(refac) Remove src/main from all source code paths.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.
While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's *modul*ar. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.
2024-02-23 16:13:40 +01:00
Viktor Lofgren
2201b1a506
(refac) Clean up code issues
2024-02-23 11:39:19 +01:00
Viktor Lofgren
5cdb07023b
(refac) Clean up unused imports
2024-02-23 11:27:20 +01:00
Viktor Lofgren
4740156cfa
Clean up docs
2024-02-22 18:18:58 +01:00
Viktor Lofgren
f8e7f75831
Move index to top level of code
2024-02-22 18:01:35 +01:00
Viktor Lofgren
73eaa0865d
The refactoring will continue until morale improves.
2023-03-12 10:50:31 +01:00
Viktor Lofgren
616effdb3c
The refactoring will continue until morale improves.
2023-03-12 10:04:48 +01:00
Viktor Lofgren
6d939175b1
Additional code restructuring to get rid of util and misc-style packages.
2023-03-11 13:48:40 +01:00
Viktor Lofgren
722ff3bffb
Word feature bit for words that appear in the URL, new search profile for plain text files, better plain text titles.
2023-03-10 16:46:56 +01:00
Viktor Lofgren
1252f95da5
Fix for valuation bug in index code that wouldn't sort bad-ish items properly.
2023-03-07 21:26:04 +01:00
Viktor Lofgren
ad1be7c835
Move all code to a code directory.
2023-03-07 17:14:32 +01:00