Commit Graph

165 Commits

Author SHA1 Message Date
Viktor Lofgren
569520c9b6 (index) Add manual adjustments for rankings based on domain 2025-01-21 15:07:43 +01:00
Viktor Lofgren
3772bfd387 (query) Fix handling of optional ranking parameters 2025-01-08 17:11:22 +01:00
Viktor Lofgren
a1fb92468f (refac) Remove ResultRankingParameters, QueryLimits class and use protobuf classes directly instead
This is primarily to make the code a bit easier to reason about, and will reduce the level of indirection and data copying in the search-servi->query-service->index-service communication chain.
2025-01-08 16:15:57 +01:00
Viktor Lofgren
a84a06975c (ranking-params) Add disable penalties flag to ranking params
This will help debugging ranking issues.  Later it may be added to some filters.
2025-01-08 00:16:49 +01:00
Viktor Lofgren
dafaab3ef7 (index) Additional optimization pass 2024-12-12 18:57:33 +01:00
Viktor Lofgren
3f11ca409f (index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 17:07:06 +01:00
Viktor Lofgren
694eed79ef (index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:32:31 +01:00
Viktor Lofgren
4220169119 (index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:31:11 +01:00
Viktor Lofgren
73861e613f (ranking) Downtune score boost for unordered heading matces 2024-12-11 15:44:29 +01:00
Viktor Lofgren
cf7f84f033 (rank) Reduce the impact of domain rank bonus, and only apply it to cancel out negative penalties, never to increase the ranking 2024-12-10 22:04:12 +01:00
Viktor Lofgren
291ca8daf1 (converter/index) Improve atag sentence matching by taking into consideration how many times a sentence appears in the links
This change breaks the format of the atags.parquet file.
2024-12-08 00:27:11 +01:00
Viktor Lofgren
7b64377fd6 (ranking) Promote documents with multiple phrase matches with a log-scale bonus 2024-11-28 13:36:56 +01:00
Viktor Lofgren
ba47d72bf4 (ranking) Adjust scores for external link matches 2024-11-27 14:27:23 +01:00
Viktor Lofgren
077d8dcd11 (result-score) Adjust ranking parameters a tiny bit 2024-11-25 18:30:59 +01:00
Viktor Lofgren
7d1ef08a0f (index) Correct ranking bonus for external linktext appearnces 2024-11-25 17:40:15 +01:00
Viktor Lofgren
0b6b5dab07 (index) Add score bonuses for single-word anchor tag spans
Enhanced scoring logic to add bonuses when the query matches single-word anchor (atag) spans exactly. Implemented this by adding conditions in `IndexResultScoreCalculator.java` and creating a new method `containsRangeExact` in `DocumentSpan.java` to check for exact span matches.
2024-11-25 14:44:41 +01:00
Viktor Lofgren
dc5f97e737 (index) Add bonus for single-word title matches when the title is also a single word 2024-11-25 13:24:12 +01:00
Viktor Lofgren
f09669a5b0 (index) Correct usage of DocumentSpan.length() instead of DocumentSpan.size()
The latter counts the number of spans, and is not what you want here.
2024-11-25 13:11:55 +01:00
Viktor Lofgren
9f47ce8d15 (chore) Remove lombok
There are likely some instances of delombok gore with this commit.
2024-11-11 21:14:38 +01:00
Viktor Lofgren
6460c11107 (index) Short-circuit rankResults when there are no results 2024-10-14 13:47:35 +02:00
Viktor Lofgren
1bd29a586c (service-discovery) Add common base interface to all Grpc services
To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface DiscoverableService extends BindableService was added.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
3dec4b6b34 (index) Fix bug where tcfFirstPosition lit up because one term was in the title and the other was missing from the document
This was because firstPosition calculation was not invalidated when positions were missing.
2024-09-24 13:33:37 +02:00
Viktor Lofgren
f4eeef145e (index) Reduce fetch size to improve timeout characteristics 2024-09-17 15:20:41 +02:00
Viktor Lofgren
87aa869338 (index) Correct positions mask to take into account offsets when overlapping 2024-09-17 14:40:37 +02:00
Viktor Lofgren
bb5d946c26 (index, EXPERIMENTAL) Clean up ranking code 2024-08-29 11:34:23 +02:00
Viktor Lofgren
30bf845c81 (index) Speed up minDist calculations by excluding large lists 2024-08-26 13:04:15 +02:00
Viktor Lofgren
67a98fb0b0 (coded-sequence) Handle weird legacy HTML that puts everything in a heading 2024-08-26 12:49:15 +02:00
Viktor Lofgren
f3182a9264 (coded-sequence) Evaluate new minDist implementation 2024-08-26 12:02:37 +02:00
Viktor Lofgren
9c5f463775 (index) Optimize DocumentSpan.countIntersections 2024-08-25 13:59:11 +02:00
Viktor Lofgren
efd56efc63 (index) Optimize SequenceOperations.minDistance 2024-08-25 13:28:06 +02:00
Viktor Lofgren
d94373f4b1 (index) Optimize calculatePositionsMask 2024-08-25 13:24:37 +02:00
Viktor Lofgren
a5585110a6 (index) Optimize SequenceOperations 2024-08-25 13:16:31 +02:00
Viktor Lofgren
965c89798e (index) Optimize DocumentSpan 2024-08-25 12:44:33 +02:00
Viktor Lofgren
24b805472a (index) Evaluate performance implication of decoding gcs early 2024-08-25 12:23:09 +02:00
Viktor Lofgren
6ce029b317 (index) Remove vestigial parameter 2024-08-25 12:14:12 +02:00
Viktor Lofgren
63e5b0ab18 (index) Correct weightedCounts calculations 2024-08-25 12:06:56 +02:00
Viktor Lofgren
aa2c960b74 (index) Optimize ranking calculations 2024-08-25 11:53:44 +02:00
Viktor Lofgren
9aa8f13731 (index) Remove tcfAvgDist ranking parameter
This is captured by tcfProximity already
2024-08-25 11:20:19 +02:00
Viktor Lofgren
65bee366dc (index) Try harmonic mean for avgMinDist 2024-08-25 11:11:52 +02:00
Viktor Lofgren
53700e6667 (index) Try harmonic mean for avgMinDist 2024-08-25 11:08:41 +02:00
Viktor Lofgren
7f498e10b7 (index) Adjust proximity score 2024-08-25 11:01:35 +02:00
Viktor Lofgren
6eb0f13411 (index) Adjust handling of full phrase matches to prioritize full query matches over large partial matches 2024-08-25 10:54:04 +02:00
Viktor Lofgren
773377fe84 (index) Correct handling of full phrase match group 2024-08-25 10:48:34 +02:00
Viktor Lofgren
4372c8c835 (index) Give ranking components more consistent names 2024-08-25 10:44:27 +02:00
Viktor Lofgren
099133bdbc (index) Fix verbatim match score after moving full phrase group to a separate entity 2024-08-25 10:43:35 +02:00
Viktor Lofgren
96bcf03ad5 (index) Address broken tests
They are still broken, but less so.
2024-08-25 10:34:36 +02:00
Viktor Lofgren
0999f07320 (search-query) Add new ranking parameters for proximity and verbatim matches 2024-08-25 10:34:12 +02:00
Viktor Lofgren
9eb1f120fc (index) Repair positions bitmask for search result presentation 2024-08-22 11:28:23 +02:00
Viktor Lofgren
03d5dec24c (*) Refactor termCoherences and rename them to phrase constraints. 2024-08-15 11:02:19 +02:00
Viktor Lofgren
a18edad04c (index) Remove stopword list from converter
We want to index all words in the document, stopword handling is moved to the index where we change the semantics to elide inclusion checks in query construction for a very short list of words tentatively hard-coded in SearchTerms.
2024-08-15 09:36:50 +02:00