Viktor Lofgren
569520c9b6
(index) Add manual adjustments for rankings based on domain
2025-01-21 15:07:43 +01:00
Viktor Lofgren
3772bfd387
(query) Fix handling of optional ranking parameters
2025-01-08 17:11:22 +01:00
Viktor Lofgren
a1fb92468f
(refac) Remove ResultRankingParameters, QueryLimits class and use protobuf classes directly instead
This is primarily to make the code a bit easier to reason about, and it will reduce the level of indirection and data copying in the search-service -> query-service -> index-service communication chain.
2025-01-08 16:15:57 +01:00
Viktor Lofgren
a84a06975c
(ranking-params) Add disable penalties flag to ranking params
This will help with debugging ranking issues. It may later be added to some filters.
2025-01-08 00:16:49 +01:00
Viktor Lofgren
dafaab3ef7
(index) Additional optimization pass
2024-12-12 18:57:33 +01:00
Viktor Lofgren
3f11ca409f
(index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 17:07:06 +01:00
Viktor Lofgren
694eed79ef
(index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:32:31 +01:00
Viktor Lofgren
4220169119
(index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:31:11 +01:00
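The commits above raise the default of "index.valuationThreads" to 16. A minimal sketch of how such a setting might size a valuation pool follows. It assumes, without confirmation from the log, that the setting is read as a JVM system property (for example `-Dindex.valuationThreads=32`); the class name `ValuationPoolSketch` and the dummy counting task are hypothetical stand-ins for the real result-valuation work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ValuationPoolSketch {
    // Assumption: the setting is a JVM system property; the commit only says its default is now 16.
    public static int valuationThreads() {
        return Integer.getInteger("index.valuationThreads", 16);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(valuationThreads());
        AtomicInteger scored = new AtomicInteger();

        // Hypothetical stand-in for per-result valuation work fanned out across the pool.
        for (int i = 0; i < 100; i++) {
            pool.execute(scored::incrementAndGet);
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("threads=" + valuationThreads() + ", scored=" + scored.get());
    }
}
```

More threads only help while valuation is CPU-bound and the machine has cores to spare; past that point they mostly add scheduling overhead.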
Viktor Lofgren
73861e613f
(ranking) Downtune score boost for unordered heading matches
2024-12-11 15:44:29 +01:00
Viktor Lofgren
cf7f84f033
(rank) Reduce the impact of domain rank bonus, and only apply it to cancel out negative penalties, never to increase the ranking
2024-12-10 22:04:12 +01:00
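The rule in cf7f84f033, where the domain rank bonus may cancel out negative penalties but never increase the ranking, amounts to clamping at zero. The sketch below assumes that higher scores rank better, that penalties are non-positive, and that the bonus is scaled by a weight; `DomainRankBonus`, `adjustPenalty` and `WEIGHT` are hypothetical names, and the weight value is illustrative only.

```java
public class DomainRankBonus {
    // Hypothetical weight standing in for "reduce the impact"; the real value is not given.
    static final double WEIGHT = 0.25;

    /**
     * Assumes penalty <= 0 and higher final scores rank better.
     * The scaled bonus can offset the penalty, but the result never exceeds zero.
     */
    public static double adjustPenalty(double penalty, double domainRankBonus) {
        double scaledBonus = WEIGHT * Math.max(0.0, domainRankBonus);
        return Math.min(0.0, penalty + scaledBonus);
    }

    public static void main(String[] args) {
        System.out.println(adjustPenalty(-2.0, 4.0)); // -1.0: penalty partly cancelled
        System.out.println(adjustPenalty(-0.5, 4.0)); // 0.0: fully cancelled, not turned into a boost
        System.out.println(adjustPenalty(0.0, 4.0));  // 0.0: nothing to cancel, no boost
    }
}
```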
Viktor Lofgren
291ca8daf1
(converter/index) Improve atag sentence matching by taking into consideration how many times a sentence appears in the links
This change breaks the format of the atags.parquet file.
2024-12-08 00:27:11 +01:00
Viktor Lofgren
7b64377fd6
(ranking) Promote documents with multiple phrase matches with a log-scale bonus
2024-11-28 13:36:56 +01:00
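A log-scale bonus, as described in 7b64377fd6, gives diminishing returns as phrase matches accumulate: each doubling of the match count adds the same amount. This is a sketch under assumptions, not the actual formula. It assumes the natural logarithm, a bonus of zero for a single match, and a caller-supplied weight; the names `PhraseMatchBonus` and `bonus` are hypothetical.

```java
public class PhraseMatchBonus {
    /**
     * Hypothetical log-scale bonus: zero for zero or one phrase match, then
     * weight * ln(matches), so 2 -> 4 -> 8 matches each add the same increment.
     */
    public static double bonus(int phraseMatches, double weight) {
        if (phraseMatches <= 1) return 0.0;
        return weight * Math.log(phraseMatches);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("%2d matches -> bonus %.3f%n", n, bonus(n, 1.0));
        }
    }
}
```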
Viktor Lofgren
ba47d72bf4
(ranking) Adjust scores for external link matches
2024-11-27 14:27:23 +01:00
Viktor Lofgren
077d8dcd11
(result-score) Adjust ranking parameters a tiny bit
2024-11-25 18:30:59 +01:00
Viktor Lofgren
7d1ef08a0f
(index) Correct ranking bonus for external linktext appearances
2024-11-25 17:40:15 +01:00
Viktor Lofgren
0b6b5dab07
(index) Add score bonuses for single-word anchor tag spans
Enhanced scoring logic to add bonuses when the query matches single-word anchor (atag) spans exactly. Implemented this by adding conditions in `IndexResultScoreCalculator.java` and creating a new method `containsRangeExact` in `DocumentSpan.java` to check for exact span matches.
2024-11-25 14:44:41 +01:00
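The difference between an exact span match and ordinary containment, which `containsRangeExact` in `DocumentSpan.java` is described as checking, can be sketched as follows. The storage layout (parallel `starts`/`ends` arrays of half-open `[start, end)` word positions, sorted by start) and the class name `AnchorSpans` are assumptions for illustration; only the method name `containsRangeExact` comes from the commit message.

```java
public class AnchorSpans {
    // Hypothetical layout: span i covers word positions [starts[i], ends[i]),
    // and spans are sorted by start position.
    private final int[] starts;
    private final int[] ends;

    public AnchorSpans(int[] starts, int[] ends) {
        if (starts.length != ends.length) {
            throw new IllegalArgumentException("starts and ends must have the same length");
        }
        this.starts = starts;
        this.ends = ends;
    }

    /** True if some span covers exactly [pos, pos + len): the match is the whole span. */
    public boolean containsRangeExact(int pos, int len) {
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] > pos) break; // sorted by start: no later span can begin at pos
            if (starts[i] == pos && ends[i] == pos + len) return true;
        }
        return false;
    }

    /** True if some span contains [pos, pos + len), possibly with other words around it. */
    public boolean containsRange(int pos, int len) {
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= pos && pos + len <= ends[i]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // A single-word anchor text at position 3 and a three-word anchor text at 10..12.
        AnchorSpans atags = new AnchorSpans(new int[] {3, 10}, new int[] {4, 13});
        System.out.println(atags.containsRangeExact(3, 1));  // true: the one-word span
        System.out.println(atags.containsRangeExact(10, 1)); // false: that span is three words long
        System.out.println(atags.containsRange(10, 1));      // true: word 10 lies inside the span
    }
}
```

An exact match is the stronger signal: a single-word query that *is* the whole anchor text says more than one that merely appears somewhere inside a longer link text.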
Viktor Lofgren
dc5f97e737
(index) Add bonus for single-word title matches when the title is also a single word
2024-11-25 13:24:12 +01:00
Viktor Lofgren
f09669a5b0
(index) Correct usage of DocumentSpan.length() instead of DocumentSpan.size()
The latter counts the number of spans, and is not what you want here.
2024-11-25 13:11:55 +01:00
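Why `size()` and `length()` are easy to confuse: a one-word title is one span covering one position, so both return 1 and a mix-up goes unnoticed. A longer title separates them. The sketch below reproduces that distinction with a hypothetical `SpanCounts` class using the same assumed `[start, end)` layout; only the method names `size()` and `length()` and their described meanings come from the commit.

```java
public class SpanCounts {
    // Hypothetical layout: span i covers word positions [starts[i], ends[i]).
    private final int[] starts;
    private final int[] ends;

    public SpanCounts(int[] starts, int[] ends) {
        if (starts.length != ends.length) {
            throw new IllegalArgumentException("starts and ends must have the same length");
        }
        this.starts = starts;
        this.ends = ends;
    }

    /** Number of spans, e.g. how many title or heading spans the document has. */
    public int size() {
        return starts.length;
    }

    /** Total number of word positions covered by all spans. */
    public int length() {
        int total = 0;
        for (int i = 0; i < starts.length; i++) {
            total += ends[i] - starts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // A three-word title: one span, three words.
        SpanCounts title = new SpanCounts(new int[] {0}, new int[] {3});
        System.out.println(title.size());   // 1: a size()-based "single-word title" check would wrongly pass
        System.out.println(title.length()); // 3: the correct word count
    }
}
```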
Viktor Lofgren
9f47ce8d15
(chore) Remove lombok
There are likely some instances of delombok gore with this commit.
2024-11-11 21:14:38 +01:00
Viktor Lofgren
6460c11107
(index) Short-circuit rankResults when there are no results
2024-10-14 13:47:35 +02:00
Viktor Lofgren
1bd29a586c
(service-discovery) Add common base interface to all Grpc services
To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface DiscoverableService extends BindableService was added.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
3dec4b6b34
(index) Fix bug where tcfFirstPosition lit up because one term was in the title and the other was missing from the document
This was because firstPosition calculation was not invalidated when positions were missing.
2024-09-24 13:33:37 +02:00
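The bug in 3dec4b6b34 is a general pattern: if a term with no positions is skipped rather than treated as invalidating, a first-position feature can light up from the other terms alone. The sketch below shows the corrected shape. It assumes, as an illustration only, that the feature is the smallest first position across the query terms and that `-1` marks "invalid"; `FirstPositionSketch`, `firstPosition` and `INVALID` are hypothetical names, not the index code.

```java
public class FirstPositionSketch {
    public static final int INVALID = -1;

    /**
     * Hypothetical: smallest first occurrence across all query terms.
     * Each positions array is assumed sorted ascending.
     */
    public static int firstPosition(int[][] positionsPerTerm) {
        if (positionsPerTerm.length == 0) return INVALID;
        int first = Integer.MAX_VALUE;
        for (int[] positions : positionsPerTerm) {
            // The fix: a term missing from the document invalidates the whole
            // calculation instead of being silently skipped.
            if (positions.length == 0) return INVALID;
            first = Math.min(first, positions[0]);
        }
        return first;
    }

    public static void main(String[] args) {
        // Term A in the title (position 0), term B absent from the document.
        System.out.println(firstPosition(new int[][] { {0}, {} }));     // -1
        // Both terms present.
        System.out.println(firstPosition(new int[][] { {5, 9}, {2} })); // 2
    }
}
```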
Viktor Lofgren
f4eeef145e
(index) Reduce fetch size to improve timeout characteristics
2024-09-17 15:20:41 +02:00
Viktor Lofgren
87aa869338
(index) Correct positions mask to take into account offsets when overlapping
2024-09-17 14:40:37 +02:00
Viktor Lofgren
bb5d946c26
(index, EXPERIMENTAL) Clean up ranking code
2024-08-29 11:34:23 +02:00
Viktor Lofgren
30bf845c81
(index) Speed up minDist calculations by excluding large lists
2024-08-26 13:04:15 +02:00
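The minimum distance between two sorted position lists can be found in linear time with a two-pointer merge, and very long lists (typically very common words) can be left out of the calculation entirely, as 30bf845c81 describes. The sketch below combines both ideas. The two-list signature, the cutoff `MAX_LIST_SIZE = 256`, and the class name `MinDistSketch` are assumptions; the log does not say what counts as "large" or how `SequenceOperations.minDistance` is structured.

```java
import java.util.List;

public class MinDistSketch {
    // Hypothetical cutoff; the commit does not state the actual threshold.
    static final int MAX_LIST_SIZE = 256;

    /** Smallest |a[i] - b[j]| over two ascending position lists, via a two-pointer merge. */
    public static int minDistance(int[] a, int[] b) {
        int i = 0, j = 0;
        int best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            // Advance past the smaller position: no later element of the other list can be closer to it.
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }

    /**
     * Minimum distance over all term pairs, skipping empty and oversized lists.
     * Returns Integer.MAX_VALUE if no pair could be evaluated.
     */
    public static int minDistAll(List<int[]> positionsPerTerm) {
        int best = Integer.MAX_VALUE;
        for (int x = 0; x < positionsPerTerm.size(); x++) {
            int[] a = positionsPerTerm.get(x);
            if (a.length == 0 || a.length > MAX_LIST_SIZE) continue;
            for (int y = x + 1; y < positionsPerTerm.size(); y++) {
                int[] b = positionsPerTerm.get(y);
                if (b.length == 0 || b.length > MAX_LIST_SIZE) continue;
                best = Math.min(best, minDistance(a, b));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minDistance(new int[] {1, 10, 20}, new int[] {14, 30})); // 4
        // The 300-entry list is skipped, so only the pair {1} and {3} is measured.
        System.out.println(minDistAll(List.of(new int[] {1}, new int[] {3}, new int[300]))); // 2
    }
}
```

Skipping large lists trades a little accuracy for bounded cost: with the merge being O(n + m), one huge list would otherwise dominate every pair it takes part in.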
Viktor Lofgren
67a98fb0b0
(coded-sequence) Handle weird legacy HTML that puts everything in a heading
2024-08-26 12:49:15 +02:00
Viktor Lofgren
f3182a9264
(coded-sequence) Evaluate new minDist implementation
2024-08-26 12:02:37 +02:00
Viktor Lofgren
9c5f463775
(index) Optimize DocumentSpan.countIntersections
2024-08-25 13:59:11 +02:00
Viktor Lofgren
efd56efc63
(index) Optimize SequenceOperations.minDistance
2024-08-25 13:28:06 +02:00
Viktor Lofgren
d94373f4b1
(index) Optimize calculatePositionsMask
2024-08-25 13:24:37 +02:00
Viktor Lofgren
a5585110a6
(index) Optimize SequenceOperations
2024-08-25 13:16:31 +02:00
Viktor Lofgren
965c89798e
(index) Optimize DocumentSpan
2024-08-25 12:44:33 +02:00
Viktor Lofgren
24b805472a
(index) Evaluate performance implication of decoding gcs early
2024-08-25 12:23:09 +02:00
Viktor Lofgren
6ce029b317
(index) Remove vestigial parameter
2024-08-25 12:14:12 +02:00
Viktor Lofgren
63e5b0ab18
(index) Correct weightedCounts calculations
2024-08-25 12:06:56 +02:00
Viktor Lofgren
aa2c960b74
(index) Optimize ranking calculations
2024-08-25 11:53:44 +02:00
Viktor Lofgren
9aa8f13731
(index) Remove tcfAvgDist ranking parameter
This is already captured by tcfProximity.
2024-08-25 11:20:19 +02:00
Viktor Lofgren
65bee366dc
(index) Try harmonic mean for avgMinDist
2024-08-25 11:11:52 +02:00
Viktor Lofgren
53700e6667
(index) Try harmonic mean for avgMinDist
2024-08-25 11:08:41 +02:00
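A harmonic mean is dominated by its smallest values, so averaging per-pair minimum distances this way rewards a document where at least some query terms sit close together, even if others are far apart. The sketch below contrasts it with the arithmetic mean. Skipping non-positive values and `Integer.MAX_VALUE` (used here as a hypothetical "no pair found" marker) is an assumption, as is the class name `HarmonicMinDist`.

```java
public class HarmonicMinDist {
    /** Harmonic mean n / sum(1/d); skips non-positive and "no pair" values. */
    public static double harmonicMean(int[] minDists) {
        double sumOfReciprocals = 0.0;
        int n = 0;
        for (int d : minDists) {
            if (d <= 0 || d == Integer.MAX_VALUE) continue;
            sumOfReciprocals += 1.0 / d;
            n++;
        }
        return n == 0 ? Double.NaN : n / sumOfReciprocals;
    }

    /** Plain arithmetic mean, for comparison. */
    public static double arithmeticMean(int[] minDists) {
        double sum = 0.0;
        for (int d : minDists) sum += d;
        return minDists.length == 0 ? Double.NaN : sum / minDists.length;
    }

    public static void main(String[] args) {
        // One close term pair and one distant pair.
        int[] dists = {1, 100};
        System.out.println(arithmeticMean(dists)); // 50.5
        System.out.println(harmonicMean(dists));   // about 1.98, dominated by the close pair
    }
}
```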
Viktor Lofgren
7f498e10b7
(index) Adjust proximity score
2024-08-25 11:01:35 +02:00
Viktor Lofgren
6eb0f13411
(index) Adjust handling of full phrase matches to prioritize full query matches over large partial matches
2024-08-25 10:54:04 +02:00
Viktor Lofgren
773377fe84
(index) Correct handling of full phrase match group
2024-08-25 10:48:34 +02:00
Viktor Lofgren
4372c8c835
(index) Give ranking components more consistent names
2024-08-25 10:44:27 +02:00
Viktor Lofgren
099133bdbc
(index) Fix verbatim match score after moving full phrase group to a separate entity
2024-08-25 10:43:35 +02:00
Viktor Lofgren
96bcf03ad5
(index) Address broken tests
They are still broken, but less so.
2024-08-25 10:34:36 +02:00
Viktor Lofgren
0999f07320
(search-query) Add new ranking parameters for proximity and verbatim matches
2024-08-25 10:34:12 +02:00
Viktor Lofgren
9eb1f120fc
(index) Repair positions bitmask for search result presentation
2024-08-22 11:28:23 +02:00
Viktor Lofgren
03d5dec24c
(*) Refactor termCoherences and rename them to phrase constraints.
2024-08-15 11:02:19 +02:00
Viktor Lofgren
a18edad04c
(index) Remove stopword list from converter
We want to index all words in the document. Stopword handling is moved to the index, where we change the semantics to elide inclusion checks in query construction for a very short list of words, tentatively hard-coded in SearchTerms.
2024-08-15 09:36:50 +02:00
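Under the semantics described in a18edad04c, stopwords stay in the index and stay in the query, but query construction does not emit an inclusion check for them. The sketch below shows that split. The word list, the `QueryPlan` record, and the class name `StopwordSketch` are all hypothetical; the actual short list lives in SearchTerms and is not reproduced in this log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopwordSketch {
    // Hypothetical short list; the real words hard-coded in SearchTerms are not given here.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of");

    /** Terms that must be present, and terms kept in the query without an inclusion check. */
    public record QueryPlan(List<String> mustInclude, List<String> noInclusionCheck) {}

    public static QueryPlan plan(List<String> terms) {
        List<String> include = new ArrayList<>();
        List<String> noCheck = new ArrayList<>();
        for (String term : terms) {
            // Stopwords are still indexed, but a document is not required to contain them.
            if (STOPWORDS.contains(term)) noCheck.add(term);
            else include.add(term);
        }
        return new QueryPlan(include, noCheck);
    }

    public static void main(String[] args) {
        QueryPlan p = plan(List.of("the", "history", "of", "rome"));
        System.out.println(p.mustInclude());      // [history, rome]
        System.out.println(p.noInclusionCheck()); // [the, of]
    }
}
```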