Viktor Lofgren
7f498e10b7
(index) Adjust proximity score
2024-08-25 11:01:35 +02:00
Viktor Lofgren
6eb0f13411
(index) Adjust handling of full phrase matches to prioritize full query matches over large partial matches
2024-08-25 10:54:04 +02:00
Viktor Lofgren
773377fe84
(index) Correct handling of full phrase match group
2024-08-25 10:48:34 +02:00
Viktor Lofgren
4372c8c835
(index) Give ranking components more consistent names
2024-08-25 10:44:27 +02:00
Viktor Lofgren
099133bdbc
(index) Fix verbatim match score after moving full phrase group to a separate entity
2024-08-25 10:43:35 +02:00
Viktor Lofgren
96bcf03ad5
(index) Address broken tests
They are still broken, but less so.
2024-08-25 10:34:36 +02:00
Viktor Lofgren
0999f07320
(search-query) Add new ranking parameters for proximity and verbatim matches
2024-08-25 10:34:12 +02:00
Viktor Lofgren
9eb1f120fc
(index) Repair positions bitmask for search result presentation
2024-08-22 11:28:23 +02:00
Viktor Lofgren
03d5dec24c
(*) Refactor termCoherences and rename them to phrase constraints.
2024-08-15 11:02:19 +02:00
Viktor Lofgren
a18edad04c
(index) Remove stopword list from converter
We want to index all words in the document. Stopword handling is moved to the index, where we change the semantics to elide inclusion checks in query construction for a very short list of words, tentatively hard-coded in SearchTerms.
2024-08-15 09:36:50 +02:00
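The commit body above describes moving stopword handling from the converter to query construction. Below is a minimal Java sketch of that idea. It assumes a hypothetical `StopwordElisionSketch` class and an illustrative word list; the real list is the one hard-coded in SearchTerms and is not reproduced here. Every word stays in the index, but stopwords are left out of the set of terms a document must contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of stopword elision at query-construction time. */
final class StopwordElisionSketch {
    // Illustrative word list only; the actual list is hard-coded in SearchTerms.
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "of", "the", "to");

    /**
     * Returns the terms that get an inclusion check, i.e. must be present in a document.
     * Stopwords remain in the index but are not required to match.
     */
    static List<String> requiredTerms(List<String> queryTerms) {
        List<String> required = new ArrayList<>();
        for (String term : queryTerms) {
            if (!STOPWORDS.contains(term)) required.add(term);
        }
        // Assumption: a query made only of stopwords keeps all its terms as required,
        // otherwise it would place no constraint on the results at all.
        return required.isEmpty() ? new ArrayList<>(queryTerms) : required;
    }
}
```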
Viktor Lofgren
92522e8d97
(index) Attenuate bm25 score based on query length
2024-08-15 08:41:38 +02:00
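The log entry does not say how the BM25 score is attenuated. As a hypothetical illustration only, the sketch below computes textbook Okapi BM25 per term (k1 = 1.2 and b = 0.75 are common defaults, assumed here) and divides the summed score by the number of query terms, so that long queries do not accumulate disproportionately large scores. The class and method names are invented for this example.

```java
/** Hypothetical sketch: Okapi BM25 per-term scores, averaged over the query length. */
final class Bm25Sketch {
    static final double K1 = 1.2;  // common textbook default, assumed here
    static final double B = 0.75;  // common textbook default, assumed here

    /** Okapi BM25 contribution of one term to one document's score. */
    static double termScore(int tf, int docLen, double avgDocLen, long docCount, long docsWithTerm) {
        double idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
        double lengthNorm = K1 * (1 - B + B * docLen / avgDocLen);
        return idf * (tf * (K1 + 1)) / (tf + lengthNorm);
    }

    /** Sums the per-term scores, then attenuates by dividing by the number of query terms. */
    static double attenuatedScore(double[] termScores) {
        if (termScores.length == 0) return 0.0;
        double sum = 0.0;
        for (double s : termScores) sum += s;
        return sum / termScores.length;
    }
}
```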
Viktor Lofgren
049d94ce31
(index) Add body position match to qdebug fields
2024-08-15 08:39:37 +02:00
Viktor Lofgren
dbc6a95276
(index) Consume the new 'body' span in index to make it used in ranking
2024-08-15 08:33:43 +02:00
Viktor Lofgren
e6c8a6febe
(index) Add index-side deduplication in selectBestResults
2024-08-10 10:51:59 +02:00
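Index-side deduplication can be sketched as keeping only the best-scoring result per duplicate key before truncating to the requested count. This is a hypothetical sketch: the `Result` record, its `dedupKey` field and `dedupAndSelect` are illustrative names, not the signature of `selectBestResults`, and the convention that a lower score ranks higher is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DedupSketch {
    /** Hypothetical result type; in this sketch a lower score ranks higher. */
    record Result(long docId, long dedupKey, double score) {}

    /** Keeps the best result per dedup key, then returns up to n results in rank order. */
    static List<Result> dedupAndSelect(List<Result> results, int n) {
        Map<Long, Result> best = new HashMap<>();
        for (Result r : results) {
            // On a key collision, keep whichever result has the lower (better) score
            best.merge(r.dedupKey(), r, (a, b) -> a.score() <= b.score() ? a : b);
        }
        List<Result> ranked = new ArrayList<>(best.values());
        ranked.sort(Comparator.comparingDouble(Result::score));
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```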
Viktor Lofgren
4ece5f847b
(index) Add more qdebug factors
2024-08-10 10:45:30 +02:00
Viktor Lofgren
e4f04af044
(index) Give BODY matches a verbatim match value
2024-08-10 10:22:19 +02:00
Viktor Lofgren
b730b17f52
(index) Correct handling of firstPosition to avoid division by zero
2024-08-10 10:21:59 +02:00
Viktor Lofgren
98c40958ab
(index) Simplify verbatim match calculation
2024-08-10 09:54:56 +02:00
Viktor Lofgren
41b52f5bcd
(index) Simplify verbatim match calculation
2024-08-10 09:51:03 +02:00
Viktor Lofgren
016a4c62e1
(index) Bugs and error fixes, chasing and fixing mystery results that did not contain all relevant keywords
2024-08-10 09:51:03 +02:00
Viktor Lofgren
df89661ed2
(index) In SearchResultItem, populate combinedId with combinedId and not its ranking-removed documentId cousin
2024-08-09 16:32:32 +02:00
Viktor Lofgren
2e89b55593
(wip) Repair qdebug utility and show new ranking details
2024-08-09 12:57:25 +02:00
Viktor Lofgren
7babdb87d5
(index) Remove intermediate models
2024-08-07 10:10:44 +02:00
Viktor Lofgren
df6a05b9a7
(index) Avoid hypothetical divide-by-zero in tcfAvgDist
2024-08-06 10:55:57 +02:00
Viktor Lofgren
8569bb8e11
(index) Avoid divide-by-zero when minDist returns 0
2024-08-06 10:34:05 +02:00
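To illustrate the class of bug addressed by the divide-by-zero commits above, here is a hypothetical proximity factor computed from a minimum term distance. It assumes a helper that returns `Integer.MAX_VALUE` when the terms never co-occur; the `1 / (1 + d)` shape is an assumption chosen for the example, not the index's actual formula.

```java
/** Hypothetical proximity factor derived from the minimum distance between two terms. */
final class ProximitySketch {
    static double proximityFactor(int minDist) {
        // Sentinel meaning "the terms never co-occur": no proximity bonus
        if (minDist == Integer.MAX_VALUE) return 0.0;
        // A naive 1.0 / minDist yields Infinity (or throws, with integer division) when
        // minDist is 0, i.e. both terms at the same position; adding 1 keeps it finite
        return 1.0 / (1 + Math.max(0, minDist));
    }
}
```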
Viktor Lofgren
ca6e2db2b9
(index) Include external link texts in verbatim score
2024-08-06 10:23:23 +02:00
Viktor Lofgren
ee49c01d86
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:47:23 +02:00
Viktor Lofgren
b21f8538a8
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:41:38 +02:00
Viktor Lofgren
dd15676d33
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:18:04 +02:00
Viktor Lofgren
ec5a17ad13
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:07:02 +02:00
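Rewarding shorter titles for verbatim matches can be sketched as scaling the match by how much of the title the query covers. The formula below (query length divided by title length, capped at 1.0) and all names in it are hypothetical; the log does not record the tuned values.

```java
/** Hypothetical score for a verbatim (phrase) match of the query in a document title. */
final class TitleVerbatimSketch {
    static double titleVerbatimScore(int queryLength, int titleLength, boolean phraseFound) {
        if (!phraseFound || titleLength <= 0) return 0.0;
        // Full score when the query spans the whole title, shrinking as the title grows
        return Math.min(1.0, (double) queryLength / titleLength);
    }
}
```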
Viktor Lofgren
8462e88b8f
(index) Add min-dist factor and adjust rankings
2024-08-03 13:07:00 +02:00
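A min-dist factor needs the smallest distance between the positions of two terms. Assuming each term's positions are available as a sorted `int[]` (an assumption of this sketch; the class name is hypothetical), a two-pointer merge finds it in linear time:

```java
final class MinDistSketch {
    /**
     * Smallest absolute difference between any position in a and any position in b.
     * Both arrays must be sorted ascending. Returns Integer.MAX_VALUE if either is empty.
     */
    static int minDist(int[] a, int[] b) {
        int i = 0, j = 0;
        int best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            int d = Math.abs(a[i] - b[j]);
            if (d < best) best = d;
            // Advance the pointer at the smaller position; the other cannot get closer to it
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }
}
```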
Viktor Lofgren
bf26ead010
(index) Remove hasPrioTerm check as we should sort this out in ranking
2024-08-03 13:06:50 +02:00
Viktor Lofgren
c2cedfa83c
(index) Experimental ranking signals
2024-08-03 10:33:41 +02:00
Viktor Lofgren
c6c8b059bf
(index) Return some variant of the previously removed 'Bm25PrioGraphVisitor'
2024-08-03 10:10:12 +02:00
Viktor Lofgren
d8a99784e5
(index) Adding a few experimental relevance signals
2024-08-02 20:26:07 +02:00
Viktor Lofgren
e2107901ec
(index) Add span information for anchor tags, tweak ranking params
2024-08-01 11:46:30 +02:00
Viktor Lofgren
15745b692e
(index) Coherences need to be able to deal with null values among positions
2024-07-31 22:00:14 +02:00
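Coherences, later renamed phrase constraints, check whether query terms appear next to each other. A minimal sketch, assuming each term's positions arrive as a sorted `int[]` that may be `null` when no position data exists for that term. Treating `null` as "no match" rather than throwing is a design choice of this hypothetical sketch, not necessarily what the commit does.

```java
import java.util.Arrays;

final class PhraseConstraintSketch {
    /**
     * Returns true if the terms appear as a contiguous phrase, i.e. there is a
     * position p with term i at p + i. A null or empty positions array for any
     * term is treated as "no match" instead of throwing a NullPointerException.
     */
    static boolean matchesPhrase(int[][] positions) {
        if (positions.length == 0) return false;
        for (int[] p : positions) {
            if (p == null || p.length == 0) return false;
        }
        for (int start : positions[0]) {
            boolean ok = true;
            for (int i = 1; i < positions.length && ok; i++) {
                ok = Arrays.binarySearch(positions[i], start + i) >= 0;
            }
            if (ok) return true;
        }
        return false;
    }
}
```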
Viktor Lofgren
b316b55be9
(index) Experimental initial integration of document spans into index
2024-07-30 12:01:53 +02:00
Viktor Lofgren
aebb2652e8
(wip) Extract and encode spans data
Refactoring keyword extraction to extract spans information.
Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions.
This is a bit of a Katamari Damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.
2024-07-27 11:44:13 +02:00
Viktor Lofgren
0b31c4cfbb
(coded-sequence) Replace GCS usage with an interface
2024-07-16 14:37:50 +02:00
Viktor Lofgren
dfd19b5eb9
(index) Reduce the number of abstractions around result ranking
The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous order was driving complexity in the code that generates these objects, and the consumer side of things puts all this data in the same object regardless.
2024-07-16 08:18:54 +02:00
Viktor Lofgren
ad3857938d
(search-api, ranking) Update with new ranking parameters
Adding new ranking parameters to the API and routing them through the system, in order to permit integration of the new position data with the ranking algorithm.
The change also cleans out several parameters that no longer filled any function.
2024-07-15 04:49:40 +02:00
Viktor Lofgren
fa36689597
(index-reverse) Simplify priority index
* Do not emit a documents file
* Do not interlace metadata or offsets with doc ids
2024-07-06 18:04:08 +02:00
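One reason to stop interlacing metadata with document ids: a plain sorted array of ids can be searched with a stock binary search, while an interlaced layout needs stride-aware indexing. The hypothetical sketch below contrasts the two layouts using in-memory `long[]` arrays, which are a simplification of the on-disk format.

```java
import java.util.Arrays;

final class PriorityIndexLayoutSketch {
    /** Interlaced layout [docId0, meta0, docId1, meta1, ...]: search must step over metadata. */
    static boolean containsInterlaced(long[] data, long docId) {
        int lo = 0, hi = data.length / 2 - 1; // record indices, not array slots
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = data[2 * mid]; // doc id sits in the even slot of each record
            if (v < docId) lo = mid + 1;
            else if (v > docId) hi = mid - 1;
            else return true;
        }
        return false;
    }

    /** Plain layout: sorted doc ids only, so a stock binary search suffices. */
    static boolean containsPlain(long[] docIds, long docId) {
        return Arrays.binarySearch(docIds, docId) >= 0;
    }
}
```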
Viktor Lofgren
0e4dd3d76d
(minor) Remove accidentally committed debug printf
2024-06-27 13:40:53 +02:00
Viktor Lofgren
95b9af92a0
(index) Implement working optional TermCoherences
2024-06-26 12:22:06 +02:00
Viktor Lofgren
8ee64c0771
(index) Correct TermCoherence requirements
2024-06-25 22:18:10 +02:00
Viktor Lofgren
dae22ccbe0
(test) Integration test from crawl->query
2024-06-25 22:17:26 +02:00
Viktor Lofgren
9d00243d7f
(index) Partial re-implementation of position constraints
2024-06-24 15:55:54 +02:00
Viktor Lofgren
36160988e2
(index) Integrate positions data with indexes WIP
This change integrates the new positions data with the forward and reverse indexes.
The ranking code is still only partially re-written.
2024-06-10 15:09:06 +02:00
Viktor Lofgren
4fcd4a8197
(index) Refactor to reduce the level of indirection
2024-05-19 12:40:33 +02:00