Viktor Lofgren
2db0e446cb
(search) Absorb SearchQueryIndexService into SearchOperator, and clean up SearchOperator
2024-08-22 11:49:29 +02:00
Viktor Lofgren
557bdaa694
(search) Clean up SearchQueryIndexService and surrounding code
2024-08-22 11:45:28 +02:00
Viktor Lofgren
9eb1f120fc
(index) Repair positions bitmask for search result presentation
2024-08-22 11:28:23 +02:00
Viktor Lofgren
266d6e4bea
(slop) Replace SlopPageRef<T> with SlopTable.Ref<T>
2024-08-21 10:13:49 +02:00
Viktor Lofgren
e4c97a91d8
(*) Comment clarity
2024-08-21 10:12:00 +02:00
Viktor Lofgren
b0a874a842
(*) Upgrade slop library -> 0.0.5
2024-08-18 11:05:27 +02:00
Viktor Lofgren
bca40de107
(*) Upgrade slop library
2024-08-18 10:43:41 +02:00
Viktor Lofgren
93652e0937
(qdebug) Accurately display positions when intersecting with spans
2024-08-15 11:55:48 +02:00
Viktor Lofgren
0a383a712d
(qdebug) Accurately display positions when intersecting with spans
2024-08-15 11:44:17 +02:00
Viktor Lofgren
03d5dec24c
(*) Refactor termCoherences and rename them to phrase constraints.
2024-08-15 11:02:19 +02:00
Viktor Lofgren
b2a3cac351
(*) Remove broken imports
2024-08-15 11:01:34 +02:00
Viktor Lofgren
a18edad04c
(index) Remove stopword list from converter
We want to index all words in the document. Stopword handling moves to the index, where query construction now elides inclusion checks for a very short list of words, tentatively hard-coded in SearchTerms.
2024-08-15 09:36:50 +02:00
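As a rough illustration of the stopword change described in a18edad04c, the sketch below splits query terms into those that must be present in a document and those that are kept only for positional matching. The word list, the function name, and the two-list return shape are all hypothetical; the actual list lives in SearchTerms and is not shown in this log.

```python
# Hypothetical stopword list; the real one is hard-coded in SearchTerms.
STOPWORDS = frozenset({"a", "an", "the", "of"})


def split_inclusion_terms(query_terms):
    """Split terms into (required, position_only).

    Every term is still indexed and still part of the query, but terms in
    STOPWORDS are not used as inclusion checks when selecting documents.
    """
    required = []
    position_only = []
    for term in query_terms:
        if term.lower() in STOPWORDS:
            position_only.append(term)
        else:
            required.append(term)
    return required, position_only
```

With this split, a query such as "the art of war" would only require "art" and "war" to be present, while "the" and "of" remain available for phrase matching.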
Viktor Lofgren
92522e8d97
(index) Attenuate bm25 score based on query length
2024-08-15 08:41:38 +02:00
Viktor Lofgren
049d94ce31
(index) Add body position match to qdebug fields
2024-08-15 08:39:37 +02:00
Viktor Lofgren
dbc6a95276
(index) Consume the new 'body' span in index to make it used in ranking
2024-08-15 08:33:43 +02:00
Viktor Lofgren
75b0888032
(slop) Migrate to latest Slop version
2024-08-14 11:44:35 +02:00
Viktor Lofgren
2ad93ad41a
(*) Clean up
2024-08-14 11:43:45 +02:00
Viktor Lofgren
623ee5570f
(slop) Break slop out into its own repository
2024-08-13 09:50:05 +02:00
Viktor Lofgren
fd2bad39f3
(keyword-extraction) Add body field for terms that are not otherwise part of a field
2024-08-13 09:49:26 +02:00
Viktor Lofgren
e6c8a6febe
(index) Add index-side deduplication in selectBestResults
2024-08-10 10:51:59 +02:00
Viktor Lofgren
4ece5f847b
(index) Add more qdebug factors
2024-08-10 10:45:30 +02:00
Viktor Lofgren
e4f04af044
(index) Give BODY matches a verbatim match value
2024-08-10 10:22:19 +02:00
Viktor Lofgren
b730b17f52
(index) Correct handling of firstPosition to avoid divide-by-zero
2024-08-10 10:21:59 +02:00
Viktor Lofgren
98c40958ab
(index) Simplify verbatim match calculation
2024-08-10 09:54:56 +02:00
Viktor Lofgren
41b52f5bcd
(index) Simplify verbatim match calculation
2024-08-10 09:51:03 +02:00
Viktor Lofgren
4264fb9f49
(query-service) Clean up qdebug UI a bit
2024-08-10 09:51:03 +02:00
Viktor Lofgren
016a4c62e1
(index) Bug and error fixes: chasing down and fixing mystery results that did not contain all relevant keywords
2024-08-10 09:51:03 +02:00
Viktor Lofgren
df89661ed2
(index) In SearchResultItem, populate combinedId with combinedId and not its ranking-removed documentId cousin
2024-08-09 16:32:32 +02:00
Viktor Lofgren
41da4f422d
(search-query) Always generate the "all"-segmentation
2024-08-09 13:20:00 +02:00
Viktor Lofgren
2e89b55593
(wip) Repair qdebug utility and show new ranking details
2024-08-09 12:57:25 +02:00
Viktor Lofgren
7babdb87d5
(index) Remove intermediate models
2024-08-07 10:10:44 +02:00
Viktor Lofgren
680ad19c7d
(keyword-extraction) Correct behavior when loading spans so that they are not double-loaded causing errors
2024-08-06 11:16:56 +02:00
Viktor Lofgren
f01267bc6b
(index) Don't load fwd index offsets into a hash table at start.
Loading the offsets into a hash table makes the service take forever to start up. Memory-map the data and binary search it instead. Lookups are a bit slower, but not by much.
2024-08-06 11:16:28 +02:00
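The trade-off described in f01267bc6b can be sketched as follows: instead of reading every offset into a hash table at startup, the sorted records are memory-mapped and each lookup is a binary search over the mapping. This is a minimal sketch, not the index's actual on-disk format; the fixed-width (document id, offset) record layout and all names here are assumptions.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: two little-endian signed 64-bit values,
# (document id, offset), stored sorted by document id.
RECORD = struct.Struct("<qq")


def write_offsets(path, pairs):
    """Write (doc_id, offset) pairs as fixed-width records sorted by doc_id."""
    with open(path, "wb") as f:
        for doc_id, offset in sorted(pairs):
            f.write(RECORD.pack(doc_id, offset))


def find_offset(mm, doc_id):
    """Binary search the mapped records; return the offset, or -1 if absent."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, offset = RECORD.unpack_from(mm, mid * RECORD.size)
        if key == doc_id:
            return offset
        if key < doc_id:
            lo = mid + 1
        else:
            hi = mid
    return -1


def demo():
    """Build a small file, map it read-only, and look up a few ids."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_offsets(path, [(30, 700), (10, 0), (20, 256)])
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return [find_offset(mm, d) for d in (10, 20, 30, 25)]
    finally:
        os.remove(path)
```

Nothing is read at startup beyond establishing the mapping; each lookup touches O(log n) records, which is where the "a bit slower" in the commit message comes from.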
Viktor Lofgren
df6a05b9a7
(index) Avoid hypothetical divide-by-zero in tcfAvgDist
2024-08-06 10:55:57 +02:00
Viktor Lofgren
8569bb8e11
(index) Avoid divide-by-zero when minDist returns 0
2024-08-06 10:34:05 +02:00
Viktor Lofgren
ca6e2db2b9
(index) Include external link texts in verbatim score
2024-08-06 10:23:23 +02:00
Viktor Lofgren
2080e31616
(converter) Store link text positions
To support verbatim matches on external link texts, we assign their words positions a bit past the end of the actual document. This commit does not integrate the information into ranking.
2024-08-04 12:00:29 +02:00
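One way to picture the position scheme from 2080e31616 is sketched below: words from external link texts receive positions starting a small gap past the last position of the document body. The gap size, the extra gap between separate link texts, and the function name are assumptions made for illustration; the commit message only says the positions come "a bit after" the end of the document.

```python
# Hypothetical gap size; the converter's actual value is not stated in the log.
LINK_TEXT_GAP = 16


def assign_link_text_positions(doc_length, link_texts, gap=LINK_TEXT_GAP):
    """Map each link-text word to positions located past the document body.

    doc_length is the number of word positions used by the document itself.
    A gap is also left between separate link texts (an assumption) so that
    words from different links are never adjacent.
    """
    positions = {}
    pos = doc_length + gap
    for text in link_texts:
        for word in text.split():
            positions.setdefault(word.lower(), []).append(pos)
            pos += 1
        pos += gap
    return positions
```

Because these positions sit outside the range used by the body, a phrase match against them can be attributed to link text rather than to the document itself.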
Viktor Lofgren
c379be846c
(slop) Update readme
2024-08-04 10:58:23 +02:00
Viktor Lofgren
9bc665628b
(slop) VarintLE implementation, correct enum8 column
2024-08-04 10:57:52 +02:00
Viktor Lofgren
ee49c01d86
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:47:23 +02:00
Viktor Lofgren
b21f8538a8
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:41:38 +02:00
Viktor Lofgren
dd15676d33
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:18:04 +02:00
Viktor Lofgren
ec5a17ad13
(index) Tune ranking for verbatim matches in the title, rewarding shorter titles
2024-08-03 14:07:02 +02:00
Viktor Lofgren
e48f52faba
(experiment) Add ad-hoc filter runner
2024-08-03 13:24:03 +02:00
Viktor Lofgren
8462e88b8f
(index) Add min-dist factor and adjust rankings
2024-08-03 13:07:00 +02:00
Viktor Lofgren
bf26ead010
(index) Remove hasPrioTerm check as we should sort this out in ranking
2024-08-03 13:06:50 +02:00
Viktor Lofgren
c2cedfa83c
(index) Experimental ranking signals
2024-08-03 10:33:41 +02:00
Viktor Lofgren
eba2844361
(index) Experimental ranking signals
2024-08-03 10:32:46 +02:00
Viktor Lofgren
c6c8b059bf
(index) Return some variant of the previously removed 'Bm25PrioGraphVisitor'
2024-08-03 10:10:12 +02:00
Viktor Lofgren
d8a99784e5
(index) Adding a few experimental relevance signals
2024-08-02 20:26:07 +02:00