Viktor Lofgren
0a383a712d
(qdebug) Accurately display positions when intersecting with spans
2024-08-15 11:44:17 +02:00
Viktor Lofgren
b316b55be9
(index) Experimental initial integration of document spans into index
2024-07-30 12:01:53 +02:00
Viktor Lofgren
aebb2652e8
(wip) Extract and encode spans data
...
Refactoring keyword extraction to extract spans information.
Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions.
This is a bit of a Katamari Damacy commit that ended up dragging along a bunch of other, fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.
2024-07-27 11:44:13 +02:00
Viktor Lofgren
d36055a2d0
(keyword-extractor) Retire TfIdfHigh WordFlag
...
This will bring the word flags count down to 8, and let us pack every value in a byte.
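A minimal sketch of why eight flags matter, assuming one bit per flag: with exactly 8 flags, any combination fits in a single byte. The class name and the flag names below are hypothetical and for illustration only; they are not taken from the actual WordFlags code.

```java
import java.util.EnumSet;

public class WordFlagPacking {
    // Hypothetical flag names (assumption). With exactly 8 flags,
    // each flag maps to one bit of a byte via its ordinal.
    enum Flag { TITLE, SUBJECTS, NAMES, SITE, SITE_ADJACENT, URL_DOMAIN, URL_PATH, EXTERNAL_LINK }

    // Pack a set of flags into one byte, one bit per flag.
    static byte pack(EnumSet<Flag> flags) {
        int bits = 0;
        for (Flag f : flags) {
            bits |= 1 << f.ordinal();
        }
        return (byte) bits;
    }

    // Recover the set of flags from a packed byte.
    static EnumSet<Flag> unpack(byte packed) {
        EnumSet<Flag> out = EnumSet.noneOf(Flag.class);
        int bits = packed & 0xFF; // mask to avoid sign extension of the byte
        for (Flag f : Flag.values()) {
            if ((bits & (1 << f.ordinal())) != 0) {
                out.add(f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        EnumSet<Flag> flags = EnumSet.of(Flag.TITLE, Flag.EXTERNAL_LINK);
        byte packed = pack(flags);
        // Round trip: unpacking yields the original set.
        System.out.println(unpack(packed).equals(flags));
    }
}
```

A ninth flag would need bit 8, which no longer fits in a byte, which is presumably why retiring one flag makes the single-byte encoding possible.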
2024-07-17 13:54:39 +02:00
Viktor Lofgren
1ab875a75d
(test) Correcting flaky tests
...
Also changing the inappropriate usage of ReverseIndexPrioFileNames for the full index in test code.
2024-07-11 16:13:23 +02:00
Viktor Lofgren
fa36689597
(index-reverse) Simplify priority index
...
* Do not emit a documents file
* Do not interlace metadata or offsets with doc ids
2024-07-06 18:04:08 +02:00
Viktor Lofgren
85c99ae808
(index-reverse) Split index construction into separate packages for full and priority index
2024-07-06 15:44:47 +02:00
Viktor Lofgren
6ee4d1eb90
(keyword) Increase the work area for position encoding
...
The change also moves the allocation outside of the build() method to allow re-use of this rather large temporary buffer.
2024-06-28 16:42:39 +02:00
Viktor Lofgren
935234939c
(test) Add query parsing to IntegrationTest
2024-06-27 13:15:20 +02:00
Viktor Lofgren
dae22ccbe0
(test) Integration test from crawl->query
2024-06-25 22:17:26 +02:00