Commit Graph

37 Commits

Author SHA1 Message Date
Viktor Lofgren
c2dd2175a2 (search) Add new query expansion rule contracting WORD NUM pairs into WORD-NUM and WORDNUM 2025-02-01 13:13:30 +01:00
Viktor Lofgren
a1fb92468f (refac) Remove ResultRankingParameters, QueryLimits class and use protobuf classes directly instead
This is primarily to make the code a bit easier to reason about, and will reduce the level of indirection and data copying in the search-servi->query-service->index-service communication chain.
2025-01-08 16:15:57 +01:00
Viktor Lofgren
7c90b6b414 (query) Don't blindly make tokens containing a colon into a non-ranking advice term 2025-01-07 15:18:05 +01:00
Viktor Lofgren
3e66767af3 (search) Adjust query parsing to trim tokens in quoted search terms
Quoted search queries that contained keywords with possessive 's endings were not returning any results, as the index does not retain that suffix, and the query parser was not stripping it away in this code path.

This solves issue #143.
2025-01-05 23:33:09 +01:00
Viktor Lofgren
5f576b7d0c (query-parser) Strip leading underlines
This addresses issue #140, where __builtin_ffs gives no results.
2025-01-02 14:39:03 +01:00
Viktor Lofgren
baeb4a46cd (search) Reintroduce query rewriting for recipes, add rules for wikis and forums 2024-12-31 16:05:00 +01:00
Viktor Lofgren
9eb16cb667 (test) Remove tests from fast suite
Adding a new @Tag("flaky") for tests that do not reliably return successes.  These may still be valuable during development, but should not run in CI.

Also tagging a few of the slower tests with the old @Tag("slow"), to speed up the run-time.
2024-11-17 19:45:59 +01:00
Viktor Lofgren
89f7f3c17c (query-parser) Fix regression where advice terms weren't parsed properly 2024-10-14 13:46:37 +02:00
Viktor Lofgren
c757d116bf (misc) Fix Broken Tests 2024-09-27 13:46:34 +02:00
Viktor Lofgren
99523ca079 (query-parser) Remove test that is no longer relevant 2024-09-10 10:35:56 +02:00
Viktor Lofgren
50ba8fd099 (query-parsing) Correct handling of trailing parentheses 2024-09-03 11:45:14 +02:00
Viktor Lofgren
03d5dec24c (*) Refactor termCoherences and rename them to phrase constraints. 2024-08-15 11:02:19 +02:00
Viktor Lofgren
1ab875a75d (test) Correcting flaky tests
Also changing the inappropriate usage of ReverseIndexPrioFileNames for the full index in test code.
2024-07-11 16:13:23 +02:00
Viktor Lofgren
87e38e6181 (search-query) refac: Move query factory 2024-06-27 13:14:47 +02:00
Viktor Lofgren
6973712480 (query) Tidy up code 2024-06-26 13:40:06 +02:00
Viktor Lofgren
95b9af92a0 (index) Implement working optional TermCoherences 2024-06-26 12:22:06 +02:00
Viktor Lofgren
dae22ccbe0 (test) Integration test from crawl->query 2024-06-25 22:17:26 +02:00
Viktor Lofgren
a69ab311c7 (qword) Fix tests that broke due to stopword removal 2024-05-28 14:15:45 +02:00
Viktor Lofgren
6985ab762a (query) Improve handling of stopwords in queries 2024-05-23 20:50:55 +02:00
Viktor Lofgren
0b60411e5f (query) Bugfix stopword issue
Add a new rule that crates an alternative path that omits a word if it's a stopword.

In queries where a stopword is present, and no query ngram expansion is possible, the query should not require the stopword to be present in the index, as this results in no search results being found.
2024-05-23 20:15:14 +02:00
Viktor Lofgren
0a73b02a00 (query) Mark flaky test, correct assert on test 2024-04-21 12:30:14 +02:00
Viktor Lofgren
2b811fb422 (qs) Basic query debug feature 2024-04-19 11:00:56 +02:00
Viktor Lofgren
8bbaf457de (query) Minor code cleanup 2024-04-18 10:37:51 +02:00
Viktor Lofgren
579295a673 (search) Add implicit coherence constraints based on segmentation 2024-04-17 14:03:35 +02:00
Viktor Lofgren
864d6c28e7 (segmentation) Pick best segmentation using |s|^|s|-style normalization
This is better than doing all segmentations possible at the same time.
2024-04-12 17:44:14 +02:00
Viktor Lofgren
81815f3e0a (qs, index) New query model integrated with index service.
Seems to work, tests are green and initial testing finds no errors.  Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-04 20:17:58 +02:00
Viktor Lofgren
87bb93e1d4 (qs, WIP) Fix edge cases in query compilation
This addresses the relatively common case where the graph consists of two segments, such as x y, z w; in this case we want an output like (x_y) (z w | z_w) | x y (z_w).  The generated output does somewhat pessimize a few other cases, but this one is arguably more important.
2024-03-29 12:40:27 +01:00
Viktor Lofgren
e596c929ac (qs, WIP) Clean up dead code 2024-03-28 16:37:23 +01:00
Viktor Lofgren
4cc11e183c (qs, WIP) Fix output determinism, fix tests 2024-03-28 13:11:26 +01:00
Viktor Lofgren
f82ebd7716 (WIP) Query rendering finally beginning to look like it works 2024-03-28 13:01:21 +01:00
Viktor Lofgren
a4b810f511 WIP 2024-03-21 14:33:26 +01:00
Viktor Lofgren
0bd3365c24 (convert) Initial integration of segmentation data into the converter's keyword extraction logic 2024-03-19 14:28:42 +01:00
Viktor Lofgren
d8f4e7d72b (qs) Retire NGramBloomFilter, integrate new segmentation model instead 2024-03-19 10:42:09 +01:00
Viktor Lofgren
00ef4f9803 (WIP) Partial integration of new query expansion code into the query-serivice 2024-03-18 13:16:49 +01:00
Viktor Lofgren
8ae1f08095 (WIP) Implement first take of new query segmentation algorithm 2024-03-12 13:12:50 +01:00
Viktor Lofgren
427f3e922f (index) Retire count operation, clean up index code. 2024-02-27 21:22:17 +01:00
Viktor Lofgren
1d34224416 (refac) Remove src/main from all source code paths.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.

While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules.  Which you'll do a lot, because it's *modul*ar.  The src/main/java convention makes a lot of sense for a non-modular project though.  This ain't that.
2024-02-23 16:13:40 +01:00