Viktor Lofgren
baeb4a46cd
(search) Reintroduce query rewriting for recipes, add rules for wikis and forums
2024-12-31 16:05:00 +01:00
Viktor Lofgren
a5b4951f23
(chore) Remove use of deprecated STR.-style string templates
2024-11-11 18:02:28 +01:00
Viktor Lofgren
89f7f3c17c
(query-parser) Fix regression where advice terms weren't parsed properly
2024-10-14 13:46:37 +02:00
Viktor Lofgren
1bd29a586c
(service-discovery) Add common base interface to all Grpc services
...
To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface DiscoverableService extends BindableService was added.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
4a0356e26f
(search-service) Add pagination support to the search GUI
2024-09-25 14:26:49 +02:00
Viktor Lofgren
73f973cc06
(search-query) Add pagination to search query API and the direct query-service interface
2024-09-25 14:20:59 +02:00
Viktor Lofgren
50ba8fd099
(query-parsing) Correct handling of trailing parentheses
2024-09-03 11:45:14 +02:00
Viktor Lofgren
99b3b00b68
(query-parsing) Merge QueryTokenizer into QueryParser and add escaping of query grammar
2024-09-03 11:35:32 +02:00
Viktor Lofgren
f6d981761d
(query-parsing) Drop search term elements that aren't indexed by the search engine
2024-09-03 11:24:05 +02:00
Viktor Lofgren
8290c19e24
(query-parsing) Drop search term elements that aren't indexed by the search engine
2024-09-03 11:21:01 +02:00
Viktor Lofgren
03d5dec24c
(*) Refactor termCoherences and rename them to phrase constraints.
2024-08-15 11:02:19 +02:00
Viktor Lofgren
41da4f422d
(search-query) Always generate the "all"-segmentation
2024-08-09 13:20:00 +02:00
Viktor Lofgren
22b35d5d91
(sentence-extractor) Add tag information to document language data
...
Decorates DocumentSentences with information about which HTML tags they are nested in, and removes some redundant data on this rather memory hungry object. Separator information is encoded as a bit set instead of an array of integers.
The change also cleans up the SentenceExtractor class a fair bit. It no longer extracts ngrams, and a significant number of redundant operations were removed as well. This is still a pretty unpleasant class to work in, but this is the first step in making it a little bit better.
2024-07-18 15:57:48 +02:00
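To illustrate the bit set encoding of separators mentioned in the sentence-extractor commit above, here is a minimal sketch. It assumes each word is followed by one of only two separator kinds (space or comma), so a single bit per position is enough. The class name, method names, and the integer separator codes are hypothetical and are not taken from the actual DocumentSentences code.

```java
import java.util.BitSet;

// Hypothetical sketch: one bit per word position instead of one int per position.
class SeparatorBits {
    // Assumption: a set bit means "followed by a comma", a clear bit means "followed by a space".
    static BitSet encode(int[] separators, int commaCode) {
        BitSet bits = new BitSet(separators.length);
        for (int i = 0; i < separators.length; i++) {
            if (separators[i] == commaCode) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Looks up the separator kind after the word at wordIndex.
    static boolean isCommaAfter(BitSet bits, int wordIndex) {
        return bits.get(wordIndex);
    }
}
```

A BitSet packs 64 positions into each long, whereas an int array spends 32 bits per position, which is where the memory saving on a frequently allocated object would come from.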
Viktor Lofgren
87e38e6181
(search-query) refac: Move query factory
2024-06-27 13:14:47 +02:00
Viktor Lofgren
f73fc8dd57
(search-query) Fix end-inclusion bug in QWordGraphIterator
2024-06-27 13:13:42 +02:00
Viktor Lofgren
3faa5bf521
(search-query) Tidy up QueryGRPCService and IndexClient
2024-06-26 14:03:30 +02:00
Viktor Lofgren
6973712480
(query) Tidy up code
2024-06-26 13:40:06 +02:00
Viktor Lofgren
95b9af92a0
(index) Implement working optional TermCoherences
2024-06-26 12:22:06 +02:00
Viktor Lofgren
9d00243d7f
(index) Partial re-implementation of position constraints
2024-06-24 15:55:54 +02:00
Viktor Lofgren
6985ab762a
(query) Improve handling of stopwords in queries
2024-05-23 20:50:55 +02:00
Viktor Lofgren
0b60411e5f
(query) Bugfix stopword issue
...
Add a new rule that creates an alternative path that omits a word if it's a stopword.
In queries where a stopword is present, and no query ngram expansion is possible, the query should not require the stopword to be present in the index, as this results in no search results being found.
2024-05-23 20:15:14 +02:00
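The stopword rule described in the commit above can be sketched roughly as follows. This is an illustration only: the stopword list, the class name, and the list-of-paths representation are assumptions, and the real implementation operates on the query graph rather than on plain word lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StopwordPaths {
    // Hypothetical stopword list, for illustration only.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of", "in");

    // Returns every variant of the query in which each stopword is either kept or omitted.
    static List<List<String>> alternatives(List<String> words) {
        List<List<String>> paths = new ArrayList<>();
        paths.add(new ArrayList<>());
        for (String word : words) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> path : paths) {
                // Path that keeps the word.
                List<String> with = new ArrayList<>(path);
                with.add(word);
                next.add(with);
                // Extra path that skips the word, only if it is a stopword.
                if (STOPWORDS.contains(word)) {
                    next.add(new ArrayList<>(path));
                }
            }
            paths = next;
        }
        return paths;
    }
}
```

For the words "state", "of", "play" this yields two paths, one with and one without "of", so a match no longer depends on the stopword being present in the index.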
Viktor Lofgren
2cc74c005a
(query) Always generate an ngram alternative, suppress generation of multiple identical query branches
2024-04-19 19:42:30 +02:00
Viktor Lofgren
e92c25f7e0
(ranking) Cleanup
2024-04-19 14:13:12 +02:00
Viktor Lofgren
def607d840
(qs) Additional info in query debug UI
2024-04-19 11:46:27 +02:00
Viktor Lofgren
2b811fb422
(qs) Basic query debug feature
2024-04-19 11:00:56 +02:00
Viktor Lofgren
8bbaf457de
(query) Minor code cleanup
2024-04-18 10:37:51 +02:00
Viktor Lofgren
7641a02f31
(query) Update ranking parameters with new variables for bm25 ngrams and tcf mutual jaccard
...
The change also makes it so that as long as the values are defaults, they don't need to be sent over the wire and decoded.
2024-04-18 10:36:15 +02:00
Viktor Lofgren
ce16239e34
(query) Modify tokenizer to match the behavior of the sentence extractor
...
The two must match; otherwise a query like "plato's republic" won't match the indexed keywords, since the sentence extractor strips the possessive.
2024-04-17 17:54:32 +02:00
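The point of the tokenizer commit above is that query-side and index-side normalization have to agree. A minimal sketch, assuming the only normalization in play is stripping a trailing "'s" (the class and method names are hypothetical, and the real tokenizer and sentence extractor do considerably more):

```java
import java.util.ArrayList;
import java.util.List;

class PossessiveTokenizer {
    // Shared normalization step: strip a trailing possessive "'s".
    static String normalize(String token) {
        if (token.endsWith("'s")) {
            return token.substring(0, token.length() - 2);
        }
        return token;
    }

    // Splits on whitespace and applies the same normalization to every token.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                out.add(normalize(part));
            }
        }
        return out;
    }
}
```

If both the indexing side and the query side go through the same normalize step, "plato's republic" becomes the tokens "plato" and "republic" in both places, and the lookup can succeed.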
Viktor Lofgren
579295a673
(search) Add implicit coherence constraints based on segmentation
2024-04-17 14:03:35 +02:00
Viktor Lofgren
fda1c05164
(ngram) Correct |s|^|s|-normalization to use length and not count
2024-04-13 18:05:30 +02:00
Viktor Lofgren
d729c400e5
(query, minor) Remove debug statement
2024-04-12 17:52:55 +02:00
Viktor Lofgren
ad4810d991
(query, minor) Remove debug statement
2024-04-12 17:45:26 +02:00
Viktor Lofgren
864d6c28e7
(segmentation) Pick best segmentation using |s|^|s|-style normalization
...
This is better than doing all segmentations possible at the same time.
2024-04-12 17:44:14 +02:00
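One way to read the |s|^|s|-style normalization in the segmentation commit above is the weighting used in naive n-gram query segmentation, where each multi-word segment s contributes |s|^|s| times its n-gram count and |s| is the segment's length in words. The sketch below assumes that reading. The class name, the count table, and the brute-force enumeration are hypothetical and are not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SegmentationScorer {
    // Assumed weighting: a segment of n words contributes n^n times its n-gram count.
    // Single-word segments contribute nothing.
    static double score(List<List<String>> segmentation, Map<String, Long> counts) {
        double total = 0;
        for (List<String> segment : segmentation) {
            int n = segment.size();
            if (n < 2) continue;
            long count = counts.getOrDefault(String.join(" ", segment), 0L);
            total += Math.pow(n, n) * count;
        }
        return total;
    }

    // Enumerates every way of splitting the words into contiguous segments.
    static List<List<List<String>>> segmentations(List<String> words) {
        List<List<List<String>>> result = new ArrayList<>();
        if (words.isEmpty()) {
            result.add(new ArrayList<>());
            return result;
        }
        for (int end = 1; end <= words.size(); end++) {
            List<String> head = words.subList(0, end);
            for (List<List<String>> rest : segmentations(words.subList(end, words.size()))) {
                List<List<String>> seg = new ArrayList<>();
                seg.add(new ArrayList<>(head));
                seg.addAll(rest);
                result.add(seg);
            }
        }
        return result;
    }

    // Picks the single highest-scoring segmentation instead of keeping all of them.
    static List<List<String>> best(List<String> words, Map<String, Long> counts) {
        List<List<String>> winner = null;
        double bestScore = -1;
        for (List<List<String>> candidate : segmentations(words)) {
            double s = score(candidate, counts);
            if (s > bestScore) {
                bestScore = s;
                winner = candidate;
            }
        }
        return winner;
    }
}
```

With made-up counts of 100 for "new york", 80 for "times square", and 20 for "new york times", the query "new york times square" scores 4*100 + 4*80 = 720 for the two-bigram split against 27*20 = 540 for the trigram split, so the two-bigram segmentation is selected.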
Viktor Lofgren
b7d9a7ae89
(ngrams) Remove the vestigial logic for capturing permutations of n-grams
...
The change also reduces the object churn in NGramLexicon, as this is a very hot method in the converter.
2024-04-11 18:12:01 +02:00
Viktor Lofgren
ed73d79ec1
(qs) Clean up parsing code using new record matching
2024-04-11 17:36:08 +02:00
Viktor Lofgren
81815f3e0a
(qs, index) New query model integrated with index service.
...
Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-04 20:17:58 +02:00
Viktor Lofgren
87bb93e1d4
(qs, WIP) Fix edge cases in query compilation
...
This addresses the relatively common case where the graph consists of two segments, such as x y, z w; in this case we want an output like (x_y) (z w | z_w) | x y (z_w). The generated output does somewhat pessimize a few other cases, but this one is arguably more important.
2024-03-29 12:40:27 +01:00
Viktor Lofgren
e596c929ac
(qs, WIP) Clean up dead code
2024-03-28 16:37:23 +01:00
Viktor Lofgren
9852b0e609
(qs, WIP) Tidy it up a bit
2024-03-28 14:18:26 +01:00
Viktor Lofgren
51b0d6c0d3
(qs, WIP) Tidy it up a bit
2024-03-28 14:09:17 +01:00
Viktor Lofgren
15391c7a88
(qs, WIP) Tidy it up a bit
2024-03-28 13:54:30 +01:00
Viktor Lofgren
fe62593286
(qs, WIP) Break up code and tidy it up a bit
2024-03-28 13:26:54 +01:00
Viktor Lofgren
4cc11e183c
(qs, WIP) Fix output determinism, fix tests
2024-03-28 13:11:26 +01:00
Viktor Lofgren
f82ebd7716
(WIP) Query rendering finally beginning to look like it works
2024-03-28 13:01:21 +01:00
Viktor Lofgren
a4b810f511
WIP
2024-03-21 14:33:26 +01:00
Viktor Lofgren
0bd3365c24
(convert) Initial integration of segmentation data into the converter's keyword extraction logic
2024-03-19 14:28:42 +01:00
Viktor Lofgren
d8f4e7d72b
(qs) Retire NGramBloomFilter, integrate new segmentation model instead
2024-03-19 10:42:09 +01:00
Viktor Lofgren
00ef4f9803
(WIP) Partial integration of new query expansion code into the query-service
2024-03-18 13:16:49 +01:00
Viktor Lofgren
07e4d7ec6d
(WIP) Improve data extraction from wikipedia data
2024-03-18 13:16:00 +01:00
Viktor Lofgren
8ae1f08095
(WIP) Implement first take of new query segmentation algorithm
2024-03-12 13:12:50 +01:00