Commit Graph

1133 Commits

Author SHA1 Message Date
Viktor Lofgren
ed73d79ec1 (qs) Clean up parsing code using new record matching 2024-04-11 17:36:08 +02:00
Viktor Lofgren
c538c25008 (term-freq-exporter) Reduce thread count and memory usage 2024-04-10 17:11:23 +02:00
Viktor Lofgren
4b47fadbab (term-freq-exporter) Extract ngrams in term-frequency-exporter 2024-04-10 16:58:05 +02:00
Viktor Lofgren
fcdc843c15 (search) Fix outdated assumptions about the results
We no longer break the query into "sets" of search terms and need to adapt the code to not use this assumption.

For the API service, we'll simulate the old behavior to keep the API stable.

For the search service, we'll introduce a new way of calculating positions through tree aggregation.
2024-04-07 12:09:44 +02:00
Viktor Lofgren
dbdcf459a7 (minor) Remove dead code 2024-04-06 16:27:16 +02:00
Viktor Lofgren
ef25d60666 (index) Add origin trace information for index readers
This used to be supported by the system but got lost in refactoring at some point.
2024-04-06 13:28:14 +02:00
Viktor Lofgren
7f7021ce64 (sentence-extractor) Fix resource leak in sentence extractor
The code would always re-initialize the static ngramLexicon and rdrposTagger fields with new instances even if they were already instantiated, leading to a ton of unnecessary RAM allocation.

The modified behavior checks for nullity before creating a new instance.
2024-04-05 18:52:58 +02:00
Joshua Holland
617e633d7a Update keywords docs use of explore to browse
I can't tell when this happened, but the proper keyword now seems to be browse and not explore.
2024-04-05 15:15:49 +02:00
Viktor Lofgren
ae7c760772 (index) Clean up new index query code 2024-04-05 13:30:49 +02:00
Viktor Lofgren
81815f3e0a (qs, index) New query model integrated with index service.
Seems to work, tests are green and initial testing finds no errors.  Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-04 20:17:58 +02:00
Viktor Lofgren
87bb93e1d4 (qs, WIP) Fix edge cases in query compilation
This addresses the relatively common case where the graph consists of two segments, such as x y, z w; in this case we want an output like (x_y) (z w | z_w) | x y (z_w).  The generated output does somewhat pessimize a few other cases, but this one is arguably more important.
2024-03-29 12:40:27 +01:00
Viktor Lofgren
e596c929ac (qs, WIP) Clean up dead code 2024-03-28 16:37:23 +01:00
Viktor Lofgren
9852b0e609 (qs, WIP) Tidy it up a bit 2024-03-28 14:18:26 +01:00
Viktor Lofgren
51b0d6c0d3 (qs, WIP) Tidy it up a bit 2024-03-28 14:09:17 +01:00
Viktor Lofgren
15391c7a88 (qs, WIP) Tidy it up a bit 2024-03-28 13:54:30 +01:00
Viktor Lofgren
fe62593286 (qs, WIP) Break up code and tidy it up a bit 2024-03-28 13:26:54 +01:00
Viktor Lofgren
4cc11e183c (qs, WIP) Fix output determinism, fix tests 2024-03-28 13:11:26 +01:00
Viktor Lofgren
f82ebd7716 (WIP) Query rendering finally beginning to look like it works 2024-03-28 13:01:21 +01:00
Viktor Lofgren
bd0704d5a4 (*) Fix JDK22 migration issues
A few bizarre build errors cropped up when migrating to JDK22.  Not at all sure what caused them, but they were easy to mitigate.
2024-03-21 14:33:27 +01:00
Viktor Lofgren
002afca1c5 (sys) Upgrade to JDK22
This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.
2024-03-21 14:33:27 +01:00
Viktor Lofgren
a4b810f511 WIP 2024-03-21 14:33:26 +01:00
Viktor Lofgren
0bd3365c24 (convert) Initial integration of segmentation data into the converter's keyword extraction logic 2024-03-19 14:28:42 +01:00
Viktor Lofgren
d8f4e7d72b (qs) Retire NGramBloomFilter, integrate new segmentation model instead 2024-03-19 10:42:09 +01:00
Viktor Lofgren
afc047cd27 (control) GUI for exporting segmentation data from a wikipedia zim 2024-03-18 13:45:23 +01:00
Viktor Lofgren
00ef4f9803 (WIP) Partial integration of new query expansion code into the query-serivice 2024-03-18 13:16:49 +01:00
Viktor Lofgren
07e4d7ec6d (WIP) Improve data extraction from wikipedia data 2024-03-18 13:16:00 +01:00
Viktor Lofgren
8ae1f08095 (WIP) Implement first take of new query segmentation algorithm 2024-03-12 13:12:50 +01:00
Viktor Lofgren
57e6a12d08 (registry) Correct registerMonitor() behavior
The previous behavior would listen to too many changes, and based on zookeeper and not curator assumptions about behavior, add an additional monitor on each invocation of each monitor, (which always trigger on service state changes), leading to each monitor re-registering and effectively doubling monitors in numbers whenever a service stopped or started, which in turn meant a lot of bizarre thrashing behavior even on changes in services that don't explicitly talk to each other.

This re-registering behavior is no longer done.
2024-03-06 12:22:15 +01:00
Viktor Lofgren
46423612e3 (refac) Merge service-discovery and service modules
Also adds a few tests to the server/client code.
2024-03-03 10:49:23 +01:00
Viktor Lofgren
29bf473d74 (encyclopedia) Add URLencoding to path element
This prevents corruption of the links to the sideloaded encyclopedia data when the article path contains characters that are not valid in a URL.
2024-03-01 17:28:09 +01:00
Viktor Lofgren
9689f3faee (domain-info) Fix incorrect array indexing 2024-02-29 18:56:09 +01:00
Viktor Lofgren
93fa58c93d (domain-info) Fix incorrect array indexing
Using the id instead of idx when addressing the ranksArray caused exceptions.
2024-02-29 17:54:23 +01:00
Viktor Lofgren
186a98cc99 (doc) Fix wonky bullet lists 2024-02-28 17:43:05 +01:00
Viktor Lofgren
9993f265ca (doc) Remove irrelevant text 2024-02-28 17:40:05 +01:00
Viktor Lofgren
144f967dbf (misc) Tweak pool sizes 2024-02-28 16:23:02 +01:00
Viktor Lofgren
b31c9bb726 (docs) Update process docs 2024-02-28 15:21:33 +01:00
Viktor Lofgren
c0820b5e5c (docs) Update service docs 2024-02-28 15:19:31 +01:00
Viktor Lofgren
65b8a1d5d9 (grpc) Reduce error spam 2024-02-28 14:44:48 +01:00
Viktor Lofgren
a0648844fb (grpc) Reduce error spam 2024-02-28 14:35:29 +01:00
Viktor Lofgren
c4a27003c6 (docs) Fix formatting 2024-02-28 14:22:57 +01:00
Viktor Lofgren
41abd8982f (math) Clean up error handling 2024-02-28 14:19:50 +01:00
Viktor Lofgren
86bbc1043e (service) Clean up thread pool creation 2024-02-28 14:06:32 +01:00
Viktor Lofgren
9a045a0588 (index) Clean up index code 2024-02-28 13:09:47 +01:00
Viktor Lofgren
9415539b38 (docs) Update docs 2024-02-28 12:25:19 +01:00
Viktor Lofgren
84bab2783d (docs) Fix fake news in docs 2024-02-28 12:16:45 +01:00
Viktor Lofgren
d78e9e715f (misc) Fix broken tests 2024-02-28 12:12:43 +01:00
Viktor Lofgren
a8ec59eb75 (conf) Add migration warning when ZOOKEEPER_HOSTS is not set. 2024-02-28 12:09:38 +01:00
Viktor Lofgren
20fc0ef13c (gradle) Add task alias 'docker' for 'jibDockerBuild'
The change also moves the jib boilerplate to an include.
2024-02-28 11:59:15 +01:00
Viktor Lofgren
9f1649636e Clean up documentation and rename domain-links to link-graph 2024-02-28 11:40:39 +01:00
Viktor Lofgren
3a65fe8917 Add offload executor to GrpcChannelPoolFactory 2024-02-27 22:08:39 +01:00