Viktor Lofgren
b7d9a7ae89
(ngrams) Remove the vestigial logic for capturing permutations of n-grams
The change also reduces object churn in NGramLexicon, which sits on a very hot code path in the converter.
2024-04-11 18:12:01 +02:00
Viktor Lofgren
ed73d79ec1
(qs) Clean up parsing code using new record matching
2024-04-11 17:36:08 +02:00
Viktor Lofgren
c538c25008
(term-freq-exporter) Reduce thread count and memory usage
2024-04-10 17:11:23 +02:00
Viktor Lofgren
4b47fadbab
(term-freq-exporter) Extract ngrams in term-frequency-exporter
2024-04-10 16:58:05 +02:00
Viktor Lofgren
fcdc843c15
(search) Fix outdated assumptions about the results
We no longer break the query into "sets" of search terms, and need to adapt the code so it no longer relies on this assumption.
For the API service, we'll simulate the old behavior to keep the API stable.
For the search service, we'll introduce a new way of calculating positions through tree aggregation.
2024-04-07 12:09:44 +02:00
Viktor Lofgren
dbdcf459a7
(minor) Remove dead code
2024-04-06 16:27:16 +02:00
Viktor Lofgren
ef25d60666
(index) Add origin trace information for index readers
This used to be supported by the system but got lost in refactoring at some point.
2024-04-06 13:28:14 +02:00
Viktor Lofgren
7f7021ce64
(sentence-extractor) Fix resource leak in sentence extractor
The code would always re-initialize the static ngramLexicon and rdrposTagger fields with new instances even if they were already instantiated, leading to a ton of unnecessary RAM allocation.
The modified behavior checks for nullity before creating a new instance.
2024-04-05 18:52:58 +02:00
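The sentence-extractor fix above amounts to lazily initializing shared static fields only once. A minimal sketch of that pattern, assuming hypothetical stand-in classes (`ExpensiveModel`, a simplified `SentenceExtractor`) rather than the real NgramLexicon and RDRPOSTagger types, and adding a class-level lock as an assumption so concurrent constructors don't race:

```java
public class LazyStaticInitSketch {
    // Hypothetical stand-in for an expensive model such as an n-gram lexicon or POS tagger
    static final class ExpensiveModel {
        static int instancesCreated = 0;

        ExpensiveModel() {
            instancesCreated++;
        }
    }

    // Hypothetical, simplified extractor; not the real class
    static final class SentenceExtractor {
        // Shared across all extractor instances
        private static ExpensiveModel ngramLexicon = null;
        private static ExpensiveModel posTagger = null;

        SentenceExtractor() {
            // Lock on the class so two concurrent constructors can't both see null
            synchronized (SentenceExtractor.class) {
                if (ngramLexicon == null) {
                    ngramLexicon = new ExpensiveModel();
                }
                if (posTagger == null) {
                    posTagger = new ExpensiveModel();
                }
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            new SentenceExtractor();
        }
        // Without the null checks, every constructor call would allocate both models (20 total)
        System.out.println(ExpensiveModel.instancesCreated); // prints 2
    }
}
```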
Viktor Lofgren
5766da69ec
(gradle) Upgrade to Gradle 8.7
This will reduce the hassle of juggling JDK versions for JDK 22, which was not supported by Gradle 8.5.
2024-04-05 15:15:49 +02:00
Joshua Holland
617e633d7a
Update keywords docs to use browse instead of explore
I can't tell when this happened, but the proper keyword now seems to be browse and not explore.
2024-04-05 15:15:49 +02:00
Viktor Lofgren
b770a1143f
(run) Fix traefik middleware configuration
2024-04-05 15:15:49 +02:00
Viktor Lofgren
ae7c760772
(index) Clean up new index query code
2024-04-05 13:30:49 +02:00
Viktor Lofgren
81815f3e0a
(qs, index) New query model integrated with index service.
Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
2024-04-04 20:17:58 +02:00
Viktor Lofgren
87bb93e1d4
(qs, WIP) Fix edge cases in query compilation
This addresses the relatively common case where the graph consists of two segments, such as x y, z w; in this case we want an output like (x_y) (z w | z_w) | x y (z_w). The generated output does somewhat pessimize a few other cases, but this one is arguably more important.
2024-03-29 12:40:27 +01:00
Viktor Lofgren
e596c929ac
(qs, WIP) Clean up dead code
2024-03-28 16:37:23 +01:00
Viktor Lofgren
9852b0e609
(qs, WIP) Tidy it up a bit
2024-03-28 14:18:26 +01:00
Viktor Lofgren
51b0d6c0d3
(qs, WIP) Tidy it up a bit
2024-03-28 14:09:17 +01:00
Viktor Lofgren
15391c7a88
(qs, WIP) Tidy it up a bit
2024-03-28 13:54:30 +01:00
Viktor Lofgren
fe62593286
(qs, WIP) Break up code and tidy it up a bit
2024-03-28 13:26:54 +01:00
Viktor Lofgren
4cc11e183c
(qs, WIP) Fix output determinism, fix tests
2024-03-28 13:11:26 +01:00
Viktor Lofgren
f82ebd7716
(WIP) Query rendering finally beginning to look like it works
2024-03-28 13:01:21 +01:00
Viktor Lofgren
bd0704d5a4
(*) Fix JDK22 migration issues
A few bizarre build errors cropped up when migrating to JDK22. Not at all sure what caused them, but they were easy to mitigate.
2024-03-21 14:33:27 +01:00
Viktor Lofgren
1968485881
(docs) Upgrade to JDK22
2024-03-21 14:33:27 +01:00
Viktor Lofgren
002afca1c5
(sys) Upgrade to JDK22
This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.
2024-03-21 14:33:27 +01:00
Your Name
411b3f3138
(run/install.sh) fix docker compose file
I was following the release demo video for v2024.01.0 (https://www.youtube.com/watch?v=PNwMkenQQ24), and when I ran 'docker compose up' the containers couldn't resolve the DNS name for 'zookeeper'.
I realized this was because the zookeeper container was using the default Docker network, so I specified the wmsa network explicitly.
2024-03-21 14:33:27 +01:00
Viktor Lofgren
a4b810f511
WIP
2024-03-21 14:33:26 +01:00
Viktor Lofgren
0bd3365c24
(convert) Initial integration of segmentation data into the converter's keyword extraction logic
2024-03-19 14:28:42 +01:00
Viktor Lofgren
d8f4e7d72b
(qs) Retire NGramBloomFilter, integrate new segmentation model instead
2024-03-19 10:42:09 +01:00
Viktor Lofgren
afc047cd27
(control) GUI for exporting segmentation data from a Wikipedia ZIM file
2024-03-18 13:45:23 +01:00
Viktor Lofgren
00ef4f9803
(WIP) Partial integration of new query expansion code into the query-service
2024-03-18 13:16:49 +01:00
Viktor Lofgren
07e4d7ec6d
(WIP) Improve data extraction from Wikipedia data
2024-03-18 13:16:00 +01:00
Viktor Lofgren
8ae1f08095
(WIP) Implement first take of new query segmentation algorithm
2024-03-12 13:12:50 +01:00
Viktor Lofgren
57e6a12d08
(registry) Correct registerMonitor() behavior
The previous behavior listened to too many changes and, based on ZooKeeper rather than Curator assumptions about behavior, added an additional monitor on each invocation of each monitor.
Since monitors always trigger on service state changes, each monitor re-registered itself, effectively doubling the number of monitors whenever a service stopped or started. This in turn caused a lot of bizarre thrashing, even on changes in services that don't explicitly talk to each other.
The re-registration no longer happens.
2024-03-06 12:22:15 +01:00
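The doubling described in the registerMonitor() fix can be reproduced with a small simulation. This sketch does not use the Curator or ZooKeeper APIs; it assumes a hypothetical in-memory `Registry` whose listeners stay registered after they fire (as Curator-style persistent watches do), and a monitor that still re-registers itself on every trigger (the habit carried over from ZooKeeper's one-shot watches):

```java
import java.util.ArrayList;
import java.util.List;

public class MonitorDoublingSketch {
    // Hypothetical persistent-watch registry: listeners are NOT removed after firing
    static final class Registry {
        final List<Runnable> listeners = new ArrayList<>();

        void register(Runnable r) {
            listeners.add(r);
        }

        void fireStateChange() {
            // Iterate over a snapshot so listeners added during dispatch fire next time
            for (Runnable r : List.copyOf(listeners)) {
                r.run();
            }
        }
    }

    /** Returns how many listeners are registered after the given number of state changes. */
    static int simulate(boolean reRegisterOnFire, int stateChanges) {
        Registry registry = new Registry();
        Runnable[] monitor = new Runnable[1];
        monitor[0] = () -> {
            if (reRegisterOnFire) {
                registry.register(monitor[0]); // one-shot-watch habit, wrong for persistent watches
            }
        };
        registry.register(monitor[0]);
        for (int i = 0; i < stateChanges; i++) {
            registry.fireStateChange();
        }
        return registry.listeners.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(true, 5));  // prints 32: doubles on every state change
        System.out.println(simulate(false, 5)); // prints 1
    }
}
```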
Viktor Lofgren
46423612e3
(refac) Merge service-discovery and service modules
Also adds a few tests to the server/client code.
2024-03-03 10:49:23 +01:00
Viktor Lofgren
29bf473d74
(encyclopedia) Add URL encoding to path element
This prevents corruption of the links to the sideloaded encyclopedia data when the article path contains characters that are not valid in a URL.
2024-03-01 17:28:09 +01:00
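For the encyclopedia fix above, one common way to encode a single path element in Java is sketched below; it is not necessarily what the commit does. `java.net.URLEncoder` targets form encoding, so it turns spaces into `+`, which is a literal character in a URL path; the sketch therefore swaps `+` for `%20`. The `/article/` prefix and the `encodePathElement` name are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathElementEncodingSketch {
    // Encodes one path segment; '/' is escaped too, so the title can't split the path
    static String encodePathElement(String element) {
        // URLEncoder emits '+' for spaces (form encoding); a literal '+' is already %2B,
        // so any remaining '+' came from a space and can safely become %20
        return URLEncoder.encode(element, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String article = "C++ (programming language)";
        // prints /article/C%2B%2B%20%28programming%20language%29
        System.out.println("/article/" + encodePathElement(article));
    }
}
```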
Viktor Lofgren
9689f3faee
(domain-info) Fix incorrect array indexing
2024-02-29 18:56:09 +01:00
Viktor Lofgren
93fa58c93d
(domain-info) Fix incorrect array indexing
Using the id instead of idx when addressing the ranksArray caused exceptions.
2024-02-29 17:54:23 +01:00
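The domain-info bug above is the classic mix-up between a sparse identifier and a dense array position. A minimal sketch, assuming hypothetical data and a hypothetical `idToIdx` lookup (only `ranksArray`, `id` and `idx` come from the commit message), of why indexing by id throws while indexing by idx works:

```java
import java.util.HashMap;
import java.util.Map;

public class IdVsIdxSketch {
    // Hypothetical data: domain ids are sparse, ranks are stored densely by position
    static final int[] domainIds = { 17, 4203, 98812 };
    static final double[] ranksArray = { 0.9, 0.5, 0.1 };
    static final Map<Integer, Integer> idToIdx = new HashMap<>();

    static {
        for (int idx = 0; idx < domainIds.length; idx++) {
            idToIdx.put(domainIds[idx], idx);
        }
    }

    static double rankOf(int id) {
        // Wrong: ranksArray[id] throws ArrayIndexOutOfBoundsException for e.g. id = 4203
        Integer idx = idToIdx.get(id);
        if (idx == null) {
            throw new IllegalArgumentException("unknown domain id " + id);
        }
        return ranksArray[idx];
    }

    public static void main(String[] args) {
        System.out.println(rankOf(4203)); // prints 0.5
    }
}
```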
Viktor Lofgren
186a98cc99
(doc) Fix wonky bullet lists
2024-02-28 17:43:05 +01:00
Viktor Lofgren
9993f265ca
(doc) Remove irrelevant text
2024-02-28 17:40:05 +01:00
Viktor Lofgren
144f967dbf
(misc) Tweak pool sizes
2024-02-28 16:23:02 +01:00
Viktor Lofgren
b31c9bb726
(docs) Update process docs
2024-02-28 15:21:33 +01:00
Viktor Lofgren
c0820b5e5c
(docs) Update service docs
2024-02-28 15:19:31 +01:00
Viktor Lofgren
65b8a1d5d9
(grpc) Reduce error spam
2024-02-28 14:44:48 +01:00
Viktor Lofgren
a0648844fb
(grpc) Reduce error spam
2024-02-28 14:35:29 +01:00
Viktor Lofgren
c4a27003c6
(docs) Fix formatting
2024-02-28 14:22:57 +01:00
Viktor Lofgren
41abd8982f
(math) Clean up error handling
2024-02-28 14:19:50 +01:00
Viktor Lofgren
86bbc1043e
(service) Clean up thread pool creation
2024-02-28 14:06:32 +01:00
Viktor Lofgren
9a045a0588
(index) Clean up index code
2024-02-28 13:09:47 +01:00
Viktor Lofgren
9415539b38
(docs) Update docs
2024-02-28 12:25:19 +01:00
Viktor Lofgren
84bab2783d
(docs) Fix fake news in docs
2024-02-28 12:16:45 +01:00