Viktor Lofgren
67db3f295e
(index) Revert some optimization changes
2024-12-12 22:14:24 +01:00
Viktor Lofgren
dafaab3ef7
(index) Additional optimization pass
2024-12-12 18:57:33 +01:00
Viktor Lofgren
3f11ca409f
(index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 17:07:06 +01:00
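A minimal sketch of how the `index.valuationThreads` default could be wired up. It assumes the setting is read as a JVM system property, which the commit does not actually say. `ValuationPool` is a hypothetical name, not a class from the codebase.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: ValuationPool is illustrative, not from the source.
final class ValuationPool {
    // Assumption: the setting is a JVM system property; falls back to 16 when unset or unparseable
    static int valuationThreads() {
        return Integer.getInteger("index.valuationThreads", 16);
    }

    // Fixed-size pool sized by the setting above
    static ExecutorService create() {
        return Executors.newFixedThreadPool(valuationThreads());
    }

    public static void main(String[] args) {
        ExecutorService pool = create();
        System.out.println("valuation threads: " + valuationThreads());
        pool.shutdown();
    }
}
```

Under that assumption, starting the JVM with `-Dindex.valuationThreads=32` would override the default.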
Viktor Lofgren
694eed79ef
(index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:32:31 +01:00
Viktor Lofgren
4220169119
(index) Increase thread limit and optimize search result handling
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:31:11 +01:00
Viktor Lofgren
73861e613f
(ranking) Downtune score boost for unordered heading matches
2024-12-11 15:44:29 +01:00
Viktor Lofgren
cf7f84f033
(rank) Reduce the impact of the domain rank bonus, and only apply it to cancel out negative penalties, never to increase the ranking
2024-12-10 22:04:12 +01:00
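The rule in this commit (the domain rank bonus may cancel out a negative penalty but never push the result above neutral) can be sketched as a clamp. This assumes penalties are negative numbers on a higher-is-better scale; `DomainRankAdjust` and its signature are hypothetical.

```java
// Hypothetical sketch: names and sign convention are assumptions.
final class DomainRankAdjust {
    // penalty: <= 0 means the document is being penalized (higher-is-better scale assumed)
    // bonus:   >= 0, derived from the domain rank
    static double apply(double penalty, double bonus) {
        if (penalty >= 0) {
            return penalty; // nothing to cancel, so the bonus is ignored entirely
        }
        // The bonus can offset the penalty, but the result is capped at 0 (never a net boost)
        return Math.min(0.0, penalty + bonus);
    }

    public static void main(String[] args) {
        System.out.println(apply(-2.0, 0.5)); // partially cancelled
        System.out.println(apply(-0.5, 2.0)); // fully cancelled, capped at 0
    }
}
```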
Viktor Lofgren
291ca8daf1
(converter/index) Improve atag sentence matching by taking into consideration how many times a sentence appears in the links
This change breaks the format of the atags.parquet file.
2024-12-08 00:27:11 +01:00
Viktor Lofgren
7b64377fd6
(ranking) Promote documents with multiple phrase matches with a log-scale bonus
2024-11-28 13:36:56 +01:00
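A log-scale bonus grows with each additional phrase match but with diminishing returns, so the second match is worth more than the twentieth. A minimal sketch, assuming a bonus of `weight * ln(1 + n)` for `n` phrase matches; the formula, `weight`, and `PhraseBonus` are illustrative, not the actual ranking parameters.

```java
// Hypothetical sketch: the exact formula used in ranking is not given in the commit.
final class PhraseBonus {
    // ln(1 + n): zero for no matches, then grows ever more slowly with n
    static double bonus(int phraseMatches, double weight) {
        if (phraseMatches <= 0) {
            return 0.0;
        }
        return weight * Math.log1p(phraseMatches);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 4; n++) {
            System.out.printf("%d matches -> %.3f%n", n, bonus(n, 1.0));
        }
    }
}
```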
Viktor Lofgren
e11ebf18e5
(span) Correct intersection counting logic, add comprehensive tests
2024-11-28 13:36:25 +01:00
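One way to count intersections between a document's spans and a term's positions is a single two-pointer pass over both sorted arrays. This sketch assumes spans are stored as a flat array of sorted, non-overlapping, half-open `[start, end)` pairs, and that a position "intersects" when it falls inside any span. `SpanIntersections` is a hypothetical stand-in for `DocumentSpan.countIntersections`, whose real layout and semantics may differ.

```java
// Hypothetical sketch of span/position intersection counting.
final class SpanIntersections {
    // spans:     {start0, end0, start1, end1, ...}, sorted, non-overlapping, half-open
    // positions: sorted ascending
    static int countIntersections(int[] spans, int[] positions) {
        int count = 0;
        int si = 0; // index of the current span's start
        int pi = 0; // index of the current position
        while (si + 1 < spans.length && pi < positions.length) {
            int start = spans[si];
            int end = spans[si + 1];
            int pos = positions[pi];
            if (pos < start) {
                pi++;       // position lies before the current span: skip it
            } else if (pos >= end) {
                si += 2;    // current span is behind this position: move to the next span
            } else {
                count++;    // start <= pos < end
                pi++;
            }
        }
        return count;
    }
}
```

Each step advances one of the two indices, so the pass is linear in the combined input size.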
Viktor Lofgren
ba47d72bf4
(ranking) Adjust scores for external link matches
2024-11-27 14:27:23 +01:00
Viktor Lofgren
077d8dcd11
(result-score) Adjust ranking parameters a tiny bit
2024-11-25 18:30:59 +01:00
Viktor Lofgren
6d7998e349
(index) Correct the behavior of debug function positionValues(), whose output was incorrect in a misleading way
2024-11-25 18:28:53 +01:00
Viktor Lofgren
7d1ef08a0f
(index) Correct ranking bonus for external linktext appearances
2024-11-25 17:40:15 +01:00
Viktor Lofgren
0b6b5dab07
(index) Add score bonuses for single-word anchor tag spans
Enhanced scoring logic to add bonuses when the query matches single-word anchor (atag) spans exactly. Implemented this by adding conditions in `IndexResultScoreCalculator.java` and creating a new method `containsRangeExact` in `DocumentSpan.java` to check for exact span matches.
2024-11-25 14:44:41 +01:00
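A sketch of the difference between a plain containment check and the exact check described above, assuming half-open `[start, end)` spans. Using the exclusive end consistently is also a common way to avoid off-by-one errors like the one corrected in `DocumentSpan.containsRange`. The method names mirror the commit, but the signatures, the flat array layout, and `SpanRanges` are assumptions.

```java
// Hypothetical sketch: spans stored as {start0, end0, start1, end1, ...}, half-open.
final class SpanRanges {
    // True if [start, start + len) lies entirely inside some span
    static boolean containsRange(int[] spans, int start, int len) {
        int end = start + len;
        for (int i = 0; i + 1 < spans.length; i += 2) {
            if (spans[i] <= start && end <= spans[i + 1]) {
                return true;
            }
        }
        return false;
    }

    // True only if some span is exactly [start, start + len),
    // e.g. a one-word query at position p matching a one-word anchor span: (p, 1)
    static boolean containsRangeExact(int[] spans, int start, int len) {
        int end = start + len;
        for (int i = 0; i + 1 < spans.length; i += 2) {
            if (spans[i] == start && spans[i + 1] == end) {
                return true;
            }
        }
        return false;
    }
}
```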
Viktor Lofgren
dc5f97e737
(index) Add bonus for single-word title matches when the title is also a single word
2024-11-25 13:24:12 +01:00
Viktor Lofgren
d919179ba3
(index) Correct off-by-1 error in DocumentSpan.containsRange
2024-11-25 13:24:03 +01:00
Viktor Lofgren
f09669a5b0
(index) Correct usage of DocumentSpan.length() instead of DocumentSpan.size()
The latter counts the number of spans, and is not what you want here.
2024-11-25 13:11:55 +01:00
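To illustrate the distinction this commit draws, a sketch assuming `size()` returns the number of spans (as the message states) while `length()` returns the total number of positions those spans cover. The flat `[start, end)` layout and `SpanSizes` are hypothetical.

```java
// Hypothetical sketch: spans stored as {start0, end0, start1, end1, ...}, half-open.
final class SpanSizes {
    // Number of spans: what size() counts according to the commit message
    static int size(int[] spans) {
        return spans.length / 2;
    }

    // Total positions covered by all spans (assumed meaning of length())
    static int length(int[] spans) {
        int total = 0;
        for (int i = 0; i + 1 < spans.length; i += 2) {
            total += spans[i + 1] - spans[i];
        }
        return total;
    }
}
```

For spans `[0, 4)` and `[10, 12)`, `size` is 2 while `length` is 6, so using one in place of the other silently skews any length-based scoring.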
Viktor Lofgren
51e46ad2b0
(refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendants
Since some of the export tasks have been memory-hungry, sometimes killing the executor-services, they've been moved to a separate process that can be given a larger Xmx.
While doing this, the ProcessMainClass was given utilities for the boilerplate surrounding receiving mq requests and responding to them. Some effort was also put toward making the process boot sequence a bit more uniform. It's still a bit heterogeneous between different processes, but a bit less so for now.
2024-11-21 16:00:09 +01:00
Viktor Lofgren
9f47ce8d15
(chore) Remove lombok
There are likely some instances of delombok gore with this commit.
2024-11-11 21:14:38 +01:00
Viktor Lofgren
6460c11107
(index) Short-circuit rankResults when there are no results
2024-10-14 13:47:35 +02:00
Viktor Lofgren
90a2d4ae38
(index) Fix partial buffer writing in PrioDocIdsTransformer
Ensure all data is written to writeChannel by looping until the buffer is fully drained. This prevents potential data loss during the close operation and maintains data integrity.
2024-09-29 17:53:40 +02:00
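The fix follows the standard NIO rule that a single `WritableByteChannel.write()` call may write fewer bytes than remain in the buffer. A minimal sketch of the drain loop; `ChannelUtil` and the `trickle` test double (which deliberately writes at most two bytes per call) are illustrative, not code from `PrioDocIdsTransformer`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class ChannelUtil {
    // Keep calling write() until nothing remains; one call is not guaranteed to write everything
    static void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    // Test double: forwards at most 2 bytes per write() call to mimic partial writes
    static WritableByteChannel trickle(WritableByteChannel inner) {
        return new WritableByteChannel() {
            public int write(ByteBuffer src) throws IOException {
                ByteBuffer slice = src.duplicate();
                slice.limit(slice.position() + Math.min(2, slice.remaining()));
                int n = inner.write(slice);
                src.position(src.position() + n); // advance the caller's buffer by what was written
                return n;
            }
            public boolean isOpen() { return inner.isOpen(); }
            public void close() throws IOException { inner.close(); }
        };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7});
        writeFully(trickle(Channels.newChannel(out)), buf);
        System.out.println(out.size()); // 7
    }
}
```

A single unguarded `channel.write(buffer)` against the `trickle` channel would leave five of the seven bytes behind, which is the kind of silent loss the commit describes.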
Viktor Lofgren
69d99c91dd
(index) Optimize buffer handling in PrioDocIdsTransformer
2024-09-29 17:20:49 +02:00
Viktor Lofgren
a8cc98a0f6
(index) Fix write offset calculation in PrioDocIdsTransformer
Adjust the write offset calculation by adding the position of the write buffer. Update the test to validate the transformation process and ensure correctness of output file positions.
2024-09-29 17:20:29 +02:00
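When output is staged in a buffer before reaching the file, the file offset of the next value is the number of bytes already flushed plus the bytes still pending in the buffer; counting only the flushed bytes points too early. A sketch under those assumptions: `OffsetTrackingBuffer` is hypothetical, and its `flush()` only counts bytes instead of writing to a channel.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of offset tracking for a buffered writer (capacity must be >= 8).
final class OffsetTrackingBuffer {
    private final ByteBuffer buffer;
    private long flushedBytes = 0; // bytes already handed to the output

    OffsetTrackingBuffer(int capacity) {
        buffer = ByteBuffer.allocate(capacity);
    }

    // Offset where the next value will land: flushed bytes plus bytes still in the buffer
    long currentOffset() {
        return flushedBytes + buffer.position();
    }

    void putLong(long value) {
        if (buffer.remaining() < Long.BYTES) {
            flush();
        }
        buffer.putLong(value);
    }

    // Stand-in for writing the buffer to a channel: only the byte count is tracked
    void flush() {
        flushedBytes += buffer.position();
        buffer.clear();
    }
}
```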
Viktor Lofgren
1bd29a586c
(service-discovery) Add common base interface to all Grpc services
To let service discovery decide whether to enable a service on a particular runtime, a common base interface, DiscoverableService, which extends BindableService, was added.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
336d6fdd14
(index-client) Fix error when zero results are found
2024-09-25 20:23:13 +02:00
Viktor Lofgren
73f973cc06
(search-query) Add pagination to search query API and the direct query-service interface
2024-09-25 14:20:59 +02:00
Viktor Lofgren
3dec4b6b34
(index) Fix bug where tcfFirstPosition lit up because one term was in the title and the other was missing from the document
This was because firstPosition calculation was not invalidated when positions were missing.
2024-09-24 13:33:37 +02:00
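A sketch of the invalidation rule. It assumes a hypothetical formulation in which the feature is the position by which every query term has appeared (the maximum over terms of each term's first position); the real formula is not given here. Whatever the exact formula, the point of the fix is the early return: if any term has no positions, the feature must not fire. `FirstPosition` is a hypothetical name.

```java
// Hypothetical sketch: the aggregation (max of per-term first positions) is an assumption.
final class FirstPosition {
    static final int INVALID = -1;

    // positionsPerTerm[i] holds the sorted positions of query term i in the document
    static int firstPosition(int[][] positionsPerTerm) {
        if (positionsPerTerm.length == 0) {
            return INVALID;
        }
        int first = 0;
        for (int[] positions : positionsPerTerm) {
            if (positions == null || positions.length == 0) {
                return INVALID; // a term is missing: invalidate instead of using the others
            }
            first = Math.max(first, positions[0]);
        }
        return first;
    }
}
```

Without the early return, a document with one term at title position 0 and the other term absent would look like a strong early match.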
Viktor Lofgren
9c292a4f62
(doc) Fix outdated links in documentation
2024-09-22 13:56:17 +02:00
Viktor Lofgren
8e78286068
Merge branch 'master' into term-positions
2024-09-17 15:20:46 +02:00
Viktor Lofgren
f4eeef145e
(index) Reduce fetch size to improve timeout characteristics
2024-09-17 15:20:41 +02:00
Viktor Lofgren
87aa869338
(index) Correct positions mask to take into account offsets when overlapping
2024-09-17 14:40:37 +02:00
Viktor Lofgren
a74df7f905
(index) Increase buffer size for PrioDocIdsTransformer
2024-09-17 13:52:52 +02:00
Viktor Lofgren
b95646625f
(index) Correct prio index construction with mmap
Behavior from the full index had accidentally snuck in.
2024-09-17 13:39:08 +02:00
Viktor Lofgren
6e47eae903
(index) Correct strange close handling of PositionsFileConstructor
2024-09-13 16:34:14 +02:00
Viktor Lofgren
934af0dd4b
(index) Correct units in log message when shrinking the documents file
2024-09-13 16:33:19 +02:00
Viktor Lofgren
a8bec13ed9
(index) Evaluate using mmap reads during index construction in favor of filechannel reads
It's likely that this will be faster, as the reads are on average small and sequential, and can't be buffered easily.
2024-09-13 16:14:56 +02:00
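A minimal sketch of reading through a memory mapping rather than explicit `FileChannel.read()` calls, using only standard `java.nio` APIs. It assumes the file holds big-endian longs and fits in a single mapping (`FileChannel.map` rejects sizes above `Integer.MAX_VALUE`); larger files would need to be mapped in windows. `MmapReader` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapReader {
    // Sum all longs in a file via a read-only mapping instead of repeated channel reads
    static long sumLongs(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            LongBuffer longs = mapped.asLongBuffer(); // big-endian by default
            long sum = 0;
            while (longs.hasRemaining()) {
                sum += longs.get(); // each small sequential read is a memory access, not a syscall
            }
            return sum;
        }
    }
}
```

With a mapping, each small read becomes a plain memory access served from the page cache, which fits the commit's observation that the reads are small, sequential, and hard to buffer.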
Viktor Lofgren
8047e77757
(doc) Correct dead links and stale information in the docs
2024-09-13 11:01:05 +02:00
Viktor Lofgren
50ec922c2b
(index) Fix broken index tests
Also cleaned up the tests to be less fragile to ranking algorithm changes.
2024-09-10 10:23:46 +02:00
Viktor Lofgren
cfbbeaa26e
(ranking) Clean up ranking test code
2024-09-08 15:46:51 +02:00
Viktor Lofgren
bb5d946c26
(index, EXPERIMENTAL) Clean up ranking code
2024-08-29 11:34:23 +02:00
Viktor Lofgren
abab5bdc8a
(index, EXPERIMENTAL) Evaluate using Varint instead of GCS for position data
2024-08-26 14:20:39 +02:00
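A sketch of varint-coding sorted positions: store each gap from the previous position using 7 data bits per byte, with the high bit set when more bytes follow (the common LEB128-style scheme). Whether the index uses exactly this layout, or delta coding at all, is an assumption; `VarintPositions` is hypothetical, and the decoder assumes the element count is stored separately.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch: delta + unsigned varint coding of sorted, non-negative positions.
final class VarintPositions {
    static byte[] encode(int[] sortedPositions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int pos : sortedPositions) {
            int delta = pos - prev; // small gaps need few bytes
            prev = pos;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta); // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] data, int count) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] result = new int[count];
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = buf.get();
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += value; // undo the delta coding
            result[i] = prev;
        }
        return result;
    }
}
```

For positions `{3, 130, 131, 20000}` the gaps are 3, 127, 1 and 19869, which take 1, 1, 1 and 3 bytes respectively.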
Viktor Lofgren
30bf845c81
(index) Speed up minDist calculations by excluding large lists
2024-08-26 13:04:15 +02:00
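A sketch of the idea: compute the minimum distance between two sorted position lists with a two-pointer merge, and skip lists longer than a cutoff, since very frequent terms make the pairwise work expensive. `MAX_LIST_SIZE`, its value, and `MinDist` are hypothetical; the commit does not state the actual threshold or how excluded lists are otherwise treated.

```java
// Hypothetical sketch of minDist with large-list exclusion.
final class MinDist {
    static final int MAX_LIST_SIZE = 1000; // hypothetical cutoff

    // Smallest |x - y| for x in a, y in b; both sorted ascending.
    // Advancing the pointer at the smaller value cannot skip a closer pair.
    static int minDistance(int[] a, int[] b) {
        int best = Integer.MAX_VALUE;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }

    // Minimum over all pairs of term position lists, ignoring lists above the cutoff
    static int minDistanceExcludingLarge(int[][] lists) {
        int best = Integer.MAX_VALUE;
        for (int x = 0; x < lists.length; x++) {
            if (lists[x].length > MAX_LIST_SIZE) continue;
            for (int y = x + 1; y < lists.length; y++) {
                if (lists[y].length > MAX_LIST_SIZE) continue;
                best = Math.min(best, minDistance(lists[x], lists[y]));
            }
        }
        return best;
    }
}
```

The trade-off is that a frequent term's positions no longer contribute to the distance, in exchange for bounding the cost per document.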
Viktor Lofgren
67a98fb0b0
(coded-sequence) Handle weird legacy HTML that puts everything in a heading
2024-08-26 12:49:15 +02:00
Viktor Lofgren
f3182a9264
(coded-sequence) Evaluate new minDist implementation
2024-08-26 12:02:37 +02:00
Viktor Lofgren
fdf05cedae
(index) Optimize DocumentSpan.countIntersections
2024-08-25 14:12:30 +02:00
Viktor Lofgren
9c5f463775
(index) Optimize DocumentSpan.countIntersections
2024-08-25 13:59:11 +02:00
Viktor Lofgren
893fae6d59
(index) Optimize DocumentSpan.countIntersections
2024-08-25 13:51:43 +02:00
Viktor Lofgren
5660f291af
(index) Optimize DocumentSpan.countIntersections
2024-08-25 13:43:29 +02:00
Viktor Lofgren
efd56efc63
(index) Optimize SequenceOperations.minDistance
2024-08-25 13:28:06 +02:00