Viktor Lofgren
|
cfbbeaa26e
|
(ranking) Clean up ranking test code
|
2024-09-08 15:46:51 +02:00 |
|
Viktor Lofgren
|
a3b0189934
|
Fix build errors after merge
|
2024-09-08 10:22:32 +02:00 |
|
Viktor Lofgren
|
8f367d96f8
|
Merge branch 'master' into term-positions
# Conflicts:
# code/index/java/nu/marginalia/index/results/model/ids/TermIdList.java
# code/processes/converting-process/java/nu/marginalia/converting/ConverterMain.java
# code/processes/crawling-process/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java
# code/processes/crawling-process/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java
# code/processes/crawling-process/model/java/nu/marginalia/io/crawldata/CrawledDomainReader.java
# code/processes/crawling-process/test/nu/marginalia/crawling/HttpFetcherTest.java
# code/processes/crawling-process/test/nu/marginalia/crawling/retreival/CrawlerMockFetcherTest.java
# code/services-application/search-service/java/nu/marginalia/search/svc/SearchQueryIndexService.java
|
2024-09-08 10:14:43 +02:00 |
|
Viktor Lofgren
|
f78ef36cd4
|
(slop) Upgrade to 0.0.8, add encodings to string columns.
|
2024-09-04 15:19:00 +02:00 |
|
Viktor Lofgren
|
dc67c81f99
|
(summary) Fix a few cases where noscript tags would be used for the document summary
|
2024-09-04 15:00:40 +02:00 |
|
Viktor Lofgren
|
50ba8fd099
|
(query-parsing) Correct handling of trailing parentheses
|
2024-09-03 11:45:14 +02:00 |
|
Viktor Lofgren
|
99b3b00b68
|
(query-parsing) Merge QueryTokenizer into QueryParser and add escaping of query grammar
|
2024-09-03 11:35:32 +02:00 |
|
Viktor Lofgren
|
f6d981761d
|
(query-parsing) Drop search term elements that aren't indexed by the search engine
|
2024-09-03 11:24:05 +02:00 |
|
Viktor Lofgren
|
8290c19e24
|
(query-parsing) Drop search term elements that aren't indexed by the search engine
|
2024-09-03 11:21:01 +02:00 |
|
Viktor Lofgren
|
7a69dff6cf
|
(search) Correct handling of languages on fandom
|
2024-09-01 13:46:01 +02:00 |
|
Viktor Lofgren
|
bfb7ed2c99
|
(search) Translate cursed medium URLs to scribe.rip links via the search application
|
2024-09-01 13:32:14 +02:00 |
|
Viktor Lofgren
|
e19dc9b13e
|
(search) Translate cursed fandom URLs to breezewiki links via the search application
|
2024-09-01 13:23:35 +02:00 |
|
Viktor Lofgren
|
74148c790e
|
(crawler) Pull additional new domains from node-affinity 0
Node affinity 0 was previously ambiguously defined; it now indicates that a domain is up for grabs for the next crawler.
|
2024-09-01 13:00:36 +02:00 |
|
Viktor Lofgren
|
3d77456110
|
(*) Add domain parking service to ip blocklist
|
2024-09-01 12:53:22 +02:00 |
|
Viktor Lofgren
|
ab6a4b1749
|
(control) Correct id value for domain addition tool
|
2024-09-01 12:25:15 +02:00 |
|
Viktor Lofgren
|
aeeb1d0cb7
|
(control) Add utility for adding domains from an external URL
|
2024-09-01 12:14:21 +02:00 |
|
Viktor Lofgren
|
185b79f2a5
|
(converter) Fix bug where sideloaded reddit content was erroneously categorized as wiki-generated.
|
2024-09-01 11:30:25 +02:00 |
|
Viktor Lofgren
|
8d0f9652c7
|
(crawler) Correct RSS-sitemap behavior
|
2024-08-31 11:38:34 +02:00 |
|
Viktor Lofgren
|
5353805cc6
|
(crawler) Correct RSS-sitemap behavior
|
2024-08-31 11:37:09 +02:00 |
|
Viktor Lofgren
|
5407da5650
|
(crawler) Grab favicons as part of root sniff
|
2024-08-31 11:32:56 +02:00 |
|
Viktor Lofgren
|
b1bfe6f76e
|
(control) New view for domains
Add capability to assign domains, and bulk-add new domains.
|
2024-08-30 17:06:48 +02:00 |
|
Viktor Lofgren
|
74e25370ca
|
(control) New view for domains
Still a work in progress, but it can already be used for viewing domains.
|
2024-08-29 15:40:40 +02:00 |
|
Viktor Lofgren
|
bb5d946c26
|
(index, EXPERIMENTAL) Clean up ranking code
|
2024-08-29 11:34:23 +02:00 |
|
Viktor Lofgren
|
abab5bdc8a
|
(index, EXPERIMENTAL) Evaluate using Varint instead of GCS for position data
|
2024-08-26 14:20:39 +02:00 |
|
Viktor Lofgren
|
30bf845c81
|
(index) Speed up minDist calculations by excluding large lists
|
2024-08-26 13:04:15 +02:00 |
|
Viktor Lofgren
|
77efce0673
|
(paper-doll) Fix compilation
|
2024-08-26 12:51:29 +02:00 |
|
Viktor Lofgren
|
67a98fb0b0
|
(coded-sequence) Handle weird legacy HTML that puts everything in a heading
|
2024-08-26 12:49:15 +02:00 |
|
Viktor Lofgren
|
7d471ec30d
|
(coded-sequence) Evaluate new minDist implementation
|
2024-08-26 12:45:11 +02:00 |
|
Viktor Lofgren
|
f3182a9264
|
(coded-sequence) Evaluate new minDist implementation
|
2024-08-26 12:02:37 +02:00 |
|
Viktor Lofgren
|
805cb5ad58
|
(coded-sequence) Correct behavior of findIntersections
|
2024-08-25 14:54:17 +02:00 |
|
Viktor Lofgren
|
fdf05cedae
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 14:12:30 +02:00 |
|
Viktor Lofgren
|
9c5f463775
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 13:59:11 +02:00 |
|
Viktor Lofgren
|
893fae6d59
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 13:51:43 +02:00 |
|
Viktor Lofgren
|
5660f291af
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 13:43:29 +02:00 |
|
Viktor Lofgren
|
efd56efc63
|
(index) Optimize SequenceOperations.minDistance
|
2024-08-25 13:28:06 +02:00 |
|
Viktor Lofgren
|
d94373f4b1
|
(index) Optimize calculatePositionsMask
|
2024-08-25 13:24:37 +02:00 |
|
Viktor Lofgren
|
0d01a48260
|
(index) Optimize SequenceOperations
|
2024-08-25 13:19:37 +02:00 |
|
Viktor Lofgren
|
00ab2684fa
|
(index) Optimize SequenceOperations
|
2024-08-25 13:17:38 +02:00 |
|
Viktor Lofgren
|
a5585110a6
|
(index) Optimize SequenceOperations
|
2024-08-25 13:16:31 +02:00 |
|
Viktor Lofgren
|
965c89798e
|
(index) Optimize DocumentSpan
|
2024-08-25 12:44:33 +02:00 |
|
Viktor Lofgren
|
982b03382b
|
(index) Optimize DocumentSpan
|
2024-08-25 12:31:15 +02:00 |
|
Viktor Lofgren
|
24b805472a
|
(index) Evaluate performance implication of decoding gcs early
|
2024-08-25 12:23:09 +02:00 |
|
Viktor Lofgren
|
6ce029b317
|
(index) Remove vestigial parameter
|
2024-08-25 12:14:12 +02:00 |
|
Viktor Lofgren
|
63e5b0ab18
|
(index) Correct weightedCounts calculations
|
2024-08-25 12:06:56 +02:00 |
|
Viktor Lofgren
|
6dda2c2d83
|
(coded-sequence) Reduce allocations in GCS.values()
|
2024-08-25 12:06:31 +02:00 |
|
Viktor Lofgren
|
3fb3c0b92e
|
(index) Optimize ranking calculations
|
2024-08-25 11:56:11 +02:00 |
|
Viktor Lofgren
|
aa2c960b74
|
(index) Optimize ranking calculations
|
2024-08-25 11:53:44 +02:00 |
|
Viktor Lofgren
|
4fbcc02f96
|
(index) Adjust sensible defaults for ranking parameters
|
2024-08-25 11:24:16 +02:00 |
|
Viktor Lofgren
|
9aa8f13731
|
(index) Remove tcfAvgDist ranking parameter
This is captured by tcfProximity already
|
2024-08-25 11:20:19 +02:00 |
|
Viktor Lofgren
|
65bee366dc
|
(index) Try harmonic mean for avgMinDist
|
2024-08-25 11:11:52 +02:00 |
|