Viktor Lofgren
|
a8bec13ed9
|
(index) Evaluate using mmap reads instead of FileChannel reads during index construction
It's likely that this will be faster, as the reads are on average small and sequential, and can't be buffered easily.
|
2024-09-13 16:14:56 +02:00 |
|
Viktor Lofgren
|
1cf62f5850
|
(doc) Correct dead links and stale information in the docs
|
2024-09-13 11:02:13 +02:00 |
|
Viktor Lofgren
|
8047e77757
|
(doc) Correct dead links and stale information in the docs
|
2024-09-13 11:01:05 +02:00 |
|
Viktor Lofgren
|
2a92de29ce
|
(loader) Fix it so that the loader doesn't explode if it sees an invalid URL
|
2024-09-12 11:36:00 +02:00 |
|
Viktor Lofgren
|
99523ca079
|
(query-parser) Remove test that is no longer relevant
|
2024-09-10 10:35:56 +02:00 |
|
Viktor Lofgren
|
35f49bbb60
|
(coded-sequence) Add equals and hashCode to VCS
|
2024-09-10 10:33:56 +02:00 |
|
Viktor Lofgren
|
50ec922c2b
|
(index) Fix broken index tests
Also cleaned up the tests to be less fragile to ranking algorithm changes.
|
2024-09-10 10:23:46 +02:00 |
|
Viktor Lofgren
|
cfbbeaa26e
|
(ranking) Clean up ranking test code
|
2024-09-08 15:46:51 +02:00 |
|
Viktor Lofgren
|
a3b0189934
|
Fix build errors after merge
|
2024-09-08 10:22:32 +02:00 |
|
Viktor Lofgren
|
8f367d96f8
|
Merge branch 'master' into term-positions
# Conflicts:
# code/index/java/nu/marginalia/index/results/model/ids/TermIdList.java
# code/processes/converting-process/java/nu/marginalia/converting/ConverterMain.java
# code/processes/crawling-process/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java
# code/processes/crawling-process/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java
# code/processes/crawling-process/model/java/nu/marginalia/io/crawldata/CrawledDomainReader.java
# code/processes/crawling-process/test/nu/marginalia/crawling/HttpFetcherTest.java
# code/processes/crawling-process/test/nu/marginalia/crawling/retreival/CrawlerMockFetcherTest.java
# code/services-application/search-service/java/nu/marginalia/search/svc/SearchQueryIndexService.java
|
2024-09-08 10:14:43 +02:00 |
|
Viktor Lofgren
|
f78ef36cd4
|
(slop) Upgrade to 0.0.8, add encodings to string columns.
|
2024-09-04 15:19:00 +02:00 |
|
Viktor Lofgren
|
dc67c81f99
|
(summary) Fix a few cases where noscript tags would sometimes be used for document summary
|
2024-09-04 15:00:40 +02:00 |
|
Viktor Lofgren
|
50ba8fd099
|
(query-parsing) Correct handling of trailing parentheses
|
2024-09-03 11:45:14 +02:00 |
|
Viktor Lofgren
|
99b3b00b68
|
(query-parsing) Merge QueryTokenizer into QueryParser and add escaping of query grammar
|
2024-09-03 11:35:32 +02:00 |
|
Viktor Lofgren
|
f6d981761d
|
(query-parsing) Drop search term elements that aren't indexed by the search engine
|
2024-09-03 11:24:05 +02:00 |
|
Viktor Lofgren
|
8290c19e24
|
(query-parsing) Drop search term elements that aren't indexed by the search engine
|
2024-09-03 11:21:01 +02:00 |
|
Viktor Lofgren
|
7a69dff6cf
|
(search) Correct handling of languages on fandom
|
2024-09-01 13:46:01 +02:00 |
|
Viktor Lofgren
|
bfb7ed2c99
|
(search) Translate cursed medium URLs to scribe.rip links via the search application
|
2024-09-01 13:32:14 +02:00 |
|
Viktor Lofgren
|
e19dc9b13e
|
(search) Translate cursed fandom URLs to breezewiki links via the search application
|
2024-09-01 13:23:35 +02:00 |
|
Viktor Lofgren
|
74148c790e
|
(crawler) Pull additional new domains from node-affinity 0
Node affinity 0 was previously somewhat ambiguously defined; it now indicates that a domain is up for grabs for the next crawler.
|
2024-09-01 13:00:36 +02:00 |
|
Viktor Lofgren
|
3d77456110
|
(*) Add domain parking service to ip blocklist
|
2024-09-01 12:53:22 +02:00 |
|
Viktor Lofgren
|
ab6a4b1749
|
(control) Correct id value for domain addition tool
|
2024-09-01 12:25:15 +02:00 |
|
Viktor Lofgren
|
aeeb1d0cb7
|
(control) Add utility for adding domains from an external URL
|
2024-09-01 12:14:21 +02:00 |
|
Viktor Lofgren
|
185b79f2a5
|
(converter) Fix bug where sideloaded reddit content was erroneously categorized as wiki-generated.
|
2024-09-01 11:30:25 +02:00 |
|
Viktor Lofgren
|
8d0f9652c7
|
(crawler) Correct RSS-sitemap behavior
|
2024-08-31 11:38:34 +02:00 |
|
Viktor Lofgren
|
5353805cc6
|
(crawler) Correct RSS-sitemap behavior
|
2024-08-31 11:37:09 +02:00 |
|
Viktor Lofgren
|
5407da5650
|
(crawler) Grab favicons as part of root sniff
|
2024-08-31 11:32:56 +02:00 |
|
Viktor Lofgren
|
b1bfe6f76e
|
(control) New view for domains
Add capability to assign domains, and bulk-add new domains.
|
2024-08-30 17:06:48 +02:00 |
|
Viktor Lofgren
|
74e25370ca
|
(control) New view for domains
Still a work in progress, but it can already be used for viewing domains.
|
2024-08-29 15:40:40 +02:00 |
|
Viktor Lofgren
|
bb5d946c26
|
(index, EXPERIMENTAL) Clean up ranking code
|
2024-08-29 11:34:23 +02:00 |
|
Viktor Lofgren
|
abab5bdc8a
|
(index, EXPERIMENTAL) Evaluate using Varint instead of GCS for position data
|
2024-08-26 14:20:39 +02:00 |
|
Viktor Lofgren
|
30bf845c81
|
(index) Speed up minDist calculations by excluding large lists
|
2024-08-26 13:04:15 +02:00 |
|
Viktor Lofgren
|
77efce0673
|
(paper-doll) Fix compilation
|
2024-08-26 12:51:29 +02:00 |
|
Viktor Lofgren
|
67a98fb0b0
|
(coded-sequence) Handle weird legacy HTML that puts everything in a heading
|
2024-08-26 12:49:15 +02:00 |
|
Viktor Lofgren
|
7d471ec30d
|
(coded-sequence) Evaluate new minDist implementation
|
2024-08-26 12:45:11 +02:00 |
|
Viktor Lofgren
|
f3182a9264
|
(coded-sequence) Evaluate new minDist implementation
|
2024-08-26 12:02:37 +02:00 |
|
Viktor Lofgren
|
805cb5ad58
|
(coded-sequence) Correct behavior of findIntersections
|
2024-08-25 14:54:17 +02:00 |
|
Viktor Lofgren
|
fdf05cedae
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 14:12:30 +02:00 |
|
Viktor Lofgren
|
9c5f463775
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 13:59:11 +02:00 |
|
Viktor Lofgren
|
893fae6d59
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 13:51:43 +02:00 |
|
Viktor Lofgren
|
5660f291af
|
(index) Optimize DocumentSpan.countIntersections
|
2024-08-25 13:43:29 +02:00 |
|
Viktor Lofgren
|
efd56efc63
|
(index) Optimize SequenceOperations.minDistance
|
2024-08-25 13:28:06 +02:00 |
|
Viktor Lofgren
|
d94373f4b1
|
(index) Optimize calculatePositionsMask
|
2024-08-25 13:24:37 +02:00 |
|
Viktor Lofgren
|
0d01a48260
|
(index) Optimize SequenceOperations
|
2024-08-25 13:19:37 +02:00 |
|
Viktor Lofgren
|
00ab2684fa
|
(index) Optimize SequenceOperations
|
2024-08-25 13:17:38 +02:00 |
|
Viktor Lofgren
|
a5585110a6
|
(index) Optimize SequenceOperations
|
2024-08-25 13:16:31 +02:00 |
|
Viktor Lofgren
|
965c89798e
|
(index) Optimize DocumentSpan
|
2024-08-25 12:44:33 +02:00 |
|
Viktor Lofgren
|
982b03382b
|
(index) Optimize DocumentSpan
|
2024-08-25 12:31:15 +02:00 |
|
Viktor Lofgren
|
24b805472a
|
(index) Evaluate performance implication of decoding gcs early
|
2024-08-25 12:23:09 +02:00 |
|
Viktor Lofgren
|
6ce029b317
|
(index) Remove vestigial parameter
|
2024-08-25 12:14:12 +02:00 |
|