Directory: MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia
Latest commit: b510b7feb8 by Viktor Lofgren, 2024-12-15 15:49:47 +01:00
Spike for storing crawl data in slop instead of parquet

This seems to reduce RAM overhead from ~2 GB to hundreds of MB and roughly doubles read speed; on-disk size is virtually identical. (A hedged sketch of the column-per-file idea follows the listing below.)
Name              | Last commit                                                                              | Date
io                | Merge branch 'master' into live-search                                                   | 2024-11-21 16:00:20 +01:00
model             | (model) Remove deprecated fields from CrawledDocument and CrawledDomain                  | 2024-11-20 15:27:05 +01:00
parquet/crawldata | (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets | 2024-12-11 17:01:52 +01:00
slop              | Spike for storing crawl data in slop instead of parquet                                  | 2024-12-15 15:49:47 +01:00
ContentTypes.java | (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets | 2024-12-11 17:01:52 +01:00
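For context on the spike commit above: a columnar store along slop's lines keeps each field of the crawl data in its own streaming, compressed column file, rather than buffering whole row groups in memory the way a parquet writer does, which is consistent with the reported RAM and read-speed improvements. The following is a minimal sketch of that column-per-file idea using only JDK I/O; the CrawledDoc record, the file names, and the layout are illustrative assumptions, not the actual nu.marginalia.slop API.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class ColumnStoreSketch {
    // Hypothetical stand-in for a crawl record; not the real model classes.
    record CrawledDoc(String url, int httpStatus, String body) {}

    // Each field is appended to its own compressed stream, so writing needs
    // only a handful of small per-column buffers instead of whole in-memory
    // row groups. (writeUTF caps strings at 64 KB; fine for a sketch.)
    static void write(Path dir, Iterable<CrawledDoc> docs) throws IOException {
        Files.createDirectories(dir);
        try (var urls = out(dir.resolve("url.gz"));
             var statuses = out(dir.resolve("status.gz"));
             var bodies = out(dir.resolve("body.gz")))
        {
            for (CrawledDoc doc : docs) {
                urls.writeUTF(doc.url());
                statuses.writeInt(doc.httpStatus());
                bodies.writeUTF(doc.body());
            }
        }
    }

    // Reading one column streams it sequentially and never touches the
    // other columns; a row-oriented reader must decode whole records.
    static void printStatuses(Path dir) throws IOException {
        try (var statuses = in(dir.resolve("status.gz"))) {
            while (true) {
                try { System.out.println(statuses.readInt()); }
                catch (EOFException e) { break; }  // clean end of column
            }
        }
    }

    static DataOutputStream out(Path p) throws IOException {
        return new DataOutputStream(new GZIPOutputStream(Files.newOutputStream(p)));
    }

    static DataInputStream in(Path p) throws IOException {
        return new DataInputStream(new GZIPInputStream(Files.newInputStream(p)));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of("crawl-columns");
        write(dir, java.util.List.of(
                new CrawledDoc("https://example.com/", 200, "<html>...</html>")));
        printStatuses(dir);
    }
}
```

The design point of such a layout: writes append to a few streaming compressors in constant memory, and a scan over one column (say, HTTP status) decompresses only that file, which is how a column store can cut resident memory while speeding up sequential reads.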