Directory: MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia
Latest commit: b510b7feb8 by Viktor Lofgren, 2024-12-15 15:49:47 +01:00
Spike for storing crawl data in slop instead of parquet

This seems to reduce RAM overhead from ~2 GB to hundreds of MB and roughly doubles read speed; on-disk size is virtually identical. (A hedged sketch of the column-per-file idea follows the listing below.)
Name              | Last commit                                                                              | Date
io                | Merge branch 'master' into live-search                                                   | 2024-11-21 16:00:20 +01:00
model             | (model) Remove deprecated fields from CrawledDocument and CrawledDomain                  | 2024-11-20 15:27:05 +01:00
parquet/crawldata | (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets | 2024-12-11 17:01:52 +01:00
slop              | Spike for storing crawl data in slop instead of parquet                                  | 2024-12-15 15:49:47 +01:00
ContentTypes.java | (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets | 2024-12-11 17:01:52 +01:00
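For context on the spike commit above: a columnar store along slop's lines keeps each field of the crawl data in its own streaming, compressed column file, rather than buffering whole row groups in memory the way a parquet writer does, which is consistent with the reported RAM and read-speed improvements. The following is a minimal sketch of that column-per-file idea using only JDK I/O; the CrawledDoc record, the file names, and the layout are illustrative assumptions, not the actual nu.marginalia.slop API.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class ColumnStoreSketch {
    // Hypothetical stand-in for a crawl record; not the real model classes.
    record CrawledDoc(String url, int httpStatus, String body) {}

    // Each field is appended to its own compressed stream, so writing needs
    // only a handful of small per-column buffers instead of whole in-memory
    // row groups. (writeUTF caps strings at 64 KB; fine for a sketch.)
    static void write(Path dir, Iterable<CrawledDoc> docs) throws IOException {
        Files.createDirectories(dir);
        try (var urls = out(dir.resolve("url.gz"));
             var statuses = out(dir.resolve("status.gz"));
             var bodies = out(dir.resolve("body.gz")))
        {
            for (CrawledDoc doc : docs) {
                urls.writeUTF(doc.url());
                statuses.writeInt(doc.httpStatus());
                bodies.writeUTF(doc.body());
            }
        }
    }

    // Reading one column streams it sequentially and never touches the
    // other columns; a row-oriented reader must decode whole records.
    static void printStatuses(Path dir) throws IOException {
        try (var statuses = in(dir.resolve("status.gz"))) {
            while (true) {
                try { System.out.println(statuses.readInt()); }
                catch (EOFException e) { break; }  // clean end of column
            }
        }
    }

    static DataOutputStream out(Path p) throws IOException {
        return new DataOutputStream(new GZIPOutputStream(Files.newOutputStream(p)));
    }

    static DataInputStream in(Path p) throws IOException {
        return new DataInputStream(new GZIPInputStream(Files.newInputStream(p)));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of("crawl-columns");
        write(dir, java.util.List.of(
                new CrawledDoc("https://example.com/", 200, "<html>...</html>")));
        printStatuses(dir);
    }
}
```

The design point of such a layout: writes append to a few streaming compressors in constant memory, and a scan over one column (say, HTTP status) decompresses only that file, which is how a column store can cut resident memory while speeding up sequential reads.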