MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia
Viktor Lofgren f6f036b9b1 Switch to new Slop format for crawl data storage and processing.
Replaces Parquet output and processing with the new Slop-based format. Includes data migration functionality, updates to handling and writing of crawl data, and introduces support for SLOP in domain readers and converters.
2024-12-15 19:34:03 +01:00
io Switch to new Slop format for crawl data storage and processing. 2024-12-15 19:34:03 +01:00
model (model) Remove deprecated fields from CrawledDocument and CrawledDomain 2024-11-20 15:27:05 +01:00
parquet/crawldata (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets 2024-12-11 17:01:52 +01:00
slop Switch to new Slop format for crawl data storage and processing. 2024-12-15 19:34:03 +01:00
ContentTypes.java (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets 2024-12-11 17:01:52 +01:00