MarginaliaSearch/code/process-models/crawling-model
Viktor Lofgren 3a56a06c4f (warc) Add fields for etags and last-modified headers to the new crawl data formats
Make some temporary modifications to the CrawledDocument model to support both a "big string" style headers field like in the old formats, and explicit fields as in the new formats.  This is a bit awkward to deal with, but it's a necessity until we migrate off the old formats entirely.

The commit also adds a few tests to this logic.
2023-12-18 17:45:54 +01:00
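
The dual-format handling described in the commit message above can be illustrated with a minimal sketch. This is not the actual CrawledDocument class: the class name, field names, and accessor names below are assumptions chosen for illustration. The idea shown is that an explicit field (new formats) is preferred when present, and the raw "big string" headers block (old formats) is scanned as a fallback.

```java
import java.util.Optional;

// Hypothetical stand-in for a document model that supports both header styles.
// All names here are illustrative assumptions, not the project's real API.
final class CrawledDocumentSketch {
    private final String headers;      // old formats: raw header block, may be null
    private final String etag;         // new formats: explicit field, may be null
    private final String lastModified; // new formats: explicit field, may be null

    CrawledDocumentSketch(String headers, String etag, String lastModified) {
        this.headers = headers;
        this.etag = etag;
        this.lastModified = lastModified;
    }

    // Prefer the explicit field; otherwise look in the raw header block.
    Optional<String> getEtag() {
        return etag != null ? Optional.of(etag) : findHeader("ETag");
    }

    Optional<String> getLastModified() {
        return lastModified != null ? Optional.of(lastModified) : findHeader("Last-Modified");
    }

    // Case-insensitive lookup of a header name in a "Name: value" block,
    // one header per line. Only the first colon separates name from value,
    // so values containing colons (such as HTTP dates) are kept whole.
    private Optional<String> findHeader(String name) {
        if (headers == null) {
            return Optional.empty();
        }
        for (String line : headers.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line
            }
            if (line.substring(0, colon).trim().equalsIgnoreCase(name)) {
                return Optional.of(line.substring(colon + 1).trim());
            }
        }
        return Optional.empty();
    }
}
```

With this shape, callers use the same accessors regardless of which format a record came from, and the fallback path can be deleted once the old formats are retired.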
src (warc) Add fields for etags and last-modified headers to the new crawl data formats 2023-12-18 17:45:54 +01:00
build.gradle (warc) Filter WarcResponses based on X-Robots-Tags 2023-12-16 15:58:27 +01:00
readme.md (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00

Crawling Models

Contains models shared by the crawling-process and converting-process.

Central Classes

Serialization