MarginaliaSearch/code/process-models/crawling-model
Viktor Lofgren 2e7db61808 (warc) More accurate filtering of advisory records
We want to mute some of these records so that they don't produce documents, but in some cases we want a document to be produced for accounting purposes.

Added improved tests that fetch known resources on www.marginalia.nu to verify the behavior when encountering bad content types and 404s.

The commit also adds defensive try-catch blocks around the charset handling, since it can throw when fed incorrect data and the charset is sometimes only a guess.
2023-12-15 21:31:16 +01:00
src (warc) More accurate filtering of advisory records 2023-12-15 21:31:16 +01:00
build.gradle (warc) More accurate filtering of advisory records 2023-12-15 21:31:16 +01:00
readme.md (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00

Crawling Models

Contains models shared by the crawling-process and converting-process.

Central Classes

Serialization