MarginaliaSearch/code/processes/crawling-process/model/java
Viktor Lofgren b8581b0f56 (crawler) Safe sanitization of headers during warc->slop conversion
The warc->slop converter was rejecting some items because they had headers that were representable in the Warc code's MessageHeader map implementation, but illegal in the HttpHeaders implementation.

This is fixed by manually filtering these headers out before the HttpHeaders object is constructed. Ostensibly the constructor has a filtering predicate, but this annoyingly runs too late and fails to prevent the problem.
2025-01-31 12:47:42 +01:00
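
The kind of pre-filtering the commit message describes might look roughly like the sketch below. It is not the project's actual code: the class `HeaderSanitizer` and the method `toHttpHeaders` are hypothetical names, and the input is assumed to be a plain `Map<String, List<String>>` rather than jwarc's `MessageHeaders` type. It relies on the documented behaviour of `java.net.http.HttpHeaders.of`, which throws `IllegalArgumentException` for a key that is empty after trimming or for two keys that differ only in case, regardless of what the filter predicate would have decided.

```java
import java.net.http.HttpHeaders;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeaderSanitizer {
    /**
     * Hypothetical helper (not the project's code): drops header entries that
     * HttpHeaders.of would reject, then builds the HttpHeaders object.
     */
    public static HttpHeaders toHttpHeaders(Map<String, List<String>> raw) {
        // A case-insensitive map merges "Foo" and "foo" instead of letting
        // them collide as duplicate keys inside HttpHeaders.of.
        Map<String, List<String>> clean = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

        for (Map.Entry<String, List<String>> entry : raw.entrySet()) {
            String name = entry.getKey();
            if (name == null || entry.getValue() == null) {
                continue; // null names or value lists would cause a NullPointerException
            }
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // names that are empty after trimming are rejected by HttpHeaders.of
            }
            List<String> values = clean.computeIfAbsent(trimmed, k -> new ArrayList<>());
            for (String value : entry.getValue()) {
                if (value != null) {
                    values.add(value);
                }
            }
        }

        // The predicate accepts everything; the filtering has already happened above.
        return HttpHeaders.of(clean, (name, value) -> true);
    }
}
```

Doing the cleanup before the call means the factory method never sees the offending entries, which is why the built-in predicate alone is not enough here.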
nu/marginalia (crawler) Safe sanitization of headers during warc->slop conversion 2025-01-31 12:47:42 +01:00
org/netpreserve/jwarc (crawler) Migrate away from using OkHttp in the crawler, use Java's HttpClient instead. 2025-01-19 15:07:11 +01:00