MarginaliaSearch/code/process-models/crawling-model/src
Viktor Lofgren fa81e5b8ee (warc) Use a non-standard WARC header to convey information about whether a website uses cookies
This information is then propagated to the parquet file as a boolean.

For documents that are copied from the reference, use whatever value we last saw. This isn't 100% deterministic and may result in false negatives, but it lets websites that used cookies and have since stopped repent, and have the change reflected in the search engine more quickly.
2023-12-15 16:37:53 +01:00
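The commit message above describes two rules: a per-record cookie flag read from a non-standard WARC header, and a "last seen" fallback for documents copied from the reference crawl. The following Java snippet is a minimal sketch of those rules under stated assumptions, not the actual Marginalia implementation. The header name `X-Has-Cookies`, its `1`/`0` encoding, the default of `false` before any flag has been seen, and the class and method names are all hypothetical. Records are assumed to arrive in crawl order, and a copied record is assumed to lack the header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class CookieFlagSketch {
    // HYPOTHETICAL header name; the commit does not say what the real header is called.
    public static final String COOKIES_HEADER = "X-Has-Cookies";

    /** Parses "Name: value" lines of a WARC header block; names are looked up case-insensitively. */
    public static Map<String, String> parseHeaders(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerBlock.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // skips the "WARC/1.0" version line and blank lines
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** Returns the flag if this record carries the (hypothetical) header, else empty. */
    public static Optional<Boolean> cookieFlag(Map<String, String> headers) {
        String value = headers.get(COOKIES_HEADER);
        if (value == null) return Optional.empty();
        return Optional.of("1".equals(value)); // ASSUMPTION: "1" means cookies were set
    }

    /**
     * Assigns one boolean per record, as would later be written to a parquet column.
     * Records without the header (e.g. copied from the reference) reuse the last value seen.
     */
    public static List<Boolean> assignFlags(List<String> headerBlocks) {
        List<Boolean> out = new ArrayList<>();
        boolean lastSeen = false; // ASSUMPTION: default before any flag is seen (can yield false negatives)
        for (String block : headerBlocks) {
            Optional<Boolean> flag = cookieFlag(parseHeaders(block));
            if (flag.isPresent()) lastSeen = flag.get();
            out.add(lastSeen);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Boolean> flags = assignFlags(List.of(
                "WARC/1.0\nWARC-Type: response\nX-Has-Cookies: 1\n",
                "WARC/1.0\nWARC-Type: response\n",                    // copied: inherits true
                "WARC/1.0\nWARC-Type: response\nX-Has-Cookies: 0\n",  // site stopped using cookies
                "WARC/1.0\nWARC-Type: response\n"));                  // copied: inherits false
        System.out.println(flags); // prints [true, true, false, false]
    }
}
```

Carrying the last seen value forward is what makes the result order-dependent. That matches the commit's note that the approach isn't fully deterministic.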
main/java (warc) Use a non-standard WARC header to convey information about whether a website uses cookies 2023-12-15 16:37:53 +01:00
test/java/nu/marginalia/crawling/parquet (warc) Clean up parquet conversion 2023-12-14 20:39:40 +01:00