MarginaliaSearch/code/process-models
Viktor Lofgren fa81e5b8ee (warc) Use a non-standard WARC header to convey information about whether a website uses cookies
This information is then propagated to the parquet file as a boolean.

For documents that are copied from the reference, use whatever value we last saw.  This isn't 100% deterministic and may result in false negatives, but it lets websites that used cookies and have since stopped have that change reflected in the search engine more quickly.
2023-12-15 16:37:53 +01:00
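
A minimal sketch of the idea in the commit message above, not the code from this commit. The header name `X-Has-Cookies`, the class `CookieFlagSketch`, and the helper `LastSeen` are all hypothetical names chosen for illustration; the actual header and types used in the repository may differ.

```java
import java.util.List;
import java.util.Optional;

public class CookieFlagSketch {
    // Hypothetical header name, assumed for this example only.
    static final String COOKIE_HEADER = "X-Has-Cookies";

    /** Reads the cookie flag from "Name: value" header lines, if the header is present. */
    static Optional<Boolean> parseCookieFlag(List<String> headerLines) {
        String prefix = COOKIE_HEADER + ":";
        for (String line : headerLines) {
            // Header names are matched case-insensitively.
            if (line.regionMatches(true, 0, prefix, 0, prefix.length())) {
                String value = line.substring(prefix.length()).trim();
                return Optional.of(value.equals("1") || value.equalsIgnoreCase("true"));
            }
        }
        return Optional.empty();
    }

    /** Remembers the most recently observed flag while walking one site's documents. */
    static final class LastSeen {
        // Starts as false, so copied documents seen before any fresh fetch
        // can be false negatives.
        private boolean value = false;

        /** A freshly fetched document updates the remembered value. */
        boolean forFetched(boolean observed) {
            value = observed;
            return observed;
        }

        /** A document copied from the reference reuses whatever was seen last. */
        boolean forCopied() {
            return value;
        }
    }
}
```

The resulting boolean is what would be written to the parquet column. Because copied documents inherit the latest observed value rather than their old one, a site that stops setting cookies is picked up as soon as one of its documents is fetched fresh.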
crawl-spec (*) WIP Add node affinity to EC_DOMAIN 2023-10-19 17:48:34 +02:00
crawling-model (warc) Use a non-standard WARC header to convey information about whether a website uses cookies 2023-12-15 16:37:53 +01:00
processed-data (*) Refactor GeoIP-related code 2023-12-10 17:30:43 +01:00
work-log (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00