Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git, synced 2025-02-23 21:18:58 +00:00
Add an optional new field to CrawledDocument containing information about whether the domain has cookies. This was previously on the CrawledDomain object, but since the WarcFormat requires us to write a WarcInfo object at the start of a crawl rather than at the end, this information is unobtainable when creating the CrawledDomain object. Also fix a bug in the deduplication logic in the DomainProcessor class that caused a test to break.

Name | ||
---|---|---|
crawl-spec | ||
crawling-model | ||
processed-data | ||
work-log |