MarginaliaSearch/code/processes/crawling-process/test/nu/marginalia/crawl
Viktor Lofgren 895cee7004 (crawler) Improved feed discovery, new domain state db per crawlset
Feed discovery is improved by probing a few likely endpoints when no feed link tag is provided. To store the feed URLs, a SQLite database is added to each crawlset that stores a simple summary of the crawl job, including any feed URLs that have been discovered.
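
The fallback described above could look roughly like the following sketch. It only builds the list of candidate URLs to probe; the HTTP probing and the SQLite storage are left out. The class name, method name, and the list of paths are hypothetical and chosen for illustration; they are not taken from the crawler's code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class FeedProbeSketch {
    // Assumption: a few commonly used feed paths. The endpoints the crawler
    // actually probes are not listed in the commit message.
    static final List<String> LIKELY_FEED_PATHS =
            List.of("/feed", "/rss.xml", "/atom.xml", "/index.xml");

    /**
     * Returns the feed URLs worth trying for a domain: the declared feed link
     * if the page provided one, otherwise a list of guessed endpoints.
     */
    static List<URI> candidateFeedUrls(URI domainRoot, String declaredFeedLink) {
        if (declaredFeedLink != null && !declaredFeedLink.isBlank()) {
            // A feed link tag was present, so no guessing is needed.
            return List.of(domainRoot.resolve(declaredFeedLink));
        }
        List<URI> candidates = new ArrayList<>();
        for (String path : LIKELY_FEED_PATHS) {
            candidates.add(domainRoot.resolve(path));
        }
        return candidates;
    }

    public static void main(String[] args) {
        URI root = URI.create("https://www.example.com/");
        // No feed link tag: fall back to the guessed endpoints.
        System.out.println(candidateFeedUrls(root, null));
        // Feed link tag present: use it directly.
        System.out.println(candidateFeedUrls(root, "/blog/feed.xml"));
    }
}
```

In a real crawler, each candidate would then be fetched and kept only if the response looks like a feed.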

Solves issue #135
2024-12-26 15:05:52 +01:00
..
retreival (crawler) Correct content type probing to only run on URLs that are suspected to be binary 2024-12-26 14:26:23 +01:00
DomainStateDbTest.java (crawler) Improved feed discovery, new domain state db per crawlset 2024-12-26 15:05:52 +01:00