MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia
Viktor Lofgren 895cee7004 (crawler) Improved feed discovery, new domain state db per crawlset
Feed discovery is improved by probing a few likely endpoints when no feed link tag is provided. To store the feed URLs, a sqlite database is added to each crawlset that stores a simple summary of the crawl job, including any feed URLs that have been discovered.

Solves issue #135
2024-12-26 15:05:52 +01:00
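
The fallback described in the commit message (use the feed link tag when there is one, otherwise probe a few likely endpoints) might look roughly like the sketch below. This is not the crawler's actual code: the class name `FeedProbeSketch`, the method `discoverFeed`, the `looksLikeFeed` predicate, and the list of candidate paths are all assumptions made for illustration, and the endpoints the crawler really probes are not stated here. The network check is passed in as a predicate so the selection logic can be run without any HTTP traffic.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class FeedProbeSketch {
    // Hypothetical candidate paths; the crawler's real list is not given in the commit message.
    static final List<String> LIKELY_FEED_PATHS =
            List.of("/feed", "/rss.xml", "/atom.xml", "/index.xml", "/feed.xml");

    /**
     * Returns the feed URL advertised by a feed link tag if one was found.
     * Otherwise tries each candidate path under siteRoot in order and returns
     * the first URL accepted by looksLikeFeed, or empty if none is accepted.
     */
    static Optional<String> discoverFeed(Optional<String> linkTagHref,
                                         String siteRoot,
                                         Predicate<String> looksLikeFeed) {
        if (linkTagHref.isPresent()) {
            return linkTagHref;
        }
        // Drop a trailing slash so that root + path does not contain "//".
        String root = siteRoot.endsWith("/")
                ? siteRoot.substring(0, siteRoot.length() - 1)
                : siteRoot;
        return LIKELY_FEED_PATHS.stream()
                .map(path -> root + path)
                .filter(looksLikeFeed)
                .findFirst();
    }

    public static void main(String[] args) {
        // Stub probe standing in for an HTTP request: only /atom.xml "exists".
        Predicate<String> probe = url -> url.endsWith("/atom.xml");
        System.out.println(discoverFeed(Optional.empty(), "https://example.com/", probe));
        System.out.println(discoverFeed(Optional.of("https://example.com/custom.rss"),
                "https://example.com", probe));
    }
}
```

In a real crawler the predicate would presumably issue a request and check the response status and content type; any URL found this way is what the commit message says gets recorded in the per-crawlset sqlite database.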
io Merge branch 'master' into live-search 2024-11-21 16:00:20 +01:00
model (crawler) Improved feed discovery, new domain state db per crawlset 2024-12-26 15:05:52 +01:00
parquet/crawldata (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets 2024-12-11 17:01:52 +01:00
ContentTypes.java (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets 2024-12-11 17:01:52 +01:00