MarginaliaSearch/code/processes/live-crawling-process/java/nu/marginalia/livecrawler
2024-12-10 13:42:10 +01:00
LiveCrawlDataSet.java (live-crawler) Keep track of bad URLs 2024-11-22 00:55:46 +01:00
LiveCrawlerMain.java (live-crawler) Flag live crawled documents with a special keyword 2024-12-10 13:42:10 +01:00
LiveCrawlerModule.java (refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendents 2024-11-21 16:00:09 +01:00
SimpleLinkScraper.java (live-crawl) Flag URLs that don't pass robots.txt as bad so we don't keep fetching robots.txt every day for an empty link list 2024-11-23 17:07:16 +01:00