MarginaliaSearch/code/processes/live-crawling-process/java/nu/marginalia/livecrawler
2024-11-23 17:07:16 +01:00
LiveCrawlDataSet.java (live-crawler) Keep track of bad URLs 2024-11-22 00:55:46 +01:00
LiveCrawlerMain.java (live-crawler) Alter DbDomainIdRegistry to make inserts if an id is missing, as this is apparently a rare scenario we need to deal with. 2024-11-22 13:58:57 +01:00
LiveCrawlerModule.java (refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendents 2024-11-21 16:00:09 +01:00
SimpleLinkScraper.java (live-crawl) Flag URLs that don't pass robots.txt as bad so we don't keep fetching robots.txt every day for an empty link list 2024-11-23 17:07:16 +01:00