MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia/io
Viktor Lofgren 0ca43f0c9c (live-crawler) Improve live crawler short-circuit logic
We should not wait until we've fetched robots.txt to decide whether we have any data to fetch! Waiting makes the live crawler very slow and leads to unnecessary requests.
2024-12-27 20:54:42 +01:00
..
crawldata/format Merge branch 'master' into live-search 2024-11-21 16:00:20 +01:00
CrawledDomainReader.java (*) Remove the crawl spec abstraction 2024-10-03 13:41:17 +02:00
CrawlerOutputFile.java (chore) Remove use of deprecated STR.-style string templates 2024-11-11 18:02:28 +01:00
SerializableCrawlDataStream.java (live-crawler) Improve live crawler short-circuit logic 2024-12-27 20:54:42 +01:00
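
The short-circuit described in the latest commit message can be illustrated with a minimal sketch. This is not the code from SerializableCrawlDataStream.java or the live crawler; every name below (`ShortCircuitSketch`, `RobotsRules`, `fetchRobotsRules`, `selectUrlsToFetch`) is hypothetical, and the robots.txt fetch is simulated with a counter instead of a network call. It only shows the ordering idea: check whether there is anything to fetch first, and pay for robots.txt only if there is.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ShortCircuitSketch {

    /** Hypothetical stand-in for parsed robots.txt rules; not a MarginaliaSearch type. */
    interface RobotsRules {
        boolean isAllowed(String url);
    }

    /** Counts simulated robots.txt fetches so the short-circuit is observable. */
    static int robotsFetches = 0;

    /** Simulated robots.txt fetch (assumption: a real one would be a slow HTTP request). */
    static RobotsRules fetchRobotsRules(String domain) {
        robotsFetches++;
        // Illustrative rule only: disallow anything under /private/
        return url -> !url.contains("/private/");
    }

    /** Returns the URLs that should be fetched for the given domain. */
    static List<String> selectUrlsToFetch(String domain, List<String> candidateUrls) {
        // Short-circuit: with nothing to fetch, robots.txt is never requested
        if (candidateUrls.isEmpty()) {
            return List.of();
        }

        // Only now pay the cost of retrieving robots.txt
        RobotsRules rules = fetchRobotsRules(domain);
        return candidateUrls.stream()
                .filter(rules::isAllowed)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> none = selectUrlsToFetch("example.com", List.of());
        System.out.println(none + " after " + robotsFetches + " robots.txt fetches");

        List<String> some = selectUrlsToFetch("example.com",
                List.of("https://example.com/a", "https://example.com/private/b"));
        System.out.println(some + " after " + robotsFetches + " robots.txt fetches");
    }
}
```

Putting the emptiness check first means a domain with no new data costs no requests at all, which matches the motivation given in the commit message.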