MarginaliaSearch/code/processes/crawling-process/java/nu/marginalia/crawl
Viktor Lofgren bae44497fe (crawler) Add a new system property crawler.maxFetchSize
This gives the same upper limit to the live crawler and the big crawler, though they handle oversized items differently: the live crawler rejects items that exceed the limit, while the big crawler truncates them at that point.
2024-12-30 15:10:11 +01:00
fetcher (crawler) Add a new system property crawler.maxFetchSize 2024-12-30 15:10:11 +01:00
logic (crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris 2024-10-15 17:27:59 +02:00
retreival (crawler) Correct feed URLs in domain state db 2024-12-26 15:18:31 +01:00
warc (refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendents 2024-11-21 16:00:09 +01:00
CrawlerMain.java (crawler) Improved feed discovery, new domain state db per crawlset 2024-12-26 15:05:52 +01:00
CrawlerModule.java (chore) Remove lombok 2024-11-11 21:14:38 +01:00
DomainStateDb.java (crawler) Improved feed discovery, new domain state db per crawlset 2024-12-26 15:05:52 +01:00
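
The latest commit message describes one size limit, `crawler.maxFetchSize`, enforced in two ways: reject when too large, or truncate at the limit. A minimal sketch of that distinction follows. It is not the code in `fetcher`: the class name, method names, and default value are hypothetical, and it assumes the property holds a byte count, which the listing does not state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

// Hypothetical illustration; not taken from the MarginaliaSearch sources.
public class MaxFetchSizeSketch {
    // Assumed default (10 MiB); the real default is not shown in the listing.
    static final long ASSUMED_DEFAULT_BYTES = 10L * 1024 * 1024;

    // Reads the limit from the system property named in the commit message,
    // e.g. -Dcrawler.maxFetchSize=33554432 (assumed here to be in bytes).
    static long maxFetchSize() {
        return Long.getLong("crawler.maxFetchSize", ASSUMED_DEFAULT_BYTES);
    }

    // "Truncate" policy: keep at most `limit` bytes and ignore the rest.
    static byte[] readTruncated(InputStream in, long limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        long remaining = limit;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) break;          // end of stream before hitting the limit
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    // "Reject" policy: return empty if the body is longer than `limit` bytes.
    static Optional<byte[]> readOrReject(InputStream in, long limit) throws IOException {
        byte[] body = readTruncated(in, limit);
        // Probe for one more byte; if present, the body exceeds the limit.
        if (in.read() != -1) return Optional.empty();
        return Optional.of(body);
    }

    public static void main(String[] args) throws IOException {
        long limit = 16;               // small limit for demonstration only
        byte[] oversized = new byte[64];
        System.out.println(readTruncated(new ByteArrayInputStream(oversized), limit).length); // 16
        System.out.println(readOrReject(new ByteArrayInputStream(oversized), limit).isPresent()); // false
    }
}
```

Both policies stop reading once the limit is reached, so an oversized response costs the same bandwidth either way; the difference is only whether the partial body is kept.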