MarginaliaSearch/code/processes/crawling-process/java/nu/marginalia/crawl
Viktor Lofgren ec600b967d (crawler) Adjust domain locking
Turns out throttling to only 1 lock per domain means the crawler chokes hard on large hosting websites such as WordPress. Giving these a slightly larger allowance.
2024-07-27 11:54:46 +02:00
retreival (crawler) Adjust domain locking 2024-07-27 11:54:46 +02:00
spec (crawler) Modify crawl set growth to grow small domains faster than larger ones 2024-04-27 17:36:27 +02:00
warc (crawler) Code quality 2024-04-22 15:37:35 +02:00
AbortMonitor.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
CrawlerMain.java (crawler) Adjust domain locking 2024-07-27 11:54:46 +02:00
CrawlerModule.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
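
The "Adjust domain locking" commit above describes moving from a single lock per domain to a slightly larger allowance for large hosting sites. One way to picture that idea is a per-domain counting semaphore, sketched below. This is not the code from the commit: the class name `DomainLockSketch`, the `CrawlSlot` handle, the listed hosts, and the permit counts are all hypothetical values chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Illustrative sketch only; names and numbers are assumptions, not the project's code. */
class DomainLockSketch {
    // Assumed default: one concurrent crawl per top-level domain.
    private static final int DEFAULT_PERMITS = 1;

    // Assumed allowances for large hosting sites (hypothetical values).
    private static final Map<String, Integer> LARGE_HOST_PERMITS = Map.of(
            "wordpress.com", 4,
            "blogspot.com", 4);

    private final ConcurrentHashMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    /** Number of simultaneous crawl tasks allowed for the given top domain. */
    int permitsFor(String topDomain) {
        return LARGE_HOST_PERMITS.getOrDefault(topDomain, DEFAULT_PERMITS);
    }

    /** Returns the shared semaphore for a domain, creating it on first use. */
    Semaphore semaphoreFor(String topDomain) {
        return locks.computeIfAbsent(topDomain, d -> new Semaphore(permitsFor(d)));
    }

    /** Blocks until a slot is free for the domain; closing the handle frees the slot. */
    CrawlSlot lock(String topDomain) throws InterruptedException {
        Semaphore semaphore = semaphoreFor(topDomain);
        semaphore.acquire();
        return new CrawlSlot(semaphore);
    }

    /** Handle for try-with-resources so the permit is always released. */
    static final class CrawlSlot implements AutoCloseable {
        private final Semaphore semaphore;

        CrawlSlot(Semaphore semaphore) {
            this.semaphore = semaphore;
        }

        @Override
        public void close() {
            semaphore.release();
        }
    }
}
```

A crawl task would wrap its work in `try (var slot = locks.lock(topDomain)) { ... }`, so that subdomains of an ordinary site are fetched one at a time while a large host can have a few subdomains in flight at once.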