MarginaliaSearch/code/processes/crawling-process/java/nu/marginalia/crawl/retreival
Viktor Lofgren e1c9313396 (crawler) Emulate if-modified-since for domains that don't support the header
This will help reduce the strain on some server software, in particular Discourse.
2024-04-24 14:44:39 +02:00
fetcher (crawler) Emulate if-modified-since for domains that don't support the header 2024-04-24 14:44:39 +02:00
revisit (crawler) Code quality 2024-04-24 14:44:39 +02:00
sitemap (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
Cookies.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
CrawlDataReference.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
CrawlDelayTimer.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
CrawledDocumentFactory.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
CrawlerRetreiver.java (crawler) Use the probe-result to reduce the likelihood of crawling both http and https 2024-04-24 14:44:39 +02:00
CrawlerWarcResynchronizer.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
DomainCrawlFrontier.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
DomainProber.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
LinkFilterSelector.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
RateLimitException.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00