MarginaliaSearch/code/processes/crawling-process/java/nu/marginalia/crawl/retreival
Viktor Lofgren a91ab4c203 (live-crawler) Crude first-try process for live crawling #WIP
Some refactoring is still needed, but a dummy actor is in place, along with a process that crawls URLs from the livecapture service's RSS endpoints; the result makes it all the way to being indexable.
2024-11-19 19:35:01 +01:00
..
revisit (crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris 2024-10-15 17:27:59 +02:00
sitemap (crawler) Refactor 2024-09-23 17:51:07 +02:00
CrawlDataReference.java (crawler) Use a better hashInt implementation in CrawlDataReference 2024-10-15 18:25:55 +02:00
CrawlDelayTimer.java (live-crawler) Crude first-try process for live crawling #WIP 2024-11-19 19:35:01 +01:00
CrawlerRetreiver.java (crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris 2024-10-15 17:27:59 +02:00
CrawlerWarcResynchronizer.java (crawler) Refactor 2024-09-23 17:51:07 +02:00
DomainCrawlFrontier.java (crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris 2024-10-15 17:27:59 +02:00
DomainProber.java (crawler) Refactor boundary between CrawlerRetreiver and HttpFetcherImpl 2024-09-24 15:08:22 +02:00