MarginaliaSearch/code/processes/crawling-process/java/nu/marginalia/crawl
2025-01-21 21:26:12 +01:00
fetcher (crawler) Fix urlencoding in sitemap fetcher 2025-01-21 13:33:35 +01:00
logic (crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris 2024-10-15 17:27:59 +02:00
retreival Merge branch 'master' into slop-crawl-data-spike 2025-01-21 13:32:58 +01:00
warc (refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendents 2024-11-21 16:00:09 +01:00
CrawlerMain.java (crawler) Smarter parquet->slop crawl data migration 2025-01-21 21:26:12 +01:00
CrawlerModule.java (chore) Remove lombok 2024-11-11 21:14:38 +01:00
DomainStateDb.java (crawler) Improved feed discovery, new domain state db per crawlset 2024-12-26 15:05:52 +01:00