MarginaliaSearch/code/processes/crawling-process/java/nu/marginalia/crawl
2024-12-26 14:13:17 +01:00
..
fetcher (crawler) Correct content type probing to only run on URLs that are suspected to be binary 2024-12-26 14:13:17 +01:00
logic (crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris 2024-10-15 17:27:59 +02:00
retreival (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets 2024-12-11 17:01:52 +01:00
warc (refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendents 2024-11-21 16:00:09 +01:00
CrawlerMain.java (refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendents 2024-11-21 16:00:09 +01:00
CrawlerModule.java (chore) Remove lombok 2024-11-11 21:14:38 +01:00