MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia
Viktor Lofgren 47e58a21c6 Refactor documentBody method and ContentType charset handling
Updated the `documentBody` method to improve parsing retries and error handling. Refactored `ContentType` charset processing with cleaner logic, removing redundant handling for unsupported charsets. Also, updated the version of the `slop` library in dependency settings.
2024-12-17 17:11:37 +01:00
io | Add loader for slop data in converter. | 2024-12-17 15:40:24 +01:00
model | Refactor documentBody method and ContentType charset handling | 2024-12-17 17:11:37 +01:00
parquet/crawldata | (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets | 2024-12-11 17:01:52 +01:00
slop | Switch to new Slop format for crawl data storage and processing. | 2024-12-15 19:34:03 +01:00
ContentTypes.java | (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets | 2024-12-11 17:01:52 +01:00