MarginaliaSearch/code/process-models/crawling-model/src
Viktor Lofgren b74a3ebd85 (crawler) WIP integration of WARC files into the crawler process.
At this stage, the crawler will use the WARCs to resume a crawl if it terminates unexpectedly.

This is a WIP commit: since the WARC files are not yet fully incorporated into the workflow, they are deleted after the domain is crawled.

The commit also includes fairly invasive refactoring of the crawler classes, to accomplish better separation of concerns.
2023-12-11 19:32:58 +01:00
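The commit says the crawler uses WARC files to resume a crawl that terminated unexpectedly. The following is a minimal sketch of that idea, not the Marginalia implementation. It assumes uncompressed WARC files in which each successfully fetched page is stored as a `response` record. Under that assumption, a restarted crawler could scan the file, collect the `WARC-Target-URI` of every complete response record, and skip those URLs. The class and method names (`WarcResumeSketch`, `fetchedUrls`, `remaining`) are hypothetical. A truncated final record, which is what an abrupt termination would typically leave behind, is treated as not fetched. The commit also notes that the WARC files are deleted once a domain is finished, so a scan like this would only ever see the domain that was in progress.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: recover already-fetched URLs from an uncompressed WARC file.
public class WarcResumeSketch {

    /** Returns the target URIs of all complete "response" records in the stream. */
    public static Set<String> fetchedUrls(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        Set<String> urls = new LinkedHashSet<>();
        String line;
        while ((line = readLine(in)) != null) {
            if (line.isEmpty()) continue; // CRLF CRLF separators between records
            if (!line.startsWith("WARC/")) {
                throw new IOException("Expected a WARC version line, got: " + line);
            }
            // Named header fields up to the blank line; names are case-insensitive.
            Map<String, String> headers = new HashMap<>();
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                                line.substring(colon + 1).trim());
                }
            }
            if (line == null) break; // header block cut off by a crash
            long length = Long.parseLong(headers.getOrDefault("content-length", "0"));
            try {
                skipFully(in, length);
            } catch (EOFException truncated) {
                break; // partial record from an interrupted write: treat as not fetched
            }
            String uri = headers.get("warc-target-uri");
            if ("response".equals(headers.get("warc-type")) && uri != null) {
                // Some WARC 1.0 writers wrap the URI in angle brackets; strip them if present.
                if (uri.startsWith("<") && uri.endsWith(">")) {
                    uri = uri.substring(1, uri.length() - 1);
                }
                urls.add(uri);
            }
        }
        return urls;
    }

    /** Keeps only the frontier URLs that were not recovered from the WARC file. */
    public static List<String> remaining(List<String> frontier, Set<String> done) {
        List<String> out = new ArrayList<>();
        for (String url : frontier) {
            if (!done.contains(url)) out.add(url);
        }
        return out;
    }

    /** Reads one LF-terminated line (a trailing CR is dropped); returns null at end of stream. */
    private static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return null;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            buf.write(b);
            b = in.read();
        }
        String s = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }

    /** Skips exactly n bytes, throwing EOFException if the stream ends first. */
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() == -1) throw new EOFException("record body truncated");
                skipped = 1;
            }
            n -= skipped;
        }
    }

    private static String record(String type, String uri, String body) {
        int len = body.getBytes(StandardCharsets.UTF_8).length;
        return "WARC/1.1\r\nWARC-Type: " + type + "\r\nWARC-Target-URI: " + uri
             + "\r\nContent-Length: " + len + "\r\n\r\n" + body + "\r\n\r\n";
    }

    public static void main(String[] args) throws IOException {
        String warc = record("response", "https://example.com/", "HTTP/1.1 200 OK\r\n\r\nhi")
            + record("request", "https://example.com/a", "GET /a HTTP/1.1\r\n\r\n")
            // Simulated crash: the body is shorter than the declared Content-Length.
            + "WARC/1.1\r\nWARC-Type: response\r\nWARC-Target-URI: https://example.com/b\r\n"
            + "Content-Length: 100\r\n\r\npartial";
        Set<String> done = fetchedUrls(new ByteArrayInputStream(warc.getBytes(StandardCharsets.UTF_8)));
        List<String> frontier = List.of("https://example.com/", "https://example.com/a", "https://example.com/b");
        System.out.println(remaining(frontier, done)); // prints [https://example.com/a, https://example.com/b]
    }
}
```

Only `response` records count as fetched here. A `request` record proves only that a fetch was attempted, so its URL stays in the frontier. A real crawler would also need to handle gzip-compressed WARCs and whatever other record types it writes.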
main/java (crawler) WIP integration of WARC files into the crawler process. 2023-12-11 19:32:58 +01:00