mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-24 05:18:58 +00:00
Crawling Process
The crawling process downloads HTML documents and saves them into per-domain snapshots.
Central Classes
- CrawlerMain orchestrates the crawling.
- CrawlerRetreiver visits known addresses from a domain and downloads each document.
- HttpFetcher fetches a URL.
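
The division of labour between these classes can be sketched roughly as below. This is a minimal illustration, not the project's actual code: `CrawlSketch`, `Fetcher`, `DomainSnapshot`, `crawlDomain` and `crawlAll` are hypothetical names invented for the example, and the real classes have their own signatures and do considerably more. Skipping failed fetches is also an assumption made here for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: every type here is hypothetical and is NOT
// the project's real CrawlerMain, CrawlerRetreiver or HttpFetcher.
class CrawlSketch {

    /** Stand-in for the fetcher's role: fetch one URL, or return empty on failure. */
    interface Fetcher {
        Optional<String> fetch(String url);
    }

    /** Stand-in for a per-domain snapshot: the documents downloaded for one domain. */
    record DomainSnapshot(String domain, Map<String, String> documentsByUrl) {}

    /** Roughly the retriever's role: visit a domain's known URLs and download each document. */
    static DomainSnapshot crawlDomain(String domain, List<String> knownUrls, Fetcher fetcher) {
        Map<String, String> documents = new LinkedHashMap<>();
        for (String url : knownUrls) {
            // Assumption: URLs that fail to fetch are simply left out of the snapshot.
            fetcher.fetch(url).ifPresent(html -> documents.put(url, html));
        }
        return new DomainSnapshot(domain, documents);
    }

    /** Roughly the orchestrator's role: produce one snapshot per domain. */
    static List<DomainSnapshot> crawlAll(Map<String, List<String>> urlsByDomain, Fetcher fetcher) {
        List<DomainSnapshot> snapshots = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : urlsByDomain.entrySet()) {
            snapshots.add(crawlDomain(entry.getKey(), entry.getValue(), fetcher));
        }
        return snapshots;
    }

    public static void main(String[] args) {
        // A stub fetcher so the sketch runs without network access.
        Fetcher stub = url -> Optional.of("<html>" + url + "</html>");
        List<DomainSnapshot> snapshots =
                crawlAll(Map.of("example.com", List.of("https://example.com/")), stub);
        System.out.println(snapshots);
    }
}
```

Keeping the fetcher behind a small interface means the per-domain loop can be exercised with a stub, as in `main` above, without touching the network.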