Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git (synced 2025-02-24 05:18:58 +00:00)
The crawler was incorrectly using the request URL as the base URL when resolving relative links, which caused problems when a fetch was redirected. For example, if we fetch /log, are redirected to /log/, and find links to foo/ and bar/, these would resolve to /foo/ and /bar/ instead of /log/foo/ and /log/bar/.

| Name |
|---|
| .. |
| fetcher |
| logic |
| retreival |
| warc |
| CrawlerMain.java |
| CrawlerModule.java |
| DomainStateDb.java |
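
The base-URL problem described in the commit message above can be reproduced with the standard `java.net.URI` class. The following is only a minimal sketch, not the crawler's actual code: the class name `BaseUrlDemo`, the helper `resolve`, and the host `www.example.com` are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical demo class; not part of the crawler.
final class BaseUrlDemo {

    /** Resolves a link found in a document against the given base URL. */
    static String resolve(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // The URL that was requested (assumed example host).
        String requestUrl = "https://www.example.com/log";
        // The URL the server redirected to, i.e. where the document actually lives.
        String finalUrl = "https://www.example.com/log/";

        for (String link : new String[] {"foo/", "bar/"}) {
            // Wrong base: the last path segment "log" is replaced by the link.
            System.out.println(link + " vs request URL: " + resolve(requestUrl, link));
            // Right base: the link is appended below "/log/".
            System.out.println(link + " vs final URL:   " + resolve(finalUrl, link));
        }
    }
}
```

Resolving against the request URL drops the `log` segment because, without a trailing slash, it is treated as a file name rather than a directory. Using the post-redirect URL as the base keeps the links under `/log/`.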