MarginaliaSearch

Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git (last synced 2025-02-24 05:18:58 +00:00)


Viktor Lofgren 2ea34767d8 (crawler) Use the response URL when resolving relative links	2025-01-31 12:40:13 +01:00

The crawler was incorrectly using the request URL, rather than the response URL, as the base URL when resolving relative links, which caused problems when it encountered redirects. For example, if we fetch /log, which redirects to /log/, and find links to foo/ and bar/, these would resolve to /foo and /bar instead of /log/foo and /log/bar.
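The redirect problem described in this commit comes from standard relative-reference resolution (RFC 3986), where the last path segment of the base URL is dropped before the relative link is appended. The sketch below is a hypothetical illustration, not the crawler's actual code: the class `RelativeLinkDemo`, its `resolve` helper, and the `www.example.com` URLs are assumptions. It uses `java.net.URI.resolve` to show how the same link resolves differently against the request URL and the post-redirect response URL.

```java
import java.net.URI;

/**
 * Hypothetical illustration (not taken from CrawlerRetreiver): resolving a
 * relative link against the request URL versus the final response URL.
 */
public class RelativeLinkDemo {

    /** Resolves an href found in a page against the given base URL (RFC 3986 rules). */
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String requestUrl = "https://www.example.com/log";   // URL the crawler asked for
        String responseUrl = "https://www.example.com/log/"; // URL the redirect landed on

        // Wrong base: the final segment "log" is discarded, so foo/ becomes /foo/
        System.out.println(resolve(requestUrl, "foo/"));

        // Correct base: the trailing slash makes /log/ the directory, so foo/ becomes /log/foo/
        System.out.println(resolve(responseUrl, "foo/"));
    }
}
```

The only difference between the two calls is the base URL, which is why taking the base from the response (after redirects are followed) rather than from the request is enough to fix the resolution.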
revisit	(crawler) Reduce long retention of CrawlDataReference objects and their associated SerializableCrawlDataStreams	2025-01-26 15:40:17 +01:00
CrawlDataReference.java	(crawler) Reduce long retention of CrawlDataReference objects and their associated SerializableCrawlDataStreams	2025-01-26 15:40:17 +01:00
CrawlDelayTimer.java	(live-crawler) Crude first-try process for live crawling #WIP	2024-11-19 19:35:01 +01:00
CrawlerRetreiver.java	(crawler) Use the response URL when resolving relative links	2025-01-31 12:40:13 +01:00
CrawlerWarcResynchronizer.java	(crawler) Refactor	2024-09-23 17:51:07 +02:00
DomainCrawlFrontier.java	(crawler) Clean up the crawler code a bit, removing vestigial abstractions and historical debris	2024-10-15 17:27:59 +02:00
DomainProber.java	(crawler) Refactor boundary between CrawlerRetreiver and HttpFetcherImpl	2024-09-24 15:08:22 +02:00