MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 05:18:58 +00:00

Viktor Lofgren 895cee7004 (crawler) Improved feed discovery, new domain state db per crawlset Feed discovery is improved by probing a few likely endpoints when no feed link tag is provided. To store the feed URLs, a SQLite database is added to each crawlset that stores a simple summary of the crawl job, including any feed URLs that have been discovered. Solves issue #135		2024-12-26 15:05:52 +01:00
retreival	(crawler) Improved feed discovery, new domain state db per crawlset	2024-12-26 15:05:52 +01:00
DomainCrawlerRobotsTxtTest.java	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00
HttpFetcherTest.java	(chore) Remove lombok	2024-11-11 21:14:38 +01:00
LinkParserTest.java	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00
RssCrawlerTest.java	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00