mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-24 13:19:02 +00:00
Crawl Features
These are relatively isolated pieces of search-engine business logic that benefit from being kept separate from the rest of the crawling code.
- adblock - Simulates Adblock
- pubdate - Determines when a document was published
- topic-detection - Tries to identify the topic of a website
- crawl-blocklist - IP and URL blocklists
- work-log - Work journal for resuming long processes
- link-parser - Code for parsing and normalizing links
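
To give a rough idea of what "parsing and normalizing links" can involve, here is a minimal sketch in Python. It is not the code in link-parser: the function name `normalize_link` and the specific rules (resolve against the page URL, keep only http/https, drop the fragment, user info, and default port) are assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit

# Default ports that carry no information and can be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_link(base_url: str, href: str) -> Optional[str]:
    """Hypothetical helper: resolve href against base_url and canonicalize it.

    Returns None for links a crawler would not follow (non-http(s) schemes,
    missing host, malformed port).
    """
    # Resolve relative references such as "../c" against the page URL.
    absolute = urljoin(base_url, href.strip())
    parts = urlsplit(absolute)

    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or not parts.hostname:
        return None

    # .hostname is already lower-cased and excludes any user:password@ prefix.
    host = parts.hostname
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals

    try:
        port = parts.port
    except ValueError:  # non-numeric or out-of-range port
        return None

    netloc = host if port in (None, DEFAULT_PORTS[scheme]) else f"{host}:{port}"
    path = parts.path or "/"

    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(normalize_link("https://Example.com/a/b.html", "../c?x=1#frag"))
    print(normalize_link("https://example.com/", "mailto:someone@example.com"))
```

Under these assumptions, links that differ only in host case, default port, or fragment collapse to a single string, which is what makes deduplication cheap. A real crawler would likely add further rules, such as handling of tracking parameters or trailing slashes.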