# Crawl Features

These are bits of search-engine-related code that are relatively isolated pieces of business logic
and that benefit from the clarity of being kept separate from the rest of the crawling code.

* [adblock](adblock/) - Simulates Adblock
* [pubdate](pubdate/) - Determines when a document was published
* [topic-detection](topic-detection/) - Tries to identify the topic of a website
* [crawl-blocklist](crawl-blocklist/) - IP and URL blocklists
* [work-log](work-log/) - Work journal for resuming long processes
* [link-parser](link-parser/) - Code for parsing and normalizing links
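As a rough illustration of what "normalizing links" can involve, here is a minimal Java sketch that uses only `java.net.URI`. The class `LinkNormalizerSketch` and its rules (keep only http/https links, lowercase the scheme and host, drop default ports and fragments) are assumptions chosen for illustration. They are not the actual API or behaviour of the link-parser module.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration; not the link-parser module's real API.
public class LinkNormalizerSketch {

    /**
     * Resolves href against base and returns a normalized absolute URL,
     * or Optional.empty() if the link is unparseable or not http(s).
     */
    public static Optional<String> normalize(String base, String href) {
        try {
            // resolve() also removes "." and ".." segments from relative paths
            URI resolved = new URI(base).resolve(href.trim());

            String scheme = resolved.getScheme();
            if (scheme == null) return Optional.empty();
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) return Optional.empty();

            String host = resolved.getHost();
            if (host == null) return Optional.empty();
            host = host.toLowerCase(Locale.ROOT);

            // Drop ports that are the default for the scheme
            int port = resolved.getPort();
            if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
                port = -1;
            }

            String path = resolved.getRawPath();
            if (path == null || path.isEmpty()) path = "/";
            String query = resolved.getRawQuery();

            // Rebuild the URL without the fragment
            StringBuilder sb = new StringBuilder();
            sb.append(scheme).append("://").append(host);
            if (port != -1) sb.append(':').append(port);
            sb.append(path);
            if (query != null) sb.append('?').append(query);
            return Optional.of(sb.toString());
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Malformed base or href
            return Optional.empty();
        }
    }
}
```

For example, resolving `../c.html#frag` against `https://Example.COM/a/b.html` yields `https://example.com/c.html`, while a `mailto:` link is rejected.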
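The general idea behind a work journal for resuming long processes can be sketched as an append-only file of finished task IDs that is reloaded on restart, so completed work is skipped. The class `WorkJournalSketch` and its one-ID-per-line file format are hypothetical and do not describe the work-log module's actual format or API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a resumable work journal; not the work-log module's real API.
public class WorkJournalSketch {
    private final Path file;
    private final Set<String> finished = new HashSet<>();

    /** Loads the IDs of previously finished tasks, if the journal file exists. */
    public WorkJournalSketch(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!line.isBlank()) finished.add(line.strip());
            }
        }
    }

    /** True if the task was recorded as finished in this or an earlier run. */
    public boolean isFinished(String id) {
        return finished.contains(id);
    }

    /** Appends the task ID to the journal, so a restarted process can skip it. */
    public void markFinished(String id) throws IOException {
        if (finished.add(id)) {
            Files.writeString(file, id + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

Appending one line per finished task keeps each write small, so a crash mid-run loses at most the task that was in progress.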