Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git, synced 2025-02-23 21:18:58 +00:00
Crawl Features
These are relatively isolated pieces of search-engine business logic that benefit from being kept separate from the rest of the crawling code.
- content-type - Content type identification
- crawl-blocklist - IP and URL blocklists
- link-parser - Code for parsing and normalizing links
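To give a feel for what "parsing and normalizing links" involves, here is a minimal sketch that uses only the JDK's `java.net.URI`. It is not the link-parser module's actual API or rule set: the class `LinkNormalizerSketch` and its `normalize` method are hypothetical. The sketch assumes a few common normalization rules: resolve the href against the page URL, keep only http/https, lowercase the scheme and host, drop default ports and the fragment, and use `/` for an empty path.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration; not the link-parser module's real code.
final class LinkNormalizerSketch {

    /** Resolves a possibly-relative href against the page URL and normalizes it. */
    static Optional<String> normalize(String pageUrl, String href) {
        try {
            URI base = new URI(pageUrl);
            // resolve() also collapses "." and ".." segments in the resulting path
            URI resolved = base.resolve(href.trim()).normalize();

            String scheme = resolved.getScheme();
            if (scheme == null) return Optional.empty();
            scheme = scheme.toLowerCase(Locale.ROOT);
            // Only keep links a web crawler could actually fetch
            if (!scheme.equals("http") && !scheme.equals("https")) return Optional.empty();

            String host = resolved.getHost();
            if (host == null) return Optional.empty();
            host = host.toLowerCase(Locale.ROOT);

            // Drop the port when it is the default for the scheme
            int port = resolved.getPort();
            if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
                port = -1;
            }

            String path = resolved.getRawPath();
            if (path == null || path.isEmpty()) path = "/";

            // Rebuild the URL without the fragment, which is never sent to the server
            StringBuilder sb = new StringBuilder();
            sb.append(scheme).append("://").append(host);
            if (port != -1) sb.append(':').append(port);
            sb.append(path);
            if (resolved.getRawQuery() != null) sb.append('?').append(resolved.getRawQuery());
            return Optional.of(sb.toString());
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Malformed page URL or href: treat the link as unusable
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // prints "https://example.com/c.html"
        System.out.println(normalize("https://Example.com/a/b.html", "../c.html#top").orElse("(rejected)"));
        // prints "(rejected)" because mailto: links are not crawlable
        System.out.println(normalize("https://example.com/", "mailto:someone@example.com").orElse("(rejected)"));
    }
}
```

Returning `Optional.empty()` instead of throwing lets a caller simply skip links it cannot use. A production normalizer would likely need more than this, for example handling hostnames that `URI.getHost()` rejects or stripping tracking parameters.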