MarginaliaSearch/code/features-crawl
Viktor Lofgren c92f1b8df8 (geo-ip) Revert removal of ip2location logic
We do both ip2location and ASN data.

The change also adds some keywords based on autonomous system information, on a somewhat experimental basis. It would be neat to be able to exclude, for example, cloud services in general or just Cloudflare from the search results.
2023-12-17 15:03:00 +01:00
content-type | (warc) Clean up parquet conversion | 2023-12-14 20:39:40 +01:00
crawl-blocklist | (geo-ip) Revert removal of ip2location logic | 2023-12-17 15:03:00 +01:00
link-parser | (build) Move unit test configuration to root build.gradle | 2023-10-04 12:46:22 +02:00
readme.md | Yet more restructuring. Improved search result ranking. | 2023-03-16 21:35:54 +01:00

Crawl Features

These are relatively isolated pieces of search-engine-related business logic that benefit from being kept separate from the rest of the crawling code.