MarginaliaSearch/code/processes/crawling-process/ft-link-parser
Viktor Lofgren 3b99cffb3d (link-parser) Filter out URLs with binary file suffixes in LinkParser
Added an additional filter step to ensure URLs with binary suffixes are excluded during crawling. This prevents unnecessary processing of non-HTML content, improving the efficiency of the link parsing process.
2024-12-11 16:42:47 +01:00
java/nu/marginalia/link_parser (link-parser) Filter out URLs with binary file suffixes in LinkParser 2024-12-11 16:42:47 +01:00
build.gradle (restructure) Clean up repo by moving stray features into converter-process and crawler-process 2024-07-30 10:14:00 +02:00
readme.md (restructure) Clean up repo by moving stray features into converter-process and crawler-process 2024-07-30 10:14:00 +02:00

Link Parser

Handles the various cases that arise in link parsing, including relative links, internal links, external links, and pathological links.
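
As a rough illustration of the cases listed above, the sketch below resolves an href against a base URL with the standard `java.net.URI` class, rejects links that cannot be parsed or that use a non-HTTP scheme, and drops URLs whose path ends in a binary file suffix. It is not the code in this module: the class name `LinkResolutionSketch`, the methods `resolve` and `isInternal`, and the suffix list are all hypothetical and chosen only for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Illustrative sketch only; NOT the LinkParser class from this module.
final class LinkResolutionSketch {

    // Hypothetical suffix list; the real filter's list is not shown in this readme.
    private static final List<String> BINARY_SUFFIXES =
            List.of(".png", ".jpg", ".gif", ".zip", ".exe", ".pdf");

    /** Resolves href against base; returns empty for links that should be skipped. */
    static Optional<URI> resolve(URI base, String href) {
        if (href == null) return Optional.empty();

        // Drop the fragment, so "#top" and "page.html#top" are handled uniformly.
        String trimmed = href.trim();
        int hash = trimmed.indexOf('#');
        if (hash >= 0) trimmed = trimmed.substring(0, hash);
        if (trimmed.isEmpty()) return Optional.empty();

        URI resolved;
        try {
            // Relative links are resolved against the base; absolute links pass through.
            resolved = base.resolve(new URI(trimmed)).normalize();
        } catch (URISyntaxException e) {
            return Optional.empty(); // pathological link, e.g. unescaped spaces
        }

        // Keep only http(s) links with a host; this drops mailto:, javascript:, etc.
        String scheme = resolved.getScheme();
        if (scheme == null) return Optional.empty();
        if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
            return Optional.empty();
        }
        if (resolved.getHost() == null) return Optional.empty();

        // Skip URLs whose path ends in a (hypothetical) binary file suffix.
        String path = resolved.getPath() == null
                ? ""
                : resolved.getPath().toLowerCase(Locale.ROOT);
        for (String suffix : BINARY_SUFFIXES) {
            if (path.endsWith(suffix)) return Optional.empty();
        }
        return Optional.of(resolved);
    }

    /** A link is treated as internal when its host matches the base host. */
    static boolean isInternal(URI base, URI link) {
        return base.getHost() != null && base.getHost().equalsIgnoreCase(link.getHost());
    }
}
```

With a base of `https://example.com/blog/post.html`, calling `LinkResolutionSketch.resolve(base, "../about.html")` yields `https://example.com/about.html`, while `/img/logo.PNG` and `mailto:someone@example.com` both come back empty. Comparing hosts is the simplest possible test for internal versus external links; a real crawler may well apply a different notion of "same site".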

Central Classes