mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-24 05:18:58 +00:00
# Converting Process
The converting process reads crawl data and extracts the information that is fed into the index, such as keywords, metadata, URLs, and descriptions.
## Central Classes
- ConverterMain orchestrates the conversion process.
- DocumentProcessor converts a single document.
  - HtmlDocumentProcessorPlugin holds the HTML-specific logic for processing a document and its keywords, and identifies features such as whether the page uses JavaScript.
  - PlainTextDocumentProcessorPlugin has plain text-specific logic related to a document...
- DomainProcessor converts each document and generates domain-wide metadata such as link graphs.
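
The relationship between the document processor and its format-specific plugins can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ProcessorPlugin`, `ProcessedDoc`, `Keywords`, `HtmlPlugin`, `PlainTextPlugin` and `SketchDocumentProcessor` are hypothetical names, and the real classes listed above have their own signatures and far more thorough logic. The sketch only shows the dispatch idea: pick the plugin that matches the content type and let it produce keywords and feature flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Result of converting one document (hypothetical shape, not the real data model). */
final class ProcessedDoc {
    final String url;
    final List<String> keywords;
    final boolean hasJavascript;

    ProcessedDoc(String url, List<String> keywords, boolean hasJavascript) {
        this.url = url;
        this.keywords = keywords;
        this.hasJavascript = hasJavascript;
    }
}

/** Format-specific logic, one implementation per content type (hypothetical interface). */
interface ProcessorPlugin {
    boolean isApplicable(String contentType);
    ProcessedDoc process(String url, String body);
}

final class Keywords {
    /** Naive keyword extraction: lowercase the text and split on non-word characters. */
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) out.add(token);
        }
        return out;
    }
}

final class HtmlPlugin implements ProcessorPlugin {
    public boolean isApplicable(String contentType) {
        return contentType.startsWith("text/html");
    }

    public ProcessedDoc process(String url, String body) {
        // Feature detection: does the page contain a script tag?
        boolean hasJs = body.toLowerCase(Locale.ROOT).contains("<script");
        // Crude markup removal, good enough for a sketch only.
        String text = body
                .replaceAll("(?is)<script.*?</script>", " ")
                .replaceAll("<[^>]*>", " ");
        return new ProcessedDoc(url, Keywords.extract(text), hasJs);
    }
}

final class PlainTextPlugin implements ProcessorPlugin {
    public boolean isApplicable(String contentType) {
        return contentType.startsWith("text/plain");
    }

    public ProcessedDoc process(String url, String body) {
        return new ProcessedDoc(url, Keywords.extract(body), false);
    }
}

/** Converts a single document by delegating to the first applicable plugin. */
final class SketchDocumentProcessor {
    private final List<ProcessorPlugin> plugins;

    SketchDocumentProcessor(List<ProcessorPlugin> plugins) {
        this.plugins = plugins;
    }

    Optional<ProcessedDoc> process(String url, String contentType, String body) {
        for (ProcessorPlugin plugin : plugins) {
            if (plugin.isApplicable(contentType)) {
                return Optional.of(plugin.process(url, body));
            }
        }
        return Optional.empty(); // no plugin handles this content type
    }
}
```

In this sketch, a caller would construct `SketchDocumentProcessor` with both plugins and call `process(url, contentType, body)` once per crawled document; an empty `Optional` signals a content type that no plugin handles. A domain-level step could then iterate over all documents of a domain and aggregate the results.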