mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-23 21:18:58 +00:00
Refactoring keyword extraction to extract span information. Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions. This is a bit of a Katamari Damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.
- adblock
- anchor-keywords
- data-extractors
- keyword-extraction
- pubdate
- reddit-json
- stackexchange-xml
- summary-extraction
- topic-detection
- readme.md
Converter Features
Major features:
- keyword-extraction - Identifies keywords to index in a document
- summary-extraction - Generates an excerpt/quote from a website to display on the search results page
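To make the two major features concrete, here is a minimal sketch of what keyword selection and excerpt generation can look like. It is not the code in these modules: the class `ExtractionSketch`, its methods, the tiny stop-word list, and the frequency-based ranking are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the project's implementation.
final class ExtractionSketch {
    // Assumed stop-word list, far smaller than a real indexer would use.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is", "for", "on");

    // Returns up to `limit` terms, most frequent first, ties broken alphabetically.
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.length() < 2 || STOP_WORDS.contains(token)) {
                continue; // skip empty tokens, single characters, and stop words
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Collapses whitespace and cuts the text at the last word boundary
    // at or before `maxLength`, appending "..." when something was cut.
    static String excerpt(String text, int maxLength) {
        String normalized = text.trim().replaceAll("\\s+", " ");
        if (normalized.length() <= maxLength) {
            return normalized;
        }
        int cut = normalized.lastIndexOf(' ', maxLength);
        if (cut <= 0) {
            cut = maxLength; // no space found: fall back to a hard cut
        }
        return normalized.substring(0, cut) + "...";
    }
}
```

For example, `ExtractionSketch.topKeywords("The search engine indexes the search index of a small web", 2)` returns `[search, engine]`, and `ExtractionSketch.excerpt("Marginalia  is a search\nengine for the small web", 24)` returns `Marginalia is a search...`. A real keyword extractor would likely weigh position, markup, and language rather than raw frequency alone.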
Smaller features:
- adblock - Simulates Adblock
- pubdate - Determines when a document was published
- topic-detection - Tries to identify the topic of a website
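As an illustration of the kind of heuristic a publication-date feature can rely on, the sketch below looks for two common HTML hints: the Open Graph `article:published_time` meta property and the `datetime` attribute of a `<time>` element. The class `PubDateSketch`, the regular expressions, and the order in which the hints are tried are assumptions for this example, not a description of what the pubdate module does.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the project's implementation.
final class PubDateSketch {
    // Assumes the attributes appear in exactly this order, which real pages do not guarantee.
    private static final Pattern META_PUBLISHED = Pattern.compile(
            "<meta\\s+property=\"article:published_time\"\\s+content=\"(\\d{4}-\\d{2}-\\d{2})[^\"]*\"",
            Pattern.CASE_INSENSITIVE);

    private static final Pattern TIME_ELEMENT = Pattern.compile(
            "<time[^>]*\\sdatetime=\"(\\d{4}-\\d{2}-\\d{2})[^\"]*\"",
            Pattern.CASE_INSENSITIVE);

    // Tries each hint in turn and returns the first one that parses as an ISO date.
    static Optional<LocalDate> findPublishedDate(String html) {
        for (Pattern pattern : new Pattern[] {META_PUBLISHED, TIME_ELEMENT}) {
            Matcher matcher = pattern.matcher(html);
            if (matcher.find()) {
                try {
                    return Optional.of(LocalDate.parse(matcher.group(1)));
                } catch (DateTimeParseException e) {
                    // Looks like a date but is not valid (e.g. month 13); try the next hint.
                }
            }
        }
        return Optional.empty();
    }
}
```

Calling `PubDateSketch.findPublishedDate(html)` on a page without either hint yields an empty `Optional`, which leaves room for a caller to fall back on other signals. Matching HTML with regular expressions is fragile; it is used here only to keep the sketch dependency-free.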