mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-23 21:18:58 +00:00
Refactoring keyword extraction to extract span information. Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions. This is a bit of a Katamari Damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.

- java/nu/marginalia/language
- resources/dictionary
- test/nu/marginalia/language
- test-resources/html
- build.gradle
- readme.md

Language Processing
This library contains various tools used in language processing.
Central Classes
- SentenceExtractor - Creates a DocumentLanguageData from a text, containing its words, their stems, part-of-speech (POS) tags, and so on.
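
To give a rough feel for the kind of data described above, here is a minimal, self-contained sketch. It is not the real SentenceExtractor or DocumentLanguageData API: every name in it (`ToyLanguageData`, `Word`, `Sentence`, `extract`, `stem`) is hypothetical, the sentence splitting and suffix stripping are deliberately naive stand-ins, and POS tagging is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: none of these names come from the Marginalia codebase.
final class ToyLanguageData {

    /** One token: the lower-cased surface form and a crudely derived stem. */
    record Word(String lowerCase, String stem) {}

    /** One sentence: its words in document order. */
    record Sentence(List<Word> words) {}

    /** Splits text into sentences on '.', '!' and '?', then into words on non-letters. */
    static List<Sentence> extract(String text) {
        List<Sentence> sentences = new ArrayList<>();
        for (String rawSentence : text.split("[.!?]+")) {
            List<Word> words = new ArrayList<>();
            for (String token : rawSentence.split("[^\\p{L}]+")) {
                if (token.isEmpty()) continue; // split() can yield a leading empty token
                String lower = token.toLowerCase(Locale.ROOT);
                words.add(new Word(lower, stem(lower)));
            }
            if (!words.isEmpty()) sentences.add(new Sentence(words));
        }
        return sentences;
    }

    /** Naive suffix stripping (an assumption for this sketch); a real stemmer handles far more cases. */
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        for (Sentence s : extract("Crawlers fetched pages. Indexing works!")) {
            System.out.println(s.words());
        }
    }
}
```

The point of the sketch is only the shape of the result: a document becomes a list of sentences, and each word carries derived attributes alongside its surface form, which is what downstream code can then consume.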
See Also
features-convert/keyword-extraction uses this code to identify which keywords are important.