Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git, synced 2025-02-24 05:18:58 +00:00
Keyword Extraction
This code identifies the keywords in a document, their positions in the document, their importance based on TF-IDF, and their grammatical functions based on POS (part-of-speech) tags.
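To make these three pieces of information concrete, here is a minimal, self-contained sketch. It is not this module's actual implementation, and every name in it (`KeywordSketch`, `TaggedWord`, `Keyword`, `extract`) is hypothetical. It assumes the document has already been split into words and POS-tagged with Penn Treebank style tags. It keeps nouns and adjectives as candidates (an assumed heuristic), records each candidate's word positions, and scores it with one common smoothed TF-IDF variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of keyword extraction: positions plus TF-IDF weighting. */
final class KeywordSketch {

    /** A word with its POS tag (Penn Treebank style tags are an assumption). */
    record TaggedWord(String word, String tag) {}

    /** Per-keyword result: every position where the term occurs, and its TF-IDF score. */
    record Keyword(List<Integer> positions, double tfIdf) {}

    /** Assumed heuristic: nouns (NN*) and adjectives (JJ*) are keyword candidates. */
    static boolean isCandidate(String tag) {
        return tag.startsWith("NN") || tag.startsWith("JJ");
    }

    /**
     * @param words      the document as a sequence of tagged words
     * @param docFreq    number of corpus documents containing each (lowercased) term
     * @param corpusSize total number of documents in the corpus
     */
    static Map<String, Keyword> extract(List<TaggedWord> words,
                                        Map<String, Integer> docFreq,
                                        int corpusSize) {
        // Record the word positions of each candidate term, in order of first appearance
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            TaggedWord w = words.get(i);
            if (!isCandidate(w.tag())) continue;
            positions.computeIfAbsent(w.word().toLowerCase(), k -> new ArrayList<>()).add(i);
        }

        Map<String, Keyword> result = new LinkedHashMap<>();
        for (var entry : positions.entrySet()) {
            // Term frequency: occurrence count normalised by document length
            double tf = (double) entry.getValue().size() / words.size();
            // Smoothed inverse document frequency; terms missing from docFreq get df = 0
            int df = docFreq.getOrDefault(entry.getKey(), 0);
            double idf = Math.log((double) (corpusSize + 1) / (df + 1));
            result.put(entry.getKey(), new Keyword(entry.getValue(), tf * idf));
        }
        return result;
    }
}
```

For the tagged words "Search/NN engines/NNS index/VBP the/DT search/NN", the sketch returns `search` at positions 0 and 4 and `engines` at position 1, and drops the verb and the determiner. With a corpus where `search` is the rarer term, `search` also receives the higher score. Keeping positions alongside the score lets later stages ask not only whether a term occurs but where it occurs, which phrase and proximity matching depend on.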
Central Classes
See Also
- libraries/language-processing does a lot of the heavy lifting.