mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-24 21:29:00 +00:00
# Keyword Extraction

This code identifies keywords in a document, their positions in the document,
their importance based on [TF-IDF](https://en.wikipedia.org/wiki/Tf-idf), and their grammatical
functions based on [POS tags](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).
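
As a rough illustration of the TF-IDF idea mentioned above, the following is a minimal, self-contained sketch. It is not the code in this module: the class `TfIdfSketch` and its methods are hypothetical names made up for this example, and it assumes the textbook formula `tf * ln(N / df)` over a tiny in-memory corpus, which may differ from the weighting this project actually uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of nu.marginalia.keyword.
class TfIdfSketch {

    /** Counts how often each term occurs in a single document. */
    static Map<String, Integer> termFrequencies(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Counts how many documents in the corpus contain the term at least once. */
    static int documentFrequency(String term, List<List<String>> corpus) {
        int n = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                n++;
            }
        }
        return n;
    }

    /** Textbook TF-IDF: tf * ln(N / df). Returns 0 if the term is absent from the document or the corpus. */
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        int tf = termFrequencies(doc).getOrDefault(term, 0);
        int df = documentFrequency(term, corpus);
        if (tf == 0 || df == 0) {
            return 0.0;
        }
        return tf * Math.log((double) corpus.size() / df);
    }

    public static void main(String[] args) {
        // Toy corpus of three already-tokenized documents (assumed data).
        List<List<String>> corpus = List.of(
                List.of("search", "engine", "index"),
                List.of("search", "keyword", "keyword"),
                List.of("search", "crawler"));
        List<String> doc = corpus.get(1);

        System.out.println("search:  " + tfIdf("search", doc, corpus));
        System.out.println("keyword: " + tfIdf("keyword", doc, corpus));
    }
}
```

In this toy corpus, "search" appears in every document and therefore scores zero, while "keyword" appears twice in one document and nowhere else, so it scores highest. That is the intuition behind using TF-IDF to rank a term's importance.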

## Central Classes

* [DocumentKeywordExtractor](java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
* [KeywordMetadata](java/nu/marginalia/keyword/KeywordMetadata.java)

## See Also

* [libraries/language-processing](../../libraries/language-processing) does a lot of the heavy lifting.