mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 05:18:58 +00:00

Viktor Lofgren 22b35d5d91 (sentence-extractor) Add tag information to document language data Decorates DocumentSentences with information about which HTML tags they are nested in, and removes some redundant data on this rather memory hungry object. Separator information is encoded as a bit set instead of an array of integers. The change also cleans up the SentenceExtractor class a fair bit. It no longer extracts ngrams, and a significant amount of redundant operations were removed as well. This is still a pretty unpleasant class to work in, but this is the first step in making it a little bit better.		2024-07-18 15:57:48 +02:00
java/nu/marginalia/keyword	(sentence-extractor) Add tag information to document language data	2024-07-18 15:57:48 +02:00
test/nu/marginalia	(sentence-extractor) Add tag information to document language data	2024-07-18 15:57:48 +02:00
test-resources/test-data	(refac) Remove src/main from all source code paths.	2024-02-23 16:13:40 +01:00
build.gradle	(converter) Amend existing modifications to use gamma coded positions lists	2024-05-30 14:20:36 +02:00
readme.md	(docs) Begin un-fucking the docs after refactoring	2024-02-27 21:22:21 +01:00

readme.md

Keyword Extraction

This code identifies the keywords in a document, their positions in the document, their importance based on TF-IDF, and their grammatical functions based on POS tags.
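To make the TF-IDF part concrete, here is a minimal sketch of how a term's importance can be scored against a small corpus. It is an illustration only, not the code in this module: the class `TfIdfSketch`, its method names, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` are all assumptions chosen for the example.

```java
import java.util.List;

// Hypothetical illustration; not part of nu.marginalia.keyword.
public class TfIdfSketch {

    /** Term frequency: occurrences of the term divided by the document length. */
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) return 0.0;
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    /** Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1. */
    static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log((1.0 + corpus.size()) / (1.0 + df)) + 1.0;
    }

    /** TF-IDF score of a term in one document, relative to the corpus. */
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        // Toy corpus of pre-tokenized documents (assumed input format).
        List<List<String>> corpus = List.of(
                List.of("search", "engine", "index"),
                List.of("search", "query"),
                List.of("keyword", "extraction", "search"));

        List<String> doc = corpus.get(0);
        for (String term : doc) {
            System.out.printf("%s: %.4f%n", term, tfIdf(term, doc, corpus));
        }
    }
}
```

In this toy corpus, "search" appears in every document and so scores lower in the first document than "index", which appears only there. That is the property that makes TF-IDF useful for ranking keyword candidates.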
