Commit Graph

5 Commits

Author SHA1 Message Date
Viktor Lofgren
8a81a480a1 (ngram) Only extract frequencies of title words, but use the body to increment the counters...
The sign of the counter is used to indicate whether a term has appeared as title.  Until it's seen in the title, it's provisionally saved as a negative count.
2024-04-12 18:08:31 +02:00
Viktor Lofgren
6a67043537 (ngram) Clean up ngram lexicon code
This is both an optimization that removes some GC churn, as well as a clean-up of the code that removes references to outdated concepts.
2024-04-12 17:45:06 +02:00
Viktor Lofgren
bb6b51ad91 (ngram) Fix index range in NgramLexicon to an avoid exception 2024-04-12 10:13:25 +02:00
Viktor Lofgren
b7d9a7ae89 (ngrams) Remove the vestigial logic for capturing permutations of n-grams
The change also reduces the object churn in NGramLexicon, as this is a very hot method in the converter.
2024-04-11 18:12:01 +02:00
Viktor Lofgren
0bd3365c24 (convert) Initial integration of segmentation data into the converter's keyword extraction logic 2024-03-19 14:28:42 +01:00