Viktor Lofgren
8a81a480a1
(ngram) Only extract frequencies of title words, but use the body to increment the counters...
The sign of the counter indicates whether a term has appeared in a title. Until the term is seen in a title, its count is provisionally stored as a negative number.
2024-04-12 18:08:31 +02:00
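The sign-as-flag counter described in the commit above can be sketched roughly as follows. This is an illustrative sketch in Python, not the project's code: the class `SignedTermCounter` and its method names are hypothetical, and it assumes that a title occurrence also counts as one occurrence (which sidesteps the fact that a zero count carries no sign).

```python
class SignedTermCounter:
    """Hypothetical sketch: one signed integer per term.

    A positive count means the term has been seen in a title.
    A negative count means it has only been seen in body text so far.
    """

    def __init__(self):
        self.counts = {}

    def observe_body(self, term):
        # Body occurrences grow the magnitude without changing the sign.
        c = self.counts.get(term, 0)
        self.counts[term] = c + 1 if c > 0 else c - 1

    def observe_title(self, term):
        # A title occurrence makes the count positive, keeping what the
        # body accumulated. Assumption: the title hit itself adds one.
        c = self.counts.get(term, 0)
        self.counts[term] = abs(c) + 1

    def title_frequency(self, term):
        # Only terms confirmed by a title report a frequency.
        c = self.counts.get(term, 0)
        return c if c > 0 else 0


counter = SignedTermCounter()
counter.observe_body("new york")
counter.observe_body("new york")
body_only = counter.title_frequency("new york")   # still provisional
counter.observe_title("new york")
confirmed = counter.title_frequency("new york")   # body counts are kept
```

Storing the flag in the sign avoids a second map or a separate boolean per term, at the cost of needing a rule for terms that are first seen in a title.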
Viktor Lofgren
6a67043537
(ngram) Clean up ngram lexicon code
This is both an optimization that removes some GC churn and a clean-up that removes references to outdated concepts.
2024-04-12 17:45:06 +02:00
Viktor Lofgren
bb6b51ad91
(ngram) Fix index range in NgramLexicon to avoid an exception
2024-04-12 10:13:25 +02:00
Viktor Lofgren
b7d9a7ae89
(ngrams) Remove the vestigial logic for capturing permutations of n-grams
The change also reduces the object churn in NgramLexicon, as this is a very hot method in the converter.
2024-04-11 18:12:01 +02:00
Viktor Lofgren
0bd3365c24
(convert) Initial integration of segmentation data into the converter's keyword extraction logic
2024-03-19 14:28:42 +01:00