Viktor Lofgren
a0b3634cb6
(ngram) Only extract frequencies of title words, but use the body to increment the counters...
...
The sign of the counter indicates whether a term has appeared in a title. Until the term is seen in a title, its count is provisionally stored as a negative number.
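
A minimal sketch of the sign convention described above, not the actual lexicon code. The class name, the method names and the String-keyed map are all hypothetical and chosen only for illustration. It also assumes that a title occurrence itself adds one to the count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: the sign of each counter records whether
// the term has been seen in a title yet.
class SignedTitleCounter {
    // term -> signed count; negative means "seen only in body text so far"
    private final Map<String, Integer> counts = new HashMap<>();

    // A body occurrence grows the magnitude but keeps the current sign.
    void observeInBody(String term) {
        counts.merge(term, -1, (old, unused) -> old > 0 ? old + 1 : old - 1);
    }

    // A title occurrence makes the counter positive and counts the occurrence
    // (assumption: title occurrences are counted as well).
    void observeInTitle(String term) {
        counts.merge(term, 1, (old, unused) -> Math.abs(old) + 1);
    }

    // Only terms confirmed in a title (positive counters) are extracted.
    Map<String, Integer> titleTermFrequencies() {
        Map<String, Integer> result = new TreeMap<>();
        counts.forEach((term, count) -> {
            if (count > 0) {
                result.put(term, count);
            }
        });
        return result;
    }
}
```

With this layout a single integer per term carries both the frequency and the title flag, so body occurrences seen before the title are not lost when the term is later confirmed.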
2024-04-24 14:44:39 +02:00
Viktor Lofgren
150ee21f3c
(ngram) Clean up ngram lexicon code
...
This is both an optimization that removes some GC churn and a clean-up that removes references to outdated concepts from the code.
2024-04-24 14:44:38 +02:00
Viktor Lofgren
a0d9e66ff7
(ngram) Fix index range in NgramLexicon to avoid an exception
2024-04-24 14:44:38 +02:00
Viktor Lofgren
7dd8c78c6b
(ngrams) Remove the vestigial logic for capturing permutations of n-grams
...
The change also reduces the object churn in NGramLexicon, as this is a very hot method in the converter.
2024-04-24 14:44:38 +02:00
Viktor Lofgren
6a7a7009c7
(convert) Initial integration of segmentation data into the converter's keyword extraction logic
2024-04-24 14:44:17 +02:00