(term-frequency) Fix concurrency issues in SentenceExtractor and TermFrequencyExporter

How'd This Ever Work? (tm)

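TermFrequencyExporter passed its arguments to Math.clamp() in the wrong order: the value to clamp must come first, followed by the lower and upper bounds. SentenceExtractor was synchronizing on its own instance while initializing shared static members, so separate instances did not exclude each other. This caused rare issues when several SentenceExtractors were spun up at once.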
Viktor Lofgren 2024-07-15 05:15:30 +02:00
parent ad3857938d
commit fa162698c2
2 changed files with 2 additions and 2 deletions

@@ -54,7 +54,7 @@ public class TermFrequencyExporter implements ExporterIf {
 TLongIntHashMap counts = new TLongIntHashMap(100_000_000, 0.7f, -1, -1);
 AtomicInteger docCount = new AtomicInteger();
-SimpleBlockingThreadPool sjp = new SimpleBlockingThreadPool("exporter", Math.clamp(2, 16, Runtime.getRuntime().availableProcessors() / 2), 4);
+SimpleBlockingThreadPool sjp = new SimpleBlockingThreadPool("exporter", Math.clamp(Runtime.getRuntime().availableProcessors() / 2, 2, 16), 4);
 Path crawlerLogFile = inputDir.resolve("crawler.log");
 for (var item : WorkLog.iterable(crawlerLogFile)) {
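
A minimal sketch of why the argument order in the Math.clamp() call matters. It is not code from this repository: `ClampOrderSketch`, `poolSize`, and `poolSizeOldOrder` are hypothetical names, and the local `clamp` helper is a stand-in written to mirror the documented contract of Java 21's `Math.clamp(long value, int min, int max)` (value first, then bounds, `IllegalArgumentException` when `min > max`) so that the example also runs on older JDKs.

```java
public class ClampOrderSketch {

    // Stand-in for Math.clamp(long value, int min, int max):
    // the value comes first, then the lower and upper bounds.
    static int clamp(long value, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException(min + " > " + max);
        }
        return (int) Math.min(max, Math.max(value, min));
    }

    // Corrected order: half the cores, kept within [2, 16].
    static int poolSize(int availableProcessors) {
        return clamp(availableProcessors / 2, 2, 16);
    }

    // Old order: clamps the constant 2 into the range [16, cores / 2].
    static int poolSizeOldOrder(int availableProcessors) {
        return clamp(2, 16, availableProcessors / 2);
    }

    public static void main(String[] args) {
        System.out.println(poolSize(2));   // 2  (1 raised to the lower bound)
        System.out.println(poolSize(8));   // 4
        System.out.println(poolSize(64));  // 16 (32 capped at the upper bound)

        // With 32 or more cores the old order yields a valid range and returns 16.
        System.out.println(poolSizeOldOrder(64)); // 16

        // With fewer cores the range is inverted (min 16 > max 4) and is rejected.
        try {
            poolSizeOldOrder(8);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: 16 > 4
        }
    }
}
```

Under these assumptions, the old argument order only produces a result when half the core count is at least 16, which may be part of why the problem went unnoticed on larger machines.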

@@ -59,7 +59,7 @@ public class SentenceExtractor {
 logger.error("Could not initialize sentence detector", ex);
 }
-synchronized (this) {
+synchronized (SentenceExtractor.class) {
 if (ngramLexicon == null) {
 ngramLexicon = new NgramLexicon(models);
 }
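
A minimal sketch of the locking pattern in the SentenceExtractor change, assuming a constructor that lazily initializes a shared static field. `LazyStaticInitSketch`, `sharedLexicon`, `initializations`, and `constructConcurrently` are hypothetical names used only for illustration. With `synchronized (this)`, each new instance would lock its own monitor, so two constructors could both see `null` and both initialize the field. Locking on the class object gives every instance the same monitor.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyStaticInitSketch {

    // Hypothetical stand-in for an expensive shared resource such as ngramLexicon.
    private static Object sharedLexicon;

    // Counts how many times the shared resource was actually built.
    private static final AtomicInteger initCount = new AtomicInteger();

    public LazyStaticInitSketch() {
        // Lock on the class, not on `this`: all instances contend for one monitor,
        // so the null check and the assignment cannot interleave across instances.
        synchronized (LazyStaticInitSketch.class) {
            if (sharedLexicon == null) {
                initCount.incrementAndGet();
                sharedLexicon = new Object();
            }
        }
    }

    static int initializations() {
        return initCount.get();
    }

    // Constructs `threadCount` instances concurrently and reports the init count.
    static int constructConcurrently(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> new LazyStaticInitSketch());
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return initializations();
    }

    public static void main(String[] args) {
        System.out.println(constructConcurrently(8)); // 1
    }
}
```

A static initializer or a holder class would avoid the explicit lock altogether. The sketch keeps the constructor-time check because that is the shape of the code in the diff.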