Commit Graph

  • 462aa9af26 (query) Update ranking parameters with new variables for bm25 ngrams and tcf mutual jaccard Viktor Lofgren 2024-04-18 10:36:15 +0200
  • a09c84e1b8 (query) Modify tokenizer to match the behavior of the sentence extractor Viktor Lofgren 2024-04-17 17:54:32 +0200
  • 44b33798f3 (index) Clean up jaccard index term code and down-tune the parameter's importance a bit Viktor Lofgren 2024-04-17 17:40:16 +0200
  • 2f0b648fad (index) Add jaccard index term to boost results based on term overlap Viktor Lofgren 2024-04-17 16:50:26 +0200
  • de0e56f027 (index) Remove position overlap check, coherences will do the work instead Viktor Lofgren 2024-04-17 14:20:01 +0200
  • 973ced7b13 (index) Omit absent terms from coherence checks Viktor Lofgren 2024-04-17 14:12:16 +0200
  • cb4b824a85 (index) Split ngram and regular keyword bm25 calculation and add ngram score as a bonus Viktor Lofgren 2024-04-17 14:04:35 +0200
  • c583a538b1 (search) Add implicit coherence constraints based on segmentation Viktor Lofgren 2024-04-17 14:03:35 +0200
  • e0224085b4 (index) Improve recall for small queries Viktor Lofgren 2024-04-16 22:51:03 +0200
  • 44c1e1d6d9 (index) Remove dead code Viktor Lofgren 2024-04-16 19:59:27 +0200
  • c620e9c026 (index) Experimental performance regression fix Viktor Lofgren 2024-04-16 19:43:14 +0200
  • 1bb88968c5 (test) Fix broken test Viktor Lofgren 2024-04-16 19:44:51 +0200
  • df75e8f4aa (index) Explicitly free LongQueryBuffers Viktor Lofgren 2024-04-16 19:23:00 +0200
  • adf846bfd2 (index) Fix term coherence evaluation Viktor Lofgren 2024-04-16 18:07:43 +0200
  • 1748fcc5ac (valuation) Impose stronger constraints on locality of terms Viktor Lofgren 2024-04-16 17:22:58 +0200
  • 08416393e0 (valuation) Impose stronger constraints on locality of terms Viktor Lofgren 2024-04-16 17:15:21 +0200
  • fce26015c9 (encyclopedia) Index the full articles Viktor Lofgren 2024-04-16 12:10:13 +0200
  • 155be1078d (index) Fix priority search terms Viktor Lofgren 2024-04-15 16:44:08 +0200
  • 6efc0f21fe (index) Clean up data model Viktor Lofgren 2024-04-15 16:04:07 +0200
  • f3255e080d (ngram) Grab titles separately when extracting ngrams from wiki data Viktor Lofgren 2024-04-13 19:34:16 +0200
  • 0da03d4cfc (zim) Fix title extractor Viktor Lofgren 2024-04-13 19:33:47 +0200
  • 5f6a3ef9d0 (ngram) Correct |s|^|s|-normalization to use length and not count Viktor Lofgren 2024-04-13 18:05:30 +0200
  • afc4fed591 (ngram) Correct size value in ngram lexicon generation, trim the terms better Viktor Lofgren 2024-04-13 17:51:02 +0200
  • cb505f98ef (ngram) Use simple blocking pool instead of FJP; split on underscores in article names. Viktor Lofgren 2024-04-13 17:07:23 +0200
  • a0b3634cb6 (ngram) Only extract frequencies of title words, but use the body to increment the counters... Viktor Lofgren 2024-04-12 18:08:31 +0200
  • e23359bae9 (query, minor) Remove debug statement Viktor Lofgren 2024-04-12 17:52:55 +0200
  • 5531ed632a (query, minor) Remove debug statement Viktor Lofgren 2024-04-12 17:45:26 +0200
  • 150ee21f3c (ngram) Clean up ngram lexicon code Viktor Lofgren 2024-04-12 17:45:06 +0200
  • c96da0ce1e (segmentation) Pick best segmentation using |s|^|s|-style normalization Viktor Lofgren 2024-04-12 17:44:14 +0200
  • a0d9e66ff7 (ngram) Fix index range in NgramLexicon to avoid an exception Viktor Lofgren 2024-04-12 10:13:25 +0200
  • 55f627ed4c (index) Clean up the code Viktor Lofgren 2024-04-11 18:50:21 +0200
  • 7dd8c78c6b (ngrams) Remove the vestigial logic for capturing permutations of n-grams Viktor Lofgren 2024-04-11 18:12:01 +0200
  • 8bf7d090fd (qs) Clean up parsing code using new record matching Viktor Lofgren 2024-04-11 17:20:13 +0200
  • 6bfe04b609 (term-freq-exporter) Reduce thread count and memory usage Viktor Lofgren 2024-04-10 17:11:23 +0200
  • 491d6bec46 (term-freq-exporter) Extract ngrams in term-frequency-exporter Viktor Lofgren 2024-04-10 16:58:05 +0200
  • 4fb86ac692 (search) Fix outdated assumptions about the results Viktor Lofgren 2024-04-07 11:24:30 +0200
  • 6cba6aef3b (minor) Remove dead code Viktor Lofgren 2024-04-06 14:34:15 +0200
  • 7e216db463 (index) Add origin trace information for index readers Viktor Lofgren 2024-04-06 13:28:14 +0200
  • adc90c8f1e (sentence-extractor) Fix resource leak in sentence extractor Viktor Lofgren 2024-04-05 18:52:58 +0200
  • e3316a3672 (index) Clean up new index query code Viktor Lofgren 2024-04-05 13:30:49 +0200
  • a3a6d6292b (qs, index) New query model integrated with index service. Viktor Lofgren 2024-04-04 20:17:58 +0200
  • 8cb9455c32 (qs, WIP) Fix edge cases in query compilation Viktor Lofgren 2024-03-29 12:40:27 +0100
  • dc65b2ee01 (qs, WIP) Clean up dead code Viktor Lofgren 2024-03-28 16:37:23 +0100
  • 98a1adbf81 (qs, WIP) Tidy it up a bit Viktor Lofgren 2024-03-28 14:18:26 +0100
  • 0bd1e15cce (qs, WIP) Tidy it up a bit Viktor Lofgren 2024-03-28 14:09:17 +0100
  • eda926767e (qs, WIP) Tidy it up a bit Viktor Lofgren 2024-03-28 13:54:30 +0100
  • cd1a18c045 (qs, WIP) Break up code and tidy it up a bit Viktor Lofgren 2024-03-28 13:26:54 +0100
  • 6f567fbea8 (qs, WIP) Fix output determinism, fix tests Viktor Lofgren 2024-03-28 13:11:26 +0100
  • 0ebadd03a5 (WIP) Query rendering finally beginning to look like it works Viktor Lofgren 2024-03-28 13:01:21 +0100
  • 2253b556b2 WIP Viktor Lofgren 2024-03-21 12:00:52 +0100
  • 6a7a7009c7 (convert) Initial integration of segmentation data into the converter's keyword extraction logic Viktor Lofgren 2024-03-19 14:28:42 +0100
  • 3c75057dcd (qs) Retire NGramBloomFilter, integrate new segmentation model instead Viktor Lofgren 2024-03-19 10:33:29 +0100
  • 212d101727 (control) GUI for exporting segmentation data from a wikipedia zim Viktor Lofgren 2024-03-18 13:45:23 +0100
  • 760b80659d (WIP) Partial integration of new query expansion code into the query-service Viktor Lofgren 2024-03-18 13:16:49 +0100
  • 04879c005d (WIP) Improve data extraction from wikipedia data Viktor Lofgren 2024-03-18 13:16:00 +0100
  • cb82927756 (WIP) Implement first take of new query segmentation algorithm Viktor Lofgren 2024-03-12 13:12:50 +0100
  • 8b9629f2f6 (crawler) Remove unnecessary double-fetch of the root document Viktor Lofgren 2024-04-24 14:38:59 +0200
  • f6db16b313 (crawler) Reduce log noise from timeouts in SoftIfModifiedSinceProber Viktor Lofgren 2024-04-24 14:10:03 +0200
  • 4668b1ddcb (build) Java 22 and its consequences has been a disaster for Marginalia Search Viktor Lofgren 2024-04-24 13:54:04 +0200
  • dcf9d9caad (crawler) Emulate if-modified-since for domains that don't support the header Viktor Lofgren 2024-04-22 17:26:31 +0200
  • 7a69b76001 (crawler) Remove accidental log spam Viktor Lofgren 2024-04-22 15:51:37 +0200
  • ac07ef822f (crawler) Code quality Viktor Lofgren 2024-04-22 15:37:35 +0200
  • e7d4bcd872 (crawler) Use the probe-result to reduce the likelihood of crawling both http and https Viktor Lofgren 2024-04-22 15:36:43 +0200
  • a28c6d7cfe (crawler) Strip W/-prefix from the etag when supplied as If-None-Match Viktor Lofgren 2024-04-22 14:31:05 +0200
  • d816f048f5 (crawler) Ensure all appropriate headers are recorded on the request Viktor Lofgren 2024-04-22 14:14:24 +0200
  • b09ddd0036 (crawler/converter) Remove legacy junk from parquet migration Viktor Lofgren 2024-04-22 12:34:28 +0200
  • 0a73b02a00 (query) Mark flaky test, correct assert on test Viktor Lofgren 2024-04-21 12:30:14 +0200
  • 8769704462 (ranking) TermCoherenceFactory should be run for size=2 queries Viktor Lofgren 2024-04-21 12:29:25 +0200
  • 214551f1df (converter) Stopgap fix for some cases of lost crawl data due to HTTP 304. The root cause needs further investigation. Viktor Lofgren 2024-04-19 20:36:01 +0200
  • 2cc74c005a (query) Always generate an ngram alternative, suppresses generation of multiple identical query branches Viktor Lofgren 2024-04-19 19:42:30 +0200
  • ed250f57f2 (ranking) Set regularMask correctly Viktor Lofgren 2024-04-19 14:31:57 +0200
  • e92c25f7e0 (ranking) Cleanup Viktor Lofgren 2024-04-19 14:13:12 +0200
  • 3ab563f314 (ranking) Suppress NaN:s in ranking output Viktor Lofgren 2024-04-19 13:58:28 +0200
  • 426338cb45 (ranking, bugfix) Use bm25NgramWeight and not full weight for bM25N Viktor Lofgren 2024-04-19 12:41:48 +0200
  • 5fa2375898 (index, bugfix) Pass url quality to query service Viktor Lofgren 2024-04-19 12:41:26 +0200
  • 41782a0ab5 (index) Fix TCF bug where the ngram terms would be considered instead of the regular ones due to a logical derp Viktor Lofgren 2024-04-19 12:19:26 +0200
  • 9b06433b82 (qs) Additional info in query debug UI Viktor Lofgren 2024-04-19 12:18:53 +0200
  • def607d840 (qs) Additional info in query debug UI Viktor Lofgren 2024-04-19 11:46:27 +0200
  • 2b811fb422 (qs) Basic query debug feature Viktor Lofgren 2024-04-19 11:00:56 +0200
  • 36cc62c10c (proto) Improve handling of omitted parameters Viktor Lofgren 2024-04-18 10:47:12 +0200
  • 975d92912c (qs) Improve logging Viktor Lofgren 2024-04-18 10:44:08 +0200
  • 8bbaf457de (query) Minor code cleanup Viktor Lofgren 2024-04-18 10:37:51 +0200
  • 7641a02f31 (query) Update ranking parameters with new variables for bm25 ngrams and tcf mutual jaccard Viktor Lofgren 2024-04-18 10:36:15 +0200
  • ce16239e34 (query) Modify tokenizer to match the behavior of the sentence extractor Viktor Lofgren 2024-04-17 17:54:32 +0200
  • d64bd227cf (index) Clean up jaccard index term code and down-tune the parameter's importance a bit Viktor Lofgren 2024-04-17 17:40:16 +0200
  • c5ab0a9054 (index) Add jaccard index term to boost results based on term overlap Viktor Lofgren 2024-04-17 16:50:26 +0200
  • dac948973d (index) Remove position overlap check, coherences will do the work instead Viktor Lofgren 2024-04-17 14:20:01 +0200
  • 9d008d1d6f (index) Omit absent terms from coherence checks Viktor Lofgren 2024-04-17 14:12:16 +0200
  • f52457213e (index) Split ngram and regular keyword bm25 calculation and add ngram score as a bonus Viktor Lofgren 2024-04-17 14:04:35 +0200
  • 579295a673 (search) Add implicit coherence constraints based on segmentation Viktor Lofgren 2024-04-17 14:03:35 +0200
  • af8ff8ce99 (index) Improve recall for small queries Viktor Lofgren 2024-04-16 22:51:03 +0200
  • 7fa3e86e64 (index) Remove dead code Viktor Lofgren 2024-04-16 19:59:27 +0200
  • 3359f72239 (index) Experimental performance regression fix Viktor Lofgren 2024-04-16 19:43:14 +0200
  • 41fa154aa6 (test) Fix broken test Viktor Lofgren 2024-04-16 19:44:51 +0200
  • deaba0152d (index) Explicitly free LongQueryBuffers Viktor Lofgren 2024-04-16 19:23:00 +0200
  • feaef6093e (index) Fix term coherence evaluation Viktor Lofgren 2024-04-16 18:07:43 +0200
  • 078fa4fdd0 (valuation) Impose stronger constraints on locality of terms Viktor Lofgren 2024-04-16 17:22:58 +0200
  • 2dc77a0638 (valuation) Impose stronger constraints on locality of terms Viktor Lofgren 2024-04-16 17:15:21 +0200
  • cfd9a7187f (query-segmentation) Merge pull request #89 from MarginaliaSearch/query-segmentation Viktor 2024-04-16 15:31:05 +0200
  • f434a8b492 (build) Upgrade jib plugin version Viktor Lofgren 2024-04-16 15:25:23 +0200