Commit Graph

  • 44d6bc71b7 (assistant) Migrate to Jooby framework master Viktor Lofgren 2025-02-15 13:28:12 +0100
  • 9d302e2973 (assistant) Migrate to Jooby framework Viktor Lofgren 2025-02-15 13:26:04 +0100
  • f553701224 (assistant) Migrate to Jooby framework Viktor Lofgren 2025-02-15 13:21:48 +0100
  • f076d05595 (deps) Upgrade slf4j to latest deploy-0088 Viktor Lofgren 2025-02-15 12:50:16 +0100
  • b513809710 (*) Stopgap fix for metrics server initialization errors bringing down services deploy-0087 Viktor Lofgren 2025-02-14 17:09:48 +0100
  • 7519b28e21 (search) Correct exception from misbehaving bots feeding invalid urls deploy-0086 Viktor Lofgren 2025-02-14 17:05:24 +0100
  • 3eac4dd57f (search) Correct exception in error handler when page is missing deploy-0085 Viktor Lofgren 2025-02-14 17:00:21 +0100
  • 4c2810720a (search) Add redirect handler for full URLs in the /site endpoint deploy-0084 Viktor Lofgren 2025-02-14 16:31:11 +0100
  • 8480ba8daa (live-capture) Code cleanup Viktor Lofgren 2025-02-04 14:05:36 +0100
  • fbba392491 (live-capture) Send a UA-string from the browserless fetcher as well deploy-0083 Viktor Lofgren 2025-02-04 13:36:49 +0100
  • 530eb35949 (update-rss) Do not fail the feed fetcher control actor if it takes a long time to complete. Viktor Lofgren 2025-02-03 11:35:32 +0100
  • c2dd2175a2 (search) Add new query expansion rule contracting WORD NUM pairs into WORD-NUM and WORDNUM deploy-0082 Viktor Lofgren 2025-02-01 13:13:30 +0100
  • b8581b0f56 (crawler) Safe sanitization of headers during warc->slop conversion Viktor Lofgren 2025-01-31 12:47:42 +0100
  • 2ea34767d8 (crawler) Use the response URL when resolving relative links Viktor Lofgren 2025-01-31 12:40:13 +0100
  • e9af838231 (actor) Fix migration actor final steps deploy-0081 Viktor Lofgren 2025-01-30 11:48:21 +0100
  • ae0cad47c4 (actor) Utility method for getting a json prototype for actor states Viktor Lofgren 2025-01-29 15:20:25 +0100
  • 5fbc8ef998 (misc) Tidying Viktor Lofgren 2025-01-29 15:17:04 +0100
  • 32c6dd9e6a (actor) Delete old data in the migration actor Viktor Lofgren 2025-01-29 14:51:46 +0100
  • 6ece6a6cfb (actor) Improve resilience for the migration actor Viktor Lofgren 2025-01-29 14:43:09 +0100
  • 39cd1c18f8 Automatically run npm install tailwindcss@3 via setup.sh, as the new default version of the package is incompatible with the project Viktor Lofgren 2025-01-29 12:19:37 +0100
  • eb65daaa88 Merge pull request #151 from Lionstiger/master Viktor 2025-01-28 21:49:50 +0100
  • 0bebdb6e33 Merge branch 'master' into master Viktor 2025-01-28 21:49:36 +0100
  • 1e50e392c6 (actor) Improve logging and error handling for data migration actor deploy-0080 Viktor Lofgren 2025-01-28 15:34:36 +0100
  • fb673de370 (crawler) Change the header 'User-agent' to 'User-Agent' Viktor Lofgren 2025-01-28 15:34:16 +0100
  • eee73ab16c (crawler) Be more lenient when performing a domain probe Viktor Lofgren 2025-01-28 15:24:30 +0100
  • 5354e034bf (search) Minor grammar fix deploy-0079 Viktor Lofgren 2025-01-27 18:36:31 +0100
  • 72384ad6ca fix small grammar error Magnus Wulf 2025-01-27 15:04:57 +0100
  • a2b076f9be (converter) Add progress tracking for big domains in converter deploy-0078 Viktor Lofgren 2025-01-26 18:03:59 +0100
  • c8b0a32c0f (crawler) Reduce long retention of CrawlDataReference objects and their associated SerializableCrawlDataStreams deploy-0077 Viktor Lofgren 2025-01-26 15:40:17 +0100
  • f0d74aa3bb (converter) Fix close() ordering to prevent converter crash deploy-0076 Viktor Lofgren 2025-01-26 14:47:36 +0100
  • 74a1f100f4 (converter) Refactor to remove CrawledDomainReader and move its functionality into SerializableCrawlDataStream Viktor Lofgren 2025-01-26 14:46:50 +0100
  • eb049658e4 (converter) Add truncation att the parser step to prevent the converter from spending too much time on excessively large documents deploy-0075 Viktor Lofgren 2025-01-26 14:28:53 +0100
  • db138b2a6f (converter) Add truncation att the parser step to prevent the converter from spending too much time on exessively large documents deploy-0074 Viktor Lofgren 2025-01-26 14:25:57 +0100
  • 1673fc284c (converter) Reduce lock contention in converter by separating the processing of full and simple-track domains deploy-0073 Viktor Lofgren 2025-01-26 13:21:46 +0100
  • 503ea57d5b (converter) Reduce lock contention in converter by separating the processing of full and simple-track domains deploy-0072 Viktor Lofgren 2025-01-26 13:18:14 +0100
  • 18ca926c7f (converter) Truncate excessively long strings in SentenceExtractor, malformed data was effectively DOS:ing the converter deploy-0071 Viktor Lofgren 2025-01-26 12:52:54 +0100
  • db99242db2 (converter) Adding some logging around the simple processing track to investigate an issue with the converter stalling deploy-0070 Viktor Lofgren 2025-01-26 12:02:00 +0100
  • 2b9d2985ba (doc) Update readme with up-to-date install instructions. Viktor Lofgren 2025-01-24 18:51:41 +0100
  • eeb6ecd711 (search) Make it clearer that the affiliate marker applies to the result, and not the search engine's relation to the result. deploy-0069 Viktor Lofgren 2025-01-24 18:50:00 +0100
  • 1f58aeadbf (build) Upgrade JIB Viktor Lofgren 2025-01-24 18:49:28 +0100
  • 3d68be64da (crawler) Add default CT when it's missing for icons Viktor Lofgren 2025-01-22 13:55:47 +0100
  • 668f3b16ef (search) Redirect ^/site/$ to /site deploy-0068 Viktor Lofgren 2025-01-22 13:35:18 +0100
  • 98a340a0d1 (crawler) Add favicon data to domain state db in its own table Viktor Lofgren 2025-01-22 11:41:20 +0100
  • 8862100f7e (crawler) Improve logging and error handling deploy-0067 Viktor Lofgren 2025-01-21 21:44:21 +0100
  • 274941f6de (crawler) Smarter parquet->slop crawl data migration Viktor Lofgren 2025-01-21 21:26:12 +0100
  • abec83582d Fix refactoring gore deploy-0066 Viktor Lofgren 2025-01-21 15:08:04 +0100
  • 569520c9b6 (index) Add manual adjustments for rankings based on domain Viktor Lofgren 2025-01-21 15:07:43 +0100
  • 088310e998 (converter) Improve simple processing performance Viktor Lofgren 2025-01-21 14:13:33 +0100
  • 270cab874b Merge pull request #134 from MarginaliaSearch/slop-crawl-data-spike Viktor 2025-01-21 13:34:22 +0100
  • 4c74e280d3 (crawler) Fix urlencoding in sitemap fetcher Viktor Lofgren 2025-01-21 13:33:35 +0100
  • 5b347e17ac (crawler) Automatically migrate to slop from parquet when crawling Viktor Lofgren 2025-01-21 13:33:14 +0100
  • 55d6ab933f Merge branch 'master' into slop-crawl-data-spike Viktor Lofgren 2025-01-21 12:50:12 +0100
  • 43b74e9706 (crawler) Fix exception handler and resource leak in WarcRecorder Viktor Lofgren 2025-01-20 23:45:28 +0100
  • 579a115243 (crawler) Reduce log spam from error handling in new sitemap fetcher deploy-0065 Viktor Lofgren 2025-01-20 23:17:13 +0100
  • 2c67f50a43 Merge pull request #150 from MarginaliaSearch/httpclient-in-crawler Viktor 2025-01-20 19:35:30 +0100
  • 78a958e2b0 (crawler) Fix broken test that started failing after the search engine moved to a new domain httpclient-in-crawler Viktor Lofgren 2025-01-20 18:52:14 +0100
  • 4e939389b2 (crawler) New Jsoup based sitemap parser Viktor Lofgren 2025-01-20 14:37:44 +0100
  • e67a9bdb91 (crawler) Migrate away from using OkHttp in the crawler, use Java's HttpClient instead. Viktor Lofgren 2025-01-19 15:07:11 +0100
  • 567e4e1237 (crawler) Fast detection and bail-out for crawler traps deploy-0064 Viktor Lofgren 2025-01-18 15:28:54 +0100
  • 4342e42722 (crawler) Fast detection and bail-out for crawler traps deploy-0063 Viktor Lofgren 2025-01-17 13:02:57 +0100
  • bc818056e6 (run) Fix templates for mariadb Viktor Lofgren 2025-01-16 15:27:02 +0100
  • de2feac238 (chore) Upgrade jib from 3.4.3 to 3.4.4 Viktor Lofgren 2025-01-16 15:10:45 +0100
  • 1e770205a5 (search) Dyslexia fix deploy-0062 Viktor Lofgren 2025-01-12 20:39:59 +0100
  • e34230c25b (search) Dyslexia fix deploy-0061 Viktor Lofgren 2025-01-12 20:39:59 +0100
  • e44ecd6d69 Merge pull request #149 from MarginaliaSearch/vlofgren-patch-1 Viktor 2025-01-12 20:38:36 +0100
  • 5b93a0e633 Update ROADMAP.md Viktor 2025-01-12 20:38:11 +0100
  • 08fb0e5efe Update ROADMAP.md Viktor 2025-01-12 20:37:43 +0100
  • bcf67782ea Update ROADMAP.md Viktor 2025-01-12 20:37:09 +0100
  • ef3f175ede (search) Don't clobber the search query URL with default values deploy-0060 Viktor Lofgren 2025-01-10 15:57:30 +0100
  • bbe4b5d9fd Revert experimental changes Viktor Lofgren 2025-01-10 15:51:28 +0100
  • c67a635103 (search, experimental) Add a few debugging tracks to the search UI deploy-0059 Viktor Lofgren 2025-01-10 15:44:44 +0100
  • 20b24133fb (search, experimental) Add a few debugging tracks to the search UI deploy-0058 Viktor Lofgren 2025-01-10 15:34:48 +0100
  • f2567677e8 (index-client) Clean up index client code Viktor Lofgren 2025-01-10 15:17:07 +0100
  • bc2c2061f2 (index-client) Clean up index client code deploy-0057 Viktor Lofgren 2025-01-10 15:14:42 +0100
  • 1c7f5a31a5 (search) Further reduce the number of db queries by adding more caching to DbDomainQueries. deploy-0056 Viktor Lofgren 2025-01-10 14:17:29 +0100
  • 59a8ea60f7 (search) Further reduce the number of db queries by adding more caching to DbDomainQueries. deploy-0055 Viktor Lofgren 2025-01-10 14:15:22 +0100
  • aa9b1244ea (search) Reduce the number of db queries a bit by caching data that doesn't change too often deploy-0054 Viktor Lofgren 2025-01-10 13:56:04 +0100
  • 2d17233366 (search) Reduce the number of db queries a bit by caching data that doesn't change too often deploy-0053 Viktor Lofgren 2025-01-10 13:53:56 +0100
  • b245cc9f38 (search) Reduce the number of db queries a bit by caching data that doesn't change too often deploy-0052 Viktor Lofgren 2025-01-10 13:46:19 +0100
  • 6614d05bdf (db) Make db pool size configurable deploy-0051 Viktor Lofgren 2025-01-09 20:20:51 +0100
  • 55aeb03c4a (feeds) Replace rssreader based parsing with a custom jsoup based rss parser deploy-0050 Viktor Lofgren 2025-01-09 18:29:55 +0100
  • faa589962f (live-capture) Browserless now requires a token Viktor Lofgren 2025-01-09 14:51:11 +0100
  • c7edd6b39f (live-capture) Browserless now requires a token deploy-0049 Viktor Lofgren 2025-01-09 14:46:05 +0100
  • 79da622e3b (search) Update front page with new banner about move deploy-0048 Viktor Lofgren 2025-01-08 21:38:19 +0100
  • 3da8337ba6 (feeds) Add system property for exporting fetched feeds to a slop table for debugging deploy-0047 Viktor Lofgren 2025-01-08 20:45:25 +0100
  • a32d230f0a (special) Trigger deployment deploy-0046 Viktor Lofgren 2025-01-08 20:07:54 +0100
  • 3772bfd387 (query) Fix handling of optional ranking parameters Viktor Lofgren 2025-01-08 17:11:22 +0100
  • 02a7900d1a (search) Correct search-in-title toggle in search UI Viktor Lofgren 2025-01-08 16:51:10 +0100
  • a1fb92468f (refac) Remove ResultRankingParameters, QueryLimits class and use protobuf classes directly instead deploy-0045 Viktor Lofgren 2025-01-08 16:15:57 +0100
  • b7f0a2a98e (search-service) Fix metrics for errors and request times deploy-0044 Viktor Lofgren 2025-01-08 14:10:43 +0100
  • 5fb76b2e79 (search-service) Fix metrics for errors and request times deploy-0043 Viktor Lofgren 2025-01-08 14:06:03 +0100
  • ad8c97f342 (search-service) Begin replacement of the crawl queue mechanism with node_affinity flagging Viktor Lofgren 2025-01-08 13:25:56 +0100
  • dc1b6373eb (search-service) Clean up readme deploy-0042 Viktor Lofgren 2025-01-08 13:04:39 +0100
  • 983d6d067c (search-service) Add indexing indicator to sibling domains listing Viktor Lofgren 2025-01-08 12:58:34 +0100
  • a84a06975c (ranking-params) Add disable penalties flag to ranking params deploy-0041 Viktor Lofgren 2025-01-08 00:16:49 +0100
  • d2864c13ec (query-params) Add additional permitted query params Viktor Lofgren 2025-01-07 20:21:44 +0100
  • 03ba53ce51 (legacy-search) Update nav bar with correct links deploy-0040 Viktor Lofgren 2025-01-07 17:44:52 +0100
  • d4a6684931 (specialization) Soften length requirements for wiki-specialized documents (incl. cppreference) deploy-0039 Viktor Lofgren 2025-01-07 15:53:25 +0100
  • 6f0485287a Merge pull request #145 from MarginaliaSearch/cppreference_fixes deploy-0038 deploy-0037 Viktor 2025-01-07 15:43:19 +0100
  • 59e2dd4c26 (specialization) Soften length requirements for wiki-specialized documents (incl. cppreference) deploy-0036 Viktor Lofgren 2025-01-07 15:41:30 +0100