MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-23 21:18:58 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	c2dd2175a2	(search) Add new query expansion rule contracting WORD NUM pairs into WORD-NUM and WORDNUM	2025-02-01 13:13:30 +01:00
Viktor Lofgren	3772bfd387	(query) Fix handling of optional ranking parameters	2025-01-08 17:11:22 +01:00
Viktor Lofgren	a1fb92468f	(refac) Remove ResultRankingParameters, QueryLimits class and use protobuf classes directly instead. This is primarily to make the code a bit easier to reason about, and will reduce the level of indirection and data copying in the search-service->query-service->index-service communication chain.	2025-01-08 16:15:57 +01:00
Viktor Lofgren	983d6d067c	(search-service) Add indexing indicator to sibling domains listing	2025-01-08 12:58:34 +01:00
Viktor Lofgren	a84a06975c	(ranking-params) Add disable penalties flag to ranking params. This will help debugging ranking issues. Later it may be added to some filters.	2025-01-08 00:16:49 +01:00
Viktor Lofgren	7c90b6b414	(query) Don't blindly make tokens containing a colon into a non-ranking advice term	2025-01-07 15:18:05 +01:00
Viktor Lofgren	3e66767af3	(search) Adjust query parsing to trim tokens in quoted search terms. Quoted search queries that contained keywords with possessive 's endings were not returning any results, as the index does not retain that suffix, and the query parser was not stripping it away in this code path. This solves issue #143.	2025-01-05 23:33:09 +01:00
Viktor Lofgren	78eb1417a7	(service) Only block on SingleNodeChannelPool creation in QueryClient. The code was always blocking for up to 5s while waiting for the remote end to become available, meaning some services would stall for several seconds on start-up for no sensible reason. This should make most services start faster as a result.	2025-01-02 18:42:01 +01:00
Viktor Lofgren	5f576b7d0c	(query-parser) Strip leading underlines. This addresses issue #140, where __builtin_ffs gives no results.	2025-01-02 14:39:03 +01:00
Viktor Lofgren	baeb4a46cd	(search) Reintroduce query rewriting for recipes, add rules for wikis and forums	2024-12-31 16:05:00 +01:00
Viktor Lofgren	cf7f84f033	(rank) Reduce the impact of domain rank bonus, and only apply it to cancel out negative penalties, never to increase the ranking	2024-12-10 22:04:12 +01:00
Viktor Lofgren	c97c66a41c	(ranking) Reduce the verbatim score multiplier	2024-11-28 13:37:11 +01:00
Viktor Lofgren	9eb16cb667	(test) Remove tests from fast suite. Adding a new @Tag("flaky") for tests that do not reliably return successes. These may still be valuable during development, but should not run in CI. Also tagging a few of the slower tests with the old @Tag("slow"), to speed up the run-time.	2024-11-17 19:45:59 +01:00
Viktor Lofgren	e5db3f11e1	(chore) Clean up some of the uglier delomboking artifacts	2024-11-15 13:57:20 +01:00
Viktor Lofgren	9f47ce8d15	(chore) Remove lombok. There are likely some instances of delombok gore with this commit.	2024-11-11 21:14:38 +01:00
Viktor Lofgren	a5b4951f23	(chore) Remove use of deprecated STR.-style string templates	2024-11-11 18:02:28 +01:00
Viktor Lofgren	89f7f3c17c	(query-parser) Fix regression where advice terms weren't parsed properly	2024-10-14 13:46:37 +02:00
Viktor Lofgren	2ee58f4bc9	(index) Adjust ranking parameters to dial down the importance of tcfProximity and firstPosition	2024-09-29 15:33:12 +02:00
Viktor Lofgren	1bd29a586c	(service-discovery) Add common base interface to all Grpc services. To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface DiscoverableService extends BindableService was added.	2024-09-27 13:46:34 +02:00
Viktor Lofgren	c757d116bf	(misc) Fix Broken Tests	2024-09-27 13:46:34 +02:00
Viktor Lofgren	4a0356e26f	(search-service) Add pagination support to the search GUI	2024-09-25 14:26:49 +02:00
Viktor Lofgren	73f973cc06	(search-query) Add pagination to search query API and the direct query-service interface	2024-09-25 14:20:59 +02:00
Viktor Lofgren	28e7c8e5e0	Increase temporal bias weight to give the recent results filter a bit more recency	2024-09-17 18:11:40 +02:00
Viktor Lofgren	99523ca079	(query-parser) Remove test that is no longer relevant	2024-09-10 10:35:56 +02:00
Viktor Lofgren	50ec922c2b	(index) Fix broken index tests. Also cleaned up the tests to be less fragile to ranking algorithm changes.	2024-09-10 10:23:46 +02:00
Viktor Lofgren	50ba8fd099	(query-parsing) Correct handling of trailing parentheses	2024-09-03 11:45:14 +02:00
Viktor Lofgren	99b3b00b68	(query-parsing) Merge QueryTokenizer into QueryParser and add escaping of query grammar	2024-09-03 11:35:32 +02:00
Viktor Lofgren	f6d981761d	(query-parsing) Drop search term elements that aren't indexed by the search engine	2024-09-03 11:24:05 +02:00
Viktor Lofgren	8290c19e24	(query-parsing) Drop search term elements that aren't indexed by the search engine	2024-09-03 11:21:01 +02:00
Viktor Lofgren	bb5d946c26	(index, EXPERIMENTAL) Clean up ranking code	2024-08-29 11:34:23 +02:00
Viktor Lofgren	4fbcc02f96	(index) Adjust sensible defaults for ranking parameters	2024-08-25 11:24:16 +02:00
Viktor Lofgren	9aa8f13731	(index) Remove tcfAvgDist ranking parameter. This is captured by tcfProximity already	2024-08-25 11:20:19 +02:00
Viktor Lofgren	0999f07320	(search-query) Add new ranking parameters for proximity and verbatim matches	2024-08-25 10:34:12 +02:00
Viktor Lofgren	5d2b455572	(search) Clean up inconsistent usage of MathClient in SearchOperator. Also clean up SearchOperator and adjacent code	2024-08-24 10:39:31 +02:00
Viktor Lofgren	9eb1f120fc	(index) Repair positions bitmask for search result presentation	2024-08-22 11:28:23 +02:00
Viktor Lofgren	03d5dec24c	(*) Refactor termCoherences and rename them to phrase constraints.	2024-08-15 11:02:19 +02:00
Viktor Lofgren	016a4c62e1	(index) Bugs and error fixes, chasing and fixing mystery results that did not contain all relevant keywords	2024-08-10 09:51:03 +02:00
Viktor Lofgren	41da4f422d	(search-query) Always generate the "all"-segmentation	2024-08-09 13:20:00 +02:00
Viktor Lofgren	2e89b55593	(wip) Repair qdebug utility and show new ranking details	2024-08-09 12:57:25 +02:00
Viktor Lofgren	7babdb87d5	(index) Remove intermediate models	2024-08-07 10:10:44 +02:00
Viktor Lofgren	8462e88b8f	(index) Add min-dist factor and adjust rankings	2024-08-03 13:07:00 +02:00
Viktor Lofgren	b316b55be9	(index) Experimental initial integration of document spans into index	2024-07-30 12:01:53 +02:00
Viktor Lofgren	80900107f7	(restructure) Clean up repo by moving stray features into converter-process and crawler-process	2024-07-30 10:14:00 +02:00
Viktor Lofgren	aebb2652e8	(wip) Extract and encode spans data. Refactoring keyword extraction to extract spans information. Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions. This is a bit of a katamari damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.	2024-07-27 11:44:13 +02:00
Viktor Lofgren	22b35d5d91	(sentence-extractor) Add tag information to document language data. Decorates DocumentSentences with information about which HTML tags they are nested in, and removes some redundant data on this rather memory hungry object. Separator information is encoded as a bit set instead of an array of integers. The change also cleans up the SentenceExtractor class a fair bit. It no longer extracts ngrams, and a significant amount of redundant operations were removed as well. This is still a pretty unpleasant class to work in, but this is the first step in making it a little bit better.	2024-07-18 15:57:48 +02:00
Viktor Lofgren	dfd19b5eb9	(index) Reduce the number of abstractions around result ranking. The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous order was driving complexity in the code that generates these objects, and the consumer side of things puts all this data in the same object regardless.	2024-07-16 08:18:54 +02:00
Viktor Lofgren	ad3857938d	(search-api, ranking) Update with new ranking parameters. Adding new ranking parameters to the API and routing them through the system, in order to permit integration of the new position data with the ranking algorithm. The change also cleans out several parameters that no longer filled any function.	2024-07-15 04:49:40 +02:00
Viktor Lofgren	1ab875a75d	(test) Correcting flaky tests. Also changing the inappropriate usage of ReverseIndexPrioFileNames for the full index in test code.	2024-07-11 16:13:23 +02:00
Viktor Lofgren	87e38e6181	(search-query) refac: Move query factory	2024-06-27 13:14:47 +02:00
Viktor Lofgren	f73fc8dd57	(search-query) Fix end-inclusion bug in QWordGraphIterator	2024-06-27 13:13:42 +02:00

111 Commits