MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-23 21:18:58 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	569520c9b6	(index) Add manual adjustments for rankings based on domain	2025-01-21 15:07:43 +01:00
Viktor Lofgren	a1fb92468f	(refac) Remove ResultRankingParameters, QueryLimits class and use protobuf classes directly instead This is primarily to make the code a bit easier to reason about, and will reduce the level of indirection and data copying in the search-service->query-service->index-service communication chain.	2025-01-08 16:15:57 +01:00
Viktor Lofgren	9f47ce8d15	(chore) Remove lombok There are likely some instances of delombok gore with this commit.	2024-11-11 21:14:38 +01:00
Viktor Lofgren	8e78286068	Merge branch 'master' into term-positions	2024-09-17 15:20:46 +02:00
Viktor Lofgren	50ec922c2b	(index) Fix broken index tests Also cleaned up the tests to be less fragile to ranking algorithm changes.	2024-09-10 10:23:46 +02:00
Viktor Lofgren	cfbbeaa26e	(ranking) Clean up ranking test code	2024-09-08 15:46:51 +02:00
Viktor Lofgren	abab5bdc8a	(index, EXPERIMENTAL) Evaluate using Varint instead of GCS for position data	2024-08-26 14:20:39 +02:00
Viktor Lofgren	96bcf03ad5	(index) Address broken tests They are still broken, but less so.	2024-08-25 10:34:36 +02:00
Viktor Lofgren	03d5dec24c	(*) Refactor termCoherences and rename them to phrase constraints.	2024-08-15 11:02:19 +02:00
Viktor Lofgren	2e89b55593	(wip) Repair qdebug utility and show new ranking details	2024-08-09 12:57:25 +02:00
Viktor Lofgren	7babdb87d5	(index) Remove intermediate models	2024-08-07 10:10:44 +02:00
Viktor Lofgren	b316b55be9	(index) Experimental initial integration of document spans into index	2024-07-30 12:01:53 +02:00
Viktor Lofgren	34703da144	(slop) Support for nested array types and array-of-object types Also adding very basic support for filtered reads via SlopTable. This is probably not a final design.	2024-07-29 14:00:43 +02:00
Viktor Lofgren	aebb2652e8	(wip) Extract and encode spans data Refactoring keyword extraction to extract spans information. Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions. This is a bit of a katamari damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.	2024-07-27 11:44:13 +02:00
Viktor Lofgren	5c098005cc	(index) Fix broken test Expected behavior changed since the ranking algorithm now takes into account the number of positions of the keyword, and the test loader was previously modified to generate positions based on prime factors of the document id.	2024-07-16 12:37:59 +02:00
Viktor Lofgren	dfd19b5eb9	(index) Reduce the number of abstractions around result ranking The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous order was driving complexity in the code that generates these objects, and the consumer side of things puts all this data in the same object regardless.	2024-07-16 08:18:54 +02:00
Viktor Lofgren	1ab875a75d	(test) Correcting flaky tests Also changing the inappropriate usage of ReverseIndexPrioFileNames for the full index in test code.	2024-07-11 16:13:23 +02:00
Viktor Lofgren	abf7a8d78d	(coded-sequence) Correct implementation of Elias gamma Also clean up the code a bit as the EliasGammaCodec class was an iterator, and it was leaking abstraction details.	2024-07-10 14:28:28 +02:00
Viktor Lofgren	85c99ae808	(index-reverse) Split index construction into separate packages for full and priority index	2024-07-06 15:44:47 +02:00
Viktor Lofgren	6973712480	(query) Tidy up code	2024-06-26 13:40:06 +02:00
Viktor Lofgren	9d00243d7f	(index) Partial re-implementation of position constraints	2024-06-24 15:55:54 +02:00
Viktor Lofgren	36160988e2	(index) Integrate positions data with indexes WIP This change integrates the new positions data with the forward and reverse indexes. The ranking code is still only partially re-written.	2024-06-10 15:09:06 +02:00
Viktor Lofgren	9f982a0c3d	(index) Integrate positions file properly	2024-06-06 16:45:42 +02:00
Viktor Lofgren	4a8afa6b9f	(index, WIP) Position data partially integrated with forward and reverse indexes. There's no graceful way of doing this in small commits, pushing to avoid the risk of data loss.	2024-06-06 12:54:52 +02:00
Viktor Lofgren	ed250f57f2	(ranking) Set regularMask correctly	2024-04-19 14:31:57 +02:00
Viktor Lofgren	def607d840	(qs) Additional info in query debug UI	2024-04-19 11:46:27 +02:00
Viktor Lofgren	c5ab0a9054	(index) Add jaccard index term to boost results based on term overlap	2024-04-17 16:50:26 +02:00
Viktor Lofgren	7fa3e86e64	(index) Remove dead code Since the performance fix in `3359f72239` had a huge positive impact without reducing result quality, it's possible to remove the QueryBranchWalker and associated code.	2024-04-16 19:59:27 +02:00
Viktor Lofgren	41fa154aa6	(test) Fix broken test	2024-04-16 19:48:14 +02:00
Viktor Lofgren	599e719ad4	(index) Fix priority search terms This functionality fell into disrepair some while ago. It's supposed to allow non-mandatory search terms that boost the ranking if they are present in the document.	2024-04-15 16:44:08 +02:00
Viktor Lofgren	b6d365bacd	(index) Clean up data model The change set cleans up the data model for the term-level data. This used to contain a bunch of fields with document-level metadata. This data-duplication means a larger memory footprint and worse memory locality. The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking. This is again an effort to improve memory locality.	2024-04-15 16:04:07 +02:00
Viktor Lofgren	81815f3e0a	(qs, index) New query model integrated with index service. Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.	2024-04-04 20:17:58 +02:00
Viktor Lofgren	46423612e3	(refac) Merge service-discovery and service modules Also adds a few tests to the server/client code.	2024-03-03 10:49:23 +01:00
Viktor Lofgren	d78e9e715f	(misc) Fix broken tests	2024-02-28 12:12:43 +01:00
Viktor Lofgren	9f1649636e	Clean up documentation and rename `domain-links` to `link-graph`	2024-02-28 11:40:39 +01:00
Viktor Lofgren	427f3e922f	(index) Retire count operation, clean up index code.	2024-02-27 21:22:17 +01:00
Viktor Lofgren	fc00701a1e	(index) Experimental refactoring of the indexing functionality	2024-02-25 11:05:10 +01:00
Viktor Lofgren	1d34224416	(refac) Remove src/main from all source code paths. Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one. While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's modular. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.	2024-02-23 16:13:40 +01:00

38 Commits