MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 21:29:00 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	2db0e446cb	(search) Absorb SearchQueryIndexService into SearchOperator, and clean up SearchOperator	2024-08-22 11:49:29 +02:00
Viktor Lofgren	557bdaa694	(search) Clean up SearchQueryIndexService and surrounding code	2024-08-22 11:45:28 +02:00
Viktor Lofgren	9eb1f120fc	(index) Repair positions bitmask for search result presentation	2024-08-22 11:28:23 +02:00
Viktor Lofgren	285e657f68	Merge branch 'master' into term-positions # Conflicts: # code/processes/crawling-process/java/nu/marginalia/crawl/CrawlerMain.java # code/processes/crawling-process/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java	2024-07-31 10:44:01 +02:00
Viktor Lofgren	f19148132a	(search) Restrict site-search by passing domain id along with the site:-term This will help these queries deal with domains that do not have a subdomain so that they do not drag up subdomains as well, as they are also given the special site:-keyword for their corresponding parent domain.	2024-07-30 21:41:07 +02:00
Viktor Lofgren	aebb2652e8	(wip) Extract and encode spans data Refactoring keyword extraction to extract spans information. Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions. This is a bit of a katamari damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.	2024-07-27 11:44:13 +02:00
Viktor Lofgren	dfd19b5eb9	(index) Reduce the number of abstractions around result ranking The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous order was driving complexity in the code that generates these objects, and the consumer side of things puts all this data in the same object regardless.	2024-07-16 08:18:54 +02:00
Viktor Lofgren	4a8afa6b9f	(index, WIP) Position data partially integrated with forward and reverse indexes. There's no graceful way of doing this in small commits, pushing to avoid the risk of data loss.	2024-06-06 12:54:52 +02:00
Viktor Lofgren	d01d9fa670	(search) Add screenreader-specific notification remark about when search results start.	2024-05-04 11:41:06 +02:00
Viktor Lofgren	a53a32f006	(search) Spell out website problems with "atomic elements" instead of having a hover that's inaccessible with keyboard navigation	2024-05-04 11:41:05 +02:00
Viktor Lofgren	3548d54cf6	(search) Add a screenreader-only alert when the search filters are updated to make it easier to understand what happens.	2024-05-04 11:41:04 +02:00
Viktor Lofgren	2840d9d403	(search) Add screenreader-only positions count text to search results	2024-05-04 11:41:03 +02:00
Viktor Lofgren	2ad0bfda1e	(*) Fix boot orchestration for the services This corrects an annoying bug that had the system crash and burn on first start-up due to a race condition in service initialization, where the services were attempting to access the database before it was properly migrated. A fix was in principle already in place, but it was running too late and did not prevent attempts to access the as-yet uninitialized database. Move the first boot check into the MainClass instead of the Service constructor. The change also adds more appropriate docker dependencies to the services to fix rare errors resolving the hostname of the database.	2024-05-01 12:39:48 +02:00
Viktor Lofgren	4772e0b59d	(service) Deprecate /public prefix on HTTP Before the gRPC migration, the system would serve both public and internal requests over HTTP, but distinguish the two using path prefixes and a few HTTP Headers (X-Public, X-Context) added by the reverse proxy to prevent misconfigurations. Since internal requests meaningfully no longer use HTTP, this convention is just an obstacle now, adding the need to always run the system behind a reverse proxy that rewrites the paths. The change removes the path prefix, and updates the docker templates to reflect the change. This will require a migration for existing systems.	2024-04-30 14:46:18 +02:00
Viktor Lofgren	6690e9bde8	(service) Ensure the service discovery starts early This is necessary as we use zookeeper to orchestrate first-time startup of the services, to ensure that the database is properly migrated by the control service before anything else is permitted to start.	2024-04-25 15:08:33 +02:00
Viktor Lofgren	6efc0f21fe	(index) Clean up data model The change set cleans up the data model for the term-level data. This used to contain a bunch of fields with document-level metadata. This data-duplication means a larger memory footprint and worse memory locality. The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking. This is again an effort to improve memory locality.	2024-04-24 14:44:39 +02:00
Viktor Lofgren	4fb86ac692	(search) Fix outdated assumptions about the results We no longer break the query into "sets" of search terms and need to adapt the code to not use this assumption. For the API service, we'll simulate the old behavior to keep the API stable. For the search service, we'll introduce a new way of calculating positions through tree aggregation.	2024-04-24 14:44:38 +02:00
Viktor Lofgren	6cba6aef3b	(minor) Remove dead code	2024-04-24 14:44:38 +02:00
Viktor Lofgren	a3a6d6292b	(qs, index) New query model integrated with index service. Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.	2024-04-24 14:44:38 +02:00
Viktor Lofgren	46423612e3	(refac) Merge service-discovery and service modules Also adds a few tests to the server/client code.	2024-03-03 10:49:23 +01:00
Viktor Lofgren	8d0af9548b	(search) Bot mitigation Add the ability to indicate to the search service that a request is malicious, and to poison the results by providing randomly reordered old results instead.	2024-02-27 21:22:19 +01:00
Viktor Lofgren	427f3e922f	(index) Retire count operation, clean up index code.	2024-02-27 21:22:17 +01:00
Viktor Lofgren	1d34224416	(refac) Remove src/main from all source code paths. Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one. While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's modular. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.	2024-02-23 16:13:40 +01:00

23 Commits