MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 05:18:58 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	d84a2c183f	(*) Remove the crawl spec abstraction The crawl spec abstraction was used to upload lists of domains into the system for future crawling. This was fairly clunky, and it was difficult to understand what was going to be crawled. Some time ago, a new domains listing view was added to the control view that allows direct access to the domains table. This is much preferred and means the operator can directly manage domains without specs. This commit removes the crawl spec abstraction from the code, and changes the GUI to point to the domains list instead.	2024-10-03 13:41:17 +02:00
Viktor Lofgren	23cce0c78a	Add a new function 'Live Capture' for on-demand screenshot capture The screenshots are requested by the site-service, and triggered via the site-info view.	2024-09-27 13:46:34 +02:00
Viktor Lofgren	73f973cc06	(search-query) Add pagination to search query API and the direct query-service interface	2024-09-25 14:20:59 +02:00
Viktor Lofgren	8f367d96f8	Merge branch 'master' into term-positions # Conflicts: # code/index/java/nu/marginalia/index/results/model/ids/TermIdList.java # code/processes/converting-process/java/nu/marginalia/converting/ConverterMain.java # code/processes/crawling-process/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java # code/processes/crawling-process/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java # code/processes/crawling-process/model/java/nu/marginalia/io/crawldata/CrawledDomainReader.java # code/processes/crawling-process/test/nu/marginalia/crawling/HttpFetcherTest.java # code/processes/crawling-process/test/nu/marginalia/crawling/retreival/CrawlerMockFetcherTest.java # code/services-application/search-service/java/nu/marginalia/search/svc/SearchQueryIndexService.java	2024-09-08 10:14:43 +02:00
Viktor Lofgren	ab6a4b1749	(control) Correct id value for domain addition tool	2024-09-01 12:25:15 +02:00
Viktor Lofgren	aeeb1d0cb7	(control) Add utility for adding domains from an external URL	2024-09-01 12:14:21 +02:00
Viktor Lofgren	b1bfe6f76e	(control) New view for domains Add capability to assign domains, and bulk-add new domains.	2024-08-30 17:06:48 +02:00
Viktor Lofgren	74e25370ca	(control) New view for domains Still a work in progress, but it can already be used for viewing domains	2024-08-29 15:40:40 +02:00
Viktor Lofgren	9aa8f13731	(index) Remove tcfAvgDist ranking parameter This is captured by tcfProximity already	2024-08-25 11:20:19 +02:00
Viktor Lofgren	b09e2dbeb7	(build) Fix dependency churn from testcontainers Apparently you need to pull in commons-codec now in order to run testcontainers, through spooky action at a distance.	2024-08-25 10:35:48 +02:00
Viktor Lofgren	0999f07320	(search-query) Add new ranking parameters for proximity and verbatim matches	2024-08-25 10:34:12 +02:00
Viktor Lofgren	03d5dec24c	(*) Refactor termCoherences and rename them to phrase constraints.	2024-08-15 11:02:19 +02:00
Viktor Lofgren	4264fb9f49	(query-service) Clean up qdebug UI a bit	2024-08-10 09:51:03 +02:00
Viktor Lofgren	2e89b55593	(wip) Repair qdebug utility and show new ranking details	2024-08-09 12:57:25 +02:00
Viktor Lofgren	046ffc7752	(build) Upgrade jib to 3.4.3	2024-07-31 10:39:50 +02:00
Viktor Lofgren	6d7b886aaa	(converter) Correct sort order of files in control storage GUI Previously it was sorted on a field that would switch to showing only the time whenever the date was the same as the current date, leading to a bizarre sort order where files created today were typically shown first, followed by the rest of the files with the oldest date first.	2024-07-30 19:43:27 +02:00
Viktor Lofgren	80900107f7	(restructure) Clean up repo by moving stray features into converter-process and crawler-process	2024-07-30 10:14:00 +02:00
Viktor Lofgren	aebb2652e8	(wip) Extract and encode spans data Refactoring keyword extraction to extract spans information. Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions. This is a bit of a Katamari Damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.	2024-07-27 11:44:13 +02:00
Viktor	8ed5b51a32	Merge branch 'master' into term-positions	2024-07-15 07:05:31 +02:00
Viktor Lofgren	ad3857938d	(search-api, ranking) Update with new ranking parameters Adding new ranking parameters to the API and routing them through the system, in order to permit integration of the new position data with the ranking algorithm. The change also cleans out several parameters that no longer served any purpose.	2024-07-15 04:49:40 +02:00
Viktor Lofgren	6401a513d7	(crawl) Fix onsubmit confirm dialog for single-site recrawl	2024-07-05 17:21:03 +02:00
Viktor Lofgren	d86926be5f	(crawl) Add new functionality for re-crawling a single domain	2024-07-05 15:31:55 +02:00
Viktor Lofgren	3faa5bf521	(search-query) Tidy up QueryGRPCService and IndexClient	2024-06-26 14:03:30 +02:00
Viktor Lofgren	d0d6bb173c	(control) Fix warc data http status filter default value	2024-06-17 12:40:25 +02:00
Viktor Lofgren	89aae93e60	(*) Lift jetty and guava-dependencies	2024-05-23 14:20:01 +02:00
Viktor Lofgren	59ec70eb73	(*) Clean up code related to crawl parquet inspection	2024-05-22 12:55:08 +02:00
Viktor Lofgren	365229991b	(control) Improve pagination for crawl data inspector	2024-05-21 19:44:48 +02:00
Viktor Lofgren	959a8e29ee	(control) Improve pagination for crawl data inspector	2024-05-21 19:27:25 +02:00
Viktor Lofgren	197c82acd4	(control) Add filter functionality for crawl data inspector	2024-05-21 19:05:44 +02:00
Viktor Lofgren	9539fdb53c	(control) Clean up UX for crawl data inspector	2024-05-21 18:27:24 +02:00
Viktor Lofgren	17dc00d05f	(control) Partial implementation of inspection utility for crawl data Uses duckdb and range queries to read the parquet files directly from the index partitions. UX is a bit rough but is in working order.	2024-05-20 18:02:46 +02:00
Viktor Lofgren	7d1cafc070	(control) Add skip link for navigation in control GUI	2024-05-04 12:36:44 +02:00
Viktor Lofgren	4021a0ae98	(search) Add en-US language tags to all templates	2024-05-04 11:40:59 +02:00
Viktor Lofgren	6087f9635c	(qs) Move index.html out of public directory It was put there to simulate the /public interface paradigm that is now deprecated.	2024-05-01 12:56:12 +02:00
Viktor Lofgren	2ad0bfda1e	(*) Fix boot orchestration for the services This corrects an annoying bug that had the system crash and burn on first start-up due to a race condition in service initialization, where the services were attempting to access the database before it was properly migrated. A fix was in principle already in place, but it was running too late and did not prevent attempts to access the as-yet uninitialized database. Move the first boot check into the MainClass instead of the Service constructor. The change also adds more appropriate docker dependencies to the services to fix rare errors resolving the hostname of the database.	2024-05-01 12:39:48 +02:00
Viktor Lofgren	908535a3a0	(single-service) Ensure single-service spawner can specify the node	2024-04-30 18:27:46 +02:00
Viktor Lofgren	4772e0b59d	(service) Deprecate /public prefix on HTTP Before the gRPC migration, the system would serve both public and internal requests over HTTP, but distinguish the two using path prefixes and a few HTTP headers (X-Public, X-Context) added by the reverse proxy to prevent misconfigurations. Since internal requests no longer meaningfully use HTTP, this convention is now just an obstacle, as it requires the system to always run behind a reverse proxy that rewrites the paths. The change removes the path prefix, and updates the docker templates to reflect the change. This will require a migration for existing systems.	2024-04-30 14:46:18 +02:00
Viktor Lofgren	89889ecbbd	(single-service) Skip starting Prometheus if it's not explicitly enabled	2024-04-25 17:54:07 +02:00
Viktor Lofgren	4e5f069809	(build) Migrate ssr to the new root-level Java language version setting	2024-04-25 15:08:56 +02:00
Viktor Lofgren	6690e9bde8	(service) Ensure the service discovery starts early This is necessary as we use zookeeper to orchestrate first-time startup of the services, to ensure that the database is properly migrated by the control service before anything else is permitted to start.	2024-04-25 15:08:33 +02:00
Viktor Lofgren	3952ef6ca5	(service) Let singleservice configure ports and bind addresses	2024-04-25 13:49:57 +02:00
Viktor Lofgren	32fe864a33	(build) Java 22 and its consequences has been a disaster for Marginalia Search Roll back to JDK 21 for now, and make Java version configurable in the root build.gradle The project has run into no less than three distinct show-stopping bugs in JDK22, across multiple vendors, and gradle still doesn't fully support it, meaning you need multiple JDK versions installed.	2024-04-24 14:44:39 +02:00
Viktor Lofgren	b80a83339b	(qs) Additional info in query debug UI	2024-04-24 14:44:39 +02:00
Viktor Lofgren	eb74d08f2a	(qs) Additional info in query debug UI	2024-04-24 14:44:39 +02:00
Viktor Lofgren	e79ab0c70e	(qs) Basic query debug feature	2024-04-24 14:44:39 +02:00
Viktor Lofgren	6102fd99bf	(qs) Improve logging	2024-04-24 14:44:39 +02:00
Viktor Lofgren	212d101727	(control) GUI for exporting segmentation data from a wikipedia zim	2024-04-24 14:44:17 +02:00
Viktor Lofgren	f434a8b492	(build) Upgrade jib plugin version	2024-04-16 15:25:23 +02:00
Viktor Lofgren	d2658d6f84	(sys) Add springboard service that can spawn multiple different marginalia services to make distribution easier.	2024-04-16 13:25:15 +02:00
Viktor Lofgren	fe8d583fdd	(sys) Upgrade to JDK22 This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.	2024-03-21 14:27:13 +01:00


433 Commits