MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 21:29:00 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	51e46ad2b0	(refac) Move export tasks to a process and clean up process initialization for all ProcessMainClass descendants. Since some of the export tasks have been memory hungry, sometimes killing the executor-services, they've been moved to a separate process that can be given a larger Xmx. While doing this, the ProcessMainClass was given utilities for the boilerplate surrounding receiving mq requests and responding to them; some effort was also put toward making process startup a bit more uniform. It's still a bit heterogeneous between different processes, but a bit less so for now.	2024-11-21 16:00:09 +01:00
Viktor Lofgren	47dfbacb00	(conf) Introduce a new concept of node profiles. Node profiles decide which actors are started, and which views are available in the control GUI. This helps keep the system organized, and hides real-time clutter from the batch-oriented nodes.	2024-11-20 18:15:22 +01:00
Viktor Lofgren	a91ab4c203	(live-crawler) Crude first-try process for live crawling #WIP. Some refactoring is still needed, but a dummy actor is in place, along with a process that crawls URLs from the livecapture service's RSS endpoints; that makes it all the way to being indexable.	2024-11-19 19:35:01 +01:00
Viktor Lofgren	a456ec9599	(feed) Use the message queue to permit the feeds service to tell the calling actor when it's finished	2024-11-10 18:30:28 +01:00
Viktor Lofgren	a2bc9a98c0	(feed) Use the message queue to permit the feeds service to tell the calling actor when it's finished	2024-11-10 17:45:20 +01:00
Viktor Lofgren	e24a98390c	(feed) Update API to allow specifying clean vs refresh update. Move the logic deciding which operation to perform into the actor, updating its state graph to incorporate a counter that runs a clean update once in a blue moon.	2024-11-09 18:43:47 +01:00
Viktor Lofgren	6f858cd627	(feed) Decrease update interval to 24 hours	2024-11-09 18:17:51 +01:00
Viktor Lofgren	bfeb9a4538	(feeds) Retire feedlot the feed bot, move RSS capture into the live-capture service	2024-11-09 17:56:43 +01:00
Viktor Lofgren	938431e514	(scrape-feeds-actor) Add deduplication of insertion data. To avoid unnecessary db churn, the domains to be added are put in a set instead of a list, ensuring that they are unique.	2024-09-28 14:41:14 +02:00
Viktor Lofgren	b2de3c70fa	(scrape-feeds-actor) Add explicit commit in case it's disabled	2024-09-28 14:36:57 +02:00
Viktor Lofgren	596a7fb4ea	(actor) Disable the feed scraper on all nodes but the first	2024-09-28 12:36:16 +02:00
Viktor Lofgren	c3f726a01f	(actor) Add a feed scraping actor. Add a new actor that polls a URL every 6 hours and amends the domain database with any unseen domains, flagging them to be crawled by the next crawl job. The URLs are specified in data/scrape-urls.txt. If this file is absent, the actor shuts down.	2024-09-28 12:33:29 +02:00
Viktor Lofgren	1d34224416	(refac) Remove src/main from all source code paths. Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one. While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's modular. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.	2024-02-23 16:13:40 +01:00

13 Commits