MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 13:19:02 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	47dfbacb00	(conf) Introduce a new concept of node profiles. Node profiles decide which actors are started, and which views are available in the control GUI. This helps keep the system organized, and hides real-time clutter from the batch-oriented nodes.	2024-11-20 18:15:22 +01:00
Viktor Lofgren	a91ab4c203	(live-crawler) Crude first-try process for live crawling #WIP Some refactoring is still needed, but a dummy actor is in place, along with a process that crawls URLs from the livecapture service's RSS endpoints; that makes it all the way to being indexable.	2024-11-19 19:35:01 +01:00
Viktor Lofgren	a456ec9599	(feed) Use the message queue to permit the feeds service to tell the calling actor when it's finished	2024-11-10 18:30:28 +01:00
Viktor Lofgren	a2bc9a98c0	(feed) Use the message queue to permit the feeds service to tell the calling actor when it's finished	2024-11-10 17:45:20 +01:00
Viktor Lofgren	e24a98390c	(feed) Update API to allow specifying clean vs refresh update. Move the logic deciding which operation to perform into the actor, updating its state graph to incorporate a counter that runs a clean update once in a blue moon.	2024-11-09 18:43:47 +01:00
Viktor Lofgren	6f858cd627	(feed) Decrease update interval to 24 hours	2024-11-09 18:17:51 +01:00
Viktor Lofgren	bfeb9a4538	(feeds) Retire feedlot the feed bot, move RSS capture into the live-capture service	2024-11-09 17:56:43 +01:00
Viktor Lofgren	938431e514	(scrape-feeds-actor) Add deduplication of insertion data. To avoid unnecessary db churn, the domains to be added are put in a set instead of a list, ensuring that they are unique.	2024-09-28 14:41:14 +02:00
Viktor Lofgren	b2de3c70fa	(scrape-feeds-actor) Add explicit commit in case it's disabled	2024-09-28 14:36:57 +02:00
Viktor Lofgren	596a7fb4ea	(actor) Disable the feed scraper on all nodes but the first	2024-09-28 12:36:16 +02:00
Viktor Lofgren	c3f726a01f	(actor) Add a feed scraping actor. Add a new actor that polls a URL every 6 hours and amends the domain database with any unseen domains, flagging them to be crawled by the next crawl job. The URLs are specified in data/scrape-urls.txt. If this file is absent, the actor shuts down.	2024-09-28 12:33:29 +02:00
Viktor Lofgren	1d34224416	(refac) Remove src/main from all source code paths. Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one. While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's modular. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.	2024-02-23 16:13:40 +01:00

12 Commits