Node profiles decide which actors are started, and which views are available in the control GUI. This helps keep the system organized, and hides real-time clutter from the batch-oriented nodes.
Some refactoring is still needed, but an dummy actor is in place and a process that crawls URLs from the livecapture service's RSS endpoints; that makes it all the way to being indexable.
Move the logic deciding which operation to perform into the actor, updating its state graph to incorporate a counter that runs a clean update once in a blue moon.
Add a new actor that polls an URL every 6 hours and amends the domain database with any unseen domains, flagging them to be crawled by the next crawl job.
The URLs are specified in data/scrape-urls.txt. If this file is absent, the actor shuts down.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.
While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's *modul*ar. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.