Previously a special db table was used to hold domains slated for crawling, but this is deprecated, and instead now each domain has a node_affinity flag that decides its indexing state, where a value of -1 indicates it shouldn't be crawled, a value of 0 means it's slated for crawling by the next index partition to be crawled, and a positive value means it's assigned to an index partition.
The change set also adds a test case validating the modified behavior.
Still missing is a proper build, we're currently pulling in tailwind from a CDN, which is no bueno in prod.
There's also a lot of polish remaining everywhere, dead links, etc.
This update eliminates all occurrences of the OpenNLP token model from the setup script, configuration, and test files, as this model file is no longer used.
The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous order was driving complexity in the code that generates these objects, and the consumer side of things puts all this data in the same object regardless.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.
While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's *modul*ar. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.