MarginaliaSearch/code/processes/website-adjacencies-calculator
Viktor Lofgren 66c1281301 (zk-registry) epic jak shaving WIP
Cleaning out a lot of old junk from the code, and one thing led to another...

* Build is improved, now constructing docker images with 'jib'.  Clean build went from 3 minutes to 50 seconds.
* The ProcessService's spawning is smarter.  Will now just spawn a java process instead of relying on the application plugin's generated outputs.
* Project is migrated to GraalVM
* gRPC clients are re-written with a neat fluent/functional style. e.g.
```
channelPool.call(grpcStub::method)
           .async(executor) // <-- optional
           .run(argument);
```
This change is primarily to allow handling ManagedChannel errors, but it turned out to be a pretty clean API overall.
* For now the project is all in on zookeeper
* Service discovery is now based on APIs and not services.  In theory this means we could ship the same code as either a monolith or a service mesh.
* To this end, began modularizing a few of the APIs so that they aren't strongly "living" in a service.  WIP!

Missing is documentation and testing, and some more breaking apart of code.
2024-02-22 14:01:23 +01:00
src (zk-registry) epic jak shaving WIP 2024-02-22 14:01:23 +01:00
build.gradle (zk-registry) epic jak shaving WIP 2024-02-22 14:01:23 +01:00
readme.md (executor-service) Embed dist/ in executor-service's docker image 2023-10-19 17:48:34 +02:00

Website Adjacencies Calculator

This job updates the website similarity table based on the data in the domain and links-tables in the URL database.

It performs a brute-force cosine similarity calculation across the entire link graph.
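As a rough illustration of what a brute-force cosine similarity pass over a link graph can look like, here is a minimal sketch. It is not the project's implementation: it assumes each domain is described by the set of domain ids that link to it (a binary vector), in which case the cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|). The class name `AdjacencySketch`, the `Adjacency` record, the `threshold` parameter and the sample ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and data layout are assumptions, not Marginalia's code.
final class AdjacencySketch {

    /** One similar pair of domains and their similarity score. */
    record Adjacency(int domainA, int domainB, double similarity) {}

    /**
     * Cosine similarity of two binary vectors, each given as the set of
     * domain ids linking to a domain: |A ∩ B| / sqrt(|A| * |B|).
     */
    static double cosine(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<Integer> smaller = a.size() <= b.size() ? a : b;
        Set<Integer> larger = smaller == a ? b : a;
        int shared = 0;
        for (int id : smaller) {
            if (larger.contains(id)) {
                shared++;
            }
        }
        return shared / Math.sqrt((double) a.size() * b.size());
    }

    /**
     * Brute force: compare every unordered pair of domains once (O(n^2) pairs)
     * and keep the pairs whose similarity reaches the threshold.
     */
    static List<Adjacency> bruteForce(Map<Integer, Set<Integer>> inboundLinks, double threshold) {
        List<Integer> ids = new ArrayList<>(inboundLinks.keySet());
        Collections.sort(ids); // deterministic pair order
        List<Adjacency> result = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                int a = ids.get(i);
                int b = ids.get(j);
                double similarity = cosine(inboundLinks.get(a), inboundLinks.get(b));
                if (similarity >= threshold) {
                    result.add(new Adjacency(a, b, similarity));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample graph: domain id -> ids of domains linking to it.
        Map<Integer, Set<Integer>> inboundLinks = new HashMap<>();
        inboundLinks.put(1, Set.of(10, 11));
        inboundLinks.put(2, Set.of(11, 12));
        inboundLinks.put(3, Set.of(20));
        for (Adjacency adj : bruteForce(inboundLinks, 0.1)) {
            System.out.println(adj);
        }
    }
}
```

The quadratic pair loop is what makes the approach brute force; a real job over a full link graph would presumably need a more compact set representation and some pruning, but those details are outside what this readme states.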

These adjacencies power the explorer service and the random-websites functionality.