Commit Graph

315 Commits

Author SHA1 Message Date
Viktor Lofgren
4c62065e74 (install) Add two separate templates for the install script
One template is for the full Marginalia Search style install, and the other is for a barebones install with no Marginalia-related fluff.
2024-01-13 18:27:42 +01:00
Viktor Lofgren
d28fc99119 (MainClass) ensure logging isn't loaded before service name is known
This causes logs all to have names like ${sys:service-name}, instead of the service name...
2024-01-13 18:19:50 +01:00
Viktor Lofgren
7c6e18f7a7 (*) Overhaul settings and properties
Use a system.properties file to configure the system.  This is loaded statically by MainClass or ProcessMainClass.  Update the property names to be more consistent, and update the documentations to reflect the changes.
2024-01-13 17:12:18 +01:00
Viktor Lofgren
ecd9c35233 (control) Clean up the event log
* Generate fewer uninteresting event messages.
* Display fewer irrelevant fields in the overview table.
2024-01-13 13:28:02 +01:00
Viktor Lofgren
8dea7217a6 (control) UX fixes, node GUI doesn't break when an executor service goes offline. 2024-01-13 12:17:30 +01:00
Viktor Lofgren
708a741960 (test) Clean up test usage of migrations
Several tests were manually running migrations in a large copy-paste blob of code.  This makes the test less useful as it's possible to break the code while keeping the tests green by introducing a new migration that never gets run in the tests, and it's also difficult to reason about what the tests are doing.

A new test helper library is introduced with a TestMigrationLoader that can both run Flyway migrations, or load specific migrations in the cases a specific set of migrations need to be loaded.   Existing tests are migrated to use the new code.
2024-01-12 15:55:50 +01:00
Viktor Lofgren
0caef1b307 (warc) Toggle for saving WARC data
Add a toggle for saving the WARC data generated by the search engine's crawler.  Normally this is discarded, but for debugging or archival purposes, retaining it may be of interest.

The warc files are concatenated into larger archives, up to about 1 GB each.
An index is also created containing filenames, domain names, offsets and sizes
to help navigate these larger archives.

The warc data is saved in a directory warc/ under the crawl data storage.
2024-01-12 13:45:14 +01:00
Viktor Lofgren
264e2db539 (control) UX-improvements for control service
This commit overhauls a lot of the UX for the control service, adding a new actions menu to the nodes views.  It has many small tweaks to make the work flow better.

It also adds a new /uploads directory in each index node, from which sideloaded data can be selected.  This is a bit of a breaking change, as this directory needs to exist in each index node.
2024-01-12 12:33:05 +01:00
Viktor Lofgren
734996002c (*) install script for deploying Marginalia outside the codebase
The changeset also makes the control service responsible for flyway migrations.  This helps reduce the number of places the database configuration needs to be spread out.  These automatic migrations can be disabled with -DdisableFlyway=true.

The commit also adds curl to the docker container, to enable docker health checks and interdependencies.
2024-01-11 12:40:03 +01:00
Viktor Lofgren
55c9501e57 (search) Serve proper content type for static resources 2024-01-10 10:46:51 +01:00
Viktor Lofgren
f592c9f04d (search) Fix acknowledgement page for domain complaints rendering as plain text
This was caused by incorrect usage of the renderInto() function, which was always buggy and should never be used.  This method is removed with this change.
2024-01-10 09:26:34 +01:00
Viktor Lofgren
fbad625126 (linkdb) Add delegating implementation of DomainLinkDb
This facilitates switching between SQL and File-backed implementations on the fly while migrating from one to the other.
2024-01-08 19:56:33 +01:00
Viktor Lofgren
edc1acbb7e (*) Replace EC_DOMAIN_LINK table with files and in-memory caching
The EC_DOMAIN_LINK MariaDB table stores links between domains.  This is problematic, as both updating and querying this table is very slow in relation to how small the data is (~10 GB).  This slowness is largely caused by the database enforcing ACID guarantees we don't particularly need.

This changeset replaces the EC_DOMAIN_LINK table with a file in each index node containing 32 bit integer pairs corresponding to links between two domains.  This file is loaded in memory in each node, and can be queried via the Query Service.

A migration step is needed before this file is created in each node.   Until that happens, the actual data is loaded from the EC_DOMAIN_LINK table, but accessed as though it was a file.

The changeset also migrates/renames the links.db file to documents.db to avoid naming confusion between the two.
2024-01-08 15:53:13 +01:00
Viktor Lofgren
6aee27a3f1 (*) Fix bug in EdgeDomain where it would permit domains with a trailing period, DNS style. 2023-12-29 16:36:01 +01:00
Viktor Lofgren
a7cd490593 (minor) Remove dead code. 2023-12-19 18:58:33 +01:00
Viktor Lofgren
bde68ba48b Merge branch 'master' into asn-info 2023-12-17 14:00:23 +01:00
Viktor Lofgren
bf44805e69 (*) Rename EdgeDomain$domain into topDomain
This variable had a very confusing name, and was dangerously easy to use in the wrong place with the result of getting something that only works as expected half the time.

Ideally this class needs an overhaul, the assumptions it makes about domain names aren't great.
2023-12-17 14:00:07 +01:00
Viktor Lofgren
bcad6492d6 (sideloader) Fix integration problems with sideloaders
In encyclopedia, add a class "mw-content-text" that the WikiSpecialization class is looking for during pruning to give the articles a more fair treatment.

Also add generator keywords based on the generator type provided, to ensure that these documents show up in appropriate filters.

Further, add a new document flag value 'Sideloaded' to be able to distinguish these entries.
2023-12-17 13:28:17 +01:00
Viktor Lofgren
d7bd540683 (*) Replace the ip2location IP geolocation data with ASN information from apnic.net.
Doesn't really make sense to use ip2location as a middle man for information that is already freely available...
2023-12-16 21:55:04 +01:00
Viktor Lofgren
b74a3ebd85 (crawler) WIP integration of WARC files into the crawler process.
At this stage, the crawler will use the WARCs to resume a crawl if it terminates incorrectly.

This is a WIP commit, since the warc files are not fully incorporated into the work flow, they are deleted after the domain is crawled.

The commit also includes fairly invasive refactoring of the crawler classes, to accomplish better separation of concerns.
2023-12-11 19:32:58 +01:00
Viktor Lofgren
45987a1d98 Merge branch 'master' into warc 2023-12-11 14:32:35 +01:00
Viktor Lofgren
8ef34883a8 (search) Move site information out of the search service and into assistant.
This reduces the impact of restarting the search service, as the site information takes a few minutes to load during which it's not available.  It also permits exposing this information via API in the future if there is interest in this.

The assistant service was also modified to do a late load of the suggestions trie, as this is a major contributor to its start-up time.

Finally, some changes were made to the client library, a new get() method was added that takes a TypeToken to allow deserialization of generics such as List<Foo>, and the scheduler was also modified to use virtual threads.
2023-12-09 16:30:06 +01:00
Viktor Lofgren
072b5fcd12 Implement Warc-recording wrapper for OkHttp3 client
This is a first step of using WARC as an intermediate flight recorder style step in the crawler, ultimately aimed at being able to resume crawls if the crawler is restarted.  This component is currently not hooked into anything.

The OkHttp3 client wrapper class 'WarcRecordingFetcherClient' was implemented for web archiving. This allows for the recording of HTTP requests and responses. New classes were introduced, 'WarcDigestBuilder', 'IpInterceptingNetworkInterceptor', and 'WarcProtocolReconstructor'.

The JWarc dependency was added to the build.gradle file, and relevant unit tests were also introduced. Some HttpFetcher-adjacent structural changes were also done for better organization.
2023-12-08 13:49:16 +01:00
Viktor Lofgren
280132dad0 (search) Fix script loading for mobile support 2023-12-02 17:06:40 +01:00
Viktor Lofgren
7c8a60b8cf (search) Site info view is mostly done
Also optimize the rendering a bit to avoid having to allocate huge string buffers, writing directly to Spark's response instead.
2023-12-02 17:06:40 +01:00
Viktor Lofgren
a258f0af7a (search) Refactor search parameters to include query 2023-12-02 17:06:40 +01:00
Viktor Lofgren
01621c6344 (renderer) Make helpers configurable on a by-service basis. 2023-12-02 17:06:40 +01:00
Viktor Lofgren
1dafa0c74d (mqapi/control) Repair repartition endpoint, deprecate notify endpoints.
The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId.  In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.
2023-11-27 16:01:12 +01:00
Viktor Lofgren
dd507a3808 (db) Fix migrations, bump flyway to 10.0.1
Tricky problem, creating a procedure apparently needs delimiter shenanigans in Flyway, otherwise it will truncate the END statement and mariadb will be sad.
2023-11-21 20:04:35 +01:00
Viktor Lofgren
f58a9f46be (loader) Don't truncate the entire links table on load
This behavior is an old vestige from the days of only having a single loader process.  We'd truncate the links table because doing inserts/updates was too slow.  This was also important because we had 32 bit ID, and there's a lot of links between domains to go around...

Instead we delete the rows associated with the current node with a stored procedure PURGE_LINKS_TABLE.

We also update the PRIMARY KEY to a BIGINT.  We'll need to load the data in excess of billion times to hit an ID rollover, so it'll be fine.
2023-11-16 10:30:12 +01:00
Viktor Lofgren
858357a246 (metrics) Get prometheus up out of disrepair
* Fix bad labels
* Add nodeId where appropriate
* Hopefully fix histogram buckets for index query times
2023-11-08 14:01:28 +01:00
Viktor Lofgren
7aa2f80117 (domain) id.au should be treated as a TLD 2023-11-06 19:07:47 +01:00
Viktor Lofgren
2b77184281 (converter) Integrate atags with the topology field 2023-11-06 13:46:44 +01:00
Viktor Lofgren
0152004c42 Initial Commit Anchor Tags
* Added new (optional) model file in $WMSA_HOME/data/atags.parquet
* Converter gets a component for creating a projection of its domains onto the full atags parquet file
* New WordFlag ExternalLink
* These terms are also for now flagged as title words
* Fixed a bug where Title words aliased with UrlDomain words
* Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking
2023-11-04 14:24:17 +01:00
Viktor Lofgren
659743b39c (executor) Export Data actor allocates its own storage 2023-10-31 17:04:07 +01:00
Viktor Lofgren
5d6e0e3790 (log) Clean up logging
Don't log the PROCESS stream to executor's logs, as it will also be logged in the spawned process' log files.

Also tell the spawned process which "service" it is so that it gets a log file with a name that makes sense.
2023-10-29 15:52:17 +01:00
Viktor Lofgren
0f637fb722 (logging) Better logging configurations 2023-10-26 12:48:10 +02:00
Viktor Lofgren
97fcbdd6d9 (control) Move storage actions into the actions tab
* Also disable annoying CSS animations
2023-10-25 21:23:56 +02:00
Viktor Lofgren
d7686b665e Refactoring
* Encyclopedia sideloader; permit providing base URL.
* Storage base shows node id in GUI
* ProcessLivenessMonitorActor restarts automatically
* Clean-up of outbox code
2023-10-25 18:51:02 +02:00
Viktor Lofgren
436a55ee1e (control) Render UUID tooltip with dashes. 2023-10-24 16:37:40 +02:00
Viktor Lofgren
e4bddb4993 (control) Better UUID accessibility 2023-10-23 12:53:53 +02:00
Viktor Lofgren
758f9b5aa5 (converter) Get UUID pips out of the models
Rendering concerns shouldn't be in the models, it's poor separation of concerns and very difficult to follow.
2023-10-22 14:24:52 +02:00
Viktor Lofgren
29ce8ca0cf (db) Reduce db pool size
This is a temporary thing
2023-10-22 14:03:09 +02:00
Viktor Lofgren
12fda1a36b (control) Temporarily re-writing the data balancer to get it to work in prod
Need to clean this up later.
2023-10-22 14:03:09 +02:00
Viktor Lofgren
c6abcd91fa (control) Better use of FS states, fix bug with start/stop actors 2023-10-20 16:37:49 +02:00
Viktor Lofgren
d76d926c38 (control/executor) Add new configuration options for node
It's now possible to configure prod instance to not retain processed data.
2023-10-20 14:05:19 +02:00
Viktor Lofgren
2b3c167845 (controller) Additional configuration options for node 2023-10-20 13:13:36 +02:00
Viktor Lofgren
584bb3a648 (fs) interface cleanup 2023-10-20 12:24:18 +02:00
Viktor Lofgren
23526f6d1a (executor) Executor service now pulls DomainType list for CRAWL on "recrawl"
This is an automatic integration with the submit-site repo on github and also
crawl-queue.
2023-10-19 17:48:34 +02:00
Viktor Lofgren
23f0c79fba (control) GUI for data sets/domain types. 2023-10-19 17:48:34 +02:00
Viktor Lofgren
81dd3809e9 (*) WIP Add node affinity to EC_DOMAIN
Very messy commit due to fractalline yak shaving
2023-10-19 17:48:34 +02:00
Viktor Lofgren
84fea0fd05 (node) Nodes auto-start their monitor actors. 2023-10-16 15:33:22 +02:00
Viktor Lofgren
2df3e0f881 (node) Nodes auto-configure on start-up instead of requiring manual configuration. 2023-10-16 14:46:35 +02:00
Viktor Lofgren
39911e3acd (control) Fix incorrect storage base and clean up GUI for data 2023-10-16 13:30:26 +02:00
Viktor Lofgren
3d1c15ef99 (client) Refactor liveness monitor 2023-10-16 12:34:01 +02:00
Viktor Lofgren
f718482e98 (client) Fix tests 2023-10-16 12:12:16 +02:00
Viktor Lofgren
8dafd13cd7 (client) Fix executor tests 2023-10-16 12:02:57 +02:00
Viktor Lofgren
0b19b28a64 (file-storage) Delete unused code 2023-10-16 12:02:57 +02:00
Viktor Lofgren
16e0738731 (*) Get multi-node routing working. 2023-10-15 18:38:30 +02:00
Viktor Lofgren
eacbf87979 (control) New list and form for index nodes. 2023-10-14 21:46:52 +02:00
Viktor Lofgren
108b4cb648 (service) Keep disabled multi-noded services dormant when they are configured to be disabled. 2023-10-14 20:58:55 +02:00
Viktor Lofgren
a9dff407a1 (config/db) Clean up migrations 2023-10-14 20:34:03 +02:00
Viktor Lofgren
6308a8dfcd (control) Node configuration 2023-10-14 16:47:52 +02:00
Viktor Lofgren
4baf9527d7 (*) WIP Control GUI redesign, executor-service, multi-node mq
This turned out to be very difficult to do in small isolated steps.

* Design overhaul of the control gui using bootstrap
* Move the actors out of control-service into to a new executor-service, that can be run on multiple nodes
* Add node-affinity to message queue
2023-10-14 12:08:43 +02:00
Viktor Lofgren
199c459697 (*) Add node-affinity to services, processes and file storage. 2023-10-10 12:32:22 +02:00
Viktor Lofgren
61288c5e68 (service, client) First steps towards multiple nodedness 2023-10-09 22:13:27 +02:00
Viktor Lofgren
3889c4bdd9 (refactor) Remove features-search and update documentation 2023-10-09 15:12:30 +02:00
Viktor Lofgren
89c6d85f2f (query-service) Create new empty 'query-service' service 2023-10-08 17:31:50 +02:00
Viktor Lofgren
77ccab7d80 (index) Move linkdb to index from search.
This makes index complete in the sense that you can deploy an index instance and build a complete separate application on top of it, without having to go through the Marginalia-laden search service.
2023-10-08 16:48:35 +02:00
Viktor Lofgren
c51159672e (build) Move unit test configuration to root build.gradle 2023-10-04 12:46:22 +02:00
Viktor Lofgren
233b51e29e (test) flag DomainTypesTest as Slow to exclude from regular CI 2023-10-04 12:23:10 +02:00
Viktor Lofgren
13ee31770a (file storage) Make it possible to override the value returned by getFileStorage(type) with a JVM property. 2023-10-01 12:57:53 +02:00
Viktor Lofgren
ec6c9bca62 (common) Fix factual error in comments 2023-09-24 19:40:19 +02:00
Viktor Lofgren
dbe9235f3a (*) Upgrade to JDK21 with preview enabled.
... also move some common configuration into the root build.gradle-file.

Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work.  This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.
2023-09-24 10:38:59 +02:00
Viktor Lofgren
9338f35cd8 (doc) Remove confusingly outdated ER-diagrams 2023-09-21 15:08:27 +02:00
Viktor Lofgren
75f8ae2815 (file-storage) Use human-readable timestamps in the names of file storage directories 2023-09-21 13:22:53 +02:00
Viktor Lofgren
5c040f7a46 (crawl-spec) Parquetify crawl spec
* Crawl-specs are now parquet files
* Deprecate the crawl-job-extractor tool
2023-09-17 09:41:34 +02:00
Viktor Lofgren
35996d0adb (docs) Update the documentation up-to-date information 2023-09-14 11:33:36 +02:00
Viktor Lofgren
eaeb23d41e (refactor) Remove converting-model package completely 2023-09-14 11:21:44 +02:00
Viktor Lofgren
9e185e80ce (control-service) Add timestamp to file storages. 2023-09-02 14:01:04 +02:00
Viktor Lofgren
03d999444d (ldb) Re-add accidentally removed stmt.addBatch that breaks 2023-08-31 12:06:30 +02:00
Viktor Lofgren
763ed260c3 (ldb) Better handling of null pubYear 2023-08-30 23:08:27 +02:00
Viktor Lofgren
048f685073 (ldb) add OR IGNORE to insert status query
Otherwise it will sometimes fail because documents may appear more than once in error scenarios.
2023-08-30 10:34:01 +02:00
Viktor Lofgren
dd593c292c (loader) Minor optimizations and bugfixes.
* Reduce memory churn in LoaderIndexJournalWriter, fix bug with keyword mappings as well
* Remove remains of OldDomains
* Ensure LOADER_PROCESS_OPTS gets fed to the processes
* LinkdbStatusWriter won't execute batch after each added item post 100 items
2023-08-29 15:37:52 +02:00
Viktor Lofgren
39c1857c61 (heartbeat, reverse-index) Better heartbeat mocking, improved heartbeats for reverse index construction. 2023-08-29 13:07:55 +02:00
Viktor Lofgren
3101b74580 (index) Move to a lexicon-free index design
This is a system-wide change.  The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table.   This made index-construction easier, but it
also added a fairly significant RAM penalty to both the index service and the loader.

The new design moves to 64 bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices.

It also became necessary half-way through to upgrade guice as its error reporting wasn't *quite* compatible with JDK20.
2023-08-28 14:02:23 +02:00
Viktor Lofgren
194a6057dd (index,control) Recoverable index backups 2023-08-25 14:57:43 +02:00
Viktor Lofgren
e710e057e2 (db) Remove EC_URL and EC_PAGE_DATA from mariadb database 2023-08-25 13:45:03 +02:00
Viktor Lofgren
460998d512 (index) Move index construction to separate process.
This provides a much cleaner separation of concerns, and makes it possible to get rid of a lot of the gunkier parts of the index service.  It will also permit lowering the Xmx on the index service a fair bit, so we can get CompressedOOps again :D
2023-08-25 12:52:54 +02:00
Viktor Lofgren
1e6800565a (system) Remove EdgeId<T> and similar objects
They seemed like a good idea at the time, but in practice they're wasting resources and not really providing the clarity I had hoped.
2023-08-24 17:46:02 +02:00
Viktor Lofgren
c909120ae1 (search) Basic working integration of linkdb in search service 2023-08-24 17:24:56 +02:00
Viktor Lofgren
9894f37412 (index) Implement new URL ID coding scheme.
Also refactor along the way.  Really needs an additional pass, these tests are very hairy.
2023-08-24 16:44:27 +02:00
Viktor Lofgren
6a04cdfddf (loader) Implement new linkdb in loader
Deprecate the LoadUrl instruction entirely. We no longer need to be told upfront about which URLs to expect, as IDs are generated from the domain id and document ordinal.

For now, we no longer store new URLs in different domains.  We need to re-implement this somehow, probably in a different job or a as a different output.
2023-08-24 13:07:54 +02:00
Viktor Lofgren
c70670bacb (common) New UrlIdCodec class
Have a single class responsible for encoding and decoding URL ids, as it's a bit finicky and used all over.
2023-08-24 11:41:07 +02:00
Viktor Lofgren
7bb3e44a76 (common) Deprecate EdgeId and similar 2023-08-24 11:16:28 +02:00
Viktor Lofgren
b958acb76a (file-storage) New File Storage type for linkdb 2023-08-24 09:06:13 +02:00
Viktor Lofgren
b22f4fbb72 (linkdb) New Module for sqlite-backed document db 2023-08-24 09:06:13 +02:00
Viktor Lofgren
ebc84c22fb Upgrade antique lombok plugin
This permits tests to run on JDK20 environments.
2023-08-23 14:34:32 +00:00
Viktor Lofgren
aa0d256d6a Upgrade code to Java 20.
* Change language version
* Upgrade Lombok to a JDK20 compatible version
2023-08-23 13:37:49 +00:00
Viktor Lofgren
15912f31d0 (control-service) Basic GUI for deleting bad links from exploration mode 2023-08-21 18:35:26 +02:00
Viktor Lofgren
6cb784df75 (minor) Improve comment 2023-08-18 11:25:36 +02:00
Viktor Lofgren
c019a029ec (flags) Documentation and preventative bugfix 2023-08-17 17:42:31 +02:00
Viktor Lofgren
46d761f34f (language) fasttext based language filter 2023-08-16 15:48:12 +02:00
Viktor Lofgren
4598c7f40f (valuation) Penalize wordpress style kebab case urls 2023-08-16 13:11:24 +02:00
Viktor Lofgren
e7192a9cad (mq) Refactor mq and actor library and move it to libraries out of common 2023-08-15 10:53:23 +02:00
Viktor Lofgren
8210e49b4e (control) Helpful tooltips for the Actor table. 2023-08-13 12:55:56 +02:00
Viktor Lofgren
d6b8b38955 (db) Add indices on SERVICE_EVENTLOG 2023-08-12 15:00:15 +02:00
Viktor Lofgren
6483308bb0 (sql) Update default value for DOMAIN_SELECTION_TYPE 2023-08-11 14:01:15 +02:00
Viktor Lofgren
7440da240d (blacklist) Fix broken SQL migration 2023-08-11 13:33:35 +02:00
Viktor Lofgren
4f8048be31 (blacklist) Blacklist management 2023-08-10 15:40:07 +02:00
Viktor Lofgren
807fb2d052 (service) Task heartbeat creates event log entries 2023-08-09 15:15:16 +02:00
Viktor Lofgren
ce293029c7 (converter) Treat adtech tracking as advertisement. 2023-08-09 14:23:53 +02:00
Viktor Lofgren
b5ed21be21 (mq) MqPersistence no longer relies on autoCommit being enabled 2023-08-09 14:23:22 +02:00
Viktor Lofgren
251fc63b42 (*) Fix merge gore 2023-08-09 13:33:28 +02:00
Viktor Lofgren
afad4f5ebb (*) last touches 2023-08-07 12:59:33 +02:00
Viktor Lofgren
4ab1cd9502 (*) last touches 2023-08-07 12:57:44 +02:00
Viktor
52e2ab45bf
Merge branch 'master' into master-control-program 2023-08-07 12:53:43 +02:00
Viktor Lofgren
715d61dfea (mq) Fix bug in notice handling where they were registered on the wrong name 2023-08-05 14:45:04 +02:00
Viktor Lofgren
c2b45bec8d (mq) Rename notify to sendNotice to avoid name clash with the java object function 2023-08-05 14:45:04 +02:00
Viktor Lofgren
cdfe284f9a (file storage) File Storage Type for EXPORT data
(file storage) File Storage Type for EXPORT data
2023-08-05 14:45:03 +02:00
Viktor Lofgren
624b78ec3a (heartbeat) Task heartbeats 2023-08-04 14:40:06 +02:00
Viktor Lofgren
f01f608474 (blacklist) Support blacklists with subdomain 2023-08-03 17:58:52 +02:00
Viktor Lofgren
659d2134ba (file-storage) Deprecate mustClean flag 2023-08-01 22:32:30 +02:00
Viktor Lofgren
867410c66b (file-storage) Automatic file storage discovery via manifest file 2023-08-01 18:05:43 +02:00
Viktor Lofgren
58556af6c7 (db) Use flwyay for database migrations. 2023-08-01 17:08:42 +02:00
Viktor Lofgren
2e29038ecd (db) Fix broken insert statement, move file storage defaults to a separate file. 2023-08-01 15:50:08 +02:00
Viktor Lofgren
c1ea60b399 (db) Default values for storage base 2023-08-01 15:05:04 +02:00
Viktor Lofgren
2f8488610a (loader) Fix bug where trailing deferred domain meta inserts weren't executed 2023-07-31 14:23:23 +02:00
Viktor Lofgren
d95f01b701 (control) Reduce log spam in control svc 2023-07-31 14:21:06 +02:00
Viktor Lofgren
c9d7635370 (control) Aborting an actor that waits on a process request terminates the running job.
(control) Aborting an actor that waits on a process request terminates the running job.
2023-07-31 14:21:06 +02:00
Viktor Lofgren
6b5fb0f841 (control) Disable the start button for actors that aren't directly initializable.
(control) Disable the start button for actors that aren't directly initializable.
2023-07-31 14:21:00 +02:00
Viktor Lofgren
5411950b87 (minor) Tidy up EdgeDomain class a bit, no functional difference 2023-07-31 10:31:29 +02:00
Viktor Lofgren
5c071ce4d3 (crawler) Clean up the code and remove unnecessary logging 2023-07-30 16:53:39 +02:00
Viktor Lofgren
866db6c63f (control) Dialog for updating message state; clean up file view. 2023-07-28 22:02:05 +02:00
Viktor Lofgren
77d5e39fe0 Make processed data Serializable 2023-07-28 18:11:19 +02:00
Viktor Lofgren
27e781761d (mq single shot inbox) Flag messages as OK if there is no recipient 2023-07-28 12:04:23 +02:00
Viktor Lofgren
92cac52813 (mq) Add indexes to MESSAGE_QUEUE 2023-07-28 12:03:51 +02:00
Viktor Lofgren
507f26ad47 (converter) Refactor converter to not keep instructions list in RAM.
(converter) Refactor converter to not keep instructions list in RAM.

(converter) Refactor converter to not keep instructions list in RAM.
2023-07-25 22:06:46 +02:00
Viktor Lofgren
09fd0a1d0e (converter) Automatically clean stale file storage records if they disappear on disk 2023-07-24 17:04:42 +02:00
Viktor Lofgren
7470c170b1 (minor) EdgeUrl.parse() should deal with null 2023-07-24 15:06:57 +02:00
Viktor Lofgren
f91d92cccb (crawler) WIP 2023-07-20 21:05:16 +02:00
Viktor Lofgren
08ca6399ec (converter) WIP 2023-07-19 17:14:45 +02:00
Viktor Lofgren
c0b5ea0e7d Revert "Less spammy default log settings"
This reverts commit f6e2216b87.
2023-07-18 19:28:42 +02:00
Viktor Lofgren
f21a3983aa Abortable processes 2023-07-18 18:40:12 +02:00
Viktor Lofgren
f6e2216b87 Less spammy default log settings 2023-07-17 21:42:13 +02:00
Viktor Lofgren
92ed513e4f Less spammy default log settings 2023-07-17 21:41:56 +02:00
Viktor Lofgren
d7ab21fe34 (*) Refactor Control Service and processes 2023-07-17 21:20:31 +02:00
Viktor Lofgren
bca4bbb6c8 (*) Refactor MQ and MQSM 2023-07-17 13:57:32 +02:00
Viktor Lofgren
e618aa34e9 (control) Name change process->fsm, new fsm:s
* FSM for spawning processes when messages appear for them
* FSM for removing data flagged for purging
2023-07-17 12:27:27 +02:00
Viktor Lofgren
c4dd9a0547 (control) Use MQFSMs to monitor and spawn processes when messages are sent to them 2023-07-16 11:58:47 +02:00