Commit Graph

76 Commits

Author SHA1 Message Date
Viktor Lofgren
09fd0a1d0e (converter) Automatically clean stale file storage records if they disappear on disk 2023-07-24 17:04:42 +02:00
Viktor Lofgren
7470c170b1 (minor) EdgeUrl.parse() should deal with null 2023-07-24 15:06:57 +02:00
Viktor Lofgren
f91d92cccb (crawler) WIP 2023-07-20 21:05:16 +02:00
Viktor Lofgren
08ca6399ec (converter) WIP 2023-07-19 17:14:45 +02:00
Viktor Lofgren
c0b5ea0e7d Revert "Less spammy default log settings"
This reverts commit f6e2216b87.
2023-07-18 19:28:42 +02:00
Viktor Lofgren
f21a3983aa Abortable processes 2023-07-18 18:40:12 +02:00
Viktor Lofgren
f6e2216b87 Less spammy default log settings 2023-07-17 21:42:13 +02:00
Viktor Lofgren
92ed513e4f Less spammy default log settings 2023-07-17 21:41:56 +02:00
Viktor Lofgren
d7ab21fe34 (*) Refactor Control Service and processes 2023-07-17 21:20:31 +02:00
Viktor Lofgren
bca4bbb6c8 (*) Refactor MQ and MQSM 2023-07-17 13:57:32 +02:00
Viktor Lofgren
e618aa34e9 (control) Name change process->fsm, new fsm:s
* FSM for spawning processes when messages appear for them
* FSM for removing data flagged for purging
2023-07-17 12:27:27 +02:00
Viktor Lofgren
c4dd9a0547 (control) Use MQFSMs to monitor and spawn processes when messages are sent to them 2023-07-16 11:58:47 +02:00
Viktor Lofgren
5ec10634d8 (mqfsm) Abortable state machine 2023-07-15 14:12:16 +02:00
Viktor Lofgren
8b74e3aa0d (*) File Storage WIP 2023-07-14 17:08:10 +02:00
Viktor Lofgren
23169ad818 (db) Model for file storage areas 2023-07-14 11:40:05 +02:00
Viktor Lofgren
d36e36c8fd (mq) Bugfix lastNMessages; use Lists.reverse properly 2023-07-14 11:39:15 +02:00
Viktor Lofgren
948d4d5f08 (control) Clean up the number of GUI views, abortable FSM tasks 2023-07-13 17:24:21 +02:00
Viktor Lofgren
1ec6f9cde2 (mq) More robust resume and recovery logic, protection against spurious state changes, minor bugfixes 2023-07-13 14:55:45 +02:00
Viktor Lofgren
a5118fe8f1 (minor) clean-up 2023-07-12 22:46:14 +02:00
Viktor Lofgren
6c88f00a9d (mqsm) guard against spurious transitions from unexpected messages 2023-07-12 22:44:05 +02:00
Viktor Lofgren
8a53e107fa (mq) Synchronous and Asynchronous inboxes. 2023-07-12 20:12:52 +02:00
Viktor Lofgren
0ed938545b (mq) Add single-shot inbox 2023-07-12 18:41:27 +02:00
Viktor Lofgren
480abfe966 (minor) Add limit to pol count in MqPersistence, fix test 2023-07-12 18:16:23 +02:00
Viktor Lofgren
89e4343fdb (minor) Fix test 2023-07-12 18:15:50 +02:00
Viktor Lofgren
8c16a2aede (work-log, minor) Clean up code 2023-07-12 18:10:05 +02:00
Viktor Lofgren
5deec63667 (work-log) Better tests 2023-07-12 18:04:06 +02:00
Viktor Lofgren
74caf9e38a (processes) Remove forEach-constructs in favor of iterators. 2023-07-12 17:47:36 +02:00
Viktor Lofgren
77261a38cd (control, WIP) MQFSM and ProcessService are sitting in a tree
We're spawning processes from the MSFSM in control service now!
2023-07-11 17:08:43 +02:00
Viktor Lofgren
4c016b0318 Process monitoring
* Also refactored the SQL tables a bit
2023-07-11 14:46:21 +02:00
Viktor Lofgren
ec7826659a (minor) Javadoc comments for MqPersistance and MqMessageState 2023-07-10 21:52:25 +02:00
Viktor Lofgren
98b5f22104 (control) WIP control service
* Set messages to OK when received so they're cleaned up properly.
2023-07-10 21:33:57 +02:00
Viktor
cbbf60a599 Better fingerprinting (#35)
* Better fingerprinting for server tech
* Many more features in FeatureExtractor
* Blog specialization
* SiteType table
2023-07-10 18:58:43 +02:00
Viktor Lofgren
d9e6c4f266 Trial integration of MQ-FSM into index service. 2023-07-06 18:04:16 +02:00
Viktor Lofgren
f0a8ca440f MQFSM Usability WIP 2023-07-06 13:33:11 +02:00
Viktor Lofgren
d89db10645 MQFSM Usability WIP 2023-07-06 13:02:16 +02:00
Viktor Lofgren
2ae0b8c159 Message queue based state machine 2023-07-04 17:42:06 +02:00
Viktor Lofgren
31ae71c7d6 Message queue WIP 2023-07-04 14:28:14 +02:00
Viktor Lofgren
62cc9df206 Embryo of new control process
* New events and heartbeat tables in mariadb
* Refactored to a cleaner Service interface
2023-07-03 10:40:32 +02:00
Viktor Lofgren
8274e8a953 JVM flags for disabling black and block-lists. 2023-06-30 17:07:47 +02:00
Viktor Lofgren
a6a66c6d8a Improve site info for unknown domains:
* Placeholder screenshot should work
* Add a link to git-repo for submitting the site for crawling
2023-06-27 15:32:11 +02:00
Viktor Lofgren
f92d8a0975 EdgeUrl conversion to/from java.net.URL 2023-06-27 10:57:54 +02:00
Viktor Lofgren
5abaf13192 Fix serialization bug with CompressedBigString 2023-06-27 10:57:54 +02:00
Viktor Lofgren
bd2c3855ed Add bits and keywords for generator classes (docs, forum, wiki). 2023-06-23 21:35:28 +02:00
Viktor Lofgren
54c2be893b TRIVIAL: Remove unused import. 2023-06-22 17:21:47 +02:00
Viktor Lofgren
b5ef67ed28 Categorize generators by type
This is a great quality signal!
Add the type as document bitflags by category.
2023-06-22 16:04:37 +02:00
Viktor Lofgren
9455100907 Throw a custom exception when WMSA_HOME isn't found 2023-06-20 11:37:52 +02:00
Viktor Lofgren
d1a004bea6 (minor) Clean up StringPool 2023-06-19 17:58:19 +02:00
Viktor Lofgren
2cda57355a More word metadata tests 2023-05-28 11:57:06 +02:00
Viktor Lofgren
d42ab19166 Issue 5: Fix bug where some IPv6 addresses blew up domain loading. 2023-04-15 14:11:08 +02:00
Viktor Lofgren
2ab26f37b8 Bug fix for document metadata encoding that breaks year based queries. 2023-04-14 16:56:49 +02:00