Commit Graph

658 Commits

Author SHA1 Message Date
Viktor Lofgren
fde1d0677e (search) Remove unnecessary dependencies 2023-11-06 12:56:32 +01:00
Viktor Lofgren
48986574ae (result-ranking) Use a weighted calculation of priority term importance 2023-11-06 12:56:21 +01:00
Viktor Lofgren
c7a6a71d07 (result-ranking) Use a weighted calculation of priority term importance 2023-11-06 12:48:23 +01:00
Viktor Lofgren
1847845151 Revert "(loader) Optimize INSERT statements"
This reverts commit 7cb92195d1.
2023-11-04 19:32:02 +01:00
Viktor Lofgren
7cb92195d1 (loader) Optimize INSERT statements
INSERT IGNORE is too slow.
2023-11-04 17:43:55 +01:00
Viktor Lofgren
72afa0341f duckdb connection may need to be synchronized? 2023-11-04 14:30:25 +01:00
Viktor Lofgren
0152004c42 Initial Commit Anchor Tags
* Added new (optional) model file in $WMSA_HOME/data/atags.parquet
* Converter gets a component for creating a projection of its domains onto the full atags parquet file
* New WordFlag ExternalLink
* These terms are also for now flagged as title words
* Fixed a bug where Title words aliased with UrlDomain words
* Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking
2023-11-04 14:24:17 +01:00
Viktor Lofgren
8e9698c9a0 (control/search) Add ability to suggest removing a site from random exploration
This is what most complaints have been about.
2023-11-02 15:29:49 +01:00
Viktor Lofgren
3047e2dd7c (screenshot-capture-tool) Make screenshot-capture-tool cooperate with docker 2023-11-01 16:38:55 +01:00
Viktor Lofgren
a8b9d21f2d (executor) Refine atag export logic
* Remove obviously uninteresting tags
* Omit URL schema for more sensible sorting
* Change the column order to put the source domain last
2023-11-01 13:23:14 +01:00
Viktor Lofgren
c77a5b7cb6 (control) GUI for atags export 2023-10-31 17:55:47 +01:00
Viktor Lofgren
23f2068e33 (executor) Actor for exporting anchor tag data from crawl data 2023-10-31 17:32:34 +01:00
Viktor Lofgren
ffadfb4149 (control) Use a partial template for the storage types tabs. 2023-10-31 17:12:14 +01:00
Viktor Lofgren
b7e38cfbae (control) Add exports view 2023-10-31 17:08:48 +01:00
Viktor Lofgren
659743b39c (executor) Export Data actor allocates its own storage 2023-10-31 17:04:07 +01:00
Viktor Lofgren
69758c5859 (control) Nicer redirects acknowledging actions 2023-10-31 16:26:29 +01:00
Viktor Lofgren
81bfd7e5fb (experiment) Utility for exporting atags 2023-10-31 16:10:21 +01:00
Viktor Lofgren
8f74dbdbb4 (crawler) Set more lenient parameters for recrawl 2023-10-30 11:35:30 +01:00
Viktor Lofgren
fd5a7eac87 (crawler) Exit crawler retriever on thread interrupted 2023-10-30 11:34:16 +01:00
Viktor Lofgren
6bac3c75cb (api) API documentation 2023-10-29 16:13:21 +01:00
Viktor Lofgren
5d6e0e3790 (log) Clean up logging
Don't log the PROCESS stream to executor's logs, as it will also be logged in the spawned process' log files.

Also tell the spawned process which "service" it is so that it gets a log file with a name that makes sense.
2023-10-29 15:52:17 +01:00
Viktor Lofgren
2871a326e6 (ctrl/exe) Clean up UX and code 2023-10-29 14:09:39 +01:00
Viktor Lofgren
abb42f0f36 (crawler) Fix bug in SQL statement
Arguments were in the wrong order in inserting fetching sites submitted to be crawled
2023-10-29 13:19:17 +01:00
Viktor Lofgren
f6fcb04817 (experiment) Repair the experiment runner 2023-10-27 16:16:50 +02:00
Viktor Lofgren
88f49834fd (docs) Update documentation 2023-10-27 12:45:39 +02:00
Viktor Lofgren
4415f52e18 (keyword-extraction) Fix broken test 2023-10-27 12:19:33 +02:00
Viktor Lofgren
98d742d634 (actor) Code cleanup 2023-10-27 12:19:20 +02:00
Viktor Lofgren
6c1ca10be7 (minor) code cleanup 2023-10-27 11:38:37 +02:00
Viktor Lofgren
aeaf2d546a (search) Fix broken redirect for flagging problems with websites 2023-10-27 11:20:49 +02:00
Viktor Lofgren
c7cb6664b4 (control) Indicate missing services with danger-color instead of having a distracting and constantly updating last-seen number 2023-10-26 18:05:22 +02:00
Viktor Lofgren
79adba9284 (index) Fix bug in dealing with quoted search terms 2023-10-26 16:28:23 +02:00
Viktor Lofgren
37b7f52f2c (minor) Reduce log severity for getTermMeta miss 2023-10-26 15:41:52 +02:00
Viktor Lofgren
c89e0ab255 (minor) Disable ~vlofgren specific debug test 2023-10-26 15:27:59 +02:00
Viktor Lofgren
f613f4f2df (array) Fix spurious search results
This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss.

It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.
2023-10-26 15:27:02 +02:00
Viktor Lofgren
a497e4c920 (crawler) Terminate crawler after a few hours of no progress 2023-10-26 12:49:28 +02:00
Viktor Lofgren
0f637fb722 (logging) Better logging configurations 2023-10-26 12:48:10 +02:00
Viktor Lofgren
abbadc92a0 (exdecutor) Prevent TriggerAdjacencyCalculationActor from showing up in the actions tab when it isn't running 2023-10-25 21:25:07 +02:00
Viktor Lofgren
97fcbdd6d9 (control) Move storage actions into the actions tab
* Also disable annoying CSS animations
2023-10-25 21:23:56 +02:00
Viktor Lofgren
d7686b665e Refactoring
* Encyclopedia sideloader; permit providing base URL.
* Storage base shows node id in GUI
* ProcessLivenessMonitorActor restarts automatically
* Clean-up of outbox code
2023-10-25 18:51:02 +02:00
Viktor Lofgren
5de41a3a7f (search-service) Show node affinity in site info tab 2023-10-25 12:44:48 +02:00
Viktor Lofgren
84cdac83d6 (control) Move message queue monitor to control 2023-10-24 16:44:28 +02:00
Viktor Lofgren
436a55ee1e (control) Render UUID tooltip with dashes. 2023-10-24 16:37:40 +02:00
Viktor Lofgren
313cc2965c (index-creation) Print whether full or prio is created
Previous state of saying reverse index for both was pretty confusing.
2023-10-24 16:23:10 +02:00
Viktor Lofgren
95f74c5ea7 (control) Filter out heartbeats that are stopped 2023-10-24 16:09:28 +02:00
Viktor Lofgren
8d1c3c754d Testing development flow with adding a ~tilde search filter 2023-10-24 15:35:15 +02:00
Viktor Lofgren
72152f9d80 Fix bug in handling js parameters 2023-10-24 15:10:02 +02:00
Viktor Lofgren
ebd365a128 Fix exception 2023-10-24 15:04:12 +02:00
Viktor Lofgren
0406e76889 (api) Remove logging cruft 2023-10-24 13:39:05 +02:00
Viktor Lofgren
c2b28c0f8d (api) Trial streaming API 2023-10-24 13:26:46 +02:00
Viktor Lofgren
9aa5038756 (search) Remove unnecessary filtering operation 2023-10-24 11:43:47 +02:00
Viktor Lofgren
a860f8f1a8 (index/qs) GRPC API for better query peformance 2023-10-24 11:38:07 +02:00
Viktor Lofgren
487c016a32 (qs) Speed 2023-10-23 14:03:09 +02:00
Viktor Lofgren
e4bddb4993 (control) Better UUID accessibility 2023-10-23 12:53:53 +02:00
Viktor Lofgren
731afcb864 (qs) Parallel execution 2023-10-23 12:06:03 +02:00
Viktor Lofgren
efb73ff4e7 (qs) Don't blow up if an index node isn't responsive 2023-10-23 11:53:18 +02:00
Viktor Lofgren
2ed2f35a9b (actor) Rewrite of the actor prototype class using record pattern matching 2023-10-23 10:18:20 +02:00
Viktor Lofgren
119151cad3 (converter) Separtion of concerns 2023-10-22 14:35:33 +02:00
Viktor Lofgren
758f9b5aa5 (converter) Get UUID pips out of the models
Rendering concerns shouldn't be in the models, it's poor separation of concerns and very difficult to follow.
2023-10-22 14:24:52 +02:00
Viktor Lofgren
e06a8c1de2 (converter) Put upper limit on number of worker threads. 2023-10-22 14:03:09 +02:00
Viktor Lofgren
29ce8ca0cf (db) Reduce db pool size
This is a temporary thing
2023-10-22 14:03:09 +02:00
Viktor Lofgren
eb4158df0b (control) Fix start/stop FSM endpoints 2023-10-22 14:03:09 +02:00
Viktor Lofgren
12fda1a36b (control) Temporarily re-writing the data balancer to get it to work in prod
Need to clean this up later.
2023-10-22 14:03:09 +02:00
Viktor Lofgren
e927f99777 (control) JSON serializes Map<Integer> to Map<Double> and Java gets confused 2023-10-21 16:24:20 +02:00
Viktor Lofgren
044bcf55bd (control) Fix SQL in rebalance actor 2023-10-21 16:13:37 +02:00
Viktor Lofgren
e475af9f49 (control) Initialize controlActorService 2023-10-21 16:06:53 +02:00
Viktor Lofgren
c6abcd91fa (control) Better use of FS states, fix bug with start/stop actors 2023-10-20 16:37:49 +02:00
Viktor Lofgren
10fc489822 (converter) More robust filename resolution 2023-10-20 14:16:03 +02:00
Viktor Lofgren
d76d926c38 (control/executor) Add new configuration options for node
It's now possible to configure prod instance to not retain processed data.
2023-10-20 14:05:19 +02:00
Viktor Lofgren
2b3c167845 (controller) Additional configuration options for node 2023-10-20 13:13:36 +02:00
Viktor Lofgren
1d75b974b5 (loader bugfix) Set DOMAIN_METADATA appropriately 2023-10-20 13:03:27 +02:00
Viktor Lofgren
584bb3a648 (fs) interface cleanup 2023-10-20 12:24:18 +02:00
Viktor Lofgren
7b5ec6b98f (executor-service) Embed dist/ in executor-service's docker image 2023-10-19 17:48:34 +02:00
Viktor Lofgren
23526f6d1a (executor) Executor service now pulls DomainType list for CRAWL on "recrawl"
This is an automatic integration with the submit-site repo on github and also
crawl-queue.
2023-10-19 17:48:34 +02:00
Viktor Lofgren
809b3ee023 (control) Update GUI for crawl specs. They are now less important than they were before. 2023-10-19 17:48:34 +02:00
Viktor Lofgren
23f0c79fba (control) GUI for data sets/domain types. 2023-10-19 17:48:34 +02:00
Viktor Lofgren
81dd3809e9 (*) WIP Add node affinity to EC_DOMAIN
Very messy commit due to fractalline yak shaving
2023-10-19 17:48:34 +02:00
Viktor Lofgren
2bf0c4497d (*) Tool for unfcking old crawl data so that it aligns with the new style IDs 2023-10-19 17:48:34 +02:00
Viktor Lofgren
978550f809 (executor-service) Retire features-convert and move the corresponding packages into the executor service. 2023-10-16 15:43:46 +02:00
Viktor Lofgren
84fea0fd05 (node) Nodes auto-start their monitor actors. 2023-10-16 15:33:22 +02:00
Viktor Lofgren
2df3e0f881 (node) Nodes auto-configure on start-up instead of requiring manual configuration. 2023-10-16 14:46:35 +02:00
Viktor Lofgren
c98117f69d (actor) FS monitor should pick up stuff in BACKUP as well. 2023-10-16 14:37:36 +02:00
Viktor Lofgren
ede5d1f890 (actor) Give process spawners more easily recognizable names. 2023-10-16 14:19:00 +02:00
Viktor Lofgren
39911e3acd (control) Fix incorrect storage base and clean up GUI for data 2023-10-16 13:30:26 +02:00
Viktor Lofgren
3d1c15ef99 (client) Refactor liveness monitor 2023-10-16 12:34:01 +02:00
Viktor Lofgren
f718482e98 (client) Fix tests 2023-10-16 12:12:16 +02:00
Viktor Lofgren
8dafd13cd7 (client) Fix executor tests 2023-10-16 12:02:57 +02:00
Viktor Lofgren
0b19b28a64 (file-storage) Delete unused code 2023-10-16 12:02:57 +02:00
Viktor Lofgren
c245f7ce3a (control) Bootstrapify review-domains and search-to-ban views. 2023-10-15 22:04:23 +02:00
Viktor Lofgren
607d647483 (control) Remove services listing view 2023-10-15 21:48:55 +02:00
Viktor Lofgren
9a38a455c9 (control/exec) File listings in control GUI 2023-10-15 19:15:44 +02:00
Viktor Lofgren
16e0738731 (*) Get multi-node routing working. 2023-10-15 18:38:30 +02:00
Viktor Lofgren
eacbf87979 (control) New list and form for index nodes. 2023-10-14 21:46:52 +02:00
Viktor Lofgren
108b4cb648 (service) Keep disabled multi-noded services dormant when they are configured to be disabled. 2023-10-14 20:58:55 +02:00
Viktor Lofgren
a9dff407a1 (config/db) Clean up migrations 2023-10-14 20:34:03 +02:00
Viktor Lofgren
9e26109e36 (reverse-index) Don't always POST 2023-10-14 16:48:29 +02:00
Viktor Lofgren
6308a8dfcd (control) Node configuration 2023-10-14 16:47:52 +02:00
Viktor Lofgren
4baf9527d7 (*) WIP Control GUI redesign, executor-service, multi-node mq
This turned out to be very difficult to do in small isolated steps.

* Design overhaul of the control gui using bootstrap
* Move the actors out of control-service into to a new executor-service, that can be run on multiple nodes
* Add node-affinity to message queue
2023-10-14 12:08:43 +02:00
Viktor Lofgren
199c459697 (*) Add node-affinity to services, processes and file storage. 2023-10-10 12:32:22 +02:00
Viktor Lofgren
61288c5e68 (service, client) First steps towards multiple nodedness 2023-10-09 22:13:27 +02:00
Viktor Lofgren
8375237de5 (converter) Add special keyword for websites with a tilde url. 2023-10-09 17:02:32 +02:00