MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 05:18:58 +00:00

Author	SHA1	Message	Date
Viktor Lofgren	89aae93e60	(*) Lift jetty and guava-dependencies	2024-05-23 14:20:01 +02:00
Viktor Lofgren	65b74f9cab	(registry) Fix broken test	2024-05-23 14:15:01 +02:00
Viktor Lofgren	59ec70eb73	(*) Clean up code related to crawl parquet inspection	2024-05-22 12:55:08 +02:00
Viktor Lofgren	17dc00d05f	(control) Partial implementation of inspection utility for crawl data Uses duckdb and range queries to read the parquet files directly from the index partitions. UX is a bit rough but is in working order.	2024-05-20 18:02:46 +02:00
Viktor Lofgren	b867eadbef	(big-string) Remove the unused bigstring library	2024-05-18 13:40:03 +02:00
Viktor Lofgren	b7a95be731	(search) Create a small mocking framework for running the search service in isolation.	2024-05-04 11:40:59 +02:00
Viktor Lofgren	616649f040	(logs) Fix logdir location	2024-05-04 11:40:59 +02:00
Viktor Lofgren	2ad0bfda1e	(*) Fix boot orchestration for the services This corrects an annoying bug that had the system crash and burn on first start-up due to a race condition in service initialization, where the services were attempting to access the database before it was properly migrated. A fix was in principle already in place, but it was running too late and did not prevent attempts to access the as-yet uninitialized database. Move the first boot check into the MainClass instead of the Service constructor. The change also adds more appropriate docker dependencies to the services to fix rare errors resolving the hostname of the database.	2024-05-01 12:39:48 +02:00
Viktor Lofgren	08f8b6e022	(system) Log loaded properties to the console	2024-04-30 18:29:11 +02:00
Viktor Lofgren	800ed6b1e9	(zk) Terminately immediately if zookeeper isn't found This makes debugging easier	2024-04-30 18:28:49 +02:00
Viktor Lofgren	908535a3a0	(single-service) Ensure single-service spawner can specify the node	2024-04-30 18:27:46 +02:00
Viktor Lofgren	7fe2ab6f39	(file-storage) Ensure file storage root location can be overridden when running outside of docker	2024-04-30 18:26:15 +02:00
Viktor Lofgren	4772e0b59d	(service) Deprecate /public prefix on HTTP Before the gRPC migration, the system would serve both public and internal requests over HTTP, but distinguish the two using path prefixes and a few HTTP Headers (X-Public, X-Context) added by the reverse proxy to prevent misconfigurations. Since internal requests meaningfully no longer use HTTP, this convention is just an obstacle now, adding the need to always run the system behind a reverse proxy that rewrites the paths. The change removes the path prefix, and updates the docker templates to reflect the change. This will require a migration for existing systems.	2024-04-30 14:46:18 +02:00
Viktor Lofgren	89889ecbbd	(single-service) Skip starting Prometheus if it's not explicitly enabled	2024-04-25 17:54:07 +02:00
Viktor Lofgren	c8ee354d0b	(log) Make log dir configurable via environment variable	2024-04-25 15:09:18 +02:00
Viktor Lofgren	3952ef6ca5	(service) Let singleservice configure ports and bind addresses	2024-04-25 13:49:57 +02:00
Viktor Lofgren	32fe864a33	(build) Java 22 and its consequences has been a disaster for Marginalia Search Roll back to JDK 21 for now, and make Java version configurable in the root build.gradle The project has run into no less than three distinct show-stopping bugs in JDK22, across multiple vendors, and gradle still doesn't fully support it, meaning you need multiple JDK versions installed.	2024-04-24 14:44:39 +02:00
Viktor Lofgren	6efc0f21fe	(index) Clean up data model The change set cleans up the data model for the term-level data. This used to contain a bunch of fields with document-level metadata. This data-duplication means a larger memory footprint and worse memory locality. The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking. This is again an effort to improve memory locality.	2024-04-24 14:44:39 +02:00
Viktor Lofgren	6a7a7009c7	(convert) Initial integration of segmentation data into the converter's keyword extraction logic	2024-04-24 14:44:17 +02:00
Viktor Lofgren	3c75057dcd	(qs) Retire NGramBloomFilter, integrate new segmentation model instead	2024-04-24 14:44:17 +02:00
Viktor Lofgren	8c559c8121	(conf) Add additional logic for discovering system root	2024-04-16 12:37:18 +02:00
Viktor Lofgren	fe8d583fdd	(sys) Upgrade to JDK22 This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.	2024-03-21 14:27:13 +01:00
Viktor Lofgren	57e6a12d08	(registry) Correct registerMonitor() behavior The previous behavior would listen to too many changes, and based on zookeeper and not curator assumptions about behavior, add an additional monitor on each invocation of each monitor, (which always trigger on service state changes), leading to each monitor re-registering and effectively doubling monitors in numbers whenever a service stopped or started, which in turn meant a lot of bizarre thrashing behavior even on changes in services that don't explicitly talk to each other. This re-registering behavior is no longer done.	2024-03-06 12:22:15 +01:00
Viktor Lofgren	46423612e3	(refac) Merge service-discovery and service modules Also adds a few tests to the server/client code.	2024-03-03 10:49:23 +01:00
Viktor Lofgren	144f967dbf	(misc) Tweak pool sizes	2024-02-28 16:23:02 +01:00
Viktor Lofgren	b31c9bb726	(docs) Update process docs	2024-02-28 15:21:33 +01:00
Viktor Lofgren	c0820b5e5c	(docs) Update service docs	2024-02-28 15:19:31 +01:00
Viktor Lofgren	65b8a1d5d9	(grpc) Reduce error spam	2024-02-28 14:44:48 +01:00
Viktor Lofgren	a0648844fb	(grpc) Reduce error spam	2024-02-28 14:35:29 +01:00
Viktor Lofgren	c4a27003c6	(docs) Fix formatting	2024-02-28 14:22:57 +01:00
Viktor Lofgren	86bbc1043e	(service) Clean up thread pool creation	2024-02-28 14:06:32 +01:00
Viktor Lofgren	a8ec59eb75	(conf) Add migration warning when ZOOKEEPER_HOSTS is not set.	2024-02-28 12:09:38 +01:00
Viktor Lofgren	9f1649636e	Clean up documentation and rename `domain-links` to `link-graph`	2024-02-28 11:40:39 +01:00
Viktor Lofgren	3a65fe8917	Add offload executor to GrpcChannelPoolFactory	2024-02-27 22:08:39 +01:00
Viktor Lofgren	e696fd9e92	(docs) Begin un-fucking the docs after refactoring	2024-02-27 21:22:21 +01:00
Viktor Lofgren	eaf836dc66	(service/grpc) Reduce thread count Netty and GRPC by default spawns an incredible number of threads on high-core CPUs, which amount to a fair bit of RAM usage. Add custom executors that throttle this behavior.	2024-02-27 21:22:21 +01:00
Viktor Lofgren	dbf64b0987	(logs) Add the option for json logging	2024-02-27 21:22:20 +01:00
Viktor Lofgren	ff0ef1eebc	(cleanup) Minor cleanups	2024-02-24 15:33:56 +01:00
Viktor Lofgren	1d34224416	(refac) Remove src/main from all source code paths. Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one. While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's modular. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.	2024-02-23 16:13:40 +01:00
Viktor Lofgren	2201b1a506	(refac) Clean up code issues	2024-02-23 11:39:19 +01:00
Viktor Lofgren	5cdb07023b	(refac) Clean up unused imports	2024-02-23 11:27:20 +01:00
Viktor Lofgren	6357d30ea0	Clean up docs	2024-02-22 19:53:20 +01:00
Viktor Lofgren	8d4ef982d0	Clean up docs	2024-02-22 19:37:59 +01:00
Viktor Lofgren	4740156cfa	Clean up docs	2024-02-22 18:18:58 +01:00
Viktor Lofgren	085137ca63	* Extract the index functionality	2024-02-22 17:31:25 +01:00
Viktor Lofgren	66c1281301	(zk-registry) epic jak shaving WIP Cleaning out a lot of old junk from the code, and one thing lead to another... * Build is improved, now constructing docker images with 'jib'. Clean build went from 3 minutes to 50 seconds. * The ProcessService's spawning is smarter. Will now just spawn a java process instead of relying on the application plugin's generated outputs. * Project is migrated to GraalVM * gRPC clients are re-written with a neat fluent/functional style. e.g. ```channelPool.call(grpcStub::method) .async(executor) // <-- optional .run(argument); ``` This change is primarily to allow handling ManagedChannel errors, but it turned out to be a pretty clean API overall. * For now the project is all in on zookeeper * Service discovery is now based on APIs and not services. Theoretically means we could ship the same code either a monolith or a service mesh. * To this end, began modularizing a few of the APIs so that they aren't strongly "living" in a service. WIP! Missing is documentation and testing, and some more breaking apart of code.	2024-02-22 14:01:23 +01:00
Viktor Lofgren	73947d9eca	(zk-registry) Filter out phantom addresses in the registry The change adds a hostname validation step to remove endpoints from the ZkServiceRegistry when they do not resolve. This is a scenario that primarily happens when running in docker, and the entire system is started and stopped.	2024-02-20 18:09:11 +01:00
Viktor Lofgren	a69c0b2718	(grpc-client) Fix warmup crash The warmup would sometimes crash during a cold start-up, because it could not get an API. Changed the warmup to just create a GrpcSingleNodeChannelPool for the node.	2024-02-20 18:03:57 +01:00
Viktor Lofgren	6c764bceeb	(doc) Update documentation for `service-discovery`	2024-02-20 16:09:49 +01:00
Viktor Lofgren	273aeb7bae	(doc) Update documentation with new gRPC service setup	2024-02-20 16:06:05 +01:00
Viktor Lofgren	d185858266	(minor) Add missing query parameter to ServiceEndpoint.toURL	2024-02-20 15:49:43 +01:00
Viktor Lofgren	453bd6064b	(minor) Add warm-up to GrpcMultiNodeChannelPool to speed up the initial messages Without doing this, connections would be created lazily, which is probably never desirable.	2024-02-20 15:45:16 +01:00
Viktor Lofgren	ee8e0497ae	(refac) Move service discovery injection to a separate guice module	2024-02-20 15:41:04 +01:00
Viktor Lofgren	30bdb4b4e9	(config) Clean up service configuration for IP addresses Adds new ways to configure the bind and external IP addresses for a service. Notably, if the environment variable WMSA_IN_DOCKER is present, the system will grab the HOSTNAME variable and announce that as the external address in the service registry. The default bind address is also changed to be 0.0.0.0 only if WMSA_IN_DOCKER is present, otherwise 127.0.0.1; as this is a more secure default.	2024-02-20 14:22:48 +01:00
Viktor Lofgren	2ee492fb74	(gRPC) Bind gRPC services to an interface By default gRPC it magically decides on an interface. The change will explicitly tell it what to use.	2024-02-20 14:22:47 +01:00
Viktor Lofgren	36a5c8b44c	(cleanup) Clean up code	2024-02-20 14:22:47 +01:00
Viktor Lofgren	07b625c58d	(query-client) Add support for fault-tolerant requests to single node services Adding a method importantCall that will retry a failing request on each route until it succeeds or the routes run out.	2024-02-20 14:16:05 +01:00
Viktor Lofgren	746a865106	(client) Fix handling of channel refreshes The previous code made an incorrect assumption that all routes refer to the same node, and would overwrite the route list on each update. This lead to storms of closing and opening channels whenever an update was received. The new code is correctly aware that we may talk to multiple nodes.	2024-02-20 14:14:09 +01:00
Viktor	f85ec28a16	Merge branch 'master' into service-discovery	2024-02-20 11:44:12 +01:00
Viktor Lofgren	0307c55f9f	(refac) Zookeeper for service-discovery, kill service-client lib (WIP) To avoid having to either hard-code or manually configure service addresses (possibly several dozen), and to reduce the project's dependency on docker to deal with routing and discovery, the option to use [Zookeeper](https://zookeeper.apache.org/) to manage services and discovery has been added. A service registry interface was added, with a Zookeeper implementation and a basic implementation that only works on docker and hard-codes everything. The last remaining REST service, the assistant-service, has been migrated to gRPC. This also proved a good time to clear out primordial technical debt from the root of the codebase. The 'service-client' library has been taken behind the barn and given a last farewell. It's replaced by a small library for managing gRPC channels. Since it's no longer used by anything, RxJava has been removed as a dependency from the project. Although the current state seems reasonably stable, this is a work-in-progress commit.	2024-02-20 11:41:14 +01:00
Viktor	d05c916491	Merge pull request #80 from MarginaliaSearch/ranking-algorithms Clean up domain ranking code	2024-02-18 09:52:34 +01:00
Viktor Lofgren	e61e7f44b9	(blacklist) Delay startup of blacklist To help services start faster, the blacklist will no longer block until it's loaded. If such a behavior is desirable, a method was added to explicitly wait for the data.	2024-02-18 09:23:20 +01:00
Viktor Lofgren	f9b6ac03c6	(api) Clean up incorrect error handling in GrpcChannelPool	2024-02-18 08:45:35 +01:00
Viktor Lofgren	296ccc5f8e	(blacklist) Clean up blacklist impl The domain blacklist blocked the start-up of each process that injected it, adding like 30 seconds to the start-up time in prod. This change moves the loading to a separate thread entirely. For threads or processes that require the blacklist to be definitely loaded, a helper method was added that blocks until that time.	2024-02-18 08:16:48 +01:00
Viktor Lofgren	92717a4832	(client) Refactor GrpcStubPool to handle error states Refactored the GRPC Stub Pool for better handling of channel SHUTDOWN state. Any disconnected channels are now re-created before returning the stub. The class was also renamed to GrpcChannelPool, as we no longer pool the stubs.	2024-02-17 14:42:26 +01:00
Viktor Lofgren	9ec262ae00	(domain-ranking) Integrate new ranking logic The change deprecates the 'algorithm' field from the domain ranking set configuration. Instead, the algorithm will be chosen based on whether influence domains are provided, and whether similarity data is present.	2024-02-16 20:22:01 +01:00
Viktor Lofgren	b15f47d80e	(db) Retire the EC_DOMAIN_LINK table Retire the EC_DOMAIN_LINK table as the data has been migrated off into a file instead.	2024-02-08 15:52:30 +01:00
Viktor	e8de468b0b	Make executor API talk GRPC (#75 ) * (executor-api) Make executor API talk GRPC The executor's REST API was very fragile and annoying to work with, lacking even basic type safety. Migrate to use GRPC instead. GRPC is a bit of a pain with how verbose it is, but that is probably a lesser evil. This is a fairly straightforward change, but it's also large so a solid round of testing is needed... The change set breaks out the GrpcStubPool previously residing in the QueryService, and makes it available to all clients. ServiceId.name was also renamed to avoid the very dangerous clash with Enum.name(). The boilerplate needed for grpc was also extracted into a common gradle file for inclusion into the appropriate build.gradle-files.	2024-02-08 13:01:12 +01:00
Viktor Lofgren	467ba5be20	(index-construction) Split repartition into two actions This change splits the previous 'repartition' action into two steps, one for recalculating the domain rankings, and one for recalculating the other ranking sets. Since only the first is necessary before the index construction, the rest can be delayed until after... To avoid issues in handling the shotgun blast of MqNotifications, Service was switched over to use a synchronous message queue instead of an asynchronous one. The change also modifies the behavior so that only node 1 will push the changes to the EC_DOMAIN database table, to avoid unnecessary db locks and contention with the loader. Additionally, the change fixes a bug where the index construction code wasn't actually picking up the rankings data. Since the index construction used to be performed by the index-service, merely saving the data to memory was enough for it to be accessible within the index-construction logic, but since it's been broken out into a separate process, the new process just injected an empty DomainRankings object instead. To fix this, DomainRankings can now be persisted to disk, and a pre-loaded version of the object is injected into the index-construction process.	2024-02-06 17:20:07 +01:00
Viktor Lofgren	98f3382cea	(minor) Fix test and improve error message	2024-01-31 11:53:41 +01:00
Viktor Lofgren	cae1bad274	(*) Add download-sample action, refactor file storage This changeset adds an action for downloading a set of sample data from downloads.marginalia.nu. It also refactors out some leaky abstractions out of FileStorageService. allocateTemporaryStorage has been renamed allocateStorage. The storage was never temporary in any scenario... It also doesn't take a storage base, as there was always only one valid option for this input. The allocateStorage method finds the appropriate base itself.	2024-01-25 13:36:30 +01:00
Viktor Lofgren	c088c25b09	(*) Fix broken test, clean up code	2024-01-24 12:50:41 +01:00
Viktor Lofgren	958d64720e	(control) Add a view for restarting aborted processes This will avoid having to dig in the message queue to perform this relatively common task. The control service was also refactored to extract common timestamp formatting logic out of the data objects and into the rendering.	2024-01-24 12:47:10 +01:00
Viktor Lofgren	805afad4fe	(control) New GUI for exporting crawl data samples Not going to win any beauty pageants, but this is pretty peripheral functionality.	2024-01-23 17:08:21 +01:00
Viktor Lofgren	c5760cd535	(test) Fix broken test	2024-01-20 13:39:40 +01:00
Viktor Lofgren	6271d5d544	(mq) Add relation tracking between MQ messages for easier tracking and debugging. The change adds a new column to the MESSAGE_QUEUE table called AUDIT_RELATED_ID. This field is populated transparently, using a dictionary mapping Thread IDs to Message IDs, populated by the inbox handlers. The existing RELATED_ID field has too many semantics associated with them, among other things the FSM code uses them this field in tracking state changes. The change set also improves the consistency of inbox names. The IndexClient was buggy and populated its outbox with a UUID. This is fixed. All Service2Service outboxes are now prefixed with 'pp:' to make them even easier to differentiate.	2024-01-18 15:08:27 +01:00
Viktor Lofgren	01b312f14c	(*) Make new index nodes accept queries by default It's a confusing default behavior. This was off for nodes n>1 before as a bandaid since querying indices with no data caused delays and errors. This has been fixed now, so there's no need to do this anymore!	2024-01-18 12:05:37 +01:00
Viktor Lofgren	2fe5705542	(control) GUI for ranking sets Still missing is some polish, forms don't have proper labels, validation is inconsistent, no error messages, etc.	2024-01-16 17:10:09 +01:00
Viktor Lofgren	36ad4c7466	(db) Add a new configuration object 'domain ranking set' for storing ranking parameters	2024-01-16 12:34:00 +01:00
Viktor Lofgren	4c62065e74	(install) Add two separate templates for the install script One template is for the full Marginalia Search style install, and the other is for a barebones install with no Marginalia-related fluff.	2024-01-13 18:27:42 +01:00
Viktor Lofgren	d28fc99119	(MainClass) ensure logging isn't loaded before service name is known This causes logs all to have names like ${sys:service-name}, instead of the service name...	2024-01-13 18:19:50 +01:00
Viktor Lofgren	7c6e18f7a7	(*) Overhaul settings and properties Use a system.properties file to configure the system. This is loaded statically by MainClass or ProcessMainClass. Update the property names to be more consistent, and update the documentations to reflect the changes.	2024-01-13 17:12:18 +01:00
Viktor Lofgren	ecd9c35233	(control) Clean up the event log * Generate fewer uninteresting event messages. * Display fewer irrelevant fields in the overview table.	2024-01-13 13:28:02 +01:00
Viktor Lofgren	8dea7217a6	(control) UX fixes, node GUI doesn't break when an executor service goes offline.	2024-01-13 12:17:30 +01:00
Viktor Lofgren	708a741960	(test) Clean up test usage of migrations Several tests were manually running migrations in a large copy-paste blob of code. This makes the test less useful as it's possible to break the code while keeping the tests green by introducing a new migration that never gets run in the tests, and it's also difficult to reason about what the tests are doing. A new test helper library is introduced with a TestMigrationLoader that can both run Flyway migrations, or load specific migrations in the cases a specific set of migrations need to be loaded. Existing tests are migrated to use the new code.	2024-01-12 15:55:50 +01:00
Viktor Lofgren	0caef1b307	(warc) Toggle for saving WARC data Add a toggle for saving the WARC data generated by the search engine's crawler. Normally this is discarded, but for debugging or archival purposes, retaining it may be of interest. The warc files are concatenated into larger archives, up to about 1 GB each. An index is also created containing filenames, domain names, offsets and sizes to help navigate these larger archives. The warc data is saved in a directory warc/ under the crawl data storage.	2024-01-12 13:45:14 +01:00
Viktor Lofgren	264e2db539	(control) UX-improvements for control service This commit overhauls a lot of the UX for the control service, adding a new actions menu to the nodes views. It has many small tweaks to make the work flow better. It also adds a new /uploads directory in each index node, from which sideloaded data can be selected. This is a bit of a breaking change, as this directory needs to exist in each index node.	2024-01-12 12:33:05 +01:00
Viktor Lofgren	734996002c	(*) install script for deploying Marginalia outside the codebase The changeset also makes the control service responsible for flyway migrations. This helps reduce the number of places the database configuration needs to be spread out. These automatic migrations can be disabled with -DdisableFlyway=true. The commit also adds curl to the docker container, to enable docker health checks and interdependencies.	2024-01-11 12:40:03 +01:00
Viktor Lofgren	55c9501e57	(search) Serve proper content type for static resources	2024-01-10 10:46:51 +01:00
Viktor Lofgren	f592c9f04d	(search) Fix acknowledgement page for domain complaints rendering as plain text This was caused by incorrect usage of the renderInto() function, which was always buggy and should never be used. This method is removed with this change.	2024-01-10 09:26:34 +01:00
Viktor Lofgren	fbad625126	(linkdb) Add delegating implementation of DomainLinkDb This facilitates switching between SQL and File-backed implementations on the fly while migrating from one to the other.	2024-01-08 19:56:33 +01:00
Viktor Lofgren	edc1acbb7e	(*) Replace EC_DOMAIN_LINK table with files and in-memory caching The EC_DOMAIN_LINK MariaDB table stores links between domains. This is problematic, as both updating and querying this table is very slow in relation to how small the data is (~10 GB). This slowness is largely caused by the database enforcing ACID guarantees we don't particularly need. This changeset replaces the EC_DOMAIN_LINK table with a file in each index node containing 32 bit integer pairs corresponding to links between two domains. This file is loaded in memory in each node, and can be queried via the Query Service. A migration step is needed before this file is created in each node. Until that happens, the actual data is loaded from the EC_DOMAIN_LINK table, but accessed as though it was a file. The changeset also migrates/renames the links.db file to documents.db to avoid naming confusion between the two.	2024-01-08 15:53:13 +01:00
Viktor Lofgren	6aee27a3f1	(*) Fix bug in EdgeDomain where it would permit domains with a trailing period, DNS style.	2023-12-29 16:36:01 +01:00
Viktor Lofgren	a7cd490593	(minor) Remove dead code.	2023-12-19 18:58:33 +01:00
Viktor Lofgren	bde68ba48b	Merge branch 'master' into asn-info	2023-12-17 14:00:23 +01:00
Viktor Lofgren	bf44805e69	(*) Rename EdgeDomain$domain into topDomain This variable had a very confusing name, and was dangerously easy to use in the wrong place with the result of getting something that only works as expected half the time. Ideally this class needs an overhaul, the assumptions it makes about domain names aren't great.	2023-12-17 14:00:07 +01:00
Viktor Lofgren	bcad6492d6	(sideloader) Fix integration problems with sideloaders In encyclopedia, add a class "mw-content-text" that the WikiSpecialization class is looking for during pruning to give the articles a more fair treatment. Also add generator keywords based on the generator type provided, to ensure that these documents show up in appropriate filters. Further, add a new document flag value 'Sideloaded' to be able to distinguish these entries.	2023-12-17 13:28:17 +01:00
Viktor Lofgren	d7bd540683	(*) Replace the ip2location IP geolocation data with ASN information from apnic.net. Doesn't really make sense to use ip2location as a middle man for information that is already freely available...	2023-12-16 21:55:04 +01:00
Viktor Lofgren	b74a3ebd85	(crawler) WIP integration of WARC files into the crawler process. At this stage, the crawler will use the WARCs to resume a crawl if it terminates incorrectly. This is a WIP commit, since the warc files are not fully incorporated into the work flow, they are deleted after the domain is crawled. The commit also includes fairly invasive refactoring of the crawler classes, to accomplish better separation of concerns.	2023-12-11 19:32:58 +01:00
Viktor Lofgren	45987a1d98	Merge branch 'master' into warc	2023-12-11 14:32:35 +01:00

1 2 3 4 5 ...

344 Commits