Viktor Lofgren
616649f040
(logs) Fix logdir location
2024-05-04 11:40:59 +02:00
Viktor
ac3c692b5f
Merge pull request #92 from MarginaliaSearch/no-docker-v2
...
(WIP) Changes to make the system runnable outside of docker
2024-05-01 13:00:56 +02:00
Viktor Lofgren
6087f9635c
(qs) Move index.html out of public directory
...
It was put there to simulate the /public interface paradigm that is now deprecated.
2024-05-01 12:56:12 +02:00
Viktor Lofgren
2ad0bfda1e
(*) Fix boot orchestration for the services
...
This corrects an annoying bug that had the system crash and burn on first start-up due to a race condition in service initialization, where the services were attempting to access the database before it was properly migrated.
A fix was in principle already in place, but it was running too late and did not prevent attempts to access the as-yet uninitialized database. Move the first boot check into the MainClass instead of the Service constructor.
The change also adds more appropriate docker dependencies to the services to fix rare errors resolving the hostname of the database.
2024-05-01 12:39:48 +02:00
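The ordering fix described in the commit above (check for first boot before any service is constructed) can be sketched roughly as follows. This is an illustrative stand-in only: `BootGate`, `markMigrated` and `startAfterMigration` are hypothetical names, and an in-process latch replaces whatever coordination the real system uses.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: block service construction until the database migration is signalled.
final class BootGate {
    private final CountDownLatch migrated = new CountDownLatch(1);

    // Called by whichever component performs the migration once it has finished.
    void markMigrated() {
        migrated.countDown();
    }

    // Called from the main method: the service factory is only invoked after the
    // migration signal has arrived, so no constructor can touch an unmigrated database.
    <T> T startAfterMigration(Supplier<T> serviceFactory, long timeout, TimeUnit unit) {
        try {
            if (!migrated.await(timeout, unit)) {
                throw new IllegalStateException("Database was not migrated in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for migration", e);
        }
        return serviceFactory.get();
    }
}
```

The point of the sketch is that the wait happens before the factory runs, rather than somewhere inside the constructed object.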
Viktor Lofgren
cf8b12bcdc
Update install.sh with refined service descriptions
2024-05-01 12:07:30 +02:00
Viktor Lofgren
08f8b6e022
(system) Log loaded properties to the console
2024-04-30 18:29:11 +02:00
Viktor Lofgren
800ed6b1e9
(zk) Terminate immediately if zookeeper isn't found
...
This makes debugging easier
2024-04-30 18:28:49 +02:00
Viktor Lofgren
df93e57a9a
(install) Add new option to install locally outside of docker
2024-04-30 18:28:21 +02:00
Viktor Lofgren
908535a3a0
(single-service) Ensure single-service spawner can specify the node
2024-04-30 18:27:46 +02:00
Viktor Lofgren
7fe2ab6f39
(file-storage) Ensure file storage root location can be overridden when running outside of docker
2024-04-30 18:26:15 +02:00
Viktor Lofgren
c9ee0c909e
(download-sample) Set +x permissions on directories created during this job
2024-04-30 18:25:07 +02:00
Viktor Lofgren
38aedb50ac
(converter) Do not suppress exceptions in the converter
2024-04-30 18:24:35 +02:00
Viktor Lofgren
4772e0b59d
(service) Deprecate /public prefix on HTTP
...
Before the gRPC migration, the system would serve both public and internal requests over HTTP, but distinguish the two using path prefixes and a few HTTP Headers (X-Public, X-Context) added by the reverse proxy to prevent misconfigurations.
Since internal requests no longer use HTTP in any meaningful way, this convention is now just an obstacle: it forces the system to always run behind a reverse proxy that rewrites the paths.
The change removes the path prefix, and updates the docker templates to reflect the change. This will require a migration for existing systems.
2024-04-30 14:46:18 +02:00
Viktor Lofgren
9c49e876d5
(conf) Update the setup.sh script to also be able to perform model upgrades
2024-04-29 17:46:20 +02:00
Viktor Lofgren
152007cd5c
(docker) Add missing zookeeper service to full marginalia config
2024-04-29 11:44:53 +02:00
Viktor Lofgren
70e2e41955
(crawler) Content type prober should not swallow exceptions
2024-04-27 18:27:23 +02:00
Viktor Lofgren
4d71c776fc
(crawler) Modify crawl set growth to grow small domains faster than larger ones
2024-04-27 17:36:27 +02:00
Viktor
0f41105436
Merge pull request #90 from MarginaliaSearch/run-outside-docker
...
Run outside of Docker
2024-04-25 18:55:26 +02:00
Viktor
2d49071e96
Merge branch 'master' into run-outside-docker
2024-04-25 18:53:26 +02:00
Viktor Lofgren
89889ecbbd
(single-service) Skip starting Prometheus if it's not explicitly enabled
2024-04-25 17:54:07 +02:00
Viktor Lofgren
41576e74d4
(doc) Clean up ROADMAP.md
2024-04-25 15:53:46 +02:00
Viktor Lofgren
c8ee354d0b
(log) Make log dir configurable via environment variable
2024-04-25 15:09:18 +02:00
Viktor Lofgren
4e5f069809
(build) Migrate ssr to the new root setting schema of java lang version
2024-04-25 15:08:56 +02:00
Viktor Lofgren
6690e9bde8
(service) Ensure the service discovery starts early
...
This is necessary as we use zookeeper to orchestrate first-time startup of the services, to ensure that the database is properly migrated by the control service before anything else is permitted to start.
2024-04-25 15:08:33 +02:00
Viktor Lofgren
e4b34b6ee6
(index) Correctly detect the presence of an all-virtual path through the query
2024-04-25 14:01:46 +02:00
Viktor Lofgren
3952ef6ca5
(service) Let singleservice configure ports and bind addresses
2024-04-25 13:49:57 +02:00
Viktor Lofgren
463d333846
(proj) Add ROADMAP.md
2024-04-25 13:07:35 +02:00
Viktor Lofgren
7eb5e6aa66
(crawler) Abort recrawl if error count is too high
2024-04-24 21:46:40 +02:00
Viktor Lofgren
282022d64e
(crawler) Remove unnecessary double-fetch of the root document
2024-04-24 14:44:39 +02:00
Viktor Lofgren
91a98a8807
(crawler) Reduce log noise from timeouts in SoftIfModifiedSinceProber
2024-04-24 14:44:39 +02:00
Viktor Lofgren
32fe864a33
(build) Java 22 and its consequences have been a disaster for Marginalia Search
...
Roll back to JDK 21 for now, and make the Java version configurable in the root build.gradle.
The project has run into no fewer than three distinct show-stopping bugs in JDK 22, across multiple vendors, and gradle still doesn't fully support it, meaning you need multiple JDK versions installed.
2024-04-24 14:44:39 +02:00
Viktor Lofgren
e1c9313396
(crawler) Emulate if-modified-since for domains that don't support the header
...
This will help reduce the strain on some server software, in particular Discourse.
2024-04-24 14:44:39 +02:00
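One way to emulate a conditional fetch on the client side, when a server ignores `If-Modified-Since`, is to compare a `Last-Modified` value obtained from a cheap probe against the time of the previous fetch. The sketch below assumes that approach; it is not taken from the crawler's code, and `SoftModifiedCheck` and `isUnchangedSince` are hypothetical names.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical sketch: decide locally whether a document changed since the last fetch.
final class SoftModifiedCheck {
    private SoftModifiedCheck() {}

    // Returns true only when the Last-Modified header parses and is not newer than
    // the previous fetch. Missing or unparseable values count as "possibly changed".
    static boolean isUnchangedSince(String lastModifiedHeader, Instant previousFetch) {
        if (lastModifiedHeader == null || previousFetch == null) {
            return false;
        }
        try {
            Instant lastModified = ZonedDateTime
                    .parse(lastModifiedHeader.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            return !lastModified.isAfter(previousFetch);
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

Treating unparseable headers as "possibly changed" errs on the side of refetching, which costs bandwidth but avoids silently keeping stale data.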
Viktor Lofgren
f430a084e8
(crawler) Remove accidental log spam
2024-04-24 14:44:39 +02:00
Viktor Lofgren
a86b596897
(crawler) Code quality
2024-04-24 14:44:39 +02:00
Viktor Lofgren
6dd87b0378
(crawler) Use the probe-result to reduce the likelihood of crawling both http and https
...
This should drastically reduce the number of fetched documents on many domains
2024-04-24 14:44:39 +02:00
Viktor Lofgren
c9f029c214
(crawler) Strip W/-prefix from the etag when supplied as If-None-Match
2024-04-24 14:44:39 +02:00
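Stripping the weak-validator marker from an entity tag before echoing it back in `If-None-Match` might look like the following minimal sketch. `EtagUtil` and `stripWeakPrefix` are hypothetical names chosen for illustration, not the crawler's actual code.

```java
// Hypothetical sketch: remove a leading "W/" from an ETag value.
final class EtagUtil {
    private EtagUtil() {}

    // Returns the trimmed tag without its weak marker, e.g. W/"abc" becomes "abc"
    // (quotes retained). Tags without the marker are returned trimmed but otherwise
    // unchanged; null is passed through.
    static String stripWeakPrefix(String etag) {
        if (etag == null) {
            return null;
        }
        String trimmed = etag.trim();
        if (trimmed.startsWith("W/")) {
            return trimmed.substring(2);
        }
        return trimmed;
    }
}
```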
Viktor Lofgren
6b88db10ad
(crawler) Ensure all appropriate headers are recorded on the request
2024-04-24 14:44:39 +02:00
Viktor Lofgren
8a891c2159
(crawler/converter) Remove legacy junk from parquet migration
2024-04-24 14:44:39 +02:00
Viktor Lofgren
ad2ac8eee3
(query) Mark flaky test, correct assert on test
2024-04-24 14:44:39 +02:00
Viktor Lofgren
f46733a47a
(ranking) TermCoherenceFactory should be run for size=2 queries
2024-04-24 14:44:39 +02:00
Viktor Lofgren
934167323d
(converter) Stopgap fix for some cases of lost crawl data due to HTTP 304. The root cause needs further investigation.
2024-04-24 14:44:39 +02:00
Viktor Lofgren
64baa41e64
(query) Always generate an ngram alternative, suppressing generation of multiple identical query branches
2024-04-24 14:44:39 +02:00
Viktor Lofgren
5165cf6d15
(ranking) Set regularMask correctly
2024-04-24 14:44:39 +02:00
Viktor Lofgren
4489b21528
(ranking) Cleanup
2024-04-24 14:44:39 +02:00
Viktor Lofgren
f623b37577
(ranking) Suppress NaN:s in ranking output
2024-04-24 14:44:39 +02:00
Viktor Lofgren
f4a2fea451
(ranking, bugfix) Use bm25NgramWeight and not full weight for bM25N
2024-04-24 14:44:39 +02:00
Viktor Lofgren
a748fc5448
(index, bugfix) Pass url quality to query service
2024-04-24 14:44:39 +02:00
Viktor Lofgren
0dcca0cb83
(index) Fix TCF bug where the ngram terms would be considered instead of the regular ones due to a logical derp
2024-04-24 14:44:39 +02:00
Viktor Lofgren
b80a83339b
(qs) Additional info in query debug UI
2024-04-24 14:44:39 +02:00
Viktor Lofgren
eb74d08f2a
(qs) Additional info in query debug UI
2024-04-24 14:44:39 +02:00