Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git (synced 2025-02-23 13:09:00 +00:00)
data:image/s3,"s3://crabby-images/c765d/c765d5283f4176ac41b612e7ae83ed62e7ddf9a1" alt="Viktor Lofgren"
This commit extracts several previously hardcoded configuration properties and makes them available through system.properties. The documentation is updated to reflect the change, and dead code was removed in the process. CrawlSpecGenerator still feels a bit over-engineered: it was built for a more general case, and all implementations but the current one have since been removed. We'll leave it like this for now, as it's still fairly readable.
3.1 KiB
System Properties
These are JVM system properties used by each service. They can either be loaded from a file or passed on the command line as `-Dkey=value` arguments, using `$JAVA_OPTS`.
The system will look for a properties file at `conf/properties/system.properties` within the install directory, as specified by `$WMSA_HOME`.
A template is available in `../run/template/conf/properties/system.properties`.
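The file lookup described above can be sketched in Java as follows. This is a minimal, hypothetical loader (`SystemPropertiesLoader` is not a class from the project): it resolves `conf/properties/system.properties` under `$WMSA_HOME` and copies its entries into the JVM's system properties. It assumes that values passed with `-D` via `$JAVA_OPTS` should win over the file; that precedence is a guess, not documented behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch, not the project's actual loader.
public class SystemPropertiesLoader {

    /** Resolves the properties file relative to an install dir such as $WMSA_HOME. */
    static Path propertiesFile(Path installDir) {
        return installDir.resolve("conf").resolve("properties").resolve("system.properties");
    }

    /**
     * Copies entries from the file into target, skipping keys that are already set.
     * ASSUMPTION: keys already present (e.g. from -Dkey=value in $JAVA_OPTS) take precedence.
     * Returns the number of entries actually applied.
     */
    static int loadInto(Path file, Properties target) throws IOException {
        if (!Files.isRegularFile(file)) {
            return 0; // no file: rely on command line flags and built-in defaults
        }
        Properties fromFile = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            fromFile.load(in);
        }
        int applied = 0;
        for (String key : fromFile.stringPropertyNames()) {
            if (target.getProperty(key) == null) {
                target.setProperty(key, fromFile.getProperty(key));
                applied++;
            }
        }
        return applied;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("WMSA_HOME");
        if (home == null) {
            System.err.println("WMSA_HOME is not set");
            return;
        }
        int n = loadInto(propertiesFile(Path.of(home)), System.getProperties());
        System.out.println("Loaded " + n + " properties from file");
    }
}
```

Skipping keys that are already present keeps a one-off `-Dcrawler.poolSize=...` override usable without editing the file.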
Global
flag | values | description |
---|---|---|
blacklist.disable | boolean | Disables the IP blacklist |
flyway.disable | boolean | Disables automatic Flyway migrations |
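Boolean flags like the ones above can be read with the standard `Boolean.getBoolean`, which returns true only when the property exists and equals `true` (case-insensitively). The sketch below is illustrative; the class `GlobalFlags` is hypothetical, and this is not necessarily how the services themselves read these flags.

```java
// Hypothetical sketch of reading the global boolean flags.
public class GlobalFlags {

    // Boolean.getBoolean(name) == Boolean.parseBoolean(System.getProperty(name)):
    // true only for "true" (any case); unset, "1" or "yes" all read as false.
    static boolean isFlywayDisabled() {
        return Boolean.getBoolean("flyway.disable");
    }

    static boolean isBlacklistDisabled() {
        return Boolean.getBoolean("blacklist.disable");
    }

    public static void main(String[] args) {
        System.out.println("flyway.disable = " + isFlywayDisabled());
        System.out.println("blacklist.disable = " + isBlacklistDisabled());
    }
}
```

Under this reading, a flag must be set to the literal value `true` to take effect.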
Crawler Properties
flag | values | description |
---|---|---|
crawler.userAgentString | string | Sets the user agent string used by the crawler |
crawler.userAgentIdentifier | string | Sets the user agent identifier used by the crawler, i.e. the name it looks for in robots.txt |
crawler.poolSize | integer | Sets the number of threads used by the crawler; more threads are faster but use more RAM |
crawler.initialUrlsPerDomain | integer | Sets the initial number of URLs to crawl per domain (when crawling from spec) |
crawler.maxUrlsPerDomain | integer | Sets the maximum number of URLs to crawl per domain (when recrawling) |
crawler.minUrlsPerDomain | integer | Sets the minimum number of URLs to crawl per domain (when recrawling) |
crawler.crawlSetGrowthFactor | double | Growth factor for the per-domain crawl goal when recrawling: if 100 documents were fetched in the last crawl, the goal this time is 100 × (this value) |
ip-blocklist.disabled | boolean | Disables the IP blocklist |
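The growth-factor arithmetic can be illustrated with a small helper. This is a sketch under an assumption not stated in the table: that the scaled goal is clamped between `crawler.minUrlsPerDomain` and `crawler.maxUrlsPerDomain`. The class name and the fallback values in `main` are hypothetical and are not the project's defaults.

```java
// Hypothetical sketch of the recrawl goal computation.
public class RecrawlGoal {

    /**
     * goal = round(fetchedLastCrawl * growthFactor), then
     * ASSUMED to be clamped to [minUrls, maxUrls].
     */
    static int nextGoal(int fetchedLastCrawl, double growthFactor, int minUrls, int maxUrls) {
        long scaled = Math.round(fetchedLastCrawl * growthFactor);
        return (int) Math.max(minUrls, Math.min(maxUrls, scaled));
    }

    public static void main(String[] args) {
        // Fallbacks below are illustrative only, not the real defaults.
        double growth = Double.parseDouble(System.getProperty("crawler.crawlSetGrowthFactor", "1.25"));
        int min = Integer.getInteger("crawler.minUrlsPerDomain", 100);
        int max = Integer.getInteger("crawler.maxUrlsPerDomain", 10_000);
        System.out.println("Next goal for a domain with 100 fetched docs: " + nextGoal(100, growth, min, max));
    }
}
```

With a factor of 1.25, a domain that yielded 100 documents gets a goal of 125, while a very small or very large domain would be held at the configured minimum or maximum.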
Converter Properties
flag | values | description |
---|---|---|
converter.sideloadThreshold | integer | Number of documents per domain above which a simpler, less RAM-intensive processing method is used. 10,000 is a good value for ~32 GB of RAM |
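The threshold check itself is a single comparison. The sketch below is hypothetical (`SideloadDecision` and the `FULL`/`SIDELOAD` names are illustrative), and whether a domain exactly at the threshold takes the simpler path is an assumption; it falls back to the 10,000 value the table recommends for ~32 GB of RAM.

```java
// Hypothetical sketch of choosing a processing mode per domain.
public class SideloadDecision {

    enum Mode { FULL, SIDELOAD }

    static Mode modeFor(int documentsInDomain, int sideloadThreshold) {
        // ASSUMPTION: domains at or above the threshold use the simpler, lower-RAM path.
        return documentsInDomain >= sideloadThreshold ? Mode.SIDELOAD : Mode.FULL;
    }

    public static void main(String[] args) {
        // 10,000 is the value suggested for ~32 GB of RAM.
        int threshold = Integer.getInteger("converter.sideloadThreshold", 10_000);
        System.out.println("25,000 docs -> " + modeFor(25_000, threshold));
        System.out.println("500 docs -> " + modeFor(500, threshold));
    }
}
```

Lowering the threshold routes more domains through the cheaper path, which trades processing fidelity for a smaller memory footprint.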
Marginalia Application Specific
flag | values | description |
---|---|---|
search.websiteUrl | string | Overrides the website URL used in rendering |
control.hideMarginaliaApp | boolean | Hides the Marginalia application from the control GUI results |
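An override such as `search.websiteUrl` is naturally read with `System.getProperty(key, fallback)`. The sketch below uses it to build absolute links for rendering. `WebsiteUrl`, `link`, the trailing-slash normalization, and the `https://example.com/` fallback are all hypothetical illustrations, not the project's real default or code.

```java
// Hypothetical sketch of applying the search.websiteUrl override.
public class WebsiteUrl {

    // Illustrative fallback only; NOT the project's actual default URL.
    static final String FALLBACK = "https://example.com/";

    /** Returns search.websiteUrl if set, otherwise the fallback, always ending in '/'. */
    static String baseUrl() {
        String url = System.getProperty("search.websiteUrl", FALLBACK);
        return url.endsWith("/") ? url : url + "/";
    }

    /** Joins the base URL and a relative path without doubling the slash. */
    static String link(String relativePath) {
        String path = relativePath.startsWith("/") ? relativePath.substring(1) : relativePath;
        return baseUrl() + path;
    }

    public static void main(String[] args) {
        System.out.println(link("/search?query=test"));
    }
}
```

Normalizing the slash in one place lets the property be written with or without a trailing `/`.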