From e696fd9e92f30c63f9600f86e61c9b99b96f01ab Mon Sep 17 00:00:00 2001 From: Viktor Lofgren Date: Tue, 27 Feb 2024 21:15:49 +0100 Subject: [PATCH] (docs) Begin un-fucking the docs after refactoring --- code/common/db/readme.md | 4 ++-- code/common/linkdb/readme.md | 14 +++++++------- code/common/model/readme.md | 12 ++++++------ code/common/renderer/readme.md | 2 +- code/common/service-discovery/readme.md | 10 +++++----- code/common/service/readme.md | 4 ++-- code/features-convert/adblock/readme.md | 2 +- code/features-convert/data-extractors/readme.md | 6 +++--- .../keyword-extraction/readme.md | 4 ++-- code/features-convert/pubdate/readme.md | 2 +- .../summary-extraction/readme.md | 2 +- code/features-crawl/crawl-blocklist/readme.md | 6 +++--- code/features-crawl/link-parser/readme.md | 2 +- code/index/index-forward/readme.md | 8 ++++---- code/index/index-journal/readme.md | 10 +++++----- code/index/index-reverse/readme.md | 6 +++--- code/index/query/readme.md | 6 +++--- code/index/readme.md | 10 +++++----- code/libraries/array/readme.md | 6 +++--- .../braille-block-punch-cards/readme.md | 2 +- code/libraries/btree/readme.md | 6 +++--- code/libraries/easy-lsh/readme.md | 2 +- code/libraries/guarded-regex/readme.md | 2 +- code/libraries/language-processing/readme.md | 4 ++-- code/libraries/random-write-funnel/readme.md | 8 ++++---- code/libraries/term-frequency-dict/readme.md | 2 +- code/process-models/crawl-spec/readme.md | 6 +++--- code/process-models/crawling-model/readme.md | 16 ++++++++-------- code/process-models/processed-data/readme.md | 10 +++++----- code/processes/converting-process/readme.md | 14 +++++++------- code/processes/crawling-process/readme.md | 6 +++--- .../index-constructor-process/readme.md | 2 +- code/processes/loading-process/readme.md | 2 +- code/services-application/api-service/readme.md | 2 +- .../search-service/readme.md | 6 +++--- code/services-core/assistant-service/readme.md | 2 +- code/services-core/control-service/readme.md | 2 +- 
code/services-core/executor-service/readme.md | 2 +- code/services-core/query-service/readme.md | 2 +- 39 files changed, 107 insertions(+), 107 deletions(-) diff --git a/code/common/db/readme.md b/code/common/db/readme.md index ae683741..07b6191c 100644 --- a/code/common/db/readme.md +++ b/code/common/db/readme.md @@ -17,14 +17,14 @@ It's well documented and these are probably the only four tasks you'll ever need If you are not running the system via docker, you need to provide alternative connection details than the defaults (TODO: how?). -The migration files are in [resources/db/migration](src/main/resources/db/migration). The file name convention +The migration files are in [resources/db/migration](resources/db/migration). The file name convention incorporates the project's cal-ver versioning; and are applied in lexicographical order. VYY_MM_v_nnn__description.sql ## Central Paths -* [migrations](src/main/resources/db/migration) - Flyway migrations +* [migrations](resources/db/migration) - Flyway migrations ## See Also diff --git a/code/common/linkdb/readme.md b/code/common/linkdb/readme.md index ab86b931..b5a4e8fe 100644 --- a/code/common/linkdb/readme.md +++ b/code/common/linkdb/readme.md @@ -4,11 +4,11 @@ The domain link database contains information about links between domains. It is a static in-memory database loaded from a binary file. 
-* [DomainLinkDb](src/main/java/nu/marginalia/linkdb/DomainLinkDb.java) -* * [FileDomainLinkDb](src/main/java/nu/marginalia/linkdb/FileDomainLinkDb.java) -* * [SqlDomainLinkDb](src/main/java/nu/marginalia/linkdb/SqlDomainLinkDb.java) -* [DomainLinkDbWriter](src/main/java/nu/marginalia/linkdb/DomainLinkDbWriter.java) -* [DomainLinkDbLoader](src/main/java/nu/marginalia/linkdb/DomainLinkDbLoader.java) +* [DomainLinkDb](java/nu/marginalia/linkdb/DomainLinkDb.java) +* * [FileDomainLinkDb](java/nu/marginalia/linkdb/FileDomainLinkDb.java) +* * [SqlDomainLinkDb](java/nu/marginalia/linkdb/SqlDomainLinkDb.java) +* [DomainLinkDbWriter](java/nu/marginalia/linkdb/DomainLinkDbWriter.java) +* [DomainLinkDbLoader](java/nu/marginalia/linkdb/DomainLinkDbLoader.java) ## Document Database @@ -21,8 +21,8 @@ is not in the MariaDB database is that this would make updates to this information take effect in production immediately, even before the information was searchable. -* [DocumentLinkDbWriter](src/main/java/nu/marginalia/linkdb/DocumentDbWriter.java) -* [DocumentLinkDbLoader](src/main/java/nu/marginalia/linkdb/DocumentDbReader.java) +* [DocumentLinkDbWriter](java/nu/marginalia/linkdb/DocumentDbWriter.java) +* [DocumentLinkDbLoader](java/nu/marginalia/linkdb/DocumentDbReader.java) ## See Also diff --git a/code/common/model/readme.md b/code/common/model/readme.md index 84337753..d07bb4fa 100644 --- a/code/common/model/readme.md +++ b/code/common/model/readme.md @@ -4,9 +4,9 @@ This package contains common models to the search engine ## Central Classes -* [EdgeDomain](src/main/java/nu/marginalia/model/EdgeDomain.java) -* [EdgeUrl](src/main/java/nu/marginalia/model/EdgeUrl.java) -* [DocumentMetadata](src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) -* [DocumentFlags](src/main/java/nu/marginalia/model/idx/DocumentFlags.java) -* [WordMetadata](src/main/java/nu/marginalia/model/idx/WordMetadata.java) -* [WordFlags](src/main/java/nu/marginalia/model/idx/WordFlags.java) \ No 
newline at end of file +* [EdgeDomain](java/nu/marginalia/model/EdgeDomain.java) +* [EdgeUrl](java/nu/marginalia/model/EdgeUrl.java) +* [DocumentMetadata](java/nu/marginalia/model/idx/DocumentMetadata.java) +* [DocumentFlags](java/nu/marginalia/model/idx/DocumentFlags.java) +* [WordMetadata](java/nu/marginalia/model/idx/WordMetadata.java) +* [WordFlags](java/nu/marginalia/model/idx/WordFlags.java) \ No newline at end of file diff --git a/code/common/renderer/readme.md b/code/common/renderer/readme.md index 3c34830e..ff80af06 100644 --- a/code/common/renderer/readme.md +++ b/code/common/renderer/readme.md @@ -4,4 +4,4 @@ Renders handlebar-style templates for the user-facing services. ## Central Classes -* [Mustache Renderer](src/main/java/nu/marginalia/renderer/MustacheRenderer.java) \ No newline at end of file +* [Mustache Renderer](java/nu/marginalia/renderer/MustacheRenderer.java) \ No newline at end of file diff --git a/code/common/service-discovery/readme.md b/code/common/service-discovery/readme.md index 7e6ab016..5e9fe24a 100644 --- a/code/common/service-discovery/readme.md +++ b/code/common/service-discovery/readme.md @@ -71,11 +71,11 @@ lifecycle, listen to lifecycle notifications and so on. ## gRPC Channel Pool -From the [GrpcChannelPoolFactory](src/main/java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created +From the [GrpcChannelPoolFactory](java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created that are aware of the service registry: -* [GrpcMultiNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services -* [GrpcSingleNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services. 
+* [GrpcMultiNodeChannelPool](java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services +* [GrpcSingleNodeChannelPool](java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services. if multiple instances are running, it will use one of them and fall back to another if the first is not available. @@ -145,5 +145,5 @@ Future> response = channelPool ### Central Classes -* [ServiceRegistryIf](src/main/java/nu/marginalia/service/discovery/ServiceRegistryIf.java) -* [ZkServiceRegistry](src/main/java/nu/marginalia/service/discovery/ZkServiceRegistry.java) \ No newline at end of file +* [ServiceRegistryIf](java/nu/marginalia/service/discovery/ServiceRegistryIf.java) +* [ZkServiceRegistry](java/nu/marginalia/service/discovery/ZkServiceRegistry.java) \ No newline at end of file diff --git a/code/common/service/readme.md b/code/common/service/readme.md index 04d216e2..14abfb07 100644 --- a/code/common/service/readme.md +++ b/code/common/service/readme.md @@ -50,5 +50,5 @@ Further the new service needs to be added to the `ServiceId` enum in [service-di ## Central Classes -* [MainClass](src/main/java/nu/marginalia/service/MainClass.java) bootstraps all executables -* [Service](src/main/java/nu/marginalia/service/server/Service.java) base class for all services. \ No newline at end of file +* [MainClass](java/nu/marginalia/service/MainClass.java) bootstraps all executables +* [Service](java/nu/marginalia/service/server/Service.java) base class for all services. \ No newline at end of file diff --git a/code/features-convert/adblock/readme.md b/code/features-convert/adblock/readme.md index 1df54936..32919300 100644 --- a/code/features-convert/adblock/readme.md +++ b/code/features-convert/adblock/readme.md @@ -5,4 +5,4 @@ uses it to identify if a document has ads. 
## Central Classes -* [AdblockSimulator](src/main/java/nu/marginalia/adblock/AdblockSimulator.java) \ No newline at end of file +* [AdblockSimulator](java/nu/marginalia/adblock/AdblockSimulator.java) \ No newline at end of file diff --git a/code/features-convert/data-extractors/readme.md b/code/features-convert/data-extractors/readme.md index d8c9fc0d..ea318e9f 100644 --- a/code/features-convert/data-extractors/readme.md +++ b/code/features-convert/data-extractors/readme.md @@ -2,6 +2,6 @@ Contains converter-*like* extraction jobs that operate on crawled data to produc ## Important classes -* [AtagExporter](src/main/java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data. -* [FeedExporter](src/main/java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data. -* [TermFrequencyExporter](src/main/java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF. \ No newline at end of file +* [AtagExporter](java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data. +* [FeedExporter](java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data. +* [TermFrequencyExporter](java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF. 
\ No newline at end of file diff --git a/code/features-convert/keyword-extraction/readme.md b/code/features-convert/keyword-extraction/readme.md index 17ad8600..a9c04962 100644 --- a/code/features-convert/keyword-extraction/readme.md +++ b/code/features-convert/keyword-extraction/readme.md @@ -6,8 +6,8 @@ functions based on [POS tags](https://www.ling.upenn.edu/courses/Fall_2003/ling0 ## Central Classes -* [DocumentKeywordExtractor](src/main/java/nu/marginalia/keyword/DocumentKeywordExtractor.java) -* [KeywordMetadata](src/main/java/nu/marginalia/keyword/KeywordMetadata.java) +* [DocumentKeywordExtractor](java/nu/marginalia/keyword/DocumentKeywordExtractor.java) +* [KeywordMetadata](java/nu/marginalia/keyword/KeywordMetadata.java) ## See Also diff --git a/code/features-convert/pubdate/readme.md b/code/features-convert/pubdate/readme.md index 40f28710..add657ee 100644 --- a/code/features-convert/pubdate/readme.md +++ b/code/features-convert/pubdate/readme.md @@ -4,4 +4,4 @@ Contains advanced haruspicy for figuring out when a document was published. ## Central Classes -* [PubDateSniffer](src/main/java/nu/marginalia/pubdate/PubDateSniffer.java) \ No newline at end of file +* [PubDateSniffer](java/nu/marginalia/pubdate/PubDateSniffer.java) \ No newline at end of file diff --git a/code/features-convert/summary-extraction/readme.md b/code/features-convert/summary-extraction/readme.md index 1aa38a42..b617d947 100644 --- a/code/features-convert/summary-extraction/readme.md +++ b/code/features-convert/summary-extraction/readme.md @@ -21,5 +21,5 @@ order of a 100,000,000 documents with a time budget of a couple of hours. 
## Central Classes -* [SummaryExtractor](src/main/java/nu/marginalia/summary/SummaryExtractor.java) +* [SummaryExtractor](java/nu/marginalia/summary/SummaryExtractor.java) diff --git a/code/features-crawl/crawl-blocklist/readme.md b/code/features-crawl/crawl-blocklist/readme.md index 9db3912b..777f4260 100644 --- a/code/features-crawl/crawl-blocklist/readme.md +++ b/code/features-crawl/crawl-blocklist/readme.md @@ -4,6 +4,6 @@ Contains tools for blocking links from crawling. ## Central Classes -* [GeoIpBlocklist](src/main/java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking -* [IpBlocklist](src/main/java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking -* [UrlBlocklist](src/main/java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking \ No newline at end of file +* [GeoIpBlocklist](java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking +* [IpBlocklist](java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking +* [UrlBlocklist](java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking \ No newline at end of file diff --git a/code/features-crawl/link-parser/readme.md b/code/features-crawl/link-parser/readme.md index 55289227..2893ba87 100644 --- a/code/features-crawl/link-parser/readme.md +++ b/code/features-crawl/link-parser/readme.md @@ -5,4 +5,4 @@ pathological links, etc. 
## Central Classes -* [LinkParser](src/main/java/nu/marginalia/link_parser/LinkParser.java) \ No newline at end of file +* [LinkParser](java/nu/marginalia/link_parser/LinkParser.java) \ No newline at end of file diff --git a/code/index/index-forward/readme.md b/code/index/index-forward/readme.md index 545fbf1e..39e272e5 100644 --- a/code/index/index-forward/readme.md +++ b/code/index/index-forward/readme.md @@ -8,8 +8,8 @@ The `id` file contains a list of sorted document ids, and the `data` file contai metadata for each document id, in the same order as the `id` file, with a fixed size record containing data associated with each document id. -Each record contains a binary encoded [DocumentMetadata](../../common/model/src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) object, -as well as a [HtmlFeatures](../../common/model/src/main/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask. +Each record contains a binary encoded [DocumentMetadata](../../common/model/java/nu/marginalia/model/idx/DocumentMetadata.java) object, +as well as a [HtmlFeatures](../../common/model/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask. Unlike the reverse index, the forward index is not split into two tiers, and the data is in the same order as it is in the source data, and the cardinality of the document IDs is assumed to fit in memory, @@ -17,5 +17,5 @@ so it's relatively easy to construct. ## Central Classes -* [ForwardIndexConverter](src/main/java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index. -* [ForwardIndexReader](src/main/java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index. \ No newline at end of file +* [ForwardIndexConverter](java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index. +* [ForwardIndexReader](java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index. 
\ No newline at end of file diff --git a/code/index/index-journal/readme.md b/code/index/index-journal/readme.md index 24ed9c43..af7059b3 100644 --- a/code/index/index-journal/readme.md +++ b/code/index/index-journal/readme.md @@ -16,9 +16,9 @@ are designed to handle this transparently via their *Paging* implementation. ## Central Classes ### Model -* [IndexJournalEntry](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntry.java) -* [IndexJournalEntryHeader](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java) -* [IndexJournalEntryData](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryData.java) +* [IndexJournalEntry](java/nu/marginalia/index/journal/model/IndexJournalEntry.java) +* [IndexJournalEntryHeader](java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java) +* [IndexJournalEntryData](java/nu/marginalia/index/journal/model/IndexJournalEntryData.java) ### I/O -* [IndexJournalReader](src/main/java/nu/marginalia/index/journal/reader/IndexJournalReader.java) -* [IndexJournalWriter](src/main/java/nu/marginalia/index/journal/writer/IndexJournalWriter.java) \ No newline at end of file +* [IndexJournalReader](java/nu/marginalia/index/journal/reader/IndexJournalReader.java) +* [IndexJournalWriter](java/nu/marginalia/index/journal/writer/IndexJournalWriter.java) \ No newline at end of file diff --git a/code/index/index-reverse/readme.md b/code/index/index-reverse/readme.md index a27371d6..fcc4fcfc 100644 --- a/code/index/index-reverse/readme.md +++ b/code/index/index-reverse/readme.md @@ -34,9 +34,9 @@ to form a finalized reverse index. ![Illustration of the data layout of the finalized index](index.svg) ## Central Classes -* [ReversePreindex](src/main/java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state. -* [ReverseIndexConstructor](src/main/java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index. 
-* [ReverseIndexReader](src/main/java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index. +* [ReversePreindex](java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state. +* [ReverseIndexConstructor](java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index. +* [ReverseIndexReader](java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index. ## See Also diff --git a/code/index/query/readme.md b/code/index/query/readme.md index 3334cada..7386339c 100644 --- a/code/index/query/readme.md +++ b/code/index/query/readme.md @@ -12,11 +12,11 @@ interfaces are implemented within the index-service module. ## Central Classes -* [IndexQuery](src/main/java/nu/marginalia/index/query/IndexQuery.java) -* [query/filter](src/main/java/nu/marginalia/index/query/filter/) +* [IndexQuery](java/nu/marginalia/index/query/IndexQuery.java) +* [query/filter](java/nu/marginalia/index/query/filter/) ## See Also * [index/index-reverse](../index-reverse) implements many of these interfaces. * [libraries/array](../../libraries/array) -* [libraries/array/.../LongQueryBuffer](../../libraries/array/src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java) \ No newline at end of file +* [libraries/array/.../LongQueryBuffer](../../libraries/array/java/nu/marginalia/array/buffer/LongQueryBuffer.java) \ No newline at end of file diff --git a/code/index/readme.md b/code/index/readme.md index a16c8515..bc44c7d8 100644 --- a/code/index/readme.md +++ b/code/index/readme.md @@ -29,7 +29,7 @@ results higher. 
## Central Classes -* [ResultValuator](src/main/java/nu/marginalia/ranking/results/ResultValuator.java) +* [ResultValuator](java/nu/marginalia/ranking/results/ResultValuator.java) --- @@ -53,14 +53,14 @@ for creating a ranking algorithm that is focused on a particular segment of the ## Central Classes -* [PageRankDomainRanker](src/main/java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the +* [PageRankDomainRanker](java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the PageRank or Personalized PageRank algorithm depending on whether a list of influence domains is provided. ### Data sources -* [LinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph -* [InvertedLinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph -* [SimilarityGraphSource](src/main/java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database +* [LinkGraphSource](java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph +* [InvertedLinkGraphSource](java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph +* [SimilarityGraphSource](java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database Note that the similarity graph needs to be precomputed and stored in the database for the similarity graph source to be available. 
diff --git a/code/libraries/array/readme.md b/code/libraries/array/readme.md index 42417b42..7e44b3c6 100644 --- a/code/libraries/array/readme.md +++ b/code/libraries/array/readme.md @@ -32,8 +32,8 @@ try (var array = LongArrayFactory.mmapForWritingConfined(Path.of("/tmp/test"), 1 ## Query Buffers -The classes [IntQueryBuffer](src/main/java/nu/marginalia/array/buffer/IntQueryBuffer.java) -and [LongQueryBuffer](src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used +The classes [IntQueryBuffer](java/nu/marginalia/array/buffer/IntQueryBuffer.java) +and [LongQueryBuffer](java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used heavily in the search engine's query processing. They are dual-pointer buffers that offer tools for filtering data. @@ -75,7 +75,7 @@ buffer.finalizeFiltering(); Especially noteworthy are the operations `retain()` and `reject()` in -[IntArraySearch](src/main/java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](src/main/java/nu/marginalia/array/algo/LongArraySearch.java). +[IntArraySearch](java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](java/nu/marginalia/array/algo/LongArraySearch.java). They keep or remove all items in the buffer that exist in the referenced range of the array, which must be sorted. diff --git a/code/libraries/braille-block-punch-cards/readme.md b/code/libraries/braille-block-punch-cards/readme.md index 1785a2fc..2923ef6b 100644 --- a/code/libraries/braille-block-punch-cards/readme.md +++ b/code/libraries/braille-block-punch-cards/readme.md @@ -6,4 +6,4 @@ This is The Way when it comes to representing bit masks to humans. 
## Central Classes -* [BrailleBlockPunchCards](src/main/java/nu/marginalia/bbpc/BrailleBlockPunchCards.java) \ No newline at end of file +* [BrailleBlockPunchCards](java/nu/marginalia/bbpc/BrailleBlockPunchCards.java) \ No newline at end of file diff --git a/code/libraries/btree/readme.md b/code/libraries/btree/readme.md index 446195a8..95a10e7f 100644 --- a/code/libraries/btree/readme.md +++ b/code/libraries/btree/readme.md @@ -4,11 +4,11 @@ This package contains a small library for creating and reading a static b-tree i Both binary indices (i.e. sets) are supported, as well as arbitrary multiple-of-keysize key-value mappings where the data is interlaced with the keys in the leaf nodes. This is a fairly low-level datastructure. -The b-trees are specified through a [BTreeContext](src/main/java/nu/marginalia/btree/model/BTreeContext.java) +The b-trees are specified through a [BTreeContext](java/nu/marginalia/btree/model/BTreeContext.java) which contains information about the data and index layout. -The b-trees are written through a [BTreeWriter](src/main/java/nu/marginalia/btree/BTreeWriter.java) and -read with a [BTreeReader](src/main/java/nu/marginalia/btree/BTreeReader.java). +The b-trees are written through a [BTreeWriter](java/nu/marginalia/btree/BTreeWriter.java) and +read with a [BTreeReader](java/nu/marginalia/btree/BTreeReader.java). ## Demo diff --git a/code/libraries/easy-lsh/readme.md b/code/libraries/easy-lsh/readme.md index 7bae8da9..2b2409e8 100644 --- a/code/libraries/easy-lsh/readme.md +++ b/code/libraries/easy-lsh/readme.md @@ -5,7 +5,7 @@ for document deduplication. Hashes are compared using their hamming distance. 
## Central Classes -* [EasyLSH](src/main/java/nu/marginalia/lsh/EasyLSH.java) +* [EasyLSH](java/nu/marginalia/lsh/EasyLSH.java) ## Demo diff --git a/code/libraries/guarded-regex/readme.md b/code/libraries/guarded-regex/readme.md index 42d0ca08..ddef661c 100644 --- a/code/libraries/guarded-regex/readme.md +++ b/code/libraries/guarded-regex/readme.md @@ -34,4 +34,4 @@ void ifTheThingDoTheThing(String str) { ## Central Classes -* [GuardedRegexFactory](src/main/java/nu/marginalia/gregex/GuardedRegexFactory.java) \ No newline at end of file +* [GuardedRegexFactory](java/nu/marginalia/gregex/GuardedRegexFactory.java) \ No newline at end of file diff --git a/code/libraries/language-processing/readme.md b/code/libraries/language-processing/readme.md index 08965755..5b12a27d 100644 --- a/code/libraries/language-processing/readme.md +++ b/code/libraries/language-processing/readme.md @@ -4,8 +4,8 @@ This library contains various tools used in language processing. ## Central Classes -* [SentenceExtractor](src/main/java/nu/marginalia/language/sentence/SentenceExtractor.java) - -Creates a [DocumentLanguageData](src/main/java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing +* [SentenceExtractor](java/nu/marginalia/language/sentence/SentenceExtractor.java) - +Creates a [DocumentLanguageData](java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing its words, how they stem, POS tags, and so on. ## See Also diff --git a/code/libraries/random-write-funnel/readme.md b/code/libraries/random-write-funnel/readme.md index fc02b955..219e1439 100644 --- a/code/libraries/random-write-funnel/readme.md +++ b/code/libraries/random-write-funnel/readme.md @@ -2,12 +2,12 @@ This micro-library with strategies for solving the problem of [write amplificati writing large files out of order to disk. It offers a simple API to write data to a file in a random order, while localizing the writes. 
-Several strategies are available from the [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java) +Several strategies are available from the [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java) interface. * Writing to a memory mapped file (non-solution, for small files) * Writing to a memory buffer (for systems with enough memory) -* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory. +* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory. The data is written in a native byte order. @@ -41,5 +41,5 @@ catch (IOException ex) { ## Central Classes -* [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java) -* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java) \ No newline at end of file +* [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java) +* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java) \ No newline at end of file diff --git a/code/libraries/term-frequency-dict/readme.md b/code/libraries/term-frequency-dict/readme.md index 32912f0d..810c3751 100644 --- a/code/libraries/term-frequency-dict/readme.md +++ b/code/libraries/term-frequency-dict/readme.md @@ -5,7 +5,7 @@ the TF-IDF score of a keyword. ## Central Classes -* [TermFrequencyDict](src/main/java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java) +* [TermFrequencyDict](java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java) ## See Also diff --git a/code/process-models/crawl-spec/readme.md b/code/process-models/crawl-spec/readme.md index 63bcec96..cd59f23c 100644 --- a/code/process-models/crawl-spec/readme.md +++ b/code/process-models/crawl-spec/readme.md @@ -8,9 +8,9 @@ A crawl spec is a list of domains to be crawled. It is a parquet file with the Crawl specs are used to define the scope of a crawl in the absence of known domains. 
-The [CrawlSpecRecord](src/main/java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is +The [CrawlSpecRecord](java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is used to represent a record in the crawl spec. -The [CrawlSpecRecordParquetFileReader](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java) -and [CrawlSpecRecordParquetFileWriter](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java) +The [CrawlSpecRecordParquetFileReader](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java) +and [CrawlSpecRecordParquetFileWriter](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java) classes are used to read and write the crawl spec parquet files. diff --git a/code/process-models/crawling-model/readme.md b/code/process-models/crawling-model/readme.md index ac0d0906..3bb9cb58 100644 --- a/code/process-models/crawling-model/readme.md +++ b/code/process-models/crawling-model/readme.md @@ -15,27 +15,27 @@ removed in the future. ## Central Classes -* [CrawledDocument](src/main/java/nu/marginalia/crawling/model/CrawledDocument.java) -* [CrawledDomain](src/main/java/nu/marginalia/crawling/model/CrawledDomain.java) +* [CrawledDocument](java/nu/marginalia/crawling/model/CrawledDocument.java) +* [CrawledDomain](java/nu/marginalia/crawling/model/CrawledDomain.java) ### Serialization These serialization classes automatically negotiate the serialization format based on the file extension. -Data is accessed through a [SerializableCrawlDataStream](src/main/java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java), +Data is accessed through a [SerializableCrawlDataStream](java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java), which is a somewhat enhanced Iterator that can be used to read data. 
-* [CrawledDomainReader](src/main/java/nu/marginalia/crawling/io/CrawledDomainReader.java) -* [CrawledDomainWriter](src/main/java/nu/marginalia/crawling/io/CrawledDomainWriter.java) +* [CrawledDomainReader](java/nu/marginalia/crawling/io/CrawledDomainReader.java) +* [CrawledDomainWriter](java/nu/marginalia/crawling/io/CrawledDomainWriter.java) ### Parquet Serialization -The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java) -and [CrawledDocumentParquetRecordFileWriter](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes, +The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java) +and [CrawledDocumentParquetRecordFileWriter](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes, which read and write parquet files respectively. 
-The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java) +The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java) The record has the following fields: diff --git a/code/process-models/processed-data/readme.md b/code/process-models/processed-data/readme.md index 4bc8c857..e7f5cebb 100644 --- a/code/process-models/processed-data/readme.md +++ b/code/process-models/processed-data/readme.md @@ -4,11 +4,11 @@ reading and writing parquet files with the output from the Main models: -* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java) -* * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java) -* * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java) -* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java) -* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java) +* [DocumentRecord](java/nu/marginalia/model/processed/DocumentRecord.java) +* * [DocumentRecordKeywordsProjection](java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java) +* * [DocumentRecordMetadataProjection](java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java) +* [DomainLinkRecord](java/nu/marginalia/model/processed/DomainLinkRecord.java) +* [DomainRecord](java/nu/marginalia/model/processed/DomainRecord.java) Since parquet is a column based format, some of the readable models are projections that only read parts of the input file. 
diff --git a/code/processes/converting-process/readme.md b/code/processes/converting-process/readme.md
index 3a79c481..936ca7fe 100644
--- a/code/processes/converting-process/readme.md
+++ b/code/processes/converting-process/readme.md
@@ -38,16 +38,16 @@ https://www.marginalia.nu/log/93_atags/
 ## Central Classes
-* [ConverterMain](src/main/java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
-* [DocumentProcessor](src/main/java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
-* - [HtmlDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
+* [ConverterMain](java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
+* [DocumentProcessor](java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
+* - [HtmlDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
 has HTML-specific logic related to a document, keywords and identifies features such as whether it has javascript.
-* * - [HtmlProcessorSpecializations](src/main/java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
-* * - [XenForoSpecialization](src/main/java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
-* - [PlainTextDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
+* * - [HtmlProcessorSpecializations](java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
+* * - [XenForoSpecialization](java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
+* - [PlainTextDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
 has plain text-specific logic related to a document...
-* [DomainProcessor](src/main/java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
+* [DomainProcessor](java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
 generates domain-wide metadata such as link graphs.
 ## See Also
diff --git a/code/processes/crawling-process/readme.md b/code/processes/crawling-process/readme.md
index a595bf1d..0f72cb87 100644
--- a/code/processes/crawling-process/readme.md
+++ b/code/processes/crawling-process/readme.md
@@ -31,10 +31,10 @@ On top of organic links, the crawler can use sitemaps and rss-feeds to discover
 ## Central Classes
-* [CrawlerMain](src/main/java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
-* [CrawlerRetreiver](src/main/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
+* [CrawlerMain](java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
+* [CrawlerRetreiver](java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
 visits known addresses from a domain and downloads each document.
-* [HttpFetcher](src/main/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
+* [HttpFetcher](java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
 fetches URLs.
 ## See Also
diff --git a/code/processes/index-constructor-process/readme.md b/code/processes/index-constructor-process/readme.md
index 9457551b..ecf791b8 100644
--- a/code/processes/index-constructor-process/readme.md
+++ b/code/processes/index-constructor-process/readme.md
@@ -16,5 +16,5 @@ This is a very light-weight module that delegates the actual work to the modules
 Their respective readme files contain more information about the indexes themselves and how they are constructed.
-The process is glued together within [IndexConstructorMain](src/main/java/nu/marginalia/index/IndexConstructorMain.java),
+The process is glued together within [IndexConstructorMain](java/nu/marginalia/index/IndexConstructorMain.java),
 which is the only class of interest in this module.
diff --git a/code/processes/loading-process/readme.md b/code/processes/loading-process/readme.md
index ec0c12fd..4a5cf735 100644
--- a/code/processes/loading-process/readme.md
+++ b/code/processes/loading-process/readme.md
@@ -6,4 +6,4 @@ the index-service.
 ## Central Classes
-* [LoaderMain](src/main/java/nu/marginalia/loading/LoaderMain.java) main class.
\ No newline at end of file
+* [LoaderMain](java/nu/marginalia/loading/LoaderMain.java) main class.
\ No newline at end of file
diff --git a/code/services-application/api-service/readme.md b/code/services-application/api-service/readme.md
index 33b36b08..8e48c9bb 100644
--- a/code/services-application/api-service/readme.md
+++ b/code/services-application/api-service/readme.md
@@ -4,4 +4,4 @@ The API service acts as a gateway for public API requests, it deals with API key
 ## Central Classes
-* [ApiService](src/main/java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.
\ No newline at end of file
+* [ApiService](java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.
\ No newline at end of file
diff --git a/code/services-application/search-service/readme.md b/code/services-application/search-service/readme.md
index 02362a55..d7afe4b5 100644
--- a/code/services-application/search-service/readme.md
+++ b/code/services-application/search-service/readme.md
@@ -14,13 +14,13 @@ to the user.
 ## Central classes
-* [SearchService](src/main/java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
+* [SearchService](java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
 appropriate services.
-* [CommandEvaluator](src/main/java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
+* [CommandEvaluator](java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
 upon it, dealing with special operations like `browse:` or `site:`.
-* [SearchQueryIndexService](src/main/java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
+* [SearchQueryIndexService](java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
 then decorates the search results so that they can be rendered.
 ## See Also
diff --git a/code/services-core/assistant-service/readme.md b/code/services-core/assistant-service/readme.md
index 899ac8fc..1c387c83 100644
--- a/code/services-core/assistant-service/readme.md
+++ b/code/services-core/assistant-service/readme.md
@@ -4,4 +4,4 @@ The assistant service helps the search service by offering various peripheral fu
 ## Central Classes
-* [AssistantService](src/main/java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.
\ No newline at end of file
+* [AssistantService](java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.
\ No newline at end of file
diff --git a/code/services-core/control-service/readme.md b/code/services-core/control-service/readme.md
index 73d2742e..5da87273 100644
--- a/code/services-core/control-service/readme.md
+++ b/code/services-core/control-service/readme.md
@@ -15,7 +15,7 @@ Conceptually the application is broken into three parts:
 ## Central Classes
-* [ControlService](src/main/java/nu/marginalia/control/ControlService.java)
+* [ControlService](java/nu/marginalia/control/ControlService.java)
 ## See Also
diff --git a/code/services-core/executor-service/readme.md b/code/services-core/executor-service/readme.md
index 33e612df..1f05c3a4 100644
--- a/code/services-core/executor-service/readme.md
+++ b/code/services-core/executor-service/readme.md
@@ -9,7 +9,7 @@ much of the executor's functionality.
 ## Central Classes
-* [ExecutorActorControlService](src/main/java/nu/marginalia/actor/ExecutorActorControlService.java)
+* [ExecutorActorControlService](java/nu/marginalia/actor/ExecutorActorControlService.java)
 ## See Also
diff --git a/code/services-core/query-service/readme.md b/code/services-core/query-service/readme.md
index d2ba1961..0aa07c05 100644
--- a/code/services-core/query-service/readme.md
+++ b/code/services-core/query-service/readme.md
@@ -15,7 +15,7 @@ The web interface also offers a JSON API for machine-based queries.
 ## Central Classes
-This module is almost entirely boilerplate, except the [QueryBasicInterface](src/main/java/nu/marginalia/query/QueryBasicInterface.java)
+This module is almost entirely boilerplate, except the [QueryBasicInterface](java/nu/marginalia/query/QueryBasicInterface.java)
 class, which offers a REST API for querying the index.
 Much of the guts of the query service are in the [query-service](../../functions/search-query)