(docs) Begin un-fucking the docs after refactoring

Viktor Lofgren 2024-02-27 21:15:49 +01:00
parent c943954bb4
commit e696fd9e92
39 changed files with 107 additions and 107 deletions

@@ -17,14 +17,14 @@ It's well documented and these are probably the only four tasks you'll ever need
 If you are not running the system via docker, you need to provide alternative connection details than
 the defaults (TODO: how?).
-The migration files are in [resources/db/migration](src/main/resources/db/migration). The file name convention
+The migration files are in [resources/db/migration](resources/db/migration). The file name convention
 incorporates the project's cal-ver versioning; and are applied in lexicographical order.
 VYY_MM_v_nnn__description.sql
 ## Central Paths
-* [migrations](src/main/resources/db/migration) - Flyway migrations
+* [migrations](resources/db/migration) - Flyway migrations
 ## See Also
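
The README text in this hunk says migrations are applied in lexicographical order of their `VYY_MM_v_nnn__description.sql` file names. A minimal sketch of what that ordering implies, assuming made-up file names that follow the pattern (the class, method, and file names below are hypothetical and not taken from the repository or from Flyway):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MigrationOrder {
    // Returns the given migration file names sorted lexicographically,
    // which is the application order the README describes.
    static List<String> applyOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only
        List<String> names = List.of(
                "V24_02_0_001__add_table.sql",
                "V23_11_0_002__add_index.sql",
                "V23_11_0_001__init.sql");
        applyOrder(names).forEach(System.out::println);
    }
}
```

Because the year and month come first and the sequence number is zero-padded, a plain string sort keeps older migrations ahead of newer ones.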

@@ -4,11 +4,11 @@ The domain link database contains information about links
 between domains. It is a static in-memory database loaded
 from a binary file.
-* [DomainLinkDb](src/main/java/nu/marginalia/linkdb/DomainLinkDb.java)
-* * [FileDomainLinkDb](src/main/java/nu/marginalia/linkdb/FileDomainLinkDb.java)
-* * [SqlDomainLinkDb](src/main/java/nu/marginalia/linkdb/SqlDomainLinkDb.java)
-* [DomainLinkDbWriter](src/main/java/nu/marginalia/linkdb/DomainLinkDbWriter.java)
-* [DomainLinkDbLoader](src/main/java/nu/marginalia/linkdb/DomainLinkDbLoader.java)
+* [DomainLinkDb](java/nu/marginalia/linkdb/DomainLinkDb.java)
+* * [FileDomainLinkDb](java/nu/marginalia/linkdb/FileDomainLinkDb.java)
+* * [SqlDomainLinkDb](java/nu/marginalia/linkdb/SqlDomainLinkDb.java)
+* [DomainLinkDbWriter](java/nu/marginalia/linkdb/DomainLinkDbWriter.java)
+* [DomainLinkDbLoader](java/nu/marginalia/linkdb/DomainLinkDbLoader.java)
 ## Document Database
@@ -21,8 +21,8 @@ is not in the MariaDB database is that this would make updates to
 this information take effect in production immediately, even before
 the information was searchable.
-* [DocumentLinkDbWriter](src/main/java/nu/marginalia/linkdb/DocumentDbWriter.java)
-* [DocumentLinkDbLoader](src/main/java/nu/marginalia/linkdb/DocumentDbReader.java)
+* [DocumentLinkDbWriter](java/nu/marginalia/linkdb/DocumentDbWriter.java)
+* [DocumentLinkDbLoader](java/nu/marginalia/linkdb/DocumentDbReader.java)
 ## See Also

@@ -4,9 +4,9 @@ This package contains common models to the search engine
 ## Central Classes
-* [EdgeDomain](src/main/java/nu/marginalia/model/EdgeDomain.java)
-* [EdgeUrl](src/main/java/nu/marginalia/model/EdgeUrl.java)
-* [DocumentMetadata](src/main/java/nu/marginalia/model/idx/DocumentMetadata.java)
-* [DocumentFlags](src/main/java/nu/marginalia/model/idx/DocumentFlags.java)
-* [WordMetadata](src/main/java/nu/marginalia/model/idx/WordMetadata.java)
-* [WordFlags](src/main/java/nu/marginalia/model/idx/WordFlags.java)
+* [EdgeDomain](java/nu/marginalia/model/EdgeDomain.java)
+* [EdgeUrl](java/nu/marginalia/model/EdgeUrl.java)
+* [DocumentMetadata](java/nu/marginalia/model/idx/DocumentMetadata.java)
+* [DocumentFlags](java/nu/marginalia/model/idx/DocumentFlags.java)
+* [WordMetadata](java/nu/marginalia/model/idx/WordMetadata.java)
+* [WordFlags](java/nu/marginalia/model/idx/WordFlags.java)

@@ -4,4 +4,4 @@ Renders handlebar-style templates for the user-facing services.
 ## Central Classes
-* [Mustache Renderer](src/main/java/nu/marginalia/renderer/MustacheRenderer.java)
+* [Mustache Renderer](java/nu/marginalia/renderer/MustacheRenderer.java)

@@ -71,11 +71,11 @@ lifecycle, listen to lifecycle notifications and so on.
 ## gRPC Channel Pool
-From the [GrpcChannelPoolFactory](src/main/java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created
+From the [GrpcChannelPoolFactory](java/nu/marginalia/service/client/GrpcChannelPoolFactory.java), two types of channel pools can be created
 that are aware of the service registry:
-* [GrpcMultiNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services
-* [GrpcSingleNodeChannelPool](src/main/java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services.
+* [GrpcMultiNodeChannelPool](java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java) - This pool permits 1-n style communication with partitioned services
+* [GrpcSingleNodeChannelPool](java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java) - This pool permits 1-1 style communication with non-partitioned services.
 if multiple instances are running, it will use one of them and fall back
 to another if the first is not available.
@@ -145,5 +145,5 @@ Future<List<Response>> response = channelPool
 ### Central Classes
-* [ServiceRegistryIf](src/main/java/nu/marginalia/service/discovery/ServiceRegistryIf.java)
-* [ZkServiceRegistry](src/main/java/nu/marginalia/service/discovery/ZkServiceRegistry.java)
+* [ServiceRegistryIf](java/nu/marginalia/service/discovery/ServiceRegistryIf.java)
+* [ZkServiceRegistry](java/nu/marginalia/service/discovery/ZkServiceRegistry.java)

@@ -50,5 +50,5 @@ Further the new service needs to be added to the `ServiceId` enum in [service-di
 ## Central Classes
-* [MainClass](src/main/java/nu/marginalia/service/MainClass.java) bootstraps all executables
-* [Service](src/main/java/nu/marginalia/service/server/Service.java) base class for all services.
+* [MainClass](java/nu/marginalia/service/MainClass.java) bootstraps all executables
+* [Service](java/nu/marginalia/service/server/Service.java) base class for all services.

@@ -5,4 +5,4 @@ uses it to identify if a document has ads.
 ## Central Classes
-* [AdblockSimulator](src/main/java/nu/marginalia/adblock/AdblockSimulator.java)
+* [AdblockSimulator](java/nu/marginalia/adblock/AdblockSimulator.java)

@@ -2,6 +2,6 @@ Contains converter-*like* extraction jobs that operate on crawled data to produc
 ## Important classes
-* [AtagExporter](src/main/java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data.
-* [FeedExporter](src/main/java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data.
-* [TermFrequencyExporter](src/main/java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF.
+* [AtagExporter](java/nu/marginalia/extractor/AtagExporter.java) - extracts anchor texts from the crawled data.
+* [FeedExporter](java/nu/marginalia/extractor/FeedExporter.java) - tries to find RSS/Atom feeds within the crawled data.
+* [TermFrequencyExporter](java/nu/marginalia/extractor/TermFrequencyExporter.java) - exports the 'TF' part of TF-IDF.

@@ -6,8 +6,8 @@ functions based on [POS tags](https://www.ling.upenn.edu/courses/Fall_2003/ling0
 ## Central Classes
-* [DocumentKeywordExtractor](src/main/java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
-* [KeywordMetadata](src/main/java/nu/marginalia/keyword/KeywordMetadata.java)
+* [DocumentKeywordExtractor](java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
+* [KeywordMetadata](java/nu/marginalia/keyword/KeywordMetadata.java)
 ## See Also

@@ -4,4 +4,4 @@ Contains advanced haruspicy for figuring out when a document was published.
 ## Central Classes
-* [PubDateSniffer](src/main/java/nu/marginalia/pubdate/PubDateSniffer.java)
+* [PubDateSniffer](java/nu/marginalia/pubdate/PubDateSniffer.java)

@@ -21,5 +21,5 @@ order of a 100,000,000 documents with a time budget of a couple of hours.
 ## Central Classes
-* [SummaryExtractor](src/main/java/nu/marginalia/summary/SummaryExtractor.java)
+* [SummaryExtractor](java/nu/marginalia/summary/SummaryExtractor.java)

@@ -4,6 +4,6 @@ Contains tools for blocking links from crawling.
 ## Central Classes
-* [GeoIpBlocklist](src/main/java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking
-* [IpBlocklist](src/main/java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking
-* [UrlBlocklist](src/main/java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking
+* [GeoIpBlocklist](java/nu/marginalia/ip_blocklist/GeoIpBlocklist.java) - country blocking
+* [IpBlocklist](java/nu/marginalia/ip_blocklist/IpBlockList.java) - CIDR-based blocking
+* [UrlBlocklist](java/nu/marginalia/ip_blocklist/UrlBlocklist.java) - URL pattern blocking

@@ -5,4 +5,4 @@ pathological links, etc.
 ## Central Classes
-* [LinkParser](src/main/java/nu/marginalia/link_parser/LinkParser.java)
+* [LinkParser](java/nu/marginalia/link_parser/LinkParser.java)

@@ -8,8 +8,8 @@ The `id` file contains a list of sorted document ids, and the `data` file contai
 metadata for each document id, in the same order as the `id` file, with a fixed
 size record containing data associated with each document id.
-Each record contains a binary encoded [DocumentMetadata](../../common/model/src/main/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
-as well as a [HtmlFeatures](../../common/model/src/main/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.
+Each record contains a binary encoded [DocumentMetadata](../../common/model/java/nu/marginalia/model/idx/DocumentMetadata.java) object,
+as well as a [HtmlFeatures](../../common/model/java/nu/marginalia/model/crawl/HtmlFeature.java) bitmask.
 Unlike the reverse index, the forward index is not split into two tiers, and the data is in the same
 order as it is in the source data, and the cardinality of the document IDs is assumed to fit in memory,
@@ -17,5 +17,5 @@ so it's relatively easy to construct.
 ## Central Classes
-* [ForwardIndexConverter](src/main/java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
-* [ForwardIndexReader](src/main/java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.
+* [ForwardIndexConverter](java/nu/marginalia/index/forward/ForwardIndexConverter.java) constructs the index.
+* [ForwardIndexReader](java/nu/marginalia/index/forward/ForwardIndexReader.java) interrogates the index.

@@ -16,9 +16,9 @@ are designed to handle this transparently via their *Paging* implementation.
 ## Central Classes
 ### Model
-* [IndexJournalEntry](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntry.java)
-* [IndexJournalEntryHeader](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java)
-* [IndexJournalEntryData](src/main/java/nu/marginalia/index/journal/model/IndexJournalEntryData.java)
+* [IndexJournalEntry](java/nu/marginalia/index/journal/model/IndexJournalEntry.java)
+* [IndexJournalEntryHeader](java/nu/marginalia/index/journal/model/IndexJournalEntryHeader.java)
+* [IndexJournalEntryData](java/nu/marginalia/index/journal/model/IndexJournalEntryData.java)
 ### I/O
-* [IndexJournalReader](src/main/java/nu/marginalia/index/journal/reader/IndexJournalReader.java)
-* [IndexJournalWriter](src/main/java/nu/marginalia/index/journal/writer/IndexJournalWriter.java)
+* [IndexJournalReader](java/nu/marginalia/index/journal/reader/IndexJournalReader.java)
+* [IndexJournalWriter](java/nu/marginalia/index/journal/writer/IndexJournalWriter.java)

@@ -34,9 +34,9 @@ to form a finalized reverse index.
 ![Illustration of the data layout of the finalized index](index.svg)
 ## Central Classes
-* [ReversePreindex](src/main/java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state.
-* [ReverseIndexConstructor](src/main/java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index.
-* [ReverseIndexReader](src/main/java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index.
+* [ReversePreindex](java/nu/marginalia/index/construction/ReversePreindex.java) intermediate reverse index state.
+* [ReverseIndexConstructor](java/nu/marginalia/index/construction/ReverseIndexConstructor.java) constructs the index.
+* [ReverseIndexReader](java/nu/marginalia/index/ReverseIndexReader.java) interrogates the index.
 ## See Also

@@ -12,11 +12,11 @@ interfaces are implemented within the index-service module.
 ## Central Classes
-* [IndexQuery](src/main/java/nu/marginalia/index/query/IndexQuery.java)
-* [query/filter](src/main/java/nu/marginalia/index/query/filter/)
+* [IndexQuery](java/nu/marginalia/index/query/IndexQuery.java)
+* [query/filter](java/nu/marginalia/index/query/filter/)
 ## See Also
 * [index/index-reverse](../index-reverse) implements many of these interfaces.
 * [libraries/array](../../libraries/array)
-* [libraries/array/.../LongQueryBuffer](../../libraries/array/src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java)
+* [libraries/array/.../LongQueryBuffer](../../libraries/array/java/nu/marginalia/array/buffer/LongQueryBuffer.java)

@@ -29,7 +29,7 @@ results higher.
 ## Central Classes
-* [ResultValuator](src/main/java/nu/marginalia/ranking/results/ResultValuator.java)
+* [ResultValuator](java/nu/marginalia/ranking/results/ResultValuator.java)
 ---
@@ -53,14 +53,14 @@ for creating a ranking algorithm that is focused on a particular segment of the
 ## Central Classes
-* [PageRankDomainRanker](src/main/java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
+* [PageRankDomainRanker](java/nu/marginalia/ranking/domains/PageRankDomainRanker.java) - Ranks domains using the
 PageRank or Personalized PageRank algorithm depending on whether a list of influence domains is provided.
 ### Data sources
-* [LinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
-* [InvertedLinkGraphSource](src/main/java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
-* [SimilarityGraphSource](src/main/java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
+* [LinkGraphSource](java/nu/marginalia/ranking/domains/data/LinkGraphSource.java) - fetches the link graph
+* [InvertedLinkGraphSource](java/nu/marginalia/ranking/domains/data/InvertedLinkGraphSource.java) - fetches the inverted link graph
+* [SimilarityGraphSource](java/nu/marginalia/ranking/domains/data/SimilarityGraphSource.java) - fetches the similarity graph from the database
 Note that the similarity graph needs to be precomputed and stored in the database for
 the similarity graph source to be available.

@@ -32,8 +32,8 @@ try (var array = LongArrayFactory.mmapForWritingConfined(Path.of("/tmp/test"), 1
 ## Query Buffers
-The classes [IntQueryBuffer](src/main/java/nu/marginalia/array/buffer/IntQueryBuffer.java)
-and [LongQueryBuffer](src/main/java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used
+The classes [IntQueryBuffer](java/nu/marginalia/array/buffer/IntQueryBuffer.java)
+and [LongQueryBuffer](java/nu/marginalia/array/buffer/LongQueryBuffer.java) are used
 heavily in the search engine's query processing.
 They are dual-pointer buffers that offer tools for filtering data.
@@ -75,7 +75,7 @@ buffer.finalizeFiltering();
 Especially noteworthy are the operations `retain()` and `reject()` in
-[IntArraySearch](src/main/java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](src/main/java/nu/marginalia/array/algo/LongArraySearch.java).
+[IntArraySearch](java/nu/marginalia/array/algo/IntArraySearch.java) and [LongArraySearch](java/nu/marginalia/array/algo/LongArraySearch.java).
 They keep or remove all items in the buffer that exist in the referenced range of the array,
 which must be sorted.

@@ -6,4 +6,4 @@ This is The Way when it comes to representing bit masks to humans.
 ## Central Classes
-* [BrailleBlockPunchCards](src/main/java/nu/marginalia/bbpc/BrailleBlockPunchCards.java)
+* [BrailleBlockPunchCards](java/nu/marginalia/bbpc/BrailleBlockPunchCards.java)

@@ -4,11 +4,11 @@ This package contains a small library for creating and reading a static b-tree i
 Both binary indices (i.e. sets) are supported, as well as arbitrary multiple-of-keysize key-value mappings where the data is
 interlaced with the keys in the leaf nodes. This is a fairly low-level datastructure.
-The b-trees are specified through a [BTreeContext](src/main/java/nu/marginalia/btree/model/BTreeContext.java)
+The b-trees are specified through a [BTreeContext](java/nu/marginalia/btree/model/BTreeContext.java)
 which contains information about the data and index layout.
-The b-trees are written through a [BTreeWriter](src/main/java/nu/marginalia/btree/BTreeWriter.java) and
-read with a [BTreeReader](src/main/java/nu/marginalia/btree/BTreeReader.java).
+The b-trees are written through a [BTreeWriter](java/nu/marginalia/btree/BTreeWriter.java) and
+read with a [BTreeReader](java/nu/marginalia/btree/BTreeReader.java).
 ## Demo

@@ -5,7 +5,7 @@ for document deduplication. Hashes are compared using their hamming distance.
 ## Central Classes
-* [EasyLSH](src/main/java/nu/marginalia/lsh/EasyLSH.java)
+* [EasyLSH](java/nu/marginalia/lsh/EasyLSH.java)
 ## Demo

@@ -34,4 +34,4 @@ void ifTheThingDoTheThing(String str) {
 ## Central Classes
-* [GuardedRegexFactory](src/main/java/nu/marginalia/gregex/GuardedRegexFactory.java)
+* [GuardedRegexFactory](java/nu/marginalia/gregex/GuardedRegexFactory.java)

@@ -4,8 +4,8 @@ This library contains various tools used in language processing.
 ## Central Classes
-* [SentenceExtractor](src/main/java/nu/marginalia/language/sentence/SentenceExtractor.java) -
-Creates a [DocumentLanguageData](src/main/java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing
+* [SentenceExtractor](java/nu/marginalia/language/sentence/SentenceExtractor.java) -
+Creates a [DocumentLanguageData](java/nu/marginalia/language/model/DocumentLanguageData.java) from a text, containing
 its words, how they stem, POS tags, and so on.
 ## See Also

@@ -2,12 +2,12 @@ This micro-library with strategies for solving the problem of [write amplificati
 writing large files out of order to disk. It offers a simple API to write data to a file in a
 random order, while localizing the writes.
-Several strategies are available from the [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java)
+Several strategies are available from the [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java)
 interface.
 * Writing to a memory mapped file (non-solution, for small files)
 * Writing to a memory buffer (for systems with enough memory)
-* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory.
+* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java) - Not bound by memory.
 The data is written in a native byte order.
@@ -41,5 +41,5 @@ catch (IOException ex) {
 ## Central Classes
-* [RandomFileAssembler](src/main/java/nu/marginalia/rwf/RandomFileAssembler.java)
-* [RandomWriteFunnel](src/main/java/nu/marginalia/rwf/RandomWriteFunnel.java)
+* [RandomFileAssembler](java/nu/marginalia/rwf/RandomFileAssembler.java)
+* [RandomWriteFunnel](java/nu/marginalia/rwf/RandomWriteFunnel.java)

@@ -5,7 +5,7 @@ the TF-IDF score of a keyword.
 ## Central Classes
-* [TermFrequencyDict](src/main/java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java)
+* [TermFrequencyDict](java/nu/marginalia/term_frequency_dict/TermFrequencyDict.java)
 ## See Also

@@ -8,9 +8,9 @@ A crawl spec is a list of domains to be crawled. It is a parquet file with the
 Crawl specs are used to define the scope of a crawl in the absence of known domains.
-The [CrawlSpecRecord](src/main/java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is
+The [CrawlSpecRecord](java/nu/marginalia/model/crawlspec/CrawlSpecRecord.java) class is
 used to represent a record in the crawl spec.
-The [CrawlSpecRecordParquetFileReader](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java)
-and [CrawlSpecRecordParquetFileWriter](src/main/java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java)
+The [CrawlSpecRecordParquetFileReader](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileReader.java)
+and [CrawlSpecRecordParquetFileWriter](java/nu/marginalia/io/crawlspec/CrawlSpecRecordParquetFileWriter.java)
 classes are used to read and write the crawl spec parquet files.

@@ -15,27 +15,27 @@ removed in the future.
 ## Central Classes
-* [CrawledDocument](src/main/java/nu/marginalia/crawling/model/CrawledDocument.java)
-* [CrawledDomain](src/main/java/nu/marginalia/crawling/model/CrawledDomain.java)
+* [CrawledDocument](java/nu/marginalia/crawling/model/CrawledDocument.java)
+* [CrawledDomain](java/nu/marginalia/crawling/model/CrawledDomain.java)
 ### Serialization
 These serialization classes automatically negotiate the serialization format based on the
 file extension.
-Data is accessed through a [SerializableCrawlDataStream](src/main/java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java),
+Data is accessed through a [SerializableCrawlDataStream](java/nu/marginalia/crawling/io/SerializableCrawlDataStream.java),
 which is a somewhat enhanced Iterator that can be used to read data.
-* [CrawledDomainReader](src/main/java/nu/marginalia/crawling/io/CrawledDomainReader.java)
-* [CrawledDomainWriter](src/main/java/nu/marginalia/crawling/io/CrawledDomainWriter.java)
+* [CrawledDomainReader](java/nu/marginalia/crawling/io/CrawledDomainReader.java)
+* [CrawledDomainWriter](java/nu/marginalia/crawling/io/CrawledDomainWriter.java)
 ### Parquet Serialization
-The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java)
-and [CrawledDocumentParquetRecordFileWriter](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes,
+The parquet serialization is done using the [CrawledDocumentParquetRecordFileReader](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileReader.java)
+and [CrawledDocumentParquetRecordFileWriter](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecordFileWriter.java) classes,
 which read and write parquet files respectively.
-The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](src/main/java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java)
+The model classes are serialized to parquet using the [CrawledDocumentParquetRecord](java/nu/marginalia/crawling/parquet/CrawledDocumentParquetRecord.java)
 The record has the following fields:

@@ -4,11 +4,11 @@ reading and writing parquet files with the output from the
 Main models:
-* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java)
-* * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
-* * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
-* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java)
-* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java)
+* [DocumentRecord](java/nu/marginalia/model/processed/DocumentRecord.java)
+* * [DocumentRecordKeywordsProjection](java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
+* * [DocumentRecordMetadataProjection](java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
+* [DomainLinkRecord](java/nu/marginalia/model/processed/DomainLinkRecord.java)
+* [DomainRecord](java/nu/marginalia/model/processed/DomainRecord.java)
 Since parquet is a column based format, some of the readable models are projections
 that only read parts of the input file.

@@ -38,16 +38,16 @@ https://www.marginalia.nu/log/93_atags/
 ## Central Classes
-* [ConverterMain](src/main/java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
-* [DocumentProcessor](src/main/java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
-* - [HtmlDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
+* [ConverterMain](java/nu/marginalia/converting/ConverterMain.java) orchestrates the conversion process.
+* [DocumentProcessor](java/nu/marginalia/converting/processor/DocumentProcessor.java) converts a single document.
+* - [HtmlDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/HtmlDocumentProcessorPlugin.java)
 has HTML-specific logic related to a document, keywords and identifies features such as whether it has javascript.
-* * - [HtmlProcessorSpecializations](src/main/java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
-* * - [XenForoSpecialization](src/main/java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
-* - [PlainTextDocumentProcessorPlugin](src/main/java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
+* * - [HtmlProcessorSpecializations](java/nu/marginalia/converting/processor/plugin/specialization/HtmlProcessorSpecializations.java)
+* * - [XenForoSpecialization](java/nu/marginalia/converting/processor/plugin/specialization/XenForoSpecialization.java) ...
+* - [PlainTextDocumentProcessorPlugin](java/nu/marginalia/converting/processor/plugin/PlainTextDocumentProcessorPlugin.java)
 has plain text-specific logic related to a document...
-* [DomainProcessor](src/main/java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
+* [DomainProcessor](java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
 generates domain-wide metadata such as link graphs.
 ## See Also

@@ -31,10 +31,10 @@ On top of organic links, the crawler can use sitemaps and rss-feeds to discover
 ## Central Classes
-* [CrawlerMain](src/main/java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
-* [CrawlerRetreiver](src/main/java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
+* [CrawlerMain](java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
+* [CrawlerRetreiver](java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
 visits known addresses from a domain and downloads each document.
-* [HttpFetcher](src/main/java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
+* [HttpFetcher](java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
 fetches URLs.
 ## See Also

@@ -16,5 +16,5 @@ This is a very light-weight module that delegates the actual work to the modules
 Their respective readme files contain more information about the indexes themselves
 and how they are constructed.
-The process is glued together within [IndexConstructorMain](src/main/java/nu/marginalia/index/IndexConstructorMain.java),
+The process is glued together within [IndexConstructorMain](java/nu/marginalia/index/IndexConstructorMain.java),
 which is the only class of interest in this module.

@@ -6,4 +6,4 @@ the index-service.
 ## Central Classes
-* [LoaderMain](src/main/java/nu/marginalia/loading/LoaderMain.java) main class.
+* [LoaderMain](java/nu/marginalia/loading/LoaderMain.java) main class.

@@ -4,4 +4,4 @@ The API service acts as a gateway for public API requests, it deals with API key
 ## Central Classes
-* [ApiService](src/main/java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.
+* [ApiService](java/nu/marginalia/api/ApiService.java) handles REST requests and delegates to the appropriate handling classes.

@@ -14,13 +14,13 @@ to the user.
 ## Central classes
-* [SearchService](src/main/java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
+* [SearchService](java/nu/marginalia/search/SearchService.java) receives requests and delegates to the
 appropriate services.
-* [CommandEvaluator](src/main/java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
+* [CommandEvaluator](java/nu/marginalia/search/command/CommandEvaluator.java) interprets a user query and acts
 upon it, dealing with special operations like `browse:` or `site:`.
-* [SearchQueryIndexService](src/main/java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
+* [SearchQueryIndexService](java/nu/marginalia/search/svc/SearchQueryIndexService.java) passes a parsed search query to the index service, and
 then decorates the search results so that they can be rendered.
 ## See Also

@@ -4,4 +4,4 @@ The assistant service helps the search service by offering various peripheral fu
 ## Central Classes
-* [AssistantService](src/main/java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.
+* [AssistantService](java/nu/marginalia/assistant/AssistantService.java) handles REST requests and delegates to the appropriate handling classes.

@@ -15,7 +15,7 @@ Conceptually the application is broken into three parts:
 ## Central Classes
-* [ControlService](src/main/java/nu/marginalia/control/ControlService.java)
+* [ControlService](java/nu/marginalia/control/ControlService.java)
 ## See Also

@@ -9,7 +9,7 @@ much of the executor's functionality.
 ## Central Classes
-* [ExecutorActorControlService](src/main/java/nu/marginalia/actor/ExecutorActorControlService.java)
+* [ExecutorActorControlService](java/nu/marginalia/actor/ExecutorActorControlService.java)
 ## See Also

@@ -15,7 +15,7 @@ The web interface also offers a JSON API for machine-based queries.
 ## Central Classes
-This module is almost entirely boilerplate, except the [QueryBasicInterface](src/main/java/nu/marginalia/query/QueryBasicInterface.java)
+This module is almost entirely boilerplate, except the [QueryBasicInterface](java/nu/marginalia/query/QueryBasicInterface.java)
 class, which offers a REST API for querying the index.
 Much of the guts of the query service are in the [query-service](../../functions/search-query)