(docs) Update the documentation with up-to-date information

Viktor Lofgren 2023-09-14 11:33:36 +02:00
parent eaeb23d41e
commit 35996d0adb
3 changed files with 37 additions and 0 deletions

@ -0,0 +1,11 @@
The link database contains information about links,
such as their ID, URL, title, description,
and so forth.

The link database is a SQLite file. This information is kept out of
the MariaDB database because updates written there would take effect
in production immediately, even before the new information was searchable.

It is constructed by the [loading-process](../../processes/loading-process), and consumed
by the [search-service](../../services-core/search-service).
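
One way to picture why a separate file helps is the sketch below. It is not the project's actual switch-over logic: it assumes a hypothetical file named `links.db` and hypothetical helpers `stage` and `goLive`, and only shows that a file-based store can be rebuilt on the side and made visible in one step, whereas rows written to a shared database are visible as soon as they are committed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: none of these names come from the Marginalia codebase.
final class StagedFileSwapSketch {

    // Write the new version next to the live file; the live file is not touched.
    static Path stage(Path liveFile, byte[] newContents) throws IOException {
        Path staged = liveFile.resolveSibling(liveFile.getFileName() + ".next");
        Files.write(staged, newContents);
        return staged;
    }

    // Replace the live file with the staged one in a single move.
    static void goLive(Path staged, Path liveFile) throws IOException {
        Files.move(staged, liveFile, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("linkdb-sketch");
        Path live = dir.resolve("links.db"); // assumed file name
        Files.write(live, "old".getBytes(StandardCharsets.UTF_8));

        Path staged = stage(live, "new".getBytes(StandardCharsets.UTF_8));
        // Readers of the live file still see the old contents here.
        System.out.println(new String(Files.readAllBytes(live), StandardCharsets.UTF_8)); // prints "old"

        goLive(staged, live);
        System.out.println(new String(Files.readAllBytes(live), StandardCharsets.UTF_8)); // prints "new"
    }
}
```

In this sketch the point at which `goLive` is called is the only moment the consumer's view changes, so it can be delayed until the matching index is ready.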

@ -0,0 +1,18 @@
The processed-data package contains models and logic for
reading and writing Parquet files with the output from the
[converting-process](../../processes/converting-process).

Main models:

* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java)
  * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
  * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java)
* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java)

Since Parquet is a column-based format, some of the readable models are projections
that read only a subset of the columns in the input file.
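
The idea of a projection can be sketched with a toy in-memory column store. This is an illustration only, not the parquet-floor API or the real record classes: `ColumnStore`, `KeywordsProjection`, and the column names `url`, `title`, `body`, and `keywords` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a stand-in for a columnar file, not the real Parquet reader.
final class ColumnProjectionSketch {

    // Toy column store: each column is kept separately and reads are recorded.
    static final class ColumnStore {
        private final Map<String, List<?>> columns = new LinkedHashMap<>();
        private final List<String> columnsRead = new ArrayList<>();

        void putColumn(String name, List<?> values) { columns.put(name, values); }

        List<?> readColumn(String name) {
            columnsRead.add(name); // remember which columns were actually touched
            return columns.get(name);
        }

        List<String> columnsRead() { return columnsRead; }
    }

    // A projection model: it carries only the fields it needs.
    static final class KeywordsProjection {
        final String url;
        final List<String> keywords;

        KeywordsProjection(String url, List<String> keywords) {
            this.url = url;
            this.keywords = keywords;
        }
    }

    // Build projections by reading two columns; all other columns are never read.
    @SuppressWarnings("unchecked")
    static List<KeywordsProjection> readKeywords(ColumnStore store) {
        List<?> urls = store.readColumn("url");
        List<?> keywords = store.readColumn("keywords");
        List<KeywordsProjection> out = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            out.add(new KeywordsProjection((String) urls.get(i), (List<String>) keywords.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnStore store = new ColumnStore();
        store.putColumn("url", Arrays.asList("https://a.example/", "https://b.example/"));
        store.putColumn("title", Arrays.asList("A", "B"));
        store.putColumn("body", Arrays.asList("long text a", "long text b"));
        store.putColumn("keywords", Arrays.asList(Arrays.asList("x"), Arrays.asList("y", "z")));

        List<KeywordsProjection> rows = readKeywords(store);
        System.out.println(rows.size());         // prints 2
        System.out.println(store.columnsRead()); // prints [url, keywords]
    }
}
```

In a row-based format, skipping `title` and `body` would still mean scanning past them; with columns stored separately, a projection avoids touching them at all.
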
## See Also
[third-party/parquet-floor](../../../third-party/parquet-floor)

@ -6,3 +6,11 @@ Git: https://github.com/strategicblue/parquet-floor
It's basically an adaptor for Parquet I/O without
needing to pull half of Hadoop into your project.
The library has been modified with support for reading
and writing lists of values, and the default
compression has been altered to zstd.
# Further reading:
https://parquet.apache.org/docs/