From 35996d0adb02d9de7cb62f6f01af5920190144e4 Mon Sep 17 00:00:00 2001
From: Viktor Lofgren
Date: Thu, 14 Sep 2023 11:33:36 +0200
Subject: [PATCH] (docs) Update the documentation up-to-date information

---
 code/common/linkdb/readme.md                 | 11 +++++++++++
 code/process-models/processed-data/readme.md | 18 ++++++++++++++++++
 third-party/parquet-floor/readme.md          |  8 ++++++++
 3 files changed, 37 insertions(+)
 create mode 100644 code/common/linkdb/readme.md
 create mode 100644 code/process-models/processed-data/readme.md

diff --git a/code/common/linkdb/readme.md b/code/common/linkdb/readme.md
new file mode 100644
index 00000000..a87166bc
--- /dev/null
+++ b/code/common/linkdb/readme.md
@@ -0,0 +1,11 @@
+The link database contains information about links,
+such as their ID, their URL, their title, their description,
+and so forth.
+
+The link database is a sqlite file. The reason this information
+is not in the MariaDB database is that this would make updates to
+this information take effect in production immediately, even before
+the information was searchable.
+
+It is constructed by the [loading-process](../../processes/loading-process), and consumed
+by the [search-service](../../services-core/search-service).
\ No newline at end of file
diff --git a/code/process-models/processed-data/readme.md b/code/process-models/processed-data/readme.md
new file mode 100644
index 00000000..4bc8c857
--- /dev/null
+++ b/code/process-models/processed-data/readme.md
@@ -0,0 +1,18 @@
+The processed-data package contains models and logic for
+reading and writing parquet files with the output from the
+[converting-process](../../processes/converting-process).
+
+Main models:
+
+* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java)
+* * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
+* * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
+* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java)
+* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java)
+
+Since parquet is a column based format, some of the readable models are projections
+that only read parts of the input file.
+
+## See Also
+
+[third-party/parquet-floor](../../../third-party/parquet-floor)
\ No newline at end of file
diff --git a/third-party/parquet-floor/readme.md b/third-party/parquet-floor/readme.md
index b1e21c40..70715f1e 100644
--- a/third-party/parquet-floor/readme.md
+++ b/third-party/parquet-floor/readme.md
@@ -6,3 +6,11 @@ Git: https://github.com/strategicblue/parquet-floor
 
 It's basically an adaptor for Parquet I/O without
 needing to pull half of Hadoop into your project.
+
+The library has been modified with support for reading
+and writing lists of values, and the default
+compression has been altered to zstd.
+
+# Further reading:
+
+https://parquet.apache.org/docs/
\ No newline at end of file