(docs) Update the documentation with up-to-date information

2025-02-23 13:09:00 +00:00 · 2023-09-14 11:33:36 +02:00 · 2023-09-14 11:33:36 +02:00 · 35996d0adb
commit 35996d0adb
parent eaeb23d41e
3 changed files with 37 additions and 0 deletions
--- a/code/common/linkdb/readme.md
+++ b/code/common/linkdb/readme.md
@@ -0,0 +1,11 @@
+The link database contains information about links,
+such as their ID, URL, title, and description, along
+with other per-link metadata.
+
+The link database is an SQLite file. This information is kept
+out of the MariaDB database because, if it were stored there,
+updates to it would take effect in production immediately,
+even before the new data was searchable.
+
+It is constructed by the [loading-process](../../processes/loading-process) and consumed
+by the [search-service](../../services-core/search-service).
--- a/code/process-models/processed-data/readme.md
+++ b/code/process-models/processed-data/readme.md
@@ -0,0 +1,18 @@
+The processed-data package contains models and logic for
+reading and writing the Parquet files that hold the output
+of the [converting-process](../../processes/converting-process).
+
+Main models:
+
+* [DocumentRecord](src/main/java/nu/marginalia/model/processed/DocumentRecord.java)
+  * [DocumentRecordKeywordsProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordKeywordsProjection.java)
+  * [DocumentRecordMetadataProjection](src/main/java/nu/marginalia/model/processed/DocumentRecordMetadataProjection.java)
+* [DomainLinkRecord](src/main/java/nu/marginalia/model/processed/DomainLinkRecord.java)
+* [DomainRecord](src/main/java/nu/marginalia/model/processed/DomainRecord.java)
+
+Since Parquet is a column-oriented format, some of the readable models are projections
+that read only a subset of the columns in the input file.
+
+## See Also
+
+[third-party/parquet-floor](../../../third-party/parquet-floor)
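The processed-data readme says that some models are projections because Parquet stores data by column. The following is a rough illustration of why that saves work. It is a minimal Java sketch that assumes a toy in-memory column store. `ColumnarTable`, `KeywordsProjection`, `ProjectionDemo`, and the `url`/`title`/`words` column names are all hypothetical. They are not the parquet-floor API, and they are not the real `DocumentRecord` schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo; declared first so the single-file launcher finds main().
final class ProjectionDemo {
    // Builds the projection by reading only the "url" and "words" columns.
    static List<KeywordsProjection> readKeywords(ColumnarTable table) {
        List<Object> urls = table.readColumn("url");
        List<Object> words = table.readColumn("words");
        List<KeywordsProjection> out = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            out.add(new KeywordsProjection((String) urls.get(i), (String) words.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnarTable table = new ColumnarTable();
        table.addColumn("url", List.of("https://a.example/", "https://b.example/"));
        table.addColumn("title", List.of("Page A", "Page B"));
        table.addColumn("words", List.of("alpha beta", "gamma"));

        List<KeywordsProjection> rows = readKeywords(table);
        // prints "2 rows, columns read: [url, words]"; "title" is never read
        System.out.println(rows.size() + " rows, columns read: " + table.columnsRead);
    }
}

// Hypothetical stand-in for a column-oriented file: each column's values are
// stored together, so a reader can fetch some columns without the others.
// This is NOT the parquet-floor API.
final class ColumnarTable {
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    // Tracks which columns were actually read, to make the savings visible.
    final Set<String> columnsRead = new LinkedHashSet<>();

    void addColumn(String name, List<Object> values) {
        columns.put(name, List.copyOf(values));
    }

    List<Object> readColumn(String name) {
        List<Object> values = columns.get(name);
        if (values == null) {
            throw new IllegalArgumentException("no such column: " + name);
        }
        columnsRead.add(name);
        return values;
    }
}

// Hypothetical projection: holds only the fields one consumer needs.
record KeywordsProjection(String url, String words) {}
```

The projection reads only two of the three columns. A Parquet reader that is given a projection schema can generally skip the column chunks that were not requested in the same way. That is what makes a projection cheaper than reading the full record.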
--- a/third-party/parquet-floor/readme.md
+++ b/third-party/parquet-floor/readme.md
@@ -6,3 +6,11 @@ Git: https://github.com/strategicblue/parquet-floor

 It's basically an adaptor for Parquet I/O without
 needing to pull half of Hadoop into your project.
+
+The library has been modified to support reading
+and writing lists of values, and its default
+compression codec has been changed to zstd.
+
+# Further reading
+
+https://parquet.apache.org/docs/