(doc) Fix outdated links in documentation
commit 9c292a4f62
parent edb42836da
@@ -13,7 +13,7 @@ a binary index that only offers information about which documents has a specific
 The priority index is also compressed, while the full index at this point is not.
 
 [1] See WordFlags in [common/model](../../common/model/) and
-KeywordMetadata in [features-convert/keyword-extraction](../../features-convert/keyword-extraction).
+KeywordMetadata in [converting-process/ft-keyword-extraction](../../processes/converting-process/ft-keyword-extraction).
 
 ## Construction
 
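The hunk above touches the doc's note that the priority index is a binary index keyed on word flags. As a minimal sketch of how per-keyword flags might be packed into a single metadata word and tested later, assuming invented flag names and bit positions (not the actual WordFlags/KeywordMetadata definitions):

```java
// Illustrative only: a hypothetical flag set packed into a long, in the spirit
// of WordFlags/KeywordMetadata. Names and bit positions are assumptions.
enum KeywordFlag {
    TITLE, SUBJECT, SITE, URL_PATH, EXTERNAL_LINK;

    long asBit() {
        return 1L << ordinal();
    }
}

class KeywordMetadataSketch {
    private long flags = 0;

    void set(KeywordFlag flag) {
        flags |= flag.asBit();
    }

    boolean has(KeywordFlag flag) {
        return (flags & flag.asBit()) != 0;
    }

    /** A keyword might qualify for the priority index if any "important" flag is set. */
    boolean isPriority() {
        return (flags & (KeywordFlag.TITLE.asBit() | KeywordFlag.SUBJECT.asBit())) != 0;
    }
}
```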
@@ -10,5 +10,5 @@ its words, how they stem, POS tags, and so on.
 
 ## See Also
 
-[features-convert/keyword-extraction](../../features-convert/keyword-extraction) uses this code to identify which keywords
+[converting-process/ft-keyword-extraction](../../processes/converting-process/ft-keyword-extraction) uses this code to identify which keywords
 are important.
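The file changed here describes identifying important keywords from a document's words, stems and POS tags. The following is a rough stand-in showing only the general shape of such a ranking pass, using plain term frequency; it is not the ft-keyword-extraction algorithm.

```java
import java.util.*;
import java.util.stream.*;

// Minimal keyword-ranking sketch: tokenize, normalize, count, take the top N.
// The real extraction logic also considers stems, POS tags and positions;
// this example only illustrates the overall pipeline shape.
class KeywordSketch {
    static List<String> topKeywords(String text, int n) {
        Map<String, Long> counts = Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> w.length() > 2)
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topKeywords("the quick brown fox jumps over the lazy dog the fox", 3));
    }
}
```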
@@ -49,7 +49,3 @@ has HTML-specific logic related to a document, keywords and identifies features
 
 * [DomainProcessor](java/nu/marginalia/converting/processor/DomainProcessor.java) converts each document and
 generates domain-wide metadata such as link graphs.
-
-## See Also
-
-* [features-convert](../../features-convert/)
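To make "generates domain-wide metadata such as link graphs" concrete, here is a small sketch of collapsing per-document outgoing links into domain-to-domain edges; the types are invented for the example and do not mirror DomainProcessor's actual classes.

```java
import java.net.URI;
import java.util.*;

// Hypothetical aggregation sketch: collapse per-document links into
// a set of (sourceDomain -> destinationDomain) edges.
class DomainLinkGraphSketch {
    record Edge(String source, String destination) {}

    static Set<Edge> buildEdges(String sourceDomain, List<URI> documentLinks) {
        Set<Edge> edges = new HashSet<>();
        for (URI link : documentLinks) {
            String dest = link.getHost();
            if (dest != null && !dest.equals(sourceDomain)) {
                edges.add(new Edge(sourceDomain, dest));
            }
        }
        return edges;
    }
}
```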
@@ -35,8 +35,4 @@ On top of organic links, the crawler can use sitemaps and rss-feeds to discover
 * [CrawlerRetreiver](java/nu/marginalia/crawl/retreival/CrawlerRetreiver.java)
 visits known addresses from a domain and downloads each document.
 * [HttpFetcher](java/nu/marginalia/crawl/retreival/fetcher/HttpFetcherImpl.java)
 fetches URLs.
-
-## See Also
-
-* [features-crawl](../../features-crawl/)
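As a rough illustration of the visit-and-download loop that CrawlerRetreiver and HttpFetcher perform, here is a minimal fetch loop using the JDK's built-in HttpClient; it omits robots.txt handling, politeness delays, WARC capture and error handling, all of which the real crawler has.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Simplified stand-in for the crawler's fetch loop. Real crawling adds
// robots.txt checks, rate limiting, retries and WARC recording.
class FetchLoopSketch {
    private final HttpClient client = HttpClient.newHttpClient();

    void crawl(List<URI> knownUrls) throws Exception {
        for (URI url : knownUrls) {
            HttpRequest request = HttpRequest.newBuilder(url)
                    .header("User-Agent", "example-crawler-sketch")
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(url + " -> " + response.statusCode()
                    + " (" + response.body().length() + " chars)");
        }
    }
}
```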
@@ -3,15 +3,13 @@
 ## 1. Crawl Process
 
 The [crawling-process](crawling-process/) fetches website contents, temporarily saving them as WARC files, and then
-re-converts them into parquet models. Both are described in [crawling-model](../process-models/crawling-model/).
-
-The operation is optionally defined by a [crawl specification](../process-models/crawl-spec), which can be created in the control GUI.
+re-converts them into parquet models. Both are described in [crawling-process/model](crawling-process/model/).
 
 ## 2. Converting Process
 
 The [converting-process](converting-process/) reads crawl data from the crawling step and
 processes them, extracting keywords and metadata and saves them as parquet files
-described in [processed-data](../process-models/processed-data/).
+described in [converting-process/model](converting-process/model/).
 
 ## 3. Loading Process
 
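The numbered steps in this readme (crawl to WARC, convert to processed model files, then load) amount to a hand-off pipeline. The sketch below only captures that shape with invented interfaces; the actual processes are separate batch programs, not method calls in one JVM.

```java
import java.nio.file.Path;
import java.util.List;

// Pipeline-shape sketch with hypothetical interfaces, for orientation only.
interface CrawlStep   { List<Path> crawl(List<String> domains); }    // -> WARC files
interface ConvertStep { List<Path> convert(List<Path> warcFiles); }  // -> processed model files
interface LoadStep    { void load(List<Path> processedFiles); }      // -> index + database

class PipelineSketch {
    static void run(CrawlStep crawl, ConvertStep convert, LoadStep load, List<String> domains) {
        List<Path> warcs = crawl.crawl(domains);
        List<Path> processed = convert.convert(warcs);
        load.load(processed);
    }
}
```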
@@ -51,7 +49,7 @@ Schematically the crawling and loading process looks like this:
     +------------+          features, links, URLs
            |
     //==================\\
-    ||  Parquet:        ||  Processed
+    ||  Slop   :        ||  Processed
     ||  Documents[]     ||  Files
     ||  Domains[]       ||
     ||  Links[]         ||
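The diagram's processed output (Documents[], Domains[], Links[]) can be pictured as three record streams. The fields below are placeholders for illustration only and do not reflect the real Slop/Parquet column layout.

```java
import java.util.List;

// Placeholder record shapes for the three processed-data streams in the diagram.
// Field names are assumptions for the example, not the real schema.
record ProcessedDocument(String url, String title, List<String> keywords) {}
record ProcessedDomain(String domainName, int knownUrls, String state) {}
record ProcessedLink(String sourceDomain, String destinationDomain) {}
```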