diff --git a/code/processes/crawling-process/readme.md b/code/processes/crawling-process/readme.md
index e04725d8..a595bf1d 100644
--- a/code/processes/crawling-process/readme.md
+++ b/code/processes/crawling-process/readme.md
@@ -4,11 +4,19 @@ The crawling process downloads HTML and saves them into per-domain snapshots. T
 and ignores other types of documents, such as PDFs. Crawling is done on a domain-by-domain basis, and the crawler
 does not follow links to other domains within a single job.
 
+The crawler stores data from in-progress crawls in a WARC file. Once the crawl is complete, the WARC file is
+converted to a parquet file, which is then used by the [converting process](../converting-process/). The intermediate
+WARC file is not used by any other process, but is kept so that the state of a crawl can be recovered in case of a crash or
+other failure.
+
+If so configured, these WARC files may be retained. This is not the default behavior, as the WARC format is not very dense,
+and the parquet files are much more efficient. However, the WARC files are useful for debugging and integration with
+other tools.
+
 ## Robots Rules
 
 A significant part of the crawler is dealing with `robots.txt` and similar, rate limiting headers; especially when these
-are not served in a standard way (which is very common). [RFC9390](https://www.rfc-editor.org/rfc/rfc9309.html) as well
-as Google's [Robots.txt Specifications](https://developers.google.com/search/docs/advanced/robots/robots_txt) are good references.
+are not served in a standard way (which is very common). [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html) as well as Google's [Robots.txt Specifications](https://developers.google.com/search/docs/advanced/robots/robots_txt) are good references.
 
 ## Re-crawling
 
@@ -21,7 +29,6 @@ documents from each domain, to avoid wasting time and resources on domains that
 
 On top of organic links, the crawler can use sitemaps and rss-feeds to discover new documents.
 
-
 ## Central Classes
 
 * [CrawlerMain](src/main/java/nu/marginalia/crawl/CrawlerMain.java) orchestrates the crawling.
diff --git a/code/processes/readme.md b/code/processes/readme.md
index 0722502a..acfe5a39 100644
--- a/code/processes/readme.md
+++ b/code/processes/readme.md
@@ -2,10 +2,10 @@
 
 ## 1. Crawl Process
 
-The [crawling-process](crawling-process/) fetches website contents and saves them
-as compressed JSON models described in [crawling-model](../process-models/crawling-model/).
+The [crawling-process](crawling-process/) fetches website contents, temporarily saving them as WARC files, and then
+converts them into parquet models. Both are described in [crawling-model](../process-models/crawling-model/).
 
-The operation is specified by a [crawl specification](../process-models/crawl-spec), which can be created in the control GUI.
+The operation is optionally defined by a [crawl specification](../process-models/crawl-spec), which can be created in the control GUI.
 
 ## 2. Converting Process
 
@@ -32,21 +32,13 @@ the data generated by the loader.
 Schematically the crawling and loading process looks like this:
 
 ```
-    //====================\\
-    ||  Compressed JSON:  ||  Specifications
-    ||  ID, Domain, Urls[]||  File
-    ||  ID, Domain, Urls[]||
-    ||  ID, Domain, Urls[]||
-    ||  ...               ||
-    \\====================//
-          |
     +-----------+
     |  CRAWLING |  Fetch each URL and
     |    STEP   |  output to file
     +-----------+
           |
     //========================\\
-    ||  Compressed JSON:      ||  Crawl
+    ||  Parquet:              ||  Crawl
     ||  Status, HTML[], ...   ||  Files
     ||  Status, HTML[], ...   ||
     ||  Status, HTML[], ...   ||
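
As an illustrative aside to the Robots Rules section touched by this patch: below is a minimal sketch of parsing a robots.txt file with the crawler-commons library. The library choice, user-agent token, and file contents are assumptions made for the example only; the patch itself does not prescribe any of them.

```java
// Not part of the patch: a minimal, hypothetical sketch of robots.txt handling
// using the crawler-commons library (an assumed dependency, for illustration only).
import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

import java.nio.charset.StandardCharsets;

public class RobotsRulesSketch {
    public static void main(String[] args) {
        // A robots.txt body as it might be fetched from a site (example data).
        String robotsTxt = """
                User-agent: *
                Disallow: /private/
                Crawl-delay: 5
                """;

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();

        // The parser is tolerant of the many non-standard robots.txt files found
        // in the wild, which is the concern the README section above describes.
        BaseRobotRules rules = parser.parseContent(
                "https://www.example.com/robots.txt",          // where the file came from
                robotsTxt.getBytes(StandardCharsets.UTF_8),    // raw file contents
                "text/plain",                                  // content type as served
                "example-crawler");                            // user-agent token (hypothetical)

        // Check whether a URL may be fetched, and what delay the site requested.
        System.out.println(rules.isAllowed("https://www.example.com/private/page.html")); // false
        System.out.println(rules.getCrawlDelay()); // delay parsed from the Crawl-delay directive
    }
}
```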