MarginaliaSearch/code/libraries/random-write-funnel
Viktor Lofgren d986f90074 (index) Fix consistency between RandomFileAssembler implementations
The RandomFileAssembler implementations, introduced in commit 53c575db3f, were all acting subtly differently. The RWF implementation wrote big-endian longs instead of the native endianness used by the other implementations (and expected by the index construction code). In addition, the mmap implementation exposed a bug in LongArray.write() that caused it to create a larger file than necessary.

A test was built to ensure the output of these implementations is equivalent.
2024-02-05 21:01:32 +01:00

Random Write Funnel

This micro-library solves the problem of write amplification when writing large files out of order to disk. It does this by bucketing the writes into several temporary files, which are then read back to construct the larger file with a more predictable order of writes.

Even though it effectively writes 2.5x as much data to disk as simply attempting to construct the file directly, it is much faster than thrashing an SSD with dozens of gigabytes of small random writes.
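The bucketing idea can be illustrated with a minimal sketch. This is not the library's RandomWriteFunnel: the class name FunnelSketch, its constructor parameters, and the 16-byte (offset, value) record layout are all assumptions made for illustration. Each put appends a record to the temporary file that owns that address range; write then replays one bucket at a time into an in-memory window and emits the windows in order, so the output file is only ever written sequentially.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration of the bucketing technique; not the library's implementation.
class FunnelSketch implements AutoCloseable {
    private static final int RECORD_BYTES = 2 * Long.BYTES; // (offset in bin, value)

    private final long size;    // number of longs in the final file
    private final int binSize;  // number of longs covered by each bucket
    private final Path[] binFiles;
    private final FileChannel[] bins;
    private final ByteBuffer record = ByteBuffer.allocate(RECORD_BYTES);

    FunnelSketch(Path tempDir, long size, int binSize) throws IOException {
        this.size = size;
        this.binSize = binSize;
        int binCount = (int) ((size + binSize - 1) / binSize);
        binFiles = new Path[binCount];
        bins = new FileChannel[binCount];
        for (int i = 0; i < binCount; i++) {
            binFiles[i] = Files.createTempFile(tempDir, "bin", ".dat");
            bins[i] = FileChannel.open(binFiles[i],
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
        }
    }

    // Append a record to the bucket that owns this address (sequential append per bucket).
    void put(long address, long value) throws IOException {
        record.clear();
        record.putLong(address % binSize).putLong(value).flip();
        FileChannel bin = bins[(int) (address / binSize)];
        while (record.hasRemaining()) bin.write(record);
    }

    // Replay each bucket into a zeroed window, then write the windows out in address order.
    void write(WritableByteChannel out) throws IOException {
        long[] window = new long[binSize];
        ByteBuffer outBuf = ByteBuffer.allocate(binSize * Long.BYTES)
                .order(ByteOrder.nativeOrder()); // native-endian longs in the output
        for (int i = 0; i < bins.length; i++) {
            Arrays.fill(window, 0L);
            FileChannel bin = bins[i];
            long records = bin.size() / RECORD_BYTES;
            bin.position(0);
            for (long r = 0; r < records; r++) {
                record.clear();
                while (record.hasRemaining()) {
                    if (bin.read(record) < 0) throw new EOFException();
                }
                record.flip();
                int offset = (int) record.getLong();
                long value = record.getLong();
                window[offset] = value; // later writes to the same address win
            }
            // The last bucket may cover fewer than binSize longs.
            int longsInBin = (int) Math.min(binSize, size - (long) i * binSize);
            outBuf.clear();
            for (int j = 0; j < longsInBin; j++) outBuf.putLong(window[j]);
            outBuf.flip();
            while (outBuf.hasRemaining()) out.write(outBuf);
        }
    }

    @Override
    public void close() throws IOException {
        for (int i = 0; i < bins.length; i++) {
            bins[i].close();
            Files.deleteIfExists(binFiles[i]);
        }
    }
}
```

The sketch issues one unbuffered write per record to keep the logic visible; a practical implementation would presumably buffer each bucket in memory and flush in larger blocks. Addresses that are never written come out as zero.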

Demo

try (var rwf = new RandomWriteFunnel(tmpPath, expectedSize);
     var out = Files.newByteChannel(outputFile, StandardOpenOption.WRITE)) 
{
    rwf.put(addr1, data1);
    rwf.put(addr2, data2);
    // ...
    rwf.put(addr1e33, data1e33);
    
    rwf.write(out);
}
catch (IOException ex) {
    // handle or propagate the I/O failure
}

Central Classes