MarginaliaSearch/code/libraries/btree
Viktor Lofgren daf2a8df54 (btree) Roll back optimization of queryDataWithIndex
It had been previously assumed that re-writing this function in the style of retain() would make it faster, but it had the opposite effect.

The reason retain is so fast is due to properties of the data that hold when intersecting document lists, where long runs of adjacent documents are expected. They do not hold when looking up the data associated with already-intersected documents, where the data is sparser.
2024-05-19 11:29:28 +02:00

BTree

This package contains a small library for creating and reading a static b-tree as an implicit, pointer-less data structure. Binary indices (i.e. sets) are supported, as are key-value mappings where the data is interlaced with the keys in the leaf nodes and each entry's size is a multiple of the key size. This is a fairly low-level data structure.
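To make "implicit pointer-less" concrete, here is a minimal, self-contained sketch of the general idea. It is not this library's implementation: the class `ImplicitIndexSketch`, its two-level layout, and the `blockSize` parameter are hypothetical and chosen only for illustration. The point is that the position of a child block is computed arithmetically from the index position, so no pointers need to be stored. The sketch assumes the keys are sorted and distinct.

```java
import java.util.Arrays;

/** Illustrative only: a two-level static index with no stored child pointers. */
class ImplicitIndexSketch {
    private final long[] keys;   // sorted, distinct keys (the "leaf" layer)
    private final long[] index;  // largest key of each leaf block
    private final int blockSize;

    ImplicitIndexSketch(long[] sortedKeys, int blockSize) {
        this.keys = sortedKeys;
        this.blockSize = blockSize;
        int blocks = (sortedKeys.length + blockSize - 1) / blockSize;
        this.index = new long[blocks];
        for (int b = 0; b < blocks; b++) {
            int last = Math.min((b + 1) * blockSize, sortedKeys.length) - 1;
            index[b] = sortedKeys[last];
        }
    }

    /** Returns the position of key in the leaf layer, or -1 if it is absent. */
    int find(long key) {
        // Locate the first block whose largest key is >= key.
        int b = Arrays.binarySearch(index, key);
        if (b < 0) b = -b - 1;            // convert to insertion point
        if (b >= index.length) return -1; // key is larger than every stored key

        // The block's location is derived from b; nothing is dereferenced.
        int start = b * blockSize;
        int end = Math.min(start + blockSize, keys.length);
        int pos = Arrays.binarySearch(keys, start, end, key);
        return pos >= 0 ? pos : -1;
    }
}
```

A real b-tree of this kind would stack several such index layers and size the blocks to match a page, but the offset arithmetic is the same in spirit.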

The b-trees are specified through a BTreeContext which contains information about the data and index layout.

The b-trees are written through a BTreeWriter and read with a BTreeReader.

Demo

BTreeContext ctx = new BTreeContext(
        4,  // num layers max
        1,  // entry size, 1 = the leaf node has just the key
        BTreeBlockSize.BS_4096); // page size

// Allocate a memory area to work in, see the array library for how to do this with files
LongArray array = LongArray.allocate(8192);

// Write a btree at offset 123 in the area
long[] items = new long[400];
BTreeWriter writer = new BTreeWriter(array, ctx);
final int offsetInFile = 123;

long btreeSize = writer.write(offsetInFile, items.length, slice -> {
    // here we *must* write items.length * entry.size words in slice
    // these items must be sorted!!

    for (int i = 0; i < items.length; i++) {
        slice.set(i, items[i]);
    }
});

// Read the BTree

BTreeReader reader = new BTreeReader(array, ctx, offsetInFile);
reader.findEntry(items[0]);
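The demo above uses an entry size of 1, so each leaf entry is just a key. For key-value mappings the data is interlaced with the keys. The following is a rough, self-contained sketch of what such a layout could look like with an entry size of 2. It does not use the library: `InterlacedEntriesSketch`, `pack` and `findOffset` are hypothetical names, and the exact word order within an entry is an assumption.

```java
/** Illustrative only: key-value entries interlaced in one flat long[]. */
class InterlacedEntriesSketch {
    // Assumed layout: word 0 of each entry is the key, word 1 is its data.
    static final int ENTRY_SIZE = 2;

    /** Packs parallel arrays into [k0, v0, k1, v1, ...]. Keys must be sorted. */
    static long[] pack(long[] sortedKeys, long[] values) {
        long[] out = new long[sortedKeys.length * ENTRY_SIZE];
        for (int i = 0; i < sortedKeys.length; i++) {
            out[i * ENTRY_SIZE] = sortedKeys[i];
            out[i * ENTRY_SIZE + 1] = values[i];
        }
        return out;
    }

    /** Binary search over whole entries; returns the key's word offset or -1. */
    static int findOffset(long[] packed, long key) {
        int lo = 0;
        int hi = packed.length / ENTRY_SIZE - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long k = packed[mid * ENTRY_SIZE];
            if (k < key) lo = mid + 1;
            else if (k > key) hi = mid - 1;
            else return mid * ENTRY_SIZE;
        }
        return -1;
    }
}
```

Under this assumed layout, the data for a key found at word offset `o` sits at `o + 1`, which is why the callback in the demo has to write `items.length` times the entry size words in total.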

Useful Resources

YouTube: Abdul Bari, 10.2 B Trees and B+ Trees. How they are useful in Databases. This isn't exactly the design implemented in this library, but it is very well presented and a good refresher.