fs2 - File Structures 2

by Tim Henderson (tadh@case.edu)

Licensed under the GNU GPL version 3 or at your option any later version. If you need another licensing option please contact me directly.

What is this?

A B+ Tree implementation
A list implemenation supporting O(1) Append, Pop, Get and Set operations.
A command to generate type specific wrappers around the above structures. It's generic, in Go, kinda.
A platform for implementing memory mapped high performance file structures in Go.

Why did you make this?

In my academic research some of the algorithms I work on (such as frequent subgraph mining) have exponential characteristics in their memory usage. In order to run these algorithms on larger data sets they need to be able to transparently cache less popular parts of the data to disk. However, in general it is best to keep as much data in memory as possible.

Of course, there are many options for such data stores. I could have used a off the shelf database, however I also want to explore ways to achieve higher performance than those solutions offer.

Have you worked on this type of system before?

I have! In the golang world I believe I was the first to implement a disk backed B+ Tree. Here is an early commit from my file-structures repository. Note the date: February 21, 2010. Go was made public in November of 2009 (the first weekly was November 06, 2009). I started work on the B+ Tree in January of 2010.

This particular experiment is a follow up to my work in the file-structures repository. I have used those structures successfully many times but I want to experiment with new ways of doing things to achieve better results. Besides my disk backed versions of these structures you can also find good implementations of in memory version in my data-structures repository.

B+ Tree

docs

This is a disk backed low level "key-value store". The closest thing similar to what it offers is Bolt DB. My blog post is a great place to start to learn more about the ubiquitous B+ Tree.

Features

Variable length size key or fixed sized keys. Fixed sized keys should be kept relatively short, less than 1024 bytes (the shorter the better). Variable length keys can be up to 2^31 - 1 bytes long.
Variable length values or fixed sized values. Fixed sized values should also be kept short, less than 1024 bytes. Variable length values can be up to 2^31 - 1 bytes long.
Duplicate key support. Duplicates are kept out of the index and only occur in the leaves.
Data is only written to disk when you tell it (or when need due to OS level page management).
Simple (but low level) interface.
Can operate in either a anonymous memory map or in a file backed memory map. If you plan to have a very large tree (even one that never needs to be persisted) it is recommend you use a file backed memory map. The OS treats pages in the file cache different than pages which are not backed by files.
The command fs2-generic can generate a wrapper specialized to your data type. Typing saved! To use go install github.com/timtadh/fs2/fs2-generic. Get help with fs2-generic --help

Limitations

Not thread safe and therefore no transactions which you only need with multiple threads.
Maximum fixed key/value size is ~1350 bytes.
Maximum variable length key/value size is 2^31 - 1
This is not a database. You could make it into a database or build a database on top of it.

Quick Start

usage docs on godoc

Importing

import (
    "github.com/timtadh/fs2/bptree"
    "github.com/timtadh/fs2/fmap"
)

Creating a new B+ Tree (fixed key size, variable length value size).

bf, err := fmap.CreateBlockFile("/path/to/file")
if err != nil {
    log.Fatal(err)
}
defer bf.Close()
bpt, err := bptree.New(bf, 8, -1)
if err != nil {
    log.Fatal(err)
}
// do stuff with bpt

Opening a B+ Tree

bf, err := fmap.OpenBlockFile("/path/to/file")
if err != nil {
    log.Fatal(err)
}
defer bf.Close()
bpt, err := bptree.Open(bf)
if err != nil {
    log.Fatal(err)
}
// do stuff with bpt

Add a key/value pair. Note, since this is low level you have to serialize your keys and values. The length of the []byte representing the key must exactly match the key size of the B+ Tree. You can find out what that was set to by called bpt.KeySize()

import (
    "encoding/binary"
)

var key uint64 = 12
value := "hello world"
kBytes := make([]byte, 8)
binary.PutUvarint(kBytes, key)
err := bpt.Add(kBytes, []byte(value))
if err != nil {
    log.Fatal(err)
}

As you can see it can be a little verbose to serialize and de-serialize your keys and values. So be sure to wrap that up in utility functions or even to wrap the interface of the B+ Tree so that client code does not have to think about it.

Since a B+Tree is a "multi-map" meaning there may be more than one value per key. There is no "Get" method. To retrieve the values associated with a key use the Find method.

{
    var key, value []byte
    kvi, err := bpt.Find(kBytes)
    if err != nil {
        log.Fatal(err)
    }
    for key, value, err, kvi = kvi(); kvi != nil; key, value, err, kvi = kvi() {
        // do stuff with the keys and values
    }
    if err != nil {
        log.Fatal(err)
    }
}

That interface is easy to misuse if you do not check the error values as show in the example above. An easier interface is provided for all of the iterators (Range, Find, Keys, Values, Iterate) called the Do interface.

err = bpt.DoFind(kBytes, func(key, value []byte) error {
    // do stuff with the keys and values
    return nil
})
if err != nil {
    log.Fatal(err)
}

It is recommended that you always use the Do* interfaces. The other is provided if the cost of extra method calls is too high.

Removal is also slightly more complicated due to the duplicate keys. This example will remove all key/value pairs associated with the given key:

err = bpt.Remove(kBytes, func(value []byte) bool {
    return true
})
if err != nil {
    log.Fatal(err)
}

to remove just the one I added earlier do:

err = bpt.Remove(kBytes, func(v []byte) bool {
    return bytes.Equal(v, []byte(value))
})
if err != nil {
    log.Fatal(err)
}

That wraps up the basic usage. If you want to ensure that the bytes you have written are in fact on disk you have 2 options

call bf.Sync() - Note this uses the async mmap interface under the hood. The bytes are not guaranteed to hit the disk after this returns but they will go there soon.
call bf.Close()

MMList

docs

A Memory Mapped List. This list works more like a stack and less like a queue. It is not a good thing to build a job queue on. It is a good thing to build a large set of items which can be efficiently randomly sampled. It uses the same varchar system that the B+Tree uses so it can store variably sized items up to 2^31 - 1 bytes long.

Operations

Size O(1)
Append O(1)
Pop O(1)
Get O(1)
Set O(1)
Swap O(1)
SwapDelete O(1)

I will consider implementing a Delete method. However, it will be O(n) since this is implemented a bit like an ArrayList under the hood. The actual way it works is there is a B+Tree which indexes to list index blocks. The list index blocks hold pointers (511 of them) to varchar locations. I considered having a restricted 2 level index but that would have limited the size of the list to a maximum of ~1 billion items which was uncomfortably small to me. In the future the implementation may change to use something more like an ISAM index which will be a bit more compact for this use case.

Quickstart

package main

import (
    "binary"
    "log"
)

import (
    "github.com/timtadh/fs2/fmap"
    "github.com/timtadh/fs2/mmlist"
)

func main() {
    file, err := fmap.CreateBlockFile("/tmp/file")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    list, err := mmlist.New(file)
    if err != nil {
        log.Fatal(err)
    }
    idx, err := list.Append([]byte("hello"))
    if err != nil {
        log.Fatal(err)
    }
    if d, err := list.Get(idx); err != nil {
        log.Fatal(err)
    } else if !bytes.Equal([]byte("hello"), d) {
        log.Fatal("bytes where not hello")
    }
    if err := list.Set(idx, "bye!"); err != nil {
        log.Fatal(err)
    }
    if d, err := list.Get(idx); err != nil {
        log.Fatal(err)
    } else if !bytes.Equal([]byte("bye!"), d) {
        log.Fatal("bytes where not bye!")
    }
    if d, err := list.Pop(); err != nil {
        log.Fatal(err)
    } else if !bytes.Equal([]byte("bye!"), d) {
        log.Fatal("bytes where not bye!")
    }
}

`fs2-generic`

A command to generate type specialized wrappers around fs2 structures.

Since Go does not support generics and is not going to support generics anytime soon, this program will produce a wrapper specialized to the supplied types. It is essentially manually implementing type specialized generics in a very limited form. All fs2 structures operate on sequences of bytes, aka []byte, because they memory mapped and file backed structures. Therefore, the supplied types must be serializable to be used as keys and values in an fs2 structure.

How to install

Assuming you already have the code downloaded and in your GOPATH just run:

$ go install github.com/timtadh/fs2/fs2-generic

How to generate a wrapper for the B+ Tree

$ fs2-generic \
    --output=src/output/package/file.go \
    --package-name=package \
    bptree \
        --key-type=my/package/name/Type \
        --key-serializer=my/package/name/Func \
        --key-deserializer=my/package/name/Func \
        --value-type=my/package/name/Type \
        --value-serializer=my/package/name/Func \
        --value-deserializer=my/package/name/Func

How to generate a wrapper for the MMList

$ fs2-generic \
    --output=src/output/package/file.go \
    --package-name=package \
    mmlist \
        --item-type=my/package/name/Type \
        --item-serializer=my/package/name/Func \
        --item-deserializer=my/package/name/Func

Variations

Supplying a pointer type:

--key-type=*my/package/name/Type
--value-type=*my/package/name/Type

Serializer Type Signature (let T be a type parameter)

func(T) ([]byte)

Deserializer Type Signature (let T be a type parameter)

func([]byte) T

Fixed sized types can have their sizes specified with

--key-size=<# of bytes>
--value-size=<# of bytes>

If the generated file is going into the same package that the types and function are declared in one should drop the package specifiers

$ fs2-generic \
    --output=src/output/package/file.go \
    --package-name=package \
    bptree \
        --key-type=KeyType \
        --key-serializer=SerializeKey \
        --key-deserializer=DeserializeKey \
        --value-type=ValueType \
        --value-serializer=SerializeValue \
        --value-deserializer=DeserializeValue

If nil is not a valid "empty" value for your type (for instance it is an integer, a float, or a struct value) then your must supply a valid "empty" value. Here is an example of a tree with int32 keys and float64 values:

$ fs2-generic \
    --output=src/output/package/file.go \
    --package-name=package \
    bptree \
        --key-type=int32 \
        --key-size=4 \
        --key-empty=0 \
        --key-serializer=SerializeInt32 \
        --key-deserializer=DeserializeInt32 \
        --value-type=float64 \
        --value-size=8 \
        --value-empty=0.0 \
        --value-serializer=SerializeFloat64 \
        --value-deserializer=DeserializeFloat64

Using with `go generate`

The fs2-generic command can be used on conjunction with go generate. To do so simply create a .go file in the package where the generated code should live. For example, let's pretend that we want to create a B+Tree with 3 dimension integer points as keys and float64's as values. Lets create a package structure for that (assuming you are in the root of your $GOPATH)

mkdir ./src/edu/cwru/eecs/pointbptree/
touch ./src/edu/cwru/eecs/pointbptree/types.go

types.go should then have the point defined + functions for serialization. Below is the full example. Note the top line specifies how to generate the file ./src/edu/cwru/eecs/pointbptree/wrapper.go. To generate it run go generate edu/cwru/eecs/pointbptree.

//go:generate fs2-generic --output=wrapper.go --package-name=pointbptree bptree --key-type=*Point --key-size=12 --key-empty=nil --key-serializer=SerializePoint --key-deserializer=DeserializePoint --value-type=float64 --value-size=8 --value-empty=0.0 --value-serializer=SerializeFloat64 --value-deserializer=DeserializeFloat64
package pointbptree

import (
    "encoding/binary"
    "math"
)

type Point struct {
    X, Y, Z int32
}

func SerializePoint(p *Point) []byte {
    bytes := make([]byte, 4*3)
    binary.BigEndian.PutUint32(bytes[0:04], uint32(p.X))
    binary.BigEndian.PutUint32(bytes[4:08], uint32(p.Y))
    binary.BigEndian.PutUint32(bytes[8:12], uint32(p.Z))
    return bytes
}

func DeserializePoint(bytes []byte) *Point {
    return &Point{
        X: int32(binary.BigEndian.Uint32(bytes[0:04])),
        Y: int32(binary.BigEndian.Uint32(bytes[4:08])),
        Z: int32(binary.BigEndian.Uint32(bytes[8:12])),
    }
}

func SerializeFloat64(f float64) []byte {
    bytes := make([]byte, 8)
    binary.BigEndian.PutUint64(bytes, math.Float64bits(f))
    return bytes
}

func DeserializeFloat64(bytes []byte) float64 {
    return math.Float64frombits(binary.BigEndian.Uint64(bytes))
}

FMap

docs

FMap provides a block oriented interface for implementing memory mapped file structures. It is block oriented because memory mapped structures should be block aligned. By making the interface block oriented, the programmer is forced to write the structures in a block oriented fashion. I use it with fs2/slice which provides a simple way to cast []byte to other types of pointers. You can accomplish a similar thing with just using the reflect package but you might find fs2/slice more convenient.

FMap provides an interface for creating both anonymous and file backed memory maps. It supports resizing the memory maps dynamically via allocation and free methods. Note, when an allocation occurs the underlying file and memory map may resize using mremap with the flag MREMAP_MAYMOVE. So don't let pointers escape your memory map! Keep everything as file offsets and be happy!

Memory Mapped IO versus Read/Write

A key motivation of this work is to explore memory mapped IO versus a read/write interface in the context of Go. I have two hypotheses:

The operating system is good at page management generally. While, we know more about how to manage the structure of B+Trees, VarChar stores, and Linear Hash tables than the OS there is no indication that from Go you can achieve better performance. Therefore, I hypothesize that leaving it to the OS will lead to a smaller working set and a faster data structure in general.
You can make Memory Mapping performant in Go. There are many challenges here. The biggest of which is that there are no dynamically size array TYPES in go. The size of the array is part of the type, you have to use slices. This creates complications when hooking up structures which contain slices to mmap allocated blocks of memory. I hypothesize that this repository can achieve good (enough) performance here.

In my past experience using the read/write interface I have encountered two challenges:

When using the read/write interface one needs to block and cache management. In theory databases which bypass the OS cache management get better performance. In practice, there are challenges achieving this from a garbage collected language.
Buffer management is a related problem. In the past I have relied on Go's built in memory management scheme. This often become a bottle neck. To solve this problem, one must implement custom allocators and buffer management subsystems.

Memory mapped IO avoids both of these problems by delegating them to the operating system. If the OS does a good job, then this system will perform well. If it does a bad job it will perform poorly. The reason why systems such as Oracle circumvent all OS level functions for page management is the designers believe: a) they can do it better, and b) it provides consistent performance across platforms.

Memory mapped IO in Go has several challenges.

You have to subvert type and memory safety.
There is no dynamically sized arrays. Therefore, everything has to use slices. This means that you can't just point a struct at a memory mapped block and expect it work if it has slices in it. Instead, some book keeping needs to be done to hook up the slices properly. This adds overhead.

The results so far:

It can be done
Integrating (partial) runtime checking for safety can be achieved through the use of the "do" interface.
The performance numbers look like they are as good or better than the Linear Hash table I implemented in my file-structures repository.

Related Projects

file-structures - A collection of file-structures includes: B+Tree, BTree, Linear Hash Table, VarChar Store.
data-structures - A collection of in memory data structures. Includes a B+Tree.
boltdb - a mmap'ed b+ tree based key/value store.
goleveldb - another database written in go
cznic/b - an in memory b+ tree
xiang90/bplustree - an in memory b+ tree
your project here.

fs2 - File Structures 2

What is this?

Why did you make this?

Have you worked on this type of system before?

B+ Tree

Features

Limitations

Quick Start

MMList

Quickstart

fs2-generic

How to install

How to generate a wrapper for the B+ Tree

How to generate a wrapper for the MMList

Variations

Using with go generate

FMap

Memory Mapped IO versus Read/Write

Related Projects

`fs2-generic`

Using with `go generate`