CuttDB

A hash-based key-value database for persistently storing massive numbers of small records, initially designed as an indexed repository for tens to hundreds of millions of URLs, web pages, and small documents.

Features

It is a database library designed to be embedded in other programs.
Keys and values are arbitrary byte arrays; a value is retrieved by its key.
Very good performance, especially for insert operations, even when a massive number of records is stored.
A sophisticated cache for both index pages and records makes better use of memory.
Data on disk is always written ahead to minimize the possibility of data loss.
A Bloom filter is included to accelerate queries for nonexistent records.
Record expiration and space recycling are supported.
Multithreading is supported.
A server mode speaking the memcached protocol (set/add/replace/get/delete) is also supported.
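
To illustrate how a Bloom filter lets a lookup for a missing key return without touching the disk, here is a minimal sketch. It is not CuttDB's implementation: the filter size, the number of hash functions, the seeded FNV-1a hash, and all names (Bloom, bloom_add, bloom_may_contain) are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   8192u  /* illustrative size, not a CuttDB setting */
#define BLOOM_HASHES 3u     /* illustrative number of hash functions */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} Bloom;

/* FNV-1a with a seed mixed into the offset basis; chosen for brevity only. */
static uint32_t bloom_hash(const void *key, size_t len, uint32_t seed)
{
    const uint8_t *p = key;
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static void bloom_init(Bloom *b)
{
    memset(b->bits, 0, sizeof b->bits);
}

/* Set one bit per hash function for this key. */
static void bloom_add(Bloom *b, const void *key, size_t len)
{
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_hash(key, len, i * 0x9e3779b9u) % BLOOM_BITS;
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 if the key is definitely absent, 1 if it may be present. */
static int bloom_may_contain(const Bloom *b, const void *key, size_t len)
{
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_hash(key, len, i * 0x9e3779b9u) % BLOOM_BITS;
        if (!(b->bits[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}
```

A store built this way would call bloom_may_contain before reading the index and skip the disk access when it returns 0. A return value of 1 can be a false positive, so the real lookup is still required in that case.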

Limitations

SQL and the relational data model are not supported.
The index structure is hash-based, so prefix queries and ordered iteration are not supported.
Transactions are not supported.
Update operations (set on an existing record) are less efficient.
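
The reason a hash index rules out prefix queries can be seen in a small sketch: keys that sort next to each other are scattered across buckets, so there is no contiguous range to scan. The FNV-1a hash, the bucket count, and the function name bucket_of are assumptions for illustration, not CuttDB's actual hash function or layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a key to a bucket index using 32-bit FNV-1a (illustrative choice). */
static uint32_t bucket_of(const char *key, size_t len, uint32_t nbuckets)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)key[i];
        h *= 16777619u;
    }
    return h % nbuckets;
}
```

Calling bucket_of("key1", 4, 1024) and bucket_of("key2", 4, 1024) yields two different, non-adjacent bucket numbers even though the keys share the prefix "key". Finding every key with a given prefix would therefore mean visiting every bucket.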

Performance

Simple test on a machine with Core 2 Duo [email protected]/8GB RAM/7200RPM SATA/Linux 3.2.0-25

 Insert 200,000,000 records; keys and values are both 8-byte strings:
 Overall: 200000000 / 200000000     (458.941 s, 435784 ops)
 The program now consumes 2.0 GB of RAM.


 Clear the OS cache with 'echo 3 > /proc/sys/vm/drop_caches'.
 Retrieve 50,000 records at random:
 Overall: 50000 / 50000     (459.279 s, 108 ops)


 Retrieve the same records again:
 Overall: 50000 / 50000     (0.037 s, 1315789 ops)

For detailed test results, please refer to the 200,000,000 benchmark.
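
The "ops" figure in each "Overall" line appears to be operations per second, that is, the record count divided by the elapsed time. That reading is an inference from the numbers, not something the benchmark output states. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
/* Throughput in operations per second; returns 0 for a non-positive duration. */
static double ops_per_second(long long records, double seconds)
{
    return seconds > 0.0 ? (double)records / seconds : 0.0;
}
```

With the figures above, ops_per_second(200000000, 458.941) is roughly 435,786 and ops_per_second(50000, 459.279) is roughly 108.9, both close to the reported values. The small differences are presumably due to rounding of the printed times.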

Usage

Compile & use the library

$ git clone "http://cuttdb.googlecode.com/svn/trunk/" cuttdb
$ cd cuttdb/src ; make ; sudo make install
$ vim test.c
$ gcc test.c -lcuttdb; mkdir testdb; ./a.out

Contents of test.c:

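#include <cuttdb.h>
#include <stdio.h>

int main(void)
{
    CDB *db = cdb_new();
    cdb_option(db, 20000, 16, 16);
    if (cdb_open(db, "./testdb/", CDB_CREAT) < 0) {
        printf("Open Failed\n");
        return -1;
    }
    /* Store 7 bytes: "HELLO1" plus a NUL, so the value can be printed with %s. */
    cdb_set(db, "key1", 4, "HELLO1\0", 7);

    char *val; int vsize;
    /* Assumes cdb_get returns a negative value on failure, as cdb_open does. */
    if (cdb_get(db, "key1", 4, (void**)&val, &vsize) < 0) {
        printf("Get Failed\n");
        cdb_destroy(db);
        return -1;
    }
    printf("value for key1:%s[%d]\n", val, vsize);
    /* Values returned by cdb_get must be released with cdb_free_val. */
    cdb_free_val((void**)&val);
    cdb_destroy(db);
    return 0;
}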

Server version:

$ ./cuttdb-server -H /data/testdb -r 0 -P 1024 -n 100000 -d -t 4
$ telnet 127.0.0.1 8964
 

Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set test_key1 0 0 10
test_value
STORED
get test_key1
VALUE test_key1 0 10
test_value
END
delete test_key1
DELETED
get test_key1
END
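
The session above uses the memcached text protocol, where a set request is a command line followed by the data block, each terminated by CRLF. For a program that talks to the server over a socket instead of telnet, the request can be built as in the sketch below. The function name format_set_request is hypothetical, the value is assumed to be a NUL-terminated string, and the README does not say how cuttdb-server interprets the flags and exptime fields.

```c
#include <stdio.h>
#include <string.h>

/* Format "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n" into buf.
   Returns the number of bytes written, or -1 if buf is too small. */
static int format_set_request(char *buf, size_t cap, const char *key,
                              unsigned flags, long exptime, const char *value)
{
    size_t vlen = strlen(value);
    int n = snprintf(buf, cap, "set %s %u %ld %zu\r\n%s\r\n",
                     key, flags, exptime, vlen, value);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Calling format_set_request(buf, sizeof buf, "test_key1", 0, 0, "test_value") produces the same bytes that were typed in the telnet session. The caller would then write them to the connected socket and read back the STORED reply.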

The program is currently used in the web crawler at cutt.com.

fusiyuan2010/cuttdb
