Transfer rocksdb-rs Repo to tikv org #166
Comments
What's the relationship between tikv/agatedb and rocksdb-rs? Can you say more about the motivation for building a new RocksDB implementation instead of working on agatedb?
OK. I thought about building my project on agatedb before, but I found some critical issues that prevented me from doing so.
These issues remind me that what I need is far from what agatedb offers, so I have to implement a new engine. I did copy the memtable (skiplist) code from agatedb, though, and I think it is the only part that works for me...
agatedb is supposed to inherit all the features badger supports, so it may not support it at the moment, but it is supposed to support it in the end.
Indeed. This is the trade-off between compatibility and new features. But once we can divide the LSM tree by region, that may ease the pain of upgrading and downgrading, for example by using different engines for different regions.
I think this is similar to the first statement. Actually, as explained in the README of agatedb, the motivation for writing a new engine in Rust is to explore asynchronous support and unify thread management. It's part of its goal. However, as one of the authors of agatedb and a TiKV maintainer, I am personally happy to add this project as another experimental project in the TiKV org to explore further possibilities, as long as it is actively worked on. /cc @skyzh @zz-jason What are your opinions? /cc @tikv/maintainers
I'm not against new data formats. I just think it's a better choice to stay compatible with the old format at first and upgrade the format step by step. As you mentioned, we can store most regions in the old format and some of them in a new format. In cloud environments, mmap is not supported by most shared storage, which means you have to keep the data in memory. Obviously, we cannot keep all data in memory. As of now, all interfaces of this project are synchronous, which means that if I want an asynchronous interface, I need to rewrite almost all the code. It seems that no one cares about the goal of agatedb.... I don't think the badger format is essential for cloud storage, but we can also implement the badger format in rocksdb-rs. I think the most important thing is to support asynchronous IO so that the high latency of cloud disks (or S3 storage) won't block the read thread.
rocksdb-rs is really a great project. From my perspective, it has nearly the same functionality as agatedb, apart from some design goals. For example, agatedb has key-value separation and user-specified epochs built in, while RocksDB and rocksdb-rs don't. Firstly, I'd like to answer some questions regarding the agatedb project.
It indeed supports SSI. Every txn will create a snapshot on the current LSM tree, and agatedb should do conflict detection when writing (though this part is not implemented yet).
That's true, and changing the SST format should be very simple. agatedb stores the value pointer as a normal value in the LSM tree, so any SST format should be okay for agatedb.
This is also true. mmap doesn't sound like a good way to run a storage engine. I'm also happy if we could welcome a new experimental storage engine project to the TiKV organization, but I have some concerns regarding this transfer...
I do not think it is a good idea to support SSI in a KV engine, because the transaction model may not be compatible with the distributed transaction model of TiKV. So I suggest implementing a KV engine without transactions first, and then we can improve it together with the transaction developers.
Good point. I have finished the basic iterator, seek, and write functions. I need someone to help me finish the block cache, compression, and some other functions. You'll find that this project has quite a few features done.
I have benchmarked single-threaded write performance and found that it only reaches 65% of RocksDB's. Most of the write time is spent in the skiplist, and the skiplist in rocksdb-rs is copied from agatedb....
TiKV will name another project
This is why I don't want to introduce a complex transaction model in the storage engine, unless it is designed together with TiDB's distributed transactions as a whole. As a first step, we only need to design an engine with the same interface as RocksDB, and then we can add more functionality according to the needs of TiDB's distributed transactions.
To make sure there are no significant bugs, I will add more unit tests to this project during future development.
I have no objection to (nor clear acceptance of) this proposal. Any ideas from other members?
I would also like to hear from @sunxiaoguang @zhangjinpeng1987
In addition, if it's going to be a serious project, changes need to be reviewed and get at least two approvals before landing in its own master.
@zhangjinpeng1987 @sunxiaoguang Any suggestions?
Background
To evolve towards the next-generation database engine kernel, we reimplemented RocksDB in Rust. The project lives at https://github.com/rust-lib-project/rocksdb-rs. Of course, we will not implement all the functions of RocksDB. Our purpose is to create a more general KV data engine for TiKV services, not a transactional data engine for MyRocks services. In this project, we will eliminate most of the RocksDB features that TiKV does not use, including the transaction and two-phase commit code, in order to simplify our code as much as possible and make the project easier to maintain.
For the cloud ecosystem, asynchronous IO is essential for high-latency cloud disks. Fortunately, Rust provides an asynchronous framework that is easier to use than C++, allowing us to write asynchronous IO code easily. Therefore, most of the interfaces of this project are provided as asynchronous methods. Because the interfaces are asynchronous, we can merge the compaction thread pool with other background thread pools in TiKV to reduce thread switching and control CPU resources more precisely.
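To illustrate what "interfaces provided as asynchronous methods" could look like, here is a minimal sketch using only the Rust standard library. The `MemEngine` type, its `put`/`get` methods, the `block_on` helper, and `roundtrip` are hypothetical names made up for this example; they are not the rocksdb-rs API, and the in-memory map stands in for real disk or network IO.

```rust
use std::collections::BTreeMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical in-memory engine, used only to show the shape of an
// async KV interface. It is NOT the rocksdb-rs API.
struct MemEngine {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemEngine {
    fn new() -> Self {
        Self { data: BTreeMap::new() }
    }

    // In a real engine this is where a WAL or SST write would be awaited.
    async fn put(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
    }

    // In a real engine a cache miss would await a block read here,
    // yielding the thread to other tasks instead of blocking it.
    async fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }
}

// A waker that does nothing; enough for the busy-polling executor below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor for the example: polls the future in a loop until it
// completes. A production system would use a shared runtime instead.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Writes one key and reads it back through the async interface.
fn roundtrip() -> Option<Vec<u8>> {
    block_on(async {
        let mut engine = MemEngine::new();
        engine.put(b"k1", b"v1").await;
        engine.get(b"k1").await
    })
}
```

Calling `roundtrip()` from a `main` function drives both calls to completion on the current thread. The point of the design is that the same `put` and `get` futures could instead be scheduled on a shared runtime alongside background work such as compaction, rather than each needing a dedicated blocking thread.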
At present, rocksdb-rs still keeps interfaces and functions similar to RocksDB's, but it adopts a loosely coupled architecture, so it can easily be refactored into a more TiKV-friendly interface in the future.
Roadmap
At present, the project has implemented the basic functions of RocksDB, including reads, writes, and background compaction. But it still lacks a lot of tests and some key functions (see https://github.com/rust-lib-project/rocksdb-rs/issues for details). I hope the project can get help from more developers under the tikv organization and play a bigger role.