RocksDB: GetApproximateSizes returns 0 right after writing data?
Author: Internet
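In a project I needed to get the data size of a key range from the storage engine for metrics reporting. During testing I was surprised to find that, after writing some data, calling GetApproximateSizes on the range of keys I had just written returned a size of 0... Had I found a bug? (Delighted.)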
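The data I had written was smaller than one SST data block (4 KB by default). Could GetApproximateSizes be reporting 0 for anything smaller than a data block? For a rigorous storage engine, such an obvious problem would be hard to accept.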
The problematic code:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <rocksdb/db.h>
#include <rocksdb/slice.h>
#define VALUE_SIZE 100
using namespace std;
using namespace rocksdb;
void check_status(const Status& s, const std::string& op) {
  if (!s.ok()) {
    cout << " Execute " << op << " failed "
         << s.ToString() << endl;
    exit(1);
  }
}
static std::string Key(int i) {
  char buf[100];
  snprintf(buf, sizeof(buf), "key%06d", i);
  return std::string(buf);
}
int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compression = kNoCompression;
  // Open the DB
  check_status(rocksdb::DB::Open(options, "./db", &db), "Open DB");
  // Write 10 key-value pairs; each value is 100 bytes
  for (int i = 0; i < 10; i++) {
    check_status(db->Put(WriteOptions(),
                         Key(i),
                         Slice(string(VALUE_SIZE, 'a' + (i % 26)))),
                 "Put DB");
  }
  // Query the key range [Key(1), Key(3)) (the limit is exclusive)
  // and get the approximate size of the key-values inside it
  uint64_t size;
  string start = Key(1);
  string end = Key(3);
  Range r(start, end);
  db->GetApproximateSizes(&r, 1, &size);
  cout << "Approximate size is " << size << endl;
  delete db;
  return 0;
}
The output was:
Approximate size is 0
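I was happy at first: an obvious problem, so I set out to find the cause and send a PR to the community. One look at the source code killed that mood; I had been too naive.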
This interface for getting the size of a key range takes an extra parameter, include_flags:
virtual void GetApproximateSizes(const Range* ranges, int n, uint64_t* sizes,
                                 uint8_t include_flags = INCLUDE_FILES) {
  GetApproximateSizes(DefaultColumnFamily(), ranges, n, sizes, include_flags);
}
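This parameter selects which RocksDB components the size of the key range is taken from: the memtables, the SST files, or both. The default, INCLUDE_FILES, looks at SST files only.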
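I had written only a small amount of data with default options, clearly not enough to trigger a flush, so all of it was still in the memtable. With the default flag the size is computed from SST files only, so nothing in this range could be found.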
Digging further into the underlying implementation:
Status DBImpl::GetApproximateSizes(const SizeApproximationOptions& options,
                                   ColumnFamilyHandle* column_family,
                                   const Range* range, int n, uint64_t* sizes) {
  ......
  Version* v;
  auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
  auto cfd = cfh->cfd();
  // Take a reference on this column family's current SuperVersion
  SuperVersion* sv = GetAndRefSuperVersion(cfd);
  v = sv->current;
  // Several ranges can be passed in at once; iterate over each of them
  for (int i = 0; i < n; i++) {
    Slice start = range[i].start;
    Slice limit = range[i].limit;
    // Add timestamp if needed
    std::string start_with_ts, limit_with_ts;
    if (ts_sz > 0) {
      // Maximum timestamp means including all key with any timestamp
      AppendKeyWithMaxTimestamp(&start_with_ts, start, ts_sz);
      // Append a maximum timestamp as the range limit is exclusive:
      // [start, limit)
      AppendKeyWithMaxTimestamp(&limit_with_ts, limit, ts_sz);
      start = start_with_ts;
      limit = limit_with_ts;
    }
    // Convert user_key into a corresponding internal key.
    InternalKey k1(start, kMaxSequenceNumber, kValueTypeForSeek);
    InternalKey k2(limit, kMaxSequenceNumber, kValueTypeForSeek);
    sizes[i] = 0;
    // Size of the key range in the SST files
    if (options.include_files) {
      sizes[i] += versions_->ApproximateSize(
          options, v, k1.Encode(), k2.Encode(), /*start_level=*/0,
          /*end_level=*/-1, TableReaderCaller::kUserApproximateSize);
    }
    // Size of the key range in the memtables: both mem and imm
    if (options.include_memtabtles) {
      sizes[i] += sv->mem->ApproximateStats(k1.Encode(), k2.Encode()).size;
      sizes[i] += sv->imm->ApproximateStats(k1.Encode(), k2.Encode()).size;
    }
  }
  // Release the reference on the SuperVersion
  ReturnAndCleanupSuperVersion(cfd, sv);
  return Status::OK();
}
On the SST side, for a block-based table this means creating an iterator over the index block to find the offsets of the data blocks that the start and end keys fall into.
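On the memtable side, the skiplist is walked level by level to estimate how many entries fall into [start, end), and that count is turned into a byte size using the memtable's average entry size.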
After this analysis it is clear that to get a reasonably accurate size for a key range from GetApproximateSizes, you have to include both the memtables and the SST files.
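P.S. The same data takes a different amount of space in the memtable than in an SST file: besides the key-value data in its data blocks, an SST also contains an index block, a metaindex block, and a footer. So the size reported for the same data will differ somewhat depending on whether it sits in the memtable or in an SST.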
The correct way to call GetApproximateSizes() is:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <rocksdb/db.h>
#include <rocksdb/slice.h>
#define VALUE_SIZE 100
using namespace std;
using namespace rocksdb;
void check_status(const Status& s, const std::string& op) {
  if (!s.ok()) {
    cout << " Execute " << op << " failed "
         << s.ToString() << endl;
    exit(1);
  }
}
static std::string Key(int i) {
  char buf[100];
  snprintf(buf, sizeof(buf), "key%06d", i);
  return std::string(buf);
}
int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compression = kNoCompression;
  // Start from an empty DB so the numbers are reproducible
  check_status(rocksdb::DestroyDB("./db", options),
               "DestroyDB");
  check_status(rocksdb::DB::Open(options, "./db", &db), "Open DB");
  // Write 3 keys, Key(0)..Key(2); each value is 100 bytes
  for (int i = 0; i < 3; i++) {
    check_status(db->Put(WriteOptions(),
                         Key(i),
                         Slice(string(VALUE_SIZE, 'a' + (i % 26)))),
                 "Put DB");
  }
  // [Key(1), Key(3)) covers Key(1) and Key(2)
  uint64_t size;
  string start = Key(1);
  string end = Key(3);
  Range r(start, end);
  // Default flags: SST files only, so unflushed data is not counted
  db->GetApproximateSizes(&r, 1, &size);
  cout << "Approximate size is " << size << endl;
  // Count both the SST files and the memtables
  uint8_t include_both = DB::SizeApproximationFlags::INCLUDE_FILES |
                         DB::SizeApproximationFlags::INCLUDE_MEMTABLES;
  db->GetApproximateSizes(&r, 1, &size, include_both);
  cout << "After set memtable flag, Approximate size is " << size << endl;
  // After a flush the data lives in an SST file, so the default flags see it
  check_status(db->Flush(FlushOptions()), "Flush");
  db->GetApproximateSizes(&r, 1, &size);
  cout << "After flush, Approximate size is " << size << endl;
  delete db;
  return 0;
}
The output:
Approximate size is 0
After set memtable flag, Approximate size is 238
After flush, Approximate size is 1151
Well then, no bug to report after all...
Source: https://blog.csdn.net/Z_Stand/article/details/114709592