[Bug] Rocks db put failed due to data corruption and BE cannot start #53366

@lide-reed

Search before asking

  • I have searched the issues and found no similar issues.

Version

`F20250714 14:49:34.637605 4065337 tablet_meta.cpp:494] fail to save tablet_meta. status=[E-3004]rocks db put failed, key: tabletmeta_6330624_594418417, reason: Corruption: block checksum mismatch: expected 2465524131, got 2762918744 in /data/cdw/doris/be/storage/meta/382520.sst offset 37666686 size 1298476

    0#  doris::OlapMeta::put(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    1#  doris::TabletMetaManager::save(doris::DataDir*, long, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    2#  doris::TabletMeta::_save_meta(doris::DataDir*)
    3#  doris::TabletMeta::save_meta(doris::DataDir*)
    4#  doris::Tablet::save_meta()
    5#  doris::Tablet::do_tablet_meta_checkpoint()
    6#  doris::TabletManager::do_tablet_meta_checkpoint(doris::DataDir*)
    7#  doris::ThreadPool::dispatch_thread()
    8#  doris::Thread::supervise_thread(void*)
    9#  ?
    10# ?

, tablet_id=6330624, schema_hash=594418417
*** Check failure stack trace: ***
@ 0x556ba7d7aae6 google::LogMessageFatal::~LogMessageFatal()
@ 0x556b9dab3b7e doris::TabletMeta::_save_meta()
@ 0x556b9dab3730 doris::TabletMeta::save_meta()
@ 0x556b9da3f7c4 doris::Tablet::save_meta()
@ 0x556b9da52169 doris::Tablet::do_tablet_meta_checkpoint()
@ 0x556b9da9cad0 doris::TabletManager::do_tablet_meta_checkpoint()
@ 0x556b9df7d8a8 doris::ThreadPool::dispatch_thread()
@ 0x556b9df72c31 doris::Thread::supervise_thread()
@ 0x7f880003f215 start_thread
@ 0x7f88000c1bdc __clone3
@ (nil) (unknown)
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1752475774 (unix time) try "date -d @1752475774" if you are using GNU date ***
*** Current BE git commitID: 2505262 ***
*** SIGABRT unknown detail explain (@0x3e8003e0527) received by PID 4064551 (TID 4065337 OR 0x7f85fdd916c0) from PID 4064551; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) in /usr/local/service/doris/lib/be/doris_be
1# 0x00007F87FFFF1AD0 in /lib64/libc.so.6
2# __pthread_kill_implementation in /lib64/libc.so.6
3# raise in /lib64/libc.so.6
4# __GI_abort in /lib64/libc.so.6
5# 0x0000556BA7D81A5D in /usr/local/service/doris/lib/be/doris_be
6# google::LogMessage::SendToLog() in /usr/local/service/doris/lib/be/doris_be
7# google::LogMessage::Flush() in /usr/local/service/doris/lib/be/doris_be
8# google::LogMessageFatal::~LogMessageFatal() in /usr/local/service/doris/lib/be/doris_be
9# doris::TabletMeta::_save_meta(doris::DataDir*) in /usr/local/service/doris/lib/be/doris_be
10# doris::TabletMeta::save_meta(doris::DataDir*) in /usr/local/service/doris/lib/be/doris_be
11# doris::Tablet::save_meta() in /usr/local/service/doris/lib/be/doris_be
12# doris::Tablet::do_tablet_meta_checkpoint() in /usr/local/service/doris/lib/be/doris_be
13# doris::TabletManager::do_tablet_meta_checkpoint(doris::DataDir*) in /usr/local/service/doris/lib/be/doris_be
14# doris::ThreadPool::dispatch_thread() in /usr/local/service/doris/lib/be/doris_be
15# doris::Thread::supervise_thread(void*) in /usr/local/service/doris/lib/be/doris_be
16# start_thread in /lib64/libc.so.6
17# __GI___clone3 in /lib64/libc.so.6`

What's Wrong?

The BE core dumps and cannot start.

What You Expected?

The BE works normally.

How to Reproduce?

Version 2.1.10. Three clusters have hit this bug in recent days.
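Since several clusters are affected, it may help to pull the same fields out of each BE's fatal log for comparison. The following is only a rough sketch: the regular expression is written against the single log line quoted above, and the helper names `parse_corruption` and `abort_time_utc` are made up for illustration, not part of Doris.

```python
import re
from datetime import datetime, timezone

# Matches the RocksDB corruption message in the form seen in the log above.
# Assumption: other occurrences follow the same wording and field order.
CORRUPTION_RE = re.compile(
    r"key: (?P<key>[^,\s]+), reason: Corruption: block checksum mismatch: "
    r"expected (?P<expected>\d+), got (?P<got>\d+) in (?P<sst>\S+) "
    r"offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_corruption(line):
    """Return the corruption details from one log line, or None if absent."""
    match = CORRUPTION_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for field in ("expected", "got", "offset", "size"):
        info[field] = int(info[field])
    return info


def abort_time_utc(unix_seconds):
    """Convert the 'Aborted at ... (unix time)' value to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


sample = (
    "F20250714 14:49:34.637605 4065337 tablet_meta.cpp:494] fail to save tablet_meta. "
    "status=[E-3004]rocks db put failed, key: tabletmeta_6330624_594418417, "
    "reason: Corruption: block checksum mismatch: expected 2465524131, got 2762918744 "
    "in /data/cdw/doris/be/storage/meta/382520.sst offset 37666686 size 1298476"
)

info = parse_corruption(sample)
print(info["sst"], info["key"], info["offset"], info["size"])
print(abort_time_utc(1752475774).isoformat())
```

Comparing the reported `.sst` file, offset, and tablet key across the three clusters could show whether the corruption is tied to one file per node or follows a common pattern.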

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
