TiKV crash in compaction filter caused by IoError: No Such file or directory: While open a file for random read: /data/xxx/yyy/zzz/115748.blob #328

Open
taoseng opened this issue Sep 2, 2024 · 8 comments

Comments

Contributor

taoseng commented Sep 2, 2024

This error is different from the existing, older "Missing blob file" error.

Contributor Author

taoseng commented Sep 2, 2024

Status BlobStorage::Get(const ReadOptions& options, const BlobIndex& index,
                        BlobRecord* record, PinnableSlice* buffer) {
  auto sfile = FindFile(index.file_number).lock();
  if (!sfile)
    return Status::Corruption("Missing blob file: " +
                              std::to_string(index.file_number));
  
  // NOTE-1: the thread that purges obsolete files can delete the file at this
  // point, in which case the next line reports the IoError above.
  
  return file_cache_->Get(options, sfile->file_number(), sfile->file_size(),
                          index.blob_handle, record, buffer);
}
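
To make the window at NOTE-1 concrete, here is a minimal single-threaded sketch that replays the interleaving deterministically. Everything in it is a hypothetical stand-in (`ToyBlobStorage`, `PurgeObsoleteFile`, `ReadFromDisk`, the `Status` struct), not Titan's real types; it only assumes that holding the `shared_ptr` keeps the metadata alive but does not keep the file on disk.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration only; these are not Titan's types.
enum class Code { kOk, kCorruption, kIOError };

struct Status {
  Code code;
  std::string msg;
};

struct FileMeta {
  uint64_t file_number;
};

class ToyBlobStorage {
 public:
  void AddFile(uint64_t n) {
    files_[n] = std::make_shared<FileMeta>(FileMeta{n});
    on_disk_.insert(n);
  }

  // Models the purge thread: drops the metadata and unlinks the file.
  void PurgeObsoleteFile(uint64_t n) {
    files_.erase(n);
    on_disk_.erase(n);
  }

  std::weak_ptr<FileMeta> FindFile(uint64_t n) const {
    auto it = files_.find(n);
    if (it == files_.end()) return {};
    return it->second;
  }

  // Models the file cache opening the blob file for a random read.
  Status ReadFromDisk(uint64_t n) const {
    if (on_disk_.count(n) == 0) {
      return {Code::kIOError,
              "No such file or directory: " + std::to_string(n) + ".blob"};
    }
    return {Code::kOk, ""};
  }

  // Same shape as the Get() above. `between` runs inside the NOTE-1 window,
  // standing in for whatever another thread does at that moment.
  template <typename Hook>
  Status Get(uint64_t n, Hook between) {
    auto sfile = FindFile(n).lock();
    if (!sfile) {
      return {Code::kCorruption, "Missing blob file: " + std::to_string(n)};
    }
    between();  // the file can disappear here even though sfile is non-null
    return ReadFromDisk(sfile->file_number);
  }

 private:
  std::map<uint64_t, std::shared_ptr<FileMeta>> files_;
  std::set<uint64_t> on_disk_;
};
```

In this model, a purge before the lookup yields the older "Missing blob file" corruption status, while a purge inside the window yields the I/O error from the title, which is why the two errors look different.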

Contributor Author

taoseng commented Sep 2, 2024

I have checked the code here, and there is indeed a race condition.

Contributor Author

taoseng commented Sep 5, 2024

@v01dstar Hello, could you help confirm this?

Contributor

v01dstar commented Sep 6, 2024

At first glance, this seems possible. Allow me to dig into it more.

Contributor

v01dstar commented Sep 6, 2024

I think this is indeed a problem unless we set skip_value_in_compaction_filter to true, which we don't. I am surprised that we haven't seen this error in our users' environments. If I'm not missing anything, this is more than a race condition: since the compaction filter does not go through the normal read path (i.e. a read with a snapshot), this should happen quite frequently.

Contributor Author

taoseng commented Sep 6, 2024

I guess that in a TiDB environment, TiKV only uses the compaction filter in WriteCF, and WriteCF only stores transaction commit information and small values of less than 256 bytes. Moreover, Titan is not enabled for WriteCF by default, so the error does not occur there.
This issue occurs when TiKV is used with RawKV, or when Titan is used directly.

Contributor

v01dstar commented Sep 6, 2024

Yes, I totally missed that. I guess you can leverage skip_value_in_compaction_filter in this case. Or you can propose a simple fix along the lines you suggested, which is also mentioned in the TODO: return the corresponding error to the caller of Get() and let the caller (the compaction filter) decide what to do.
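
A rough sketch of what "the caller decides" could look like, purely for discussion. The names here (`ReadResult`, `FilterOutcome`, `FilterBlobValue`) are hypothetical and not part of Titan, and the policy of keeping the entry when the blob file has gone away is an assumption, not a confirmed design:

```cpp
#include <functional>
#include <string>

// Hypothetical types for illustration; not Titan's API.
// The read path reports *why* it failed instead of a generic error.
enum class ReadResult { kOk, kBlobFileGone, kOtherError };
enum class Decision { kKeep, kRemove };

struct FilterOutcome {
  Decision decision;
  bool user_filter_called;  // whether the user's filter saw the value
  bool fatal;               // whether compaction should surface an error
};

using UserFilter = std::function<Decision(const std::string& value)>;

// The compaction-filter wrapper chooses a policy per failure kind.
FilterOutcome FilterBlobValue(ReadResult read, const std::string& value,
                              const UserFilter& user_filter) {
  switch (read) {
    case ReadResult::kOk:
      // Value was read successfully: defer to the user's filter.
      return {user_filter(value), true, false};
    case ReadResult::kBlobFileGone:
      // Assumed policy: the blob file was purged concurrently, so keep the
      // entry untouched and skip the user's filter rather than crash.
      return {Decision::kKeep, false, false};
    case ReadResult::kOtherError:
      // Any other failure is still treated as a real error.
      return {Decision::kKeep, false, true};
  }
  return {Decision::kKeep, false, true};  // unreachable for valid enum values
}
```

The point of the sketch is only the separation of concerns: Get() reports a distinguishable status, and the compaction filter, not the storage layer, picks the reaction.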

Contributor Author

taoseng commented Sep 9, 2024

I'll give it a try.
