[feature request]: Implement Flush method. #346
Description
Hello TSDB folks!
We are looking for a safe way to quickly "terminate" a Prometheus server without losing any monitoring data stored in memory (and in the WAL). By terminate, we mean killing the whole instance, including any persistent disk. We use Thanos to upload the blocks in the tsdb-path directory to object storage, so we would like to dump the in-memory HEAD block to the filesystem on demand and let Thanos upload it. But there is no flush API in TSDB (and thus no `Flush` endpoint in Prometheus). The example scenario would look like this:
- We need to scale down Prometheus servers.
- We remove all scrape targets, so nothing new is scraped (nothing new is added to TSDB).
- We hit the `Flush` endpoint. The head block is flushed to the filesystem and truncated in memory.
- We wait until Thanos uploads everything, including the flushed head block.
- We terminate the instance.
The obvious workaround is the TSDB `Snapshot` method, but that is not actually safe. TSDB blocks are immutable and overlaps are not tolerated, so:

After we take a snapshot with `withHead=true` into a separate directory (and make Thanos upload from there), we do have the desired portion of the HEAD in object storage (let's call it `A`). However:

- we ultimately mark this instance as dirty, because any new TSDB block from the HEAD that gets written to the filesystem (because `db.compact()` decided so) as block `B` strictly overlaps with `A`, so this instance can never be used again.
- there is a possible race condition: while we are taking the snapshot and uploading it, the `B` block can be created and also uploaded by Thanos.
All of these problems make our case really difficult to handle, and a single flush operation would help us a lot here. Do you think we could enable this in TSDB (and maybe further up in Prometheus)? Would you be OK with taking a PR for it?
We would propose a `Flush` method with logic similar to the `db.compact()` method, but with a forced `db.compactor.Write` of the head block:
```go
func (db *DB) Flush() error {
	db.cmtx.Lock()
	defer db.cmtx.Unlock()
	db.mtx.RLock()
	defer db.mtx.RUnlock()

	// Flush the entire head block.
	mint := db.head.MinTime()
	maxt := db.head.MaxTime()

	// Wrap head into a range that bounds all reads to it.
	head := &rangeHead{
		head: db.head,
		mint: mint,
		maxt: maxt,
	}
	if _, err := db.compactor.Write(db.dir, head, mint, maxt); err != nil {
		return errors.Wrap(err, "persist head block")
	}
	runtime.GC()
	if err := db.reload(); err != nil {
		return errors.Wrap(err, "reload blocks")
	}
	runtime.GC()
	return nil
}
```
What do you think? @gouthamve @fabxc @krasi-georgiev
For context: We are experimenting with something that will auto-scale Prometheus servers horizontally in a highly dynamic environment (scrape targets changing a lot). We have implemented code that assigns targets to each Prometheus server automatically and scales the number of Prometheus instances up and down.