This repository was archived by the owner on Aug 13, 2019. It is now read-only.

[feature request]: Implement Flush method. #346

@bwplotka

Description

Hello TSDB folks!

We are looking for a safe way to quickly "terminate" a Prometheus server without losing any monitoring data stored in memory (and the WAL). By terminate we mean killing the whole instance, including any persistent disk. We use Thanos to upload the blocks in the TSDB path to object storage, so we would like to dump the in-memory HEAD block to the filesystem on demand and let Thanos upload it. But there is no flush API for TSDB (and thus no Flush endpoint for Prometheus). An example scenario would look like this:

  1. We need to scale down Prometheus servers.
  2. We remove all scrape targets, so nothing new is scraped (and nothing new is added to TSDB).
  3. We hit the Flush endpoint (a client-side sketch follows this list). The head block is flushed to the filesystem and truncated in memory.
  4. We wait until Thanos has uploaded everything, including the flushed head block.
  5. We terminate the instance.
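
To make step 3 concrete, here is a minimal client-side sketch, assuming a hypothetical /api/v1/admin/tsdb/flush endpoint; no such endpoint exists today, and adding one is exactly what this issue asks for:

package main

import (
	"log"
	"net/http"
)

func main() {
	// Hypothetical endpoint name; TSDB currently has no Flush API.
	resp, err := http.Post("http://prometheus:9090/api/v1/admin/tsdb/flush", "", nil)
	if err != nil {
		log.Fatalf("flush request failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		log.Fatalf("flush returned unexpected status: %s", resp.Status)
	}
	log.Println("head block flushed; safe to wait for Thanos uploads")
}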

The obvious workaround is the TSDB Snapshot method, but that is not actually "safe": TSDB blocks are immutable and overlapping blocks are not tolerated, so:

After we take a snapshot with withHead=true into a separate directory (and make Thanos upload from there), we do indeed have a portion of the HEAD in object storage (let's call it A), as we wanted; a code sketch of this workaround follows the list below. However:

  • we ultimately mark this instance as dirty, because any new TSDB block B that is later written from the HEAD to the filesystem (because db.compact() decided so) strictly overlaps with A, so this instance cannot be used again.
  • there is a possible race condition: while we are taking the snapshot and uploading it, block B can be created and also uploaded by Thanos.
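
For reference, the workaround above boils down to a single call. A minimal sketch, using the db.Snapshot(dir string, withHead bool) method that exists in this repository; snapDir is a hypothetical directory that Thanos is pointed at:

package flushdemo

import "github.com/prometheus/tsdb"

// snapshotWorkaround sketches the workaround described above.
func snapshotWorkaround(db *tsdb.DB, snapDir string) error {
	// withHead=true also snapshots the in-memory head block, producing
	// block A in snapDir. Any block B that db.compact() later writes to
	// the main data directory overlaps A, which is the core problem.
	return db.Snapshot(snapDir, true)
}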

All of these problems make our case really difficult to handle, and a single flush operation would help us a lot here. Do you think we could enable this in TSDB (and maybe further up in Prometheus)? Would you be OK with taking a PR for it?

We would propose something like the following Flush method, with logic similar to db.compact() but forcing a db.compactor.Write of the head block:

func (db *DB) Flush() error {
	db.cmtx.Lock()
	defer db.cmtx.Unlock()

	db.mtx.RLock()
	defer db.mtx.RUnlock()

	// Flush the full time range currently covered by the head block.
	mint := db.head.MinTime()
	maxt := db.head.MaxTime()

	// Wrap head into a range that bounds all reads to it.
	head := &rangeHead{
		head: db.head,
		mint: mint,
		maxt: maxt,
	}
	// Force a write of the head block, regardless of whether it spans a
	// full block range yet.
	if _, err := db.compactor.Write(db.dir, head, mint, maxt); err != nil {
		return errors.Wrap(err, "persist head block")
	}

	runtime.GC()
	// Reload so the new block is picked up and the head gets truncated.
	if err := db.reload(); err != nil {
		return errors.Wrap(err, "reload blocks")
	}
	runtime.GC()
	return nil
}
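
If this lands in TSDB, exposing it from Prometheus could look roughly like the following sketch; the endpoint path and the Flusher interface are hypothetical, purely for illustration:

package flushdemo

import "net/http"

// Flusher is the interface the proposed (*DB).Flush would satisfy.
type Flusher interface {
	Flush() error
}

func registerFlushEndpoint(mux *http.ServeMux, db Flusher) {
	mux.HandleFunc("/api/v1/admin/tsdb/flush", func(w http.ResponseWriter, r *http.Request) {
		if err := db.Flush(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})
}

Returning 204 on success would let the orchestrator proceed to step 4 of the scenario above.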

What do you think? @gouthamve @fabxc @krasi-georgiev

For context: we are experimenting with something that horizontally auto-scales Prometheus servers in a highly dynamic environment (scrape targets change a lot). We have implemented code that automatically assigns targets to each Prometheus server and scales the number of Prometheus instances up and down.
