This repository was archived by the owner on Aug 13, 2019. It is now read-only.

[feature request]: Implement Flush method. #346

@bwplotka

Description

Hello TSDB folks!

We are looking for a safe way to quickly "terminate" a Prometheus server without losing any monitoring data stored in memory (and the WAL). By terminate we mean killing the whole instance, including any persistent disk. We use Thanos to upload the blocks in the TSDB path to object storage, so we would like to dump the in-memory HEAD block to the filesystem on demand and let Thanos upload it. But there is no flush API for TSDB (and thus no Flush endpoint for Prometheus). An example scenario would look like this:

  1. We need to scale down Prometheus servers.
  2. We remove all scrape targets, so nothing new is scraped (and nothing new is added to TSDB).
  3. We hit the Flush endpoint (a client-side sketch follows this list). The head block is flushed to the filesystem and truncated in memory.
  4. We wait until Thanos has uploaded everything, including the flushed head block.
  5. We terminate the instance.
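
To make step 3 concrete, here is a minimal client-side sketch, assuming a hypothetical /api/v1/admin/tsdb/flush endpoint; no such endpoint exists today, and adding one is exactly what this issue asks for:

package main

import (
	"log"
	"net/http"
)

func main() {
	// Hypothetical endpoint name; TSDB currently has no Flush API.
	resp, err := http.Post("http://prometheus:9090/api/v1/admin/tsdb/flush", "", nil)
	if err != nil {
		log.Fatalf("flush request failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		log.Fatalf("flush returned unexpected status: %s", resp.Status)
	}
	log.Println("head block flushed; safe to wait for Thanos uploads")
}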

The obvious workaround is the TSDB Snapshot method, but that is not actually "safe": TSDB blocks are immutable and overlapping blocks are not tolerated, so:

After we take a snapshot with withHead=true into a separate directory (and make Thanos upload from there), we do indeed have a portion of the HEAD in object storage (let's call it A), as we wanted; a code sketch of this workaround follows the list below. However:

  • we ultimately mark this instance as dirty, because any new TSDB block B that is later written from the HEAD to the filesystem (because db.compact() decided so) strictly overlaps with A, so this instance cannot be used again.
  • there is a possible race condition: while we are taking the snapshot and uploading it, block B can be created and also uploaded by Thanos.
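
For reference, the workaround above boils down to a single call. A minimal sketch, using the db.Snapshot(dir string, withHead bool) method that exists in this repository; snapDir is a hypothetical directory that Thanos is pointed at:

package flushdemo

import "github.com/prometheus/tsdb"

// snapshotWorkaround sketches the workaround described above.
func snapshotWorkaround(db *tsdb.DB, snapDir string) error {
	// withHead=true also snapshots the in-memory head block, producing
	// block A in snapDir. Any block B that db.compact() later writes to
	// the main data directory overlaps A, which is the core problem.
	return db.Snapshot(snapDir, true)
}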

All of these problems make our case really difficult to handle, and a single flush operation would help us a lot here. Do you think we could enable this in TSDB (and maybe further up in Prometheus)? Would you be OK with taking a PR for it?

We would propose something like the following Flush method, with logic similar to db.compact() but forcing a db.compactor.Write of the head block:

func (db *DB) Flush() error {
	db.cmtx.Lock()
	defer db.cmtx.Unlock()

	db.mtx.RLock()
	defer db.mtx.RUnlock()

	// Flush the full time range currently covered by the head block.
	mint := db.head.MinTime()
	maxt := db.head.MaxTime()

	// Wrap head into a range that bounds all reads to it.
	head := &rangeHead{
		head: db.head,
		mint: mint,
		maxt: maxt,
	}
	// Force a write of the head block, regardless of whether it spans a
	// full block range yet.
	if _, err := db.compactor.Write(db.dir, head, mint, maxt); err != nil {
		return errors.Wrap(err, "persist head block")
	}

	runtime.GC()
	// Reload so the new block is picked up and the head gets truncated.
	if err := db.reload(); err != nil {
		return errors.Wrap(err, "reload blocks")
	}
	runtime.GC()
	return nil
}
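
If this lands in TSDB, exposing it from Prometheus could look roughly like the following sketch; the endpoint path and the Flusher interface are hypothetical, purely for illustration:

package flushdemo

import "net/http"

// Flusher is the interface the proposed (*DB).Flush would satisfy.
type Flusher interface {
	Flush() error
}

func registerFlushEndpoint(mux *http.ServeMux, db Flusher) {
	mux.HandleFunc("/api/v1/admin/tsdb/flush", func(w http.ResponseWriter, r *http.Request) {
		if err := db.Flush(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})
}

Returning 204 on success would let the orchestrator proceed to step 4 of the scenario above.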

What do you think? @gouthamve @fabxc @krasi-georgiev

For context: we are experimenting with something that horizontally auto-scales Prometheus servers in a highly dynamic environment (scrape targets change a lot). We have implemented code that automatically assigns targets to each Prometheus server and scales the number of Prometheus instances up and down.
