Explore where to store and dashboard benchmark data #19

@fluidnumerics-joe

Description

Let's use this issue as a place to discuss where to store benchmark results and how to dashboard and plot them.

Motivation

Storing benchmark data over time allows us to track how changes in Parcels influence user experience. Additionally, because we record details of the compute platforms used to run the benchmarks, we gain extra degrees of freedom that help characterize the performance users can expect from their choice of system. This allows us to anticipate and intelligently frame responses to performance-related issues on the main Parcels repository, and perhaps even guide users towards hardware appropriate for running Parcels simulations.

Storage locations

Storage of benchmark data can be centrally managed either within this git repository or within another system capable of handling flat-file databases (e.g. Google Sheets, Excel, BigQuery, etc.).
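As a sketch of what one flat-file record might contain, each benchmark run could be a row keyed by Parcels version, platform details, and timings. All field names and values below are hypothetical, not an agreed schema:

```python
# Hypothetical shape of a single benchmark record; every field name and
# value here is illustrative only.
record = {
    "timestamp": "2024-05-01T12:00:00Z",   # when the benchmark ran
    "parcels_version": "3.0.2",            # Parcels version under test
    "benchmark": "stommel_gyre_10k",       # benchmark case name
    "platform": {                          # compute platform details
        "cpu": "AMD EPYC 7B12",
        "cores": 8,
        "ram_gb": 32,
    },
    "wall_time_s": 41.7,                   # measured runtime
}

# A flat-file database is then just a list of such records, serializable
# to CSV/JSON for git, Sheets, Excel, or BigQuery alike.
print(sorted(record.keys()))
```

Keeping the record flat (or trivially flattenable) is what keeps all of the storage options below interchangeable.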

Storing the benchmarks within this GitHub repository could be convenient for early development. Dashboarding/exploring the data could be as simple as providing a utility script to visualize and explore it locally. The downside is scalability: looking into the future, a large number of individual benchmark runs may leave us in the realm of git-lfs or some other solution to accommodate large file sizes. A workaround for a growing benchmarks dataset is to regularly trim off old runs, throwing away data after a certain period on the assumption that this repository could be used to reproduce benchmarks on older versions of Parcels if needed (is this reasonable to assume?)
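The trimming workaround could be a small script run periodically (e.g. from CI) that drops records older than a retention window. A minimal sketch, where the retention period and the record shape are assumptions rather than anything decided here:

```python
from datetime import datetime, timedelta, timezone

def trim_old_runs(records, retention_days=365):
    """Keep only benchmark records newer than the retention window.

    Each record is assumed to carry an ISO-8601 'timestamp' field; older
    runs are dropped on the assumption that they can be reproduced by
    re-running this repository against historical Parcels releases.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    return [
        r for r in records
        if datetime.fromisoformat(r["timestamp"]) >= cutoff
    ]

# Example: one fresh run and one stale run.
now = datetime.now(timezone.utc)
records = [
    {"timestamp": now.isoformat(), "wall_time_s": 41.7},
    {"timestamp": (now - timedelta(days=400)).isoformat(), "wall_time_s": 55.0},
]
kept = trim_old_runs(records)
print(len(kept))  # only the recent run survives the default 365-day window
```

Running this on each push would keep the in-repo dataset bounded without needing git-lfs, at the cost of discarding the raw history.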

Storing benchmark data in something like Google Sheets, Excel, or BigQuery mitigates the data-size problem (or pushes it off to something even larger). With cloud-based services we can use dashboarding tools (like Looker Studio) to easily create dashboards that are publicly shareable and can even be embedded into documentation. However, pushing data to a centrally managed system requires authentication setup, which brings its own challenges (including dependencies on maintainers) for pushing data to the central benchmark repository.
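For the cloud route, pushing one run to a shared Google Sheet might look roughly like the sketch below, using gspread with a service account. The sheet name, credentials path, and column layout are all assumptions, and `push_run` only works after a maintainer has done the authentication setup mentioned above:

```python
def record_to_row(record):
    """Flatten a benchmark record dict into a spreadsheet row.

    Column order is illustrative; a real schema would be agreed on first.
    """
    return [
        record["timestamp"],
        record["parcels_version"],
        record["benchmark"],
        record["wall_time_s"],
    ]

def push_run(record, sheet_name="parcels-benchmarks"):
    """Append one benchmark run to a central Google Sheet (hypothetical).

    Requires a service-account JSON key shared by a maintainer; this is
    exactly the auth setup / maintainer dependency discussed above.
    """
    import gspread  # third-party; imported lazily so the rest runs without it
    gc = gspread.service_account(filename="service_account.json")
    worksheet = gc.open(sheet_name).sheet1
    worksheet.append_row(record_to_row(record))

# Flattening works without any credentials:
row = record_to_row({
    "timestamp": "2024-05-01T12:00:00Z",
    "parcels_version": "3.0.2",
    "benchmark": "stommel_gyre_10k",
    "wall_time_s": 41.7,
})
print(row)
```

A BigQuery variant would swap the gspread calls for the `google-cloud-bigquery` client but keep the same flatten-then-append shape.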

What to do

I think it's worth making some branches of this repository to explore different strategies for storing and dashboarding benchmark data, and using this issue as the central discussion thread.
