This repository was archived by the owner on Jun 30, 2025. It is now read-only.

TinyBase: Status and Roadmap #766

Open
@pmrv

Here I just want to briefly collect my todos.

What works

  • the latest iteration already looks and feels like "usual" pyiron (imo).
  • jobs and tasks can be created and used without imports
  • new database and storage interfaces to serialize and load jobs and their dependent objects
  • the new interfaces are flexible enough to support multiple types of projects, storage backends and databases
  • objects that implement the "old" HasHDF interface work natively with the new interfaces

What should be done

  • broker access to working directories via the project interface. Jobs should ask the project for a directory and pass it to the task. Tasks should declare whether or not they may run on remote machines.
  • expand the database interface with respect to @tnecnivkcots's renormalized database structure
  • clarify the precedence between project implementations and database implementations. My current thinking is that there should be one central, configured database (as outlined below, this need not be the same as the central database that we have now) in which projects of multiple types live (some normal, some archived, some scratch?), i.e. the database is the sole arbiter of truth. Right now, however, each project implementation can bring its own database.
  • more database and storage interfaces:
    • global database
    • file table database
    • the null database/project
    • archived projects, exported projects
    • S3 storage
  • revisit the executor class.
    • in the current usage, the underlying state machine is not as useful as initially expected and can probably be simplified substantially
    • this will include letting tasks carry information about their internal parallelism
    • rename it in light of "Executor submodule" (#765); maybe TaskExecutor?
    • remove Submitters and just use the plain executors plus the ExecutionContext
  • interaction with workflow developments. I see tinybase as mostly adding persistence and searchability to the tools developed there. It should already be quite straightforward to use tiny jobs inside @liamhuber's nodes and vice versa. The preferred way of integrating (nodes < tiny jobs or tiny jobs < nodes) will likely depend on the exact requirements, i.e. the number and computational cost of the workflows and nodes in question. Both ways are imo worthwhile, but this should be formalized a bit.
  • adding more spec, especially for database and storage interfaces. The respective classes already document expected use and assumptions, but it'll be useful to have it in one document somewhere
  • tests. tests. tests.
  • some more cosmetic work on creators.
  • Storable needs an auto update interface; classes implementing it should provide a dict[version_number, update_function] as a class attribute so that GenericStorage can patch these things as it goes
