Long-Term Considerations Regarding Toml File Storage  #97

@simonsan

Current situation

I initially implemented the storage model for the activity log based on the TOML file format.
When we begin a new activity, we parse the whole log, append the new activity to the in-memory vector, and write the entire activity log back to the file.

When we end or update an activity, we do the same thing.

I think that has several disadvantages. For example, if an error occurs while writing the file back, the file could be damaged and the activity log destroyed. Parsing will also take longer and longer as the number of activities grows. I need to benchmark that; it could be negligible even with a few thousand activities, a size that is unlikely to be reached, as users might archive their activity log monthly once the archival feature is implemented.
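One way to reduce the corruption risk while keeping the read-all-write-all model is to write to a temporary file and rename it over the log. The following is only a minimal sketch using the standard library; the function name `write_atomically` and the `.tmp` suffix are assumptions for illustration, not existing code in this project. On POSIX systems the rename is atomic; behaviour on other platforms should be checked before relying on it.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to a sibling temporary file and then renames it
/// over `path`, so the old log stays intact if the write itself fails.
pub fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    // Hypothetical naming scheme: "<file name>.tmp" in the same directory,
    // which keeps the rename on a single filesystem.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents.as_bytes())?;
    // Ask the OS to flush the data to disk before the rename exposes it.
    tmp.sync_all()?;
    drop(tmp);

    // Replace the old log in one step.
    fs::rename(&tmp_path, path)
}
```

If any step before the rename fails, the original log file has not been touched; at worst a stale `.tmp` file is left behind.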

I could refactor the entire model to an event-based one, so that the log file is truly append-only and we only ever write to the end of the file. But I'm not sure this makes sense at this point, because I want to implement storage in a SQLite database soonish, which would make this obsolete: I don't think we want an event-based model in the database, as it's much easier to query for a record and update it, or even batch update records.
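To make the append-only idea concrete, here is a rough sketch of what writing a single event could look like. The `EventKind` enum, the `append_event` function and the tab-separated line format are all hypothetical; a real implementation would more likely serialize TOML or another structured format.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical event kinds; a real model would likely need more (e.g. updates).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Begin,
    End,
}

/// Appends one event as a single line. Existing content is never rewritten.
pub fn append_event(
    path: &Path,
    activity_id: u64,
    kind: EventKind,
    unix_secs: u64,
) -> io::Result<()> {
    // `append(true)` makes every write go to the current end of the file.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let kind = match kind {
        EventKind::Begin => "begin",
        EventKind::End => "end",
    };
    // Build the whole line first so it is handed over in one `write_all` call.
    let line = format!("{activity_id}\t{kind}\t{unix_secs}\n");
    file.write_all(line.as_bytes())?;
    file.sync_data()
}
```

Beginning or ending an activity then costs one small write, regardless of how large the log already is.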

The reason I initially used TOML was so users can edit the log in their favourite text editor. I found that useful, as I used it a lot to edit activities in bartib. I think it would become less useful if I reimplemented the storage so that only events are stored, because then it's no longer easy to determine the actual status, duration, etc. of an activity. To determine that, we would need to parse all entries in a certain time frame and then merge the events, which is much more complicated.
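The merge step described above could look roughly like the following. This is a simplified sketch: the `Event` and `ActivityState` types and the `replay` function are made up for illustration and only cover begin and end events, with timestamps as plain seconds.

```rust
use std::collections::BTreeMap;

/// Hypothetical events as they might be read back from an append-only log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Begin { id: u64, at: u64 },
    End { id: u64, at: u64 },
}

/// The merged view of one activity after replaying its events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ActivityState {
    pub begin: u64,
    pub end: Option<u64>,
}

impl ActivityState {
    /// Duration in seconds, or `None` while the activity is still running.
    pub fn duration_secs(&self) -> Option<u64> {
        self.end.map(|end| end.saturating_sub(self.begin))
    }
}

/// Replays events in order and returns the current state per activity id.
pub fn replay(events: &[Event]) -> BTreeMap<u64, ActivityState> {
    let mut states = BTreeMap::new();
    for event in events {
        match *event {
            Event::Begin { id, at } => {
                states.insert(id, ActivityState { begin: at, end: None });
            }
            Event::End { id, at } => {
                // An End without a matching Begin is ignored in this sketch.
                if let Some(state) = states.get_mut(&id) {
                    state.end = Some(at);
                }
            }
        }
    }
    states
}
```

Even in this reduced form, the status of an activity is only known after replaying every event that touches it, which is the part that makes hand-editing the file harder.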

Pros And Cons

Current TOML-Based Storage Model:

Pros:

  • Human-Readable: TOML files can be easily read and edited with a text editor, providing transparency and direct access to the data for users.
  • Ease of Implementation: Implementing storage using TOML is relatively straightforward and doesn't require additional dependencies or infrastructure.
  • Low Complexity: The current model is simple and easy to understand, making it suitable for small to medium-sized datasets.

Cons:

  • Risk of Data Corruption: Writing the entire activity log file each time an update occurs increases the risk of data corruption if there's an error during the write operation.
  • Performance Degradation: Parsing and writing the entire file can become slow and inefficient as the log grows larger, impacting overall application performance.
  • Limited Scalability: The current model may struggle to handle large datasets efficiently, especially as the number of activities increases over time.

Event-Based Append-Only Model:

Pros:

  • Improved Data Integrity: Moving to an append-only model reduces the risk of data corruption, since updates are only appended to the end of the file.
  • Better Performance: With no need to reparse the entire file, performance is improved, especially for large activity logs.
  • Scalability: The event-based model scales more effectively with growing datasets, as it doesn't suffer as much from the performance degradation associated with reparsing the entire file.

Cons:

  • Complexity: Implementing an event-based model introduces additional complexity compared to the current read-all-write-all TOML-based approach, requiring careful design and implementation.
  • Loss of Human-Readability: While the append-only model is more efficient, it sacrifices the human-readable nature of TOML files, making direct editing by users more challenging.
  • Data Retrieval Complexity: Retrieving and interpreting data from an append-only log may require more sophisticated parsing and processing logic, potentially complicating certain operations.
  • Difficulty in Database Migration: If the event log is implemented using a file-based format like TOML, it may not be directly compatible with a database-backed storage solution like SQLite. This can result in the event-based model becoming obsolete when transitioning to a database-driven architecture, requiring a rewrite or significant refactoring of the storage layer.

Direct Transition to SQLite:

Pros:

  • Data Integrity and Reliability: SQLite provides robust data storage capabilities, ensuring data integrity and reliability, even in the face of unexpected errors or interruptions.
  • Efficient Queries: SQLite's query capabilities enable efficient retrieval and manipulation of data, supporting complex queries and analysis.
  • Scalability: SQLite can handle large datasets efficiently, making it suitable for applications with growing storage requirements.

Cons:

  • Dependency and Infrastructure: SQLite introduces a dependency on an external library and requires managing database connections and transactions, adding complexity to the application.
  • Deployment Considerations: Deploying and managing SQLite databases may require additional configuration and maintenance compared to simple file-based storage solutions.

Metadata

Labels

  • A-architecture (Area: Related to our architecture)
  • A-storage (Area: Related to our storage system)
  • C-question (Category: Further information is requested)
