Simulator Roadmap #3954

alpaylan · 2025-11-13T19:23:28Z

This PR is a working doc on a roadmap for the simulator. @pedrocarlo @LeMikaelF please take a look.

LeMikaelF · 2025-11-13T19:35:46Z

simulator/ROADMAP.md

+Actionable items:
+
+- [ ] Implement generation and/or shadowing for one of the languages features in [COVERAGE.md](./COVERAGE.md)
+- [ ] At the moment, implementing a feature requires both adding a generation for it as well as


If removing shadowing is an option, I think the sooner we remove it, the easier development will become. For every function or query shape we add to the generator, we need to reimplement it in the simulator, and in my short experience that's far from trivial. In the last few days, I've had to fight with affinity and type coercion, column projections, and others that I forget.

I have an idea that removes shadowing, which is essentially using SQLite as our shadow. Attach the same database to both SQLite and Turso, and just use SQLite anywhere we would use the shadow table. What do you think?

So we would do writes with Turso, and reads with both Turso and SQLite. This sounds like a simple solution that would remove a lot of complexity from the simulator and would let us focus on the more important parts (query generation and properties). This sounds fantastic.

I wonder if there might be issues with concurrency or locking, especially once we start having multiple concurrent clients?

Another question is, at that point, is this oracle significantly different from differential? I know I keep circling back to this, but I don't fully comprehend what it is that the shadow oracle provides that the differential oracle doesn't.

> I wonder if there might be issues with concurrency or locking, especially once we start having multiple concurrent clients?

Yeah, that's the part that makes me question it the most. I don't know what happens to the database state in a concurrent setting, for example whether it's possible to read the ephemeral state.

> Another question is, at that point, is this oracle significantly different from differential? I know I keep circling back to this, but I don't fully comprehend what it is that the shadow oracle provides that the differential oracle doesn't.

I think it's a fair question, so let me explain my perspective. For one thing, not all queries are deterministic. For a LIMIT query, Turso doesn't guarantee that you'll get the same output every time (#2024), so we cannot rely purely on differential testing.

Another issue is locality. Failures found by differential testing are very non-local: they provide little to no useful information for debugging. Properties give you much more local information.

The last one is that you want this infrastructure for the long term, because at some point the projects will diverge: Turso will be SQLite compatible, but it won't be sqlite-rs. Having a bespoke PBT infrastructure is very useful at that point.

About your second point, that's true, I didn't think of that. If you do a `SELECT ... LIMIT 5`, what you need to check is that the rows are a subset of the unlimited `SELECT`, not that another database gives the same result.
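A minimal sketch of that LIMIT check, written in Python against the standard `sqlite3` module as a stand-in for the engine under test. The helper name `limit_is_sub_multiset` is hypothetical and not part of the simulator. It also assumes the stronger condition that exactly `min(limit, total)` rows come back, which goes slightly beyond the subset requirement stated above.

```python
import sqlite3
from collections import Counter


def limit_is_sub_multiset(conn, base_query, limit):
    """Check `base_query LIMIT n` against the unlimited result of `base_query`.

    Hypothetical helper: rows are compared as multisets, so duplicate rows
    are handled and no particular row order is required.
    """
    full = Counter(conn.execute(base_query).fetchall())
    limited_rows = conn.execute(f"{base_query} LIMIT {int(limit)}").fetchall()
    limited = Counter(limited_rows)
    # Assumption: the engine returns exactly min(limit, total) rows.
    expected_len = min(limit, sum(full.values()))
    # Every limited row must occur in the full result at least as often.
    contained = all(full[row] >= n for row, n in limited.items())
    return len(limited_rows) == expected_len and contained


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, None), (3, "z")],
)
ok = limit_is_sub_multiset(conn, "SELECT a, b FROM s", 3)
```

The check never names which rows must be returned, so it stays valid for any row order the engine picks.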

This makes me think that doublecheck could also give false positives on queries like `INSERT INTO t SELECT * FROM s LIMIT 5`. I'm currently working on extending the generator to produce queries like this, but it means that without a way to have oracle-specific generation (like you suggested in this document), this won't work.
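To illustrate why an exact-match comparison is the wrong check here, the following sketch states the outcome of such an insert as a property instead of a fixed row set. It uses Python's standard `sqlite3` module as a stand-in for the engine under test, and the table names and variable names are assumptions for illustration only.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (a INTEGER)")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO s VALUES (?)", [(i,) for i in range(8)])

before = Counter(conn.execute("SELECT a FROM t").fetchall())
conn.execute("INSERT INTO t SELECT * FROM s LIMIT 5")
after = Counter(conn.execute("SELECT a FROM t").fetchall())
source = Counter(conn.execute("SELECT a FROM s").fetchall())

# Multiset of rows that the statement added to t.
inserted = after - before
inserted_count = sum(inserted.values())
# Each added row must exist in s at least as many times as it was added.
inserted_from_source = all(source[row] >= n for row, n in inserted.items())
# No pre-existing row of t may disappear.
nothing_lost = not (before - after)
```

Two runs that pick different rows from `s` both satisfy these conditions, whereas comparing the two resulting tables row by row would report a mismatch.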

I'm nearing the end of my day here, I'll pick this up tomorrow.

Back to the question of shadowing. I think I need to be educated on how MVCC works, but I presume it's possible to read the ephemeral (non-committed) state from outside? It's not going to be the default behavior, but surely it's something that's accessible? Would SQLite be able to read the MVCC state at a given point from a Turso-generated database?

So MVCC is a construct that mainly exists in memory. We could maybe create some custom hooks that let us read rows from MVCC, but I imagine that would be slightly complicated.

> Would SQLite be able to read the MVCC state at a given point

Physically (i.e., on disk), a Turso DB file using MVCC should theoretically be no different from a SQLite DB. The only current difference is a logical log file used for MVCC, and that is only used for checkpointing and for faster appends. After you close the DB connection and checkpoint the WAL and the logical log, it should just be a SQLite file.

Now, I am not sure how feasible it is to make SQLite read the in-memory state from Turso while the simulation is running.

I see. The reason I thought about SQLite is that (as far as I understand), whatever happens, Turso will stay compatible with the SQLite file format. As such, we should always be able to read a Turso-generated file from SQLite to determine the database state. So the fact that semantics diverge isn't a problem, because we aren't thinking in terms of functional equivalence, only state equivalence.
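A rough sketch of what "state equivalence" could look like in practice, using Python's standard `sqlite3` module to read a SQLite-format file and compare logical contents. The helpers `snapshot` and `make_db` are hypothetical. In this sketch both files are written by SQLite itself as a stand-in; in the simulator, one of them would be a checkpointed Turso-generated file.

```python
import os
import sqlite3
import tempfile
from collections import Counter


def snapshot(path):
    """Return {table_name: multiset of rows} for every user table in the file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        return {
            t: Counter(conn.execute(f'SELECT * FROM "{t}"').fetchall())
            for t in tables
        }
    finally:
        conn.close()


def make_db(path, rows):
    """Stand-in for a database file produced by the engine under test."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


workdir = tempfile.mkdtemp()
db_a = os.path.join(workdir, "a.db")
db_b = os.path.join(workdir, "b.db")
make_db(db_a, [(1, "x"), (2, "y")])
make_db(db_b, [(2, "y"), (1, "x")])  # same rows, different insertion order
same_state = snapshot(db_a) == snapshot(db_b)
```

Comparing snapshots rather than raw bytes tolerates differences in page layout or insertion order. It only sees committed data, so it does not cover the uncommitted or in-memory MVCC state discussed above.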

If I understand correctly, the double-read (Turso+SQLite) strategy you suggest would work only:

- in the absence of concurrent queries (because it would be hard to synchronize SQLite with the other processes)
- in the absence of MVCC
- in the absence of transactions (because SQLite wouldn't see uncommitted data or the same row versions as the transaction under test)

But there are ways around these limitations.

For concurrent queries, we would probably need a custom scheduler. I don't know if we want to do that.

For MVCC, we could just validate it using a doublecheck-like strategy, where we run an interaction plan with MVCC and run it again using WAL and SQLite, checking that the resulting database files are identical.

For transactions, we could patch SQLite to be able to specify the "end mark" used by transactions. Something like Oracle's SELECT ... AS OF :timestamp.

The main benefit of the double-read oracle is that it could save us from having to maintain a shadow model, but the drawbacks may outweigh the benefits.

Another idea I have is to enable shadowing on both SQLite and the custom shadow. We can keep the custom shadow for the concurrent sections and use SQLite in the other cases. Having a clean architecture for doing this is crucial, but I believe it's doable.

LeMikaelF · 2025-11-13T19:39:21Z

simulator/ROADMAP.md

+Actionable items:
+
+- [ ] Implement generation and/or shadowing for one of the languages features in [COVERAGE.md](./COVERAGE.md)
+- [ ] At the moment, implementing a feature requires both adding a generation for it as well as


What do you think of separating the Arbitrary trait into ArbitraryShadowed and Arbitrary?

It could be an easy refactor to rename every implementation of Arbitrary to ArbitraryShadowed and add a separate Arbitrary trait that would delegate by default to ArbitraryShadowed, and then all or most new development could go into Arbitrary.

We could add this as a GH issue and flag it with "help wanted", or "good first issue"?

I think we should first settle the "do we want to keep shadowing" question you posed, because it's an important one.

LeMikaelF · 2025-11-13T20:03:58Z

simulator/ROADMAP.md

+- [ ] Multi-client generation: TODO
+- [ ] LLM-guided generation: TODO
+- [ ] Custom feedback guided generation: TODO
+- [ ] Coverage-guided generation: TODO


Does this mean AFL? I've been dying to try it out on Turso.

Fuzzing SQLite with AFL was a major success story a few years ago:

https://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html

https://sqlite.org/afl/dir

In the second link, they found 22 crashes in "30 minutes of actual work", by running the existing SQLite test cases through AFL.

What I had in mind was AFL plus structured mutation/generation. I've previously built bespoke coverage infrastructure, so I know it's not that hard to do, even though it might be a bit slow the way we do it. We can use the custom coverage we obtain to guide the generation. Isn't Turso currently fuzzed? I would expect it to be.

It has custom fuzzers for many statements and behaviours, and if you look around you will find even more fuzzers that are written but, I think, not actively used. But there is no centralized fuzzer.

pedrocarlo · 2025-11-14T14:40:16Z

Nice stuff! Loved the ideas and the Coverage document.

LeMikaelF · 2025-11-16T18:51:54Z

simulator/ROADMAP.md

+
+## Fault Injection
+
+- [ ] TODO


In case you missed it, short writes are now taken care of in unreliable-libc (#3569).

There's also clock skew, but it's always simulated; it's not a separate explicit fault.

I kept the fault injection part TODO on purpose because I really haven't touched it much. We can iterate on that part even more. We should probably have a better description of how that works too.

alpaylan requested a review from penberg as a code owner November 13, 2025 19:23

github-actions bot added the simulator label Nov 13, 2025

wip: add simulator roadmap

4aac076

LeMikaelF reviewed Nov 13, 2025

alpaylan added 2 commits November 13, 2025 16:50

small updates based on comments

575541c

small updates based on comments

e2c5390

detail correctness and properties

f47293c

LeMikaelF reviewed Nov 16, 2025

LeMikaelF mentioned this pull request Nov 24, 2025

Differential testing tool #4025

Open

add detailed properties, long term simulation and data collection

e664053
