Description
With the new import system and related API changes shaping up, I thought it would be good to share my vision of a new way for nodes to declare migrations.
The problem
Right now, migrations are implemented in the frontend as an ever-growing list of JS/TS functions that directly modify save data on load. This system works, but it makes migrations hard to write: not only do we need to know the exact shape of the save data, we also need to know JS/TS, which some node/plugin authors might not.
Many migrations are also very similar to each other. Some kinds of breaking changes happen relatively frequently and require migrations, but we still have to write the same code every time.
Declarative per-node migrations
My solution to this problem is declarative per-node migrations. Migrations are declared in the backend on the specific node that needs them, and the frontend is responsible for carrying them out.
This system has several advantages:
- Only Python knowledge is required to add migrations.
- Migrations are declared right by the node that they affect.
- Migration implementations are trivially reused.
- Adding a new kind of migration requires implementing it only once; after that, all nodes can easily use it.
- Third-party plugins can use migrations.
- Migrations are now decoupled from save data. This means that we can (1) apply migrations at different points during the load process and (2) change the save format without having to change dozens of migrations.
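The reuse argument can be illustrated with a minimal sketch: the backend only describes a migration as plain data, and one generic executor per migration kind applies it to save data. Everything here is an assumption for illustration, not an existing API: the `Rename` data class stands in for the `rename(...)` declaration, and the save format is a hypothetical list of node dicts with a `schema_id` key.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical save-data shape: each saved node is a dict with a schema id.
Node = Dict[str, Any]


@dataclass(frozen=True)
class Rename:
    """Hypothetical declarative migration kind: the node used to be called `old`."""
    old: str


def apply_rename(m: Rename, node_id: str, nodes: List[Node]) -> None:
    # Every node saved under the old id is switched to the node's current id.
    for node in nodes:
        if node["schema_id"] == m.old:
            node["schema_id"] = node_id


# One executor per migration kind, written once and reused by every node.
EXECUTORS: Dict[type, Callable[[Any, str, List[Node]], None]] = {
    Rename: apply_rename,
}


def run_migration(m: Any, node_id: str, nodes: List[Node]) -> None:
    # Dispatch on the kind of the declared migration.
    EXECUTORS[type(m)](m, node_id, nodes)


# Usage: the node "my_node" declares that it was previously "my_old_name".
nodes = [{"schema_id": "my_old_name"}, {"schema_id": "other"}]
run_migration(Rename(old="my_old_name"), "my_node", nodes)
```

Adding a new migration kind then means adding one data class and one executor; no node has to touch save data directly.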
In code, these migrations would look something like this:
```python
@register(
    ...
    migrations=[
        rename(old="my_old_name"),
    ],
)
def my_node(...):
    ...
```

Non-linear history
However, it's not all roses and sunshine. The per-node aspect of this system introduces a big problem: there is no global ordering of migrations.
Right now, migrations have a global ordering, so they form a linear history that we can simply go through. When loading a save file, we read its migration counter, and then apply all migrations after the read counter value. The migration counter is essentially a timestamp, and we use it to figure out which migrations were added after the file was saved.
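For comparison with what follows, the current linear scheme can be sketched in a few lines. This is a minimal sketch under assumptions: the save data is a dict, the counter lives under a hypothetical `"migration"` key, and `add_a`/`add_b` are made-up example migrations.

```python
from typing import Any, Callable, Dict, List

SaveData = Dict[str, Any]
Migration = Callable[[SaveData], SaveData]


def migrate_linear(data: SaveData, migrations: List[Migration]) -> SaveData:
    # The counter records how many global migrations the file has already seen.
    counter = data.get("migration", 0)
    # Apply every migration added after the file was saved, in global order.
    for migrate in migrations[counter:]:
        data = migrate(data)
    data["migration"] = len(migrations)
    return data


# Hypothetical example migrations.
def add_a(d: SaveData) -> SaveData:
    return {**d, "a": 1}


def add_b(d: SaveData) -> SaveData:
    return {**d, "b": 2}


# A file saved after add_a existed only needs add_b.
result = migrate_linear({"migration": 1}, [add_a, add_b])
```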
With per-node migrations, we lose this global ordering. When 2 nodes each have some migrations, there is no defined order between the migrations of one node and those of the other.
This isn't always a problem, though. As long as per-node migrations only affect and depend on the node they are declared on, all is good. The nodes and their migrations are then completely independent of each other, so we can apply each node's migrations in order without caring about other nodes. A global order is only necessary when a migration of one node depends on other nodes. Even then, we don't need a total global order; we only care about the relative order of the migrations of the nodes involved. These dependencies between nodes and migrations form a DAG (directed acyclic graph), and a topological sort of that DAG gives us an order in which to apply the migrations.
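The topological sort itself does not need to be hand-written. As a sketch (assuming migrations are identified by strings and Python 3.9+, which ships `graphlib` in the standard library), the DAG can be given as a map from each migration to the migrations that must run before it:

```python
import graphlib
from typing import Dict, List, Set


def migration_order(deps: Dict[str, Set[str]]) -> List[str]:
    # `deps` maps each migration to the set of migrations that must run first.
    # static_order() yields a valid topological order and raises
    # graphlib.CycleError if the dependencies are not a DAG.
    return list(graphlib.TopologicalSorter(deps).static_order())
```

A cycle would mean two migrations each need to run before the other, so failing loudly is the right behavior here.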
Example
Suppose we have 2 nodes N and M with migrations n1, n2, n3, m1, m2, and m3, such that migrations(N) = [n1, n2, n3] and migrations(M) = [m1, m2, m3].
Further, the version of a node is simply its number of migrations. So a node with one migration is v1 and a node with no migrations is v0. Versions are also only whole numbers, so don't think of them as something like semantic versions.
If the migrations of N and M are independent of each other, then we can apply them in any order as long as we keep the per-node migration order. So the dependencies between migrations look like this:
```mermaid
flowchart LR
    n1 --> n2
    n2 --> n3
    m1 --> m2
    m2 --> m3
```
The graph should make it pretty obvious in which order we can apply migrations.
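The graph above can be built mechanically from the per-node migration lists: each migration depends on the one declared right before it on the same node. A minimal sketch, with `chain_deps` as a hypothetical helper name and migrations identified by strings:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def chain_deps(per_node: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    # Within a node, migrations must run in declaration order, so each one
    # depends on its predecessor; the first one has no dependencies.
    deps: Dict[str, Set[str]] = {}
    for migrations in per_node.values():
        for i, m in enumerate(migrations):
            deps[m] = {migrations[i - 1]} if i > 0 else set()
    return deps


per_node = {"N": ["n1", "n2", "n3"], "M": ["m1", "m2", "m3"]}
order = list(TopologicalSorter(chain_deps(per_node)).static_order())
```

How the N and M migrations interleave in `order` is arbitrary; only the order within each node is guaranteed.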
But what if migration m2 needs to create an N node? Well, then m2 would depend on N, but that's only half of the story. Since N might change (via migrations), m2 actually depends on a specific version of N. Specifically, m2 depends on the latest version of N when m2 was added.
Suppose N only had migration n1 (version 1) when migration m2 was added, so m2 creates an N v1 node. But in the next release, we change and remove some of N's inputs, which adds migrations n2 and n3. m2 would then produce an invalid node (a v1-shaped node treated as the current N v3), or it would have to be updated to account for n2 and n3. Instead, we can say that m2 always creates an N v1 node and then migrate the created node to the latest version of N.
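The "create at a pinned version, then migrate up" idea can be sketched as follows. The node representation (a dict with a `version` key) and the recorder-style migrations are hypothetical stand-ins for real input changes; they only record which migrations ran.

```python
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]
NodeMigration = Callable[[Node], Node]


def upgrade(node: Node, migrations: List[NodeMigration]) -> Node:
    # A node's version is the number of its migrations already applied,
    # so a node at version v still needs migrations[v:].
    for migrate in migrations[node["version"]:]:
        node = migrate(node)
    node["version"] = len(migrations)
    return node


def record(name: str) -> NodeMigration:
    # Hypothetical migration that only records that it ran.
    def migrate(node: Node) -> Node:
        return {**node, "applied": node["applied"] + [name]}
    return migrate


N_MIGRATIONS = [record("n1"), record("n2"), record("n3")]


def m2_create_n() -> Node:
    # m2 was written when N was at v1, so it always emits an N v1 node.
    return {"schema_id": "N", "version": 1, "applied": []}


node = upgrade(m2_create_n(), N_MIGRATIONS)
```

Because m2 pins the version it emits, n2 and n3 never have to know about m2, and m2 never has to change when N does.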
So migrations not only depend on specific nodes, but specific versions of nodes. In our example, m2 depends on n1 (version 1). Somewhat counterintuitively, this one dependency adds 2 edges to our dependency graph. Since m2 needs the version of N to be exactly v1, m2 must be applied after n1 and before n2.
```mermaid
flowchart LR
    n1 --> n2
    n2 --> n3
    n1 --> m2
    m2 --> n2
    m1 --> m2
    m2 --> m3
```
Now, m2 can create N v1 nodes and everything will be migrated correctly.
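Turning "m2 needs N v1" into the two edges above can also be done mechanically. In this sketch, `add_version_dep` is a hypothetical helper, and version v of a node is taken to mean "the first v of its migrations have run", matching the definition used in the example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def chain_deps(per_node: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    # Per-node order: each migration depends on its predecessor on that node.
    deps: Dict[str, Set[str]] = {}
    for migrations in per_node.values():
        for i, m in enumerate(migrations):
            deps[m] = {migrations[i - 1]} if i > 0 else set()
    return deps


def add_version_dep(deps: Dict[str, Set[str]], per_node: Dict[str, List[str]],
                    migration: str, node: str, version: int) -> None:
    # `migration` needs `node` to be at exactly `version`.
    node_migrations = per_node[node]
    if version > 0:
        # Run after the migration that produced that version ...
        deps.setdefault(migration, set()).add(node_migrations[version - 1])
    if version < len(node_migrations):
        # ... and before the migration that moves the node past it.
        deps.setdefault(node_migrations[version], set()).add(migration)


per_node = {"N": ["n1", "n2", "n3"], "M": ["m1", "m2", "m3"]}
deps = chain_deps(per_node)
add_version_dep(deps, per_node, "m2", "N", 1)  # adds n1 --> m2 and m2 --> n2
order = list(TopologicalSorter(deps).static_order())
```

A dependency on v0 only adds the "before" edge, and a dependency on a node's latest version only adds the "after" edge.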