Import/export API #482
Sounds good to me. CSV and the `pg_dump` text format should be fairly similar to produce. Would we also support importing from CSV, or would import always be done with the DB tooling? ("Google Cloud SQL only supports import from GCS using CSV".) There is already a migration tool: https://doc.akka.io/docs/akka-persistence-r2dbc/current/migration.html
The actual import/export to the database would always be done with DB tooling, and wouldn't have anything to do with the tooling I'm talking about providing here. This is about preparing the import/export from a higher level abstraction that we can offer to users, allowing them to migrate to/from anything, including non Akka Persistence stores. Since on our side there's no DB involved (we're just writing to/from files), I don't think it's necessary for us to provide any resumption capabilities. The operation should be fairly stable, should be very fast (at least tens of MB per second), and will always be safe to restart from the beginning. The user's source/destination for their data might be a DB, but it would be up to them to implement some sort of resumption if they wanted it; we can't make any assumptions about their system since we have no idea what it is.
It would be something like this:
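(The `ImportExport` trait and its methods below are invented purely for illustration, nothing like this exists yet.)

```scala
import java.nio.file.Path
import scala.concurrent.Future
import akka.Done

// Hypothetical API surface, just to make the idea concrete: the tool defines
// an on-disk format and offers export/import entry points around it.
trait ImportExport {
  // Export: dump all events/snapshots/durable state to files in a
  // tool-defined format.
  def exportAll(dir: Path): Future[Done]

  // Import: read those files back and write them into the target journal.
  def importAll(dir: Path): Future[Done]
}
```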
Not quite. I don't expect that there will be a "public high level format" defined by this tool. What I do expect is that this tool will be a library that defines a case class representation of an event/snapshot/durable state, and a Flow that converts a stream of that case class to CSV and back. So, it will be up to the user what to do with that case class. They could convert it to their own protobuf/JSON/whatever schema and read/write that out. Or they could interface directly with the database they are importing the data from, or exporting the data to, and read/write it directly there. So, this is what it will look like:
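(Everything below is hypothetical, sketched from the description above; none of these types or fields exist in akka-persistence-r2dbc today.)

```scala
import akka.NotUsed
import akka.stream.scaladsl.Flow
import akka.util.ByteString

// Hypothetical high level representation of one journal row; the exact
// fields are illustrative only.
final case class ExportedEvent(
    persistenceId: String,
    sequenceNr: Long,
    timestamp: java.time.Instant,
    serializerId: Int,
    serializerManifest: String,
    payload: Array[Byte],
    tags: Set[String])

object EventCsv {
  // Export: parse the CSV produced by `COPY ... TO` on the journal table into
  // high level records, for the user to transform and ship wherever they like.
  def parse: Flow[ByteString, ExportedEvent, NotUsed] = ???

  // Import: render records (built from whatever the user's source is) into
  // CSV suitable for `COPY ... FROM` or the cloud provider's CSV import.
  def render: Flow[ExportedEvent, ByteString, NotUsed] = ???
}
```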
The case classes that we use to represent the records could well be protobuf classes. That would give us a public high level format, while still giving users the power to skip outputting that format and write directly to wherever they want, whether that's their own format or a live database.
It would be great if Akka persistence r2dbc offered an import/export API. If you have data that you want to get in/out of Akka persistence r2dbc, this currently requires understanding exactly how the data is stored, including concepts like slices, serialization manifests, etc. It would be more convenient if Akka persistence r2dbc provided an API that worked with streams of high level representations of events/snapshots/durable state.
While `INSERT`/`SELECT` may be a convenient option to support for import/export operations, the most efficient way to import/export data with Postgres is using the `COPY TO/FROM` statements (this is what `pg_dump`/`pg_restore` use). So the best way to support this, I believe, would be to consume/produce streams of CSV data, rather than connecting to Postgres directly. In a cloud scenario, a user could then use Alpakka to stream these out to S3/GCS etc, and then use the RDS/Cloud SQL specific mechanisms for importing/exporting from those stores.

My reason for suggesting CSV files over the default text format that Postgres uses is that Google Cloud SQL only supports import from GCS using CSV; it doesn't seem to support the Postgres text (or binary) format. Supporting these other formats in addition may be an option to consider. The Postgres binary format, while efficient, may be complex, though it should be well documented, as I believe it's basically the same protocol that Postgres speaks over the wire.
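For the cloud hand-off, the upload leg could be plain Alpakka. A sketch (the `eventsAsCsv` source stands in for the CSV-producing stream this issue is asking for, and the bucket/key names are made up):

```scala
import akka.actor.ActorSystem
import akka.stream.alpakka.s3.scaladsl.S3
import akka.stream.scaladsl.Source
import akka.util.ByteString

// Requires akka-stream-alpakka-s3 on the classpath and S3 credentials/settings
// in application.conf. Streams the already-rendered CSV bytes to S3; the
// managed database's own CSV import (Cloud SQL import from GCS, the aws_s3
// extension on RDS) then loads it into the journal tables.
def uploadCsv(eventsAsCsv: Source[ByteString, _])(implicit system: ActorSystem) =
  eventsAsCsv.runWith(S3.multipartUpload("my-export-bucket", "journal/event_journal.csv"))
```

The GCS side would look much the same with Alpakka's Google Cloud Storage connector.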
It could even have an option to output in `pg_dump` format, i.e. include schema creation statements (noting that index creation statements should always come after the data).