
SQL fixtures needed for tileserver operation #38

Open
davenquinn opened this issue Jul 30, 2024 · 4 comments
davenquinn commented Jul 30, 2024

There are several sets of database fixtures needed for the tileserver to operate correctly. These include a variety of resources each with a different level of coupling to the tileserver Python code:

  • Table definitions for the cache system
  • Tile utility functions
  • Function definitions that create tiles for certain layers (e.g., the carto and maps layers)

These need to be created for the tileserver to operate properly. However, there are a few factors to consider with this design:

  • Some tile layers (e.g., paleogeography layers) are defined in the Macrostrat repository, creating a two-way dependency between the tileserver code and that schema.
  • We've tried to apply these fixtures automatically on tileserver spin-up before, but running the SQL synchronously caused problems with table locks.
  • Our new migrations system could make applying these fixtures relatively simple, but it would be tedious to define a new migration every time the tile layer definitions change.
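One way to make automatic application less fragile is to bound how long each fixture statement may wait on a lock, so a blocked fixture aborts quickly (and can be retried) rather than stalling spin-up. This is a minimal sketch, not the actual tileserver code; the function, table, and timeout values are all illustrative:

```python
def with_lock_timeout(fixture_sql: str, timeout_ms: int = 5000) -> list[str]:
    """Return the statements to run for one fixture, prefixed with a
    Postgres lock_timeout so a held table lock aborts the fixture
    instead of blocking it indefinitely."""
    return [
        f"SET lock_timeout = '{timeout_ms}ms';",  # applies for the rest of the session/transaction
        fixture_sql,
    ]

# Hypothetical cache-table fixture, wrapped for safe application:
statements = with_lock_timeout(
    "CREATE TABLE IF NOT EXISTS tile_cache (z int, x int, y int, tile bytea);"
)
```

A retry loop around the failed fixture (with backoff) would then let spin-up converge once the contending query finishes, instead of hanging the whole startup sequence.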

It seems like the right approach would be one of the following:

  1. Move the bulk of this SQL to the Macrostrat repository, which would come at the cost of the tileserver no longer being in control of its own layer definitions
  2. Re-implement more of this code as static SQL called on demand in the tileserver codebase (which would reduce some of the 'move-fast' benefits of having function-defined layers); we'd still need to move the table definitions etc.
  3. Figure out how to get the fixtures to run properly on tileserver spin-up
  4. Some combination of the above

davenquinn commented Jul 30, 2024

At some level this problem is analogous to some of the API design/dependency issues created by the PostgREST APIs. Architectural input is appreciated.

@brianaydemir commented:

> We've tried to apply these fixtures automatically on tileserver spin-up before, but running the SQL synchronously caused problems with table locks.

> Figure out how to get the fixtures to run properly on tileserver spin-up.

I think I need to hear more details on what exactly is contending for the table locks. In the abstract, I'd have two tasks that can be run completely independently of each other:

  1. Applying fixtures to the database
  2. Starting and running the tileserver

Then, we run these tasks in sequence, one after the other. In a k8s Pod definition, (1) becomes an init container and (2) becomes a normal container. K8s will guarantee that (1) completes successfully before (2) starts. The process of applying the fixtures shouldn't step on anything else's toes (because the tileserver proper isn't running yet), and the tileserver can simply assume that the database is laid out properly when it starts (so it doesn't have to worry about sequencing start-up tasks before serving "real" requests).
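The sequencing described above might look roughly like this in a Pod spec. This is a sketch only: the image names, commands, and module paths are placeholders, not the actual deployment configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tileserver
spec:
  initContainers:
    - name: apply-fixtures              # (1) runs to completion first
      image: macrostrat/tileserver:latest        # placeholder image
      command: ["python", "-m", "tileserver.fixtures"]  # hypothetical entry point
  containers:
    - name: tileserver                  # (2) starts only after (1) succeeds
      image: macrostrat/tileserver:latest        # placeholder image
      command: ["uvicorn", "tileserver:app"]     # hypothetical server command
```

If (1) fails, the kubelet restarts it according to the Pod's restart policy, so a transient lock or connectivity problem delays startup rather than leaving the tileserver running against a half-migrated schema.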


davenquinn commented Jul 30, 2024

@brianaydemir in this design would it also be possible to apply the fixtures "on demand" while the pods are running? Right now we tend to apply database fixtures by command-line since they require a fairly long toolchain.

I guess a related question is: what is the right way to run migrations (including the more general ones @mwestphall is working on) in production? Is it in some sort of containerized workflow as you laid out above, ensuring that the "right" state of the database is present at all times? Or do we simply run migrations manually and update containers in coordination with that? I guess it gets complicated because different containers will depend on different "subsystems" being at their most up-to-date schema...

@brianaydemir commented:

@davenquinn As long as the database is up and running, the only things stopping you from making changes to it are the other services trying to make use of it in its current state.

One thing to consider is that, the way we run K8s and services on it, we have a(nother) Git repository that declaratively specifies what the state of the services on the cluster should be, so there's a general expectation that everything needed to get a service up and running is in that YAML. So, if a service requires the database to be in a particular state, the service and its definition would, ideally, ensure that.
