Work out how to automate and formalise custom DB changes in medic-analytics #8

SCdF · 2016-07-22T10:33:48Z

The medic-analytics code alters the DB schema to create various tables and views. It also refreshes all existing materialised views in the database at the end of each run. It makes schema alterations by running migration scripts on startup. These scripts get baked into each release.

Tech leads need to be able to alter the DB schema per project, to create their own views (materialised or not). They may also need to add their own custom indexes to existing tables.

We need to think up some mechanism for them to do that in a way that is more formal than "here is the DB credentials go nuts". Though I'm not against it as a plan z.

This problem is somewhat like app settings. But, you can't take back a migration. This means you cannot "wholesale" replace migrations like you can with app settings.

Here are some off-the-cuff options:

Alter medic-analytics to support a configurable third-party migration directory on the server. Tech leads would upload their scripts to this directory. The code would either listen for changes and execute them, or it would occur on startup. Tech leads would self-manage their scripts in git.
Create an app for each client to contain migrations. This would be a fork of a skeleton project, that boots, check for migrations and runs them, then closes. There would be tests to make sure that these migrations sit correctly on top of medic-analtyics. It would run on medic-os like medic-analytics does, with the same credentials. This would force you to version and check in your schema changes. You would need to store them in medic-projects

alxndrsn · 2016-07-22T11:05:16Z

you can't take back a migration

Depends how they're written, surely?

This would force you to version and check in your schema changes. You would need to store them in medic-projects

👍 to this

SCdF · 2016-07-22T14:39:28Z

you can't take back a migration

Depends how they're written, surely?

Yeah that's true. I guess I was thinking about how you could potentially mess up / confuse the migration tool in the first "dropbox" idea, or a variation where you upload a zip via a UI or something, if you forgot to provide the current migrations / did them in the wrong order / whatever. For the second idea it's not relevant.

garethbowen · 2016-07-24T20:12:26Z

Tech leads need to be able to alter the DB schema per project

This makes me really concerned. It'd be great if we could have a clear line between stuff developers manage and stuff tech leads manage so we don't conflict. For example, can we put project-specific views in a separate db/schema (or whatever the pg way to do that is)?

This would leave tech leads able to run their own migrations in their own way without having to be managed by this codebase.

They may also need to add their own custom indexes to existing tables.

Most of those indexes will probably be of benefit to other projects as well and should be added to the base set. The exception would be if there's a very expensive one for a specific use case.

SCdF · 2016-07-25T09:55:55Z

It'd be great if we could have a clear line between stuff developers manage and stuff tech leads manage so we don't conflict

Does having all tech-lead stuff in an app that has tests that make sure there are no errors when applying their migrations on top of the default migrations? I suppose even then that won't simulate an ongoing system where these migrations could be interleaved and all sorts of oddness.

For example, can we put project-specific views in a separate db/schema (or whatever the pg way to do that is)?

There is a plugin called dblink that allows that, or there is us using schemas correctly (in pg, db -> schema -> table, though schema is "default" if you don't say anything).

Most of those indexes will probably be of benefit to other projects as well and should be added to the base set.

@michaelkohn what are your thoughts on the above quote?

michaelkohn · 2016-07-25T20:00:00Z

I would like to think that there will be a fair amount of re-usability on indexes, though my understanding is that it also depends on what is happening up-stream. We only really have 3 or 4 index-able base objects anyway (the couchdb table which has just one field of the raw json, and a few meta-level matviews).

We can index JSON in the couchdb table, but even for the same use case, my understanding is that the contents and structure of the JSON is dependent on how the TL created the form.... so even if we add an index to some field in the JSON in the couchdb table, if the field name is slightly different or doesn't even exist, I assume the creation of that index would fail (or otherwise be useless).

For the matviews, there are only a few fields in them anyway and they are mostly fully indexed already.

SCdF · 2016-07-25T20:20:14Z

@michaelkohn is there ever stuff like sorting or filtering in clipfolio? That would require special indexes?

michaelkohn · 2016-07-25T20:55:58Z

Not in addition to what we've already built into our queries. We build the queries that KlipFolio executes and any additional filtering or sorting is done client side / from the presentation layer.

SCdF · 2016-07-26T08:23:19Z

@michaelkohn right so to be clear, are you saying klipfolio works by running your view, downloading all data over to its servers, and then any kind of filtering or sorting it deals with?

michaelkohn · 2016-07-26T16:05:11Z

Correct.

KlipFolio executes the query we provide it, downloads the results of that query and then has the ability to sort/filter whatever data was returned by the original query.

So if the query we configure in KlipFolio is this
-> SELECT * FROM contactview_chp WHERE NAME LIKE 'MICHAEL%'

KlipFolio will receive the results of that query and do whatever addition filtering or ordering on that dataset client side, it will not hit the database again.

For most of the datasources, it doesn't actually execute the query interactively (ie when someone clicks a button). It executes the query on a schedule (most are set to refresh every 30 minutes), stores the data and uses that dataset for every dashboard view until the next time it refreshes.

It is possible to run parameterized / interactive queries, but we still configure the WHERE clause for the query. An example is for date ranges... there are some dashboards where users can enter From/To dates. In KF, we would configure the WHERE clause to look something like this...

WHERE meta.reported > '{FromDate}'

Before running the query, KF just replaces that token with the user provided date so PostgreSQL would just see it as

WHERE meta.reported > '2016-05-01' (for example)

SCdF · 2016-07-26T19:23:04Z

OK, so it sounds like we can look into multiple schemas here, something like:

db.couch2pg schema which is only writable by couch2pg, and contains the couchdb table
db.xmlforms schema which contains helper views for xmlforms support, again only writable by couch2pg
db.hlv schema (name is terrible, think of a better one) which is the only place tech leads / humans can write to, which is where high level views go.

Would need to look into how schemas work in pg. Also, this still means we'd want some kind of code that helps tech leads organise their changes, i.e. one of the options in the ticket description.

nice-snek · 2017-04-18T15:41:17Z

hi frand @garethbowen, please triage before the end of this sprint. :parrot_conga::parrot_conga::parrot_conga:

garethbowen · 2017-04-18T20:54:47Z

This is a really good idea and has lots of good information so I'm going to schedule it but not prioritise it.

medic-bot · 2018-01-31T11:22:04Z

Hi @browndav,

This ticket has not been touched in 90 days. Is it still relevant?

(See triaging old issues for more detail)

ghost · 2018-02-22T22:46:01Z

What Gareth said. Relevant; keeping this open.

nice-snek assigned garethbowen Apr 18, 2017

garethbowen removed their assignment Apr 18, 2017

medic-bot assigned ghost Jan 31, 2018

ghost removed their assignment Feb 22, 2018

garethbowen transferred this issue from medic/cht-core Nov 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Work out how to automate and formalise custom DB changes in medic-analytics #8

Work out how to automate and formalise custom DB changes in medic-analytics #8

SCdF commented Jul 22, 2016

alxndrsn commented Jul 22, 2016

SCdF commented Jul 22, 2016 •

edited

Loading

garethbowen commented Jul 24, 2016

SCdF commented Jul 25, 2016

michaelkohn commented Jul 25, 2016

SCdF commented Jul 25, 2016

michaelkohn commented Jul 25, 2016

SCdF commented Jul 26, 2016

michaelkohn commented Jul 26, 2016

SCdF commented Jul 26, 2016

nice-snek commented Apr 18, 2017

garethbowen commented Apr 18, 2017

medic-bot commented Jan 31, 2018

ghost commented Feb 22, 2018

Work out how to automate and formalise custom DB changes in medic-analytics #8

Work out how to automate and formalise custom DB changes in medic-analytics #8

Comments

SCdF commented Jul 22, 2016

alxndrsn commented Jul 22, 2016

SCdF commented Jul 22, 2016 • edited Loading

garethbowen commented Jul 24, 2016

SCdF commented Jul 25, 2016

michaelkohn commented Jul 25, 2016

SCdF commented Jul 25, 2016

michaelkohn commented Jul 25, 2016

SCdF commented Jul 26, 2016

michaelkohn commented Jul 26, 2016

SCdF commented Jul 26, 2016

nice-snek commented Apr 18, 2017

garethbowen commented Apr 18, 2017

medic-bot commented Jan 31, 2018

ghost commented Feb 22, 2018

SCdF commented Jul 22, 2016 •

edited

Loading