JohnMount committed Sep 12, 2023
1 parent 71d2589 commit 96d525d
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions Examples/data_schema/schema_check.ipynb
@@ -8,7 +8,7 @@
"\n",
"However, a common missing component remains: a general \"Pythonic\" [data schema](https://en.wikipedia.org/wiki/Database_schema) definition, documentation, and invariant enforcement mechanism.\n",
"\n",
-"It turns out it is quite simple to add such functionality using Python decorators. This isn't particularly useful for general functions (such as `pd.merge()`), where the function is supposed to support arbitrary data schemas. However, it can be *very* useful in adding checks and safety to specific applications and analysis workflows built on top such generic functions. In fact, it is a good way to copy schema details from external data sources such as databases or CSV into enforced application invariants. Application code that transforms fixed tables into expected exported results can benefit greatly from schema documentation and enforcement.\n",
+"It turns out it is quite simple to add such functionality using Python decorators. This isn't particularly useful for general functions (such as `pd.merge()`), where the function is supposed to support arbitrary data schemas. However, it can be *very* useful in adding checks and safety to specific applications and analysis workflows built on top such generic functions. In fact, it is a good way to copy schema details from external data sources such as databases or CSV into enforced application invariants. Application code that transforms fixed tables into expected exported results can benefit greatly from such schema documentation and enforcement.\n",
"\n",
"I propose the following simple check criteria for both function signatures and data frames that applies to both inputs and outputs:\n",
"\n",
@@ -123,7 +123,7 @@
"source": [
"The decorator defines the types schemas of at least a subset of positional and named arguments. Declarations are either values (converted to Python types), Python types, or sets of types. A special case is dictionaries, which specify a subset of the column structure of function signatures or data frames. \"return_spec\" is reserved to name the return schema of the function.\n",
"\n",
-"We are deliberately concentrating on data frames, and not the inspection of arbitrary composite Python types. This is because we what to enforce data frame or table schemas, and not inflict an arbitrary runtime type system on Python. Schemas over atomic types is remains a sweet spot for data definitions.\n",
+"We are deliberately concentrating on data frames, and not the inspection of arbitrary composite Python types. This is because we what to enforce data frame or table schemas, and not inflict an arbitrary runtime type system on Python. Schemas over tables of atomic types is remains a sweet spot for data definitions.\n",
"\n",
"Our decorator documentation declares that `fn()` expects at least:\n",
"\n",
@@ -208,7 +208,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Or, and this is where we start to get benefits, we can call with a wrong argument type."
+"Or, and this is where we start to see benefits, we can call with a wrong argument type."
]
},
{
@@ -241,7 +241,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"And we show that this checking pushes down into the structure of data frame arguments! In our next example we see the argument is missing a required column.\n"
+"And, we show that this checking pushes down into the structure of data frame arguments! In our next example we see an argument is missing a required column.\n"
]
},
{
@@ -590,7 +590,7 @@
"\n",
"A downside is, the technique *can* run into what I call \"the first rule of meta-programming\". Meta-programming only works as long as it doesn't run into other meta-programming (also called the \"its only funny when I do it\" theorem). That being said, I feel these decorators can be very valuable in Python data science projects.\n",
"\n",
-"This documentation and demo can be found [here](https://github.com/WinVector/data_algebra/tree/main/Examples/data_schema).\n"
+"This documentation and demo can be found [here](https://github.com/WinVector/data_algebra/tree/main/Examples/data_schema)."
]
},
{
@@ -749,7 +749,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"In conclusion: the `SchemaCheck` decoration is simple and effective tool to add schema documentation and enforcement to your analytics projects."
+"In conclusion: the `SchemaCheck` decoration is a simple and effective tool to add schema documentation and enforcement to your analytics projects."
]
},
{
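
Note for readers of this diff: the notebook text above describes a decorator that checks argument types, pushes checks down into the columns of table arguments, and reserves a return spec for the result. The following is a minimal sketch of that idea, not the `SchemaCheck` implementation from data_algebra. The name `check_schema`, its parameters, and the use of a plain dict of lists as a stand-in for a data frame are all assumptions made for illustration.

```python
import functools
import inspect


def check_schema(arg_specs, return_spec=None):
    """Hypothetical decorator: validate named arguments and the result.

    A spec is a Python type, a set of types, or a dict mapping required
    column names to specs. Tables are modeled as dicts of lists (an
    assumption; the notebook itself works with data frames).
    """

    def matches(value, spec, label):
        if isinstance(spec, dict):
            # Dict spec: the value must be a table with at least these columns.
            if not isinstance(value, dict):
                raise TypeError(f"{label}: expected a table, got {type(value).__name__}")
            for col, col_spec in spec.items():
                if col not in value:
                    raise TypeError(f"{label}: missing required column {col!r}")
                for cell in value[col]:
                    matches(cell, col_spec, f"{label}[{col!r}]")
            return
        # Type or set-of-types spec: a plain isinstance check.
        allowed = tuple(spec) if isinstance(spec, (set, frozenset)) else (spec,)
        if not isinstance(value, allowed):
            names = sorted(t.__name__ for t in allowed)
            raise TypeError(f"{label}: expected {names}, got {type(value).__name__}")

    def decorate(fn):
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = signature.bind(*args, **kwargs)
            for name, spec in arg_specs.items():
                if name in bound.arguments:
                    matches(bound.arguments[name], spec, f"argument {name!r}")
            result = fn(*args, **kwargs)
            if return_spec is not None:
                matches(result, return_spec, "return value")
            return result

        return wrapper

    return decorate


@check_schema({"a": int, "b": {"x": {int, float}}}, return_spec={"z": float})
def fn(a, b):
    # Scale column "x" of table b by a, returning a one-column table "z".
    return {"z": [a * float(v) for v in b["x"]]}
```

Under these assumptions, `fn(2, {"x": [1, 2.5]})` passes all checks, while a wrong argument type (`fn("2", ...)`) or a table missing column `"x"` raises `TypeError` before the function body runs. Extra columns are allowed, since the spec names only a required subset.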