Schema DSL for testing #566
Comments
This should include a schema equality utility too.
We could certainly replicate Arrow C++'s syntax here, although I am hesitant to add scope to nanoarrow or make it seem like we are trying to replace anything about Arrow C++.
We have a few places that do something like this. For integration testing we have one that is slow (and somewhat specific to the types of schemas that show up in the integration testing) but generates a nice diff: arrow-nanoarrow/src/nanoarrow/integration/c_data_integration.cc, lines 151 to 162 in 2040e74.
In Python we have one (that should almost certainly be written in C) that performs the check but doesn't generate very useful output on failure: arrow-nanoarrow/python/src/nanoarrow/_schema.pyx, lines 349 to 402 in 2040e74.
Both of those are pretty specific to exactly what we needed them for.
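For reference, a general-purpose check over the C data interface struct could be fairly small. The following is only a sketch, not code from nanoarrow: `SchemaEquals` and `MetadataSize` are hypothetical names, the `ArrowSchema` layout is copied from the Arrow C data interface so the example compiles on its own, and metadata is compared byte for byte (so key order matters and a NULL blob differs from an empty one). It reports only true/false and would not produce a diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Layout from the Arrow C data interface. Copied here so the sketch is
// standalone; real code would include nanoarrow.h instead.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Byte length of a metadata blob: an int32 pair count, then for each pair
// an int32 key length, the key bytes, an int32 value length, the value bytes.
static int64_t MetadataSize(const char* md) {
  if (md == nullptr) return 0;
  int32_t n_pairs;
  std::memcpy(&n_pairs, md, sizeof(int32_t));
  int64_t pos = sizeof(int32_t);
  for (int32_t i = 0; i < 2 * n_pairs; i++) {  // keys and values alternate
    int32_t len;
    std::memcpy(&len, md + pos, sizeof(int32_t));
    assert(len >= 0);
    pos += static_cast<int64_t>(sizeof(int32_t)) + len;
  }
  return pos;
}

static bool StrEqual(const char* a, const char* b) {
  if (a == nullptr || b == nullptr) return a == b;
  return std::strcmp(a, b) == 0;
}

// Hypothetical helper: recursive, order-sensitive structural equality.
bool SchemaEquals(const ArrowSchema* a, const ArrowSchema* b) {
  if (a == nullptr || b == nullptr) return a == b;
  if (!StrEqual(a->format, b->format) || !StrEqual(a->name, b->name)) return false;
  if (a->flags != b->flags || a->n_children != b->n_children) return false;
  const int64_t md_size = MetadataSize(a->metadata);
  if (md_size != MetadataSize(b->metadata)) return false;
  if (md_size > 0 &&
      std::memcmp(a->metadata, b->metadata, static_cast<std::size_t>(md_size)) != 0) {
    return false;
  }
  for (int64_t i = 0; i < a->n_children; i++) {
    if (!SchemaEquals(a->children[i], b->children[i])) return false;
  }
  return SchemaEquals(a->dictionary, b->dictionary);
}
```

Generating a useful failure message (which field differed, and where in the tree) is the part this leaves out, and is probably where most of the design effort would go.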
I sent this to you offline as well but I'll post here too! For generating integration test JSON we had a similar situation to serializing IPC schemas and went with a helper function plus a lambda to generate the full range of data types: arrow-nanoarrow/src/nanoarrow/testing/testing_test.cc Lines 496 to 704 in 2040e74
A similar example using Arrow C++ that would be nice to replace: arrow-nanoarrow/src/nanoarrow/ipc/decoder_test.cc Lines 671 to 716 in 2040e74
If we keep it minimal and closely aligned with the ABI, 100-200 lines would suffice for:

    using namespace nanoarrow::testing::dsl;

    // declare a schema (default format is +s)
    UniqueSchema s = schema{
      // we can make the arguments look kwarg-like
      children{
        {"i", "my int field's name"},
        {"i", dictionary{{"u"}}, "my dictionary field's name",
         metadata{
           "some_key=some_value",
           "some_key2=some_value2",
         },
         ARROW_FLAG_NULLABLE},
      }
    };
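To make the kwarg-like idea concrete, here is one way the wrapper types could be laid out. This is a rough sketch under several assumptions: everything lives in a made-up `dsl_sketch` namespace, it builds a plain in-memory tree rather than a `UniqueSchema`, `kFlagNullable` stands in for `ARROW_FLAG_NULLABLE`, and the example spells the dictionary argument as `dictionary{schema{"u"}}` to keep overload resolution obvious.

```cpp
#include <cstdint>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace dsl_sketch {  // hypothetical namespace, not part of nanoarrow

constexpr int64_t kFlagNullable = 2;  // same value as ARROW_FLAG_NULLABLE

struct schema;  // forward declaration: the wrappers below hold schemas

// Tag types whose only job is to make positional arguments read like kwargs.
struct metadata {
  std::vector<std::string> pairs;  // "key=value" strings, kept unparsed here
  metadata() = default;
  metadata(std::initializer_list<std::string> l) : pairs(l) {}
};

struct children {
  std::vector<schema> items;
  children(std::initializer_list<schema> l);  // defined once schema is complete
};

struct dictionary {
  std::shared_ptr<schema> value;
  dictionary(schema s);  // defined once schema is complete
};

struct schema {
  std::string format = "+s";  // default format is a struct
  std::string name;
  int64_t flags = 0;
  std::vector<std::string> metadata_pairs;
  std::vector<schema> kids;
  std::shared_ptr<schema> dict;

  // schema{children{...}}: a struct with the given fields
  schema(children c) : kids(std::move(c.items)) {}

  // {"i", "name"}: format string first, then the optional pieces
  schema(const char* fmt, const char* nm = "", metadata md = {}, int64_t fl = 0)
      : format(fmt), name(nm), flags(fl), metadata_pairs(std::move(md.pairs)) {}

  // {"i", dictionary{...}, "name", ...}: dictionary-encoded field
  schema(const char* fmt, dictionary d, const char* nm = "", metadata md = {},
         int64_t fl = 0)
      : format(fmt), name(nm), flags(fl), metadata_pairs(std::move(md.pairs)),
        dict(std::move(d.value)) {}
};

inline children::children(std::initializer_list<schema> l) : items(l) {}
inline dictionary::dictionary(schema s)
    : value(std::make_shared<schema>(std::move(s))) {}

}  // namespace dsl_sketch
```

With these definitions, `schema s{children{{"i", "a"}, {"i", dictionary{schema{"u"}}, "b", metadata{"k=v"}, kFlagNullable}}};` yields a tree with two fields under a `+s` root. A real version would still need a step that walks the tree and fills in an `ArrowSchema`, which is not shown here.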
I like the idea of putting it in testing (it can move if it becomes popular). Replacing the usage in the Testing JSON generator would probably get you all the unit tests for free!
In searching for Array equality utilities, I found that ADBC's validation utility also has a way to create schemas using nanoarrow for use in testing!
Arrow C++ includes factories for constructing schemas, types, fields, and metadata, which make the construction of even deeply nested structures expressive.
It should be straightforward to write equivalent factories which build a `nanoarrow::UniqueSchema`.