Skip to content

Accept more numeric types in Python modules #1110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

wenzeslaus
Copy link
Member

@wenzeslaus wenzeslaus commented Nov 22, 2020

There is more numeric types in SQL than just INTEGER and DOUBLE PRECISION, for example, typical one in SQLite is REAL, so db.univar (and v.db.univar) need to accept those too as numeric types. Currently, db.univar errors when column has type REAL, although it should just work. This PR fixes that.

v.db.update allows user to pass invalid numbers because it does apply quoting even for some numeric types such as REAL. This causes zeros being recorded in the table without any warning with SQLite backend. The new version in this PR skips quoting for more types and causes SQL syntax error when the value provided is not a number (e.g. when user puts an expression into value instead of query_column).

v.dissolve does not seem to have issues with SQLite unlike the other two (three), but it seems to me that a similar fix is needed. I reused the functions created for the other cases, but the v.dissolve case is little more complex since string types are involved too. If someone can test this with PostgreSQL or comment on it, that would be great. The question is if v.dissolve accepts all data types it should accept? Maybe the best course of action is to solely rely on v.reclass for this test (and an informative error message!) or drop v.dissolve from this PR completely. Let me know what you think.

Anyway, the functionality needed for db.univar and v.db.update seems general enough, so I added the utility functions to the library (grass.script.db). (In fact, the initial implementation of this is in v.db.pyupdate, so that's another use for them.) An alternative would be to somehow mimic the db_sqltype_to_Ctype() from the C library or somehow wrap that in Python, but I'm not sure about the cost/benefit ratio there.

There is some more info in the individual commit messages.

There is more numeric types in SQL than just INTEGER and DOUBLE PRECISION,
for example, typical one in SQLite is REAL.
@wenzeslaus wenzeslaus changed the title db.update and v.db.update: Accept more numeric types db.univar and v.db.univar: Accept more numeric types Nov 22, 2020
Several modules in scripts need these tests, so add them to the library.
Assuming that the intention was to disallow floats and allow int and string which
have a clear equality comparison. Now allowing anything which is not float
assuming it will work later on in v.reclass where types generated by db_sqltype_to_Ctype()
are used. (These types are the simplified C-like equivalents of SQL types
which distinguish only str, int, double, and datetime.) Consequently, datetime columns
will fail in v.reclass and not in this condition which is suboptimal, but allows more
column types to work with v.dissolve.
… the column type is SQLite REAL, so having a readble error message is now more important. Simply exit and rely on the error message from db.execute.
… nicer error message), improve how the error message is represented in v.dissolve in case v.reclass fails. It is not as good as the one in the check done ahead of time, but the text Final extraction... is not relevant for v.reclass, so simply end with message from there assuming that the most common cases of failure will produce messages somewhat relevant to the user as is the case for the DATETIME column type. Leaving the additional error message for v.extract, but removing the traceback which is unhelpful to the end user (and does not tell much anyway in this case).
@wenzeslaus wenzeslaus changed the title db.univar and v.db.univar: Accept more numeric types Accept more numeric types in Python modules Nov 22, 2020
@wenzeslaus wenzeslaus requested a review from marisn November 22, 2020 05:28
Copy link
Contributor

@marisn marisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly codebase will be in a better shape after merging this PR. It is impossible to fix all code issues at once thus 👍

]


_SQL_FLOAT_TYPES = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about NUMERIC and DECIMAL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The db_sqltype_to_Ctype() function returns DB_C_TYPE_STRING for NUMERIC and DECIMAL. That's implicit there. They are handled explicitly at other places in the code.

The Affinity Name Examples for SQLite has NUMERIC and DECIMAL(a,b) under (its own) NUMERIC together with BOOLEAN and DATETIME, not under REAL. NUMERIC in SQLite seems different than in PostgreSQL where NUMERIC has the same definition as DECIMAL. (I don't know if or how that's handled in the GRASS db drivers.)

Conclusion: I don't know what to make up of that, but perhaps text representation is a better fit for DECIMAL floating point number (1) (you are using DECIMAL after all). So, I would leave it as is.

Note 1: That seems to be what SQLite driver is doing in create_table.c and fetch.c, however PostgreSQL and MySQL drivers in create_table.c change DECIMAL and NUMERIC to DOUBLE.

@wenzeslaus
Copy link
Member Author

Well, the two usages we have now are Should I quote it? and Can I use it for statistics?.

The function name is now sql_type_is_numeric(), but perhaps there should be two, sql_type_needs_quotes() (same as the current sql_type_is_numeric(), just opposite) and sql_type_is_number() (like the current one, but with a more neural name and may include DECIMAL/NUMERIC).

The thing is that the C modules do their own test and if the Python module is just doing some checking before running the C module as opposed to doing the computation in Python or sending a query to the database, the test needs to fit the C module. This suggests there is even more context than just quoting versus computing. This brings back the idea of getting result of db_sqltype_to_Ctype() into Python, perhaps as a column type with some v.info -c alternative.

I think the current approach is that DECIMAL needs special treatment which is not provided outside of the actual database, so if users wants to do computation, the right course of action is to convert that to DOUBLE and than do the computation (and than convert back to DECIMAL if needed).

@wenzeslaus wenzeslaus added this to the 8.2.0 milestone Dec 2, 2021
@wenzeslaus wenzeslaus modified the milestones: 8.2.0, 8.4.0 Feb 27, 2022
@wenzeslaus wenzeslaus added the enhancement New feature or request label Jul 26, 2022
@wenzeslaus wenzeslaus modified the milestones: 8.3.0, 8.4.0 Feb 10, 2023
@wenzeslaus wenzeslaus modified the milestones: 8.4.0, 8.5.0 Apr 26, 2024
@echoix echoix added the conflicts/needs rebase Rebase to or merge with the latest base branch is needed label Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conflicts/needs rebase Rebase to or merge with the latest base branch is needed enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants