
An option to force decimals to convert to numeric instead of character? #92

Open
ajirnyi opened this issue Jun 14, 2022 · 11 comments
ajirnyi commented Jun 14, 2022

Hi all -- thank you for your great software!

We have been using RJDBC 0.2-8 with an MS SQL Server backend; unfortunately, we recently ran into issues trying to upgrade to 0.2-10 (from CRAN): in many cases, decimal values that were previously converted to numeric are now returned as character. Most likely this is related to this change in 0.2-9: be457e1.

In our particular case we are fine with the potential loss of precision, but since this is a large code base with many DB accesses, changing all the queries and stored procedures to add explicit casts to float is not currently realistic, so we're stuck with 0.2-8.

Is there an easier way to force decimal values to return as numeric (perhaps with some option setting?), or if not, would you consider that as a feature request?

Thanks! :)

s-u (Owner) commented Jun 14, 2022

Duplicate of #90.

I guess an option to fetch() would be possible. My main concern is that we don't want to lose precision inadvertently - that's a lot worse than calling an extra as.numeric() on the result. However, if this is an optional parameter, then you can tell RJDBC that you're OK with losing precision.

@s-u s-u self-assigned this Jun 27, 2022
TuomoNieminen commented Aug 26, 2022

It would be great to have an option to avoid the conversion to character, e.g. via options(). I understand why this change was made, but it breaks a bunch of production code in our organisation and we are currently unable to update RJDBC. We use Oracle, where the default integer is unfortunately numeric(38,0)... :)

s-u (Owner) commented Aug 29, 2022

@TuomoNieminen Interesting - NUMBER(38,0) is a 127-bit integer, so presumably 128-bit signed, which is way beyond anything any mortal person would be using. One of the problems here is that it's not possible to know ahead of time whether the values will be unrepresentable - and once you decide to fetch them as doubles, you have no way to tell.

So there are two possible approaches:

  • fetch strings, but perform the conversion locally before returning the vector, possibly checking for representation mismatches. One drawback is that this is inefficient, because it means handling a lot of strings, and even checking representability is not entirely trivial (see the sketch after this list).

  • fetch doubles and don't worry about non-representable numbers. The main problem with this is that any loss of precision will be entirely silent.
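
A minimal sketch of the first approach (this is not RJDBC's actual implementation; the helper name and the 15-digit cutoff are assumptions based on this discussion):

# Convert fetched strings to doubles, warning when a value carries more
# significant digits than an IEEE double can round-trip exactly.
as_numeric_checked <- function(x) {
  digits <- gsub("[^0-9]", "", x)       # keep digits only (drop sign and point)
  sig <- nchar(sub("^0+", "", digits))  # leading zeros carry no precision
  risky <- !is.na(x) & sig > 15         # conservative: trailing zeros count too
  if (any(risky))
    warning(sum(risky), " value(s) may lose precision when converted to double")
  as.numeric(x)
}

as_numeric_checked(c("0.1", "123.45"))     # converts silently
as_numeric_checked("12345678901234567.8")  # warns: 18 significant digits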

In addition, using options() would be really dangerous, since you have no idea whether someone else has set it for you, and it would apply to all databases. A perhaps better alternative is an argument to JDBC() or dbConnect() which would serve as the default for fetch. Would that work in your setup?

s-u (Owner) commented Aug 29, 2022

@TuomoNieminen I think I have an idea how we can detect representability, depending on how the driver stores the results. Can you please run the following test on your database:

Connect to it, attach the rJava package, and run a query using dbSendQuery() which returns a column of the high-precision floating type, storing the result in q, e.g.

q <- dbSendQuery(conn, "SELECT myBigNumber FROM myTable;")

Then run the following code:

q@md$getColumnClassName(1L)

If the result is java.math.BigDecimal, then also run the following:

q@jr$`next`()           # advance the JDBC ResultSet to the first row
o = q@jr$getObject(1L)  # fetch column 1 as a Java object reference
o$precision()           # total number of significant digits
o$scale()               # digits after the decimal point

In theory, if the driver's native representation is BigDecimal, then we can check each value for whether it would fit into a native representation in R. If it's already a native type, then we know it's safe to use as-is.

(That said, note that you'll make everything way more efficient in Oracle if you use CAST(x as BINARY_DOUBLE) or CAST(x as TT_INTEGER)).
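
As a hedged sketch of that idea (the helper names are hypothetical; o is a BigDecimal reference obtained via rJava as above):

# Decide from BigDecimal metadata whether a value fits a native R type.
fits_double  <- function(o) o$precision() <= 15L    # <= 15 significant digits round-trip exactly
fits_integer <- function(o) o$scale() == 0L &&
                            o$precision() <= 9L     # 9 digits always fit a 32-bit integer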

@TuomoNieminen

Thanks! We ran the code you suggested: the driver's native representation of the Oracle "integer" is indeed BigDecimal, and the individual precision and scale could be read with your code.

Stivo commented Dec 20, 2022

We are hit by this big time as well. We have a large application, and updating the code to convert every single column from decimal to double is not feasible. We could convert our whole database schema to contain only double columns, but we do have joins on some decimal columns (mostly latitude and longitude, which could stay decimal because their precision is less than 15), and those would work slightly worse as double columns.
On the other hand, an option to lose some of the precision (after all, we are talking about the 15th digit onward, where values start diverging due to the float mantissa) is very acceptable to us; in the end we only show the user the first few significant digits anyway. This issue prevents us from using the latest library, which in turn prevents us from using MRAN checkpoint as well, so we have already wasted days trying to find a good solution. The proposed option would solve all of that.

s-u (Owner) commented Dec 21, 2022

@Stivo Can you please answer my question above and paste the output here (the precision and scale calls)? I never got an answer, so I couldn't do what I proposed (check whether the precision fits and convert if it does), since I don't have access to an Oracle database.

Stivo commented Dec 22, 2022

We are not working with Oracle. I can tell you that our decimal types in SQL Server are at most (38, 2) or (38, 37), and describe what the current behaviour is versus the old one.
New behaviour:

  • Decimal(16, 2) (no matter whether it fits or not) is converted to character
  • Decimal(15, 2) is converted to double

Old behaviour:

  • All decimal columns are interpreted as double. With really large numbers and with some fractional numbers this does lead to a loss of precision (errors start to appear at 1 for values around a quadrillion). But doubles can obviously cover values up to about ±1.7E308 (with roughly 15 significant digits), so even if you have a decimal with 128 digits and 0 digits after the decimal point, you can still represent it as a double and will only see errors in the 0.0000000000001% range (see the illustration below).
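
A quick R illustration of both limits (these are standard IEEE 754 facts):

.Machine$double.xmax       # ~1.797693e+308, the largest finite double
print(2^53, digits = 17)   # 9007199254740992: integers are exact up to here
(2^53 + 1) == 2^53         # TRUE: above 2^53 the gaps between doubles exceed 1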

Stivo commented Dec 22, 2022

This loss, of course, also happens in all versions, even with decimals of a precision as low as 2. See this example:

conn <- poolCheckout(readonly_pool)  # connection from the pool package
result <- dbGetQuery(conn, "select convert(decimal(2,1), 0.1) as result")
poolReturn(conn)
print(format(result, nsmall = 20))

Prints this:
result
1 0.10000000000000000555

This is simply not avoidable when converting from BigDecimal to double. So I really don't see why you allow this for small values, but as soon as the column type exceeds precision 15 you declare it not exact enough.

s-u (Owner) commented Dec 22, 2022

@Stivo I think you may be misunderstanding the issue - the conversion is not lossy at small values, because the declared precision is maintained, i.e., with 15 digits of precision the IEEE FP representation is exact:

> print(0.1, digits=15)
[1] 0.1

Anything beyond that is not, because it exceeds the precision that an IEEE double can represent in general (because of the 53-bit significand). Similarly, printing anything beyond 15 digits is meaningless as you illustrated:

> print(0.1, digits=20)
[1] 0.10000000000000000555

because the representation does not store those digits (for most numbers). Conversely, this means that converting between decimal and FP double representations is guaranteed to be lossless up to 15 digits, but not beyond. That's where the rule comes from.
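
For instance, an illustrative round-trip check:

x <- "0.123456789012345"                 # 15 significant digits
format(as.numeric(x), digits = 15) == x  # TRUE: survives the round trip
y <- "0.1234567890123456789"             # 19 significant digits
format(as.numeric(y), digits = 19) == y  # FALSE: digits beyond 15 are not stored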


But back to the question above - if you want to allow lossy conversion, you now have two options:

  • in any RJDBC version you can set automatic conversion of numeric types:
    RJDBC:::dbSetTypeMaps(NUMERIC=function(x) as.numeric(x))
    This will force the conversion unconditionally, regardless of the precision, even if the DBMS sent the values as strings.
  • in RJDBC 0.3-0 or higher you can add lossy=TRUE to fetch(), dbGetQuery() etc., which will fetch the values as doubles from JDBC. This is subtly different, because the conversion is done by Java. You can also set the fetch.lossy=TRUE option on the connection or the driver so it becomes the default for all connections (see the usage sketch after this list).
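
A usage sketch of the second option (the driver class, jar path and URL are placeholders, and the exact placement of fetch.lossy is an assumption based on the description above):

library(RJDBC)

drv  <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver", "mssql-jdbc.jar")
conn <- dbConnect(drv, "jdbc:sqlserver://host;databaseName=db",
                  "user", "pass", fetch.lossy = TRUE)  # default for this connection

# Or opt in per call:
df <- dbGetQuery(conn, "SELECT myBigNumber FROM myTable", lossy = TRUE)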

@RussellPolitzky

I ran into this problem with DECIMAL(19,8) and found that setting the NUMERIC type map didn't work in my case when using 0.2-10. A little debugging showed why: when using jTDS with SQL Server to read DECIMAL(19,8), the type returned in the metadata is DECIMAL rather than NUMERIC. As such, adding a DECIMAL type map solved the problem. For good measure, I also added a type map for NUMERIC, as above.

RJDBC:::dbSetTypeMaps(NUMERIC=function(x) as.numeric(x))
RJDBC:::dbSetTypeMaps(DECIMAL=function(x) as.numeric(x))
