Skip to content

Conversation

@sylwiaszunejko
Copy link

@sylwiaszunejko sylwiaszunejko commented Mar 25, 2024

In Python datetime.datetime type year has to be in range [MINYEAR, MAXYEAR]. This range is not the same as possible timestamps in scylla. Previously if timestamp was outside this range it made driver raise an Exception. It was not correct behavior. There was a work around implemented in cqlsh.

This PR introduces a Datetime type to accommodate ranges outside datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR), this type falls back to printing milliseconds_from_epoch offset.

Something similar was introduced before for datetime.date type - scylladb@4f3c77c.

This is for sure a breaking change, because of change of the return type. The question is if it is worth to break compatibility for fixing this bug. As far as I know it was not reported by any users, in scylladb we encounter it on two occasions in tests.

This PR is based on scylladb#310 - in scylladb we are considering different approaches to this problem and it would be helpful to know your opinion on that.

This is the original issue: https://datastax-oss.atlassian.net/browse/PYTHON-441. I've left a comment there to reopen the discussion.

In Python `datetime.datetime` type year has to be in range
[MINYEAR, MAXYEAR]. This range is not the same as possible timestamps
in scylla. Previously if timestamp was outside this range it made
driver raise an Exception. It was not correct behavior. There was a
work around implemented in cqlsh.

This commit introduces a `Datetime` type to accommodate ranges outside
datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as
a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR),
this type falls back to printing milliseconds_from_epoch offset.
@bschoening
Copy link
Contributor

bschoening commented Jun 7, 2025

Deviating from using Python's datetime would seem unwise and make the result incompatible with other libraries. Python's MINYEAR is 1 and MAXYEAR is 9999, which aligns with ISO 8601's four digit year, which supports the same range.

@aholmberg's comment in CASSANDRA-10625 still seems relevant: Changing the return type for the driver is not tenable. What to do with out-of-range values seems application-specific to me, so it didn't seem appropriate to put the optional workaround in the driver.

Python datetime.MINYEAR

@absurdfarce
Copy link
Contributor

Thanks for the PR @sylwiaszunejko! I need to read through this PR (as well as @bschoening 's feedback above) in more detail but on a surface read I'm inclined to agree with Brad's concerns. If we're talking about changing Python return types for one or more CQL types that's something we need to be very careful about.

Have you signed the Contributor License Agreement for contributions to DataStax open source projects? If not you can find it at https://cla.datastax.com/. Thanks!

@absurdfarce
Copy link
Contributor

Apologies, finally had a chance to get back to this. I very much agree with the comments from @bschoening above; changing the returned Python type for a given CQL type is a significant change that probably would require a major version bump. We had a similar conversation for #1175 and I think the same reasoning applies here.

If there were some strongly compelling need to make such a change that would be one thing, but we don't know of anyone experiencing a problem with these limits and from the conversation on the referenced Scylla PR it sounds like there aren't any there either. Lacking any kind of compelling need the potential harm of changing the return type seems to far outweigh any benefits so my inclination is to say we can't do this as it stands.

Two thoughts occur to me on this point that might be relevant in the future @sylwiaszunejko:

  • Duck typing actually helps you here. If the new type implemented all of datetime's methods and attributes this might be more compelling. Type checks might be a problem (something I think @fruch referenced in the original PR) but we could at least have a conversation around that.

  • The Java driver implements type serialization via the codec interface, one of the benefits of which is that users can provide custom codecs to do customized type conversions. I'm certainly not proposing we move all serialization in the Python driver to codecs but some generalized notion of being able to provide custom serde functions for a given type might indeed be useful. I would argue something like that is the right way to implement this functionality.

@fruch
Copy link

fruch commented Sep 18, 2025

Thanks @absurdfarce, more or less what I assumed the answer would be

@sylwiaszunejko
Copy link
Author

Apologies, finally had a chance to get back to this. I very much agree with the comments from @bschoening above; changing the returned Python type for a given CQL type is a significant change that probably would require a major version bump. We had a similar conversation for #1175 and I think the same reasoning applies here.

If there were some strongly compelling need to make such a change that would be one thing, but we don't know of anyone experiencing a problem with these limits and from the conversation on the referenced Scylla PR it sounds like there aren't any there either. Lacking any kind of compelling need the potential harm of changing the return type seems to far outweigh any benefits so my inclination is to say we can't do this as it stands.

Two thoughts occur to me on this point that might be relevant in the future @sylwiaszunejko:

* Duck typing actually helps you here.  If the new type implemented all of datetime's methods and attributes this _might_ be more compelling.  Type checks might be a problem (something I think @fruch referenced in the original PR) but we could at least have a conversation around that.

* The Java driver implements type serialization via the codec interface, one of the benefits of which is that users [can provide custom codecs](https://apache.github.io/cassandra-java-driver/4.19.0/core/custom_codecs/) to do customized type conversions.  I'm certainly not proposing we move all serialization in the Python driver to codecs but some generalized notion of being able to provide custom serde functions for a given type might indeed be useful.  I would argue something like that is the right way to implement this functionality.

Thanks for the answer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants