Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for time conversions in tools package #2341

Open
wants to merge 42 commits into
base: main
Choose a base branch
from

Conversation

markcampanelli
Copy link
Contributor

@markcampanelli markcampanelli commented Dec 23, 2024

  • Closes Add tests for time conversions in tools package #2340
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries in docs/sphinx/source/reference for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

@markcampanelli markcampanelli changed the title [DRAFT] Add tests for time conversions in tools package Add tests for time conversions in tools package Dec 23, 2024
@markcampanelli
Copy link
Contributor Author

I realize no one planned/asked for this, but I think it's a good addition and ready to be reviewed by a maintainer. Increases test coverage by a whopping 0.07%!

@kandersolar kandersolar added this to the v0.11.3 milestone Dec 23, 2024
Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @markcampanelli

The tests expose a weakness in localize_to_utc: what happens if location has a timezone defined as an offset? That is permissible, and if done produces an unhelpful message:

    if zone.upper() == 'UTC':

AttributeError: 'int' object has no attribute 'upper'

'input,expected',
[
(
{
Copy link
Member

@cwhanse cwhanse Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a dict here allows the function call to be double-splatted (**input) and may make it easier to "see" each tested set of inputs and output. It's not the style used for other tests are parameterized.

I'm interested in feedback from others: is this parameterization style easier or more difficult, as a reviewer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cents: a more compact representation of the changing values is helpful for reviewing, and is worth a bit of processing inside the test function itself. For example, it seems like the following:

(
    {
        "time": pd.DatetimeIndex(
            ["1974-06-22T18:30:15"],
            tz=ZoneInfo("Etc/GMT+5"),
        ),
        "location": location.Location(
            43.19262774396091, -77.58782907414867, tz="Etc/GMT+5"
        )
    },
    pd.DatetimeIndex(["1974-06-22T23:30:15"], tz=ZoneInfo("UTC")),
),
(
    {
        "time": pd.DatetimeIndex(["1974-06-22T18:30:15"]),
        "location": location.Location(
            43.19262774396091, -77.58782907414867, tz="Etc/GMT+5"
        )
    },
    pd.DatetimeIndex(["1974-06-22T23:30:15"], tz=ZoneInfo("UTC")),
),

could be replaced with something like this:

("1974-06-22T18:30:15", "Etc/GMT+5", 43, -77, "Etc/GMT+5", "1974-06-22T23:30:15+00:00"),
("1974-06-22T18:30:15", None, 43, -77, "Etc/GMT+5", "1974-06-22T23:30:15+00:00"),

And the test function could then contain the necessary calls to Location and pd.DatetimeIndex to construct the parameters themselves.

I think the latter format makes it easier to see what changes between each set of parameters. I found my eyes having to flick back and forth and subsequently getting lost with the former format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end I think this was simplified to

@pytest.mark.parametrize(
    'tz,tz_expected', [
        pytest.param('UTC', 'UTC'),
        pytest.param('Etc/GMT+5', 'Etc/GMT+5'),
        pytest.param('US/Mountain', 'US/Mountain'),
        pytest.param('America/Phoenix', 'America/Phoenix'),
        pytest.param('Asia/Kathmandu', 'Asia/Kathmandu'),
        pytest.param('Asia/Yangon', 'Asia/Yangon'),
        pytest.param(datetime.timezone.utc, 'UTC'),
        pytest.param(zoneinfo.ZoneInfo('Etc/GMT-7'), 'Etc/GMT-7'),
        pytest.param(pytz.timezone('US/Arizona'), 'US/Arizona'),
        pytest.param(-6, 'Etc/GMT+6'),
        pytest.param(-11.0, 'Etc/GMT+11'),
        pytest.param(12, 'Etc/GMT-12'),
    ],
)
...

@markcampanelli
Copy link
Contributor Author

markcampanelli commented Dec 23, 2024

Thanks @markcampanelli

The tests expose a weakness in localize_to_utc: what happens if location has a timezone defined as an offset? That is permissible, and if done produces an unhelpful message:

    if zone.upper() == 'UTC':

AttributeError: 'int' object has no attribute 'upper'

Good catch! Lemme see what I might do about fixing that here (if it's not too much scope creep).

BTW: I'm pretty sure this is the sort of issue that I solved (hopefully!) by simplifying the timezone representation in pvlib.location.Location in #2343

@markcampanelli
Copy link
Contributor Author

@cwhanse @kandersolar I addressed the bug Cliff spotted, and updated a Location test for that.

Due to the high repetition, I decided not to use test parameterization at all for the new tools tests. Same coverage.

Should be ready for another look.

pvlib/tools.py Outdated

Parameters
----------
time : datetime.datetime, pandas.DatetimeIndex,
or pandas.Series/DataFrame with a DatetimeIndex.
location : pvlib.Location object
location : pvlib.Location object (unused if time is localized)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
location : pvlib.Location object (unused if time is localized)
location : pvlib.Location object (unused if ``time`` is localized)

elif isinstance(tz, (int, float)):
self.tz = tz
self.pytz = pytz.FixedOffset(tz*60)
self.tz = f"Etc/GMT{int(-tz):+d}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to ask to revert this change. The docstring says tz can be a float. There are places that have a non-integer offset from UTC, e.g., the Marquesas are UTC−09:30UTC−09:30. localize_to_utc is changed in this PR to examine Location.pytz rather than Location.tz, so the error we discussed won't occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a point in the Location docstring that says this:

Location objects have two timezone attributes:
    * ``tz`` is a IANA timezone string.
    * ``pytz`` is a pytz timezone object.

While I appreciate my changes may break some things, I think this existing docstring suggests a much more sane approach. (Indeed, in the sequel using zoneinfo, I propose using one, and only one, internal representation of the timezone, instead of two, potentially with helper functions s to update this internal representation using int/float offsets, etc..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying revert from storing a string, but revert the int portion. If tz is truncated to an integer, how are timezones such as UTC+05:30 represented?

I'm open to doing away with accepting int or float for tz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhanse Sorry I misunderstood your meaning. I was aware that some locations are on half-hour offsets and such, but what is unclear to me is if/how that is represented in pytz (or zoneinfo for that matter). I do not see any fractional offsets in zoneinfo.available_timezones() and this also does not work for me: pytz.timezone("Etc/GMT+5.5").

I must be missing something here. Does something non-standard have to be done to support fractional offsets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to doing away with accepting int or float for tz.

Hi again @cwhanse. It looks like the standard "non-integral-hour" offsets are only supported in the IANA standard through place names such as "Asia/Kathmandu" and "Asia/Yangon", but not through Etc/GMT+X.Y style offsets.

Given that, I think the most trouble-free thing to do is to remove the int and float options and be sure to discuss this issue in the documentation tutorial. (Also, the only supported tz strings would be the IANA ones.) Sounds like you are amenable to this.

Keeping both fields tz (only as a string) and pytz (as a pytz timezone type), then I think I should add setters that change one when the other is changed after the Location object has already been already initialized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most trouble-free thing to do is to remove the int and float options

I'm in favor of this change. @pvlib/pvlib-maintainer @pvlib/pvlib-triage your opinions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TMY3 files represent timezone in numeric form, so this change may complicate some workflows. Outside of that, I never use this attribute (and don't know why I would), so I have no opinion either.

Remember that we usually have a deprecation period before outright removal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some more investigating on this yesterday, and I think I see an implementation that would better support the existing int and float offsets. However, this also raised some new questions about the existing implementation. I will try to push some concrete code here by end of day for people to consider, along with the new questions raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I have added support back for int and non-fractional float being passed to tz in the Location constructor. I also test that pytz and zoneinfo timezones are likewise supported for initialization, and I beefed up the tests even more.

Note that the pytz Location attribute is now the single source of truth for the time zone in a Location object (which I recommend updating to a standard-library timezone.TimeZone object in #2342). self.tz simply returns the stringified self.pytz, but one can still directly set self.tz with any tz value supported by the Location intializer. Also, the tz and pytz attributes are kept in sync now, as changing one updates the other.

Ready for another review.

@markcampanelli
Copy link
Contributor Author

The two test failures appear to be unrelated to my changes.

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should switch and regard tz as the source of truth, and pytz as the optional attribute? With an eye toward replacing pytz with python's timezone.

from the pytz and zoneinfo packages), default 'UTC'.
See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones for a
list of valid name strings for IANA time zones.
ints and floats must be non-fractional N-hour offsets from UTC, which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ints and floats must be non-fractional N-hour offsets from UTC, which
ints and floats must be whole number hour offsets from UTC, which

Comment on lines 102 to 103
"floating point tz does not have zero fractional part: "
f"{tz_}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"floating point tz does not have zero fractional part: "
f"{tz_}"
"floating point tz has non-zero fractional part: "
f"{tz_}. Only whole number offsets are supported."

else:
raise TypeError(
f"invalid tz specification: {tz_}, must be an IANA time zone "
"string, a non-fractional int/float UTC offset, or a "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"string, a non-fractional int/float UTC offset, or a "
"string, a whole number int/float UTC offset, or a "

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markcampanelli let's keep this PR to the scope described in the title. Importing and adding the zoneinfo attribute feels like half of the journey to shifting from pytz to zoneinfo and I think that should be done in its own PR.

Same for fixing asv - do that in a separate PR.

@@ -26,5 +35,4 @@ Requirements

Contributors
~~~~~~~~~~~~


* Mark Campanellli (:ghuser:`markcampanelli`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Mark Campanellli (:ghuser:`markcampanelli`)
* Mark Campanelli (:ghuser:`markcampanelli`)

@markcampanelli
Copy link
Contributor Author

@pvlib/pvlib-maintainer @pvlib/pvlib-triage Can someone review my asv CI fix here?

If this looks ok, then should I move the fix to a separate PR for better visibility?

@markcampanelli
Copy link
Contributor Author

@pvlib/pvlib-maintainer @pvlib/pvlib-triage Can someone review my asv CI fix here?

If this looks ok, then should I move the fix to a separate PR for better visibility?

Moved to #2352, which now should go in before this.

pvlib/tools.py Outdated
@@ -119,21 +120,21 @@ def atand(number):

def localize_to_utc(time, location):
"""
Converts or localizes a time series to UTC.
Converts time to UTC, localizing if necessary using location.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Converts time to UTC, localizing if necessary using location.
Converts ``time`` to UTC, localizing if necessary using location.

@markcampanelli
Copy link
Contributor Author

markcampanelli commented Jan 28, 2025

I'd like to say that I'm done with this PR and it's good to go content wise. I will do my best to try to check the generated API documentation for any formatting issues.

To clarify where things ended up with fixing the issues found in Location timezones:

  1. Internally, a Location object's timezone is stored "psuedo-privately" as a zoneinfo.ZoneInfo object. (Now a single source of truth.)
  2. The Location class's initializer interface should be compatible with existing code.
  3. The tz attribute is now always an IANA-suppored string (as advertised) that can be updated safely after Location object initialization.
  4. The pytz attribute is still supported for backwards compatibility with code that uses pytz timezones, but it is read only. One must set the tz attribute to change a Location's timezone, and setting this attribute still accepts a pytz-style timezone (a consistent tz-setter interface as the object's initializer). However, tz will always read out as a string.

@markcampanelli
Copy link
Contributor Author

I see that the documentation for the PR's code is generated in CI. Nice touch! Checking it now...

@markcampanelli
Copy link
Contributor Author

@pvlib/pvlib-maintainer Not sure if anyone knows how to document all the attributes in an object, not just those with a getter and setter as is apparently happening with Location here.

pvlib/location.py Outdated Show resolved Hide resolved
Comment on lines 17 to 21
* Ensure proper tz and pytz types in pvlib.location.Location. tz becomes the
single source of time-zone truth, is the single time-zone setter interface,
and the getter consistently returns an IANA string. datetime.tzinfo
subclasses are fully supported in tz setter. pytz attribute becomes read
only. (:issue:`2340`, :pull:`2341`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Ensure proper tz and pytz types in pvlib.location.Location. tz becomes the
single source of time-zone truth, is the single time-zone setter interface,
and the getter consistently returns an IANA string. datetime.tzinfo
subclasses are fully supported in tz setter. pytz attribute becomes read
only. (:issue:`2340`, :pull:`2341`)
* Ensure proper tz and pytz types in pvlib.location.Location.

Ensuring types is a bug fix. The remainder describes an API change that should be announced elsewhere. Not really an enhancement, so maybe we put the rest of this note in the "Breaking changes" section, because one can no longer set a value for .pytz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhanse Lmk if my whatsnew change for this looks ok.

Comment on lines +45 to +51
* To ensure that the time zone in pvlib.location.Location remains internally
consistent if/when the time zone is updated, the tz attribute becomes the
single source of time-zone truth, is the single time-zone setter interface,
and its getter consistently returns an IANA string. datetime.tzinfo
subclasses (including time zones from pytz and zoneinfo) are still fully
supported in the tz setter, but the pytz attribute becomes read only.
(:issue:`2340`, :pull:`2341`)
Copy link
Member

@cwhanse cwhanse Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* To ensure that the time zone in pvlib.location.Location remains internally
consistent if/when the time zone is updated, the tz attribute becomes the
single source of time-zone truth, is the single time-zone setter interface,
and its getter consistently returns an IANA string. datetime.tzinfo
subclasses (including time zones from pytz and zoneinfo) are still fully
supported in the tz setter, but the pytz attribute becomes read only.
(:issue:`2340`, :pull:`2341`)
* The pytz attribute of pvlib.location.Location is now read only. Use the tz attribute
to record time zone. The pytz attribute is internally updated if the tz attribute
is changed.
* The tz attribute of pvlib.location.Location now returns an IANA string.
(:issue:`2340`, :pull:`2341`)

I think the focus of this note should be on how the changes will affect user code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests for time conversions in tools package
4 participants