
BUG: Series.dt.tz_localize not recognizing DST transition with zoneinfo timezone when ambiguous="infer" #48442

Closed
3 tasks done
jamesdow21 opened this issue Sep 7, 2022 · 4 comments · Fixed by #49700
Labels: Bug, Timezones

jamesdow21 commented Sep 7, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pytz
from zoneinfo import ZoneInfo
from datetime import datetime, timedelta

# 01:00-01:45 occurs twice (first EDT, then EST), followed by 02:00 and 02:15 EST
dts = (
    [datetime(2022, 11, 6, 1, 0) + timedelta(minutes=15 * i) for i in range(4)] * 2
    + [datetime(2022, 11, 6, 2, 0) + timedelta(minutes=15 * i) for i in range(2)]
)

s = pd.Series(dts)

zoneinfo_infer = s.dt.tz_localize(ZoneInfo('US/Eastern'), ambiguous='infer')
print('Result of `tz_localize` with zoneinfo (all ambiguous times show UTC-04:00):')
print(zoneinfo_infer)
print('-' * 80)

pytz_infer = s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous='infer')
print('Result of `tz_localize` with pytz (correctly shows the DST transition):')
print(pytz_infer)
print(pytz_infer)


# should raise AmbiguousTimeError, but doesn't
s.dt.tz_localize(ZoneInfo('US/Eastern'))

# does raise AmbiguousTimeError
s.dt.tz_localize(pytz.timezone('US/Eastern'))

Issue Description

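On pandas 1.5.0rc0, Series.dt.tz_localize with a zoneinfo.ZoneInfo timezone does not recognize the daylight saving time (DST) transition: with ambiguous="infer" every ambiguous wall time is localized to UTC-04:00, and with the default ambiguous="raise" no AmbiguousTimeError is raised.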

Looks like the cause of this is:

Expected Behavior

Localizing with a zoneinfo timezone should behave the same as localizing with the equivalent pytz or dateutil timezone.

Installed Versions

INSTALLED VERSIONS

commit : 224458e
python : 3.10.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.5.0rc0
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.1

jamesdow21 added the Bug and Needs Triage labels on Sep 7, 2022
jamesdow21 (Author) commented Sep 7, 2022

In a weird wrinkle, localizing with pytz first and then converting to zoneinfo makes the Series display incorrectly (rows 4-7 show -04:00) even though the underlying values are correct:

In [60]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')
Out[60]:
0   2022-11-06 01:00:00-04:00
1   2022-11-06 01:15:00-04:00
2   2022-11-06 01:30:00-04:00
3   2022-11-06 01:45:00-04:00
4   2022-11-06 01:00:00-05:00
5   2022-11-06 01:15:00-05:00
6   2022-11-06 01:30:00-05:00
7   2022-11-06 01:45:00-05:00
8   2022-11-06 02:00:00-05:00
9   2022-11-06 02:15:00-05:00
dtype: datetime64[ns, US/Eastern]

In [61]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')[6]
Out[61]: Timestamp('2022-11-06 01:30:00-0500', tz='US/Eastern')

In [62]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern'))
Out[62]:
0   2022-11-06 01:00:00-04:00
1   2022-11-06 01:15:00-04:00
2   2022-11-06 01:30:00-04:00
3   2022-11-06 01:45:00-04:00
4   2022-11-06 01:00:00-04:00
5   2022-11-06 01:15:00-04:00
6   2022-11-06 01:30:00-04:00
7   2022-11-06 01:45:00-04:00
8   2022-11-06 02:00:00-05:00
9   2022-11-06 02:15:00-05:00
dtype: datetime64[ns, US/Eastern]

In [63]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern'))[6]
Out[63]: Timestamp('2022-11-06 01:30:00-0500', tz='US/Eastern')

In [64]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern')).diff()
Out[64]:
0               NaT
1   0 days 00:15:00
2   0 days 00:15:00
3   0 days 00:15:00
4   0 days 00:15:00
5   0 days 00:15:00
6   0 days 00:15:00
7   0 days 00:15:00
8   0 days 00:15:00
9   0 days 00:15:00
dtype: timedelta64[ns]

Localizing directly with zoneinfo, however, produces the ambiguous-value problem: the diff shows a -45 minute step at row 4 and a 1 hour 15 minute step at row 8.

In [69]: s.dt.tz_localize(ZoneInfo('US/Eastern')).diff()
Out[69]:
0                 NaT
1     0 days 00:15:00
2     0 days 00:15:00
3     0 days 00:15:00
4   -1 days +23:15:00
5     0 days 00:15:00
6     0 days 00:15:00
7     0 days 00:15:00
8     0 days 01:15:00
9     0 days 00:15:00
dtype: timedelta64[ns]
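The odd steps in that diff are exactly what you get if every ambiguous wall time is read as its first (EDT) occurrence. A stdlib-only sketch (plain datetime, not pandas internals) that reproduces both diffs, assuming the local tz database provides America/New_York, the canonical zone that the legacy name US/Eastern links to; utc_steps is a hypothetical helper written for this illustration:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# Same wall times as the report: 01:00-01:45 twice, then 02:00 and 02:15.
first_pass = [datetime(2022, 11, 6, 1, 0) + timedelta(minutes=15 * i) for i in range(4)]
tail = [datetime(2022, 11, 6, 2, 0) + timedelta(minutes=15 * i) for i in range(2)]
wall = first_pass * 2 + tail


def utc_steps(folds):
    """Consecutive differences of the UTC instants for the given fold values."""
    # Convert to UTC before subtracting: aware datetimes sharing one tzinfo
    # subtract as naive wall times, which would hide the offset change.
    utc = [
        t.replace(tzinfo=tz, fold=f).astimezone(timezone.utc)
        for t, f in zip(wall, folds)
    ]
    return [b - a for a, b in zip(utc, utc[1:])]


ignored = utc_steps([0] * 10)  # every ambiguous time read as EDT (fold=0)
correct = utc_steps([0] * 4 + [1] * 4 + [0] * 2)  # repeated hour read as EST

print([s.total_seconds() / 60 for s in ignored])
# [15.0, 15.0, 15.0, -45.0, 15.0, 15.0, 15.0, 75.0, 15.0]
print([s.total_seconds() / 60 for s in correct])
# [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
```

The "ignored" steps match the zoneinfo diff above (-45 minutes, then 1 hour 15 minutes), while the "correct" folds give the uniform 15-minute steps seen on the pytz path.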

mroeschke added the Timezones label and removed the Needs Triage label on Sep 7, 2022
mroeschke (Member):

Thanks for the report and diagnosis! Pull requests to correctly set fold in the code path you highlighted would be welcome.

cc @jbrockmendel
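Here "fold" is the PEP 495 attribute on datetime that zoneinfo uses to choose between the two occurrences of a repeated wall time. A minimal stdlib-only illustration of what setting it changes (plain datetime, not the pandas code path), assuming the local tz database provides America/New_York, the canonical zone behind the legacy US/Eastern link:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# 01:30 on 2022-11-06 happens twice in New York: once in EDT, once in EST.
first = datetime(2022, 11, 6, 1, 30, tzinfo=tz)  # fold=0: earlier occurrence
second = first.replace(fold=1)                   # fold=1: later occurrence

print(first.utcoffset())   # -1 day, 20:00:00  (UTC-04:00, EDT)
print(second.utcoffset())  # -1 day, 19:00:00  (UTC-05:00, EST)

# Identical wall-clock readings, but UTC instants one hour apart.
print(first.astimezone(timezone.utc))   # 2022-11-06 05:30:00+00:00
print(second.astimezone(timezone.utc))  # 2022-11-06 06:30:00+00:00
```

If fold is never set, every ambiguous value behaves like `first`, which matches the all-UTC-04:00 output reported above.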

mroeschke added this to the 1.5 milestone on Sep 7, 2022
jamesdow21 (Author):

I hadn't worked with Cython before today, but I think I understand the logic of these sections.

I'll do my best to figure out how to get this working.

jbrockmendel (Member):

The relevant logic for ambiguous="infer" lives in pandas._libs.tslibs.tz_conversion._get_dst_hours. At the moment it is only implemented for dateutil and pytz timezones; zoneinfo timezones go through the if info.use_tzlocal: branch of tz_localize_to_utc instead. _get_dst_hours relies on result_a and result_b, which in turn rely on info.deltas, and zoneinfo does not expose that. Fixing this will therefore require reproducing the same logic with the APIs that zoneinfo does provide.
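The public zoneinfo surface is essentially datetime.fold plus utcoffset(), and those are enough both to detect an ambiguous wall time and to pick an occurrence. Below is a minimal sketch of an "infer"-like pass over naive wall times, assuming chronologically sorted input and a tz database that provides America/New_York. is_ambiguous and infer_folds are hypothetical helpers written for illustration; they are not pandas internals and not necessarily the approach taken in the eventual fix (#49700).

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def is_ambiguous(naive: datetime, tz: ZoneInfo) -> bool:
    """Return True if the wall time `naive` occurs twice in `tz`.

    fold=0 and fold=1 give different offsets both in a fall-back overlap and
    in a spring-forward gap; only in an overlap is the fold=0 offset larger.
    """
    early = naive.replace(tzinfo=tz, fold=0).utcoffset()
    late = naive.replace(tzinfo=tz, fold=1).utcoffset()
    return early > late


def infer_folds(naive_times: list[datetime], tz: ZoneInfo) -> list[int]:
    """Hypothetical 'infer' pass: pick a fold for each wall time.

    Assumes chronological input: a backward step of the wall clock onto an
    ambiguous time marks the start of the repeated hour (fold=1).
    """
    folds = []
    second_pass = False
    prev = None
    for t in naive_times:
        if is_ambiguous(t, tz):
            if prev is not None and t < prev:
                second_pass = True  # clock fell back: now in the repeat
            folds.append(1 if second_pass else 0)
        else:
            second_pass = False  # outside an overlap: default fold=0
            folds.append(0)
        prev = t
    return folds


tz = ZoneInfo("America/New_York")
first_pass = [datetime(2022, 11, 6, 1, 0) + timedelta(minutes=15 * i) for i in range(4)]
tail = [datetime(2022, 11, 6, 2, 0) + timedelta(minutes=15 * i) for i in range(2)]
wall = first_pass * 2 + tail

folds = infer_folds(wall, tz)
print(folds)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```

This sketch does no validation: a series that starts inside the repeated hour, for example, silently gets fold=0 for those rows.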

mroeschke changed the title from "BUG: Series.dt.tz_localize not recognizing DST transition with zoneinfo timezone" to "BUG: Series.dt.tz_localize not recognizing DST transition with zoneinfo timezone when ambiguous="infer"" on Sep 7, 2022
lithomas1 changed the milestone from 1.5 to 1.5.1 on Sep 19, 2022
datapythonista changed the milestone from 1.5.1 to 1.5.2 on Oct 20, 2022
datapythonista changed the milestone from 1.5.2 to 1.5.3 on Nov 15, 2022

5 participants