Skip to content

POC/REF: de-duplicate utc->local code #46246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from

Conversation

jbrockmendel
Copy link
Member

For supporting non-nanosecond resolutions, it is helpful to have all the relevant logic in one place instead of repeated in a bunch of places. This introduces a Localizer class to do that.

The problem is that this causes a significant performance hit (asv results posted below).

The only thing I can think of that would account for this performance hit is that maybe in main there is some implicit inlining being done? Or the tzinfo-type checks are somehow optimized outside of the for loops?

time asv continuous -f 1.01 -E virtualenv main HEAD -b tslibs.normalize -b tslibs.tz_convert -b tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr
[...]
       before           after         ratio
     [221c3aaa]       [25e3632e]
     <main>           <ref-localizer>
+      16.2±0.5μs       37.3±0.5μs     2.31  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(10000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      14.8±0.7μs       29.3±0.4μs     1.97  tslibs.normalize.Normalize.time_is_date_array_normalized(10000, datetime.timezone.utc)
+      15.3±0.6μs       29.2±0.1μs     1.91  tslibs.normalize.Normalize.time_is_date_array_normalized(10000, None)
+      1.85±0.1ms      3.47±0.05ms     1.87  tslibs.normalize.Normalize.time_is_date_array_normalized(1000000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      1.55±0.1ms       2.87±0.1ms     1.85  tslibs.normalize.Normalize.time_is_date_array_normalized(1000000, None)
+      1.60±0.1ms      2.92±0.05ms     1.82  tslibs.normalize.Normalize.time_is_date_array_normalized(1000000, datetime.timezone.utc)
+     2.21±0.05ms      4.00±0.07ms     1.81  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1000000, datetime.timezone.utc)
+     2.22±0.07ms      3.99±0.09ms     1.80  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1000000, None)
+      24.2±0.3μs       43.6±0.5μs     1.80  tslibs.normalize.Normalize.time_normalize_i8_timestamps(10000, None)
+        21.2±2μs       38.0±0.9μs     1.80  tslibs.normalize.Normalize.time_is_date_array_normalized(10000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      24.9±0.3μs       42.0±0.9μs     1.69  tslibs.normalize.Normalize.time_normalize_i8_timestamps(10000, datetime.timezone.utc)
+     2.96±0.08ms      4.62±0.03ms     1.56  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1000000, datetime.timezone(datetime.timedelta(seconds=3600)))
+        579±10ns         891±40ns     1.54  tslibs.normalize.Normalize.time_is_date_array_normalized(100, datetime.timezone.utc)
+      35.1±0.6μs       53.6±0.8μs     1.53  tslibs.normalize.Normalize.time_normalize_i8_timestamps(10000, datetime.timezone(datetime.timedelta(seconds=3600)))
+        679±10ns         914±10ns     1.35  tslibs.normalize.Normalize.time_is_date_array_normalized(100, None)
+     2.25±0.02ms      3.01±0.01ms     1.34  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(1000000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      72.3±0.9μs         95.2±1μs     1.32  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     11.2±0.04ms      14.6±0.09ms     1.31  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1000000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     1.47±0.05μs       1.89±0.1μs     1.29  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(100, datetime.timezone.utc)
+     4.57±0.07μs       5.88±0.8μs     1.29  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 2011, datetime.timezone(datetime.timedelta(seconds=3600)))
+        83.6±4μs          107±2μs     1.28  tslibs.normalize.Normalize.time_normalize_i8_timestamps(10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     1.64±0.02μs       2.03±0.2μs     1.24  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 12000, None)
+     4.97±0.09μs       6.04±0.7μs     1.22  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 11000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         426±8ns         518±20ns     1.21  tslibs.normalize.Normalize.time_is_date_array_normalized(0, datetime.timezone.utc)
+     1.75±0.05μs      2.07±0.04μs     1.19  tslibs.normalize.Normalize.time_normalize_i8_timestamps(100, datetime.timezone.utc)
+        501±10ns         592±10ns     1.18  tslibs.normalize.Normalize.time_is_date_array_normalized(1, tzlocal())
+         125±2μs          147±1μs     1.17  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         128±4μs          150±5μs     1.17  tslibs.normalize.Normalize.time_normalize_i8_timestamps(10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+        511±20ns         593±20ns     1.16  tslibs.normalize.Normalize.time_is_date_array_normalized(1, None)
+     4.45±0.04μs       5.12±0.2μs     1.15  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 10000, datetime.timezone(datetime.timedelta(seconds=3600)))
+         216±1μs          246±1μs     1.14  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 3000, None)
+     4.49±0.07μs       5.10±0.1μs     1.14  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 4006, datetime.timezone(datetime.timedelta(seconds=3600)))
+         216±1μs        245±0.6μs     1.13  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 3000, datetime.timezone.utc)
+     4.53±0.02μs      5.08±0.04μs     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1011, datetime.timezone(datetime.timedelta(seconds=3600)))
+         218±4μs          244±2μs     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 1011, datetime.timezone.utc)
+     4.56±0.08μs       5.11±0.1μs     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 9000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      21.5±0.5ms       24.1±0.3ms     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 3000, datetime.timezone.utc)
+      21.3±0.2ms       23.8±0.4ms     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 2000, datetime.timezone.utc)
+         218±1μs          243±1μs     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 2000, datetime.timezone.utc)
+      4.57±0.1μs       5.10±0.1μs     1.12  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 9000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      21.6±0.3ms       24.1±0.3ms     1.11  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 3000, None)
+      4.54±0.1μs      5.05±0.06μs     1.11  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 12000, datetime.timezone(datetime.timedelta(seconds=3600)))
+     4.63±0.02μs      5.14±0.07μs     1.11  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 4000, datetime.timezone(datetime.timedelta(seconds=3600)))
+      6.95±0.2μs      7.72±0.07μs     1.11  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 8000, datetime.timezone(datetime.timedelta(seconds=3600)))
+         225±2μs          249±3μs     1.11  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 2011, datetime.timezone.utc)
+      25.6±0.5ms       28.4±0.3ms     1.11  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 4006, None)
+      21.6±0.3ms         23.8±1ms     1.10  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 2000, None)
+      4.73±0.1μs       5.20±0.1μs     1.10  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 9000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+     4.59±0.03μs       5.03±0.1μs     1.10  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 12000, datetime.timezone(datetime.timedelta(seconds=3600)))
+     1.69±0.01μs         1.85±0μs     1.10  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1000, tzlocal())
+      21.7±0.2ms       23.8±0.3ms     1.09  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 1000, datetime.timezone.utc)
+     1.67±0.02μs      1.83±0.06μs     1.09  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 9000, None)
+     3.99±0.05μs      4.34±0.01μs     1.09  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 10000, datetime.timezone.utc)
+     4.83±0.09μs      5.25±0.03μs     1.09  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 5000, None)
+     3.81±0.04μs      4.14±0.04μs     1.09  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 2000, None)
+     4.21±0.04μs      4.56±0.03μs     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 12000, None)
+         221±3μs          240±3μs     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 2000, None)
+      22.4±0.4ms       24.2±0.2ms     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 2011, None)
+         226±3μs        245±0.8μs     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 1011, None)
+      25.8±0.3ms       27.9±0.3ms     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 4000, None)
+     3.85±0.02μs      4.17±0.07μs     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 1000, None)
+         262±2μs          283±2μs     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 4000, None)
+     1.67±0.02μs      1.80±0.01μs     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 9000, tzlocal())
+      21.9±0.2ms       23.5±0.3ms     1.08  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 1000, None)
+      23.9±0.4ms       25.7±0.3ms     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 7000, None)
+     1.57±0.02μs      1.69±0.02μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 5000, datetime.timezone.utc)
+     4.69±0.03μs      5.03±0.09μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 6000, datetime.timezone(datetime.timedelta(seconds=3600)))
+     1.59±0.03μs      1.70±0.04μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 6000, datetime.timezone.utc)
+         258±6μs          276±3μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 12000, datetime.timezone.utc)
+         263±2μs          282±2μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 4006, datetime.timezone.utc)
+      24.1±0.3ms       25.7±0.3ms     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 6000, None)
+      24.5±0.3ms       26.1±0.3ms     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 11000, None)
+     3.91±0.08μs      4.17±0.05μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 1011, None)
+     1.68±0.05μs      1.79±0.03μs     1.07  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 10000, None)
+     7.24±0.08μs      7.71±0.04μs     1.06  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 12000, datetime.timezone(datetime.timedelta(seconds=3600)))
+         265±2μs          281±3μs     1.06  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 12000, datetime.timezone(datetime.timedelta(seconds=3600)))
+     4.20±0.06μs      4.45±0.06μs     1.06  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 4000, datetime.timezone.utc)
+      25.8±0.4ms       27.3±0.2ms     1.06  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 12000, datetime.timezone(datetime.timedelta(seconds=3600)))
+     4.11±0.03μs      4.34±0.04μs     1.06  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 11000, datetime.timezone.utc)
+      22.5±0.6ms       23.8±0.2ms     1.06  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 1011, None)
+      31.7±0.3ms       33.4±0.3ms     1.05  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1000000, 5000, None)
+     7.28±0.08μs      7.63±0.06μs     1.05  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 9000, datetime.timezone(datetime.timedelta(seconds=3600)))
+     1.62±0.03μs      1.70±0.02μs     1.05  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 10000, datetime.timezone.utc)
+     3.85±0.08μs      4.01±0.01μs     1.04  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(100, 1000, datetime.timezone.utc)
+     1.63±0.01μs      1.69±0.02μs     1.04  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 10000, datetime.timezone.utc)
+         353±5ms          360±1ms     1.02  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 4000, tzlocal())
-         363±4ms        350±0.9ms     0.96  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(10000, 8000, tzlocal())
-         204±6μs          190±2μs     0.93  tslibs.tz_convert.TimeTZConvert.time_tz_localize_to_utc(10000, datetime.timezone(datetime.timedelta(seconds=3600)))
-     11.6±0.05ms      10.8±0.07ms     0.93  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(1000000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
-     5.57±0.06μs      4.89±0.03μs     0.88  tslibs.tz_convert.TimeTZConvert.time_tz_convert_from_utc(1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)

@jbrockmendel
Copy link
Member Author

cc @WillAyd suggestions on the build failures?


if self.use_dst:
loc = self.trans.searchsorted(utc_val, side="right") - 1
return &loc
Copy link
Member

@WillAyd WillAyd Mar 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the problematic line. In a traditional C function, loc would go out of scope when the function ends. So returning the address of loc gives a pointer to a location that may very well be freed when the function returns.

@jreback jreback added the Timezones Timezone data dtype label Mar 6, 2022
@jreback
Copy link
Contributor

jreback commented Mar 6, 2022

can you highlite the non-microbenchmarks that matter here (e.g. a small microsecond decline doesn't matter if its just subsumed by other things)

local_val = stamps[i] + delta
else:
local_val = stamps[i] + deltas[pos[i]]
local_val = info.utc_val_to_local_val(stamps[i], pos, i)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scoder any ideas on how to track down the 2x performance hit here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline in foreign .pyx files does not help across modules. Not sure if @cython.final would, but probably not either. Calling a C function instead of a method may make a difference, though. That could also live in a .pxd file – although I'm not sure if I ever tried to call a cimported function ( tz_convert_utc_to_tzlocal above) from an inline function in a .pxd file. Might work.

Definitely try changing the method to a function here, even if you then just pass the Localizer object into it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making tz_convert_utc_to_tzlocal into a function (for experimenting, put everything in vectorized.pyx rule out pyx/pxd effects) doesn't seem to have helped.

Some things i've poked at:

adding @cython.initializedcheck(False) in different places seems to cost a few microseconds, which i expected to go the other direction.

Keeping the L315-L322 inline but replacing each of the if use_foo: checks with if info.use_foo: appears perf-neutral. replacing deltas[pos[i]] with info.deltas[pos[i]] caused a minor slowdown.

writing tz_convert_utc_to_tzlocal as a module-level function but pasing each of the Localizer attributes directly instead of the instance (so the body of tz_convert_utc_to_tzlocal would be identical to the current in-loop code) did not help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what kind of code Cython is generating with the pointer indexing but have you tried declaring a contiguous memory view for pos instead?

https://docs.cython.org/en/stable/src/userguide/numpy_tutorial.html#declaring-the-numpy-arrays-as-contiguous

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for pos? yes.

@jbrockmendel
Copy link
Member Author

Closing in favor of #46259 and some other forthcoming refactoring. Looks like we're going to have to live with the checks being repeated for the forseeable future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants