-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
POC/REF: de-duplicate utc->local code #46246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @WillAyd suggestions on the build failures? |
pandas/_libs/tslibs/tzconversion.pyx
Outdated
|
||
if self.use_dst: | ||
loc = self.trans.searchsorted(utc_val, side="right") - 1 | ||
return &loc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the problematic line. In a traditional C function, loc would go out of scope when the function ends. So returning the address of loc gives a pointer to a location that may very well be freed when the function returns.
can you highlite the non-microbenchmarks that matter here (e.g. a small microsecond decline doesn't matter if its just subsumed by other things) |
local_val = stamps[i] + delta | ||
else: | ||
local_val = stamps[i] + deltas[pos[i]] | ||
local_val = info.utc_val_to_local_val(stamps[i], pos, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scoder any ideas on how to track down the 2x performance hit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline
in foreign .pyx
files does not help across modules. Not sure if @cython.final
would, but probably not either. Calling a C function instead of a method may make a difference, though. That could also live in a .pxd
file – although I'm not sure if I ever tried to call a cimported function ( tz_convert_utc_to_tzlocal
above) from an inline
function in a .pxd
file. Might work.
Definitely try changing the method to a function here, even if you then just pass the Localizer
object into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making tz_convert_utc_to_tzlocal into a function (for experimenting, put everything in vectorized.pyx rule out pyx/pxd effects) doesn't seem to have helped.
Some things i've poked at:
adding @cython.initializedcheck(False)
in different places seems to cost a few microseconds, which i expected to go the other direction.
Keeping the L315-L322 inline but replacing each of the if use_foo:
checks with if info.use_foo:
appears perf-neutral. replacing deltas[pos[i]]
with info.deltas[pos[i]]
caused a minor slowdown.
writing tz_convert_utc_to_tzlocal as a module-level function but pasing each of the Localizer attributes directly instead of the instance (so the body of tz_convert_utc_to_tzlocal would be identical to the current in-loop code) did not help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what kind of code Cython is generating with the pointer indexing but have you tried declaring a contiguous memory view for pos instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for pos
? yes.
Closing in favor of #46259 and some other forthcoming refactoring. Looks like we're going to have to live with the checks being repeated for the forseeable future. |
For supporting non-nanosecond resolutions, it is helpful to have all the relevant logic in one place instead of repeated in a bunch of places. This introduces a Localizer class to do that.
The problem is that this causes a significant performance hit (asv results posted below).
The only thing I can think of that would account for this performance hit is that maybe in main there is some implicit inlining being done? Or the tzinfo-type checks are somehow optimized outside of the for loops?