-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG/ENH: cleanup for Timestamp arithmetic #8916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes GH8865 (Timestamp - Timestamp -> Timedelta) This PR cleans up and extends `Timestamp` arithmetic similarly to the treatment for `Timedelta` in GH8884. It includes a new `to_datetime64()` method, and arithmetic now works between Timestamp and ndarrays. I also ensured comparison operations work properly between all of (Timestamp, Timedelta, NaT) and ndarrays. Implementation notes: wide use of the `NotImplemented` singleton let me cleanup many of these complex cases. I also strove to reduce the tight- coupling of `Timestamp`/`Timedelta` to pandas itself by removing use of the `_typ` property in tslib (I honestly don't quite understand why it needs to exist) and by not treating series/index any differently from any other ndarray-like object.
So However you eliminated the need for much of that, so +1 (though I think you went too far and now accept clear wrong types, e.g. dti + dti which must raise a TypeError) Need to check return types on a matrix of operations, they should all be pandas types (Timestamp/Timedelta/TimedeltaIndex/DateIndex), when operated with pandas types (e.g. a np.datetime64 + np.timedelta64) obviously will yield a np.datetime64 So need to wrap all return ops (which is a pass thru if its the right type already). The only wrinkle is that you need to determine array/scalars (I think simple enough to detect if one of the operands is an array-like (e.g. has a dtype), then you return DatetimeIndex/TimedeltaIndex). I think this needs a systematic test (their are some in tseries/test_base, so can prob just add on
This must raise a TypeError. It is a clear operation that should simply not be allowed. This is the point of a TypeError.
Furthermore Series / Timedelta / Timestamp ops need to be consistent with these. So need to matrix tests these (these raise the appropriate TypeErrors). I am still big -1 on ALWAYS raising NotImplementedError. This is just not very useful. So i'd like you to enumerate all the cases so they can be discussed (e.g. td + tdi, td - tdi, ts + tdi)....etc. Indicate where you would put NotImplementedError and TypeError, etc. We still have the side issue of: dti+dti, dti-dti 'work' but are set operations. need to think about this again. |
numpy does this and its completely useless. It should at the very least raise an error. I think this is just broken.
|
Travis was being very slow last night so I posted this PR anyways, but it looks like I still have a few fix-ups to do. My intention was not to break any existing behavior with this change -- may need some more tests to ensure that. I don't think I changed dti + dti or changed the code pathways for any (1d, 1d) operations. I was intentionally sticking to only stuff involving a scalar. I actually don't think we should ever raise Agreed RE the numpy bug. You'll notice I added explicit TypeErrors in |
af55baf
to
29ebfa8
Compare
when you have a chance can you rebase |
This change turns out to be much trickier to get right than I initially expected, due to a combination of factors:
So I'm not sure I'll be able to get this ready in time for 0.15.2 this weekend. All this makes me really look forward to switching to dynd for handling datetime arrays, instead of np.datetime64. datetime64 can't be relied on as much more than a boxed array of integers (actually, sometime I wonder if that would have been an easier choice...). |
@shoyer I don't think we'll be switching for datetime to dynd anytime soon. These these are not implemented their either. Theoretically yes we will be able to as the broadcasting will be correct with tz's, but practically the implementer (pandas here needs to actually be aware) and potentially do things. So yes dynd could help with this. But this is very close to being correct now w/o using much numpy and avoiding the pitfalls. |
sounds good. |
Closing -- this needs a reboot/refactor |
Fixes #8865 (Timestamp - Timestamp -> Timedelta)
This PR cleans up and extends
Timestamp
arithmetic similarly to my treatment forTimedelta
in #8884.It includes a new
to_datetime64()
method, and arithmetic now works between Timestamp and ndarrays. I also ensure comparison operations work properly between all of (Timestamp, Timedelta, NaT) and ndarrays.Implementation notes: wide use of the
NotImplemented
singleton let me cleanup some complex logic. I also strove to reduce the tight-coupling ofTimestamp
/Timedelta
to pandas itself by removing use of the_typ
property in tslib (I honestly don't quite understand why it needs to exist) and by not treating series/index any differently from any other ndarray-like objects.CC @jreback @jorisvandenbossche @immerrr