BUG/ENH: cleanup for Timestamp arithmetic #8916

shoyer · 2014-11-28T05:37:15Z

Fixes #8865 (Timestamp - Timestamp -> Timedelta)

This PR cleans up and extends Timestamp arithmetic similarly to my treatment for Timedelta in #8884.

It includes a new to_datetime64() method, and arithmetic now works between Timestamp and ndarrays. I also ensure comparison operations work properly between all of (Timestamp, Timedelta, NaT) and ndarrays.

Implementation notes: wide use of the NotImplemented singleton let me cleanup some complex logic. I also strove to reduce the tight-coupling of Timestamp/Timedelta to pandas itself by removing use of the _typ property in tslib (I honestly don't quite understand why it needs to exist) and by not treating series/index any differently from any other ndarray-like objects.

CC @jreback @jorisvandenbossche @immerrr

Fixes GH8865 (Timestamp - Timestamp -> Timedelta) This PR cleans up and extends `Timestamp` arithmetic similarly to the treatment for `Timedelta` in GH8884. It includes a new `to_datetime64()` method, and arithmetic now works between Timestamp and ndarrays. I also ensured comparison operations work properly between all of (Timestamp, Timedelta, NaT) and ndarrays. Implementation notes: wide use of the `NotImplemented` singleton let me cleanup many of these complex cases. I also strove to reduce the tight- coupling of `Timestamp`/`Timedelta` to pandas itself by removing use of the `_typ` property in tslib (I honestly don't quite understand why it needs to exist) and by not treating series/index any differently from any other ndarray-like object.

jreback · 2014-11-28T13:51:55Z

So _typ is needed if you need type comparisons (isinstance) in cython when you can't import things (well you can, but you don' generally want to), e.g. TimedeltaIndex and such.

However you eliminated the need for much of that, so +1 (though I think you went too far and now accept clear wrong types, e.g. dti + dti which must raise a TypeError)

Need to check return types on a matrix of operations, they should all be pandas types (Timestamp/Timedelta/TimedeltaIndex/DateIndex), when operated with pandas types (e.g. a np.datetime64 + np.timedelta64) obviously will yield a np.datetime64

So need to wrap all return ops (which is a pass thru if its the right type already). The only wrinkle is that you need to determine array/scalars (I think simple enough to detect if one of the operands is an array-like (e.g. has a dtype), then you return DatetimeIndex/TimedeltaIndex).

I think this needs a systematic test (their are some in tseries/test_base, so can prob just add on
And fully-compare e.g. don't just compare the values, but test the types as well

ts = Timestamp('20130101')
In [5]: ts-pd.to_datetime(['1999-12-31']).values
Out[5]: array([86400000000000], dtype='timedelta64[ns]')

This must raise a TypeError. It is a clear operation that should simply not be allowed. This is the point of a TypeError.

In [6]: pd.date_range('20130101',periods=3)+ts                                 
NotImplementedError:

Furthermore Series / Timedelta / Timestamp ops need to be consistent with these. So need to matrix tests these (these raise the appropriate TypeErrors). I am still big -1 on ALWAYS raising NotImplementedError. This is just not very useful. So i'd like you to enumerate all the cases so they can be discussed (e.g. td + tdi, td - tdi, ts + tdi)....etc. Indicate where you would put NotImplementedError and TypeError, etc.

We still have the side issue of:

dti+dti, dti-dti 'work' but are set operations. need to think about this again.

jreback · 2014-11-28T14:02:53Z

numpy does this and its completely useless. It should at the very least raise an error. I think this is just broken.

In [12]: pd.date_range('20130101',periods=3).values+ts.value
Out[12]: 
array(['2056-01-01T19:00:00.000000000-0500',
       '2056-01-02T19:00:00.000000000-0500',
       '2056-01-03T19:00:00.000000000-0500'], dtype='datetime64[ns]')

shoyer · 2014-11-28T17:11:41Z

Travis was being very slow last night so I posted this PR anyways, but it looks like I still have a few fix-ups to do.

My intention was not to break any existing behavior with this change -- may need some more tests to ensure that. I don't think I changed dti + dti or changed the code pathways for any (1d, 1d) operations. I was intentionally sticking to only stuff involving a scalar.

I actually don't think we should ever raise NotImplementedError in arithmetic or comparison operations (that is old code, maybe the path was not exposed before). NotImplemented is totally distinct though, and manifests itself to users via a helpful TypeError.

Agreed RE the numpy bug. You'll notice I added explicit TypeErrors in tslib.pyx to avoid those.

jreback · 2014-12-03T02:55:11Z

when you have a chance can you rebase

shoyer · 2014-12-03T08:36:49Z

This change turns out to be much trickier to get right than I initially expected, due to a combination of factors:

Timestamp optionally supports timezones and offests, which cannot be represented in datetime64 objects.
Thus, we need to support (Timestamp, array of Timestamp) operations (e.g., for tz aware Timestamps or for comparisons to NaT). The easiest way to do this is to let numpy ndararys do their automatic broadcasting thing.
OTOH, to do vectorized operations, setting a higher array priority is necessary (this conflicts with automatic broadcasting).
np.datetime64[ns] will silently convert to int64 under some conditions due to the numpy bugs (sigh), breaking some of the obvious solutions.
DataFrame operations with dates are already sort of broken, but I would like to avoid making them worse

So I'm not sure I'll be able to get this ready in time for 0.15.2 this weekend.

All this makes me really look forward to switching to dynd for handling datetime arrays, instead of np.datetime64. datetime64 can't be relied on as much more than a boxed array of integers (actually, sometime I wonder if that would have been an easier choice...).

jreback · 2014-12-03T11:05:50Z

@shoyer I don't think we'll be switching for datetime to dynd anytime soon. These these are not implemented their either. Theoretically yes we will be able to as the broadcasting will be correct with tz's, but practically the implementer (pandas here needs to actually be aware) and potentially do things. So yes dynd could help with this. But this is very close to being correct now w/o using much numpy and avoiding the pitfalls.

shoyer · 2014-12-04T00:12:28Z

@jreback I am going to make a separate PR to get some of the easy changes from here into 0.15.2 (e.g., adding the to_datetime64 method). You already fixed Timestamp - Timestamp in #8981.

jreback · 2014-12-04T00:21:28Z

sounds good.

shoyer · 2014-12-09T23:26:55Z

Closing -- this needs a reboot/refactor

jreback added Bug Timedelta Timedelta data type Datetime Datetime data dtype labels Nov 28, 2014

jreback added this to the 0.15.2 milestone Nov 28, 2014

replace TBD with PR number (and retrigger Travis)

dc72e04

fixes for Timestamp/Timedelta comparison ops

29ebfa8

shoyer force-pushed the fix-timestamp-math branch from af55baf to 29ebfa8 Compare November 29, 2014 04:27

shoyer mentioned this pull request Nov 29, 2014

BUG: Timestamp comparisons to dataframe should raise if there are mismatched types #8932

Closed

jreback modified the milestones: 0.16.0, 0.15.2 Dec 3, 2014

WIP: terrible hack to attempt to fix dt comparisons

ef97b8d

shoyer closed this Dec 9, 2014

shoyer removed this from the 0.16.0 milestone Dec 9, 2014

shoyer mentioned this pull request Jan 15, 2015

ENH: add Timestamp.to_datetime64 #9255

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG/ENH: cleanup for Timestamp arithmetic #8916

BUG/ENH: cleanup for Timestamp arithmetic #8916

shoyer commented Nov 28, 2014

jreback commented Nov 28, 2014

jreback commented Nov 28, 2014

shoyer commented Nov 28, 2014

jreback commented Dec 3, 2014

shoyer commented Dec 3, 2014

jreback commented Dec 3, 2014

shoyer commented Dec 4, 2014

jreback commented Dec 4, 2014

shoyer commented Dec 9, 2014

BUG/ENH: cleanup for Timestamp arithmetic #8916

BUG/ENH: cleanup for Timestamp arithmetic #8916

Conversation

shoyer commented Nov 28, 2014

jreback commented Nov 28, 2014

jreback commented Nov 28, 2014

shoyer commented Nov 28, 2014

jreback commented Dec 3, 2014

shoyer commented Dec 3, 2014

jreback commented Dec 3, 2014

shoyer commented Dec 4, 2014

jreback commented Dec 4, 2014

shoyer commented Dec 9, 2014