Skip to content

BUG: Issues with DatetimeTZ values in where and combine_first (#21469 + #21546) #21660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 2, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -255,7 +255,6 @@ Timezones
- Bug in :class:`Series` constructor which would coerce tz-aware and tz-naive :class:`Timestamp`s to tz-aware (:issue:`13051`)
- Bug in :class:`Index` with ``datetime64[ns, tz]`` dtype that did not localize integer data correctly (:issue:`20964`)
- Bug in :class:`DatetimeIndex` where constructing with an integer and tz would not localize correctly (:issue:`12619`)
- Bug in :func:`DataFrame.fillna` where a ``ValueError`` would raise when one column contained a ``datetime64[ns, tz]`` dtype (:issue:`15522`)

Offsets
^^^^^^^
Expand Down Expand Up @@ -288,13 +287,18 @@ Indexing
^^^^^^^^

- The traceback from a ``KeyError`` when asking ``.loc`` for a single missing label is now shorter and more clear (:issue:`21557`)
- When ``.ix`` is asked for a missing integer label in a :class:`MultiIndex` with a first level of integer type, it now raises a ``KeyError`` - consistently with the case of a flat :class:`Int64Index` - rather than falling back to positional indexing (:issue:`21593`)
- When ``.ix`` is asked for a missing integer label in a :class:`MultiIndex` with a first level of integer type, it now raises a ``KeyError``, consistently with the case of a flat :class:`Int64Index, rather than falling back to positional indexing (:issue:`21593`)
- Bug in :meth:`DatetimeIndex.reindex` when reindexing a tz-naive and tz-aware :class:`DatetimeIndex` (:issue:`8306`)
- Bug in :class:`DataFrame` when setting values with ``.loc`` and a timezone aware :class:`DatetimeIndex` (:issue:`11365`)
- Bug when indexing :class:`DatetimeIndex` with nanosecond resolution dates and timezones (:issue:`11679`)

-

Missing
^^^^^^^

- Bug in :func:`DataFrame.fillna` where a ``ValueError`` would raise when one column contained a ``datetime64[ns, tz]`` dtype (:issue:`15522`)

MultiIndex
^^^^^^^^^^

Expand Down Expand Up @@ -335,6 +339,8 @@ Reshaping
^^^^^^^^^

- Bug in :func:`pandas.concat` when joining resampled DataFrames with timezone aware index (:issue:`13783`)
- Bug in :meth:`Series.combine_first` with ``datetime64[ns, tz]`` dtype which would return tz-naive result (:issue:`21469`)
- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``datetime64[ns, tz]`` dtype (:issue:`21546`)
-
-

Expand Down
15 changes: 1 addition & 14 deletions pandas/core/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
from pandas.compat import long, zip, iteritems, PY36, OrderedDict
from pandas.core.config import get_option
from pandas.core.dtypes.generic import ABCSeries, ABCIndex
from pandas.core.dtypes.common import _NS_DTYPE, is_integer
from pandas.core.dtypes.common import is_integer
from pandas.core.dtypes.inference import _iterable_not_string
from pandas.core.dtypes.missing import isna, isnull, notnull # noqa
from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike
Expand Down Expand Up @@ -410,19 +410,6 @@ def _apply_if_callable(maybe_callable, obj, **kwargs):
return maybe_callable


def _where_compat(mask, arr1, arr2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

if arr1.dtype == _NS_DTYPE and arr2.dtype == _NS_DTYPE:
new_vals = np.where(mask, arr1.view('i8'), arr2.view('i8'))
return new_vals.view(_NS_DTYPE)

if arr1.dtype == _NS_DTYPE:
arr1 = tslib.ints_to_pydatetime(arr1.view('i8'))
if arr2.dtype == _NS_DTYPE:
arr2 = tslib.ints_to_pydatetime(arr2.view('i8'))

return np.where(mask, arr1, arr2)


def _dict_compat(d):
"""
Helper function to convert datetimelike-keyed dicts to Timestamp-keyed dict
Expand Down
8 changes: 5 additions & 3 deletions pandas/core/internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -1476,14 +1476,16 @@ def where(self, other, cond, align=True, errors='raise',
if transpose:
values = values.T

other = getattr(other, 'values', other)
other = getattr(other, '_values', getattr(other, 'values', other))
cond = getattr(cond, 'values', cond)

# If the default broadcasting would go in the wrong direction, then
# explicitly reshape other instead
if getattr(other, 'ndim', 0) >= 1:
if values.ndim - 1 == other.ndim and axis == 1:
other = other.reshape(tuple(other.shape + (1, )))
elif transpose and values.ndim == self.ndim - 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we really need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without lines 1487-1488 cond and values have opposite dimension, which is the root cause of (the DataFrame part of) #21544.

The issue is DatetimeTZBlock._try_coerce_args returns values as a 2-D array. I've updated _try_coerce_args so that other also is a 2-D array for consistency, which fixes part of the issue.

However, we don't run _try_coerce_args until the interior of func at 1499. Thus lines 1476-1477 (values = values.T) is a no-op given a DatetimeTZBlock (since at that point values is 1-D). other._values is also a 1-D DatetimeIndex. So to make cond consistent with the values and other we get out of _try_coerce_args it needs to be transposed.

An alternative would be changing `DatetimeTZBlock._try_coerce_args to return 1-D arrays for values and values_mask (changing 2875-2877). But if I do that it breaks setitem.

cond = cond.T

if not hasattr(cond, 'shape'):
raise ValueError("where must have a condition that is ndarray "
Expand Down Expand Up @@ -2888,8 +2890,8 @@ def _try_coerce_args(self, values, other):
elif isinstance(other, self._holder):
if other.tz != self.values.tz:
raise ValueError("incompatible or non tz-aware value")
other = other.asi8
other_mask = isna(other)
other_mask = _block_shape(isna(other), ndim=self.ndim)
other = _block_shape(other.asi8, ndim=self.ndim)
elif isinstance(other, (np.datetime64, datetime, date)):
other = tslib.Timestamp(other)
tz = getattr(other, 'tz', None)
Expand Down
10 changes: 6 additions & 4 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
is_float_dtype,
is_extension_type,
is_extension_array_dtype,
is_datetimelike,
is_datetime64tz_dtype,
is_timedelta64_dtype,
is_object_dtype,
Expand Down Expand Up @@ -78,6 +79,7 @@
from pandas._libs import index as libindex, tslib as libts, lib, iNaT
from pandas.core.config import get_option
from pandas.core.strings import StringMethods
from pandas.core.tools.datetimes import to_datetime

import pandas.plotting._core as gfx

Expand Down Expand Up @@ -2303,10 +2305,10 @@ def combine_first(self, other):
new_index = self.index.union(other.index)
this = self.reindex(new_index, copy=False)
other = other.reindex(new_index, copy=False)
# TODO: do we need name?
name = ops.get_op_result_name(self, other) # noqa
rs_vals = com._where_compat(isna(this), other._values, this._values)
return self._constructor(rs_vals, index=new_index).__finalize__(self)
if is_datetimelike(this) and not is_datetimelike(other):
other = to_datetime(other)

return this.where(notna(this), other)

def update(self, other):
"""
Expand Down
14 changes: 14 additions & 0 deletions pandas/tests/frame/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -2936,6 +2936,20 @@ def test_where_callable(self):
tm.assert_frame_equal(result,
(df + 2).where((df + 2) > 8, (df + 2) + 10))

def test_where_tz_values(self, tz_naive_fixture):
df1 = DataFrame(DatetimeIndex(['20150101', '20150102', '20150103'],
tz=tz_naive_fixture),
columns=['date'])
df2 = DataFrame(DatetimeIndex(['20150103', '20150104', '20150105'],
tz=tz_naive_fixture),
columns=['date'])
mask = DataFrame([True, True, False], columns=['date'])
exp = DataFrame(DatetimeIndex(['20150101', '20150102', '20150105'],
tz=tz_naive_fixture),
columns=['date'])
result = df1.where(mask, df2)
assert_frame_equal(exp, result)

def test_mask(self):
df = DataFrame(np.random.randn(5, 3))
cond = df > 0
Expand Down
9 changes: 4 additions & 5 deletions pandas/tests/indexing/test_coercion.py
Original file line number Diff line number Diff line change
Expand Up @@ -580,12 +580,11 @@ def test_where_series_datetime64(self, fill_val, exp_dtype):
values = pd.Series(pd.date_range(fill_val, periods=4))
if fill_val.tz:
exp = pd.Series([pd.Timestamp('2011-01-01'),
pd.Timestamp('2012-01-02 05:00'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what changed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous behavior was was treating other for DatetimeTZ as ints (see the writeup for #21546). As a result was getting incorrectly merged with naive datetime Series as naive UTC timestamps. Fixing the behavior of tz-aware other also fixed this and so now the result of combine_first is a promotion of the whole Series to object (as was desired per the deleted xfail/TODO).

pd.Timestamp('2012-01-02 00:00', tz='US/Eastern'),
pd.Timestamp('2011-01-03'),
pd.Timestamp('2012-01-04 05:00')])
self._assert_where_conversion(obj, cond, values, exp,
'datetime64[ns]')
pytest.xfail("ToDo: do not coerce to UTC, must be object")
pd.Timestamp('2012-01-04 00:00',
tz='US/Eastern')])
self._assert_where_conversion(obj, cond, values, exp, exp_dtype)

exp = pd.Series([pd.Timestamp('2011-01-01'), values[1],
pd.Timestamp('2011-01-03'), values[3]])
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/series/indexing/test_boolean.py
Original file line number Diff line number Diff line change
Expand Up @@ -551,6 +551,18 @@ def test_where_datetime_conversion():
assert_series_equal(rs, expected)


def test_where_dt_tz_values(tz_naive_fixture):
ser1 = pd.Series(pd.DatetimeIndex(['20150101', '20150102', '20150103'],
tz=tz_naive_fixture))
ser2 = pd.Series(pd.DatetimeIndex(['20160514', '20160515', '20160516'],
tz=tz_naive_fixture))
mask = pd.Series([True, True, False])
result = ser1.where(mask, ser2)
exp = pd.Series(pd.DatetimeIndex(['20150101', '20150102', '20160516'],
tz=tz_naive_fixture))
assert_series_equal(exp, result)


def test_mask():
# compare with tested results in test_where
s = Series(np.random.randn(5))
Expand Down
14 changes: 14 additions & 0 deletions pandas/tests/series/test_combine_concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,20 @@ def get_result_type(dtype, dtype2):
]).dtype
assert result.kind == expected

def test_combine_first_dt_tz_values(self, tz_naive_fixture):
ser1 = pd.Series(pd.DatetimeIndex(['20150101', '20150102', '20150103'],
tz=tz_naive_fixture),
name='ser1')
ser2 = pd.Series(pd.DatetimeIndex(['20160514', '20160515', '20160516'],
tz=tz_naive_fixture),
index=[2, 3, 4], name='ser2')
result = ser1.combine_first(ser2)
exp_vals = pd.DatetimeIndex(['20150101', '20150102', '20150103',
'20160515', '20160516'],
tz=tz_naive_fixture)
exp = pd.Series(exp_vals, name='ser1')
assert_series_equal(exp, result)

def test_concat_empty_series_dtypes(self):

# booleans
Expand Down