
Commit b83228b

Merge remote-tracking branch 'upstream/master' into GH36666
2 parents 8701f26 + d850140

33 files changed (+1006, −755 lines)

doc/source/development/contributing.rst

Lines changed: 1 addition & 1 deletion
@@ -598,7 +598,7 @@ Building master branch documentation
 
 When pull requests are merged into the pandas ``master`` branch, the main parts of
 the documentation are also built by Travis-CI. These docs are then hosted `here
-<https://dev.pandas.io>`__, see also
+<https://pandas.pydata.org/docs/dev/>`__, see also
 the :ref:`Continuous Integration <contributing.ci>` section.
 
 .. _contributing.code:

doc/source/user_guide/io.rst

Lines changed: 4 additions & 5 deletions
@@ -5686,7 +5686,7 @@ ignored.
 dtypes: float64(1), int64(1)
 memory usage: 15.3 MB
 
-Given the next test set:
+The following test functions will be used below to compare the performance of several IO methods:
 
 .. code-block:: python
 
@@ -5791,7 +5791,7 @@ Given the next test set:
     def test_parquet_read():
         pd.read_parquet("test.parquet")
 
-When writing, the top-three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``.
+When writing, the top three functions in terms of speed are ``test_feather_write``, ``test_hdf_fixed_write`` and ``test_hdf_fixed_write_compress``.
 
 .. code-block:: ipython
 
@@ -5825,7 +5825,7 @@ When writing, the top-three functions in terms of speed are ``test_feather_write
 In [13]: %timeit test_parquet_write(df)
 67.6 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
 
-When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and
+When reading, the top three functions in terms of speed are ``test_feather_read``, ``test_pickle_read`` and
 ``test_hdf_fixed_read``.
 
 
@@ -5862,8 +5862,7 @@ When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and
 24.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
 
 
-For this test case ``test.pkl.compress``, ``test.parquet`` and ``test.feather`` took the least space on disk.
-Space on disk (in bytes)
+The files ``test.pkl.compress``, ``test.parquet`` and ``test.feather`` took the least space on disk (in bytes).
 
 .. code-block:: none
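For readers skimming the diff: the benchmark functions quoted above are thin wrappers around the pandas IO API. A minimal, self-contained sketch in the same spirit (the ``df`` built here is illustrative, not the exact frame from io.rst; the feather/parquet calls assume ``pyarrow`` is installed):

    import numpy as np
    import pandas as pd

    # Illustrative input frame; io.rst builds a similar one with float and int columns.
    sz = 1_000_000
    df = pd.DataFrame({"A": np.random.randn(sz), "B": np.arange(sz)})

    def test_feather_write(df):
        df.to_feather("test.feather")    # requires pyarrow

    def test_parquet_write(df):
        df.to_parquet("test.parquet")    # requires pyarrow or fastparquet

    def test_feather_read():
        pd.read_feather("test.feather")

    def test_parquet_read():
        pd.read_parquet("test.parquet")

    # Timed in IPython with e.g.: %timeit test_parquet_write(df)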
doc/source/whatsnew/v1.2.0.rst

Lines changed: 2 additions & 2 deletions
@@ -416,7 +416,6 @@ Strings
 - Bug in :func:`to_numeric` raising a ``TypeError`` when attempting to convert a string dtype :class:`Series` containing only numeric strings and ``NA`` (:issue:`37262`)
 -
 
-
 Interval
 ^^^^^^^^
 
@@ -467,6 +466,7 @@ I/O
 - Bug in :func:`read_table` and :func:`read_csv` when ``delim_whitespace=True`` and ``sep=default`` (:issue:`36583`)
 - Bug in :meth:`to_json` with ``lines=True`` and ``orient='records'`` the last line of the record is not appended with 'new line character' (:issue:`36888`)
 - Bug in :meth:`read_parquet` with fixed offset timezones. String representation of timezones was not recognized (:issue:`35997`, :issue:`36004`)
+- Bug in :meth:`DataFrame.to_html`, :meth:`DataFrame.to_string`, and :meth:`DataFrame.to_latex` ignoring the ``na_rep`` argument when ``float_format`` was also specified (:issue:`9046`, :issue:`13828`)
 - Bug in output rendering of complex numbers showing too many trailing zeros (:issue:`36799`)
 - Bug in :class:`HDFStore` threw a ``TypeError`` when exporting an empty :class:`DataFrame` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
 
@@ -530,7 +530,7 @@ Other
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
 - Fixed bug in metadata propagation incorrectly copying DataFrame columns as metadata when the column name overlaps with the metadata name (:issue:`37037`)
-- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors and :class:`DataFrame.duplicated` and ::class:`DataFrame.stack` methods (:issue:`28283`)
+- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors and :class:`DataFrame.duplicated` and :class:`DataFrame.stack` and :class:`DataFrame.unstack` and :class:`DataFrame.pivot` methods (:issue:`28283`)
 - Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
 - Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
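The ``na_rep``/``float_format`` entry above is easy to check interactively. A small hedged sketch of the behavior the note describes (output abbreviated; exact spacing depends on the pandas build):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": [1.23456, np.nan]})

    # With the fix, NaN renders via na_rep ("missing") instead of being
    # passed through float_format, which previously swallowed na_rep.
    print(df.to_string(na_rep="missing", float_format="{:.2f}".format))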

pandas/core/arrays/timedeltas.py

Lines changed: 4 additions & 4 deletions
@@ -381,15 +381,15 @@ def sum(
         nv.validate_sum(
             (), dict(dtype=dtype, out=out, keepdims=keepdims, initial=initial)
         )
-        if not len(self):
-            return NaT
-        if not skipna and self._hasnans:
+        if not self.size and (self.ndim == 1 or axis is None):
             return NaT
 
         result = nanops.nansum(
             self._data, axis=axis, skipna=skipna, min_count=min_count
         )
-        return Timedelta(result)
+        if is_scalar(result):
+            return Timedelta(result)
+        return self._from_backing_data(result)
 
     def std(
         self,
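The revised ``TimedeltaArray.sum`` keeps the scalar path (a ``Timedelta``) for 1-D reductions and returns a timedelta array for axis-wise 2-D reductions via ``_from_backing_data``. A quick sketch of the visible behavior (assuming a pandas build containing this commit):

    import pandas as pd

    tdi = pd.to_timedelta(["1 day", "2 days", pd.NaT])

    # 1-D reduction still returns a Timedelta scalar; NaT is skipped by default.
    print(tdi.sum())                         # Timedelta('3 days 00:00:00')

    # With skipna=False, NaT now propagates (see the nanops changes below).
    print(pd.Series(tdi).sum(skipna=False))  # NaT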

pandas/core/base.py

Lines changed: 10 additions & 0 deletions
@@ -1201,6 +1201,16 @@ def factorize(self, sort: bool = False, na_sentinel: Optional[int] = -1):
         >>> ser.searchsorted([1, 3], side='right')
         array([1, 3])
 
+        >>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
+        >>> ser
+        0   2000-03-11
+        1   2000-03-12
+        2   2000-03-13
+        dtype: datetime64[ns]
+
+        >>> ser.searchsorted('3/14/2000')
+        3
+
         >>> ser = pd.Categorical(
         ...     ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
         ... )

pandas/core/frame.py

Lines changed: 30 additions & 6 deletions
@@ -34,6 +34,7 @@
     Type,
     Union,
     cast,
+    overload,
 )
 import warnings
 
@@ -155,6 +156,8 @@
 import pandas.plotting
 
 if TYPE_CHECKING:
+    from typing import Literal
+
     from pandas.core.groupby.generic import DataFrameGroupBy
 
     from pandas.io.formats.style import Styler
@@ -971,9 +974,6 @@ def iterrows(self) -> Iterable[Tuple[Label, Series]]:
         data : Series
             The data of the row as a Series.
 
-        it : generator
-            A generator that iterates over the rows of the frame.
-
         See Also
         --------
         DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.
@@ -4706,6 +4706,30 @@ def set_index(
         if not inplace:
             return frame
 
+    @overload
+    # https://github.com/python/mypy/issues/6580
+    # Overloaded function signatures 1 and 2 overlap with incompatible return types
+    def reset_index(  # type: ignore[misc]
+        self,
+        level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
+        drop: bool = ...,
+        inplace: Literal[False] = ...,
+        col_level: Hashable = ...,
+        col_fill: Label = ...,
+    ) -> DataFrame:
+        ...
+
+    @overload
+    def reset_index(
+        self,
+        level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
+        drop: bool = ...,
+        inplace: Literal[True] = ...,
+        col_level: Hashable = ...,
+        col_fill: Label = ...,
+    ) -> None:
+        ...
+
     def reset_index(
         self,
         level: Optional[Union[Hashable, Sequence[Hashable]]] = None,
@@ -7185,8 +7209,6 @@ def explode(
             raise ValueError("columns must be unique")
 
         df = self.reset_index(drop=True)
-        # TODO: use overload to refine return type of reset_index
-        assert df is not None  # needed for mypy
        result = df[column].explode()
         result = df.drop([column], axis=1).join(result)
         if ignore_index:
@@ -7256,7 +7278,9 @@ def unstack(self, level=-1, fill_value=None):
         """
         from pandas.core.reshape.reshape import unstack
 
-        return unstack(self, level, fill_value)
+        result = unstack(self, level, fill_value)
+
+        return result.__finalize__(self, method="unstack")
 
     @Appender(_shared_docs["melt"] % dict(caller="df.melt(", other="melt"))
     def melt(
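The ``reset_index`` overloads are what let ``explode`` drop its ``assert df is not None`` workaround: with ``inplace`` typed through ``Literal``, mypy infers ``DataFrame`` rather than ``Optional[DataFrame]`` for the default call. A self-contained sketch of the same pattern on a hypothetical ``Toy`` class (not pandas code):

    from typing import Literal, Optional, overload

    class Toy:
        @overload
        def reset(self, inplace: Literal[False] = ...) -> "Toy": ...
        @overload
        def reset(self, inplace: Literal[True]) -> None: ...

        def reset(self, inplace: bool = False) -> Optional["Toy"]:
            # Single runtime implementation; the overloads only guide type checkers.
            return None if inplace else Toy()

    t = Toy().reset()  # mypy infers Toy, not Optional[Toy] -- no assert needed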

pandas/core/indexes/category.py

Lines changed: 0 additions & 5 deletions
@@ -377,11 +377,6 @@ def astype(self, dtype, copy=True):
 
         return Index.astype(self, dtype=dtype, copy=copy)
 
-    @cache_readonly
-    def _isnan(self):
-        """ return if each value is nan"""
-        return self._data.codes == -1
-
     @doc(Index.fillna)
     def fillna(self, value, downcast=None):
         value = self._validate_scalar(value)

pandas/core/indexes/datetimelike.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ def wrapper(left, right):
 
 
 @inherit_names(
-    ["inferred_freq", "_isnan", "_resolution_obj", "resolution"],
+    ["inferred_freq", "_resolution_obj", "resolution"],
    DatetimeLikeArrayMixin,
     cache=True,
 )

pandas/core/indexes/extension.py

Lines changed: 4 additions & 0 deletions
@@ -277,3 +277,7 @@ def astype(self, dtype, copy=True):
         # pass copy=False because any copying will be done in the
         # _data.astype call above
         return Index(new_values, dtype=new_values.dtype, name=self.name, copy=False)
+
+    @cache_readonly
+    def _isnan(self) -> np.ndarray:
+        return self._data.isna()
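This single cached ``_isnan`` replaces the per-subclass definitions removed in ``category.py`` and ``datetimelike.py`` above and in ``interval.py`` below: every ``ExtensionIndex`` now defers to its backing array's ``isna()``. Observable behavior should be unchanged; a quick check (assuming a build with this commit):

    import pandas as pd

    ci = pd.CategoricalIndex(["a", None, "b"])
    ii = pd.IntervalIndex.from_tuples([(0, 1), None, (2, 3)])

    # Both now route through ExtensionIndex._isnan -> self._data.isna().
    print(ci.isna())  # [False  True False]
    print(ii.isna())  # [False  True False]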

pandas/core/indexes/interval.py

Lines changed: 0 additions & 13 deletions
@@ -37,7 +37,6 @@
     is_object_dtype,
     is_scalar,
 )
-from pandas.core.dtypes.missing import isna
 
 from pandas.core.algorithms import take_1d
 from pandas.core.arrays.interval import IntervalArray, _interval_shared_docs
@@ -192,9 +191,6 @@ class IntervalIndex(IntervalMixin, ExtensionIndex):
     # we would like our indexing holder to defer to us
     _defer_to_indexing = True
 
-    # Immutable, so we are able to cache computations like isna in '_mask'
-    _mask = None
-
     _data: IntervalArray
     _values: IntervalArray
 
@@ -342,15 +338,6 @@ def _shallow_copy(
         result._cache = self._cache
         return result
 
-    @cache_readonly
-    def _isnan(self):
-        """
-        Return a mask indicating if each value is NA.
-        """
-        if self._mask is None:
-            self._mask = isna(self.left)
-        return self._mask
-
     @cache_readonly
     def _engine(self):
         left = self._maybe_convert_i8(self.left)

pandas/core/nanops.py

Lines changed: 70 additions & 12 deletions
@@ -327,7 +327,10 @@ def _na_ok_dtype(dtype: DtypeObj) -> bool:
 
 def _wrap_results(result, dtype: DtypeObj, fill_value=None):
     """ wrap our results if needed """
-    if is_datetime64_any_dtype(dtype):
+    if result is NaT:
+        pass
+
+    elif is_datetime64_any_dtype(dtype):
         if fill_value is None:
             # GH#24293
             fill_value = iNaT
@@ -498,18 +501,45 @@ def nansum(
     >>> nanops.nansum(s)
     3.0
     """
+    orig_values = values
+
     values, mask, dtype, dtype_max, _ = _get_values(
         values, skipna, fill_value=0, mask=mask
     )
     dtype_sum = dtype_max
+    datetimelike = False
     if is_float_dtype(dtype):
         dtype_sum = dtype
     elif is_timedelta64_dtype(dtype):
+        datetimelike = True
         dtype_sum = np.float64
+
     the_sum = values.sum(axis, dtype=dtype_sum)
     the_sum = _maybe_null_out(the_sum, axis, mask, values.shape, min_count=min_count)
 
-    return _wrap_results(the_sum, dtype)
+    the_sum = _wrap_results(the_sum, dtype)
+    if datetimelike and not skipna:
+        the_sum = _mask_datetimelike_result(the_sum, axis, mask, orig_values)
+    return the_sum
+
+
+def _mask_datetimelike_result(
+    result: Union[np.ndarray, np.datetime64, np.timedelta64],
+    axis: Optional[int],
+    mask: Optional[np.ndarray],
+    orig_values: np.ndarray,
+):
+    if mask is None:
+        mask = isna(orig_values)
+    if isinstance(result, np.ndarray):
+        # we need to apply the mask
+        result = result.astype("i8").view(orig_values.dtype)
+        axis_mask = mask.any(axis=axis)
+        result[axis_mask] = iNaT
+    else:
+        if mask.any():
+            result = NaT
+    return result
 
 
 @disallow(PeriodDtype)
@@ -544,21 +574,25 @@ def nanmean(
     >>> nanops.nanmean(s)
     1.5
     """
+    orig_values = values
+
     values, mask, dtype, dtype_max, _ = _get_values(
         values, skipna, fill_value=0, mask=mask
     )
     dtype_sum = dtype_max
     dtype_count = np.float64
+
     # not using needs_i8_conversion because that includes period
-    if (
-        is_integer_dtype(dtype)
-        or is_datetime64_any_dtype(dtype)
-        or is_timedelta64_dtype(dtype)
-    ):
+    datetimelike = False
+    if dtype.kind in ["m", "M"]:
+        datetimelike = True
+        dtype_sum = np.float64
+    elif is_integer_dtype(dtype):
         dtype_sum = np.float64
     elif is_float_dtype(dtype):
         dtype_sum = dtype
         dtype_count = dtype
+
     count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
     the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
 
@@ -573,7 +607,10 @@ def nanmean(
     else:
         the_mean = the_sum / count if count > 0 else np.nan
 
-    return _wrap_results(the_mean, dtype)
+    the_mean = _wrap_results(the_mean, dtype)
+    if datetimelike and not skipna:
+        the_mean = _mask_datetimelike_result(the_mean, axis, mask, orig_values)
+    return the_mean
 
 
 @bottleneck_switch()
@@ -639,16 +676,37 @@ def get_median(x):
             # empty set so return nans of shape "everything but the passed axis"
             # since "axis" is where the reduction would occur if we had a nonempty
             # array
-            shp = np.array(values.shape)
-            dims = np.arange(values.ndim)
-            ret = np.empty(shp[dims != axis])
-            ret.fill(np.nan)
+            ret = get_empty_reduction_result(values.shape, axis, np.float_, np.nan)
             return _wrap_results(ret, dtype)
 
     # otherwise return a scalar value
     return _wrap_results(get_median(values) if notempty else np.nan, dtype)
 
 
+def get_empty_reduction_result(
+    shape: Tuple[int, ...], axis: int, dtype: np.dtype, fill_value: Any
+) -> np.ndarray:
+    """
+    The result from a reduction on an empty ndarray.
+
+    Parameters
+    ----------
+    shape : Tuple[int]
+    axis : int
+    dtype : np.dtype
+    fill_value : Any
+
+    Returns
+    -------
+    np.ndarray
+    """
+    shp = np.array(shape)
+    dims = np.arange(len(shape))
+    ret = np.empty(shp[dims != axis], dtype=dtype)
+    ret.fill(fill_value)
+    return ret
+
+
 def _get_counts_nanvar(
     value_counts: Tuple[int],
     mask: Optional[np.ndarray],
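Net effect of the nanops changes: ``sum`` and ``mean`` on datetime-like data now honor ``skipna=False`` by masking the wrapped result through ``_mask_datetimelike_result``, and ``nanmedian``'s empty-reduction case moves into the shared ``get_empty_reduction_result`` helper. A behavioral sketch (assuming a build with this commit):

    import pandas as pd

    s = pd.Series(pd.to_timedelta(["1 day", "2 days", pd.NaT]))

    print(s.mean())               # Timedelta('1 days 12:00:00') -- NaT skipped
    print(s.mean(skipna=False))   # NaT, via _mask_datetimelike_result
    print(s.sum(skipna=False))    # NaT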
