Commit 953598a

committed
merge pulled
2 parents 74e4539 + ccec595 commit 953598a

16 files changed

+400
-259
lines changed

doc/source/basics.rst

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -2025,19 +2025,20 @@ object conversion
20252025

20262026
pandas offers various functions to try to force conversion of types from the ``object`` dtype to other types.
20272027
In cases where the data is already of the correct type, but stored in an ``object`` array, the
2028-
:meth:`~DataFrame.infer_objects` and :meth:`~Series.infer_objects` can be used to soft convert
2028+
:meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects` methods can be used to soft convert
20292029
to the correct type.
20302030

20312031
.. ipython:: python
20322032
2033+
import datetime
20332034
df = pd.DataFrame([[1, 2],
20342035
['a', 'b'],
20352036
[datetime.datetime(2016, 3, 2), datetime.datetime(2016, 3, 2)]])
20362037
df = df.T
20372038
df
20382039
df.dtypes
20392040
2040-
Because the data transposed the original inference stored all columns as object, which
2041+
Because the data was transposed the original inference stored all columns as object, which
20412042
``infer_objects`` will correct.
20422043

20432044
.. ipython:: python

doc/source/reshaping.rst

Lines changed: 1 addition & 1 deletion
@@ -265,7 +265,7 @@ the right thing:
265265
Reshaping by Melt
266266
-----------------
267267

268-
The top-level :func:``melt` and :func:`~DataFrame.melt` functions are useful to
268+
The top-level :func:`melt` and :func:`~DataFrame.melt` functions are useful to
269269
massage a DataFrame into a format where one or more columns are identifier variables,
270270
while all other columns, considered measured variables, are "unpivoted" to the
271271
row axis, leaving just two non-identifier columns, "variable" and "value". The
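The unpivoting described in this paragraph can be sketched as follows. The frame ``wide`` and its column names are made up for illustration; the sketch assumes a pandas version in which both the top-level ``melt`` function and the ``DataFrame.melt`` method are available.

```python
import pandas as pd

# A small wide-format frame: one identifier column and two measured columns.
wide = pd.DataFrame({'first': ['John', 'Mary'],
                     'height': [5.5, 6.0],
                     'weight': [130, 150]})

# Top-level function: 'first' stays as an identifier, the other columns
# are unpivoted into "variable" / "value" pairs.
long_func = pd.melt(wide, id_vars=['first'])

# Equivalent method form on the DataFrame itself.
long_method = wide.melt(id_vars=['first'])

print(long_func)
```

Both calls should produce the same long-format frame, with one row per (identifier, measured column) pair.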

doc/source/timeseries.rst

Lines changed: 2 additions & 2 deletions
@@ -1093,9 +1093,9 @@ frequencies. We will refer to these aliases as *offset aliases*
10931093
"QS", "quarter start frequency"
10941094
"BQS", "business quarter start frequency"
10951095
"A, Y", "year end frequency"
1096-
"BA", "business year end frequency"
1096+
"BA, BY", "business year end frequency"
10971097
"AS, YS", "year start frequency"
1098-
"BAS", "business year start frequency"
1098+
"BAS, BYS", "business year start frequency"
10991099
"BH", "business hour frequency"
11001100
"H", "hourly frequency"
11011101
"T, min", "minutely frequency"

doc/source/whatsnew/v0.21.0.txt

Lines changed: 6 additions & 5 deletions
@@ -31,13 +31,13 @@ New features
3131
``infer_objects`` type conversion
3232
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3333

34-
The `:meth:`~DataFrame.infer_objects` and :meth:`~Series.infer_objects`
34+
The :meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects`
3535
methods have been added to perform dtype inference on object columns, replacing
3636
some of the functionality of the deprecated ``convert_objects``
3737
method. See the documentation :ref:`here <basics.object_conversion>`
3838
for more details. (:issue:`11221`)
3939

40-
This function only performs soft conversions on object columns, converting Python objects
40+
This method only performs soft conversions on object columns, converting Python objects
4141
to native types, but not any coercive conversions. For example:
4242

4343
.. ipython:: python
@@ -46,11 +46,12 @@ to native types, but not any coercive conversions. For example:
4646
'B': np.array([1, 2, 3], dtype='object'),
4747
'C': ['1', '2', '3']})
4848
df.dtypes
49-
df.infer_objects().dtype
49+
df.infer_objects().dtypes
5050

5151
Note that column ``'C'`` was not converted - only scalar numeric types
5252
will be inferred to a new type. Other types of conversion should be accomplished
53-
using :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`).
53+
using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`).
54+
5455
.. ipython:: python
5556

5657
df = df.infer_objects()
@@ -218,7 +219,7 @@ Groupby/Resample/Rolling
218219

219220
Sparse
220221
^^^^^^
221-
222+
- Bug in ``SparseSeries`` raises ``AttributeError`` when a dictionary is passed in as data (:issue:`16777`)
222223

223224

224225
Reshaping
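The whatsnew entry above distinguishes soft conversion from coercive conversion. A minimal sketch of that difference, using a one-column frame invented for illustration: ``infer_objects`` leaves numeric-looking strings alone, whereas ``to_numeric`` parses them.

```python
import pandas as pd

# Column 'C' holds strings that merely look like numbers.
df = pd.DataFrame({'C': ['1', '2', '3']})

# Soft conversion: strings are not parsed, so the column stays non-numeric.
soft = df.infer_objects()

# Coercive conversion: to_numeric parses the strings into integers.
coerced = pd.to_numeric(df['C'])

print(soft.dtypes)
print(coerced.dtype)
```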

pandas/core/algorithms.py

Lines changed: 1 addition & 99 deletions
@@ -30,7 +30,6 @@
3030
from pandas.core.dtypes.missing import isnull
3131

3232
from pandas.core import common as com
33-
from pandas.compat import string_types
3433
from pandas._libs import algos, lib, hashtable as htable
3534
from pandas._libs.tslib import iNaT
3635

@@ -431,104 +430,6 @@ def isin(comps, values):
431430
return f(comps, values)
432431

433432

434-
def safe_sort(values, labels=None, na_sentinel=-1, assume_unique=False):
435-
"""
436-
Sort ``values`` and reorder corresponding ``labels``.
437-
``values`` should be unique if ``labels`` is not None.
438-
Safe for use with mixed types (int, str), orders ints before strs.
439-
440-
.. versionadded:: 0.19.0
441-
442-
Parameters
443-
----------
444-
values : list-like
445-
Sequence; must be unique if ``labels`` is not None.
446-
labels : list_like
447-
Indices to ``values``. All out of bound indices are treated as
448-
"not found" and will be masked with ``na_sentinel``.
449-
na_sentinel : int, default -1
450-
Value in ``labels`` to mark "not found".
451-
Ignored when ``labels`` is None.
452-
assume_unique : bool, default False
453-
When True, ``values`` are assumed to be unique, which can speed up
454-
the calculation. Ignored when ``labels`` is None.
455-
456-
Returns
457-
-------
458-
ordered : ndarray
459-
Sorted ``values``
460-
new_labels : ndarray
461-
Reordered ``labels``; returned when ``labels`` is not None.
462-
463-
Raises
464-
------
465-
TypeError
466-
* If ``values`` is not list-like or if ``labels`` is neither None
467-
nor list-like
468-
* If ``values`` cannot be sorted
469-
ValueError
470-
* If ``labels`` is not None and ``values`` contain duplicates.
471-
"""
472-
if not is_list_like(values):
473-
raise TypeError("Only list-like objects are allowed to be passed to"
474-
"safe_sort as values")
475-
values = np.asarray(values)
476-
477-
def sort_mixed(values):
478-
# order ints before strings, safe in py3
479-
str_pos = np.array([isinstance(x, string_types) for x in values],
480-
dtype=bool)
481-
nums = np.sort(values[~str_pos])
482-
strs = np.sort(values[str_pos])
483-
return _ensure_object(np.concatenate([nums, strs]))
484-
485-
sorter = None
486-
if compat.PY3 and lib.infer_dtype(values) == 'mixed-integer':
487-
# unorderable in py3 if mixed str/int
488-
ordered = sort_mixed(values)
489-
else:
490-
try:
491-
sorter = values.argsort()
492-
ordered = values.take(sorter)
493-
except TypeError:
494-
# try this anyway
495-
ordered = sort_mixed(values)
496-
497-
# labels:
498-
499-
if labels is None:
500-
return ordered
501-
502-
if not is_list_like(labels):
503-
raise TypeError("Only list-like objects or None are allowed to be"
504-
"passed to safe_sort as labels")
505-
labels = _ensure_platform_int(np.asarray(labels))
506-
507-
from pandas import Index
508-
if not assume_unique and not Index(values).is_unique:
509-
raise ValueError("values should be unique if labels is not None")
510-
511-
if sorter is None:
512-
# mixed types
513-
(hash_klass, _), values = _get_data_algo(values, _hashtables)
514-
t = hash_klass(len(values))
515-
t.map_locations(values)
516-
sorter = _ensure_platform_int(t.lookup(ordered))
517-
518-
reverse_indexer = np.empty(len(sorter), dtype=np.int_)
519-
reverse_indexer.put(sorter, np.arange(len(sorter)))
520-
521-
mask = (labels < -len(values)) | (labels >= len(values)) | \
522-
(labels == na_sentinel)
523-
524-
# (Out of bound indices will be masked with `na_sentinel` next, so we may
525-
# deal with them here without performance loss using `mode='wrap'`.)
526-
new_labels = reverse_indexer.take(labels, mode='wrap')
527-
np.putmask(new_labels, mask, na_sentinel)
528-
529-
return ordered, _ensure_platform_int(new_labels)
530-
531-
532433
def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
533434
"""
534435
Encode input values as an enumerated type or categorical variable
@@ -568,6 +469,7 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
568469
uniques = uniques.to_array()
569470

570471
if sort and len(uniques) > 0:
472+
from pandas.core.sorting import safe_sort
571473
uniques, labels = safe_sort(uniques, labels, na_sentinel=na_sentinel,
572474
assume_unique=True)
573475
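For readers following the relocated ``safe_sort`` logic, here is a simplified, NumPy-only sketch of its two core ideas: ordering numbers before strings, and remapping labels through a reverse indexer. It is not the pandas implementation. The helper name ``reorder_labels`` is hypothetical, ``isinstance(x, str)`` stands in for the ``string_types`` check, and the uniqueness validation and hashtable path for mixed types are omitted.

```python
import numpy as np


def sort_mixed(values):
    """Order non-string values before strings (simplified sketch)."""
    values = np.asarray(values, dtype=object)
    is_str = np.array([isinstance(x, str) for x in values], dtype=bool)
    nums = np.sort(values[~is_str])
    strs = np.sort(values[is_str])
    return np.concatenate([nums, strs])


def reorder_labels(values, labels, na_sentinel=-1):
    """Sort sortable, unique ``values`` and remap ``labels`` to match.

    Hypothetical helper: assumes ``values`` are unique and mutually
    comparable; out-of-bound labels are replaced by ``na_sentinel``.
    """
    values = np.asarray(values)
    labels = np.asarray(labels, dtype=np.intp)

    sorter = values.argsort()
    ordered = values.take(sorter)

    # reverse_indexer[old_position] == new_position after sorting
    reverse_indexer = np.empty(len(sorter), dtype=np.intp)
    reverse_indexer.put(sorter, np.arange(len(sorter)))

    # Labels that are out of bounds or already "not found"
    mask = ((labels < -len(values)) | (labels >= len(values))
            | (labels == na_sentinel))

    # mode='wrap' keeps take() from raising on out-of-bound labels;
    # those positions are overwritten with na_sentinel right after.
    new_labels = reverse_indexer.take(labels, mode='wrap')
    np.putmask(new_labels, mask, na_sentinel)
    return ordered, new_labels


mixed = sort_mixed([3, 'b', 1, 'a'])
ordered, new_labels = reorder_labels([30, 10, 20], [0, 1, 2, 0, -1, 7])
print(list(mixed))
print(ordered, new_labels)
```

After remapping, ``ordered[new_labels]`` selects the same elements as ``values[labels]`` did for every valid label, which is the property ``factorize(..., sort=True)`` relies on.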
