BUG: groupby().nth() throws error on multiple groups, empty result #16064

Closed
adbull opened this issue Apr 20, 2017 · 2 comments · Fixed by #16090
@adbull
Contributor

adbull commented Apr 20, 2017

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame(index=[0], columns=['a', 'b', 'c'])
>>> df.groupby('a').nth(10)

Empty DataFrame
Columns: [b, c]
Index: []

>>> df.groupby(['a', 'b']).nth(10)

Traceback (most recent call last):
  File "<ipython-input-3-ae8299c3984e>", line 1, in <module>
    df.groupby(['a', 'b']).nth(10)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py", line 1390, in nth
    return out.sort_index() if self.sort else out
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 3344, in sort_index
    indexer = lexsort_indexer(labels._get_labels_for_sorting(),
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1652, in _get_labels_for_sorting
    for label in self.labels]
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1652, in <listcomp>
    for label in self.labels]
  File "~/anaconda3/lib/python3.5/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

Problem description

In the current GitHub version of pandas, calling groupby().nth() with multiple grouping columns raises an error if the result is empty. This is a regression from version 0.19.2.
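Judging from the last frames of the traceback, the error seems to come from taking the maximum of an empty label array. A minimal sketch of that underlying NumPy behaviour, independent of pandas (the name `empty_labels` is only illustrative, not a pandas attribute):

```python
import numpy as np

# Stand-in for one level's labels when the grouped result is empty.
empty_labels = np.array([], dtype=np.int64)

message = None
try:
    # Reducing a zero-size array with max() has no identity value to return.
    empty_labels.max()
except ValueError as exc:
    message = str(exc)

print(message)
```

This only isolates the failing reduction; it does not reproduce the groupby code path itself.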

Expected Output

Empty DataFrame
Columns: [b, c]
Index: []

Empty DataFrame
Columns: [c]
Index: []

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.0+829.gb17e286
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
xarray: 0.9.1
IPython: 4.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None

@adbull
Contributor Author

adbull commented Apr 20, 2017

Think this just requires:

diff --git a/pandas/core/indexes/multi.py b/pandas/core/indexes/multi.py
index 92baf9d..34b62c5 100644
--- a/pandas/core/indexes/multi.py
+++ b/pandas/core/indexes/multi.py
@@ -1645,11 +1645,9 @@ class MultiIndex(Index):
         """
         from pandas.core.categorical import Categorical
 
-        return [Categorical.from_codes(label,
-                                       np.arange(np.array(label).max() + 1,
-                                                 dtype=label.dtype),
-                                       ordered=True)
-                for label in self.labels]
+        return [Categorical.from_codes(label, np.arange(
+            np.array(label).max() + 1 if len(label) else 0,
+            dtype=label.dtype), ordered=True) for label in self.labels]
 
     def sortlevel(self, level=0, ascending=True, sort_remaining=True):
         """

@jreback
Contributor

jreback commented Apr 20, 2017

Can you put up a PR with that fix (and a test)?

@jreback jreback added this to the 0.20.0 milestone Apr 20, 2017
jreback added a commit that referenced this issue Apr 22, 2017
* TST: separate out groupby/test_nth

* BUG: bug in groupby on empty frame with multi groupers

xref #14784
closes #16064
pcluo pushed a commit to pcluo/pandas that referenced this issue May 22, 2017

* TST: separate out groupby/test_nth

* BUG: bug in groupby on empty frame with multi groupers

xref pandas-dev#14784
closes pandas-dev#16064