BUG?: getitem with a MultiIndex returns a Series only when the lower level is "" #50805

rhshadrach · 2023-01-18T00:07:24Z

```python
import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)
print(df1["a"])
#    a2
# 0   1

columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)
print(df2["a"])
# 0    1
# Name: a, dtype: int64
```

The first case produces a DataFrame, whereas the second case produces a Series. I don't think this is intentional. This gives rise to a difference between DataFrameGroupBy._selected_obj and DataFrameGroupBy._obj_with_exclusions, which can lead to erroneous results (#50804 is one example).
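A minimal sketch of where the special case applies, assuming a recent pandas release. `df1` and `df2` are the frames from the report; `df3` is a hypothetical extra case: once more than one column sits under `"a"`, the empty-string shortcut no longer applies and a DataFrame comes back, even though one of its columns is labeled `""`.

```python
import pandas as pd

# From the report: "a" has a single sub-column with a real label -> DataFrame.
df1 = pd.DataFrame([[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")]))
# From the report: "a" has a single sub-column labeled "" -> Series.
df2 = pd.DataFrame([[1, 2]], columns=pd.MultiIndex.from_tuples([("a", ""), ("b", "c")]))
# Hypothetical: "a" has two sub-columns, one of them "" -> DataFrame again.
df3 = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("a", ""), ("a", "x"), ("b", "c")]),
)

for name, frame in [("df1", df1), ("df2", df2), ("df3", df3)]:
    print(name, type(frame["a"]).__name__)
```

So the Series is returned only when exactly one column remains under the key and its remaining label is `""`.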

Currently, df2.groupby("a") is allowed whereas df1.groupby("a") raises. So returning a DataFrame in the second case would resolve the groupby inconsistency as well.
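The groupby side of the inconsistency can be reproduced with a short sketch, assuming a recent pandas release. Catching `ValueError` reflects our understanding of what pandas raises when a key selects a 2-D object; the exact message may differ between versions.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")]))
df2 = pd.DataFrame([[1, 2]], columns=pd.MultiIndex.from_tuples([("a", ""), ("b", "c")]))

# df2["a"] is a Series, so "a" is accepted as a one-dimensional grouping key.
gb2 = df2.groupby("a")
print("df2 groups:", gb2.ngroups)

# df1["a"] is a one-column DataFrame, which groupby rejects as a key.
df1_raised = False
try:
    df1.groupby("a")
except ValueError as err:
    df1_raised = True
    print("df1 raised:", err)
```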

One can do df1.groupby(("a", "a2")) successfully, so I don't think there is a worry about making certain ops not possible.
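A sketch of that workaround, assuming a recent pandas release in which a tuple key naming an existing column is treated as a single column label rather than a list of keys:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")]))

# The full tuple label selects exactly one column (a Series), so it is a valid
# one-dimensional grouping key even though df1.groupby("a") is rejected.
gb = df1.groupby(("a", "a2"))
result = gb.sum()  # the grouping column is excluded; only ("b", "c") is summed
print(result)
```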

cc @phofl for any thoughts

The text was updated successfully, but these errors were encountered:

phofl · 2023-01-18T09:04:55Z

Not really any thoughts, but we had an issue about this as well. I looked into why it was done this way: it has been around forever and is tested explicitly.

rhshadrach · 2023-01-19T22:12:13Z

I think this could be supported in groupby with just a few lines, but I do find it somewhat odd behavior. I'm guessing an empty string enables having some columns behave as if they are MultiIndexed while other columns are not (or have fewer levels). I didn't see it mentioned anywhere in the docs, but I could have missed it.
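To make that guess concrete, here is a hypothetical summary table (the column names and values are invented for illustration) where one top-level column has no sub-columns and is padded with `""`, while another has real sub-columns. This is only one reading of the possible intent, not documented pandas behavior.

```python
import pandas as pd

# Hypothetical summary table: "count" has no sub-columns, so its lower level is
# padded with "", while "price" has two real sub-columns.
columns = pd.MultiIndex.from_tuples([("count", ""), ("price", "mean"), ("price", "std")])
summary = pd.DataFrame([[3, 10.0, 1.5], [5, 12.0, 2.0]], columns=columns)

count = summary["count"]  # Series, as if "count" were an ordinary flat column
price = summary["price"]  # DataFrame with columns "mean" and "std"
print(type(count).__name__, count.tolist())
print(type(price).__name__, list(price.columns))
```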

I'd support removing this special-case behavior, but I'm not going to push for it. It seems like added complexity that doesn't offer much benefit (but maybe it does and I just don't see it).

rhshadrach · 2023-02-01T22:56:18Z

The recommendation in a comment on #46944 to use _obj_with_exclusions instead of _selected_obj has the added benefit of handling this behavior without any other change to groupby.

rhshadrach added the Bug, Indexing, and MultiIndex labels Jan 18, 2023

rhshadrach mentioned this issue Jan 19, 2023 in Remove obj_with_exclusions #50878 (Closed)

rhshadrach closed this as completed Feb 1, 2023

rhshadrach mentioned this issue Feb 16, 2023 in ENH: Improve ref-tracking for group keys #51442 (Merged)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG?: getitem with a MultiIndex returns a Series only when the lower level is "" #50805

BUG?: getitem with a MultiIndex returns a Series only when the lower level is "" #50805

rhshadrach commented Jan 18, 2023 •

edited

Loading

phofl commented Jan 18, 2023 •

edited

Loading

Uh oh!

rhshadrach commented Jan 19, 2023 •

edited

Loading

Uh oh!

rhshadrach commented Feb 1, 2023

Uh oh!

Uh oh!

BUG?: getitem with a MultiIndex returns a Series only when the lower level is "" #50805

BUG?: getitem with a MultiIndex returns a Series only when the lower level is "" #50805

Comments

rhshadrach commented Jan 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

phofl commented Jan 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhshadrach commented Jan 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhshadrach commented Feb 1, 2023

Uh oh!

rhshadrach commented Jan 18, 2023 •

edited

Loading

phofl commented Jan 18, 2023 •

edited

Loading

rhshadrach commented Jan 19, 2023 •

edited

Loading