The first case produces a DataFrame, whereas the second case produces a Series. I don't think this is intentional. This gives rise to a difference in DataFrameGroupBy._selected_obj and DataFrameGroupBy._obj_with_exclusions which can lead to erroneous results (#50804 is one example).
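The code for the two cases is not reproduced above. As a rough sketch only, assuming the first case is `df1["a"]` on MultiIndex columns with two non-empty sub-labels under `"a"`, and the second is `df2["a"]` where the only sub-label under `"a"` is an empty string, the comparison might look like this (the data and column labels are made up for illustration):

```python
import pandas as pd

# Hypothetical frames: the original df1/df2 definitions are not shown in this
# thread, so these column labels and values are assumptions.
df1 = pd.DataFrame(
    [[1, 2, 3], [4, 2, 6]],
    columns=pd.MultiIndex.from_tuples([("a", "a1"), ("a", "a2"), ("b", "b1")]),
)
df2 = pd.DataFrame(
    [[1, 3], [4, 6]],
    columns=pd.MultiIndex.from_tuples([("a", ""), ("b", "b1")]),
)

first = df1["a"]   # selects both ("a", "a1") and ("a", "a2")
second = df2["a"]  # selects only ("a", ""), the empty-string special case

# Print the result types so the two cases can be compared directly.
print(type(first).__name__)
print(type(second).__name__)
```

The printed type for the second case may depend on the pandas version in use.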
Currently, df2.groupby("a") is allowed whereas df1.groupby("a") raises. So returning a DataFrame in the 2nd case will resolve the groupby inconsistency as well.
One can do df1.groupby(("a", "a2")) successfully, so I don't think there is a worry about making certain ops not possible.
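The groupby behavior described above can be checked with a small sketch. The frames and the `try_groupby` helper are hypothetical and exist only for this illustration; the helper returns either the group sizes or the error message, so all three calls can be compared side by side:

```python
import pandas as pd

# Hypothetical frames mirroring the two cases; labels and values are assumptions.
df1 = pd.DataFrame(
    [[1, 2, 3], [4, 2, 6]],
    columns=pd.MultiIndex.from_tuples([("a", "a1"), ("a", "a2"), ("b", "b1")]),
)
df2 = pd.DataFrame(
    [[1, 3], [4, 6]],
    columns=pd.MultiIndex.from_tuples([("a", ""), ("b", "b1")]),
)


def try_groupby(df, key):
    """Illustrative helper: return group sizes, or the error text if groupby fails."""
    try:
        return df.groupby(key).size()
    except Exception as exc:  # broad on purpose, to report whatever is raised
        return f"{type(exc).__name__}: {exc}"


partial_key_df1 = try_groupby(df1, "a")        # partial key, two sub-columns
full_key_df1 = try_groupby(df1, ("a", "a2"))   # full tuple key, one column
partial_key_df2 = try_groupby(df2, "a")        # partial key, empty-string sub-label

print(partial_key_df1)
print(full_key_df1)
print(partial_key_df2)
```

Passing the full tuple key selects exactly one column, which is why it stays usable regardless of how the partial key `"a"` is resolved.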
Not really any thoughts, but we had an issue about this as well. I looked into why this was done the way it's done, and it has been around forever and is tested explicitly.
I think this could be supported in groupby with just a few lines, but I do find it somewhat odd behavior. I'm guessing an empty string lets some columns behave as if they are multiindexed while other columns are not (or just have fewer levels). I didn't see it mentioned anywhere in the docs, but could have missed it.
I'd support removing this special-case behavior, but I'm not going to push for it. It seems like added complexity that doesn't offer much in the way of benefits (but maybe it does and I just don't see it).
The recommendation of using _obj_with_exclusions instead of _selected_obj in #46944 (comment) has the added benefit of handling this behavior without any other change to groupby.
xref #46944
cc @phofl for any thoughts