BUG: idxmin & idxmax axis = 1 str reducer for transform #50329

iofall · 2022-12-17T22:08:13Z

Addresses idxmin for BUG: Some str reducers fail in Groupby.transform with axis=1 #45986
Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

To Do:

Fix failing test_api_consistency due to changed signatures.
Fix _shared_docs for idxmin to reflect the changes in the signature.

Description

.transform("idxmin") & .transform("idxmax") now works with axis=1 groupby objects.
The default axis=0 argument is ignored. Instead we directly use the grouper's axis.
Note that this behavior is different from a generic idxmin call which still respects the axis argument.

rhshadrach

Nice fix!

As it's somewhat relevant here, I'll mention that I think we should deprecate the axis argument across most (all?) groupby methods where it is not useful, but this is to be done separately.

rhshadrach · 2022-12-18T11:13:11Z

pandas/core/groupby/generic.py

-        axis: Axis = 0,
-        skipna: bool = True,
-        numeric_only: bool = False,
+        self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False, **kwargs


I don't think we should be adding kwargs to the function signature. Whenever self.axis=1 and axis = 0 in idxmin we will raise (whether using with transform or not), so I'd suggest instead just ignoring the axis argument altogether in this case.

Assuming this sounds like a good route, can you change the axis argument in the docstring to state that it is ignored. Right now we're using the shared_docs; I'd recommend copying this and adding it as an explicit docstring to this method - that would then allow us in the future to add groupby-specific docs for this method and correct some difference in behavior between DataFrame and DataFrameGroupBy.

Thank you for the feedback, it does sound like a good route. As for the axis parameter, it does not seem intuitive to raise when we are stating in the docs that the argument will be ignored. We could however, instead of ignoring axis, just mention in the docs that axis and self.axis should match. If they don't match, we will raise with the message - they should match.

As for the axis parameter, it does not seem intuitive to raise when we are stating in the docs that the argument will be ignored.

I think this is in reply to my sentence Whenever self.axis=1 and axis = 0 in idxmin we will raise (whether using with transform or not), so I'd suggest.... When I said "we will raise", I mean "currently pandas raises". In other words, I think it's okay to ignore the axis argument because currently no one can be using .groupby(..., axis=1).idxmin(axis=0).

We could however, instead of ignoring axis, just mention in the docs that axis and self.axis should match. If they don't match, we will raise with the message - they should match.

I agree this is a viable route, but I think a bit odd. This is perhaps an over simplification, but it's like having a function

def foo(x=1): if x != 2: raise ValueError("You must set x to be 2") ...

When I said "we will raise", I mean "currently pandas raises". In other words, I think it's okay to ignore the axis argument because currently no one can be using .groupby(..., axis=1).idxmin(axis=0).

Okay now I understand what you mean. For now we will ignore axis, in the future remove it and directly take axis from the grouper.

iofall · 2022-12-23T06:20:26Z

I have implemented the required changes in the latest commit.

We ignore the axis parameter, we take it directly based on the self.axis and have mentioned the same in the docs.

iofall · 2022-12-23T06:32:34Z

pandas/core/groupby/generic.py

+        >>> df = pd.DataFrame({{'consumption': [10.51, 103.11, 55.48],
+        ...                    'co2_emissions': [37.2, 19.66, 1712]}},
+        ...                    index=['Pork', 'Wheat Products', 'Beef'])


The doctest appears to be failing because of the {{. I think keeping a single brace should fix it, but not sure if we need it as an escape character.

I believe the previous docstring had two {{ due to its use as a template; agreed there should only be one.

rhshadrach

Looks good, small requests below. Also, would you be willing to do idxmax here as well?

Also needs a whatsnew entry in 2.0.0 under the bugfix section within groupby.

rhshadrach · 2022-12-23T14:07:32Z

pandas/core/groupby/generic.py

+            The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.
+            The axis argument is ignored, instead we use the grouper's axis.
+
+            .. versionchanged:: 1.5.3


2.0.0 here; in generally only regression fixes make it to patch versions.

rhshadrach · 2022-12-23T14:08:35Z

pandas/core/groupby/generic.py

+        dtype: object
+        """
+
+        axis = self.axis
        axis = DataFrame._get_axis_number(axis)


L1958 here is now unnecessary, as self.axis is always 0 or 1.

Yes, makes sense

iofall · 2022-12-27T18:08:45Z

Looks good, small requests below. Also, would you be willing to do idxmax here as well?

Sure, I will do that.

pandas/core/groupby/generic.py

rhshadrach · 2022-12-30T15:17:09Z

@iofall - I think this looks good, just need to resolve a whatsnew conflict

rhshadrach · 2022-12-30T19:00:59Z

doc/source/whatsnew/v2.0.0.rst

@@ -944,7 +944,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.SeriesGroupBy.nunique` would incorrectly raise when the grouper was an empty categorical and ``observed=True`` (:issue:`21334`)
 - Bug in :meth:`.SeriesGroupBy.nth` would raise when grouper contained NA values after subsetting from a :class:`DataFrameGroupBy` (:issue:`26454`)
 - Bug in :meth:`DataFrame.groupby` would not include a :class:`.Grouper` specified by ``key`` in the result when ``as_index=False`` (:issue:`50413`)
-
+- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` would raise incorrectly when grouper had ``axis=1`` for ``"idxmin"`` and ``"idxmax"`` arguments (:issue:`45986`). The method now ignores the ``axis`` argument, instead we use the grouper's ``axis``.


Can you make this one sentence, no period, ending with (:issue:`45986`) as the other lines do. I think it's okay to leave off the last sentence.

iofall · 2022-12-31T18:49:05Z

I looked at the failing test, we might need to remove / rewrite it, since it depends on supplying axis = 1 to idxmax which we are ignoring.

pandas/pandas/tests/groupby/test_function.py

Lines 509 to 511 in a28cadb

    
           gb = df.groupby("A") 
        
           res = gb.idxmax(axis=1)

Here, we groupby on line 509, this happens with axis = 0, so we do get some groups. But if we change the same line and put axis = 1, we don't get any groups.

>>> gb = df.groupby("A")
>>> gb.groups
{1: [0, 3, 6], 2: [1, 4, 7], 3: [2, 5, 8], 4: [9]}

>>> gb = df.groupby("A", axis = 1)
>>> gb.groups
{}

Thus the res variable comes out as an empty DataFrame, failing the test.

I am not sure how to proceed but we can try:

The same thing we had done before, honor the axis argument if given otherwise use self.axis
Change the test to check for a different result that does not rely on groups across different axes.

rhshadrach · 2023-01-02T02:26:11Z

I was under the impression groupby(..., axis=0).idxmin(axis=1) would raise, but it looks like this is not the case.

The same thing we had done before, honor the axis argument if given otherwise use self.axis

+1 on this

iofall · 2023-01-02T15:34:34Z

I was under the impression groupby(..., axis=0).idxmin(axis=1) would raise, but it looks like this is not the case.

Sorry I did not convey this properly, it does raise but only when the number of rows and columns mismatch in the issue's minimal example.

# raises
df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=0).transform("idxmin", axis = 1)

# passes
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [5, 6, 7]}, index=["x", "y", "z"])
result = df.groupby([0, 1, 1], axis=0).transform("idxmin", axis = 1)

+1 on this

What do you recommend?

Put axis in kwargs, that way we can check if it exists in the kwargs and handle accordingly.
Remove the default axis = 0 and make it a required argument.
Or make axis = None as the default. If axis is None, we use self.axis, else use whatever value supplied by the user.

rhshadrach · 2023-01-03T21:24:01Z

@iofall - The code

df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=0)

raises because you have two rows to group and you're specifying groups for three rows.

Or make axis = None as the default. If axis is None, we use self.axis, else use whatever value supplied by the user.

This sounds like a great way forward.

iofall · 2023-01-04T17:52:03Z

raises because you have two rows to group and you're specifying groups for three rows.

Sorry I meant even this raises -

df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=1).transform("idxmin", axis=0)

iofall · 2023-01-04T19:53:47Z

I think we have the right changes now, we have completed what the issue had asked for:

This example now works and all the tests are passing as well.

df = DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 0, 1], axis=1).transform("first")

However, examples like these still raise

df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=1).transform("idxmin", axis=0)

Although this should be included in another issue or PR, but here is some investigation

pandas/pandas/core/groupby/groupby.py

Lines 1070 to 1076 in 209de28

    
           def reset_identity(values): 
        
               # reset the identities of the components 
        
               # of the values to prevent aliasing 
        
               for v in com.not_none(*values): 
        
                   ax = v._get_axis(self.axis) 
        
                   ax._reset_identity() 
        
               return values

Specifically for idxmin and idxmax, when the indexes mismatch, we end up at this function, which tries to reset the axis for each of the vs in the values that are obtained after grouping. However Series objects don't have axis which results in the error we see in the above idxmin call.

If we put a conditional such that axis is not checked when v is a Series, we do get some output shown below:

     a    b    c
a    x  NaN  NaN
b  NaN    x    x
c  NaN    x    x

This might help solve this issue.

rhshadrach · 2023-01-04T21:33:30Z

Thanks for the update, will take a look in the next day or so. Just wanted to give my thoughts here:

Although this should be included in another issue or PR, but here is some investigation

I think the axis argument should be deprecated entirely; see #50405.

rhshadrach

lgtm

rhshadrach · 2023-01-05T03:16:44Z

pandas/core/groupby/generic.py

    def idxmax(
        self,
-        axis: Axis = 0,
+        axis: Axis = None,


This is causing mypy to fail on the CI. Need to type-hint as Axis | None. Same for idxmin.

Thanks, I have changed it

rhshadrach

lgtm; failure is unrelated (SQL test)

rhshadrach · 2023-01-05T22:17:58Z

Thanks @iofall - very nice!

iofall · 2023-01-08T13:58:36Z

Thanks @iofall - very nice!

Thanks! It was fun 🎉

iofall added 2 commits December 18, 2022 03:03

Fix axis=1 behaviour for str reducer idxmin

9751f6d

Add DataFrame._get_axis_number

9150581

rhshadrach added Bug Groupby Apply Apply, Aggregate, Transform, Map labels Dec 18, 2022

rhshadrach requested changes Dec 18, 2022

View reviewed changes

rhshadrach changed the title ~~FIX idxmin axis = 1 str reducer for transform~~ BUG: idxmin axis = 1 str reducer for transform Dec 18, 2022

Ignore axis argument for groupby idxmin

78c0977

iofall commented Dec 23, 2022

View reviewed changes

Fix doctest

63bcc5f

rhshadrach requested changes Dec 23, 2022

View reviewed changes

Fix idxmax axis=1 failure

b2e6567

iofall commented Dec 29, 2022

View reviewed changes

pandas/core/groupby/generic.py Show resolved Hide resolved

Fix merge conflict for whatsnew

4ac6d16

iofall changed the title ~~BUG: idxmin axis = 1 str reducer for transform~~ BUG: idxmin & idxmax axis = 1 str reducer for transform Dec 30, 2022

rhshadrach requested changes Dec 30, 2022

View reviewed changes

Modify whatsnew entry

564c6fc

iofall added 2 commits January 5, 2023 01:01

Default axis to None

41bcc9a

Fix merge conflict whatsnew

e615991

rhshadrach approved these changes Jan 5, 2023

View reviewed changes

rhshadrach added this to the 2.0 milestone Jan 5, 2023

rhshadrach requested changes Jan 5, 2023

View reviewed changes

Change axis typehint for idxmax and idxmin

62f2a60

rhshadrach approved these changes Jan 5, 2023

View reviewed changes

rhshadrach merged commit 3b26841 into pandas-dev:main Jan 5, 2023

Uh oh!

BUG: idxmin & idxmax axis = 1 str reducer for transform #50329

BUG: idxmin & idxmax axis = 1 str reducer for transform #50329

Uh oh!

Conversation

iofall commented Dec 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhshadrach left a comment

Choose a reason for hiding this comment

Uh oh!

rhshadrach Dec 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iofall commented Dec 23, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhshadrach left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhshadrach Dec 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iofall commented Dec 27, 2022

Uh oh!

Uh oh!

rhshadrach commented Dec 30, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iofall commented Dec 31, 2022

Uh oh!

rhshadrach commented Jan 2, 2023

Uh oh!

iofall commented Jan 2, 2023

Uh oh!

rhshadrach commented Jan 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

iofall commented Jan 4, 2023

Uh oh!

iofall commented Jan 4, 2023

Uh oh!

rhshadrach commented Jan 4, 2023

Uh oh!

rhshadrach left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhshadrach left a comment

Choose a reason for hiding this comment

Uh oh!

rhshadrach commented Jan 5, 2023

Uh oh!

iofall commented Jan 8, 2023

Uh oh!

Uh oh!

iofall commented Dec 17, 2022 •

edited

Loading

rhshadrach Dec 18, 2022 •

edited

Loading

rhshadrach left a comment •

edited

Loading

rhshadrach Dec 23, 2022 •

edited

Loading

rhshadrach commented Jan 3, 2023 •

edited

Loading