df.groupby(...).transform(func) breaks when func renames input df #23461

Closed
DiegoAlbertoTorres opened this issue Nov 2, 2018 · 0 comments · Fixed by #23463

@DiegoAlbertoTorres
Contributor

Calling df.groupby(...).transform(func) returns wrong results when func accepts a pd.DataFrame but returns it with renamed columns.

Code sample

import pandas as pd
import pandas.testing as tm

def demean_rename(x):
    # Subtract the mean; works for both a Series and a DataFrame.
    result = x - x.mean()

    # Series input: return the demeaned values unchanged.
    if isinstance(x, pd.Series):
        return result

    # DataFrame input: rename every column, which triggers the bug.
    result = result.rename(
        columns={c: '{}_demeaned'.format(c) for c in result.columns})

    return result

df = pd.DataFrame({'group': list('ababa'),
                   'value': [1, 1, 1, 2, 2]})
expected = pd.DataFrame({'value': [-1./3., -0.5, -1./3., 0.5, 2./3.]})

result = df.groupby('group').transform(demean_rename)
tm.assert_frame_equal(result, expected)
# Instead, the assertion fails with:
# E   DataFrame.iloc[:, 0] values are different (40.0 %)
# E   [left]:  [-0.33333333333333326, nan, -0.33333333333333326, nan, 0.6666666666666667]

Problem description

The current behavior gives wrong results (NaN) for every group except the first. This happens even though func can be called successfully on each column individually without returning any NaN.
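To make the last point concrete, here is a minimal sketch that applies the same function to the single 'value' column, so it only ever receives a pd.Series and never reaches the renaming branch. The name per_column is only illustrative, and demean_rename is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def demean_rename(x):
    # Same function as in the code sample above.
    result = x - x.mean()
    if isinstance(x, pd.Series):
        return result
    return result.rename(
        columns={c: '{}_demeaned'.format(c) for c in result.columns})

df = pd.DataFrame({'group': list('ababa'),
                   'value': [1, 1, 1, 2, 2]})

# Selecting the column first means func is called with a Series per group.
per_column = df.groupby('group')['value'].transform(demean_rename)

expected = pd.Series([-1. / 3., -0.5, -1. / 3., 0.5, 2. / 3.], name='value')
# Compare with a tolerance, since the demeaned values are floats.
print(np.allclose(per_column.to_numpy(), expected.to_numpy()))
```

This is only a way to check the per-column behavior described above; it is not the fix for the DataFrame path.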

The problem is present on master. I already have a fix: the bug is in how df.groupby(...).transform(func) joins the results from the slow path and the fast path. I will put up a PR with the fix shortly.
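As a rough illustration of how joining results with mismatched column labels can produce this NaN pattern, the sketch below concatenates two hand-built per-group frames and aligns them to the original column name. It does not reproduce the pandas internals; the frames first and second and the alignment via reindex are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical per-group results: group 'a' keeps the original label,
# group 'b' comes back with the renamed label.
first = pd.DataFrame({'value': [-1. / 3., -1. / 3., 2. / 3.]}, index=[0, 2, 4])
second = pd.DataFrame({'value_demeaned': [-0.5, 0.5]}, index=[1, 3])

# Concatenating aligns on column labels, so each frame has gaps
# in the column that it does not define.
joined = pd.concat([first, second]).sort_index()

# Aligning to the original column drops 'value_demeaned' and leaves
# NaN in the rows that came from the renamed group.
aligned = joined.reindex(columns=['value'])
print(aligned)
```

The rows that come from the renamed frame (index 1 and 3) end up as NaN, which matches the positions of the NaN values in the failing assertion.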

DiegoAlbertoTorres pushed a commit to DiegoAlbertoTorres/pandas that referenced this issue Nov 2, 2018
@jreback jreback added this to the 0.24.0 milestone Nov 3, 2018
DiegoAlbertoTorres pushed a commit to DiegoAlbertoTorres/pandas that referenced this issue Nov 5, 2018
DiegoAlbertoTorres pushed a commit to DiegoAlbertoTorres/pandas that referenced this issue Nov 6, 2018