Skip to content

ENH:column-wise DataFrame.fillna and duplicated DataFrame.fillna with Series and Dict #30922

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

proost
Copy link
Contributor

@proost proost commented Jan 11, 2020

description:
Access "DataFrame" each column, fills up NA values using "Series.fillna"
and what i found is if dataframe is duplicated, ".fillna" not guarantee filling NA. So i change them can fill NA.

Q. Why access column base not index base?
A. problem of Index base f".illna" is not preserve column's dtype. so i use column base.

Q. Why assign in column new values not use inplace=True ?
A. To avoid chained indexing. If column and index are both duplicated, chained indexing happens, i want to avoid this situation.

@proost proost force-pushed the enh-column-wise-fillna branch from 2aa3738 to cc7037e Compare January 12, 2020 15:14
"with dict/Series column "
"by column"
)
for label in self.columns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing by label like typically causes issues when users have duplicate label names - have you tested that case by chance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd
I didn't. Thanks for pointing out. i change the logic and add test cases.

@gfyoung gfyoung added Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jan 15, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from 1b4a89e to ad97507 Compare January 19, 2020 12:14
@proost proost changed the title ENH:column-wise DataFrame.fillna with Series and Dict ENH:column-wise DataFrame.fillna and duplicated DataFrame.fillna with Series and Dict Jan 19, 2020
@proost proost force-pushed the enh-column-wise-fillna branch 4 times, most recently from 6f79400 to 6060212 Compare January 24, 2020 14:35
[[100, 100, 3], [100, 5, 100], [7, 200, 200]],
columns=list("ABB"),
index=[0, 0, 1],
dtype="float64",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be integers no?

Copy link
Contributor Author

@proost proost Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd
There may be disagreement, what i intended is keeping data type same before. Although column's type in df is float because of "np.nan", anyway each column's in df is float. So if "fillna" changes data type, it means "fillna" fills NA and also changes data type. I think this is not right and if user want to change data types, then it is user's share.
Nevertheless if you think this must be integers, i will change it.
For that reason, "downcast" parameter is.

@proost proost force-pushed the enh-column-wise-fillna branch from 6bc8fe2 to 8a25980 Compare February 21, 2020 13:53
@pep8speaks
Copy link

pep8speaks commented Feb 21, 2020

Hello @proost! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-10 17:20:16 UTC

@proost proost force-pushed the enh-column-wise-fillna branch 2 times, most recently from fde8087 to 8346780 Compare February 23, 2020 14:28
proost added a commit to proost/pandas that referenced this pull request Feb 23, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from 8346780 to ec012e9 Compare February 23, 2020 14:46

if axis == 1:
result = result.T

return result if not inplace else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make any sense if we have transposed

rather do

self._update_inplace(result._data)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
Okay i change logic. but two ".T" operations need. because what i try to do is axis==1 and axis==0 both can handle. current ".fillna" can't handle duplicated columns.

# disable this for now
with pytest.raises(NotImplementedError, match="column by column"):
df.fillna(df.max(1), axis=1)
expected = DataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a new test

Copy link
Contributor Author

@proost proost Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
I'm confused. you mean

  1. just delete those part of codes which fails to pass test and make a new test
  2. change this "test_fillna_dict_series" as a new test

give me more specific what you wrote.

proost added a commit to proost/pandas that referenced this pull request Feb 26, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from ec012e9 to 258659f Compare February 26, 2020 17:09
proost added a commit to proost/pandas that referenced this pull request Feb 26, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from 258659f to a3245f9 Compare February 26, 2020 17:12
@@ -6005,20 +6005,25 @@ def fillna(
)

elif isinstance(value, (dict, ABCSeries)):
new_data = self.copy()
Copy link
Member

@WillAyd WillAyd Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy / transpose are both potentially very expensive operations - is there a way to do this without requiring both of those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd
Okay. copy conditional. remove transpose.

obj = result[k]
obj.fillna(v, limit=limit, inplace=True, downcast=downcast)
return result if not inplace else None
new_data.iloc[:, i] = new_data.iloc[:, i].fillna(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rather build up a list of the results here in a list comprehension

proost added a commit to proost/pandas that referenced this pull request Mar 18, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from a3245f9 to 621d2a7 Compare March 18, 2020 13:54
proost added a commit to proost/pandas that referenced this pull request Apr 8, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from 5e5d272 to 36bfd78 Compare April 8, 2020 15:40
proost added a commit to proost/pandas that referenced this pull request Apr 12, 2020
@proost proost force-pushed the enh-column-wise-fillna branch 2 times, most recently from 387ca34 to 3763e31 Compare April 14, 2020 03:19
proost added a commit to proost/pandas that referenced this pull request Apr 17, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from f66afef to c434366 Compare April 17, 2020 10:36
proost added a commit to proost/pandas that referenced this pull request Apr 17, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from 0c64e73 to 5b3c363 Compare April 17, 2020 17:03
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls merge master as well

"by column"
)
for i, item in enumerate(temp_data.items()):
label, content = item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make sense with the axis here; you are updating the same column whether axis==0 or 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
Yes. but, filled value is different whether axis==0 or 1. And 'downcast' works properly when execute column-based.

proost added a commit to proost/pandas that referenced this pull request Jun 17, 2020
@proost proost force-pushed the enh-column-wise-fillna branch from de6d4f7 to 587f619 Compare June 17, 2020 11:48
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you merge master and move the release note

@proost proost force-pushed the enh-column-wise-fillna branch from 587f619 to 38f3657 Compare August 10, 2020 17:20
@WillAyd
Copy link
Member

WillAyd commented Sep 10, 2020

@proost can you fix merge conflict and try to fix CI failure?

@arw2019
Copy link
Member

arw2019 commented Nov 21, 2020

@proost is this still active? If yes can you merge master and fix the CI failure?

@arw2019
Copy link
Member

arw2019 commented Dec 8, 2020

Closing in favor of #38352

@arw2019 arw2019 closed this Dec 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

column-wise fillna with Series/dict NotImplemented
7 participants