Skip to content

BUG: Series.map is ignoring the na_action keyword #46588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
shortorian opened this issue Mar 31, 2022 · 9 comments · Fixed by #46860
Closed
2 of 3 tasks

BUG: Series.map is ignoring the na_action keyword #46588

shortorian opened this issue Mar 31, 2022 · 9 comments · Fixed by #46860
Assignees
Milestone

Comments

@shortorian
Copy link

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
a = pd.Series([pd.NA, 1, 1, pd.NA])
a.map({1:2, pd.NA:3}, na_action=True)

Issue Description

This code should throw an error because na_action is invalid, but instead I've gotten a Jupyter notebook, ipython in Powershell, and an online interpreter to generate the following output

0    3
1    2
2    2
3    3
dtype: int64

I get the same output for na_action='ignore' and na_action=None

Expected Behavior

ValueError: na_action must either be 'ignore' or None, True was passed

Installed Versions

INSTALLED VERSIONS ------------------ commit : 06d2301 python : 3.9.12.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.22000 machine : AMD64 processor : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252

pandas : 1.4.1
numpy : 1.22.3
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.28
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

@shortorian shortorian added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 31, 2022
@mctessi
Copy link

mctessi commented Mar 31, 2022

take

@mctessi
Copy link

mctessi commented Mar 31, 2022

@shortorian
Can confirm this behavior! The code path taken in this example is not checking for ValueError in na_action and ignores the value completely. I will add a na_action check and try to rectify the behavior regarding the na_action value!

@mctessi
Copy link

mctessi commented Apr 1, 2022

@shortorian
And thanks slot for bringing up the issue! I will file a PR in the next couple of days...
Regards and happy coding

@simonjayhawkins
Copy link
Member

Thanks @shortorian for the report.

Expected Behavior

ValueError: na_action must either be 'ignore' or None, True was passed

The allowed values are in the api documentation but not adverse to adding some run-time validation.

Note that pd.Series([None, 1, 1, None], dtype=object).map({1: 2, None: 3}, na_action="ignore") and pd.Series([None, 1, 1, None]).map({1: 2, np.nan: 3}, na_action="ignore") also replaces so the issue also occurs with None and np.nan as well as pd.NA

from the documentation

na_action{None, ‘ignore’}, default None
If ‘ignore’, propagate NaN values, without passing them to the mapping correspondence.

we would perhaps expect this to work with at least the np.nan case but I think the intent of the na_action keyword is a convenience to allow the user to pass functions that don't handle missing values rather than to ignore mappings explicitly given. (My assumption here could be wrong, so welcome the opinions of others)

So IMO I think perhaps the documentation could also be updated to explain that the na_action keyword only applies if arg is a callable, and perhaps validation for this too.

@simonjayhawkins simonjayhawkins added Docs and removed Needs Triage Issue that has not been reviewed by a pandas team member Bug labels Apr 5, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Apr 5, 2022
@shortorian
Copy link
Author

That's interesting, I would never have interpreted the documentation that way ("mapping correspondence" seemed to specifically indicate something more expansive than a function to me). I ran into this because I expected to be able to use na_action='ignore' to get output that doesn't have NaNs in it, so for

a = pd.Series([pd.NA, 1, 1, pd.NA], na_action='ignore')
a.map({1:2})

I wanted output

1    2
2    2
dtype: int64

I realize now that I misunderstood what was going on - Series.map shouldn't have a return value that's a different length than the series. I guess I should do Series.map().dropna() instead.

I would definitely appreciate a rewrite on the documentation indicating this is only applicable for callable arg. Might be overkill, but would it make sense to raise an exception if a user passes na_action when arg is not callable?

@simonjayhawkins
Copy link
Member

That's interesting, I would never have interpreted the documentation that way ("mapping correspondence" seemed to specifically indicate something more expansive than a function to me)

agreed, but looking some more it appears that the docstring for the lower level function _map_values is

        na_action : {None, 'ignore'}
            If 'ignore', propagate NA values, without passing them to the
            mapping function

and the code there does not handle na_action for the dict-like and Series mapper types, so looking more likely that the docstring for Series.map needs to be updated to match this.

Interestingly, there is some validation in there too,

                msg = (
                    "na_action must either be 'ignore' or None, "
                    f"{na_action} was passed"
                )
                raise ValueError(msg)

so does make sense to also have the run-time validation for the dict-like (and Series) case as you suggested in the OP.

@stanleycai95
Copy link
Contributor

take

@stanleycai95
Copy link
Contributor

Hey everyone, if it's OK, I'm planning to update _map_values to check if na_action is in [None, 'ignore'] for Series and dict too. As na_action is not handled for Series and dict, I will also update the docstring to mention this.

I'm a first-time contributor haha.

@stanleycai95
Copy link
Contributor

I added a check for na_action in (None, 'ignore'). I didn't update the docstring because Series.map is working as expected with na_action='ignore'.

@jreback jreback modified the milestones: Contributions Welcome, 1.5 Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants