API/ENH: Accept nan-likes in StringArray constructor #40839
Labels
Enhancement
NA - MaskedArrays
Related to pd.NA and nullable extension arrays
Strings
String extension data type and string data
Removing milestone for now, can add back later.
This was referenced Dec 24, 2021
@lithomas1 created a pull request for this, and it looks like it was very close to complete, but I think it has since been closed for inactivity. Is there a chance this issue will be reopened at some point? I could really use this feature.
Is your feature request related to a problem?
Currently, StringArray can only be instantiated directly from an ndarray of strings, with missing values represented by pd.NA. The only way to instantiate a StringArray with other missing value indicators (like np.nan and None) is to use pandas.array, which has the side effect of casting non-string elements to strings instead of raising an error.
The proposed solution would allow StringArray instantiation from a numpy array containing np.nan/None without casting non-strings. This is useful if you want the StringArray constructor to validate that inputs are strings while also accepting missing values other than pd.NA. At the very least, it should support np.nan, since StringArray is created from a numpy array and np.nan is the missing value indicator for numpy.
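A minimal sketch of the situation described above. It builds a StringArray from an object ndarray that uses pd.NA, then tries the same data with np.nan and None. Whether the second call raises depends on the installed pandas version, so the sketch records the outcome rather than assuming one.

```python
import numpy as np
import pandas as pd

# Object ndarray of strings with pd.NA as the missing-value marker:
# the input form the issue describes as accepted by the constructor.
with_na = np.array(["a", pd.NA, "c"], dtype=object)
arr = pd.arrays.StringArray(with_na)

# The same data, but with np.nan and None as missing-value markers.
with_nan = np.array(["a", np.nan, None], dtype=object)
try:
    pd.arrays.StringArray(with_nan)
    outcome = "accepted"
except ValueError as exc:
    # The behavior the issue reports: nan-likes other than pd.NA are rejected.
    outcome = f"rejected: {exc}"

print(arr.isna().tolist())
print(outcome)
```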
Describe the solution you'd like
Either accept nan-likes in the constructor directly (a breaking change), or add a parameter to the constructor that allows other NA values, perhaps something like the na_values parameter of read_csv.
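To make the na_values idea concrete, here is a rough user-side sketch. The helper `string_array_with_na` and its `na_values` parameter are hypothetical and not part of pandas. It assumes one-dimensional input, replaces the listed nan-likes with pd.NA, and raises on any other non-string instead of casting it.

```python
import numpy as np
import pandas as pd


def _is_na_marker(value, na_values):
    """Return True if value matches one of the configured NA markers."""
    for marker in na_values:
        if value is marker:
            return True
        # NaN != NaN, so float NaNs are compared with isnan instead of ==.
        if (isinstance(value, float) and isinstance(marker, float)
                and np.isnan(value) and np.isnan(marker)):
            return True
    return False


def string_array_with_na(values, na_values=(None, np.nan)):
    """Hypothetical helper (not a pandas API): build a StringArray,
    mapping the given NA markers to pd.NA and rejecting other non-strings."""
    out = np.array(values, dtype=object)  # copies, so the input is untouched
    for i, value in enumerate(out):
        if value is pd.NA or isinstance(value, str):
            continue
        if _is_na_marker(value, na_values):
            out[i] = pd.NA
        else:
            raise ValueError(f"non-string element at position {i}: {value!r}")
    return pd.arrays.StringArray(out)


result = string_array_with_na(["a", None, np.nan, "d"])
print(result.isna().tolist())
```

Because this helper checks every element and the StringArray constructor then checks them again, it also shows the double validation cost that a built-in option would avoid.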
API breaking implications
Either breaking change or new parameter.
Describe alternatives you've considered
You would have to validate the input yourself before constructing the array. StringArray would then validate it a second time, which is bad for performance.
cc @jorisvandenbossche