-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
API: Allow other na values in StringArray Constructor #45168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
ba5fce1
8011f8d
36ad886
1b27993
3ddcf37
4b2ee88
4c60d0f
d7b5d4b
b13317e
a2d27ca
d5e594d
9c6e9d3
1fca524
9016c00
6ec2059
af8ece1
1fb424c
ad55cd3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -246,11 +246,18 @@ class StringArray(BaseStringArray, PandasArray): | |
.. warning:: | ||
|
||
Currently, this expects an object-dtype ndarray | ||
where the elements are Python strings or :attr:`pandas.NA`. | ||
where the elements are Python strings | ||
or nan-likes (``None``, ``np.nan``, ``NA``). | ||
This may change without warning in the future. Use | ||
:meth:`pandas.array` with ``dtype="string"`` for a stable way of | ||
creating a `StringArray` from any sequence. | ||
|
||
.. versionchanged:: 1.5.0 | ||
|
||
StringArray now accepts array-likes containing | ||
nan-likes(``None``, ``np.nan``) for the ``values`` parameter | ||
in addition to strings and :attr:`pandas.NA` | ||
|
||
copy : bool, default False | ||
Whether to copy the array of data. | ||
|
||
|
@@ -310,11 +317,11 @@ def __init__(self, values, copy=False): | |
values = extract_array(values) | ||
|
||
super().__init__(values, copy=copy) | ||
if not isinstance(values, type(self)): | ||
self._validate() | ||
# error: Incompatible types in assignment (expression has type "StringDtype", | ||
# variable has type "PandasDtype") | ||
NDArrayBacked.__init__(self, self._ndarray, StringDtype(storage="python")) | ||
if not isinstance(values, type(self)): | ||
self._validate() | ||
|
||
def _validate(self): | ||
"""Validate that we only store NA or strings.""" | ||
|
@@ -325,6 +332,12 @@ def _validate(self): | |
"StringArray requires a sequence of strings or pandas.NA. Got " | ||
f"'{self._ndarray.dtype}' dtype instead." | ||
) | ||
# Check to see if need to convert Na values to pd.NA | ||
if self._ndarray.ndim > 2: | ||
# Ravel if ndims > 2 b/c no cythonized version available | ||
lib.convert_nans_to_NA(self._ndarray.ravel("K")) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. isnt the ravel unnecessary bc convert_nans_to_NA supports 2D? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is for > 2D (e.g. 3d stuff). IIUC, pandas doesn't support > 2D but the fancy indexing tests that check for an error in the 3D case, expect that a 3D StringArray creates correctly for some reason only to raise later in the series constructor. Previously failing tests here: https://github.com/pandas-dev/pandas/runs/4697417795?check_suite_focus=true There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we just always ravel and then reshape? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for all ndim There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
risks making copies There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. kk, don't luv the ndim > 3 here at all (i get your reasoning) but as this will be refactored would be good to simplify. |
||
else: | ||
lib.convert_nans_to_NA(self._ndarray) | ||
|
||
@classmethod | ||
def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy=False): | ||
|
Uh oh!
There was an error while loading. Please reload this page.