ENH: Implement Kleene logic for BooleanArray #29842
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
import numpy as np | ||
|
||
import pandas as pd | ||
|
||
|
||
class TimeLogicalOps: | ||
def setup(self): | ||
N = 10_000 | ||
left, right, lmask, rmask = np.random.randint(0, 2, size=(4, N)).astype("bool") | ||
self.left = pd.arrays.BooleanArray(left, lmask) | ||
self.right = pd.arrays.BooleanArray(right, rmask) | ||
|
||
def time_or_scalar(self): | ||
self.left | True | ||
self.left | False | ||
|
||
def time_or_array(self): | ||
self.left | self.right | ||
|
||
def time_and_scalar(self): | ||
self.left & True | ||
self.left & False | ||
|
||
def time_and_array(self): | ||
self.left & self.right | ||
|
||
def time_xor_scalar(self): | ||
self.left ^ True | ||
self.left ^ False | ||
|
||
def time_xor_array(self): | ||
self.left ^ self.right |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
.. currentmodule:: pandas | ||
|
||
.. ipython:: python | ||
:suppress: | ||
|
||
import pandas as pd | ||
import numpy as np | ||
|
||
.. _boolean: | ||
|
||
************************** | ||
Nullable Boolean Data Type | ||
************************** | ||
|
||
.. versionadded:: 1.0.0 | ||
|
||
.. _boolean.kleene: | ||
|
||
Kleene Logical Operations | ||
------------------------- | ||
|
||
:class:`arrays.BooleanArray` implements `Kleene Logic`_ (sometimes called three-valued logic) for | ||
logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or). | ||
|
||
This table demonstrates the results for every combination. These operations are symmetrical, | ||
so flipping the left- and right-hand side makes no difference in the result. | ||
|
||
================= ========= | ||
Expression Result | ||
================= ========= | ||
``True & True`` ``True`` | ||
``True & False`` ``False`` | ||
``True & NA`` ``NA`` | ||
``False & False`` ``False`` | ||
``False & NA`` ``False`` | ||
``NA & NA`` ``NA`` | ||
``True | True`` ``True`` | ||
``True | False`` ``True`` | ||
``True | NA`` ``True`` | ||
``False | False`` ``False`` | ||
``False | NA`` ``NA`` | ||
``NA | NA`` ``NA`` | ||
``True ^ True`` ``False`` | ||
``True ^ False`` ``True`` | ||
``True ^ NA`` ``NA`` | ||
``False ^ False`` ``False`` | ||
``False ^ NA`` ``NA`` | ||
``NA ^ NA`` ``NA`` | ||
================= ========= | ||
|
||
When an ``NA`` is present in an operation, the output value is ``NA`` only if | ||
the result cannot be determined solely based on the other input. For example, | ||
``True | NA`` is ``True``, because both ``True | True`` and ``True | False`` | ||
are ``True``. In that case, we don't actually need to consider the value | ||
of the ``NA``. | ||
|
||
On the other hand, ``True & NA`` is ``NA``. The result depends on whether | ||
the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``, | ||
but ``True & False`` is ``False``, so we can't determine the output. | ||
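The rule described above (the result is ``NA`` only when the known operand does not settle it) can be checked mechanically. The following is a minimal pure-Python sketch, not the pandas implementation: `NA` here is a plain `None` stand-in for `pandas.NA`, and `kleene` is a hypothetical helper that substitutes both possible values for every unknown operand and reports a definite answer only when all substitutions agree.

```python
import operator
from itertools import combinations_with_replacement, product

NA = None  # stand-in for pandas.NA; an assumption made for this sketch only


def kleene(op, left, right):
    """Evaluate ``op`` by trying every value an unknown operand could take."""
    left_options = (True, False) if left is NA else (left,)
    right_options = (True, False) if right is NA else (right,)
    outcomes = {op(a, b) for a, b in product(left_options, right_options)}
    # A single distinct outcome means the unknown value did not matter.
    return outcomes.pop() if len(outcomes) == 1 else NA


def show(value):
    """Render the stand-in the way the table in the docs spells it."""
    return "NA" if value is NA else str(value)


# Print one row per unordered operand pair, for each operator.
for op, symbol in ((operator.and_, "&"), (operator.or_, "|"), (operator.xor, "^")):
    for left, right in combinations_with_replacement((True, False, NA), 2):
        print(f"{show(left)} {symbol} {show(right)} -> {show(kleene(op, left, right))}")
```

Only unordered pairs are printed because the operations are symmetrical, so the output has the same 18 rows as the table.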
|
||
|
||
This differs from how ``np.nan`` behaves in logical operations. pandas treats | ||
``np.nan`` as *always false in the output*. | ||
|
||
|
||
In ``or`` | ||
|
||
.. ipython:: python | ||
|
||
pd.Series([True, False, np.nan], dtype="object") | True | ||
pd.Series([True, False, np.nan], dtype="boolean") | True | ||
|
||
In ``and`` | ||
|
||
.. ipython:: python | ||
|
||
pd.Series([True, False, np.nan], dtype="object") & True | ||
pd.Series([True, False, np.nan], dtype="boolean") & True | ||
|
||
.. _Kleene Logic: https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -184,6 +184,9 @@ class BooleanArray(ExtensionArray, ExtensionOpsMixin): | |
represented by 2 numpy arrays: a boolean array with the data and | ||
a boolean array with the mask (True indicating missing). | ||
|
||
BooleanArray implements Kleene logic (sometimes called three-valued | ||
logic) for logical operations. See :ref:`boolean.kleene` for more. | ||
|
||
To construct a BooleanArray from generic array-like input, use | ||
:func:`pandas.array` specifying ``dtype="boolean"`` (see examples | ||
below). | ||
|
@@ -283,7 +286,7 @@ def __getitem__(self, item): | |
|
||
def _coerce_to_ndarray(self, dtype=None, na_value: "Scalar" = libmissing.NA): | ||
""" | ||
Coerce to an ndarary of object dtype or bool dtype (if force_bool=True). | ||
Coerce to an ndarray of object dtype or bool dtype (if force_bool=True). | ||
|
||
Parameters | ||
---------- | ||
|
@@ -565,33 +568,40 @@ def logical_method(self, other): | |
# Rely on pandas to unbox and dispatch to us. | ||
return NotImplemented | ||
|
||
assert op.__name__ in {"or_", "ror_", "and_", "rand_", "xor", "rxor"} | ||
other = lib.item_from_zerodim(other) | ||
other_is_booleanarray = isinstance(other, BooleanArray) | ||
other_is_scalar = lib.is_scalar(other) | ||
mask = None | ||
|
||
if isinstance(other, BooleanArray): | ||
if other_is_booleanarray: | ||
other, mask = other._data, other._mask | ||
elif is_list_like(other): | ||
other = np.asarray(other, dtype="bool") | ||
if other.ndim > 1: | ||
raise NotImplementedError( | ||
"can only perform ops with 1-d structures" | ||
) | ||
if len(self) != len(other): | ||
raise ValueError("Lengths must match to compare") | ||
other, mask = coerce_to_array(other, copy=False) | ||
elif isinstance(other, np.bool_): | ||
other = other.item() | ||
Review comment: this is to convert to a python bool? why not just
Review comment: But Tom, why is it exactly needed to convert this? I would think the numpy operations later on work fine with a numpy scalar as well?
Review comment: IIRC, we do things like
||
|
||
if other_is_scalar and not (other is libmissing.NA or lib.is_bool(other)): | ||
raise TypeError( | ||
"'other' should be pandas.NA or a bool. Got {} instead.".format( | ||
type(other).__name__ | ||
) | ||
) | ||
|
||
# numpy will show a DeprecationWarning on invalid elementwise | ||
# comparisons, this will raise in the future | ||
with warnings.catch_warnings(): | ||
warnings.filterwarnings("ignore", "elementwise", FutureWarning) | ||
with np.errstate(all="ignore"): | ||
result = op(self._data, other) | ||
if not other_is_scalar and len(self) != len(other): | ||
raise ValueError("Lengths must match to compare") | ||
|
||
# nans propagate | ||
if mask is None: | ||
mask = self._mask | ||
else: | ||
mask = self._mask | mask | ||
if op.__name__ in {"or_", "ror_"}: | ||
result, mask = ops.kleene_or(self._data, other, self._mask, mask) | ||
elif op.__name__ in {"and_", "rand_"}: | ||
result, mask = ops.kleene_and(self._data, other, self._mask, mask) | ||
elif op.__name__ in {"xor", "rxor"}: | ||
result, mask = ops.kleene_xor(self._data, other, self._mask, mask) | ||
|
||
return BooleanArray(result, mask) | ||
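The diff dispatches to `ops.kleene_or`, `ops.kleene_and` and `ops.kleene_xor`, each returning a `(result, mask)` pair, but their bodies are not shown here. As a rough illustration of how such helpers could work on the data/mask representation (mask `True` meaning missing), here is a sketch using NumPy. The `*_sketch` functions are hypothetical, handle array operands only (no scalars, no `None` mask), and should not be read as the pandas implementation.

```python
import numpy as np


def _as_bool(*arrays):
    """Convert every input to a boolean ndarray."""
    return [np.asarray(a, dtype=bool) for a in arrays]


def kleene_or_sketch(left, right, left_mask, right_mask):
    left, right, left_mask, right_mask = _as_bool(left, right, left_mask, right_mask)
    # True wherever at least one side is a known True.
    result = (left & ~left_mask) | (right & ~right_mask)
    # Missing only if some side is missing and no known True settles it.
    mask = (left_mask | right_mask) & ~result
    return result, mask


def kleene_and_sketch(left, right, left_mask, right_mask):
    left, right, left_mask, right_mask = _as_bool(left, right, left_mask, right_mask)
    # True only where both sides are known True.
    result = (left & ~left_mask) & (right & ~right_mask)
    # A known False on either side settles the answer as False.
    known_false = (~left & ~left_mask) | (~right & ~right_mask)
    mask = (left_mask | right_mask) & ~known_false
    return result, mask


def kleene_xor_sketch(left, right, left_mask, right_mask):
    left, right, left_mask, right_mask = _as_bool(left, right, left_mask, right_mask)
    # xor can never be settled by one operand, so any missing side propagates.
    mask = left_mask | right_mask
    result = (left ^ right) & ~mask
    return result, mask
```

In this sketch the data value under a masked position is normalised to `False`; that is a choice made for readability, and the source does not say what pandas stores there.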
|
||
|