
Commit 0176f6e

tommyod authored and jreback committed
DOC: Spellcheck of gotchas.rst (FAQ page) (#19747)
1 parent 0ffc4b5 commit 0176f6e

File tree

2 files changed: +60, -55 lines changed


ci/lint.sh

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@ if [ "$LINT" ]; then
         RET=1
     fi
     echo "Check for deprecated messages without sphinx directive DONE"
+
 else
     echo "NOT Linting"
 fi

doc/source/gotchas.rst

Lines changed: 59 additions & 55 deletions
@@ -22,22 +22,22 @@ Frequently Asked Questions (FAQ)
 
 DataFrame memory usage
 ----------------------
-The memory usage of a dataframe (including the index)
-is shown when accessing the ``info`` method of a dataframe. A
-configuration option, ``display.memory_usage`` (see :ref:`options`),
-specifies if the dataframe's memory usage will be displayed when
-invoking the ``df.info()`` method.
+The memory usage of a ``DataFrame`` (including the index) is shown when calling
+the :meth:`~DataFrame.info`. A configuration option, ``display.memory_usage``
+(see :ref:`the list of options <options.available>`), specifies if the
+``DataFrame``'s memory usage will be displayed when invoking the ``df.info()``
+method.
 
-For example, the memory usage of the dataframe below is shown
-when calling ``df.info()``:
+For example, the memory usage of the ``DataFrame`` below is shown
+when calling :meth:`~DataFrame.info`:
 
 .. ipython:: python
 
    dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
              'complex128', 'object', 'bool']
    n = 5000
-   data = dict([ (t, np.random.randint(100, size=n).astype(t))
-                for t in dtypes])
+   data = dict([(t, np.random.randint(100, size=n).astype(t))
+                for t in dtypes])
    df = pd.DataFrame(data)
    df['categorical'] = df['object'].astype('category')
 
@@ -48,7 +48,7 @@ pandas does not count the memory used by values in columns with
 ``dtype=object``.
 
 Passing ``memory_usage='deep'`` will enable a more accurate memory usage report,
-that accounts for the full usage of the contained objects. This is optional
+accounting for the full usage of the contained objects. This is optional
 as it can be expensive to do this deeper introspection.
 
 .. ipython:: python
@@ -58,11 +58,11 @@ as it can be expensive to do this deeper introspection.
 By default the display option is set to ``True`` but can be explicitly
 overridden by passing the ``memory_usage`` argument when invoking ``df.info()``.
 
-The memory usage of each column can be found by calling the ``memory_usage``
-method. This returns a Series with an index represented by column names
-and memory usage of each column shown in bytes. For the dataframe above,
-the memory usage of each column and the total memory usage of the
-dataframe can be found with the memory_usage method:
+The memory usage of each column can be found by calling the
+:meth:`~DataFrame.memory_usage` method. This returns a ``Series`` with an index
+represented by column names and memory usage of each column shown in bytes. For
+the ``DataFrame`` above, the memory usage of each column and the total memory
+usage can be found with the ``memory_usage`` method:
 
 .. ipython:: python
 
@@ -71,18 +71,18 @@ dataframe can be found with the memory_usage method:
    # total memory usage of dataframe
    df.memory_usage().sum()
 
-By default the memory usage of the dataframe's index is shown in the
-returned Series, the memory usage of the index can be suppressed by passing
+By default the memory usage of the ``DataFrame``'s index is shown in the
+returned ``Series``, the memory usage of the index can be suppressed by passing
 the ``index=False`` argument:
 
 .. ipython:: python
 
    df.memory_usage(index=False)
 
-The memory usage displayed by the ``info`` method utilizes the
-``memory_usage`` method to determine the memory usage of a dataframe
-while also formatting the output in human-readable units (base-2
-representation; i.e., 1KB = 1024 bytes).
+The memory usage displayed by the :meth:`~DataFrame.info` method utilizes the
+:meth:`~DataFrame.memory_usage` method to determine the memory usage of a
+``DataFrame`` while also formatting the output in human-readable units (base-2
+representation; i.e. 1KB = 1024 bytes).
 
 See also :ref:`Categorical Memory Usage <categorical.memory>`.
 

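The ``memory_usage`` behavior the corrected passage describes can be exercised directly; a minimal runnable sketch (the frame and column names are invented for illustration, not taken from the commit):

```python
import numpy as np
import pandas as pd

# A small frame with an object column, whose "deep" size differs from
# the shallow (pointer-only) estimate.
df = pd.DataFrame({
    'ints': np.arange(5000, dtype='int64'),
    'strings': pd.Series(['some string'] * 5000, dtype=object),
})

# Per-column usage in bytes; the index appears as its own 'Index' row.
shallow = df.memory_usage()          # counts object pointers only
deep = df.memory_usage(deep=True)    # introspects the contained strings

# Deep accounting is never smaller than the shallow estimate.
assert deep.sum() >= shallow.sum()

# index=False drops the Index row from the result.
assert 'Index' not in df.memory_usage(index=False).index
```

Passing ``memory_usage='deep'`` to ``df.info()`` applies the same deep accounting to the human-readable summary.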
@@ -91,17 +91,18 @@ See also :ref:`Categorical Memory Usage <categorical.memory>`.
 Using If/Truth Statements with pandas
 -------------------------------------
 
-pandas follows the NumPy convention of raising an error when you try to convert something to a ``bool``.
-This happens in a ``if`` or when using the boolean operations, ``and``, ``or``, or ``not``. It is not clear
-what the result of
+pandas follows the NumPy convention of raising an error when you try to convert
+something to a ``bool``. This happens in an ``if``-statement or when using the
+boolean operations: ``and``, ``or``, and ``not``. It is not clear what the result
+of the following code should be:
 
 .. code-block:: python
 
    >>> if pd.Series([False, True, False]):
        ...
 
-should be. Should it be ``True`` because it's not zero-length? ``False`` because there are ``False`` values?
-It is unclear, so instead, pandas raises a ``ValueError``:
+Should it be ``True`` because it's not zero-length, or ``False`` because there
+are ``False`` values? It is unclear, so instead, pandas raises a ``ValueError``:
 
 .. code-block:: python
 
@@ -111,9 +112,9 @@ It is unclear, so instead, pandas raises a ``ValueError``:
        ...
    ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
 
-
-If you see that, you need to explicitly choose what you want to do with it (e.g., use `any()`, `all()` or `empty`).
-or, you might want to compare if the pandas object is ``None``
+You need to explicitly choose what you want to do with the ``DataFrame``, e.g.
+use :meth:`~DataFrame.any`, :meth:`~DataFrame.all` or :meth:`~DataFrame.empty`.
+Alternatively, you might want to compare if the pandas object is ``None``:
 
 .. code-block:: python
 
@@ -122,15 +123,16 @@ or, you might want to compare if the pandas object is ``None``
    >>> I was not None
 
 
-or return if ``any`` value is ``True``.
+Below is how to check if any of the values are ``True``:
 
 .. code-block:: python
 
    >>> if pd.Series([False, True, False]).any():
           print("I am any")
    >>> I am any
 
-To evaluate single-element pandas objects in a boolean context, use the method ``.bool()``:
+To evaluate single-element pandas objects in a boolean context, use the method
+:meth:`~DataFrame.bool`:
 
 .. ipython:: python
 
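The ambiguity described in the rewritten paragraphs above can be demonstrated directly; a small sketch of the explicit reductions the passage recommends:

```python
import pandas as pd

s = pd.Series([False, True, False])

# Evaluating the Series in a boolean context raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

assert raised         # the implicit conversion was rejected
assert s.any()        # at least one value is True
assert not s.all()    # not every value is True
assert not s.empty    # the Series is not zero-length
```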
@@ -161,25 +163,25 @@ See :ref:`boolean comparisons<basics.compare>` for more examples.
 Using the ``in`` operator
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Using the Python ``in`` operator on a Series tests for membership in the
+Using the Python ``in`` operator on a ``Series`` tests for membership in the
 index, not membership among the values.
 
-.. ipython::
+.. ipython:: python
 
    s = pd.Series(range(5), index=list('abcde'))
    2 in s
    'b' in s
 
 If this behavior is surprising, keep in mind that using ``in`` on a Python
-dictionary tests keys, not values, and Series are dict-like.
-To test for membership in the values, use the method :func:`~pandas.Series.isin`:
+dictionary tests keys, not values, and ``Series`` are dict-like.
+To test for membership in the values, use the method :meth:`~pandas.Series.isin`:
 
-.. ipython::
+.. ipython:: python
 
    s.isin([2])
    s.isin([2]).any()
 
-For DataFrames, likewise, ``in`` applies to the column axis,
+For ``DataFrames``, likewise, ``in`` applies to the column axis,
 testing for membership in the list of column names.
 
 ``NaN``, Integer ``NA`` values and ``NA`` type promotions
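The ``in``-versus-``isin`` distinction covered in the hunk above can be checked as follows; a minimal sketch:

```python
import pandas as pd

s = pd.Series(range(5), index=list('abcde'))

# ``in`` checks the index (the dict-like "keys"), not the values.
assert 'b' in s
assert 2 not in s          # 2 is a value here, not an index label

# Value membership goes through isin.
assert s.isin([2]).any()

# For a DataFrame, ``in`` checks column labels.
df = pd.DataFrame({'x': [1], 'y': [2]})
assert 'x' in df
assert 1 not in df
```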
@@ -189,12 +191,12 @@ Choice of ``NA`` representation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 For lack of ``NA`` (missing) support from the ground up in NumPy and Python in
-general, we were given the difficult choice between either
+general, we were given the difficult choice between either:
 
 - A *masked array* solution: an array of data and an array of boolean values
-  indicating whether a value is there or is missing
+  indicating whether a value is there or is missing.
 - Using a special sentinel value, bit pattern, or set of sentinel values to
-  denote ``NA`` across the dtypes
+  denote ``NA`` across the dtypes.
 
 For many reasons we chose the latter. After years of production use it has
 proven, at least in my opinion, to be the best decision given the state of
@@ -226,15 +228,16 @@ arrays. For example:
    s2.dtype
 
 This trade-off is made largely for memory and performance reasons, and also so
-that the resulting Series continues to be "numeric". One possibility is to use
-``dtype=object`` arrays instead.
+that the resulting ``Series`` continues to be "numeric". One possibility is to
+use ``dtype=object`` arrays instead.
 
 ``NA`` type promotions
 ~~~~~~~~~~~~~~~~~~~~~~
 
-When introducing NAs into an existing Series or DataFrame via ``reindex`` or
-some other means, boolean and integer types will be promoted to a different
-dtype in order to store the NAs. These are summarized by this table:
+When introducing NAs into an existing ``Series`` or ``DataFrame`` via
+:meth:`~Series.reindex` or some other means, boolean and integer types will be
+promoted to a different dtype in order to store the NAs. The promotions are
+summarized in this table:
 
 .. csv-table::
    :header: "Typeclass","Promotion dtype for storing NAs"
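The promotion rules this hunk describes can be observed with ``reindex``; a minimal sketch (integer promotes to float, boolean to object, while float already holds ``NaN`` natively):

```python
import numpy as np
import pandas as pd

# Integer Series: a reindex that introduces a missing label forces float64.
ints = pd.Series([1, 2, 3], index=list('abc'))
reindexed = ints.reindex(list('abcd'))   # label 'd' is missing -> NaN
assert reindexed.dtype == np.dtype('float64')

# Boolean Series: NAs promote the dtype to object.
bools = pd.Series([True, False], index=list('ab'))
assert bools.reindex(list('abc')).dtype == np.dtype('object')

# Float Series can hold NaN natively, so the dtype is unchanged.
floats = pd.Series([1.0, 2.0], index=list('ab'))
assert floats.reindex(list('abc')).dtype == np.dtype('float64')
```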
@@ -289,19 +292,19 @@ integer arrays to floating when NAs must be introduced.
 
 Differences with NumPy
 ----------------------
-For Series and DataFrame objects, ``var`` normalizes by ``N-1`` to produce
-unbiased estimates of the sample variance, while NumPy's ``var`` normalizes
-by N, which measures the variance of the sample. Note that ``cov``
-normalizes by ``N-1`` in both pandas and NumPy.
+For ``Series`` and ``DataFrame`` objects, :meth:`~DataFrame.var` normalizes by
+``N-1`` to produce unbiased estimates of the sample variance, while NumPy's
+``var`` normalizes by N, which measures the variance of the sample. Note that
+:meth:`~DataFrame.cov` normalizes by ``N-1`` in both pandas and NumPy.
 
 
 Thread-safety
 -------------
 
 As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to
-the ``DataFrame.copy`` method. If you are doing a lot of copying of DataFrame
-objects shared among threads, we recommend holding locks inside the threads
-where the data copying occurs.
+the :meth:`~DataFrame.copy` method. If you are doing a lot of copying of
+``DataFrame`` objects shared among threads, we recommend holding locks inside
+the threads where the data copying occurs.
 
 See `this link <https://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe>`__
 for more information.
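The ``N-1`` versus ``N`` normalization difference in the hunk above can be verified numerically; a minimal sketch:

```python
import numpy as np
import pandas as pd

data = [1.0, 2.0, 3.0, 4.0]
s = pd.Series(data)

# pandas defaults to ddof=1 (sample variance, N-1);
# NumPy defaults to ddof=0 (population variance, N).
assert np.isclose(s.var(), np.var(data, ddof=1))
assert np.isclose(s.var(ddof=0), np.var(data))

# cov normalizes by N-1 in both libraries.
x = pd.Series([1.0, 2.0, 3.0])
y = pd.Series([2.0, 4.0, 6.0])
assert np.isclose(x.cov(y), np.cov(x, y)[0, 1])
```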
@@ -310,7 +313,8 @@ for more information.
 Byte-Ordering Issues
 --------------------
 Occasionally you may have to deal with data that were created on a machine with
-a different byte order than the one on which you are running Python. A common symptom of this issue is an error like
+a different byte order than the one on which you are running Python. A common
+symptom of this issue is an error like:
 
 .. code-block:: python
@@ -320,8 +324,8 @@ a different byte order than the one on which you are running Python. A common sy
 
 To deal
 with this issue you should convert the underlying NumPy array to the native
-system byte order *before* passing it to Series/DataFrame/Panel constructors
-using something similar to the following:
+system byte order *before* passing it to ``Series`` or ``DataFrame``
+constructors using something similar to the following:
 
 .. ipython:: python

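The byte-swapping recipe the final hunk points at can be sketched as follows (this is an illustrative sketch, not the exact snippet from the docs; ``dtype.newbyteorder()`` relabels the dtype after the bytes are swapped, so the values are preserved):

```python
import numpy as np
import pandas as pd

# An array in big-endian byte order, as might come from another machine.
big_endian = np.array([1, 2, 3], dtype='>i4')

# Swap the raw bytes, then view with the flipped dtype label so the
# logical values stay the same while the storage becomes native order.
native = big_endian.byteswap().view(big_endian.dtype.newbyteorder())

s = pd.Series(native)
assert list(s) == [1, 2, 3]
```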