@@ -22,22 +22,22 @@ Frequently Asked Questions (FAQ)
 
 DataFrame memory usage
 ----------------------
-The memory usage of a dataframe (including the index)
-is shown when accessing the ``info`` method of a dataframe. A
-configuration option, ``display.memory_usage`` (see :ref:`options`),
-specifies if the dataframe's memory usage will be displayed when
-invoking the ``df.info()`` method.
+The memory usage of a ``DataFrame`` (including the index) is shown when calling
+the :meth:`~DataFrame.info`. A configuration option, ``display.memory_usage``
+(see :ref:`the list of options <options.available>`), specifies if the
+``DataFrame``'s memory usage will be displayed when invoking the ``df.info()``
+method.
 
-For example, the memory usage of the dataframe below is shown
-when calling ``df.info()``:
+For example, the memory usage of the ``DataFrame`` below is shown
+when calling :meth:`~DataFrame.info`:
 
 .. ipython:: python
 
     dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
               'complex128', 'object', 'bool']
     n = 5000
-    data = dict([ (t, np.random.randint(100, size=n).astype(t))
-            for t in dtypes])
+    data = dict([(t, np.random.randint(100, size=n).astype(t))
+                 for t in dtypes])
 
     df = pd.DataFrame(data)
     df['categorical'] = df['object'].astype('category')
@@ -48,7 +48,7 @@ pandas does not count the memory used by values in columns with
 ``dtype=object``.
 
 Passing ``memory_usage='deep'`` will enable a more accurate memory usage report,
-that accounts for the full usage of the contained objects. This is optional
+accounting for the full usage of the contained objects. This is optional
 as it can be expensive to do this deeper introspection.
 
 .. ipython:: python
@@ -58,11 +58,11 @@ as it can be expensive to do this deeper introspection.
 By default the display option is set to ``True`` but can be explicitly
 overridden by passing the ``memory_usage`` argument when invoking ``df.info()``.
 
-The memory usage of each column can be found by calling the ``memory_usage``
-method. This returns a Series with an index represented by column names
-and memory usage of each column shown in bytes. For the dataframe above,
-the memory usage of each column and the total memory usage of the
-dataframe can be found with the memory_usage method:
+The memory usage of each column can be found by calling the
+:meth:`~DataFrame.memory_usage` method. This returns a ``Series`` with an index
+represented by column names and memory usage of each column shown in bytes. For
+the ``DataFrame`` above, the memory usage of each column and the total memory
+usage can be found with the ``memory_usage`` method:
 
 .. ipython:: python
 
@@ -71,18 +71,18 @@ dataframe can be found with the memory_usage method:
     # total memory usage of dataframe
     df.memory_usage().sum()
 
-By default the memory usage of the dataframe's index is shown in the
-returned Series, the memory usage of the index can be suppressed by passing
+By default the memory usage of the ``DataFrame``'s index is shown in the
+returned ``Series``, the memory usage of the index can be suppressed by passing
 the ``index=False`` argument:
 
 .. ipython:: python
 
     df.memory_usage(index=False)
 
-The memory usage displayed by the ``info`` method utilizes the
-``memory_usage`` method to determine the memory usage of a dataframe
-while also formatting the output in human-readable units (base-2
-representation; i.e., 1KB = 1024 bytes).
+The memory usage displayed by the :meth:`~DataFrame.info` method utilizes the
+:meth:`~DataFrame.memory_usage` method to determine the memory usage of a
+``DataFrame`` while also formatting the output in human-readable units (base-2
+representation; i.e. 1KB = 1024 bytes).
 
 See also :ref:`Categorical Memory Usage <categorical.memory>`.
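As an illustrative aside to the hunk above (not part of the patch), here is a minimal sketch of the shallow vs. ``deep`` accounting that ``memory_usage`` performs; the column names and values are assumptions chosen for the example:

```python
import numpy as np
import pandas as pd

# A small frame with an object column, whose per-value memory is only
# counted when deep introspection is requested.
df = pd.DataFrame({
    "ints": np.arange(3, dtype="int64"),
    "strs": pd.Series(["a", "bb", "ccc"], dtype="object"),
})

# Shallow report: the object column is counted as pointer-sized slots.
shallow = df.memory_usage(index=False)

# Deep report: also counts the memory of the contained Python strings.
deep = df.memory_usage(index=False, deep=True)

# Deep accounting can only grow the reported usage of the object column,
# while fixed-width dtypes are unaffected.
assert deep["strs"] >= shallow["strs"]
assert deep["ints"] == shallow["ints"]
```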
@@ -91,17 +91,18 @@ See also :ref:`Categorical Memory Usage <categorical.memory>`.
 
 Using If/Truth Statements with pandas
 -------------------------------------
 
-pandas follows the NumPy convention of raising an error when you try to convert something to a ``bool``.
-This happens in a ``if`` or when using the boolean operations, ``and``, ``or``, or ``not``. It is not clear
-what the result of
+pandas follows the NumPy convention of raising an error when you try to convert
+something to a ``bool``. This happens in an ``if``-statement or when using the
+boolean operations: ``and``, ``or``, and ``not``. It is not clear what the result
+of the following code should be:
 
 .. code-block:: python
 
     >>> if pd.Series([False, True, False]):
         ...
 
-should be. Should it be ``True`` because it's not zero-length? ``False`` because there are ``False`` values?
-It is unclear, so instead, pandas raises a ``ValueError``:
+Should it be ``True`` because it's not zero-length, or ``False`` because there
+are ``False`` values? It is unclear, so instead, pandas raises a ``ValueError``:
 
 .. code-block:: python
@@ -111,9 +112,9 @@ It is unclear, so instead, pandas raises a ``ValueError``:
         ...
     ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
 
-
-If you see that, you need to explicitly choose what you want to do with it (e.g., use `any()`, `all()` or `empty`).
-or, you might want to compare if the pandas object is ``None``
+You need to explicitly choose what you want to do with the ``DataFrame``, e.g.
+use :meth:`~DataFrame.any`, :meth:`~DataFrame.all` or :meth:`~DataFrame.empty`.
+Alternatively, you might want to compare if the pandas object is ``None``:
 
 .. code-block:: python
@@ -122,15 +123,16 @@ or, you might want to compare if the pandas object is ``None``
     >>> I was not None
 
 
-or return if ``any`` value is ``True``.
+Below is how to check if any of the values are ``True``:
 
 .. code-block:: python
 
     >>> if pd.Series([False, True, False]).any():
         print("I am any")
     >>> I am any
 
-To evaluate single-element pandas objects in a boolean context, use the method ``.bool()``:
+To evaluate single-element pandas objects in a boolean context, use the method
+:meth:`~DataFrame.bool`:
 
 .. ipython:: python
 
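For context on the truth-value hunks above, a minimal runnable sketch of the ambiguity and the explicit alternatives (the sample values are assumptions):

```python
import pandas as pd

s = pd.Series([False, True, False])

# Coercing the Series to a single bool is ambiguous, so pandas raises.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

# The explicit reductions each answer a well-defined question instead.
any_true = s.any()   # is at least one element True?
all_true = s.all()   # is every element True?
is_empty = s.empty   # does the Series have zero elements?
```

Each reduction replaces the ambiguous coercion with one specific question, which is the point the patched prose makes.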
@@ -161,25 +163,25 @@ See :ref:`boolean comparisons<basics.compare>` for more examples.
 Using the ``in`` operator
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Using the Python ``in`` operator on a Series tests for membership in the
+Using the Python ``in`` operator on a ``Series`` tests for membership in the
 index, not membership among the values.
 
-.. ipython::
+.. ipython:: python
 
     s = pd.Series(range(5), index=list('abcde'))
     2 in s
     'b' in s
 
 If this behavior is surprising, keep in mind that using ``in`` on a Python
-dictionary tests keys, not values, and Series are dict-like.
-To test for membership in the values, use the method :func:`~pandas.Series.isin`:
+dictionary tests keys, not values, and ``Series`` are dict-like.
+To test for membership in the values, use the method :meth:`~pandas.Series.isin`:
 
-.. ipython::
+.. ipython:: python
 
     s.isin([2])
     s.isin([2]).any()
 
-For DataFrames, likewise, ``in`` applies to the column axis,
+For ``DataFrames``, likewise, ``in`` applies to the column axis,
 testing for membership in the list of column names.
 
 ``NaN``, Integer ``NA`` values and ``NA`` type promotions
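The ``in`` behavior this hunk documents can be demonstrated end to end; a minimal sketch (the index labels and column names are assumptions for illustration):

```python
import pandas as pd

s = pd.Series(range(5), index=list("abcde"))

# ``in`` checks the index (dict-like keys), not the values.
label_hit = "b" in s    # 'b' is an index label
value_hit = 2 in s      # 2 is a value, not a label, so this is False

# To test against the values, use Series.isin.
value_member = s.isin([2]).any()

# On a DataFrame, ``in`` applies to the column axis.
df = pd.DataFrame({"x": [1], "y": [2]})
col_hit = "x" in df
```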
@@ -189,12 +191,12 @@ Choice of ``NA`` representation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 For lack of ``NA`` (missing) support from the ground up in NumPy and Python in
-general, we were given the difficult choice between either
+general, we were given the difficult choice between either:
 
 - A *masked array* solution: an array of data and an array of boolean values
-  indicating whether a value is there or is missing
+  indicating whether a value is there or is missing.
 - Using a special sentinel value, bit pattern, or set of sentinel values to
-  denote ``NA`` across the dtypes
+  denote ``NA`` across the dtypes.
 
 For many reasons we chose the latter. After years of production use it has
 proven, at least in my opinion, to be the best decision given the state of
@@ -226,15 +228,16 @@ arrays. For example:
     s2.dtype
 
 This trade-off is made largely for memory and performance reasons, and also so
-that the resulting Series continues to be "numeric". One possibility is to use
-``dtype=object`` arrays instead.
+that the resulting ``Series`` continues to be "numeric". One possibility is to
+use ``dtype=object`` arrays instead.
 
 ``NA`` type promotions
 ~~~~~~~~~~~~~~~~~~~~~~
 
-When introducing NAs into an existing Series or DataFrame via ``reindex`` or
-some other means, boolean and integer types will be promoted to a different
-dtype in order to store the NAs. These are summarized by this table:
+When introducing NAs into an existing ``Series`` or ``DataFrame`` via
+:meth:`~Series.reindex` or some other means, boolean and integer types will be
+promoted to a different dtype in order to store the NAs. The promotions are
+summarized in this table:
 
 .. csv-table::
     :header: "Typeclass","Promotion dtype for storing NAs"
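A short sketch of the promotion behavior the hunk above documents; the reindex labels here are assumptions chosen to force a missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
assert s.dtype == np.dtype("int64")

# Reindexing with a label that is absent introduces NaN, so the integer
# Series must be promoted to float64 to hold the missing value.
s2 = s.reindex(["a", "b", "c", "d"])
```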
@@ -289,19 +292,19 @@ integer arrays to floating when NAs must be introduced.
 
 Differences with NumPy
 ----------------------
-For Series and DataFrame objects, ``var`` normalizes by ``N-1`` to produce
-unbiased estimates of the sample variance, while NumPy's ``var`` normalizes
-by N, which measures the variance of the sample. Note that ``cov``
-normalizes by ``N-1`` in both pandas and NumPy.
+For ``Series`` and ``DataFrame`` objects, :meth:`~DataFrame.var` normalizes by
+``N-1`` to produce unbiased estimates of the sample variance, while NumPy's
+``var`` normalizes by N, which measures the variance of the sample. Note that
+:meth:`~DataFrame.cov` normalizes by ``N-1`` in both pandas and NumPy.
 
 
 Thread-safety
 -------------
 
 As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to
-the ``DataFrame.copy`` method. If you are doing a lot of copying of DataFrame
-objects shared among threads, we recommend holding locks inside the threads
-where the data copying occurs.
+the :meth:`~DataFrame.copy` method. If you are doing a lot of copying of
+``DataFrame`` objects shared among threads, we recommend holding locks inside
+the threads where the data copying occurs.
 
 See `this link <https://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe>`__
 for more information.
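The ``N-1`` vs ``N`` normalization difference in the hunk above can be checked directly by matching the ``ddof`` arguments; a minimal sketch with assumed sample data:

```python
import numpy as np
import pandas as pd

data = [1.0, 2.0, 3.0, 4.0]
s = pd.Series(data)

# pandas defaults to ddof=1 (divide by N-1, unbiased sample variance);
# NumPy defaults to ddof=0 (divide by N).
pandas_var = s.var()
numpy_var = np.var(data)

# The two agree once the ddof arguments are aligned explicitly.
assert np.isclose(pandas_var, np.var(data, ddof=1))
assert np.isclose(numpy_var, s.var(ddof=0))
```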
@@ -310,7 +313,8 @@ for more information.
 Byte-Ordering Issues
 --------------------
 Occasionally you may have to deal with data that were created on a machine with
-a different byte order than the one on which you are running Python. A common symptom of this issue is an error like
+a different byte order than the one on which you are running Python. A common
+symptom of this issue is an error like:
 
 .. code-block:: python
@@ -320,8 +324,8 @@ a different byte order than the one on which you are running Python. A common sy
 
 To deal
 with this issue you should convert the underlying NumPy array to the native
-system byte order *before* passing it to Series/DataFrame/Panel constructors
-using something similar to the following:
+system byte order *before* passing it to ``Series`` or ``DataFrame``
+constructors using something similar to the following:
 
 .. ipython:: python
 
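A minimal sketch of the byte-order conversion this hunk describes; the big-endian input array is a simulated assumption standing in for data produced on a foreign-endian machine:

```python
import numpy as np
import pandas as pd

# Simulate data written on a machine with the opposite byte order:
# explicitly big-endian 64-bit integers.
big_endian = np.array([1, 2, 3], dtype=">i8")

# Convert the values to the native system byte order before handing the
# array to the Series constructor ('=' means native order).
native = big_endian.astype(big_endian.dtype.newbyteorder("="))
s = pd.Series(native)
```

Converting via ``astype`` with a native-order dtype both swaps the bytes and fixes the dtype metadata in one step, so the numeric values survive unchanged.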