Default value for missing values on merge #20007
Comments
you can add This is just adding more complexity to an already very complex |
So I just stumbled into this thread looking for the solution the OP is asking. I have a different case, in which I have a
In this example, One possible workaround would be something like this:
But this feels like something I should not be doing. |
I have a similar issue: working only with string data, the missing values still become NaN, which makes no sense in a string column. My goal is to have None values. The proposed fix .fillna(value=None, downcast='defer') does not work, because fillna thinks the value parameter is missing. |
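One way to get None instead of NaN after the merge is sketched below. It is only a workaround under stated assumptions: the frames and the column names `key` and `name` are invented for illustration, and the column is cast to object dtype first so that it can hold None at all.

```python
import pandas as pd

# Hypothetical frames: "key" and "name" are illustrative names only.
left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 3], "name": ["x", "z"]})

# The left join leaves a missing value in "name" for key == 2.
merged = left.merge(right, on="key", how="left")

# Cast to object so the column can hold None, then replace
# every missing entry with None via a boolean mask.
names = merged["name"].astype(object)
names = names.where(names.notna(), None)

print(names.tolist())
```

The mask-based `where` sidesteps `fillna(value=None)`, which treats None as "no value given".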
Same issue here. Like @GSanchis, I want to left-join (merge) two DataFrames where, in my case, the second one has a single column of ints or strings whose missing values need to be filled with 0 or an empty string, respectively. |
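For reference, the fill-then-cast workaround that this thread keeps coming back to looks roughly like the sketch below. The frames and the column names `key`, `count` and `label` are made up for illustration; the point is only that the integer column passes through float and has to be cast back by hand.

```python
import pandas as pd

# Hypothetical frames: all column names are illustrative only.
left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 3], "count": [10, 30], "label": ["a", "c"]})

merged = left.merge(right, on="key", how="left")
# "count" is now float64, because NaN was introduced for key == 2.
print(merged["count"].dtype)

# Fill per column, then cast the integer column back to its original dtype.
filled = merged.fillna({"count": 0, "label": ""}).astype(
    {"count": right["count"].dtype}
)
print(filled)
```

This restores the values, but the intermediate float column is exactly what a fill value on merge would avoid.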
Hello, this feature is needed: on very large and complex datasets we focus on memory size, and pandas changes column types to float32 (even if you have int8) :( |
A similar issue affects things implemented based on align (#31874) |
+1 Fillna is no solution. |
+1 we need this feature Perhaps something similar to |
Does anyone disagree that this is only a workaround but not a solution? But I am not sure if we need an You can have
Question: Why does not |
It does handle the NA type. NA is the missing value indicator for extension dtypes, e.g. if you are merging
you keep |
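A minimal sketch of what keeping the dtype can look like, assuming the column uses one of the nullable integer extension dtypes (here `Int8`). The frames and the column names `key` and `small` are hypothetical.

```python
import pandas as pd

# Hypothetical frames: "key" and "small" are illustrative names only.
left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 3], "small": pd.array([5, 7], dtype="Int8")})

merged = left.merge(right, on="key", how="left")
# The nullable dtype survives the merge; the gap is <NA>, not NaN.
print(merged["small"].dtype)

# Filling afterwards never goes through float, so a cast back
# to the plain NumPy int8 dtype is safe once no <NA> remains.
filled = merged["small"].fillna(0).astype("int8")
print(filled.tolist())
```

With this approach the fill still happens after the merge, but the values stay integers throughout.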
Thanks. I understand now. But maybe the docs can be improved here? I am not sure about it. Despite explaining the technical details and backgrounds just add a FAQ like section to For newbies it is quite hard to understand the difference between |
We have sections in the user guide explaining the nullable dtypes. Since this is relevant for all functions which may produce missing values, I don't think adding an explanation to merge specifically is desirable. |
Perhaps a reference to the nullable types doc?
And if the answer is that this issue is somehow solved in the nullable types doc, then please update the docs to read: "Yes, we know that lots of functions produce nullable data. We've made a policy decision not to include fill_value for very many cases. If your data consists of categorical values, then perhaps Pandas is not for you." ... if it is indeed the case.
|
If you think the docs can be improved, then pull requests are welcome
Why? What's the issue with categorical values? |
Auto-casting ints to floats destroys the information in categorical values. And 500000000005->500000000005.011111 is no longer a primary key in your database.
It's a classic programming footgun. When nulls are introduced into an integer field, a value that represents missing (or other information representing its missingness) is needed.
On Wed, Sep 8, 2021 at 1:15 AM Marco Edward Gorelli wrote:
> If you think the docs can be improved, then pull requests are welcome
>
> > If your data consists of categorical values, then perhaps Pandas is not for you
>
> Why? What's the issue with categorical values?
|
Please show a minimal reproducible example - this isn't what I'm seeing (though I may be misunderstanding):
In [3]: left = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
   ...: right = pd.DataFrame({"a": [1, 2, 4], "b": 1}, dtype="Int64")
   ...: right['b'] = right['b'].astype('category')
   ...:
   ...: pd.merge(left, right, how="outer")
Out[3]:
   a    b
0  1    1
1  2    1
2  3  NaN
3  4    1
|
This is a reopening of #1836. The suggestion there was to add a parameter to pd.merge, such as fillvalue, whose value would be used instead of NaN for missing values. This isn't simply solved by fillna since adding NaN to columns casts them to float.
#1836 also asked to provide an example where this would be useful. Admittedly, in my case there might be a simpler solution than merge, but anyway.
I have a DataFrame with a single column which is basically an index: it contains distinct numbers. I also have a DataFrame where one column contains some (but not all) values from the same index, while others contain useful data. I want to extend this DataFrame to include all values from the index, filling the other columns with zeros. I do this by calling
and end up with a DataFrame where all columns except for col_index are cast to float.
Output of pd.show_versions()
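For the specific use case described in the issue (extending a frame to cover every index value and filling the rest with zeros), one possible alternative to merge is `reindex` with `fill_value`. This is only a sketch: `col_index` is taken from the issue text, while the frame names and the `value` column are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "value" and both frame names are illustrative only.
index_df = pd.DataFrame({"col_index": [1, 2, 3, 4]})
data_df = pd.DataFrame({"col_index": [2, 4], "value": [20, 40]})

# Reindex on the full set of index values; rows that were absent
# are filled with 0 directly, so no NaN (and no float cast) appears.
extended = (
    data_df.set_index("col_index")
    .reindex(index_df["col_index"], fill_value=0)
    .reset_index()
)

print(extended)
print(extended.dtypes)
```

This covers the single-key, fill-everything case only; it is not a general replacement for a fill value on merge.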