Skip to content

BUG:Sanity check on merge parameters for correct exception #26824 #26855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 28 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
c3064e6
BUG:Sanity check on merge parameters for correct exception #26824
harshit-py Jun 14, 2019
c862ec0
added tests
harshit-py Jun 15, 2019
0df85f0
updated tests for better coverage
harshit-py Jun 15, 2019
dea59fa
requested changes on the test
harshit-py Jun 15, 2019
53aee36
updated whatsnew v0250rst
harshit-py Jun 15, 2019
6396947
requested changes
harshit-py Jun 15, 2019
11b1894
further changes
harshit-py Jun 15, 2019
7b4aa22
updated test
harshit-py Jun 15, 2019
bfb7984
Merge branch 'master' into merge_py_bug
harshit-py Jun 17, 2019
1e2b276
BUG:Sanity check on merge parameters for correct exception #26824
harshit-py Jun 14, 2019
0f88c32
added tests
harshit-py Jun 15, 2019
a9905bd
updated tests for better coverage
harshit-py Jun 15, 2019
a939d8e
requested changes on the test
harshit-py Jun 15, 2019
83f8cda
updated whatsnew v0250rst
harshit-py Jun 15, 2019
962882d
requested changes
harshit-py Jun 15, 2019
139d696
further changes
harshit-py Jun 15, 2019
4e6bd89
updated test
harshit-py Jun 15, 2019
a0680e0
Trying original patch
harshit-py Jun 17, 2019
0a894a9
Revert "Trying original patch"
harshit-py Jun 19, 2019
0cb8843
Trying original patch
harshit-py Jun 19, 2019
4e45c75
Revert "Trying original patch"
harshit-py Jun 19, 2019
500e374
Trying original patch
harshit-py Jun 19, 2019
773dec2
Original patch
harshit-py Jun 19, 2019
7770b1d
further changes
harshit-py Jun 19, 2019
efb0761
Merge remote-tracking branch 'upstream/master' into harshit-py-merge_…
TomAugspurger Jun 25, 2019
6b7f5a2
re-revert
TomAugspurger Jun 25, 2019
b7c482e
Merge pull request #1 from TomAugspurger/harshit-py-merge_py_bug
harshit-py Jun 28, 2019
e4e486c
Merge branch 'master' into PR_TOOL_MERGE_PR_26855
jreback Jun 28, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,6 @@ than :attr:`options.display.max_seq_items` (default: 100 items). Horizontally,
the output will truncate, if it's wider than :attr:`options.display.width`
(default: 80 characters).


.. _whatsnew_0250.enhancements.other:

Other enhancements
Expand All @@ -156,6 +155,7 @@ Other enhancements
- Error message for missing required imports now includes the original import error's text (:issue:`23868`)
- :class:`DatetimeIndex` and :class:`TimedeltaIndex` now have a ``mean`` method (:issue:`24757`)
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
- :func:`pandas.merge` now raises ``ValueError`` when either of ``left_on`` or ``right_on`` is not provided (:issue:`26855`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate slightly; this only matters if on itself is not provided. Do we have a check if on is provide and either of left_on or right_on is not None?

- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
- Added new option ``plotting.backend`` to be able to select a plotting backend different than the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module`` is a library implementing the pandas plotting API (:issue:`14130`)
- :class:`pandas.offsets.BusinessHour` supports multiple opening hours intervals (:issue:`15481`)
Expand Down Expand Up @@ -604,6 +604,7 @@ Other deprecations
Instead, use :meth:`Series.dtype` and :meth:`DataFrame.dtypes` (:issue:`26705`).
- :meth:`Timedelta.resolution` is deprecated and replaced with :meth:`Timedelta.resolution_string`. In a future version, :meth:`Timedelta.resolution` will be changed to behave like the standard library :attr:`timedelta.resolution` (:issue:`21344`)


.. _whatsnew_0250.prior_deprecations:

Removal of prior version deprecations/changes
Expand Down
6 changes: 6 additions & 0 deletions pandas/core/reshape/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -1091,13 +1091,19 @@ def _validate_specification(self):
raise ValueError('len(left_on) must equal the number '
'of levels in the index of "right"')
self.right_on = [None] * n
if not self.right_on:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead try to set

self.left_on = self.left_on or [None] * len( self.right_on)
and same for right_on

before these if clauses; that way don’t need to add additional checks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

raise ValueError('both "right_on" and "left_on" '
'should be passed')
elif self.right_on is not None:
n = len(self.right_on)
if self.left_index:
if len(self.right_on) != self.left.index.nlevels:
raise ValueError('len(right_on) must equal the number '
'of levels in the index of "left"')
self.left_on = [None] * n
if not self.left_on:
raise ValueError('both "right_on" and "left_on" '
'should be passed')
if len(self.right_on) != len(self.left_on):
raise ValueError("len(right_on) must equal len(left_on)")

Expand Down
33 changes: 33 additions & 0 deletions pandas/tests/reshape/merge/test_merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -1026,6 +1026,22 @@ def test_validation(self):
result = merge(left, right, on=['a', 'b'], validate='1:1')
assert_frame_equal(result, expected_multi)

@pytest.mark.parametrize('merge_type', ['left_on', 'right_on'])
def test_missing_on_raises(self, merge_type):
# GH26824
left = DataFrame({
'A': [1, 2, 3, 4, 5, 6],
'B': ['P', 'Q', 'R', 'S', 'T', 'U']
})
right = DataFrame({
'A': [1, 2, 4, 5, 7, 8],
'C': ['L', 'M', 'N', 'O', 'P', 'Q']
})
msg = 'must equal'
kwargs = {merge_type: 'A'}
with pytest.raises(ValueError, match=msg):
pd.merge(left, right, how='left', **kwargs)

def test_merge_two_empty_df_no_division_error(self):
# GH17776, PR #17846
a = pd.DataFrame({'a': [], 'b': [], 'c': []})
Expand Down Expand Up @@ -1763,3 +1779,20 @@ def test_merge_equal_cat_dtypes2():

# Categorical is unordered, so don't check ordering.
tm.assert_frame_equal(result, expected, check_categorical=False)


@pytest.mark.parametrize('merge_type', ['left_on', 'right_on'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this near the test_validation test

def test_missing_on_raises(merge_type):
# GH26824
df1 = DataFrame({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call these left & right

'A': [1, 2, 3, 4, 5, 6],
'B': ['P', 'Q', 'R', 'S', 'T', 'U']
})
df2 = DataFrame({
'A': [1, 2, 4, 5, 7, 8],
'C': ['L', 'M', 'N', 'O', 'P', 'Q']
})
msg = 'must equal'
kwargs = {merge_type: 'A'}
with pytest.raises(ValueError, match=msg):
pd.merge(df1, df2, how='left', **kwargs)