Commit 71ff0f6

[MRG] EHN refactoring of the ratio argument (#413)
1 parent 2e5956a commit 71ff0f6

77 files changed, +3408 -2779 lines changed

doc/api.rst

+1
@@ -205,4 +205,5 @@ Imbalance-learn provides some fast-prototyping tools.
    utils.estimator_checks.check_estimator
    utils.check_neighbors_object
    utils.check_ratio
+   utils.check_sampling_strategy
    utils.hash_X_y

doc/datasets/index.rst

+13-11
@@ -94,29 +94,31 @@ Imbalanced generator
 ====================

 :func:`make_imbalance` turns an original dataset into an imbalanced
-dataset. This behaviour is driven by the parameter ``ratio`` which behave
-similarly to other resampling algorithm. ``ratio`` can be given as a dictionary
-where the key corresponds to the class and the value is the the number of
-samples in the class::
+dataset. This behaviour is driven by the parameter ``sampling_strategy`` which
+behave similarly to other resampling algorithm. ``sampling_strategy`` can be
+given as a dictionary where the key corresponds to the class and the value is
+the number of samples in the class::

   >>> from sklearn.datasets import load_iris
   >>> from imblearn.datasets import make_imbalance
   >>> iris = load_iris()
-  >>> ratio = {0: 20, 1: 30, 2: 40}
-  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target, ratio=ratio)
+  >>> sampling_strategy = {0: 20, 1: 30, 2: 40}
+  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target,
+  ...                               sampling_strategy=sampling_strategy)
   >>> sorted(Counter(y_imb).items())
   [(0, 20), (1, 30), (2, 40)]

 Note that all samples of a class are passed-through if the class is not mentioned
 in the dictionary::

-  >>> ratio = {0: 10}
-  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target, ratio=ratio)
+  >>> sampling_strategy = {0: 10}
+  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target,
+  ...                               sampling_strategy=sampling_strategy)
   >>> sorted(Counter(y_imb).items())
   [(0, 10), (1, 50), (2, 50)]

 Instead of a dictionary, a function can be defined and directly pass to
-``ratio``::
+``sampling_strategy``::

   >>> def ratio_multiplier(y):
   ...     multiplier = {0: 0.5, 1: 0.7, 2: 0.95}
@@ -125,9 +127,9 @@ Instead of a dictionary, a function can be defined and directly pass to
   ...         target_stats[key] = int(value * multiplier[key])
   ...     return target_stats
   >>> X_imb, y_imb = make_imbalance(iris.data, iris.target,
-  ...                               ratio=ratio_multiplier)
+  ...                               sampling_strategy=ratio_multiplier)
   >>> sorted(Counter(y_imb).items())
   [(0, 25), (1, 35), (2, 47)]

 See :ref:`sphx_glr_auto_examples_datasets_plot_make_imbalance.py` and
-:ref:`sphx_glr_auto_examples_plot_ratio_usage.py`.
+:ref:`sphx_glr_auto_examples_plot_sampling_strategy_usage.py`.
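
Reviewer note: the dictionary and callable forms documented in this hunk can be illustrated without installing the library. The sketch below is not imbalanced-learn's implementation. `resolve_targets` is a hypothetical helper, and the two lines of `ratio_multiplier` hidden by the hunk boundary are assumed to count the classes with `Counter`.

```python
from collections import Counter


def ratio_multiplier(y):
    # Per-class multipliers taken from the documentation example.
    multiplier = {0: 0.5, 1: 0.7, 2: 0.95}
    # Assumption: the lines hidden by the hunk boundary count the classes.
    target_stats = Counter(y)
    for key, value in target_stats.items():
        target_stats[key] = int(value * multiplier[key])
    return target_stats


def resolve_targets(y, sampling_strategy):
    """Hypothetical helper: return the requested sample count per class."""
    counts = Counter(y)
    if callable(sampling_strategy):
        requested = sampling_strategy(y)
    else:
        requested = sampling_strategy
    # Classes absent from the request keep all of their samples.
    return {label: requested.get(label, n) for label, n in sorted(counts.items())}


# Same class balance as the iris targets: 50 samples per class.
y = [0] * 50 + [1] * 50 + [2] * 50
print(resolve_targets(y, {0: 10}))           # {0: 10, 1: 50, 2: 50}
print(resolve_targets(y, ratio_multiplier))  # {0: 25, 1: 35, 2: 47}
```

The printed counts match the doctest outputs shown in the hunk, which suggests the pass-through rule and the callable form behave as the documentation describes.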

doc/developers_utils.rst

+4-3
@@ -26,9 +26,10 @@ which accepts arrays, matrices, or sparse matrices as arguments, the following
 should be used when applicable.

 - :func:`check_neighbors_object`: Check the objects is consistent to be a NN.
-- :func:`check_target_type`: Check the target types to be conform to the current samplers.
-- :func:`check_ratio`: Checks ratio for consistent type and return a dictionary
-  containing each targeted class with its corresponding number of pixel.
+- :func:`check_target_type`: Check the target types to be conform to the current sam plers.
+- :func:`check_sampling_strategy`: Checks that sampling target is onsistent with
+  the type and return a dictionary containing each targeted class with its
+  corresponding number of pixel.


 Deprecation

doc/ensemble.rst

+3-2
@@ -92,12 +92,13 @@ output of an :class:`EasyEnsemble` sampler with an ensemble of classifiers
 (i.e. ``BaggingClassifier``). Therefore, :class:`BalancedBaggingClassifier`
 takes the same parameters than the scikit-learn
 ``BaggingClassifier``. Additionally, there is two additional parameters,
-``ratio`` and ``replacement``, as in the :class:`EasyEnsemble` sampler::
+``sampling_strategy`` and ``replacement``, as in the :class:`EasyEnsemble`
+sampler::


   >>> from imblearn.ensemble import BalancedBaggingClassifier
   >>> bbc = BalancedBaggingClassifier(base_estimator=DecisionTreeClassifier(),
-  ...                                 ratio='auto',
+  ...                                 sampling_strategy='auto',
   ...                                 replacement=False,
   ...                                 random_state=0)
   >>> bbc.fit(X_train, y_train) # doctest: +ELLIPSIS

doc/under_sampling.rst

+6-6
@@ -103,7 +103,7 @@ by considering independently each targeted class::
   >>> print(np.vstack({tuple(row) for row in X_resampled}).shape)
   (181, 2)

-See :ref:`sphx_glr_auto_examples_plot_ratio_usage.py`,
+See :ref:`sphx_glr_auto_examples_plot_sampling_strategy_usage.py`.,
 :ref:`sphx_glr_auto_examples_under-sampling_plot_comparison_under_sampling.py`,
 and :ref:`sphx_glr_auto_examples_under-sampling_plot_random_under_sampler.py`.

@@ -214,11 +214,11 @@ the samples of interest in green.
    :scale: 60
    :align: center

-The parameter ``ratio`` control which sample of the link will be removed. For
-instance, the default (i.e., ``ratio='auto'``) will remove the sample from the
-majority class. Both samples from the majority and minority class can be
-removed by setting ``ratio`` to ``'all'``. The figure illustrates this
-behaviour.
+The parameter ``sampling_strategy`` control which sample of the link will be
+removed. For instance, the default (i.e., ``sampling_strategy='auto'``) will
+remove the sample from the majority class. Both samples from the majority and
+minority class can be removed by setting ``sampling_strategy`` to ``'all'``. The
+figure illustrates this behaviour.

 .. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_002.png
    :target: ./auto_examples/under-sampling/plot_illustration_tomek_links.html
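
Reviewer note: the difference between `'auto'` and `'all'` for Tomek links, as described in the second hunk, can be sketched as follows. This is a simplified illustration for the binary case only, not the library's code. `samples_to_remove` is a hypothetical helper and the link pairs are assumed to have been computed already.

```python
from collections import Counter


def samples_to_remove(links, y, sampling_strategy="auto"):
    """Hypothetical helper: pick which samples of each Tomek link to drop.

    links: iterable of (i, j) index pairs, assumed to be precomputed.
    """
    majority = Counter(y).most_common(1)[0][0]
    removed = set()
    for i, j in links:
        for idx in (i, j):
            # 'all' drops both ends of the link; 'auto' only the majority end.
            if sampling_strategy == "all" or y[idx] == majority:
                removed.add(idx)
    return sorted(removed)


y = [0, 0, 0, 0, 1, 1]      # class 0 is the majority class
links = [(0, 4), (3, 5)]    # each link joins one sample of each class
print(samples_to_remove(links, y))         # [0, 3]
print(samples_to_remove(links, y, "all"))  # [0, 3, 4, 5]
```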

doc/whats_new/v0.0.4.rst

+29
@@ -6,6 +6,18 @@ Version 0.4 (under development)
 Changelog
 ---------

+API
+...
+
+- Replace the parameter ``ratio`` by ``sampling_strategy``. :issue:`411` by
+  :user:`Guillaume Lemaitre <glemaitre>`.
+
+- Enable to use a ``float`` with binary classification for
+  ``sampling_strategy``. :issue:`411` by :user:`Guillaume Lemaitre <glemaitre>`.
+
+- Enable to use a ``list`` for the cleaning methods to specify the class to
+  sample. :issue:`411` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 Enhancement
 ...........

@@ -34,3 +46,20 @@ Maintenance

 - Remove deprecated parameters in 0.2 - :issue:`331` by :user:`Guillaume
   Lemaitre <glemaitre>`.
+
+Deprecation
+...........
+
+- Deprecate ``ratio`` in favor of ``sampling_strategy``. :issue:`411` by
+  :user:`Guillaume Lemaitre <glemaitre>`.
+
+- Deprecate the use of a ``dict`` for cleaning methods. a ``list`` should be
+  used. :issue:`411` by :user:`Guillaume Lemaitre <glemaitre>`.
+
+- Deprecate ``random_state`` in :class:`imblearn.under_sampling.NearMiss`,
+  :class:`imblearn.under_sampling.EditedNearestNeighbors`,
+  :class:`imblearn.under_sampling.RepeatedEditedNearestNeighbors`,
+  :class:`imblearn.under_sampling.AllKNN`,
+  :class:`imblearn.under_sampling.NeighbourhoodCleaningRule`,
+  :class:`imblearn.under_sampling.InstanceHardnessThreshold`,
+  :class:`imblearn.under_sampling.CondensedNearestNeighbours`.
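
Reviewer note: a deprecation of this kind is commonly handled with a small shim that still accepts the old keyword while warning about it. The sketch below shows that general pattern with the standard `warnings` module. `resolve_sampling_strategy`, its warning text, and the choice to let `ratio` take precedence are assumptions for illustration, not what this commit implements.

```python
import warnings


def resolve_sampling_strategy(sampling_strategy="auto", ratio=None):
    """Hypothetical shim: prefer ``sampling_strategy``, still accept ``ratio``."""
    if ratio is not None:
        # Assumed wording; the library's actual message may differ.
        warnings.warn(
            "'ratio' is deprecated; use 'sampling_strategy' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return ratio
    return sampling_strategy


# Record warnings so the deprecation can be inspected rather than printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = resolve_sampling_strategy(ratio={0: 10})

print(value)                        # {0: 10}
print(caught[0].category.__name__)  # DeprecationWarning
```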

examples/applications/plot_multi_class_under_sampling.py

+4-3
@@ -29,8 +29,9 @@

 # Create a folder to fetch the dataset
 iris = load_iris()
-X, y = make_imbalance(iris.data, iris.target, ratio={0: 25, 1: 50, 2: 50},
-                      random_state=0)
+X, y = make_imbalance(iris.data, iris.target,
+                      sampling_strategy={0: 25, 1: 50, 2: 50},
+                      random_state=RANDOM_STATE)

 X_train, X_test, y_train, y_test = train_test_split(
     X, y, random_state=RANDOM_STATE)
@@ -39,7 +40,7 @@
 print('Testing target statistics: {}'.format(Counter(y_test)))

 # Create a pipeline
-pipeline = make_pipeline(NearMiss(version=2, random_state=RANDOM_STATE),
+pipeline = make_pipeline(NearMiss(version=2),
                          LinearSVC(random_state=RANDOM_STATE))
 pipeline.fit(X_train, y_train)

examples/datasets/plot_make_imbalance.py

+2-2
@@ -55,12 +55,12 @@ def ratio_func(y, multiplier, minority_class):
 for i, multiplier in enumerate(multipliers, start=1):
     ax = axs[i]

-    X_, y_ = make_imbalance(X, y, ratio=ratio_func,
+    X_, y_ = make_imbalance(X, y, sampling_strategy=ratio_func,
                             **{"multiplier": multiplier,
                                "minority_class": 1})
     ax.scatter(X_[y_ == 0, 0], X_[y_ == 0, 1], label="Class #0", alpha=0.5)
     ax.scatter(X_[y_ == 1, 0], X_[y_ == 1, 1], label="Class #1", alpha=0.5)
-    ax.set_title('ratio = {}'.format(multiplier))
+    ax.set_title('sampling_strategy = {}'.format(multiplier))
     plot_decoration(ax)

 plt.tight_layout()

examples/plot_ratio_usage.py

-134
This file was deleted.
