**doc/python/ml-knn.md** (36 additions, 34 deletions)
````diff
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.1'
-      jupytext_version: 1.1.1
+      format_version: '1.2'
+      jupytext_version: 1.4.2
   kernelspec:
     display_name: Python 3
     language: python
````
````diff
@@ -28,8 +28,8 @@ jupyter:
     language: python
     layout: base
     name: kNN Classification
-    order: 1
-    page_type: example_index
+    order: 2
+    page_type: u-guide
     permalink: python/knn-classification/
     thumbnail: thumbnail/knn-classification.png
 ---
````
````diff
@@ -49,10 +49,11 @@ Using Scikit-learn, we first generate synthetic data that form the shape of a moon.
 
 In the graph, we display all the negative labels as squares, and positive labels as circles. We differentiate the training and test set by adding a dot to the center of test data.
 
+In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
+
 ```python
-import numpy as np
-import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
````diff
@@ -72,13 +73,13 @@ trace_specs = [
 fig = go.Figure(data=[
     go.Scatter(
         x=X[y==label, 0], y=X[y==label, 1],
-        name=f'{split} Split, Label {label}',
+        name=f'{split} Split, Label {label}',
         mode='markers', marker_symbol=marker
     )
     for X, y, label, split, marker in trace_specs
 ])
 fig.update_traces(
-    marker_size=12, marker_line_width=1.5,
+    marker_size=12, marker_line_width=1.5,
     marker_color="lightyellow"
 )
 fig.show()
````
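Editor's note: the hunk above references `trace_specs` without showing how it is built. For reviewing this diff out of context, here is a minimal sketch of the setup the loop assumes, reconstructed from the imports and the loop variables (`X, y, label, split, marker`); the exact noise level, split size, and marker names in the source file may differ.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two-class "moons" dataset, split into train and test sets
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One (data, labels, class, split name, marker symbol) entry per trace:
# squares for label 0, circles for label 1, dotted symbols for test data
trace_specs = [
    [X_train, y_train, 0, 'Train', 'square'],
    [X_train, y_train, 1, 'Train', 'circle'],
    [X_test, y_test, 0, 'Test', 'square-dot'],
    [X_test, y_test, 1, 'Test', 'circle-dot']
]
```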
````diff
@@ -89,12 +90,11 @@ fig.show()
 
 Now, we train the kNN model on the same training data displayed in the previous graph. Then, we predict the confidence score of the model for each of the data points in the test set. We will use shapes to denote the true labels, and the color will indicate the confidence of the model for assigning that score.
 
-Notice that `px.scatter` only require 1 function call to plot both negative and positive labels, and can additionally set a continuous color scale based on the `y_score` output by our kNN model.
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures. Notice that `px.scatter` requires only one function call to plot both negative and positive labels, and can additionally set a continuous color scale based on the `y_score` output by our kNN model.
 
 ```python
-import numpy as np
 import plotly.express as px
-import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
 labels={'symbol': 'label', 'color': 'score of <br>first class'}
````
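Editor's note: the `px.scatter` call described in this hunk sits outside the visible diff context. A hedged sketch of how that figure could be produced, assuming the `X_train`/`X_test` split from the earlier sketch and `k = 15` neighbors (both illustrative, not confirmed by the diff):

```python
import plotly.express as px
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(15)
clf.fit(X_train, y_train)
# Probability assigned to the positive class for each test point
y_score = clf.predict_proba(X_test)[:, 1]

# One call plots both classes; symbol encodes the true label,
# color encodes the model's confidence score
fig = px.scatter(
    X_test, x=0, y=1,
    color=y_score, color_continuous_scale='RdBu',
    symbol=y_test.astype(str),
    symbol_map={'0': 'square-dot', '1': 'circle-dot'},
    labels={'symbol': 'label', 'color': 'score of <br>first class'}
)
fig.update_traces(marker_size=12, marker_line_width=1.5)
fig.show()
```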
````diff
@@ -124,14 +124,15 @@ fig.show()
 
 Just like the previous example, we will first train our kNN model on the training set.
 
-Instead of predicting the conference for the test set, we can predict the confidence map for the entire area that wraps around the dimensions of our dataset. To do this, we use [`np.meshgrid`](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a grid, where the distance between each point is denoted by the `mesh_size` variable.
+Instead of predicting the confidence for the test set, we can predict the confidence map for the entire area that wraps around the dimensions of our dataset. To do this, we use [`np.meshgrid`](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a grid, where the distance between each point is denoted by the `mesh_size` variable.
 
 Then, for each of those points, we will use our model to give a confidence score, and plot it with a [contour plot](https://plotly.com/python/contour-plots/).
 
+In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
+
 ```python
-import numpy as np
-import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
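Editor's note: the grid construction described above is consumed as `xrange`, `yrange`, and `Z` in the next hunk but is not itself visible in the diff. A minimal sketch of the standard pattern; the specific `mesh_size` and `margin` values, and the fitted `clf` and data `X`, are assumptions carried over from the surrounding examples.

```python
import numpy as np

mesh_size = .02
margin = 1

# Create a mesh grid that wraps around the data, spaced mesh_size apart
x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
xrange = np.arange(x_min, x_max, mesh_size)
yrange = np.arange(y_min, y_max, mesh_size)
xx, yy = np.meshgrid(xrange, yrange)

# Score every grid point, then shape the scores back into the 2D grid
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)
```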
````diff
@@ -161,21 +162,20 @@ Z = Z.reshape(xx.shape)
 # Plot the figure
 fig = go.Figure(data=[
     go.Contour(
-        x=xrange,
-        y=yrange,
-        z=Z,
+        x=xrange,
+        y=yrange,
+        z=Z,
         colorscale='RdBu'
-    )
+    )
 ])
 fig.show()
 ```
 
 Now, let's try to combine our `go.Contour` plot with the first scatter plot of our data points, so that we can visually compare the confidence of our model with the true labels.
 
 ```python
-import numpy as np
-import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
````diff
@@ -211,21 +211,21 @@ trace_specs = [
 fig = go.Figure(data=[
     go.Scatter(
         x=X[y==label, 0], y=X[y==label, 1],
-        name=f'{split} Split, Label {label}',
+        name=f'{split} Split, Label {label}',
         mode='markers', marker_symbol=marker
     )
     for X, y, label, split, marker in trace_specs
 ])
 fig.update_traces(
-    marker_size=12, marker_line_width=1.5,
+    marker_size=12, marker_line_width=1.5,
     marker_color="lightyellow"
 )
 
 fig.add_trace(
     go.Contour(
-        x=xrange,
-        y=yrange,
-        z=Z,
+        x=xrange,
+        y=yrange,
+        z=Z,
         showscale=False,
         colorscale='RdBu',
         opacity=0.4,
````
````diff
@@ -240,10 +240,12 @@ fig.show()
 
 It is also possible to visualize the prediction confidence of the model using [heatmaps](https://plotly.com/python/heatmaps/). In this example, you can see how to compute how confident the model is about its prediction at every point in the 2D grid. Here, we define the confidence as the difference between the highest score and the sum of the scores of the other classes, at a certain point.
 
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
 ```python
-import numpy as np
 import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.neighbors import KNeighborsClassifier
````
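Editor's note: the confidence definition in this hunk (highest score minus the summed scores of the other classes) is not shown as code in the visible context. A minimal sketch of one way to compute and display it, assuming the grid (`xx`, `yy`, `xrange`, `yrange`) and fitted `clf` from the earlier examples:

```python
import numpy as np
import plotly.express as px

# Per-class scores at every grid point; each row sums to 1
proba = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])

# Confidence: winning class score minus the sum of all other class scores
top = proba.max(axis=1)
confidence = (top - (proba.sum(axis=1) - top)).reshape(xx.shape)

fig = px.imshow(confidence, x=xrange, y=yrange, color_continuous_scale='RdBu')
fig.show()
```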
**doc/python/ml-pca.md** (11 additions, 7 deletions)
````diff
@@ -29,7 +29,7 @@ jupyter:
     layout: base
     name: PCA Visualization
     order: 4
-    page_type: example_index
+    page_type: u-guide
     permalink: python/pca-visualization/
     thumbnail: thumbnail/ml-pca.png
 ---
````
````diff
@@ -52,6 +52,8 @@ First, let's plot all the features and see how the `species` in the Iris dataset
 
 In our example, we are plotting all 4 features from the Iris dataset, thus we can see how `sepal_width` is compared against `sepal_length`, then against `petal_width`, and so forth. Keep in mind how some pairs of features can more easily separate different species.
 
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
 ```python
 import plotly.express as px
 
````
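Editor's note: the visible context cuts off right after the import, so here is a hedged sketch of the scatter-matrix call the paragraph describes; the feature list mirrors the one that appears later in this diff.

```python
import plotly.express as px

df = px.data.iris()
features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]

# Pairwise scatter plots of all four features, colored by species
fig = px.scatter_matrix(df, dimensions=features, color="species")
fig.show()
```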
````diff
@@ -69,10 +71,12 @@ fig.show()
 
 ### Visualize all the principal components
 
-Now, we apply `PCA` the same dataset, and retrieve **all** the components. We use the same `px.scatter_matrix` trace to display our results, but this time our features are the resulting *principal components*, ordered by how much variance they are able to explain.
+Now, we apply `PCA` to the same dataset, and retrieve **all** the components. We use the same `px.scatter_matrix` trace to display our results, but this time our features are the resulting *principal components*, ordered by how much variance they are able to explain.
 
 The importance of explained variance is demonstrated in the example below. The subplot between PC3 and PC4 is clearly unable to separate each class, whereas the subplot between PC1 and PC2 shows a clear separation between each species.
 
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
 ```python
 import plotly.express as px
 from sklearn.decomposition import PCA
````
````diff
@@ -83,7 +87,7 @@ features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]
 pca = PCA()
 components = pca.fit_transform(df[features])
 labels = {
-    str(i): f"PC {i+1} ({var:.1f}%)"
+    str(i): f"PC {i+1} ({var:.1f}%)"
     for i, var in enumerate(pca.explained_variance_ratio_ * 100)
 }
 
````
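Editor's note: the `labels` dict built in this hunk maps each component index to an axis title annotated with its explained-variance percentage. A sketch of how it would typically feed into `px.scatter_matrix`; the `dimensions` and `color` arguments are assumptions based on the first example, not confirmed by the visible diff.

```python
import plotly.express as px
from sklearn.decomposition import PCA

df = px.data.iris()
features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]

pca = PCA()
components = pca.fit_transform(df[features])
labels = {
    str(i): f"PC {i+1} ({var:.1f}%)"
    for i, var in enumerate(pca.explained_variance_ratio_ * 100)
}

# Axis titles carry the share of variance each component explains
fig = px.scatter_matrix(
    components,
    labels=labels,
    dimensions=range(4),
    color=df["species"]
)
fig.update_traces(diagonal_visible=False)
fig.show()
```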
````diff
@@ -122,7 +126,7 @@ labels = {str(i): f"PC {i+1}" for i in range(n_components)}
@@ -181,9 +185,9 @@ Often, you might be interested in seeing how much variance PCA is able to explain
 With a higher explained variance, you are able to capture more variability in your dataset, which could potentially lead to better performance when training your model. For a more mathematical explanation, see this [Q&A thread](https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained).
````
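Editor's note: to make the claim above concrete, a common companion plot charts the cumulative explained variance against the number of components. A minimal sketch; the Iris features are reused from earlier, and `px.area` is one of several reasonable chart choices.

```python
import numpy as np
import plotly.express as px
from sklearn.decomposition import PCA

df = px.data.iris()
features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]

pca = PCA()
pca.fit(df[features])

# Share of total variance captured by the first k components
exp_var_cumul = np.cumsum(pca.explained_variance_ratio_)

fig = px.area(
    x=range(1, exp_var_cumul.shape[0] + 1),
    y=exp_var_cumul,
    labels={"x": "# Components", "y": "Explained Variance"}
)
fig.show()
```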