Commit caebf49

fixups

1 parent c419b09 commit caebf49

5 files changed: +124 -116 lines changed

doc/python/ml-knn.md
Lines changed: 36 additions & 34 deletions
````diff
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.1'
-      jupytext_version: 1.1.1
+      format_version: '1.2'
+      jupytext_version: 1.4.2
   kernelspec:
     display_name: Python 3
     language: python
````
````diff
@@ -28,8 +28,8 @@ jupyter:
     language: python
     layout: base
     name: kNN Classification
-    order: 1
-    page_type: example_index
+    order: 2
+    page_type: u-guide
     permalink: python/knn-classification/
     thumbnail: thumbnail/knn-classification.png
 ---
````
````diff
@@ -49,10 +49,11 @@ Using Scikit-learn, we first generate synthetic data that form the shape of a mo
 
 In the graph, we display all the negative labels as squares, and positive labels as circles. We differentiate the training and test set by adding a dot to the center of test data.
 
+In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
+
 ```python
-import numpy as np
-import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
````diff
@@ -72,13 +73,13 @@ trace_specs = [
 fig = go.Figure(data=[
     go.Scatter(
         x=X[y==label, 0], y=X[y==label, 1],
-        name=f'{split} Split, Label {label}',
+        name=f'{split} Split, Label {label}',
         mode='markers', marker_symbol=marker
     )
     for X, y, label, split, marker in trace_specs
 ])
 fig.update_traces(
-    marker_size=12, marker_line_width=1.5,
+    marker_size=12, marker_line_width=1.5,
     marker_color="lightyellow"
 )
 fig.show()
````
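
For context, the scatter snippet this hunk touches (the -/+ pairs here differ only in trailing whitespace) can be run on its own. The sketch below assembles it from the diff; the `make_moons` arguments, the split parameters, and the exact `trace_specs` entries are not visible in this commit and are assumptions:

```python
import plotly.graph_objects as go
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Synthetic two-moon data; labels cast to strings so they can be
# compared against the string labels used in trace_specs below
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# One spec per (split, label) pair; test markers get a center dot
trace_specs = [
    [X_train, y_train, '0', 'Train', 'square'],
    [X_train, y_train, '1', 'Train', 'circle'],
    [X_test, y_test, '0', 'Test', 'square-dot'],
    [X_test, y_test, '1', 'Test', 'circle-dot']
]

fig = go.Figure(data=[
    go.Scatter(
        x=X[y==label, 0], y=X[y==label, 1],
        name=f'{split} Split, Label {label}',
        mode='markers', marker_symbol=marker
    )
    for X, y, label, split, marker in trace_specs
])
fig.update_traces(
    marker_size=12, marker_line_width=1.5,
    marker_color="lightyellow"
)
fig.show()
```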
````diff
@@ -89,12 +90,11 @@ fig.show()
 
 Now, we train the kNN model on the same training data displayed in the previous graph. Then, we predict the confidence score of the model for each of the data points in the test set. We will use shapes to denote the true labels, and the color will indicate the confidence of the model for assign that score.
 
-Notice that `px.scatter` only require 1 function call to plot both negative and positive labels, and can additionally set a continuous color scale based on the `y_score` output by our kNN model.
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures. Notice that `px.scatter` only require 1 function call to plot both negative and positive labels, and can additionally set a continuous color scale based on the `y_score` output by our kNN model.
 
 ```python
-import numpy as np
 import plotly.express as px
-import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
````diff
@@ -110,7 +110,7 @@ clf.fit(X_train, y_train)
 y_score = clf.predict_proba(X_test)[:, 1]
 
 fig = px.scatter(
-    X_test, x=0, y=1,
+    X_test, x=0, y=1,
     color=y_score, color_continuous_scale='RdBu',
     symbol=y_test, symbol_map={'0': 'square-dot', '1': 'circle-dot'},
     labels={'symbol': 'label', 'color': 'score of <br>first class'}
````
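
This Plotly Express example is visible almost in full across this and the previous hunk. A minimal runnable sketch, assuming `n_neighbors=15` and the same moons split as above (both are assumptions, not shown in the diff):

```python
import plotly.express as px
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# Fit kNN and take the predicted probability of the positive class
clf = KNeighborsClassifier(15)  # n_neighbors=15 is an assumption
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]

# A single px.scatter call plots both classes and maps y_score to color
fig = px.scatter(
    X_test, x=0, y=1,
    color=y_score, color_continuous_scale='RdBu',
    symbol=y_test, symbol_map={'0': 'square-dot', '1': 'circle-dot'},
    labels={'symbol': 'label', 'color': 'score of <br>first class'}
)
fig.update_traces(marker_size=12, marker_line_width=1.5)
fig.show()
```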
````diff
@@ -124,14 +124,15 @@ fig.show()
 
 Just like the previous example, we will first train our kNN model on the training set.
 
-Instead of predicting the conference for the test set, we can predict the confidence map for the entire area that wraps around the dimensions of our dataset. To do this, we use [`np.meshgrid`](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a grid, where the distance between each point is denoted by the `mesh_size` variable.
+Instead of predicting the conference for the test set, we can predict the confidence map for the entire area that wraps around the dimensions of our dataset. To do this, we use [`np.meshgrid`](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a grid, where the distance between each point is denoted by the `mesh_size` variable.
 
 Then, for each of those points, we will use our model to give a confidence score, and plot it with a [contour plot](https://plotly.com/python/contour-plots/).
 
+In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
+
 ```python
-import numpy as np
-import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
````diff
@@ -161,21 +162,20 @@ Z = Z.reshape(xx.shape)
 # Plot the figure
 fig = go.Figure(data=[
     go.Contour(
-        x=xrange,
-        y=yrange,
-        z=Z,
+        x=xrange,
+        y=yrange,
+        z=Z,
         colorscale='RdBu'
-    )
+    )
 ])
 fig.show()
 ```
 
 Now, let's try to combine our `go.Contour` plot with the first scatter plot of our data points, so that we can visually compare the confidence of our model with the true labels.
 
 ```python
-import numpy as np
-import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.datasets import make_moons
 from sklearn.model_selection import train_test_split
 from sklearn.neighbors import KNeighborsClassifier
````
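
The meshgrid-plus-contour step edited above can be sketched end to end as follows. The `margin` value and `n_neighbors=15` are assumptions; `mesh_size = .02` is borrowed from the heatmap hunk further down:

```python
import plotly.graph_objects as go
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

mesh_size = .02
margin = 0.25  # padding around the data; an assumption

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Grid that wraps around the data, padded by `margin` on every side
x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
xrange = np.arange(x_min, x_max, mesh_size)
yrange = np.arange(y_min, y_max, mesh_size)
xx, yy = np.meshgrid(xrange, yrange)

# Confidence score of the positive class at every grid point
clf = KNeighborsClassifier(15)
clf.fit(X_train, y_train)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)

# Plot the figure
fig = go.Figure(data=[
    go.Contour(
        x=xrange,
        y=yrange,
        z=Z,
        colorscale='RdBu'
    )
])
fig.show()
```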
````diff
@@ -211,21 +211,21 @@ trace_specs = [
 fig = go.Figure(data=[
     go.Scatter(
         x=X[y==label, 0], y=X[y==label, 1],
-        name=f'{split} Split, Label {label}',
+        name=f'{split} Split, Label {label}',
         mode='markers', marker_symbol=marker
     )
     for X, y, label, split, marker in trace_specs
 ])
 fig.update_traces(
-    marker_size=12, marker_line_width=1.5,
+    marker_size=12, marker_line_width=1.5,
     marker_color="lightyellow"
 )
 
 fig.add_trace(
     go.Contour(
-        x=xrange,
-        y=yrange,
-        z=Z,
+        x=xrange,
+        y=yrange,
+        z=Z,
         showscale=False,
         colorscale='RdBu',
         opacity=0.4,
````
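
Combining the two previous sketches gives the contour-under-scatter figure this hunk edits. A self-contained sketch under the same assumptions (trace specs, `margin`, `n_neighbors`); the `hoverinfo='skip'` on the contour is also an assumption:

```python
import plotly.graph_objects as go
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

mesh_size, margin = .02, 0.25
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

xrange = np.arange(X[:, 0].min() - margin, X[:, 0].max() + margin, mesh_size)
yrange = np.arange(X[:, 1].min() - margin, X[:, 1].max() + margin, mesh_size)
xx, yy = np.meshgrid(xrange, yrange)

clf = KNeighborsClassifier(15).fit(X_train, y_train)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

# Scatter of train/test points, one trace per (split, label) pair
trace_specs = [
    [X_train, y_train, 0, 'Train', 'square'],
    [X_train, y_train, 1, 'Train', 'circle'],
    [X_test, y_test, 0, 'Test', 'square-dot'],
    [X_test, y_test, 1, 'Test', 'circle-dot']
]
fig = go.Figure(data=[
    go.Scatter(
        x=X_[y_==label, 0], y=X_[y_==label, 1],
        name=f'{split} Split, Label {label}',
        mode='markers', marker_symbol=marker
    )
    for X_, y_, label, split, marker in trace_specs
])
fig.update_traces(marker_size=12, marker_line_width=1.5,
                  marker_color="lightyellow")

# Semi-transparent confidence contour underneath the points
fig.add_trace(
    go.Contour(
        x=xrange, y=yrange, z=Z,
        showscale=False, colorscale='RdBu', opacity=0.4,
        hoverinfo='skip'
    )
)
fig.show()
```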
````diff
@@ -240,10 +240,12 @@ fig.show()
 
 It is also possible to visualize the prediction confidence of the model using [heatmaps](https://plotly.com/python/heatmaps/). In this example, you can see how to compute how confident the model is about its prediction at every point in the 2D grid. Here, we define the confidence as the difference between the highest score and the score of the other classes summed, at a certain point.
 
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
 ```python
-import numpy as np
 import plotly.express as px
 import plotly.graph_objects as go
+import numpy as np
 from sklearn.neighbors import KNeighborsClassifier
 
 mesh_size = .02
````
````diff
@@ -275,21 +277,21 @@ diff = proba.max(axis=-1) - (proba.sum(axis=-1) - proba.max(axis=-1))
 
 fig = px.scatter(
     df_test, x='sepal_length', y='sepal_width',
-    symbol='species',
+    symbol='species',
     symbol_map={
-        'setosa': 'square-dot',
-        'versicolor': 'circle-dot',
+        'setosa': 'square-dot',
+        'versicolor': 'circle-dot',
         'virginica': 'diamond-dot'},
 )
 fig.update_traces(
-    marker_size=12, marker_line_width=1.5,
+    marker_size=12, marker_line_width=1.5,
     marker_color="lightyellow"
 )
 fig.add_trace(
     go.Heatmap(
-        x=lrange,
-        y=wrange,
-        z=diff,
+        x=lrange,
+        y=wrange,
+        z=diff,
         opacity=0.25,
         customdata=proba,
         colorscale='RdBu',
````
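
The `df_test`, `lrange`, `wrange`, `proba`, and the `diff` confidence formula all appear in the diff context above; how they are built does not. A sketch assuming the Iris dataset from `px.data.iris()`, a 75/25 split, and `n_neighbors=15` (all assumptions):

```python
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

mesh_size = .02
margin = 1  # an assumption

# Iris data; the test split is plotted over the heatmap
df = px.data.iris()
df_train = df.sample(frac=0.75, random_state=0)
df_test = df.drop(df_train.index)

X_train = df_train[['sepal_length', 'sepal_width']].values
y_train = df_train.species

# Grid over the two plotted features
lrange = np.arange(df.sepal_length.min() - margin,
                   df.sepal_length.max() + margin, mesh_size)
wrange = np.arange(df.sepal_width.min() - margin,
                   df.sepal_width.max() + margin, mesh_size)
ll, ww = np.meshgrid(lrange, wrange)

clf = KNeighborsClassifier(15)
clf.fit(X_train, y_train)
proba = clf.predict_proba(np.c_[ll.ravel(), ww.ravel()])
proba = proba.reshape(ll.shape[0], ll.shape[1], 3)

# Confidence: top class score minus the summed scores of the other classes
diff = proba.max(axis=-1) - (proba.sum(axis=-1) - proba.max(axis=-1))

fig = px.scatter(
    df_test, x='sepal_length', y='sepal_width',
    symbol='species',
    symbol_map={
        'setosa': 'square-dot',
        'versicolor': 'circle-dot',
        'virginica': 'diamond-dot'},
)
fig.update_traces(marker_size=12, marker_line_width=1.5,
                  marker_color="lightyellow")
fig.add_trace(
    go.Heatmap(
        x=lrange, y=wrange, z=diff,
        opacity=0.25, customdata=proba, colorscale='RdBu'
    )
)
fig.show()
```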

doc/python/ml-pca.md
Lines changed: 11 additions & 7 deletions
````diff
@@ -29,7 +29,7 @@ jupyter:
     layout: base
     name: PCA Visualization
     order: 4
-    page_type: example_index
+    page_type: u-guide
     permalink: python/pca-visualization/
     thumbnail: thumbnail/ml-pca.png
 ---
````
````diff
@@ -52,6 +52,8 @@ First, let's plot all the features and see how the `species` in the Iris dataset
 
 In our example, we are plotting all 4 features from the Iris dataset, thus we can see how `sepal_width` is compared against `sepal_length`, then against `petal_width`, and so forth. Keep in mind how some pairs of features can more easily separate different species.
 
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
 ```python
 import plotly.express as px
 
````
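The hunk above shows only the import and the surrounding prose. Based on that description, a plausible sketch of the scatter-matrix example (the `px.data.iris()` load and the call arguments are assumptions):

```python
import plotly.express as px

# All four Iris features plotted pairwise, colored by species
df = px.data.iris()
features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]

fig = px.scatter_matrix(df, dimensions=features, color="species")
fig.show()
```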

````diff
@@ -69,10 +71,12 @@ fig.show()
 
 ### Visualize all the principal components
 
-Now, we apply `PCA` the same dataset, and retrieve **all** the components. We use the same `px.scatter_matrix` trace to display our results, but this time our features are the resulting *principal components*, ordered by how much variance they are able to explain.
+Now, we apply `PCA` the same dataset, and retrieve **all** the components. We use the same `px.scatter_matrix` trace to display our results, but this time our features are the resulting *principal components*, ordered by how much variance they are able to explain.
 
 The importance of explained variance is demonstrated in the example below. The subplot between PC3 and PC4 is clearly unable to separate each class, whereas the subplot between PC1 and PC2 shows a clear separation between each species.
 
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
 ```python
 import plotly.express as px
 from sklearn.decomposition import PCA
````
````diff
@@ -83,7 +87,7 @@ features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]
 pca = PCA()
 components = pca.fit_transform(df[features])
 labels = {
-    str(i): f"PC {i+1} ({var:.1f}%)"
+    str(i): f"PC {i+1} ({var:.1f}%)"
     for i, var in enumerate(pca.explained_variance_ratio_ * 100)
 }
 
````
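The PCA fit and the variance-labelled axis dictionary are visible in the hunk above; the `px.scatter_matrix` call that consumes them is not. A runnable sketch (the `dimensions=range(4)` and `diagonal_visible=False` choices are assumptions):

```python
import plotly.express as px
from sklearn.decomposition import PCA

df = px.data.iris()
features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]

# Fit PCA on all four features and keep every component
pca = PCA()
components = pca.fit_transform(df[features])

# Label each axis with the share of variance it explains
labels = {
    str(i): f"PC {i+1} ({var:.1f}%)"
    for i, var in enumerate(pca.explained_variance_ratio_ * 100)
}

fig = px.scatter_matrix(
    components,
    labels=labels,
    dimensions=range(4),
    color=df["species"]
)
fig.update_traces(diagonal_visible=False)
fig.show()
```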

````diff
@@ -122,7 +126,7 @@ labels = {str(i): f"PC {i+1}" for i in range(n_components)}
 labels['color'] = 'Median Price'
 
 fig = px.scatter_matrix(
-    components,
+    components,
     color=boston.target,
     dimensions=range(n_components),
     labels=labels,
````
````diff
@@ -167,7 +171,7 @@ components = pca.fit_transform(X)
 total_var = pca.explained_variance_ratio_.sum() * 100
 
 fig = px.scatter_3d(
-    components, x=0, y=1, z=2, color=df['species'],
+    components, x=0, y=1, z=2, color=df['species'],
     title=f'Total Explained Variance: {total_var:.2f}%',
     labels={'0': 'PC 1', '1': 'PC 2', '2': 'PC 3'}
 )
````
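
The 3D projection edited here (a whitespace-only change) is nearly complete in the diff; only the definition of `X` is missing. A sketch assuming `X` is the four Iris features:

```python
import plotly.express as px
from sklearn.decomposition import PCA

df = px.data.iris()
X = df[["sepal_width", "sepal_length", "petal_width", "petal_length"]]

# Project onto the first three principal components
pca = PCA(n_components=3)
components = pca.fit_transform(X)
total_var = pca.explained_variance_ratio_.sum() * 100

fig = px.scatter_3d(
    components, x=0, y=1, z=2, color=df['species'],
    title=f'Total Explained Variance: {total_var:.2f}%',
    labels={'0': 'PC 1', '1': 'PC 2', '2': 'PC 3'}
)
fig.show()
```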
````diff
@@ -181,9 +185,9 @@ Often, you might be interested in seeing how much variance PCA is able to explai
 With a higher explained variance, you are able to capture more variability in your dataset, which could potentially lead to better performance when training your model. For a more mathematical explanation, see this [Q&A thread](https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained).
 
 ```python
+import plotly.express as px
 import numpy as np
 import pandas as pd
-import plotly.express as px
 from sklearn.decomposition import PCA
 from sklearn.datasets import load_diabetes
 
````

````diff
@@ -196,7 +200,7 @@ exp_var_cumul = np.cumsum(pca.explained_variance_ratio_)
 
 px.area(
     x=range(1, exp_var_cumul.shape[0] + 1),
-    y=exp_var_cumul,
+    y=exp_var_cumul,
     labels={"x": "# Components", "y": "Explained Variance"}
 )
 ```
````
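
The cumulative explained-variance example spans the last two hunks and can be run as below. The diff also imports `pandas`, which this minimal sketch does not need; the `load_diabetes()` fit is taken from the visible imports and is otherwise an assumption:

```python
import plotly.express as px
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_diabetes

# Fit PCA on the diabetes features and accumulate explained variance
X = load_diabetes().data
pca = PCA()
pca.fit(X)
exp_var_cumul = np.cumsum(pca.explained_variance_ratio_)

fig = px.area(
    x=range(1, exp_var_cumul.shape[0] + 1),
    y=exp_var_cumul,
    labels={"x": "# Components", "y": "Explained Variance"}
)
fig.show()
```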
