|
12 | 12 | "cell_type": "markdown",
|
13 | 13 | "metadata": {},
|
14 | 14 | "source": [
|
15 |
| - ":::{post} Apr 25, 2022\n", |
16 |
| - ":tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference\n", |
| 15 | + ":::{post} May 30, 2022\n", |
| 16 | + ":tags: neural networks, perceptron, variational inference, minibatch\n", |
17 | 17 | ":category: intermediate\n",
|
18 | 18 | ":author: Thomas Wiecki, updated by Chris Fonnesbeck\n",
|
19 | 19 | ":::"
|
|
28 | 28 | "**Probabilistic Programming**, **Deep Learning** and \"**Big Data**\" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.\n",
|
29 | 29 | "\n",
|
30 | 30 | "### Probabilistic Programming at scale\n",
|
31 |
| - "**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan. \n", |
| 31 | + "**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian**, so we can specify **priors** to inform and constrain our models and get uncertainty estimation in the form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for constructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why, more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms fit a distribution (*e.g.* normal) to the posterior, turning a sampling problem into an optimization problem. Automatic Differentiation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages, including PyMC, NumPyro and Stan. \n", |
32 | 32 | "\n",
|
33 | 33 | "Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning))).\n",
|
34 | 34 | "\n",
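To make the sampling-versus-optimization contrast above concrete, here is a minimal, self-contained sketch of fitting a toy model with ADVI in PyMC. It is not part of the notebook diff; the synthetic data and the `toy_model`/`approx` names are illustrative assumptions.

```python
# Minimal sketch (not from the notebook): ADVI in PyMC turns posterior
# inference into an optimization problem by maximizing the ELBO.
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=2.0, size=500)

with pm.Model() as toy_model:
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)

    # Fit a factorized (mean-field) normal approximation to the posterior.
    approx = pm.fit(n=10_000, method="advi")

# Draw samples from the fitted approximation for downstream analysis.
idata = approx.sample(1_000)
```

Compared with `pm.sample`, the cost here is paid in approximation quality rather than wall-clock time.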
|
|
106 | 106 | "cell_type": "code",
|
107 | 107 | "execution_count": 3,
|
108 | 108 | "metadata": {
|
109 |
| - "ExecuteTime": { |
110 |
| - "end_time": "2023-01-25T10:08:38.513164Z", |
111 |
| - "start_time": "2023-01-25T10:08:38.487796Z" |
112 |
| - }, |
| 109 | + "collapsed": true, |
113 | 110 | "jupyter": {
|
114 | 111 | "outputs_hidden": true
|
115 | 112 | }
|
|
166 | 163 | "cell_type": "code",
|
167 | 164 | "execution_count": 5,
|
168 | 165 | "metadata": {
|
169 |
| - "ExecuteTime": { |
170 |
| - "end_time": "2023-01-25T10:08:38.862511Z", |
171 |
| - "start_time": "2023-01-25T10:08:38.796907Z" |
172 |
| - }, |
| 166 | + "collapsed": true, |
173 | 167 | "jupyter": {
|
174 | 168 | "outputs_hidden": true
|
175 | 169 | }
|
|
240 | 234 | "source": [
|
241 | 235 | "### Variational Inference: Scaling model complexity\n",
|
242 | 236 | "\n",
|
243 |
| - "We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n", |
| 237 | + "We could now just run an MCMC sampler like {class}`pymc.NUTS`, which works pretty well in this case, but as was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n", |
244 | 238 | "\n",
|
245 |
| - "Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior." |
| 239 | + "Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note that this is a mean-field approximation, so we ignore correlations in the posterior."
246 | 240 | ]
|
247 | 241 | },
|
248 | 242 | {
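As a rough illustration of the NUTS-versus-ADVI trade-off described in this cell, the following hedged sketch shows both options side by side; `neural_network` is assumed to be the model context constructed earlier in the notebook.

```python
import matplotlib.pyplot as plt
import pymc as pm

with neural_network:
    # Full MCMC: accurate, but slow for deeper architectures.
    # trace = pm.sample(draws=1000, tune=1000)

    # Mean-field ADVI: much faster, but ignores posterior correlations.
    approx = pm.fit(n=30_000, method=pm.ADVI())

# The negative ELBO history is a common convergence diagnostic for ADVI.
plt.plot(approx.hist)
plt.xlabel("iteration")
plt.ylabel("-ELBO")
```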
|
|
361 | 355 | "cell_type": "markdown",
|
362 | 356 | "metadata": {},
|
363 | 357 | "source": [
|
364 |
| - "Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)." |
| 358 | + "Now that we have trained our model, let's predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
365 | 359 | ]
|
366 | 360 | },
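A hedged sketch of the PPC step described above; `neural_network`, `approx`, and the observed variable name `"out"` are assumptions based on the surrounding text rather than lines taken from the diff.

```python
import pymc as pm

# Convert the variational approximation into posterior samples...
trace = approx.sample(draws=5_000)

with neural_network:
    # ...then simulate new outcomes for the data currently attached to the
    # model (e.g. the hold-out set), giving class predictions under the posterior.
    ppc = pm.sample_posterior_predictive(trace)
```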
|
367 | 361 | {
|
368 | 362 | "cell_type": "code",
|
369 | 363 | "execution_count": 9,
|
370 | 364 | "metadata": {
|
| 365 | + "collapsed": true, |
371 | 366 | "jupyter": {
|
372 | 367 | "outputs_hidden": true
|
373 | 368 | }
|
|
435 | 430 | "metadata": {},
|
436 | 431 | "outputs": [],
|
437 | 432 | "source": [
|
438 |
| - "pred = ppc.posterior_predictive[\"out\"].squeeze().mean(axis=0) > 0.5" |
| 433 | + "pred = ppc.posterior_predictive[\"out\"].mean((\"chain\", \"draw\")) > 0.5" |
439 | 434 | ]
|
440 | 435 | },
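The changed line above averages the posterior predictive draws over the `chain` and `draw` dimensions and thresholds at 0.5. The sketch below spells that out and adds an accuracy check; `ppc`, `Y_test`, and the `"out"` variable are assumed from the surrounding notebook.

```python
import numpy as np

# Mean over the sampling dimensions gives P(y = 1 | x) for each test point...
p_mean = ppc.posterior_predictive["out"].mean(("chain", "draw"))

# ...and thresholding at 0.5 yields hard class labels.
pred = p_mean > 0.5

print(f"Accuracy = {np.mean(pred.values == Y_test) * 100:.1f}%")
```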
|
441 | 436 | {
|
|
505 | 500 | "cell_type": "code",
|
506 | 501 | "execution_count": 13,
|
507 | 502 | "metadata": {
|
508 |
| - "ExecuteTime": { |
509 |
| - "end_time": "2023-01-25T10:08:59.870106Z", |
510 |
| - "start_time": "2023-01-25T10:08:59.846007Z" |
511 |
| - }, |
| 503 | + "collapsed": true, |
512 | 504 | "jupyter": {
|
513 | 505 | "outputs_hidden": true
|
514 | 506 | }
|
|
524 | 516 | "cell_type": "code",
|
525 | 517 | "execution_count": 14,
|
526 | 518 | "metadata": {
|
527 |
| - "ExecuteTime": { |
528 |
| - "end_time": "2023-01-25T10:09:10.094031Z", |
529 |
| - "start_time": "2023-01-25T10:08:59.871216Z" |
530 |
| - }, |
| 519 | + "collapsed": true, |
531 | 520 | "jupyter": {
|
532 | 521 | "outputs_hidden": true
|
533 | 522 | }
|
|
630 | 619 | "cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)\n",
|
631 | 620 | "fig, ax = plt.subplots(figsize=(16, 9))\n",
|
632 | 621 | "contour = ax.contourf(\n",
|
633 |
| - " grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap\n", |
| 622 | + " grid[0], grid[1], y_pred.mean((\"chain\", \"draw\")).values.reshape(100, 100), cmap=cmap\n", |
634 | 623 | ")\n",
|
635 | 624 | "ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color=\"C0\")\n",
|
636 | 625 | "ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color=\"C1\")\n",
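For context on the contour plot above: the prediction `grid` is typically built as a dense mesh over the 2-D input space and flattened before prediction. This is only a sketch of that pattern; the bounds and variable names are assumptions, not taken from the diff.

```python
import numpy as np

# A 100 x 100 mesh over the two input dimensions (bounds are illustrative).
grid = np.mgrid[-3:3:100j, -3:3:100j]              # shape (2, 100, 100)
grid_2d = grid.reshape(2, -1).T.astype("float32")  # shape (10000, 2)

# After predicting on grid_2d, the per-point probabilities are reshaped
# back to (100, 100), matching the `.reshape(100, 100)` call above.
```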
|
|
860 | 849 | "\n",
|
861 | 850 | "- This notebook was originally authored as a [blog post](https://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/) by Thomas Wiecki in 2016\n",
|
862 | 851 | "- Updated by Chris Fonnesbeck for PyMC v4 in 2022\n",
|
863 |
| - "- Updated by Oriol Abril-Pla and Earl Bellinger in 2023\n", |
864 | 852 | "\n",
|
865 | 853 | "## Watermark"
|
866 | 854 | ]
|
|
916 | 904 | "hash": "5429d053af7e221df99a6f00514f0d50433afea7fb367ba3ad570571d9163dca"
|
917 | 905 | },
|
918 | 906 | "kernelspec": {
|
919 |
| - "display_name": "Python 3.9.10 ('pymc-dev-py39')", |
| 907 | + "display_name": "Python 3 (ipykernel)", |
920 | 908 | "language": "python",
|
921 | 909 | "name": "python3"
|
922 | 910 | },
|
|