
Add new tutorial: Multi-Objective NAS with Ax #2006

Merged: 21 commits merged on Aug 20, 2022

@Balandat (Contributor) commented Aug 14, 2022

Adds a new tutorial on how to perform multi-objective Neural Architecture Search with Ax. The tutorial walks the user through a case study showing how to use Bayesian optimization to find Pareto-efficient tradeoffs between model complexity and performance in a sample-efficient fashion. It uses PyTorch Lightning and TorchX to run a number of actual training jobs in the background, and Ax+BoTorch to automate the multi-objective optimization.
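
To make "Pareto-efficient tradeoffs" concrete, here is a minimal pure-Python sketch of what a Pareto front is, assuming two objectives: validation accuracy (to maximize) and parameter count (to minimize). This is not code from the tutorial; Ax and BoTorch compute and search for the frontier internally, and the function name and trial values below are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Return the non-dominated (accuracy, num_params) pairs.

    Assumes accuracy (index 0) is maximized and model size (index 1)
    is minimized.
    """
    front = []
    for i, (acc_i, size_i) in enumerate(points):
        dominated = False
        for j, (acc_j, size_j) in enumerate(points):
            if i == j:
                continue
            # Point j dominates point i if it is no worse in both objectives
            # and strictly better in at least one of them.
            no_worse = acc_j >= acc_i and size_j <= size_i
            strictly_better = acc_j > acc_i or size_j < size_i
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append((acc_i, size_i))
    # Sort by model size so the tradeoff reads from small to large models.
    return sorted(front, key=lambda p: p[1])


# Hypothetical (accuracy, parameter count) results from five trials.
trials = [(0.90, 100_000), (0.93, 500_000), (0.89, 200_000), (0.95, 2_000_000), (0.94, 3_000_000)]
print(pareto_front(trials))
```

Here (0.89, 200_000) is dropped because (0.90, 100_000) is both more accurate and smaller, and (0.94, 3_000_000) is dropped because of (0.95, 2_000_000). The quadratic double loop is fine for the handful of trials a sample-efficient search produces.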

Built website output: Multi-Objective NAS with Ax — PyTorch Tutorial [BUILD].zip

cc @dme65, @d4l3k

@netlify (bot) commented Aug 14, 2022

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: afaf12a
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/63002325ffe8aa00096ee6e8
😎 Deploy Preview: https://deploy-preview-2006--pytorch-tutorials-preview.netlify.app

@svekars (Contributor) left a comment

Thanks, @Balandat! Looks great! I added a few editorial suggestions and there seems to be a problem with data loading. From the log:

Aug 14 22:51:14 ax.exceptions.core.NoDataError: Observed data is required for generation step #1 (model MOO), but fetched data was empty. Something is wrong with experiment setup -- likely metrics do not implement fetching logic (check your metrics) or no data was attached to experiment for completed trials.

@Balandat (Contributor Author)

Hmm, this is interesting. I've never run into this issue before; it seems to be a problem either with running the training jobs or at least with accessing their logs. A few different folks are able to run this locally, so maybe it's an issue with the environment or permissions of the CircleCI runner? Is there a way to run this as a Docker image or SSH into the job to understand what's going on?

@svekars (Contributor) commented Aug 15, 2022

@Balandat (Contributor Author)

@danielrjiang just pointed out to me that the trials are completing in one second, which is not good: it means that the training actually fails. I'll dig into why that may be the case (likely some issue with the environment or the relative location of the script file...).

@d4l3k, is there a straightforward way to view the logs kept by TorchX programmatically (without using the torchx log command-line tool)? Then I can catch the exception raised here and print the logs to debug what's going on.

@malfet (Contributor) commented Aug 17, 2022

Our CI system (and the .py -> .ipynb converter) is currently built around the assumption that a single file corresponds to a single tutorial. Because this tutorial consists of two files, it:

  • results in two generated "tutorials"
  • accidentally breaks the CI system, because different parts of the tutorial are rendered in different shards, and all tutorials not targeted by a given shard are temporarily rendered empty

Both problems can easily be fixed by blocklisting the stripping/rendering of the auxiliary file, but I'm curious what the Jupyter notebook will look like in this case, and whether it is a good experience from the user's point of view when they cannot see part of the tutorial code on a single page.
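
The failure mode described above can be illustrated with a toy model of a one-file-per-tutorial pipeline. This is a hypothetical sketch, not the actual CI code of the tutorials repository: the file names, discover_tutorials, assign_shards, and the round-robin sharding scheme are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def discover_tutorials(py_files: Iterable[str], auxiliary_files: Set[str]) -> List[str]:
    """Treat every .py file as one tutorial, except blocklisted helper files."""
    return sorted(f for f in py_files if f.endswith(".py") and f not in auxiliary_files)


def assign_shards(tutorials: List[str], num_shards: int) -> Dict[int, List[str]]:
    """Distribute tutorials across CI shards in round-robin order."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for idx, name in enumerate(tutorials):
        shards[idx % num_shards].append(name)
    return shards


# Hypothetical file layout: one tutorial plus its separate training script.
files = [
    "beginner_source/a_tutorial.py",
    "intermediate_source/nas_tutorial.py",
    "intermediate_source/nas_train_script.py",
]

naive = discover_tutorials(files, auxiliary_files=set())
# The helper script is counted as a third "tutorial" and lands on a
# different shard than the tutorial that needs it.
print(assign_shards(naive, 2))

fixed = discover_tutorials(files, auxiliary_files={"intermediate_source/nas_train_script.py"})
print(assign_shards(fixed, 2))
```

With the blocklist, only two tutorials are discovered, which addresses the first problem. The helper file still has to survive on whichever shard renders the NAS tutorial, which is a separate concern from discovery.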

@Balandat (Contributor Author)

Thanks for catching the issue!

I'm linking the training script in the tutorial file, so it should be pretty straightforward for the user to view it. But I do take your point about that potentially being confusing / not as clear as it could be if it's not shown on the same page, so I'd be happy to consider changing this if it makes sense.

Note that we can just pass the source code of the training script as a string to the TorchX runner. So one option could be to define the code directly in the notebook. The issue with that is that if it's a string it won't have syntax highlighting etc. and so it would be hard to read. Maybe there is some fancy way to extract the source code of a specific cell into a string (though since this is a .py file I guess the concept of a cell doesn't even exist yet prior to the parsing)?
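
One way to keep the training script inside the tutorial file is sketched below: store it as a string, check that it at least parses as Python, and write it to a temporary .py file that any runner expecting a script path could use. This is a hypothetical sketch; TRAIN_SCRIPT, materialize_script, and the file name are invented for illustration, it does not show how TorchX consumes scripts, and it does not solve the missing syntax highlighting.

```python
import ast
import tempfile
import textwrap
from pathlib import Path

# Hypothetical training script kept inline as a string.
TRAIN_SCRIPT = textwrap.dedent(
    """
    import argparse

    def main() -> None:
        parser = argparse.ArgumentParser()
        parser.add_argument("--hidden_size", type=int, default=64)
        args = parser.parse_args()
        print(f"training with hidden_size={args.hidden_size}")

    if __name__ == "__main__":
        main()
    """
)


def materialize_script(source: str, directory: str) -> Path:
    """Check that `source` parses as Python, then write it to a .py file."""
    # Raises SyntaxError here instead of inside a background training job.
    ast.parse(source)
    path = Path(directory) / "train_script.py"
    path.write_text(source)
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = materialize_script(TRAIN_SCRIPT, tmp_dir)
    print(script_path.name)
```

Parsing before writing is cheap and catches typos in the string early, which matters because a broken script would otherwise only show up as a job that fails silently.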

I'm not super familiar with what one could do here using rst functionality; maybe there is a way to render the separate .py file in some minipage as part of the tutorial?

malfet added a commit that referenced this pull request Aug 19, 2022
Add option (to be used by #2006) to specify extra files particular tutorial depends on
Make sure those files will not get deleted

Add a bit more type annotation to the code
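
The idea in this commit (declare the extra files a tutorial depends on, and make sure they are not deleted) can be sketched as follows. This is a hypothetical illustration only: EXTRA_FILES, files_to_keep, files_to_blank, and the file names are invented here and are not the option or code actually added to the repository.

```python
from typing import Dict, List, Set

# Hypothetical config: tutorial -> helper files it needs at run time.
EXTRA_FILES: Dict[str, List[str]] = {
    "intermediate_source/nas_tutorial.py": ["intermediate_source/nas_train_script.py"],
}


def files_to_keep(shard_tutorials: List[str], extra_files: Dict[str, List[str]]) -> Set[str]:
    """Files a shard must not delete: its own tutorials plus their declared dependencies."""
    keep = set(shard_tutorials)
    for tutorial in shard_tutorials:
        keep.update(extra_files.get(tutorial, []))
    return keep


def files_to_blank(
    all_files: List[str], shard_tutorials: List[str], extra_files: Dict[str, List[str]]
) -> List[str]:
    """Files outside this shard's keep set, which may be emptied before rendering."""
    keep = files_to_keep(shard_tutorials, extra_files)
    return sorted(f for f in all_files if f not in keep)
```

On the shard that renders the NAS tutorial, the training script is now kept alongside it; on other shards it can still be blanked, since nothing there depends on it.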

@malfet (Contributor) left a comment

LGTM, though we might want to move it to a different folder later.

@malfet merged commit ebce103 into pytorch:master Aug 20, 2022
@Balandat deleted the ax_optim branch August 20, 2022 03:15

6 participants