# Investigate compiler performance problems related to dynamic classloading of compiler plugins #458
The classloading profiler in JMH (…)
If I let them both warm up for 10x10s, I see that enabling the macro-paradise plugin slows the result from 785ms to 897ms (1.14x). The next experiment is to add macro-paradise to the classpath of the forked JVM directly, to find out how much of that 14% is due to the inherent cost of the plugin vs. how much is due to the unintended costs of dynamic classloading.
I've just tested what happens when the plugin is added to the classpath containing …
So the presence of macro paradise slows down this benchmark by 14%, but only 5% of that seems to be the direct fault of the plugin itself. (Updated numbers with 3 forks in the benchmark.)
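A back-of-envelope split of those two numbers, assuming (hypothetically — the thread doesn't say how the costs compose) that the inherent plugin cost and the dynamic-classloading cost compose multiplicatively:

```scala
// Numbers from the runs above: 785 ms without the plugin, 897 ms with
// macro-paradise dynamically loaded, and ~5% attributed to the plugin itself.
val total    = 897.0 / 785.0   // ≈ 1.14x total slowdown
val inherent = 1.05            // ≈ 5% inherent cost of the plugin
val loading  = total / inherent
println(f"dynamic classloading overhead ≈ $loading%.2fx") // ≈ 1.09x
```

Under that assumption, roughly 9% of the slowdown would be attributable to dynamic classloading alone.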
Macro implementations are also called with a dynamic classloader. It would be interesting to see if the same trick of putting the macro on the compiler classpath (or just caching the dynamically created classloader) affects the performance of compiling code that uses macros.
I have a prototype at https://github.com/scalacenter/scala/tree/ticket/sd-458 for both macros and plugins. I'm in the process of benchmarking the impact on small and big OSS projects. I will post benchmark results as soon as I have them, and then open a pull request.

My approach caches dynamically created classloaders only if all the URLs are either jar or zip files (as suggested by @retronym). The invalidation of these files assumes that the last-modified time of the zip file changes if one of the zip archive entries changes. This may not be true if a post-processing step is applied to compilation products, like reproducible-maven-build-plugin, which sets all zip and jar timestamps to zero to achieve build reproducibility. (However, as I understand it, this breaks classpath caching for the compiler too, so I'm not sure it's an immediate problem we need to solve now. If we aim for a solid solution, we should hash the contents of every classpath entry, but this takes time for huge classpaths.)

If a file that is not a zip or jar is found, the caching doesn't happen. This occurs, by default, with the following classpath entries in sbt (…
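The caching scheme described above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the actual patch: `PluginClassLoaderCache` and its method are invented names, and the cache key here also includes each entry's file size (an extra safeguard on top of the last-modified time):

```scala
import java.io.File
import java.net.{URL, URLClassLoader}
import scala.collection.concurrent.TrieMap

// Hypothetical sketch of caching a plugin classloader keyed by its
// classpath entries plus each entry's last-modified time and size,
// so the cache is invalidated when a jar changes.
object PluginClassLoaderCache {
  // One (path, lastModified, length) triple per classpath entry.
  private val cache = TrieMap.empty[Seq[(String, Long, Long)], URLClassLoader]

  private def isJarOrZip(f: File): Boolean = {
    val n = f.getName.toLowerCase
    n.endsWith(".jar") || n.endsWith(".zip")
  }

  private def newLoader(entries: Seq[File], parent: ClassLoader): URLClassLoader =
    new URLClassLoader(entries.map(_.toURI.toURL).toArray, parent)

  def classLoaderFor(entries: Seq[File], parent: ClassLoader): ClassLoader =
    // Only cache when every entry is a jar/zip file; a classes directory
    // can change without its own timestamp changing, so we skip caching.
    if (entries.forall(isJarOrZip)) {
      val key = entries.map(f => (f.getAbsolutePath, f.lastModified, f.length))
      cache.getOrElseUpdate(key, newLoader(entries, parent))
    } else newLoader(entries, parent)
}
```

Two calls with the same unchanged jars return the same classloader instance; any non-jar entry (e.g. a classes directory) falls back to a fresh classloader every time.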
**Summary**

**Where**

These results confirm that caching dynamically loaded compiler plugins works …

**Machine**

**Processor details**

Hyper-threading is disabled, NUMA is disabled, Turbo boost is disabled, cpu …

**Implications**

Dynamically loaded compiler plugins are the norm, so more or less all builds …

**Next**

I'm working on collecting benchmarking results for macro libraries. I will …
To measure the impact of this patch on code using macros, I first tried case-app, a CLI library built on top of Shapeless.

For the benchmark I've used scalac-profiling's sbt plugin, which has a task that lets sbt warm up a compiler for a certain duration. Here are the logs of the session. I'd like to try with … Tomorrow I'll gather more fine-grained results for …
Reminder: you can easily remove sbt from the benchmarking equation with the instructions in #392.
Are you sure it zeroes the timestamp of the JAR file itself? I thought it was just the ZIP metadata inside the JAR. We could/should add the file size to the cache key as another way to help out.
It includes the classloading changes in scala/scala-dev#458.
(I'll double-check tomorrow, but that was my understanding from reading the source. IIRC Bazel also zeroes all timestamps, including those of the file; I'm confirming tomorrow.)
Apparently the Maven plugin only does it for the zip entries, but build tools like Bazel zero out all timestamps (including the file system's timestamp of a jar), so the current invalidation algorithm would fail. I agree adding the file size to the cache key is a good idea; I'll do that. We may want to experiment with a more robust solution that reliably caches the classpath. We may be able to accommodate this use case in two ways: …
**Performance**

Here is my continuation about the impact of the patches in this Scala Center branch. This time, I have benchmarked two important open source projects: Circe and upickle.

**Tested Scala versions**

- 2.12.5-bin-dbd90495c2-SNAPSHOT
- 2.12.5-bin-f18e3c59fd-SNAPSHOT
- 2.12.5-bin-0417fcf133-SNAPSHOT
**Circe**

This is a benchmark of Circe's test suite.

Command to generate Circe benchmarks:

```sh
for args_file in \
  /home/benchmarks/experiments/circe/modules/tests/js/target/test.args \
  /home/benchmarks/experiments/circe/modules/tests/js/target/test-with-classes-dir.args \
  /home/benchmarks/experiments/circe/modules/tests/jvm/target/test.args \
  /home/benchmarks/experiments/circe/modules/tests/jvm/target/test-with-classes-dir.args; do
  for v in 2.12.5-bin-dbd90495c2-SNAPSHOT 2.12.5-bin-f18e3c59fd-SNAPSHOT 2.12.5-bin-0417fcf133-SNAPSHOT; do
    /usr/bin/sbt "set scalaVersion in compilation := \"$v\"" "hot -f1 -wi 10 -i 8 -p source=@$args_file" || break
  done
done
```

**Test argument files for Circe**

The …

```
-deprecation
…
```
| Scenario | Scala version | Compilation time (ms) | Factor vs baseline |
|---|---|---|---|
| Scala.js with jar deps | 2.12.5-bin-dbd90495c2-SNAPSHOT | 31985.762 ± 1755.206 | |
| Scala.js with jar deps | 2.12.5-bin-f18e3c59fd-SNAPSHOT | 23664.263 ± 1119.504 | 1.35x |
| Scala.js with jar deps | 2.12.5-bin-0417fcf133-SNAPSHOT | 23769.121 ± 1137.835 | 1.35x |
| Scala.js with classes dirs | 2.12.5-bin-dbd90495c2-SNAPSHOT | 32438.747 ± 1011.336 | |
| Scala.js with classes dirs | 2.12.5-bin-f18e3c59fd-SNAPSHOT | 23882.367 ± 924.465 | 1.36x |
| Scala.js with classes dirs | 2.12.5-bin-0417fcf133-SNAPSHOT | 23781.704 ± 1392.437 | 1.36x |
| JVM with jar deps | 2.12.5-bin-dbd90495c2-SNAPSHOT | 21222.515 ± 1291.782 | |
| JVM with jar deps | 2.12.5-bin-f18e3c59fd-SNAPSHOT | 21583.888 ± 1028.630 | 0.99x |
| JVM with jar deps | 2.12.5-bin-0417fcf133-SNAPSHOT | 21831.352 ± 1169.938 | 0.98x |
| JVM with classes dirs | 2.12.5-bin-dbd90495c2-SNAPSHOT | 21445.476 ± 1429.397 | |
| JVM with classes dirs | 2.12.5-bin-f18e3c59fd-SNAPSHOT | 21562.917 ± 1072.940 | 0.99x |
| JVM with classes dirs | 2.12.5-bin-0417fcf133-SNAPSHOT | 21055.406 ± 11114.899 | 1.02x |
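For reference, the "Factor vs baseline" column is simply the baseline compilation time divided by the patched time; for the Scala.js-with-jar-deps rows:

```scala
// Factor vs baseline = baseline time / patched time (Scala.js, jar deps).
val baseline = 31985.762 // ms, 2.12.5-bin-dbd90495c2-SNAPSHOT
val patched  = 23664.263 // ms, 2.12.5-bin-f18e3c59fd-SNAPSHOT
println(f"${baseline / patched}%.2fx") // prints 1.35x
```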
Complete logs are available in this link.
**Interpretation of Circe benchmarks**

These results strongly suggest that:
- Code depending on compiler plugins is significantly faster after we cache their classloaders.
  - For a non-trivial use case like Circe, compiling code with Scala.js is 35% faster.
- Caching classloaders for macro libraries has no effect.
- There is no measurable difference between having some dependencies as jars and as classes directories.
  - This may be relevant for this Zinc ticket.
- On a meta level, Scala.js compilation is (after this patch) only 9% slower than JVM compilation.
**Upickle**

This is a benchmark of upickle's Scala.js test suite.

Command to generate upickle benchmarks:

```sh
for i in 2.12.5-bin-dbd90495c2-SNAPSHOT 2.12.5-bin-f18e3c59fd-SNAPSHOT; do
  sbt "set scalaVersion in compilation := \"$i\"" 'hot -f1 -wi 10 -i 8 -p source=@/data/rw/code/scala/upickle/upickle/js/target/test.args' || break
done
```
**Upickle results under different scenarios**

| Scenario | Scala version | Compilation time (ms) | Factor vs baseline |
|---|---|---|---|
| Scala.js | 2.12.5-bin-dbd90495c2-SNAPSHOT | 12381.585 ± 380.321 | |
| Scala.js | 2.12.5-bin-f18e3c59fd-SNAPSHOT | 10714.350 ± 355.112 | 1.16x |
Complete logs are available in this link.
**Interpretation of Upickle benchmarks**
These results strongly suggest that caching plugin classloaders for small-sized libraries also improves compile times. In this case, it's 16%, which seems to be the lowest improvement we've seen in all the studied scenarios.
**General interpretation of results**
We have seen the effect of this patch on several open source projects of different sizes. All of my results strongly suggest that caching the (dynamically loaded) classloaders of compiler plugins is worth it, with compile-time improvements of 16%, 20%, and 35%.

My results also suggest that there is no compile-time improvement to be gained on the macro side, so commit 0417fcf133 will be dropped from the upcoming PR to scala/scala; only f18e3c59fd will be included.

With the increasing use of compiler plugins in our community for tooling (Scalameta, scala-sculpt, splain, scalac-profiling) and for compilation to other backends (Scala.js and Scala Native), this is good news: compilation times for these projects will decrease significantly.
Thanks for the thorough analysis. I'm open to keeping macro classloader caching even if it is performance-neutral, unless we can see some risks to the approach.
Both those projects compile at a glacial pace relative to "regular" Scala code. For example, upickle test: …

So about 0.17 kloc/s, compared with 2-3 kloc/s for regular code. There must be something really inefficient in the way implicits are organized or typeclasses are being materialized, or maybe the generated code is really massive. To really get a sense of why the performance improves, we'd need to understand what the bottlenecks are and compare profiles before and after. Not something we need to do to justify the improvement, but interesting nonetheless.
Fixes scala/scala-dev#458. If verbose is enabled, it reports whenever `AbstractFile.getUrl` returns `null`.
FWIW, it's not surprising at all that uPickle's test suite compiles glacially slowly. uPickle's code-generating macros would normally be only a tiny part of any other program (in 0.5.x, with automatic/deep derivation gone), but in the uPickle test suite those code-generating macros are 100% of the compilation run. A 10-15x ratio sounds about right; it wouldn't surprise me at all if, on average, each macro callsite expanded to 20-30 lines of "boring" code (especially in the test suite, which tries to exercise derivation for big/complex datatypes), divided by two since only half of the lines in the test suite are macro expansions.
…(however, this supposedly breaks macros with global mutable state...) scala/scala#6412 https://twitter.com/olafurpg/status/1191299377064824832

> The caching logic for compiler plugins is enabled by default in Bloop and that one does make a difference, around 20/30%, see scala/scala-dev#458
* renames for Project fields
* Disable eta-sam lint scala/bug#11644
* Enable compiler plugin & macro classloader caching for faster builds (however, this supposedly breaks macros with global mutable state...) scala/scala#6412 https://twitter.com/olafurpg/status/1191299377064824832
  > The caching logic for compiler plugins is enabled by default in Bloop and that one does make a difference, around 20/30%, see scala/scala-dev#458
* Don't use -Xsource: since it's not recommended scala/bug#11661
* fix Ybackend-parallelism option
* empty `enabled` default parameter for Plugins
scalameta/scalameta#1181 (comment)