Gather real-world feedback about 2.12.3 performance #392
I ran the benchmark on the trees module in the scalameta repo. I chose that module because it synthesizes a lot of code with paradise macro annotations. The speedup factor is x0.83 (~12s down to ~10s), which seems consistent with the reported numbers in http://developer.lightbend.com/blog/2017-06-12-faster-scala-compiler/
2.12.2
2.12.3-bin-0b150b7
|
Results for our proprietary application, ~70 kLoC in a mixture of styles: ~37s down to ~30s again gives a speedup factor of x0.83. As an additional data point, the cold benchmark shows a x0.87 speedup, ~63s down to ~55s.
2.12.2
2.12.3-bin-d1ec01a
|
scalaz-core
2.12.2
2.12.3-bin-d1ec01a
|
sbt core-macros module
I wanted to see the results for macro libraries' code. The new 2.12.3 is 48% faster in percentile 0.99. I have cleaned the environment to make sure the benchmark is reproducible. I have fixed the min and max frequency to 2.5GHz on an Intel(R) Core(TM) i7-6600U CPU whose maximum frequency is 3.4GHz. At the end of the benchmarks I attach my cpuinfo and meminfo.
Scala 2.12.2
Scala 2.12.3-bin-d1ec01a
Cpupower
/proc/cpuinfo
/proc/meminfo
Computer
Thinkpad t460s, connected to AC.
|
Very impressive!
On 18 June 2017 4:21:29 pm Jorge ***@***.***> wrote:
# sbt core-macros module
I wanted to see the results for macro intensive code. The new 2.12.3 is
**48% faster** in percentile 0.99. I have cleaned the environment to make
sure everything is reproducible. I have fixed the min and max frequency to
2.5GHz in an Intel(R) Core(TM) i7-6600U CPU whose max CPU is 3.4GHz. At the
end of the benchmarks I attach my cpuinfo and meminfo.
## Scala 2.12.2
```
[info] Benchmark (corpusVersion) (extraArgs) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 273 1132.547 ± 5.967 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1073.742 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1128.268 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1169.372 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1188.875 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1213.874 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1291.846 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1291.846 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 1291.846 ms/op
```
## Scala 2.12.3-bin-d1ec01a
```
[success] Total time: 642 s, completed Jun 18, 2017 3:56:52 PM
[info] Benchmark (corpusVersion) (extraArgs) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 551 561.365 ± 2.816 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 530.579 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 557.842 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 586.154 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 597.898 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 631.033 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 677.380 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 677.380 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest @/data/rw/code/scala/sbt/core-macros/target/compile.args sample 677.380 ms/op
[success] Total time: 653 s, completed Jun 18, 2017 4:07:54 PM
```
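The 48% figure can be checked directly from the two result listings above. The following is a minimal sketch in Python, not part of the benchmark tooling; the dictionary and function names are illustrative only. It computes the relative reduction in time per compile at several of the reported percentiles.

```python
# Scores in ms/op, copied from the two HotScalacBenchmark runs above.
scala_2_12_2 = {"mean": 1132.547, "p0.50": 1128.268, "p0.90": 1169.372,
                "p0.95": 1188.875, "p0.99": 1213.874}
scala_2_12_3 = {"mean": 561.365, "p0.50": 557.842, "p0.90": 586.154,
                "p0.95": 597.898, "p0.99": 631.033}


def reduction_percent(old_ms: float, new_ms: float) -> float:
    """Relative reduction in time per operation, as a percentage."""
    return (1.0 - new_ms / old_ms) * 100.0


# One reduction value per statistic reported by both runs.
reductions = {key: reduction_percent(scala_2_12_2[key], scala_2_12_3[key])
              for key in scala_2_12_2}

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% less time per compile")
```

At p0.99 this gives roughly 48%, in line with the figure quoted above; the mean and the median come out slightly above 50%.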
### Cpupower
```
tribox# cpupower frequency-info
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 3.40 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 2.50 GHz and 2.50 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: 2.50 GHz (asserted by call to hardware)
boost state support:
Supported: yes
Active: yes
```
### /proc/cpuinfo
```
tribox# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 2499.902
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx
smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp
hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 2499.902
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx
smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp
hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5619.35
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 2499.902
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx
smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp
hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5620.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0x88
cpu MHz : 2499.902
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx
smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp
hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5619.42
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```
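Since the frequency was pinned to 2.5GHz, every `cpu MHz` entry in the listing above should sit close to 2500. As a rough sketch (the function names, the tolerance, and the abbreviated sample are assumptions for illustration), this check can be automated in Python:

```python
import re
from typing import List


def cpu_mhz_per_processor(cpuinfo_text: str) -> List[float]:
    """Return the 'cpu MHz' value of every processor entry, in order."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text,
                         flags=re.MULTILINE)
    return [float(value) for value in matches]


def is_pinned(cpuinfo_text: str, target_mhz: float,
              tolerance_mhz: float = 5.0) -> bool:
    """True if at least one reading exists and all are near target_mhz."""
    readings = cpu_mhz_per_processor(cpuinfo_text)
    return bool(readings) and all(
        abs(reading - target_mhz) <= tolerance_mhz for reading in readings)


# Abbreviated sample in the same format as the listing above.
sample = """\
processor : 0
cpu MHz : 2499.902
processor : 1
cpu MHz : 2499.902
"""

print(is_pinned(sample, target_mhz=2500.0))
```

On a Linux machine the same functions could be fed the contents of `/proc/cpuinfo`. The value is only a snapshot, so it is worth sampling it again while the benchmark is running.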
### /proc/meminfo
```
MemTotal: 20430272 kB
MemFree: 12453344 kB
MemAvailable: 18396672 kB
Buffers: 4228 kB
Cached: 6259632 kB
SwapCached: 0 kB
Active: 5007704 kB
Inactive: 2550424 kB
Active(anon): 1008356 kB
Inactive(anon): 445756 kB
Active(file): 3999348 kB
Inactive(file): 2104668 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 12582908 kB
SwapFree: 12582908 kB
Dirty: 6608 kB
Writeback: 0 kB
AnonPages: 1266372 kB
Mapped: 391608 kB
Shmem: 159844 kB
Slab: 265392 kB
SReclaimable: 205904 kB
SUnreclaim: 59488 kB
KernelStack: 6048 kB
PageTables: 18360 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 22798044 kB
Committed_AS: 3871752 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 227328 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 204204 kB
DirectMap2M: 14340096 kB
DirectMap1G: 6291456 kB
```
### Computer
Thinkpad t460s, connected to AC.
|
sbt main module
With the same environment as before, I've benchmarked the compilation of the main module in sbt (the one that contains lots of calls to sbt macros and has
Scala 2.12.2
Scala 2.12.3-bin-d1ec01a
|
@jvican I don't think that's the best way to interpret the results, as you are comparing outliers. I'd say comparing 10089±174 vs 7825.872±110 (the result labelled "HotScalacBenchmark.compile") suggests a 22.5% improvement (compile time reduced to 0.776x of the original).
|
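To see how much the reported error terms could move that estimate, here is a small Python sketch. It treats the ± values as hard bounds on each score, which is a crude assumption made only for illustration and not how the benchmark harness defines its error column; the variable names are hypothetical.

```python
def time_ratio(old_ms: float, new_ms: float) -> float:
    """New compile time as a fraction of the old compile time."""
    return new_ms / old_ms


# Scores and error terms (ms/op) quoted in the comment above.
old_score, old_error = 10089.0, 174.0
new_score, new_error = 7825.872, 110.0

central = time_ratio(old_score, new_score)
# Most favourable case: old run as slow as possible, new run as fast as possible.
best = time_ratio(old_score + old_error, new_score - new_error)
# Least favourable case: the opposite extremes.
worst = time_ratio(old_score - old_error, new_score + new_error)

for label, ratio in (("central", central), ("best", best), ("worst", worst)):
    print(f"{label}: ratio {ratio:.3f}, improvement {(1 - ratio) * 100:.1f}%")
```

Under that assumption the improvement stays between roughly 20% and 25%, so the conclusion does not depend much on which end of the error range is taken.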
Sorry, I got the improvement percentage wrong in my last comment. It's actually 22.5% in p99, the same as the average you report. On a side note, I prefer to use percentiles rather than average times because percentiles give you more precise results. On computers that are running several applications at the same time (browser, other sbt instances, messaging applications, et cetera), hiccups or outliers are more likely to happen. Gil Tene's talk on measuring latency and several other interviews I've read seem to agree with this point. It is my understanding that it's good practice to use high percentiles to compare results safely and with more precision.
|
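For readers who want to see how a single hiccup shows up in each statistic, here is a minimal Python sketch. The timings are hypothetical, and the nearest-rank percentile used here is an assumption chosen for simplicity; the benchmark harness may compute its percentiles differently.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p * n), 1-based."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


# Hypothetical compile times in ms: nine steady runs and one hiccup.
timings = [100.0] * 9 + [1000.0]

print("mean :", mean(timings))
print("p0.50:", percentile(timings, 0.50))
print("p0.99:", percentile(timings, 0.99))
```

With these made-up numbers the median stays at the steady value, the mean is pulled upwards by the single slow run, and the high percentile reports the slow run itself.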
30% speedup for https://github.com/mpollmeier/gremlin-scala/ which uses shapeless and a macro! :)
2.12.2
2.12.3-bin-d1ec01a
|
Onion
Onion is my own programming language written in Scala. It is about 10,000 lines of code.
Scala 2.12.2
Scala 2.12.3-bin-e72ab5a
The results suggest that switching from Scala 2.12.2 to Scala 2.12.3 gives a 30% compilation speedup.
|
Here are instructions to run a comparative benchmark of the performance of the 2.12.2 and 2.12.3-SNAPSHOT compilers over your project sources. We're interested in finding out real-world numbers to complement our automatically run benchmarks.
The instructions below assume `sbt` as your build tool. Users of other build tools can also benchmark their project, providing they first create an "args file" containing the command line to the compiler.
Create an `sbt` plugin to export the compiler command line
Export compiler command line
We'll use the `akka-actor` subproject of Akka to demonstrate.
Clone the `compiler-benchmark` project
Configure a SBT resolver to access nightly builds
Quiesce your machine
`top` to see that the machine is idle.
Find latest nightly build
Use the first version number in the next step.
Execute the benchmark
Let us know your results
In the comments below.
While you are at it
`jardiff` to check for unexpected changes to the bytecode generated for your project.