[testing] Execjs runtime benchmarks #290

Closed
wants to merge 10 commits into from

Conversation

rmosolgo
Member

@rmosolgo rmosolgo commented Jun 8, 2015

(I don't really want to merge this, I just want to share some code & results)

I ran some tests to compare ExecJS runtimes & connection pool sizes:

  • I added ExecJSRenderer to sidestep the dependency on Rails asset pipeline
  • I threw duktape.rb in the mix just for kicks
  • update: try benchmark with lots of threads accessing the lock
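
For context, the harness boils down to something like the sketch below. This is a reconstruction, not the actual benchmarks/server_rendering_benchmark.rb: the JS render() entry point, the component name, and the bundle path are assumptions, while the ExecJS and connection_pool calls are the real gem APIs.

require "benchmark"
require "execjs"
require "connection_pool"

# Assumed bundle: React + the components + a global render() entry point.
JS_CODE = File.read("benchmarks/bundle.js")

def bench(runtime, pool_size:, renders:, threaded:)
  # One compiled ExecJS context per pool slot, checked out for each render.
  pool = ConnectionPool.new(size: pool_size, timeout: 10) { runtime.compile(JS_CODE) }
  work = -> { pool.with { |ctx| ctx.call("render", "TodoList", {}) } }

  if threaded
    Array.new(renders) { Thread.new(&work) }.each(&:join)
  else
    renders.times { work.call }
  end
end

Benchmark.bm(40) do |x|
  runtimes = [
    ExecJS::Runtimes::RubyRacer,
    ExecJS::Runtimes::Duktape,
    ExecJS::Runtimes::JavaScriptCore,
    ExecJS::Runtimes::Node
  ]
  runtimes.each do |rt|
    [1, 10, 25].each do |size|
      x.report("threaded, #{size} conn, #{rt.name}") { bench(rt, pool_size: size, renders: 100, threaded: true) }
      x.report("#{size} conn, #{rt.name}")           { bench(rt, pool_size: size, renders: 100, threaded: false) }
    end
  end
end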

Results

rmosolgo ~/code/react-rails $ ruby -I lib  benchmarks/server_rendering_benchmark.rb
                                          user     system      total        real
threaded, 1 conn, therubyracer (V8)   0.120000   0.010000   0.130000 (  0.127382)
threaded, 1 conn, Duktape             0.460000   0.010000   0.470000 (  0.465932)
threaded, 1 conn, JavaScriptCore      0.110000   0.250000   2.460000 (  2.487619)
threaded, 1 conn, Node.js (V8)        0.110000   0.250000   9.560000 (  9.504323)
threaded, 10 conn, therubyracer (V8)  0.070000   0.010000   0.080000 (  0.074837)
threaded, 10 conn, Duktape            1.480000   0.040000   1.520000 (  1.515937)
threaded, 10 conn, JavaScriptCore     0.080000   0.230000   3.840000 (  0.932142)
threaded, 10 conn, Node.js (V8)       0.140000   0.250000  16.970000 (  3.768226)
threaded, 25 conn, therubyracer (V8)  0.060000   0.010000   0.070000 (  0.068196)
threaded, 25 conn, Duktape            3.170000   0.060000   3.230000 (  3.230653)
threaded, 25 conn, JavaScriptCore     0.190000   0.300000   4.240000 (  1.856800)
threaded, 25 conn, Node.js (V8)       0.080000   0.240000  17.570000 (  5.963759)
1 conn, therubyracer (V8)             0.070000   0.010000   0.080000 (  0.080440)
1 conn, Duktape                       0.470000   0.020000   0.490000 (  0.480219)
1 conn, JavaScriptCore                0.090000   0.230000   2.400000 (  2.442591)
1 conn, Node.js (V8)                  0.100000   0.250000   9.360000 (  9.294853)
10 conn, therubyracer (V8)            0.060000   0.010000   0.070000 (  0.079364)
10 conn, Duktape                      1.530000   0.040000   1.570000 (  1.558580)
10 conn, JavaScriptCore               0.090000   0.260000   4.050000 (  0.994402)
10 conn, Node.js (V8)                 0.160000   0.270000  17.320000 (  3.884776)
25 conn, therubyracer (V8)            0.090000   0.010000   0.100000 (  0.094128)
25 conn, Duktape                      3.300000   0.070000   3.370000 (  3.371425)
25 conn, JavaScriptCore               0.080000   0.300000   4.890000 (  1.982959)
25 conn, Node.js (V8)                 0.220000   0.340000  23.380000 (  8.886694)

Sorted

# Ruby Racer 

threaded, 1 conn, therubyracer (V8)   0.120000   0.010000   0.130000 (  0.127382)
threaded, 25 conn, therubyracer (V8)  0.060000   0.010000   0.070000 (  0.068196)
threaded, 10 conn, therubyracer (V8)  0.070000   0.010000   0.080000 (  0.074837)
1 conn, therubyracer (V8)             0.070000   0.010000   0.080000 (  0.080440)
10 conn, therubyracer (V8)            0.060000   0.010000   0.070000 (  0.079364)
25 conn, therubyracer (V8)            0.090000   0.010000   0.100000 (  0.094128)

# Duktape 

threaded, 1 conn, Duktape             0.460000   0.010000   0.470000 (  0.465932)
threaded, 10 conn, Duktape            1.480000   0.040000   1.520000 (  1.515937)
threaded, 25 conn, Duktape            3.170000   0.060000   3.230000 (  3.230653)
1 conn, Duktape                       0.470000   0.020000   0.490000 (  0.480219)
10 conn, Duktape                      1.530000   0.040000   1.570000 (  1.558580)
25 conn, Duktape                      3.300000   0.070000   3.370000 (  3.371425)

# JavaScriptCore

threaded, 1 conn, JavaScriptCore      0.110000   0.250000   2.460000 (  2.487619)
threaded, 10 conn, JavaScriptCore     0.080000   0.230000   3.840000 (  0.932142)
threaded, 25 conn, JavaScriptCore     0.190000   0.300000   4.240000 (  1.856800)
1 conn, JavaScriptCore                0.090000   0.230000   2.400000 (  2.442591)
10 conn, JavaScriptCore               0.090000   0.260000   4.050000 (  0.994402)
25 conn, JavaScriptCore               0.080000   0.300000   4.890000 (  1.982959)

# Node.js

threaded, 1 conn, Node.js (V8)        0.110000   0.250000   9.560000 (  9.504323)
threaded, 10 conn, Node.js (V8)       0.140000   0.250000  16.970000 (  3.768226)
threaded, 25 conn, Node.js (V8)       0.080000   0.240000  17.570000 (  5.963759)
1 conn, Node.js (V8)                  0.100000   0.250000   9.360000 (  9.294853)
10 conn, Node.js (V8)                 0.160000   0.270000  17.320000 (  3.884776)
25 conn, Node.js (V8)                 0.220000   0.340000  23.380000 (  8.886694)

Takeaways

  • RubyRacer is the fastest
  • Node.js speed decreases with a larger connection pool
  • Running in different threads decreases speed

React::ServerRendering.renderer_options = {code: JS_CODE}
React::ServerRendering.pool_timeout = 10

def test_runtime(runtime, renders:, pool_size:)
Contributor

would be nice to have these explicit

Member Author

what do you mean? I think keyword args are good for expressing meaning

React::ServerRendering.renderer_options = {code: JS_CODE}
React::ServerRendering.pool_timeout = 1000

def test_runtime(runtime, renders:, pool_size:, threaded:)
Contributor

I meant making the default values explicit (nil). Just easier to read, nothing specific :-)
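
In other words, something along these lines (just illustrating the reviewer's suggestion, not a change that was made in the PR):

def test_runtime(runtime, renders: nil, pool_size: nil, threaded: nil)
  # explicit nil defaults, so it's obvious at a glance which arguments may be omitted
end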

@mchristen

Great script.

When running the tests though, my ruby process now deadlocks consistently when pool size > 1 and using the threaded version of the test. This is a much easier way to reproduce the problem than trying to stress the entire web stack.

Looking through the thread stacks in gdb it looks like the deadlock problem is something with libv8 itself.

(gdb) info threads
  Id   Target Id         Frame 
  11   Thread 0x7f7a3346e700 (LWP 14387) "ruby-timer-thr" 0x00007f7a3215350d in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7f7a33463700 (LWP 14392) "SignalSender" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
  9    Thread 0x7f7a2de1e700 (LWP 16028) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  8    Thread 0x7f7a2dd1d700 (LWP 16029) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  7    Thread 0x7f7a2e424700 (LWP 16030) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6    Thread 0x7f7a2db1b700 (LWP 16031) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  5    Thread 0x7f7a26222700 (LWP 16032) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  4    Thread 0x7f7a2e121700 (LWP 16033) "server_renderi*" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  3    Thread 0x7f7a2df1f700 (LWP 16034) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7f7a2dc1c700 (LWP 16035) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1    Thread 0x7f7a3344e700 (LWP 14355) "ruby" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

Most of those threads look OK, just the ruby interpreter doing its threading thing, BUT threads 10 and 4 seem suspicious.

(gdb) thread 10
[Switching to thread 10 (Thread 0x7f7a33463700 (LWP 14392))]
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85      ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
(gdb) where
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x00007f7a3077b59d in v8::internal::LinuxSemaphore::Wait() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#2  0x00007f7a30680e8c in v8::internal::RuntimeProfiler::WaitForSomeIsolateToEnterJS() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#3  0x00007f7a3077b7a8 in v8::internal::SignalSender::Run() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#4  0x00007f7a3077b6ae in v8::internal::ThreadEntry(void*) () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#5  0x00007f7a32b630a4 in start_thread (arg=0x7f7a33463700) at pthread_create.c:309
#6  0x00007f7a3215c04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f7a2e121700 (LWP 16033))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) where
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f7a32b654d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f7a32b65336 in __GI___pthread_mutex_lock (mutex=0x1e861f8) at ../nptl/pthread_mutex_lock.c:114
#3  0x00007f7a306f36ed in v8::internal::ThreadManager::Lock() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#4  0x00007f7a306f3cff in v8::Locker::Initialize(v8::Isolate*) () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#5  0x00007f7a304d98cf in rr::Locker::setupLockAndCall (state=0x7f7a2e11cf94, code=37382800) at locker.cc:47
#6  0x00007f7a304d9897 in rr::Locker::doLock (argc=0, argv=0x7f7a2e122310, self=26362160) at locker.cc:39
#7  0x00007f7a32f45199 in vm_call_cfunc_with_frame (ci=<optimized out>, reg_cfp=0x7f7a2e221840, th=0x290ede0) at vm_insnhelper.c:1360
#8  vm_call_cfunc (th=<optimized out>, reg_cfp=<optimized out>, ci=<optimized out>) at vm_insnhelper.c:1453
#9  0x00007f7a32f4b30f in vm_exec_core (th=th@entry=0x290ede0, initial=initial@entry=0) at insns.def:1024
#10 0x00007f7a32f4fa28 in vm_exec (th=0x290ede0) at vm.c:1407
#11 0x00007f7a32f517ae in vm_call0_body (th=0x1e861f8, th@entry=0x290ede0, ci=0x80, ci@entry=0x7f7a2e11d5f0, argv=0x0) at vm_eval.c:180
...
...
...
#79 0x00007f7a32f55f20 in vm_invoke_proc (th=th@entry=0x290ede0, proc=proc@entry=0x212e6f0, self=26549360, defined_class=26553440, argc=0, argv=0x258d8f0, blockptr=0x0) at vm.c:886
#80 0x00007f7a32f56008 in rb_vm_invoke_proc (th=th@entry=0x290ede0, proc=proc@entry=0x212e6f0, argc=<optimized out>, argv=<optimized out>, blockptr=blockptr@entry=0x0) at vm.c:905
#81 0x00007f7a32f6b360 in thread_start_func_2 (th=th@entry=0x290ede0, stack_start=<optimized out>) at thread.c:577
#82 0x00007f7a32f6b8c0 in thread_start_func_1 (th_ptr=0x290ede0) at thread_pthread.c:809
#83 0x00007f7a32b630a4 in start_thread (arg=0x7f7a2e121700) at pthread_create.c:309
#84 0x00007f7a3215c04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@rmosolgo
Member Author

rmosolgo commented Jun 8, 2015

Yeah, I have lock problems sometimes too; I couldn't even get Benchmark.ips to run :S

Thread.new over and over is kind of a stupid way of testing, but I think it shows that there's a lock somewhere further down & ConnectionPool isn't buying us much

@mchristen

I just made a pull request with some tweaks to the script that should make it execute more like how a web server would be calling into ExecJS. Should address your concern about creating lots of threads.

I also agree that connection_pool seems to not be doing much.
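
The shape of that change is roughly the sketch below (my reading of the idea, not the actual PR): a fixed set of worker threads, like a web server's thread pool, with the total render count split across them instead of spawning a new thread per render.

# Each worker does its share of the renders, so the total stays constant
# regardless of thread count. `pool` is a ConnectionPool of ExecJS contexts
# as in the earlier sketch; the JS "render" entry point is an assumption.
def run_renders(total_renders, thread_count, pool)
  per_thread = total_renders / thread_count
  Array.new(thread_count) do
    Thread.new do
      per_thread.times { pool.with { |ctx| ctx.call("render", "TodoList", {}) } }
    end
  end.each(&:join)
end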

@rmosolgo
Member Author

rmosolgo commented Jun 8, 2015

💸

mine keeps locking here:

rmosolgo ~/code/react-rails $ ruby -I lib  benchmarks/server_rendering_benchmark.rb
                                                    user     system      total        real
threaded, 1 conn, 2 threads, therubyracer (V8)  0.140000   0.020000   0.160000 (  0.156010)
threaded, 1 conn, 2 threads, Duktape            0.780000   0.010000   0.790000 (  0.790320)
threaded, 1 conn, 2 threads, JavaScriptCore     0.180000   0.320000   4.680000 (  4.849217)
threaded, 1 conn, 2 threads, Node.js (V8)       0.140000   0.290000  18.570000 ( 18.563796)
threaded, 1 conn, 4 threads, therubyracer (V8)  0.190000   0.020000   0.210000 (  0.206175)
threaded, 1 conn, 4 threads, Duktape            1.490000   0.020000   1.510000 (  1.500667)
threaded, 1 conn, 4 threads, JavaScriptCore     0.270000   0.580000  10.590000 ( 11.642282)
threaded, 1 conn, 4 threads, Node.js (V8)       0.250000   0.620000  38.750000 ( 38.782936)
threaded, 10 conn, 2 threads, therubyracer (V8)

But I get the point anyways!

@johnthethird, it looks like ExecJS/V8 has a lock which nullifies the use of ConnectionPool in server rendering. Do you know of any red flags in removing it?

@mchristen

I've been testing both with and without a connection pool, running 10000 renders per test. I modified the script slightly so that when using more than one thread it executes 10000 / # of threads renders per thread, so each test calls render the same number of times.

$ ruby -I lib benchmarks/server_rendering_benchmark.rb # WITHOUT connection_pool
                                                    user     system      total        real
threaded, 1 conn, 1 threads, therubyracer (V8)  4.480000   0.060000   4.540000 (  4.499783)
threaded, 1 conn, 2 threads, therubyracer (V8)  5.020000   0.080000   5.100000 (  4.913696)
threaded, 1 conn, 4 threads, therubyracer (V8)  5.070000   0.100000   5.170000 (  4.942754)
threaded, 1 conn, 8 threads, therubyracer (V8)  5.230000   0.070000   5.300000 (  5.068558)
threaded, 10 conn, 1 threads, therubyracer (V8)  4.580000   0.050000   4.630000 (  4.587420)
threaded, 10 conn, 2 threads, therubyracer (V8)  5.100000   0.110000   5.210000 (  5.010717)
threaded, 10 conn, 4 threads, therubyracer (V8)  4.980000   0.090000   5.070000 (  4.857607)
threaded, 10 conn, 8 threads, therubyracer (V8)  5.040000   0.080000   5.120000 (  4.893360)
threaded, 25 conn, 1 threads, therubyracer (V8)  4.470000   0.000000   4.470000 (  4.443161)
threaded, 25 conn, 2 threads, therubyracer (V8)  5.100000   0.100000   5.200000 (  5.000211)
threaded, 25 conn, 4 threads, therubyracer (V8)  4.990000   0.090000   5.080000 (  4.871171)
threaded, 25 conn, 8 threads, therubyracer (V8)  5.360000   0.070000   5.430000 (  5.197703)
1 conn, 1 threads, therubyracer (V8)            5.000000   0.020000   5.020000 (  4.991949)
1 conn, 2 threads, therubyracer (V8)            5.140000   0.080000   5.220000 (  5.019844)
1 conn, 4 threads, therubyracer (V8)            5.390000   0.080000   5.470000 (  5.260477)
1 conn, 8 threads, therubyracer (V8)            5.420000   0.080000   5.500000 (  5.276437)
10 conn, 1 threads, therubyracer (V8)           4.810000   0.020000   4.830000 (  4.801155)
10 conn, 2 threads, therubyracer (V8)           5.330000   0.070000   5.400000 (  5.179808)
10 conn, 4 threads, therubyracer (V8)           5.130000   0.070000   5.200000 (  4.998896)
10 conn, 8 threads, therubyracer (V8)           5.340000   0.100000   5.440000 (  5.207626)
25 conn, 1 threads, therubyracer (V8)           4.690000   0.030000   4.720000 (  4.684538)
25 conn, 2 threads, therubyracer (V8)           5.220000   0.050000   5.270000 (  5.059232)
25 conn, 4 threads, therubyracer (V8)           5.440000   0.050000   5.490000 (  5.282391)
25 conn, 8 threads, therubyracer (V8)           5.380000   0.110000   5.490000 (  5.257271)

[matt@popcorn:~/work/react-rails]
$ ruby -I lib benchmarks/server_rendering_benchmark.rb # WITH connection_pool
                                                    user     system      total        real
threaded, 1 conn, 1 threads, therubyracer (V8)  4.540000   0.080000   4.620000 (  4.579024)
threaded, 1 conn, 2 threads, therubyracer (V8)  5.430000   0.120000   5.550000 (  5.310940)
threaded, 1 conn, 4 threads, therubyracer (V8)  5.500000   0.150000   5.650000 (  5.459235)
threaded, 1 conn, 8 threads, therubyracer (V8)  5.620000   0.120000   5.740000 (  5.525929)
threaded, 10 conn, 1 threads, therubyracer (V8)  4.860000   0.040000   4.900000 (  4.851019)
threaded, 10 conn, 2 threads, therubyracer (V8)  6.150000   0.100000   6.250000 (  5.992818)
threaded, 10 conn, 4 threads, therubyracer (V8)  6.910000   0.100000   7.010000 (  6.786146)
threaded, 10 conn, 8 threads, therubyracer (V8)  9.820000   0.230000  10.050000 (  9.466074)
threaded, 25 conn, 1 threads, therubyracer (V8)  4.990000   0.030000   5.020000 (  4.985817)
threaded, 25 conn, 2 threads, therubyracer (V8)  6.520000   0.090000   6.610000 (  6.355268)
threaded, 25 conn, 4 threads, therubyracer (V8)  7.560000   0.150000   7.710000 (  7.434552)
threaded, 25 conn, 8 threads, therubyracer (V8)  9.790000   0.290000  10.080000 (  9.522219)
1 conn, 1 threads, therubyracer (V8)            4.930000   0.060000   4.990000 (  4.951394)
1 conn, 2 threads, therubyracer (V8)            5.720000   0.100000   5.820000 (  5.598103)
1 conn, 4 threads, therubyracer (V8)            5.900000   0.070000   5.970000 (  5.771040)
1 conn, 8 threads, therubyracer (V8)            5.860000   0.080000   5.940000 (  5.730886)
10 conn, 1 threads, therubyracer (V8)           4.970000   0.020000   4.990000 (  4.951558)
10 conn, 2 threads, therubyracer (V8)           6.670000   0.100000   6.770000 (  6.535869)
10 conn, 4 threads, therubyracer (V8)           7.700000   0.120000   7.820000 (  7.576611)
10 conn, 8 threads, therubyracer (V8)           9.120000   0.190000   9.310000 (  8.935771)
25 conn, 1 threads, therubyracer (V8)           4.980000   0.030000   5.010000 (  4.968515)
25 conn, 2 threads, therubyracer (V8)           7.150000   0.100000   7.250000 (  6.979404)
25 conn, 4 threads, therubyracer (V8)           8.150000   0.090000   8.240000 (  7.999867)
25 conn, 8 threads, therubyracer (V8)          10.100000   0.190000  10.290000 (  9.712077)

The first results are without connection_pool and the second set is with the pool. You can clearly see the performance degradation when using the pool; I would definitely go ahead and remove it.

I was able to fix the deadlock issue by wrapping all calls to ExecJS with Mutex#synchronize so that only one thread can be compiling or rendering at the same time. It seems silly to have to provide our own 'GIL' for interacting with ExecJS, but it definitely fixes the deadlock problem, and it only seems to matter for therubyracer; none of the other runtimes would deadlock.
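
The workaround amounts to something like the sketch below (the wrapper class and the JS "render" entry point are illustrative; the real change just wraps react-rails's existing ExecJS calls):

require "execjs"

# Serialize every ExecJS compile/call behind one process-wide mutex so only
# one Ruby thread talks to therubyracer at a time.
class LockedRenderer
  JS_LOCK = Mutex.new

  def initialize(js_code)
    @context = JS_LOCK.synchronize { ExecJS.compile(js_code) }
  end

  def render(component_name, props_json)
    JS_LOCK.synchronize { @context.call("render", component_name, props_json) }
  end
end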

@johnthethird
Contributor

@rmosolgo I wouldn't think the connection pool would matter much on MRI. (Also, V8 has its own GIL, similar to MRI.) But JRuby is a different story. The connection_pool gem is battle-hardened from its use in Sidekiq, works great on JRuby/JVM, and is used to mediate access to potentially long-running external resources/processes. So it was a perfect fit for mediating access to the JS engine, or so I thought. I'll try to come up with a benchmark that shows what happens with a truly multi-threaded Ruby runtime.

@mchristen

@johnthethird I thought about JRuby as well, but wouldn't the GIL in V8 negate any sort of performance gain from the JRuby threads? Regardless of whether an ExecJS context is accessed through connection_pool or not, the threads will still share the same GIL in V8.

@johnthethird
Contributor

JRuby has therubyrhino, which is wicked fast and multithreaded, and lives in the same JVM.

@rmosolgo
Member Author

rmosolgo commented Jun 9, 2015

Thanks @johnthethird, I forgot about JRuby!

Here's the benchmark with therubyrhino & Node.js on JRuby:

rmosolgo ~/code/react-rails $ jruby -I lib  benchmarks/server_rendering_benchmark.rb
                                                    user     system      total        real
threaded, 1 conn, 2 threads, therubyrhino (Rhino)  11.690000   0.180000  11.870000 (  2.922000)
threaded, 1 conn, 2 threads, Node.js (V8)           5.410000   0.280000  24.250000 ( 19.702000)
threaded, 1 conn, 4 threads, therubyrhino (Rhino)   6.460000   0.120000   6.580000 (  2.127000)
threaded, 1 conn, 4 threads, Node.js (V8)           2.340000   0.380000  39.480000 ( 38.058000)
threaded, 10 conn, 2 threads, therubyrhino (Rhino)  4.030000   0.050000   4.080000 (  1.025000)
threaded, 10 conn, 2 threads, Node.js (V8)          1.480000   0.160000  21.260000 ( 10.446000)
threaded, 10 conn, 4 threads, therubyrhino (Rhino)  5.150000   0.060000   5.210000 (  1.078000)
threaded, 10 conn, 4 threads, Node.js (V8)          1.930000   0.390000  64.290000 ( 16.491000)
1 conn, 2 threads, therubyrhino (Rhino)         2.560000   0.030000   2.590000 (  1.086000)
1 conn, 2 threads, Node.js (V8)                 0.720000   0.170000  19.420000 ( 19.410000)
1 conn, 4 threads, therubyrhino (Rhino)         1.680000   0.030000   1.710000 (  1.075000)
1 conn, 4 threads, Node.js (V8)                 2.060000   0.400000  40.220000 ( 38.874000)
10 conn, 2 threads, therubyrhino (Rhino)        0.750000   0.010000   0.760000 (  0.357000)
10 conn, 2 threads, Node.js (V8)                0.520000   0.130000  20.160000 ( 10.280000)
10 conn, 4 threads, therubyrhino (Rhino)        1.860000   0.030000   1.890000 (  0.560000)
10 conn, 4 threads, Node.js (V8)                1.960000   0.350000  59.300000 ( 15.169000)

@mchristen

Gotcha, that doesn't use V8 at all so is not subject to its GIL.

The whole reason this pull request exists is that using ExecJS w/ therubyracer in a threaded environment exhibits a deadlock. So while trying to debug and figure out what and where the problem is, we discovered that using connection_pool with MRI actually hurts performance.

So it looks like the best solution may be one where a connection_pool is used when the underlying JS runtime is thread-safe and JRuby is being used; otherwise, don't bother, because it hurts performance on MRI.

The issue of therubyracer deadlocking still isn't fixed, though. Until the real bug is fixed, whether it is in therubyracer, libv8, or MRI itself, there needs to be some locking inside react-rails to protect against concurrent calls to anything in ExecJS. Perhaps that could also be conditional on whether or not therubyracer is being used, so that we don't step in the way of JRuby.
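
Conditionally applying that lock could look roughly like this (a sketch; detecting the runtime by its name string is an assumption, not necessarily how react-rails would do it):

require "execjs"

# Only pay for the extra lock when the runtime that actually deadlocks
# (therubyracer) is in use; on JRuby/therubyrhino, render without it.
JS_LOCK = Mutex.new
NEEDS_GLOBAL_LOCK = ExecJS.runtime.name.include?("therubyracer")

def with_js_lock(&blk)
  NEEDS_GLOBAL_LOCK ? JS_LOCK.synchronize(&blk) : blk.call
end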

@mchristen

Here's a full test run using JRuby and therubyrhino, running 20k renders per test. Node.js is so slow that I'm not bothering to test against it.

I altered the script's test loop order so it changes the size of the connection pool before it changes the thread count, and you can clearly see the performance benefit of using a pool size >= thread count by looking at the real column. My machine has hyperthreading, so I guess that's why everything above 4 threads fails to show any improvement.

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000                                                                                                                                                                                                                                                                
                                                              user     system      total        real
threaded, 1 conn, 1 threads, therubyrhino (Rhino)        85.400000   0.140000  85.540000 ( 77.309994)
threaded, 2 conn, 1 threads, therubyrhino (Rhino)        75.520000   0.080000  75.600000 ( 73.073037)
threaded, 4 conn, 1 threads, therubyrhino (Rhino)        73.250000   0.060000  73.310000 ( 72.023527)
threaded, 8 conn, 1 threads, therubyrhino (Rhino)        73.640000   0.080000  73.720000 ( 72.338433)
threaded, 10 conn, 1 threads, therubyrhino (Rhino)       73.250000   0.040000  73.290000 ( 71.963621)
threaded, 1 conn, 2 threads, therubyrhino (Rhino)        76.500000   0.200000  76.700000 ( 73.928030)
threaded, 2 conn, 2 threads, therubyrhino (Rhino)        87.760000   0.040000  87.800000 ( 43.708759)
threaded, 4 conn, 2 threads, therubyrhino (Rhino)        85.640000   0.070000  85.710000 ( 42.704853)
threaded, 8 conn, 2 threads, therubyrhino (Rhino)        89.220000   0.070000  89.290000 ( 44.385474)
threaded, 10 conn, 2 threads, therubyrhino (Rhino)       89.370000   0.060000  89.430000 ( 44.542108)
threaded, 1 conn, 4 threads, therubyrhino (Rhino)        78.490000   0.550000  79.040000 ( 74.063415)
threaded, 2 conn, 4 threads, therubyrhino (Rhino)        96.940000   0.260000  97.200000 ( 47.619800)
threaded, 4 conn, 4 threads, therubyrhino (Rhino)       112.190000   0.080000 112.270000 ( 28.976933)
threaded, 8 conn, 4 threads, therubyrhino (Rhino)       110.300000   0.050000 110.350000 ( 28.598516)
threaded, 10 conn, 4 threads, therubyrhino (Rhino)      115.960000   0.020000 115.980000 ( 29.758068)
threaded, 1 conn, 8 threads, therubyrhino (Rhino)        80.580000   1.110000  81.690000 ( 73.936175)
threaded, 2 conn, 8 threads, therubyrhino (Rhino)       100.400000   0.950000 101.350000 ( 48.156891)
threaded, 4 conn, 8 threads, therubyrhino (Rhino)       124.540000   0.510000 125.050000 ( 30.938125)
threaded, 8 conn, 8 threads, therubyrhino (Rhino)       235.120000   0.290000 235.410000 ( 31.244846)
threaded, 10 conn, 8 threads, therubyrhino (Rhino)      236.010000   0.180000 236.190000 ( 31.007223)

@mchristen

Here's a full run using MRI (2.2.1).

Maybe I'm doing something wrong with JRuby, but that performance is quite bad. I was using version 9.0.0.0.pre2 installed through rbenv, and the runtime numbers definitely show it was utilizing more than one CPU core at a time (I could see through system-monitoring tools that this was the case as well).

So MRI running on one CPU core is almost 4x faster than JRuby running on all 4 CPU cores?? There's obviously something else going on... right?

The way this benchmark script is set up, it pretty much purely tests the performance of the associated JS runtimes, so does that mean Rhino is that much slower than V8, at least when used in this context?

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000
                                                              user     system      total        real
threaded, 1 conn, 1 threads, therubyracer (V8)            8.700000   0.090000   8.790000 (  8.723473)
threaded, 2 conn, 1 threads, therubyracer (V8)            8.800000   0.080000   8.880000 (  8.800348)
threaded, 4 conn, 1 threads, therubyracer (V8)            9.020000   0.090000   9.110000 (  9.035414)
threaded, 8 conn, 1 threads, therubyracer (V8)            9.130000   0.080000   9.210000 (  9.141726)
threaded, 10 conn, 1 threads, therubyracer (V8)           9.100000   0.060000   9.160000 (  9.089403)
threaded, 1 conn, 2 threads, therubyracer (V8)           10.560000   0.190000  10.750000 ( 10.345181)
threaded, 2 conn, 2 threads, therubyracer (V8)           11.510000   0.190000  11.700000 ( 11.248498)
threaded, 4 conn, 2 threads, therubyracer (V8)           12.870000   0.180000  13.050000 ( 12.605508)
threaded, 8 conn, 2 threads, therubyracer (V8)           11.710000   0.210000  11.920000 ( 11.525172)
threaded, 10 conn, 2 threads, therubyracer (V8)          14.330000   0.240000  14.570000 ( 14.143688)
threaded, 1 conn, 4 threads, therubyracer (V8)           11.120000   0.220000  11.340000 ( 11.010513)
threaded, 2 conn, 4 threads, therubyracer (V8)           14.400000   0.230000  14.630000 ( 14.241322)
threaded, 4 conn, 4 threads, therubyracer (V8)           14.610000   0.220000  14.830000 ( 14.364411)
threaded, 8 conn, 4 threads, therubyracer (V8)           15.190000   0.230000  15.420000 ( 14.863919)
threaded, 10 conn, 4 threads, therubyracer (V8)          15.790000   0.180000  15.970000 ( 15.485036)
threaded, 1 conn, 8 threads, therubyracer (V8)           11.300000   0.200000  11.500000 ( 11.143031)
threaded, 2 conn, 8 threads, therubyracer (V8)           13.400000   0.270000  13.670000 ( 13.197583)
threaded, 4 conn, 8 threads, therubyracer (V8)           15.700000   0.350000  16.050000 ( 15.256964)
threaded, 8 conn, 8 threads, therubyracer (V8)           18.940000   0.440000  19.380000 ( 18.190847)
threaded, 10 conn, 8 threads, therubyracer (V8)          18.520000   0.330000  18.850000 ( 18.015099)

@mchristen

I replaced the JS code that is being rendered with:

var SlowComponent = React.createClass({
  render: function() {
    var i, items, rand;
    rand = 0;
    i = 0;
    items = [];
    while (i < 10) {
      items.push(React.createElement('div', i + ' :)'));
      i++;
    }
    return React.createElement('div', items);
  }
})

and now I get the following results for JRuby

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000                                                                                                                                                                                                                                                                
                                                              user     system      total        real
threaded, 1 conn, 2 threads, therubyrhino (Rhino)        32.790000   0.290000  33.080000 ( 22.342218)
threaded, 2 conn, 2 threads, therubyrhino (Rhino)        21.270000   0.040000  21.310000 (  9.462662)
threaded, 4 conn, 2 threads, therubyrhino (Rhino)        17.340000   0.020000  17.360000 (  8.563669)
threaded, 8 conn, 2 threads, therubyrhino (Rhino)        17.390000   0.030000  17.420000 (  8.699205)
threaded, 10 conn, 2 threads, therubyrhino (Rhino)       18.570000   0.030000  18.600000 (  9.317547)
threaded, 1 conn, 4 threads, therubyrhino (Rhino)        19.690000   0.370000  20.060000 ( 17.576572)
threaded, 2 conn, 4 threads, therubyrhino (Rhino)        20.380000   0.180000  20.560000 (  9.834104)
threaded, 4 conn, 4 threads, therubyrhino (Rhino)        20.540000   0.050000  20.590000 (  5.342520)
threaded, 8 conn, 4 threads, therubyrhino (Rhino)        20.680000   0.020000  20.700000 (  5.593378)
threaded, 10 conn, 4 threads, therubyrhino (Rhino)       22.150000   0.050000  22.200000 (  5.902722)

and for MRI

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000
                                                              user     system      total        real
threaded, 1 conn, 2 threads, therubyracer (V8)           13.300000   0.120000  13.420000 ( 13.034083)
threaded, 2 conn, 2 threads, therubyracer (V8)           14.440000   0.160000  14.600000 ( 14.172245)
threaded, 4 conn, 2 threads, therubyracer (V8)           14.490000   0.230000  14.720000 ( 14.261971)
threaded, 8 conn, 2 threads, therubyracer (V8)           14.850000   0.180000  15.030000 ( 14.610182)
threaded, 10 conn, 2 threads, therubyracer (V8)          15.150000   0.190000  15.340000 ( 14.904837)
threaded, 1 conn, 4 threads, therubyracer (V8)           14.700000   0.190000  14.890000 ( 14.527581)
threaded, 2 conn, 4 threads, therubyracer (V8)           17.250000   0.200000  17.450000 ( 17.055383)
threaded, 4 conn, 4 threads, therubyracer (V8)           19.420000   0.260000  19.680000 ( 19.221227)
threaded, 8 conn, 4 threads, therubyracer (V8)           18.680000   0.210000  18.890000 ( 18.426053)
threaded, 10 conn, 4 threads, therubyracer (V8)          19.380000   0.200000  19.580000 ( 19.127787)

Those results make more sense, so my conclusion is that Math.random() causes drastically different performance on the two JS runtimes.

@johnthethird
Contributor

Yea, V8 is a lot faster than Rhino, from a pure speed perspective. The new JS engine in Java 8 (Nashorn) approaches V8 in speed.

Focusing on just the pool concept, I changed the benchmark script to show the advantages of a pool. (I had to normalize the renders so that each test does the same number of total renders, taking the number of threads into account, to make it a fair test. Also, instead of Math.random, it does some more real-life React things.)

https://gist.github.com/johnthethird/19074284591b6091e367

The thing to note in the results is that 10 threads against a size-10 pool makes a huge difference, obviously, for JRuby, and even for MRI if it uses an out-of-process JS engine like Node.

Re the MRI deadlock issue, it looks like not only does V8 have a GIL of its own, but ExecJS also has locks around compile and eval:

https://github.com/sstephenson/execjs/blob/master/lib/execjs/ruby_racer_runtime.rb#L32

So I'm not sure why putting a lock around our render fixes the deadlock, but it sure does.

For MRI/RubyRacer, probably the right thing to do is to have a pool size of 1. Having the pool there shouldn't really hurt performance, as the overhead of checking in and out of the pool is minimal. The question is whether to make this a hard-coded limit or just issue a warning if pool size > 1 when using RubyRacer.
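
A warning along those lines could be as small as the sketch below (the React::ServerRendering.pool_size accessor is an assumption about the config surface, and runtime detection by name is likewise illustrative):

# Warn rather than hard-fail when the configuration is known to deadlock on MRI.
if ExecJS.runtime.name.include?("therubyracer") && React::ServerRendering.pool_size > 1
  warn "react-rails: pool_size > 1 with therubyracer on MRI can deadlock; " \
       "use pool_size = 1, or switch to JRuby + therubyrhino for real concurrency."
end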

@rmosolgo
Member Author

How's this for takeaways:

  • Default Pool Size to 1
  • Document: on JRuby, you can up the pool size
  • Recommend RubyRacer for MRI

??

@mchristen

👍

Perhaps also include a note about the deadlock condition if you configure Pool Size > 1 with RubyRacer on MRI.

@glittershark
Contributor

Per my comments on #222, I'm definitely throwing my weight behind officially recommending therubyracer on MRI. Should definitely throw in a note there saying that you should do it even though the Heroku docs say not to, to avoid chumps like me thinking one suggestion overrides the other.

@rmosolgo
Member Author

Anybody want to cook up a PR 😬 ?? TODO:

  • Default Pool Size to 1
  • Document: Recommend RubyRacer for MRI
  • Document: on MRI, you get a deadlock when pool size > 1
  • Document: on JRuby, you can up the pool size and get effective multi-threading

@rmosolgo
Member Author

@rmosolgo rmosolgo closed this Jun 23, 2015
@rmosolgo rmosolgo deleted the execjs-benchmark branch July 1, 2015 01:43
n-rodriguez pushed a commit to jbox-web/easy-app that referenced this pull request Dec 26, 2016
kimat added a commit to kimat/s2i-ruby-alpine-mysql that referenced this pull request Feb 15, 2018
compilation of libv8 takes a lot of time though

reactjs/react-rails#290
@BobbyMcWho

People finding this thread later should note that per #886 'therubyracer' should not be used anymore.

@BookOfGreg
Member

👍 @BobbyMcWho Yep! It was deprecated because it only supports libv8 version 3, while the current version is 5. It's also notorious for segfaulting your servers in production. MiniRacer is the current recommendation, and it's even a little faster since it uses a newer version of libv8.

If you remind me in a few, I might get a benchmark out of it later.
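
For anyone updating an older app, the switch is usually just a Gemfile change; with a reasonably recent ExecJS, mini_racer is auto-detected, so no server-rendering code needs to change (a sketch, not tied to a specific react-rails version):

# Gemfile
gem "mini_racer"   # instead of: gem "therubyracer"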
