[testing] Execjs runtime benchmarks #290

Closed
wants to merge 10 commits into from

Conversation

rmosolgo
Member

@rmosolgo rmosolgo commented Jun 8, 2015

(I don't really want to merge this, I just want to share some code & results)

I ran some tests to compare ExecJS runtimes & connection pool sizes:

  • I added ExecJSRenderer to sidestep the dependency on Rails asset pipeline
  • I threw duktape.rb in the mix just for kicks
  • update: try benchmark with lots of threads accessing the lock
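
For context, the harness boils down to something like the sketch below. This is a reconstruction, not the actual benchmarks/server_rendering_benchmark.rb: the JS render() entry point, the component name, and the bundle path are assumptions, while the ExecJS and connection_pool calls are the real gem APIs.

require "benchmark"
require "execjs"
require "connection_pool"

# Assumed bundle: React + the components + a global render() entry point.
JS_CODE = File.read("benchmarks/bundle.js")

def bench(runtime, pool_size:, renders:, threaded:)
  # One compiled ExecJS context per pool slot, checked out for each render.
  pool = ConnectionPool.new(size: pool_size, timeout: 10) { runtime.compile(JS_CODE) }
  work = -> { pool.with { |ctx| ctx.call("render", "TodoList", {}) } }

  if threaded
    Array.new(renders) { Thread.new(&work) }.each(&:join)
  else
    renders.times { work.call }
  end
end

Benchmark.bm(40) do |x|
  runtimes = [
    ExecJS::Runtimes::RubyRacer,
    ExecJS::Runtimes::Duktape,
    ExecJS::Runtimes::JavaScriptCore,
    ExecJS::Runtimes::Node
  ]
  runtimes.each do |rt|
    [1, 10, 25].each do |size|
      x.report("threaded, #{size} conn, #{rt.name}") { bench(rt, pool_size: size, renders: 100, threaded: true) }
      x.report("#{size} conn, #{rt.name}")           { bench(rt, pool_size: size, renders: 100, threaded: false) }
    end
  end
end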

Results

rmosolgo ~/code/react-rails $ ruby -I lib  benchmarks/server_rendering_benchmark.rb
                                          user     system      total        real
threaded, 1 conn, therubyracer (V8)   0.120000   0.010000   0.130000 (  0.127382)
threaded, 1 conn, Duktape             0.460000   0.010000   0.470000 (  0.465932)
threaded, 1 conn, JavaScriptCore      0.110000   0.250000   2.460000 (  2.487619)
threaded, 1 conn, Node.js (V8)        0.110000   0.250000   9.560000 (  9.504323)
threaded, 10 conn, therubyracer (V8)  0.070000   0.010000   0.080000 (  0.074837)
threaded, 10 conn, Duktape            1.480000   0.040000   1.520000 (  1.515937)
threaded, 10 conn, JavaScriptCore     0.080000   0.230000   3.840000 (  0.932142)
threaded, 10 conn, Node.js (V8)       0.140000   0.250000  16.970000 (  3.768226)
threaded, 25 conn, therubyracer (V8)  0.060000   0.010000   0.070000 (  0.068196)
threaded, 25 conn, Duktape            3.170000   0.060000   3.230000 (  3.230653)
threaded, 25 conn, JavaScriptCore     0.190000   0.300000   4.240000 (  1.856800)
threaded, 25 conn, Node.js (V8)       0.080000   0.240000  17.570000 (  5.963759)
1 conn, therubyracer (V8)             0.070000   0.010000   0.080000 (  0.080440)
1 conn, Duktape                       0.470000   0.020000   0.490000 (  0.480219)
1 conn, JavaScriptCore                0.090000   0.230000   2.400000 (  2.442591)
1 conn, Node.js (V8)                  0.100000   0.250000   9.360000 (  9.294853)
10 conn, therubyracer (V8)            0.060000   0.010000   0.070000 (  0.079364)
10 conn, Duktape                      1.530000   0.040000   1.570000 (  1.558580)
10 conn, JavaScriptCore               0.090000   0.260000   4.050000 (  0.994402)
10 conn, Node.js (V8)                 0.160000   0.270000  17.320000 (  3.884776)
25 conn, therubyracer (V8)            0.090000   0.010000   0.100000 (  0.094128)
25 conn, Duktape                      3.300000   0.070000   3.370000 (  3.371425)
25 conn, JavaScriptCore               0.080000   0.300000   4.890000 (  1.982959)
25 conn, Node.js (V8)                 0.220000   0.340000  23.380000 (  8.886694)

Sorted

# Ruby Racer 

threaded, 1 conn, therubyracer (V8)   0.120000   0.010000   0.130000 (  0.127382)
threaded, 25 conn, therubyracer (V8)  0.060000   0.010000   0.070000 (  0.068196)
threaded, 10 conn, therubyracer (V8)  0.070000   0.010000   0.080000 (  0.074837)
1 conn, therubyracer (V8)             0.070000   0.010000   0.080000 (  0.080440)
10 conn, therubyracer (V8)            0.060000   0.010000   0.070000 (  0.079364)
25 conn, therubyracer (V8)            0.090000   0.010000   0.100000 (  0.094128)

# Duktape 

threaded, 1 conn, Duktape             0.460000   0.010000   0.470000 (  0.465932)
threaded, 10 conn, Duktape            1.480000   0.040000   1.520000 (  1.515937)
threaded, 25 conn, Duktape            3.170000   0.060000   3.230000 (  3.230653)
1 conn, Duktape                       0.470000   0.020000   0.490000 (  0.480219)
10 conn, Duktape                      1.530000   0.040000   1.570000 (  1.558580)
25 conn, Duktape                      3.300000   0.070000   3.370000 (  3.371425)

# JavaScriptCore

threaded, 1 conn, JavaScriptCore      0.110000   0.250000   2.460000 (  2.487619)
threaded, 10 conn, JavaScriptCore     0.080000   0.230000   3.840000 (  0.932142)
threaded, 25 conn, JavaScriptCore     0.190000   0.300000   4.240000 (  1.856800)
1 conn, JavaScriptCore                0.090000   0.230000   2.400000 (  2.442591)
10 conn, JavaScriptCore               0.090000   0.260000   4.050000 (  0.994402)
25 conn, JavaScriptCore               0.080000   0.300000   4.890000 (  1.982959)

# Node.js

threaded, 1 conn, Node.js (V8)        0.110000   0.250000   9.560000 (  9.504323)
threaded, 10 conn, Node.js (V8)       0.140000   0.250000  16.970000 (  3.768226)
threaded, 25 conn, Node.js (V8)       0.080000   0.240000  17.570000 (  5.963759)
1 conn, Node.js (V8)                  0.100000   0.250000   9.360000 (  9.294853)
10 conn, Node.js (V8)                 0.160000   0.270000  17.320000 (  3.884776)
25 conn, Node.js (V8)                 0.220000   0.340000  23.380000 (  8.886694)

Takeaways

  • RubyRacer is the fastest
  • Node.js speed decreases with a larger connection pool
  • Running in different threads decreases speed

React::ServerRendering.renderer_options = {code: JS_CODE}
React::ServerRendering.pool_timeout = 10

def test_runtime(runtime, renders:, pool_size:)
Contributor

would be nice to have these explicit

Member Author

what do you mean? I think keyword args are good for expressing meaning

React::ServerRendering.renderer_options = {code: JS_CODE}
React::ServerRendering.pool_timeout = 1000

def test_runtime(runtime, renders:, pool_size:, threaded:)
Contributor

I meant making the default values explicit (nil). Just easier to read, nothing specific :-)
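
In other words, something along these lines (just illustrating the reviewer's suggestion, not a change that was made in the PR):

def test_runtime(runtime, renders: nil, pool_size: nil, threaded: nil)
  # explicit nil defaults, so it's obvious at a glance which arguments may be omitted
end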

@mchristen

Great script.

When running the tests though, my ruby process now deadlocks consistently when pool size > 1 and using the threaded version of the test. This is a much easier way to reproduce the problem than trying to stress the entire web stack.

Looking through the thread stacks in gdb it looks like the deadlock problem is something with libv8 itself.

(gdb) info threads
  Id   Target Id         Frame 
  11   Thread 0x7f7a3346e700 (LWP 14387) "ruby-timer-thr" 0x00007f7a3215350d in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7f7a33463700 (LWP 14392) "SignalSender" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
  9    Thread 0x7f7a2de1e700 (LWP 16028) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  8    Thread 0x7f7a2dd1d700 (LWP 16029) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  7    Thread 0x7f7a2e424700 (LWP 16030) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6    Thread 0x7f7a2db1b700 (LWP 16031) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  5    Thread 0x7f7a26222700 (LWP 16032) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  4    Thread 0x7f7a2e121700 (LWP 16033) "server_renderi*" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  3    Thread 0x7f7a2df1f700 (LWP 16034) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7f7a2dc1c700 (LWP 16035) "server_renderi*" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1    Thread 0x7f7a3344e700 (LWP 14355) "ruby" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

Most of those threads look OK, just the ruby interpreter doing its threading thing, BUT threads 10 and 4 seem suspicious.

(gdb) thread 10
[Switching to thread 10 (Thread 0x7f7a33463700 (LWP 14392))]
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85      ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
(gdb) where
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x00007f7a3077b59d in v8::internal::LinuxSemaphore::Wait() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#2  0x00007f7a30680e8c in v8::internal::RuntimeProfiler::WaitForSomeIsolateToEnterJS() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#3  0x00007f7a3077b7a8 in v8::internal::SignalSender::Run() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#4  0x00007f7a3077b6ae in v8::internal::ThreadEntry(void*) () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#5  0x00007f7a32b630a4 in start_thread (arg=0x7f7a33463700) at pthread_create.c:309
#6  0x00007f7a3215c04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f7a2e121700 (LWP 16033))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) where
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f7a32b654d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f7a32b65336 in __GI___pthread_mutex_lock (mutex=0x1e861f8) at ../nptl/pthread_mutex_lock.c:114
#3  0x00007f7a306f36ed in v8::internal::ThreadManager::Lock() () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#4  0x00007f7a306f3cff in v8::Locker::Initialize(v8::Isolate*) () from /home/matt/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/extensions/x86_64-linux/2.2.0/therubyracer-0.12.2/v8/init.so
#5  0x00007f7a304d98cf in rr::Locker::setupLockAndCall (state=0x7f7a2e11cf94, code=37382800) at locker.cc:47
#6  0x00007f7a304d9897 in rr::Locker::doLock (argc=0, argv=0x7f7a2e122310, self=26362160) at locker.cc:39
#7  0x00007f7a32f45199 in vm_call_cfunc_with_frame (ci=<optimized out>, reg_cfp=0x7f7a2e221840, th=0x290ede0) at vm_insnhelper.c:1360
#8  vm_call_cfunc (th=<optimized out>, reg_cfp=<optimized out>, ci=<optimized out>) at vm_insnhelper.c:1453
#9  0x00007f7a32f4b30f in vm_exec_core (th=th@entry=0x290ede0, initial=initial@entry=0) at insns.def:1024
#10 0x00007f7a32f4fa28 in vm_exec (th=0x290ede0) at vm.c:1407
#11 0x00007f7a32f517ae in vm_call0_body (th=0x1e861f8, th@entry=0x290ede0, ci=0x80, ci@entry=0x7f7a2e11d5f0, argv=0x0) at vm_eval.c:180
...
...
...
#79 0x00007f7a32f55f20 in vm_invoke_proc (th=th@entry=0x290ede0, proc=proc@entry=0x212e6f0, self=26549360, defined_class=26553440, argc=0, argv=0x258d8f0, blockptr=0x0) at vm.c:886
#80 0x00007f7a32f56008 in rb_vm_invoke_proc (th=th@entry=0x290ede0, proc=proc@entry=0x212e6f0, argc=<optimized out>, argv=<optimized out>, blockptr=blockptr@entry=0x0) at vm.c:905
#81 0x00007f7a32f6b360 in thread_start_func_2 (th=th@entry=0x290ede0, stack_start=<optimized out>) at thread.c:577
#82 0x00007f7a32f6b8c0 in thread_start_func_1 (th_ptr=0x290ede0) at thread_pthread.c:809
#83 0x00007f7a32b630a4 in start_thread (arg=0x7f7a2e121700) at pthread_create.c:309
#84 0x00007f7a3215c04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@rmosolgo
Member Author

rmosolgo commented Jun 8, 2015

Yeah, I have lock problems sometimes too; I couldn't even get Benchmark.ips to run :S

Thread.new over and over is kind of a stupid way of testing, but I think it shows that there's a lock somewhere further down & ConnectionPool isn't buying us much

@mchristen

I just made a pull request with some tweaks to the script that should make it execute more like how a web server would be calling into ExecJS. Should address your concern about creating lots of threads.

I also agree that connection_pool seems to not be doing much.
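
The shape of that change is roughly the sketch below (my reading of the idea, not the actual PR): a fixed set of worker threads, like a web server's thread pool, with the total render count split across them instead of spawning a new thread per render.

# Each worker does its share of the renders, so the total stays constant
# regardless of thread count. `pool` is a ConnectionPool of ExecJS contexts
# as in the earlier sketch; the JS "render" entry point is an assumption.
def run_renders(total_renders, thread_count, pool)
  per_thread = total_renders / thread_count
  Array.new(thread_count) do
    Thread.new do
      per_thread.times { pool.with { |ctx| ctx.call("render", "TodoList", {}) } }
    end
  end.each(&:join)
end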

@rmosolgo
Member Author

rmosolgo commented Jun 8, 2015

💸

mine keeps locking here:

rmosolgo ~/code/react-rails $ ruby -I lib  benchmarks/server_rendering_benchmark.rb
                                                    user     system      total        real
threaded, 1 conn, 2 threads, therubyracer (V8)  0.140000   0.020000   0.160000 (  0.156010)
threaded, 1 conn, 2 threads, Duktape            0.780000   0.010000   0.790000 (  0.790320)
threaded, 1 conn, 2 threads, JavaScriptCore     0.180000   0.320000   4.680000 (  4.849217)
threaded, 1 conn, 2 threads, Node.js (V8)       0.140000   0.290000  18.570000 ( 18.563796)
threaded, 1 conn, 4 threads, therubyracer (V8)  0.190000   0.020000   0.210000 (  0.206175)
threaded, 1 conn, 4 threads, Duktape            1.490000   0.020000   1.510000 (  1.500667)
threaded, 1 conn, 4 threads, JavaScriptCore     0.270000   0.580000  10.590000 ( 11.642282)
threaded, 1 conn, 4 threads, Node.js (V8)       0.250000   0.620000  38.750000 ( 38.782936)
threaded, 10 conn, 2 threads, therubyracer (V8)

But I get the point anyways!

@johnthethird, it looks like ExecJS/V8 has a lock which nullifies the use of ConnectionPool in server rendering. Do you know of any red flags in removing it?

@mchristen

I've been testing both with and without a connection pool, running 10000 renders per test. I modified the script slightly so that when using more than one thread it executes 10000 / # of threads renders per thread, so each test calls render the same number of times.

$ ruby -I lib benchmarks/server_rendering_benchmark.rb # WITHOUT connection_pool
                                                    user     system      total        real
threaded, 1 conn, 1 threads, therubyracer (V8)  4.480000   0.060000   4.540000 (  4.499783)
threaded, 1 conn, 2 threads, therubyracer (V8)  5.020000   0.080000   5.100000 (  4.913696)
threaded, 1 conn, 4 threads, therubyracer (V8)  5.070000   0.100000   5.170000 (  4.942754)
threaded, 1 conn, 8 threads, therubyracer (V8)  5.230000   0.070000   5.300000 (  5.068558)
threaded, 10 conn, 1 threads, therubyracer (V8)  4.580000   0.050000   4.630000 (  4.587420)
threaded, 10 conn, 2 threads, therubyracer (V8)  5.100000   0.110000   5.210000 (  5.010717)
threaded, 10 conn, 4 threads, therubyracer (V8)  4.980000   0.090000   5.070000 (  4.857607)
threaded, 10 conn, 8 threads, therubyracer (V8)  5.040000   0.080000   5.120000 (  4.893360)
threaded, 25 conn, 1 threads, therubyracer (V8)  4.470000   0.000000   4.470000 (  4.443161)
threaded, 25 conn, 2 threads, therubyracer (V8)  5.100000   0.100000   5.200000 (  5.000211)
threaded, 25 conn, 4 threads, therubyracer (V8)  4.990000   0.090000   5.080000 (  4.871171)
threaded, 25 conn, 8 threads, therubyracer (V8)  5.360000   0.070000   5.430000 (  5.197703)
1 conn, 1 threads, therubyracer (V8)            5.000000   0.020000   5.020000 (  4.991949)
1 conn, 2 threads, therubyracer (V8)            5.140000   0.080000   5.220000 (  5.019844)
1 conn, 4 threads, therubyracer (V8)            5.390000   0.080000   5.470000 (  5.260477)
1 conn, 8 threads, therubyracer (V8)            5.420000   0.080000   5.500000 (  5.276437)
10 conn, 1 threads, therubyracer (V8)           4.810000   0.020000   4.830000 (  4.801155)
10 conn, 2 threads, therubyracer (V8)           5.330000   0.070000   5.400000 (  5.179808)
10 conn, 4 threads, therubyracer (V8)           5.130000   0.070000   5.200000 (  4.998896)
10 conn, 8 threads, therubyracer (V8)           5.340000   0.100000   5.440000 (  5.207626)
25 conn, 1 threads, therubyracer (V8)           4.690000   0.030000   4.720000 (  4.684538)
25 conn, 2 threads, therubyracer (V8)           5.220000   0.050000   5.270000 (  5.059232)
25 conn, 4 threads, therubyracer (V8)           5.440000   0.050000   5.490000 (  5.282391)
25 conn, 8 threads, therubyracer (V8)           5.380000   0.110000   5.490000 (  5.257271)

[matt@popcorn:~/work/react-rails]
$ ruby -I lib benchmarks/server_rendering_benchmark.rb # WITH connection_pool
                                                    user     system      total        real
threaded, 1 conn, 1 threads, therubyracer (V8)  4.540000   0.080000   4.620000 (  4.579024)
threaded, 1 conn, 2 threads, therubyracer (V8)  5.430000   0.120000   5.550000 (  5.310940)
threaded, 1 conn, 4 threads, therubyracer (V8)  5.500000   0.150000   5.650000 (  5.459235)
threaded, 1 conn, 8 threads, therubyracer (V8)  5.620000   0.120000   5.740000 (  5.525929)
threaded, 10 conn, 1 threads, therubyracer (V8)  4.860000   0.040000   4.900000 (  4.851019)
threaded, 10 conn, 2 threads, therubyracer (V8)  6.150000   0.100000   6.250000 (  5.992818)
threaded, 10 conn, 4 threads, therubyracer (V8)  6.910000   0.100000   7.010000 (  6.786146)
threaded, 10 conn, 8 threads, therubyracer (V8)  9.820000   0.230000  10.050000 (  9.466074)
threaded, 25 conn, 1 threads, therubyracer (V8)  4.990000   0.030000   5.020000 (  4.985817)
threaded, 25 conn, 2 threads, therubyracer (V8)  6.520000   0.090000   6.610000 (  6.355268)
threaded, 25 conn, 4 threads, therubyracer (V8)  7.560000   0.150000   7.710000 (  7.434552)
threaded, 25 conn, 8 threads, therubyracer (V8)  9.790000   0.290000  10.080000 (  9.522219)
1 conn, 1 threads, therubyracer (V8)            4.930000   0.060000   4.990000 (  4.951394)
1 conn, 2 threads, therubyracer (V8)            5.720000   0.100000   5.820000 (  5.598103)
1 conn, 4 threads, therubyracer (V8)            5.900000   0.070000   5.970000 (  5.771040)
1 conn, 8 threads, therubyracer (V8)            5.860000   0.080000   5.940000 (  5.730886)
10 conn, 1 threads, therubyracer (V8)           4.970000   0.020000   4.990000 (  4.951558)
10 conn, 2 threads, therubyracer (V8)           6.670000   0.100000   6.770000 (  6.535869)
10 conn, 4 threads, therubyracer (V8)           7.700000   0.120000   7.820000 (  7.576611)
10 conn, 8 threads, therubyracer (V8)           9.120000   0.190000   9.310000 (  8.935771)
25 conn, 1 threads, therubyracer (V8)           4.980000   0.030000   5.010000 (  4.968515)
25 conn, 2 threads, therubyracer (V8)           7.150000   0.100000   7.250000 (  6.979404)
25 conn, 4 threads, therubyracer (V8)           8.150000   0.090000   8.240000 (  7.999867)
25 conn, 8 threads, therubyracer (V8)          10.100000   0.190000  10.290000 (  9.712077)

The first results are without connection_pool and the second set is with the pool. You can clearly see the performance degradation when using the pool; I would definitely go ahead and remove it.

I was able to fix the deadlock issue by wrapping all calls to ExecJS with Mutex#synchronize so that only one thread can be compiling or rendering at the same time. It seems silly to have to provide our own 'GIL' for interacting with ExecJS, but it definitely fixes the deadlock problem, and it only seems to matter for therubyracer; none of the other runtimes would deadlock.
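
The workaround amounts to something like the sketch below (the wrapper class and the JS "render" entry point are illustrative; the real change just wraps react-rails's existing ExecJS calls):

require "execjs"

# Serialize every ExecJS compile/call behind one process-wide mutex so only
# one Ruby thread talks to therubyracer at a time.
class LockedRenderer
  JS_LOCK = Mutex.new

  def initialize(js_code)
    @context = JS_LOCK.synchronize { ExecJS.compile(js_code) }
  end

  def render(component_name, props_json)
    JS_LOCK.synchronize { @context.call("render", component_name, props_json) }
  end
end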

@johnthethird
Contributor

@rmosolgo I wouldn't think the connection pool would matter much on MRI. (Also, V8 has its own GIL, similar to MRI.) But JRuby is a different story. The connection_pool gem is battle-hardened from its use in Sidekiq, works great on JRuby/JVM, and is used to mediate access to potentially long-running external resources/processes. So it was a perfect fit for mediating access to the JS engine, or so I thought. I'll try to come up with a benchmark that shows what happens with a truly multi-threaded Ruby runtime.

@mchristen

@johnthethird I thought about JRuby as well, but wouldn't the GIL in V8 negate any sort of performance gain from the JRuby threads? Regardless of whether an ExecJS context is accessed through connection_pool or not, the threads will still share the same GIL in V8.

@johnthethird
Contributor

JRuby has therubyrhino, which is wicked fast and multithreaded, and lives in the same JVM.

@rmosolgo
Member Author

rmosolgo commented Jun 9, 2015

Thanks @johnthethird, I forgot about JRuby!

Here's the benchmark with therubyrhino & Node.js on JRuby:

rmosolgo ~/code/react-rails $ jruby -I lib  benchmarks/server_rendering_benchmark.rb
                                                    user     system      total        real
threaded, 1 conn, 2 threads, therubyrhino (Rhino)  11.690000   0.180000  11.870000 (  2.922000)
threaded, 1 conn, 2 threads, Node.js (V8)           5.410000   0.280000  24.250000 ( 19.702000)
threaded, 1 conn, 4 threads, therubyrhino (Rhino)   6.460000   0.120000   6.580000 (  2.127000)
threaded, 1 conn, 4 threads, Node.js (V8)           2.340000   0.380000  39.480000 ( 38.058000)
threaded, 10 conn, 2 threads, therubyrhino (Rhino)  4.030000   0.050000   4.080000 (  1.025000)
threaded, 10 conn, 2 threads, Node.js (V8)          1.480000   0.160000  21.260000 ( 10.446000)
threaded, 10 conn, 4 threads, therubyrhino (Rhino)  5.150000   0.060000   5.210000 (  1.078000)
threaded, 10 conn, 4 threads, Node.js (V8)          1.930000   0.390000  64.290000 ( 16.491000)
1 conn, 2 threads, therubyrhino (Rhino)         2.560000   0.030000   2.590000 (  1.086000)
1 conn, 2 threads, Node.js (V8)                 0.720000   0.170000  19.420000 ( 19.410000)
1 conn, 4 threads, therubyrhino (Rhino)         1.680000   0.030000   1.710000 (  1.075000)
1 conn, 4 threads, Node.js (V8)                 2.060000   0.400000  40.220000 ( 38.874000)
10 conn, 2 threads, therubyrhino (Rhino)        0.750000   0.010000   0.760000 (  0.357000)
10 conn, 2 threads, Node.js (V8)                0.520000   0.130000  20.160000 ( 10.280000)
10 conn, 4 threads, therubyrhino (Rhino)        1.860000   0.030000   1.890000 (  0.560000)
10 conn, 4 threads, Node.js (V8)                1.960000   0.350000  59.300000 ( 15.169000)

@mchristen

Gotcha, that doesn't use V8 at all so is not subject to its GIL.

The whole reason this pull request exists is that using ExecJS w/ therubyracer in a threaded environment exhibits a deadlock. So while trying to debug and figure out what and where the problem is, we discovered that using connection_pool with MRI actually hurts performance.

So it looks like the best solution may be one where a connection_pool is used when the underlying JS runtime is thread-safe and JRuby is being used; otherwise, don't bother, because it hurts performance on MRI.

The issue of therubyracer deadlocking still isn't fixed, though. Until the real bug is fixed, whether it is in therubyracer, libv8, or MRI itself, there needs to be some locking inside react-rails to protect against concurrent calls to anything in ExecJS. Perhaps that could also be conditional on whether or not therubyracer is being used, so that we don't step in the way of JRuby.
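
Conditionally applying that lock could look roughly like this (a sketch; detecting the runtime by its name string is an assumption, not necessarily how react-rails would do it):

require "execjs"

# Only pay for the extra lock when the runtime that actually deadlocks
# (therubyracer) is in use; on JRuby/therubyrhino, render without it.
JS_LOCK = Mutex.new
NEEDS_GLOBAL_LOCK = ExecJS.runtime.name.include?("therubyracer")

def with_js_lock(&blk)
  NEEDS_GLOBAL_LOCK ? JS_LOCK.synchronize(&blk) : blk.call
end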

@mchristen

Here's a full test run using JRuby and therubyrhino, running 20k renders per test. Node.js is so slow that I'm not bothering to test against it.

I altered the script's test loop order so it changes the size of the connection pool before it changes the thread count, and you can clearly see the performance benefit of using a pool size >= thread count by looking at the real column. My machine has hyperthreading, so I guess that's why everything above 4 threads fails to show any improvement.

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000                                                                                                                                                                                                                                                                
                                                              user     system      total        real
threaded, 1 conn, 1 threads, therubyrhino (Rhino)        85.400000   0.140000  85.540000 ( 77.309994)
threaded, 2 conn, 1 threads, therubyrhino (Rhino)        75.520000   0.080000  75.600000 ( 73.073037)
threaded, 4 conn, 1 threads, therubyrhino (Rhino)        73.250000   0.060000  73.310000 ( 72.023527)
threaded, 8 conn, 1 threads, therubyrhino (Rhino)        73.640000   0.080000  73.720000 ( 72.338433)
threaded, 10 conn, 1 threads, therubyrhino (Rhino)       73.250000   0.040000  73.290000 ( 71.963621)
threaded, 1 conn, 2 threads, therubyrhino (Rhino)        76.500000   0.200000  76.700000 ( 73.928030)
threaded, 2 conn, 2 threads, therubyrhino (Rhino)        87.760000   0.040000  87.800000 ( 43.708759)
threaded, 4 conn, 2 threads, therubyrhino (Rhino)        85.640000   0.070000  85.710000 ( 42.704853)
threaded, 8 conn, 2 threads, therubyrhino (Rhino)        89.220000   0.070000  89.290000 ( 44.385474)
threaded, 10 conn, 2 threads, therubyrhino (Rhino)       89.370000   0.060000  89.430000 ( 44.542108)
threaded, 1 conn, 4 threads, therubyrhino (Rhino)        78.490000   0.550000  79.040000 ( 74.063415)
threaded, 2 conn, 4 threads, therubyrhino (Rhino)        96.940000   0.260000  97.200000 ( 47.619800)
threaded, 4 conn, 4 threads, therubyrhino (Rhino)       112.190000   0.080000 112.270000 ( 28.976933)
threaded, 8 conn, 4 threads, therubyrhino (Rhino)       110.300000   0.050000 110.350000 ( 28.598516)
threaded, 10 conn, 4 threads, therubyrhino (Rhino)      115.960000   0.020000 115.980000 ( 29.758068)
threaded, 1 conn, 8 threads, therubyrhino (Rhino)        80.580000   1.110000  81.690000 ( 73.936175)
threaded, 2 conn, 8 threads, therubyrhino (Rhino)       100.400000   0.950000 101.350000 ( 48.156891)
threaded, 4 conn, 8 threads, therubyrhino (Rhino)       124.540000   0.510000 125.050000 ( 30.938125)
threaded, 8 conn, 8 threads, therubyrhino (Rhino)       235.120000   0.290000 235.410000 ( 31.244846)
threaded, 10 conn, 8 threads, therubyrhino (Rhino)      236.010000   0.180000 236.190000 ( 31.007223)

@mchristen

Here's a full run using MRI (2.2.1).

Maybe I'm doing something wrong with JRuby, but that performance is quite bad. I was using version 9.0.0.0.pre2 installed through rbenv, and the runtime numbers definitely show it was utilizing more than one CPU core at a time (I could see through system-monitoring tools that this was the case as well).

So MRI running on one CPU core is almost 4x faster than JRuby running on all 4 CPU cores?? There's obviously something else going on... right?

The way this benchmark script is set up, it pretty much purely tests the performance of the associated JS runtimes, so does that mean Rhino is that much slower than V8, at least when used in this context?

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000
                                                              user     system      total        real
threaded, 1 conn, 1 threads, therubyracer (V8)            8.700000   0.090000   8.790000 (  8.723473)
threaded, 2 conn, 1 threads, therubyracer (V8)            8.800000   0.080000   8.880000 (  8.800348)
threaded, 4 conn, 1 threads, therubyracer (V8)            9.020000   0.090000   9.110000 (  9.035414)
threaded, 8 conn, 1 threads, therubyracer (V8)            9.130000   0.080000   9.210000 (  9.141726)
threaded, 10 conn, 1 threads, therubyracer (V8)           9.100000   0.060000   9.160000 (  9.089403)
threaded, 1 conn, 2 threads, therubyracer (V8)           10.560000   0.190000  10.750000 ( 10.345181)
threaded, 2 conn, 2 threads, therubyracer (V8)           11.510000   0.190000  11.700000 ( 11.248498)
threaded, 4 conn, 2 threads, therubyracer (V8)           12.870000   0.180000  13.050000 ( 12.605508)
threaded, 8 conn, 2 threads, therubyracer (V8)           11.710000   0.210000  11.920000 ( 11.525172)
threaded, 10 conn, 2 threads, therubyracer (V8)          14.330000   0.240000  14.570000 ( 14.143688)
threaded, 1 conn, 4 threads, therubyracer (V8)           11.120000   0.220000  11.340000 ( 11.010513)
threaded, 2 conn, 4 threads, therubyracer (V8)           14.400000   0.230000  14.630000 ( 14.241322)
threaded, 4 conn, 4 threads, therubyracer (V8)           14.610000   0.220000  14.830000 ( 14.364411)
threaded, 8 conn, 4 threads, therubyracer (V8)           15.190000   0.230000  15.420000 ( 14.863919)
threaded, 10 conn, 4 threads, therubyracer (V8)          15.790000   0.180000  15.970000 ( 15.485036)
threaded, 1 conn, 8 threads, therubyracer (V8)           11.300000   0.200000  11.500000 ( 11.143031)
threaded, 2 conn, 8 threads, therubyracer (V8)           13.400000   0.270000  13.670000 ( 13.197583)
threaded, 4 conn, 8 threads, therubyracer (V8)           15.700000   0.350000  16.050000 ( 15.256964)
threaded, 8 conn, 8 threads, therubyracer (V8)           18.940000   0.440000  19.380000 ( 18.190847)
threaded, 10 conn, 8 threads, therubyracer (V8)          18.520000   0.330000  18.850000 ( 18.015099)

@mchristen

I replaced the JS code that is being rendered with:

var SlowComponent = React.createClass({
  render: function() {
    var i, items, rand;
    rand = 0;
    i = 0;
    items = [];
    while (i < 10) {
      items.push(React.createElement('div', i + ' :)'));
      i++;
    }
    return React.createElement('div', items);
  }
})

and now I get the following results for JRuby

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000                                                                                                                                                                                                                                                                
                                                              user     system      total        real
threaded, 1 conn, 2 threads, therubyrhino (Rhino)        32.790000   0.290000  33.080000 ( 22.342218)
threaded, 2 conn, 2 threads, therubyrhino (Rhino)        21.270000   0.040000  21.310000 (  9.462662)
threaded, 4 conn, 2 threads, therubyrhino (Rhino)        17.340000   0.020000  17.360000 (  8.563669)
threaded, 8 conn, 2 threads, therubyrhino (Rhino)        17.390000   0.030000  17.420000 (  8.699205)
threaded, 10 conn, 2 threads, therubyrhino (Rhino)       18.570000   0.030000  18.600000 (  9.317547)
threaded, 1 conn, 4 threads, therubyrhino (Rhino)        19.690000   0.370000  20.060000 ( 17.576572)
threaded, 2 conn, 4 threads, therubyrhino (Rhino)        20.380000   0.180000  20.560000 (  9.834104)
threaded, 4 conn, 4 threads, therubyrhino (Rhino)        20.540000   0.050000  20.590000 (  5.342520)
threaded, 8 conn, 4 threads, therubyrhino (Rhino)        20.680000   0.020000  20.700000 (  5.593378)
threaded, 10 conn, 4 threads, therubyrhino (Rhino)       22.150000   0.050000  22.200000 (  5.902722)

and for MRI

$ ruby -I lib benchmarks/server_rendering_benchmark.rb  20000
                                                              user     system      total        real
threaded, 1 conn, 2 threads, therubyracer (V8)           13.300000   0.120000  13.420000 ( 13.034083)
threaded, 2 conn, 2 threads, therubyracer (V8)           14.440000   0.160000  14.600000 ( 14.172245)
threaded, 4 conn, 2 threads, therubyracer (V8)           14.490000   0.230000  14.720000 ( 14.261971)
threaded, 8 conn, 2 threads, therubyracer (V8)           14.850000   0.180000  15.030000 ( 14.610182)
threaded, 10 conn, 2 threads, therubyracer (V8)          15.150000   0.190000  15.340000 ( 14.904837)
threaded, 1 conn, 4 threads, therubyracer (V8)           14.700000   0.190000  14.890000 ( 14.527581)
threaded, 2 conn, 4 threads, therubyracer (V8)           17.250000   0.200000  17.450000 ( 17.055383)
threaded, 4 conn, 4 threads, therubyracer (V8)           19.420000   0.260000  19.680000 ( 19.221227)
threaded, 8 conn, 4 threads, therubyracer (V8)           18.680000   0.210000  18.890000 ( 18.426053)
threaded, 10 conn, 4 threads, therubyracer (V8)          19.380000   0.200000  19.580000 ( 19.127787)

Those results make more sense, so my conclusion is that Math.random() causes drastically different performance on the two JS runtimes.

@johnthethird
Contributor

Yea, V8 is a lot faster than Rhino, from a pure speed perspective. The new JS engine in Java 8 (Nashorn) approaches V8 in speed.

Focusing on just the pool concept, I changed the benchmark script to show the advantages of a pool. (I had to normalize the renders so that each test does the same number of total renders, taking the number of threads into account, to make it a fair test. Also, instead of Math.random, it does some more real-life React things.)

https://gist.github.com/johnthethird/19074284591b6091e367

The thing to note in the results is that 10 threads against a size-10 pool makes a huge difference, obviously, for JRuby, and even for MRI if it uses an out-of-process JS engine like Node.

Re the MRI deadlock issue, it looks like not only does V8 have a GIL of its own, but ExecJS also has locks around compile and eval:

https://github.com/sstephenson/execjs/blob/master/lib/execjs/ruby_racer_runtime.rb#L32

So I'm not sure why putting a lock around our render fixes the deadlock, but it sure does.

For MRI/RubyRacer, probably the right thing to do is to have a pool size of 1. Having the pool there shouldn't really hurt performance, as the overhead of checking in and out of the pool is minimal. The question is whether to make this a hard-coded limit or just issue a warning if pool size > 1 when using RubyRacer.
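
A warning along those lines could be as small as the sketch below (the React::ServerRendering.pool_size accessor is an assumption about the config surface, and runtime detection by name is likewise illustrative):

# Warn rather than hard-fail when the configuration is known to deadlock on MRI.
if ExecJS.runtime.name.include?("therubyracer") && React::ServerRendering.pool_size > 1
  warn "react-rails: pool_size > 1 with therubyracer on MRI can deadlock; " \
       "use pool_size = 1, or switch to JRuby + therubyrhino for real concurrency."
end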

@rmosolgo
Member Author

How's this for takeaways:

  • Default Pool Size to 1
  • Document: on JRuby, you can up the pool size
  • Recommend RubyRacer for MRI

??

@mchristen

👍

Perhaps also include a note about the deadlock condition if you configure Pool Size > 1 with RubyRacer on MRI.

@glittershark
Contributor

Per my comments on #222, I'm definitely throwing my weight behind officially recommending therubyracer on MRI. Should definitely throw in a note there saying that you should do it even though the Heroku docs say not to, to avoid chumps like me thinking one suggestion overrides the other.

@rmosolgo
Member Author

Anybody want to cook up a PR 😬 ?? TODO:

  • Default Pool Size to 1
  • Document: Recommend RubyRacer for MRI
  • Document: on MRI, you get a deadlock when pool size > 1
  • Document: on JRuby, you can up the pool size and get effective multi-threading

@rmosolgo
Member Author

@rmosolgo rmosolgo closed this Jun 23, 2015
@rmosolgo rmosolgo deleted the execjs-benchmark branch July 1, 2015 01:43
n-rodriguez pushed a commit to jbox-web/easy-app that referenced this pull request Dec 26, 2016
kimat added a commit to kimat/s2i-ruby-alpine-mysql that referenced this pull request Feb 15, 2018
compilation of libv8 takes a lot of time though

reactjs/react-rails#290
@BobbyMcWho

People finding this thread later should note that per #886 'therubyracer' should not be used anymore.

@BookOfGreg
Member

👍 @BobbyMcWho Yep! It was deprecated because it only supports libv8 version 3, while the current version is 5. It's also notorious for segfaulting your servers in production. MiniRacer is the current recommendation, and it's even a little faster since it uses a newer version of libv8.

If you remind me in a few, I might get a benchmark out of it later.
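
For anyone updating an older app, the switch is usually just a Gemfile change; with a reasonably recent ExecJS, mini_racer is auto-detected, so no server-rendering code needs to change (a sketch, not tied to a specific react-rails version):

# Gemfile
gem "mini_racer"   # instead of: gem "therubyracer"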
