With the growing attention and maturing community, JavaScript has risen from an embedded scripting language to a full-blown programming language in its own right. But just like a super car needs a great engine, JavaScript has been expecting a top-notch engine for far too long.

Performance comparison between C and JavaScript

Here is an example to calculate the 50,000th prime number using C and JavaScript respectively.

#include <stdio.h>

class Primes {
     public:
          int getPrimeCount() const { return prime_count; }
          int getPrime(int i) const { return primes[i]; }
          void addPrime(int i) { primes[prime_count++] = i; }
          bool isDivisible(int i, int by) { return (i % by) == 0; }
          bool isPrimeDivisible(int candidate) {
               for (int i = 1; i < prime_count; ++i){
                    if (isDivisible(candidate, primes[i])) return true;
               }
               return false;
          }

     private:
          volatile int prime_count;
          volatile int primes[50000];
};

int main() {
     Primes p;
     int c = 1;
     while (p.getPrimeCount() < 50000) {
          if (!p.isPrimeDivisible(c)) {
               p.addPrime(c);
          }
          c++;
     }
     printf("%d\n", p.getPrime(p.getPrimeCount()-1));
}

function Primes() {
  this.prime_count = 0;
  this.primes = new Array(50000);
  this.getPrimeCount = function () {
    return this.prime_count;
  }
  this.getPrime = function (i) {
    return this.primes[i];
  }
  this.addPrime = function (i) {
    this.primes[this.prime_count++] = i;
  }
  this.isPrimeDivisible = function (candidate) {
    for (var i = 1; i < this.prime_count; ++i) {
      if ((candidate % this.primes[i]) == 0) {
        return true;
      }
    }
    return false;
  }
};

function main() {
  p = new Primes();
  var c = 1;
  while (p.getPrimeCount() < 50000) {
    if (!p.isPrimeDivisible(c)) {
      p.addPrime(c);
    }
    c++;
  }
  console.log(p.getPrime(p.getPrimeCount()-1));
};

main();

G++ execution result:

real     0m3.557s
user     0m3.552s
sys     0m0.003s

Rhino execution result:

real     2m8.270s
user     2m10.093s
sys     0m0.717s

OK…It’s just JavaScript, right?

As a clarification, the scenario has its limitations and the result may vary given different configurations.

That being said, any sensible people would obviously choose languages like C to handle hefty calculation in this case to avoid the 50x performance gap. While JavaScript is just better off focusing on the front-end. Or does it have to be this way?

To answer this question, I start out by asking another question.

What’s holding JavaScript back?

In static OO language, a particular property is always found at a known offset in instances of a particular class, so are the method calls. What that means is the properties and functions’ location of a class in the heap is predictable given a specific type information.

While in JavaScript, there are no classes, which means JavaScript will perform a dynamic dictionary lookup to find out if an object has a particular property every time the code is executed. This is all because JavaScript has limited compile-time type information.

It is expensive to reason about JavaScript types at compile time, because type confirmation is part of the execution time of JavaScript. Every cycle spent on compiling the code is actually a cycle longer before the code gets to actually work.

In order to execute JavaScript code as fast as possible, the compiler needs know type information as soon as it can. And a compiler tailored to interpret, optimise and execute JavaScript code to get the most out of its great features, like dynamic typing, properties of an object can be added or removed on the fly, is exactly what’s needed.

So I need to drop what I am doing and work on a JavaScript engine now?

Rather than re-inventing the wheel, just embrace what Google has come up with, V8.

What makes V8 really stand out?

There are several areas V8 does differently. By understanding the most distinct features listed below, I hope this can give a brief insight of V8.

1. Hidden classes

One of the key optimisations in V8 is introducing the concept of hidden class.

V8 internally creates hidden classes for objects at runtime. And by enforcing the objects with same hidden class to share the same piece of optimised code, V8 gets around the time consumption problem while figuring out JavaScript type information at runtime.

Given the following example:

function Point(x, y) {
     this.x = x;
     this.y = y;
}
var p1 = new Point(1, 2);
var p2 = new Point(3, 4);

The Point object will be created twice to get p1 and p2 in normal JavaScript compilers.

But instead of having p1 and p2 as completely independent objects, V8 creates a chain of hidden classes, and identifies which hidden class should it use as a template to create p2.

In the process of creating p1, the chain of hidden classes is generated: Point -> Point(x) -> Point(x, y). And when it comes to create p2, V8 will find the hidden class with the same number of properties as given in p2, and uses the hidden class for p2’s creation.

2. Inline caching

By applying the hidden classes strategy, the number of different hidden classes at a site can be easily measured. And it is not hard to get to the conclusion that: around 90% of all access sites share the same hidden classes. What that means is even though JavaScript is written in a dynamic way, the runtime will always seek to fall back to more static approach. Therefore, class-base OO optimisation techniques, such as inline caching can be applied, and in fact, this is one the most important improvements in V8.

Inline caching (IC) is type dependent small chunk of code, which knows how to perform an operation (read or write object properties, or making function calls) given inputs of hidden class. It validates type assumptions to choose a hidden class to use or to generate new IC with new type information.

To be more specific, there are four states of inline cache in V8:

Uninitialised: it is used to tag a complete new object. A dynamic lookup is required when generating the code, then the state will be changed to pre-monomorphic state.
Pre-monomorphic: the object of this state behaves mostly like uninitialised state, a dynamic lookup is also performed. But it rewrites the state to monomorphic in the end. The reason of having this state is to avoid generating customised optimised machine code for the stuff that’s used only once.
Monomorphic: this state records the hidden class of an object (on the second property access), so that it can be blazing fast to access from this point onwards.
Megamorphic: it generally locks an object to the state and requires dynamic lookup every time, until the object is garbage collected and re-assigned a different state.

3. Garbage collection

As one of the crucial components of a compiler, V8 also re-invents the garbage collector, a generational garbage collector.

The heap is split into two separate areas:

Young generation: where the newly allocated objects are stored. Because most objects created are short lived or used only once, the space can be collected frequently.
Old generation: for the objects that survived the selection mechanism, they get promoted into the old generation, which is likely to live longer. Inside old generation, it is further split into different space: code space (for executable code), old data space (for objects have no pointer to young generation), large object space (to avoid moving of objects), hidden class space (a fixed size area with special GC process) and old pointer space (rest of the old objects).

In order to reclaim storage in smaller chunks and more frequent basis instead of collecting the entire heap at once, there are three different kinds of garbage collectors:

Scavenge collector: which targets young generation. It collects frequently, and promotes objects to old generation. It generally has a very short pause time, but depending on size of live data in young generation)
Mark-sweep collector: which targets the entire heap. The pause time depends on size of heap.
Mark-sweep-compact collector: which targets the entire heap to eliminate the fragmentation problem of the heap. And it is likely to have a long pause time.

4. Compiling JavaScript to native code

Last but not least performance gain is actually by translating JavaScript directly into machine code rather than any intermediate byte code.

V8 has two different kinds of compilers to make the performance leap:

Full-codegen compiler generates good code in compile time, without any knowledge about type information. But this compiler uses ICs to refine its type knowledge on the fly.
Crankshaft compiler, or a optimising compiler, re-compiles hot functions to generate even better code. It takes type information collected in ICs (type feedback) and inlines operations speculatively, and monomorphic functions and constructors entirely.

Fine, so how much has V8 improved JavaScript performance?

By running the prime number calculator in Node, here is the result:

real     0m5.790s
user     0m5.787s
sys     0m0.031s

Still almost twice the time as C spent, but it is a big leap from Rhino.

It sounds great, how can I use V8?

V8 has already been included in google chrome and chromium, and it is also available as an open source project ready to be embedded in any application, such as Node.js.

Some tips and tricks

There are some bits and pieces to consider when writing JavaScript to make the most out of V8:

Initialise all object members in constructor functions to teach V8 which hidden classes to have after the object is constructed.
Always initialise object members in the same order, because V8 will create different branch of hidden classes if members of a same object are constructed differently.
Use numeric values that can be represented as 31-bit signed integers for performance critical calculation to take advantage of V8’s tagging mechanism. Because if the value exceeds 31-bit signed, it will be converted into a double value.
Use contiguous keys starting at 0 for Arrays to take advantage of Fast elements’ linear lookup.
Don’t delete elements in arrays, especially numeric arrays. Because it makes an array a sparse array, and V8 will switch to Dictionary mode to do the hash table lookup.
Don’t load uninitialised or deleted elements.
Preallocate small arrays to correct size before using them
Don’t store non-numeric values (objects) in numeric arrays, which could cause a type switch.

So break free and have fun!