Don't Follow The Masses: Bug Hunting in JavaScript Engines

Analysis of CVE-2019-5790 and how the search for unexplored attack surface in V8 led to its discovery.

I. Introduction

If you take a look at all the recent publications on browser security, and JavaScript engine security in particular, you could easily get the naive impression that the only place to look for new bugs in modern JavaScript implementations is the just-in-time (JIT) compiler. The huge complexity of these engines, the code churn (500 commits for V8 alone during the last month [1]) and the seemingly never-ending flood of publicly disclosed bugs [2] rightfully suggest that this is an area worth looking at as a bug hunter.

Focussing on publicly known attack surface in high profile targets

However, when looking at high profile targets such as web browsers, chances are that the focus of many researchers is mainly driven by publicly disclosed bugs and publications. On the one hand, this approach can quickly give you a good overview of the potentially error-prone areas of a code base when starting off with a new target. There's also no dispute that some areas are more complex than others and thus deserve the attention (like JIT engines). On the other hand, people all too often forget that other parts of a code base, which are currently not so much in public focus, might provide some interesting attack surface (and bugs) as well and shouldn't be overlooked. This is true in particular if your goal is to find bugs with a lifetime longer than a few weeks or months.

In this blog post we describe our successful attempt at finding a vulnerability in V8, and how focussing on a component which initially didn't look like it would provide a huge attack surface enabled us to find a high severity vulnerability, which was awarded a $7,500 bug bounty by Google.

II. JavaScript Pipeline

A high-level description of the different stages involved in the JavaScript pipeline is given below to provide a very rough overview of the possible attack surface. A more detailed and highly recommended introduction can be found at [3].

                             AST               Bytecode
+-------------+   +--------+    +-------------+        +--------------+
| JavaScript  |-->| Parser |--->| Interpreter |------->| JIT Compiler |----+
| source code |   |        |    | (Ignition)  |        | (TurboFan)   |    |
+-------------+   +--------+    +-------------+        +--------------+    | Assembly 
                                       |                                   | code
                                       |               +---------+         |
                                       +-------------->| Runtime |<--------+
                                           Bytecode    +---------+

The following descriptions will focus on V8, but similar concepts apply to other engines as well.

The first step of the JavaScript engine is to parse the JavaScript source code. The goal is to transform the source code into an abstract syntax tree (AST) representation. Even a seemingly simple task such as scanning a character stream for known tokens is highly optimized for speed and continuously improved in modern JavaScript engines such as V8 [4,5]. It was in exactly this first stage that we were able to identify the vulnerability described below.

After the AST is built it is converted to custom bytecode which is then consumed by the interpreter or JIT compiler. V8 is using Ignition [6] as its interpreter. The bytecode is either executed directly by a register machine or is passed on to the JIT compiler. At this stage in the JavaScript pipeline we already have a few optimization stages and in consequence the potential for vulnerabilities as well.

After a function has been executed a certain number of times in the interpreter, it is marked as "hot" and will be compiled to machine code by the JIT compiler. V8 uses TurboFan [7] as its JIT compiler. Without going into any more detail, this phase is a highly complex process and has already been the source of a vast number of vulnerabilities [8] in the past.

In parallel to the JavaScript pipeline runs the garbage collector [9,10], which frees the programmer from having to manage memory explicitly. Although this reduces a large class of bugs such as memory leaks, it can lead to interesting vulnerabilities as well [11].

III. JavaScript Parsing

The implementation of the parser in V8 is described in some detail at [4,5]. The code implementing the parser can be found in src/parsing/ in the V8 source tree.

                                        +---------->| PreParser |
                                        |  tokens   +-----------+
                                        |                 |
                                        |                 v
+-------+       +--------+        +---------+        +--------+     
| Blink |------>| Stream |------->| Scanner |------->| Parser |
+-------+       +--------+        +---------+        +--------+
          ASCII            UTF-16             tokens      |
                                                          | AST               
                            +----------+            +----------+
                            | TurboFan |<-----------| Ignition |
                            +----------+            +----------+   

The first step in parsing JavaScript source code is scanning the text for tokens. The Scanner class consumes the input and generates Token objects which are consumed by the parser. The UTF16CharacterStream class is used as an abstraction for the text input stream: it provides characters in UTF-16 format to the scanner and abstracts away the different possible encoding formats of JavaScript received from the network. The Parser class then generates the final AST based on the consumed tokens.
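
To make the division of labour between stream and scanner more concrete, here is a minimal sketch of the same idea. It is not V8 code: ToyUtf16Stream, ToyScanner, ToyToken and ToyScanAll are hypothetical names, the token set is reduced to a handful of cases, and string scanning ignores escape sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for this sketch; a real engine has many more.
enum class ToyToken {
  kLeftBrace, kRightBrace, kSemicolon, kString, kIllegal, kEndOfSource
};

constexpr int kToyEndOfInput = -1;

// Stand-in for the character stream: hands out UTF-16 code units one by one.
class ToyUtf16Stream {
 public:
  explicit ToyUtf16Stream(std::u16string source) : source_(std::move(source)) {}

  int Peek() const {
    return pos_ < source_.size() ? static_cast<int>(source_[pos_])
                                 : kToyEndOfInput;
  }
  int Advance() {
    const int c = Peek();
    if (c != kToyEndOfInput) ++pos_;
    return c;
  }

 private:
  std::u16string source_;
  std::size_t pos_ = 0;
};

// Stand-in for the scanner: turns code units into tokens.
class ToyScanner {
 public:
  explicit ToyScanner(ToyUtf16Stream* stream) : stream_(stream) {
    assert(stream != nullptr);
  }

  ToyToken Next() {
    while (IsWhitespace(stream_->Peek())) stream_->Advance();
    const int c = stream_->Advance();
    switch (c) {
      case kToyEndOfInput: return ToyToken::kEndOfSource;
      case '{': return ToyToken::kLeftBrace;    // single-character tokens
      case '}': return ToyToken::kRightBrace;
      case ';': return ToyToken::kSemicolon;
      case '"':
      case '\'': return ScanString(c);          // consumes more characters
      default: return ToyToken::kIllegal;
    }
  }

 private:
  static bool IsWhitespace(int c) { return c == ' ' || c == '\t' || c == '\n'; }

  // Consumes code units up to the matching quote (no escape handling).
  ToyToken ScanString(int quote) {
    while (true) {
      const int c = stream_->Advance();
      if (c == kToyEndOfInput || c == '\n') return ToyToken::kIllegal;
      if (c == quote) return ToyToken::kString;
    }
  }

  ToyUtf16Stream* stream_;
};

// Scans a whole source string; the last token is always kEndOfSource.
std::vector<ToyToken> ToyScanAll(const std::u16string& source) {
  ToyUtf16Stream stream(source);
  ToyScanner scanner(&stream);
  std::vector<ToyToken> tokens;
  do {
    tokens.push_back(scanner.Next());
  } while (tokens.back() != ToyToken::kEndOfSource);
  return tokens;
}
```

Calling ToyScanAll(u"{ 'ab' ; }") yields a left brace, a string, a semicolon, a right brace and the end-of-source marker. The point of the sketch is only that the scanner never sees the original network encoding, just a stream of 16-bit code units.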

IV. LiteralBuffer Integer Overflow (CVE-2019-5790)

The following bug was found by our researcher Dimitri Fourny (@DimitriFourny) and reported to Google on the 13th of December 2018. It was fixed in Chrome version 73.0.3683.75. The corresponding bug tracking entry can be found at [12].

The Scanner::Scan method starts off by calling Scanner::ScanSingleToken to find the next non-whitespace token in the stream. Depending on the encountered token, it implements some special cases to handle them appropriately. For example, single character tokens such as a brace, bracket or semicolon are just returned, while other tokens cause the consumption of more characters from the stream.

One such example is the TOKEN::String token, which is returned when a quote character is encountered. In this case the Scanner::ScanString method is called. That method calls Scanner::AddLiteralChar in a loop until the closing quote character is found.

The Scanner::AddLiteralChar method calls Scanner::LiteralBuffer::AddChar which in the end calls Scanner::LiteralBuffer::AddTwoByteChar if the initial quote character is followed by two-byte characters.

void Scanner::LiteralBuffer::AddTwoByteChar(uc32 code_unit) {
  if (position_ >= backing_store_.length()) ExpandBuffer();
  if (code_unit <=
      static_cast<uc32>(unibrow::Utf16::kMaxNonSurrogateCharCode)) {
    *reinterpret_cast<uint16_t*>(&backing_store_[position_]) = code_unit;
    position_ += kUC16Size;
  } else {
    // Characters above kMaxNonSurrogateCharCode are stored as a surrogate pair.
    *reinterpret_cast<uint16_t*>(&backing_store_[position_]) =
        unibrow::Utf16::LeadSurrogate(code_unit);
    position_ += kUC16Size;
    if (position_ >= backing_store_.length()) ExpandBuffer();
    *reinterpret_cast<uint16_t*>(&backing_store_[position_]) =
        unibrow::Utf16::TrailSurrogate(code_unit);
    position_ += kUC16Size;
  }
}
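
The else branch above stores a character that does not fit into a single 16-bit unit as two UTF-16 code units. As a rough illustration of what that split looks like, the following sketch applies the standard UTF-16 surrogate arithmetic. ToyLeadSurrogate and ToyTrailSurrogate are hypothetical helper names chosen for this post, not V8's functions.

```cpp
#include <cassert>
#include <cstdint>

// Largest code point that fits into a single UTF-16 code unit.
constexpr std::uint32_t kToyMaxNonSurrogateCharCode = 0xFFFF;
// Largest valid Unicode code point.
constexpr std::uint32_t kToyMaxCodePoint = 0x10FFFF;

// First (high) half of the surrogate pair for a code point above U+FFFF.
inline std::uint16_t ToyLeadSurrogate(std::uint32_t code_point) {
  assert(code_point > kToyMaxNonSurrogateCharCode);
  assert(code_point <= kToyMaxCodePoint);
  return static_cast<std::uint16_t>(0xD800 + ((code_point - 0x10000) >> 10));
}

// Second (low) half of the surrogate pair for a code point above U+FFFF.
inline std::uint16_t ToyTrailSurrogate(std::uint32_t code_point) {
  assert(code_point > kToyMaxNonSurrogateCharCode);
  assert(code_point <= kToyMaxCodePoint);
  return static_cast<std::uint16_t>(0xDC00 + ((code_point - 0x10000) & 0x3FF));
}
```

For U+1F600 this gives the pair 0xD83D, 0xDE00. For the bug discussed here the relevant detail is simply that such a character advances position_ by two code units, that is by four bytes.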

The backing_store_ byte vector buffers the already scanned part of the string and is dynamically resized on demand. If the Scanner::LiteralBuffer::AddTwoByteChar method detects that the vector needs to grow, it calls Scanner::LiteralBuffer::ExpandBuffer which allocates a larger buffer and then copies the bytes from the old buffer into the new one.

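void Scanner::LiteralBuffer::ExpandBuffer() {
  Vector<byte> new_store = Vector<byte>::New(NewCapacity(kInitialCapacity));
  // Copies the position_ bytes written so far; nothing checks that
  // new_store is at least that large.
  MemCopy(new_store.start(), backing_store_.start(), position_);
  backing_store_ = new_store;
}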
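
ExpandBuffer silently relies on one invariant: the new store must be at least position_ bytes large, otherwise the copy runs past its end. The following sketch models the same grow-and-copy pattern with that invariant checked explicitly. ToyLiteralBuffer is a hypothetical class, and its growth policy (start at 16 bytes, then double) is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

class ToyLiteralBuffer {
 public:
  // Appends one byte, growing the backing store when it is full.
  void AddByte(std::uint8_t value) {
    if (position_ >= store_.size()) Expand(NextCapacity());
    store_[position_++] = value;
  }

  // Replaces the backing store with one of new_capacity bytes and copies
  // the bytes written so far. Refuses to copy into a store that is too small.
  void Expand(std::size_t new_capacity) {
    if (new_capacity < position_) {
      throw std::length_error("new store is smaller than the data to copy");
    }
    std::vector<std::uint8_t> new_store(new_capacity);
    if (position_ > 0) {
      std::memcpy(new_store.data(), store_.data(), position_);
    }
    store_ = std::move(new_store);
  }

  std::size_t position() const { return position_; }
  std::size_t capacity() const { return store_.size(); }
  std::uint8_t at(std::size_t index) const {
    assert(index < position_);
    return store_[index];
  }

 private:
  // Assumed policy for this sketch: 16 bytes initially, then doubling.
  std::size_t NextCapacity() const {
    return store_.empty() ? 16 : store_.size() * 2;
  }

  std::vector<std::uint8_t> store_;
  std::size_t position_ = 0;
};
```

With this check in place, a bogus capacity results in an exception instead of a write past the end of the new allocation. The V8 code shown above performs the copy unconditionally.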

The method Scanner::LiteralBuffer::NewCapacity is used to calculate the size of the new vector.

int Scanner::LiteralBuffer::NewCapacity(int min_capacity) {
  int capacity = Max(min_capacity, backing_store_.length());
  // Plain int multiplication without any overflow check.
  int new_capacity = Min(capacity * kGrowthFactory, capacity + kMaxGrowth);
  return new_capacity;
}

We can control backing_store_.length() by varying the number of characters following the initial quote character. A huge JavaScript string leads to a huge capacity value, which can make the int expression capacity * kGrowthFactory overflow, so that new_capacity is set to a value smaller than the previous capacity. As a consequence, the MemCopy call in ExpandBuffer copies position_ bytes from the old buffer into a new vector that is too small to hold them, causing heap memory corruption.
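
The arithmetic behind the overflow can be replayed in isolation. The sketch below is a model and not the V8 function: the growth factor of 4, the maximum growth of 1 MiB and the minimum capacity of 16 are assumptions picked for illustration, and all Toy names are hypothetical. Because signed overflow is undefined behaviour in C++, the model computes in 64 bits and emulates the 32-bit two's complement wrap-around explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::int64_t kToyGrowthFactor = 4;        // assumed value
constexpr std::int64_t kToyMaxGrowth = 1024 * 1024;  // assumed value (1 MiB)

// Reduces a 64-bit value to what a wrapping 32-bit signed int would hold.
std::int32_t ToyWrapToInt32(std::int64_t value) {
  const std::uint32_t low = static_cast<std::uint32_t>(value);  // mod 2^32
  if (low <= 0x7FFFFFFFu) return static_cast<std::int32_t>(low);
  return static_cast<std::int32_t>(static_cast<std::int64_t>(low) -
                                   4294967296LL);
}

// Model of the capacity computation with wrapping 32-bit arithmetic.
std::int32_t ToyNewCapacity(std::int32_t min_capacity,
                            std::int32_t current_length) {
  const std::int32_t capacity = std::max(min_capacity, current_length);
  const std::int32_t scaled = ToyWrapToInt32(capacity * kToyGrowthFactor);
  const std::int32_t bumped = ToyWrapToInt32(capacity + kToyMaxGrowth);
  return std::min(scaled, bumped);
}

// True if "growing" a store of current_length bytes would shrink it.
bool ToyCapacityShrinks(std::int32_t current_length) {
  return ToyNewCapacity(16, current_length) < current_length;
}
```

Under these assumed constants, a small store grows as expected (16 bytes become 64), while a store of 0x40000000 bytes produces a new capacity of 0 and a store of 0x20000000 bytes produces a negative one. In both cases the smaller, wrapped product wins the Min, which is exactly the situation in which the subsequent copy no longer fits.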

The following simple proof-of-concept can trigger the bug:

let s = String.fromCharCode(0x4141).repeat(0x10000001) + "A";
s = "'"+s+"'";

The bug seems quite obvious when reading the code, but was probably hard to spot by fuzzing, because triggering it requires around 20 GB of memory and quite some time on a typical desktop machine.

V. Conclusion

When targeting a high profile target, it makes sense to dig deep into the most complex areas that are already known to be error-prone, because chances are that many bugs can be found in these places. Nevertheless, huge targets such as web browsers provide a plethora of attack surface, and being successful at vulnerability research on these targets often just means identifying new attack surface where nobody else has looked before.