WebAssembly for people in a hurry

Interested in WebAssembly (Wasm), but don’t have the time for in-depth research? Read this guide to familiarize yourself with some of its core concepts and use cases.


WebAssembly (Wasm) is a binary instruction format for a stack-based virtual machine.

I like your funny words, magic man, but I don't know what they mean...

Let's break it down.

Binary instruction format

In other words, it is the content of `.wasm` files: the bytecode executed by Wasm runtimes.

The WebAssembly text format (WAT) is a textual representation of the bytecode. WAT makes it easier for us mere mortals to figure out what's going on. These are usually stored in `.wat` files.

An example of a `.wat` file with comments explaining what is being declared:

and its corresponding `.wasm` file:

We can see the module is organized into sections. All sections consist of:

  • a section ID (section code)
  • the section size in bytes, encoded as a u32 (section size)
  • the contents of the section (e.g. the opcodes in the code section)
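To make the section layout concrete, here is a minimal sketch in JavaScript. The bytes are a module I assembled by hand for illustration (it is not the module from this post), exporting a hypothetical `add` function. Each section is written as ID, size, contents. It relies only on the standard `WebAssembly.Module` and `WebAssembly.Instance` constructors, so it should run in a browser console or in Node.js.

```javascript
// Hand-assembled module (an assumption for illustration), equivalent to:
// (module (func (export "add") (param i32 i32) (result i32)
//   local.get 0 local.get 1 i32.add))
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section: ID 1, size 7, contents: one type, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: ID 3, size 2, contents: one function using type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: ID 7, size 7, contents: export function 0 as "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: ID 10, size 9, contents: one body with the opcodes
  // local.get 0 (0x20 0x00), local.get 1 (0x20 0x01), i32.add (0x6a), end (0x0b)
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Compile and instantiate synchronously (fine for a module this small).
const addModule = new WebAssembly.Module(addModuleBytes);
const addInstance = new WebAssembly.Instance(addModule);
const sum = addInstance.exports.add(2, 3);
console.log(sum); // 5
```

Changing a size byte without changing the contents makes compilation fail, which is a quick way to convince yourself the sizes really are checked.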

Stack-based

A stack is a data structure heavily used in the programming world because of its efficiency. A call stack keeps track of currently called functions and their arguments and local variables, among other things.

The gist of it is the following:

  • A called function initiates what is called a stack frame. The function's arguments and return address are also contained in the stack frame.
  • Local variables declared by the function are pushed to the stack. A function can call other functions, which initiate their own stack frames.
  • The instructions of the function body are executed within its stack frame.
  • When the function finishes, its stack frame is deallocated and control returns to the caller. If the function has a return value, the caller gets access to it.


All of this happens in a first-in-last-out manner: the first function that gets called is the last one to return.
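The ordering is easy to observe from any language with a call stack. A small sketch in JavaScript, with two hypothetical functions `outer` and `inner` that record when they are entered and exited:

```javascript
// Records the order in which functions are entered and exited.
const trace = [];

function inner() {
  trace.push("enter inner");
  trace.push("exit inner"); // inner's frame is popped first
}

function outer() {
  trace.push("enter outer");
  inner(); // a new stack frame is pushed on top of outer's frame
  trace.push("exit outer");
}

outer();
console.log(trace.join(" -> "));
// enter outer -> enter inner -> exit inner -> exit outer
```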

The Wasm stack operates on three kinds of entries:

  • Values - operands of instructions (think adding values, subtracting them...)
  • Labels - control instructions (think `if` blocks, `loop`s...)
  • Activations - stack frames for function calls

Wasm's stack is internal to the virtual machine (VM). When using a higher-level language to write Wasm (C/C++/Rust...), it is not directly accessible to you as the developer.

Virtual machine

Whereas plain old assembly operates on physical registers and memory, Wasm works in a virtual machine environment where it gets access to growable memory provided by the runtime. Browsers include a Wasm runtime and take care of setting up the plumbing necessary to run Wasm.

Native Wasm runtimes are also being developed, making it possible to run Wasm directly on a runtime on the host machine without needing the browser.

I don't really have to write assembly, right?

You do not need to actually learn WAT syntax as you'll most likely be using a higher-level language that can be compiled down to the binary format described above.

However, having an idea of what's going on in the WAT will give you more insight when potentially (inevitably) debugging Wasm code that gets shipped to the browser.

Browser debug tools display Wasm in this format so it's useful to understand it.

For a deeper dive into the `.wat` format, I highly suggest this great article by MDN.

Well, ok, but how do I use it then?

I know you're in a hurry, but it is well worth taking the time to familiarise yourself with some of WebAssembly's core concepts.

Core concepts

Here's the rundown. All of these components will contain a link to MDN so you can see an example for yourself.

Store

The Wasm store represents all instances of functions, memories, and tables, along with all allocated globals, data, and references. In other words, it keeps track of the runtime state of every Wasm module instantiated in the virtual machine.

Module

A Wasm module consists of the following components:

  • Types (store function signatures)
  • Functions (store procedures)
  • Memory (stores the data)
  • Tables (store references)
  • Globals (global variables available to the module)
  • Elements (belong to tables)
  • Data (belong to memory)
  • Imports (define imports from the host or other Wasm modules)
  • Exports (define what is exported to the host, ultimately dictating what can be used from the module)
  • Start (a function invoked immediately after the module is instantiated and after the memories and tables are initialized)
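Two of these components, imports and exports, can be inspected from JS without instantiating anything. The sketch below uses the standard `WebAssembly.Module.imports` and `WebAssembly.Module.exports` functions on a module I assembled by hand for illustration. The import name `env.log` and the export name `run` are my own choices, not something from this post.

```javascript
// Hand-assembled module (an assumption for illustration), equivalent to:
// (module
//   (import "env" "log" (func (param i32)))
//   (func (export "run") i32.const 42 call 0))
const inspectedBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // imports
  0x03, 0x02, 0x01, 0x01, // functions
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // exports
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // code
]);

const inspectedModule = new WebAssembly.Module(inspectedBytes);

// Each entry describes one import or export by name and kind.
const moduleImports = WebAssembly.Module.imports(inspectedModule);
const moduleExports = WebAssembly.Module.exports(inspectedModule);

for (const entry of moduleImports) {
  console.log(`import ${entry.module}.${entry.name} (${entry.kind})`);
}
for (const entry of moduleExports) {
  console.log(`export ${entry.name} (${entry.kind})`);
}
```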

Further reading material: MDN

Types and Functions

Function types are located in a `Types` component. A function type represents a function signature.

The functions themselves are located in the `Functions` component. The formal definition for a function is the following:

Wait, that looks nothing like the function from the start of the post!

You're right, it doesn't. The one at the start is in shorthand notation. Here's another function that adds two numbers and returns the result, in shorthand:

And here's the same function, using the formal notation:

The locals are the function arguments, which are a bit clearer in the shorthand notation. The `local.get` invocations are pushing the function's arguments onto the stack.

The `i32.add` instruction pops the top two values off the stack and pushes their sum onto the stack. The result is implicitly returned, meaning whatever called this function gets access to its return value.
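If the push and pop choreography feels abstract, a toy interpreter can help. The following is only a sketch in JavaScript covering `local.get` and `i32.add`. The `run` function and its instruction encoding are made up for illustration and have nothing to do with how a real runtime is implemented.

```javascript
// A toy interpreter for two Wasm instructions (illustration only).
function run(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "local.get":
        stack.push(locals[arg]); // push the local with index `arg`
        break;
      case "i32.add": {
        const b = stack.pop(); // the top two values are popped...
        const a = stack.pop();
        stack.push((a + b) | 0); // ...and their sum, wrapped to 32 bits, is pushed
        break;
      }
      default:
        throw new Error(`unsupported instruction: ${op}`);
    }
  }
  return stack.pop(); // the value left on the stack is the result
}

// The body of the add function: both arguments, then the addition.
const addBody = [["local.get", 0], ["local.get", 1], ["i32.add"]];
const addResult = run(addBody, [2, 3]);
console.log(addResult); // 5
```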

MDN doesn't really have an example page for these as they use shorthand syntax everywhere, but here's a post that touches on these topics.

Memory

Memories hold data.

In browsers, we can manipulate memory instances in the runtime either via the JS API or by exporting them from Wasm. Other runtimes will generally expose similar functionality.

The original Wasm specification allows only a single memory per module; the multi-memory proposal lifts this restriction.

From Wasm's point of view, a memory instance is a growable sequence of bytes. We access this memory via indexes, therefore an index can be thought of as a memory address.

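From JS's point of view, a memory instance is an object whose read-only `buffer` property exposes the contents as an `ArrayBuffer` (for regular memory) or a `SharedArrayBuffer` (for shared memory, which can be accessed from multiple threads). The property itself cannot be reassigned, but the bytes in the buffer can be read and written.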
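Here is a short sketch of both points of view, using the standard `WebAssembly.Memory` constructor: indexes into a typed array view act as addresses, and the bytes behind `buffer` are writable from JS. The variable names are my own.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const bytesView = new Uint8Array(memory.buffer);

bytesView[0] = 0x2a; // write a byte at "address" 0
bytesView[1] = 0xff; // and another at address 1
const firstByte = bytesView[0]; // read the first one back

// A DataView reads multi-byte values; Wasm memory is little-endian,
// so bytes 0x2a 0xff at address 0 form the 16-bit value 0xff2a.
const asU16 = new DataView(memory.buffer).getUint16(0, true);

console.log(firstByte, asU16, memory.buffer.byteLength);
// 42 65322 65536
```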

Wasm specifies the memory page size as 64 KiB. A single `Memory` instance holds a whole number of pages, up to its declared maximum (and at most 65,536 pages, i.e. 4 GiB, with 32-bit addressing). Growing memory means we are adding memory pages to the instance.

Consequently, growing unshared memory invalidates any previously held `ArrayBuffer` references to it as there is no guarantee the memory is at the same location.

The old `ArrayBuffer`'s length becomes 0. However, a new valid reference with the correct length can be obtained by accessing the `buffer` property of the memory instance after growing it.

Conversely, growing shared memory is guaranteed not to move the actual contents. However, any references to its corresponding `SharedArrayBuffer` will not be updated. More specifically, their length will not be updated, but just like a regular `ArrayBuffer`, accessing the `buffer` from the grown instance will return the correct reference.
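The unshared case is easy to try out. The sketch below grows a regular (non-shared) `WebAssembly.Memory` by one page and compares the stale and fresh buffer references. The initial and maximum sizes are arbitrary values picked for the example.

```javascript
const growable = new WebAssembly.Memory({ initial: 1, maximum: 4 });

const staleBuffer = growable.buffer; // ArrayBuffer covering 1 page
const lengthBefore = staleBuffer.byteLength;

// grow() takes a number of pages and returns the previous size in pages.
const previousPages = growable.grow(1);

const staleLength = staleBuffer.byteLength; // the old reference is now detached
const freshLength = growable.buffer.byteLength; // re-read `buffer` after growing

console.log(lengthBefore, previousPages, staleLength, freshLength);
// 65536 1 0 131072
```

In practice this means any typed array view created before a `grow` call should be recreated from `buffer` afterwards.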

The runtime handles the paging; all an application is aware of is a completely contiguous linear memory.

Wasm modules are isolated from each other, meaning a Wasm module can only access its own memory instances unless explicitly given access to outside ones. This is important for security as a malicious module cannot access memory from another one.

Further reading material: MDN 

Tables

Tables hold function references or host object references. A table can only hold a single type of reference.

Indexing a function reference table will return (you guessed it) a reference to the underlying Wasm function which can then be invoked.

Host object references are references to objects that exist outside a Wasm module. These objects are arbitrary and may include, but are not limited to host functions, object instances, etc.

By indexing into a table holding e.g. host functions, the Wasm runtime can get a hold of it and call the function, provided the host function is valid.
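Both kinds of tables can be created from JS with the standard `WebAssembly.Table` constructor. The following is a sketch under a few assumptions: the module bytes are hand-assembled for illustration and export a hypothetical `add` function, and the runtime supports `externref` tables (recent browsers and Node.js versions do).

```javascript
// Hand-assembled module (an assumption for illustration) exporting
// add: (i32, i32) -> i32.
const tableModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // types
  0x03, 0x02, 0x01, 0x00, // functions
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // exports
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code
]);
const { add } = new WebAssembly.Instance(
  new WebAssembly.Module(tableModuleBytes)
).exports;

// "anyfunc" is the JS API name for a table of function references.
const funcTable = new WebAssembly.Table({ element: "anyfunc", initial: 2 });
funcTable.set(0, add);
const viaTable = funcTable.get(0)(20, 22); // index, then invoke
const emptySlot = funcTable.get(1); // uninitialized slots hold null

// An "externref" table holds arbitrary host values.
const hostTable = new WebAssembly.Table({ element: "externref", initial: 1 });
const hostObject = { greeting: "hello from the host" };
hostTable.set(0, hostObject);
const sameObject = hostTable.get(0) === hostObject;

console.log(viaTable, emptySlot, sameObject);
// 42 null true
```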

When invoked in Wasm, host functions can manipulate the module instance, as every activation frame gets a reference to the function's module instance.

Further reading material: MDN

Imports and exports

Wasm can import functions from the host and export functions to it (memories, tables, and globals can be imported and exported as well). These have their own keywords and are straightforward so we won't dwell too much on these.

You can see an import/export example here

Again, no MDN article for this, but the link from the Types and Functions section of this post touches on these.
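For a feel of how the two directions fit together from the host side, here is a minimal sketch in JavaScript. The module is hand-assembled for illustration: it imports a host function under the made-up name `env.log` and exports a function under the made-up name `run`, which calls the import with the value 42.

```javascript
// Hand-assembled module (an assumption for illustration), equivalent to:
// (module
//   (import "env" "log" (func (param i32)))
//   (func (export "run") i32.const 42 call 0))
const importExportBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // types: (i32) -> () and () -> ()
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00,
  // import "env" "log" as function 0, using type 0
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,
  // function 1 uses type 1
  0x03, 0x02, 0x01, 0x01,
  // export function 1 as "run"
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,
  // body of "run": i32.const 42, call 0, end
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b,
]);

// The host side of the import: collect whatever Wasm passes to us.
const received = [];
const importObject = {
  env: {
    log: (value) => {
      received.push(value);
    },
  },
};

const hostedInstance = new WebAssembly.Instance(
  new WebAssembly.Module(importExportBytes),
  importObject
);

hostedInstance.exports.run(); // JS -> Wasm -> back into JS through the import
console.log(received); // [ 42 ]
```

Leaving `env.log` out of the import object makes instantiation fail with a `LinkError`, since every declared import has to be satisfied.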

Host interoperability

WebAssembly was designed to be run on the web (go figure) and browsers provide an API to interact with it. Currently, efforts are being made to provide a standardized API so Wasm can be run anywhere, not just in browsers. The WebAssembly System Interface (WASI) is one such effort.

Wasm can be invoked solely through an API and WASI would provide one. Functions, memories, and tables, among other things, can be passed to Wasm from the host and vice versa. This allows Wasm to interact with the host system in arbitrary ways, as long as the calling process has the right permissions.

Compilation

There are numerous compilers that can transform source code in their respective languages to Wasm.

I can only speak for Rust and `wasm-bindgen`, a crate that generates the bindings between Rust compiled to Wasm and JS. We won't be coding anything spectacular, just a simple hello world to demonstrate JS and Wasm interoperability. I've left links at the bottom of this post if you are interested in delving deeper into the world of Wasm.

Hello world

You've made it this far, you must really be interested in Wasm! I know all this rambling probably got you sleepy, so let's see Wasm in action!

Now, before we begin I just want to say there is a plethora of examples available online on how to compile Rust for WebAssembly, but for completeness' sake, we'll just be showcasing how to wire the final binary to the browser.

We are not reinventing the wheel here; this example is available here.

Without further ado, here are the `lib.rs` and `Cargo.toml` files:

As you can see, we are defining a simple function that delegates to the imported `window.alert` function. We will see this in action when we import the script.

We now need to build the binary. Our target is `wasm32-unknown-unknown`. We need this target in the compiler toolchain. We can add it by invoking `rustup target add wasm32-unknown-unknown`.

Now that we have the compiler toolchain, we build our application for Wasm with `cargo build --release --target wasm32-unknown-unknown`.

This will create an artifact at `target/wasm32-unknown-unknown/release/wasme_baby.wasm`. This file contains the bytecode instructions we discussed previously.

Now that we have it, we have to somehow ship it to the browser. Creating the plumbing for this can be done manually, but it is generally less hassle if we have `wasm-bindgen-cli`.

Once we have the tool, from the project root we can invoke

After this command, we should have a `public` directory with all the necessary plumbing:

  • wasme_baby.js - a script file that we can load as a plain ol' JS script.
  • wasme_baby.d.ts - TypeScript definitions for the `wasme_baby.js` file.
  • wasme_baby_bg.wasm - an internal file that gets used by the `wasme_baby.js` script and that contains the Wasm instructions.
  • wasme_baby_bg.wasm.d.ts - TypeScript definitions for exported Wasm functions.

All that's left to do is create an `index.html`, import the script in it, and serve it via some web server.

We can now serve this index to load the Wasm and make it greet us (as all respectful web apps should). You can use any web server; I'll be using miniserve:

By default, miniserve will serve on port 8080 and if we go to `http://127.0.0.1:8080/index.html`, we should be able to see the greeting. Yay!

That concludes the practical example.

Cool! So, when should I use Wasm?

Wasm is generally faster than JS for compute-heavy work. This makes intuitive sense because a precompiled binary is almost always faster than its interpreted/JIT-compiled counterpart. Even better, compilers can optimize Wasm ahead of time, so the downloaded binary ships to the browser already optimized instead of being optimized on the fly as with JS.

This is not to say Wasm always trumps JS in terms of performance. While the performance gain in some areas like video editing, rendering, and physics simulations is clear, JS still provides a better experience for DOM manipulation and the browser APIs. What's more, invoking Wasm through JS incurs overhead, so it won't perform as well if you have a lot of back and forth between JS and Wasm land.

Like with any tool, we should use the right one for the job (of course the Rust way is to find the right job for the tool, but this post is not about that!) and when JS is just too loosey-goosey, Wasm can step in to save the day.

If you are interested in delving deeper into the world of Wasm, here is a list of some useful links:
