WebAssembly, or simply WASM, is an effort undertaken by the World Wide Web Consortium (W3C), the same consortium that oversees most major web standards, including CSS, DOM, HTML and XML, among many others.
To understand the 'why' behind WebAssembly let's go straight to the WebAssembly specification: WebAssembly is a safe, portable, low-level code format designed for efficient execution and compact representation. Its main goal is to enable high performance applications on the Web, but it does not make any Web-specific assumptions or provide Web-specific features, so it can be employed in other environments as well.
Getting to machine language from other languages: Interpreted, compiled, assembly, higher level, mid level, system level and low level
The binary files -- .bin files -- that most people click on mindlessly to install software on their computers are made up of machine language. If you open a .bin file with a hex editor, you'll notice it's made up exclusively of number sequences that represent instructions for a computer's central processing unit (CPU). CPUs, unlike humans, are designed to work with these low level instructions for efficiency reasons.
Machine language is considered the lowest level and also the best performing of all languages. But the upside of performance comes with the downsides of working at such a low level: number sequences are hard, if not impossible, to modify, and they are tied to specific types of CPUs, which is why you'll often find binary files/machine code for different processor architectures (e.g. i386, amd64, arm64, mips).
Next in the continuum of low to high level languages is assembly language. While machine language is limited to number sequences, assembly language is a little more verbose, relying on text instructions to describe its tasks. Still, assembly language is considered a low level programming language, to the point that it's nicknamed symbolic machine language, mainly because its text instructions are not that obvious and must also take into account the type of CPU on which they will run. Assembly language, like all programming languages, ends up transformed into machine language, so performance-wise there's not much difference. The one addition is that assembly language requires an assembler to make the transformation into machine language, which makes development somewhat slower, but this extra step is done in the name of working with a friendlier, modifiable syntax vs. the number sequences used by machine language.
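To make the assembler step concrete, here is a minimal sketch in Python of what an assembler does: it maps text instructions to number sequences. The four-instruction set and its opcode numbers are hypothetical, invented for this illustration, and do not correspond to any real CPU.

```python
# Hypothetical 8-bit instruction set, invented for illustration only.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source: str) -> bytes:
    """Translate text instructions into the number sequence a CPU would read."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank lines
        mnemonic, *operands = line.split()
        machine_code.append(OPCODES[mnemonic.upper()])  # instruction name -> opcode
        machine_code.extend(int(op, 0) & 0xFF for op in operands)  # operands as single bytes
    return bytes(machine_code)


program = """
LOAD 10     ; put 10 in the accumulator
ADD 32      ; add 32 to it
STORE 0x20  ; write the result to address 0x20
HALT
"""

print(assemble(program).hex(" "))  # 01 0a 02 20 03 20 ff
```

The text version is easier to read and modify, while the output is the kind of number sequence described earlier. A real assembler also handles labels, addressing modes and an instruction encoding specific to each CPU, all of which this sketch leaves out.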
Following assembly language, as higher level languages, are compiled languages such as C and C++. Unlike assembly language and machine language, which contain hardware/CPU specific instructions, C and C++ avoid these types of instructions, making them portable across different hardware/CPUs. Now, this doesn't mean C and C++ programs never end up as hardware/CPU specific instructions, it simply means they work at a higher level and let a compiler worry about such things. Just like assembly language, compiled languages like C and C++ also end up transformed into machine language, so performance is also a non-issue. What does change is the number of steps required to transform a compiled language like C or C++ into machine language. Compilers are much more complex tools than assemblers, producing intermediate object code, requiring header files and using linkers to achieve their end goal, among other things. The important takeaway is not so much that compilers are elaborate tools -- there are many resources that explain compilers in a very detailed way -- it's that compiled languages like C and C++ introduce an extra layer, in the form of a tool between them and machine language, to facilitate working at a higher level than assembly language or machine language.
Before moving on, it's important to understand that the classification of low to high level languages is generally a contentious one. Strictly speaking, low level languages are those that require handling hardware/CPU specific instructions, which would leave only assembly language and machine language in this group. However, although languages like C and C++ don't require dealing with hardware/CPU specific instructions, they still have control over some important hardware details, chief among them memory management, meaning the instructions to assign and unassign memory resources. So is managing memory resources low level or high level? Purists would say high level, since memory management is the same across different processor architectures. On the other hand, people who've worked with higher level languages that do memory management automatically would say memory management is low level, since it can be painstakingly time consuming and a distraction from other tasks. To a certain extent both arguments hold truth, so to settle the argument about this fuzzy boundary you'll often hear the term mid level languages to describe languages that offer both low level and high level features. Since it's not good to get lost in semantics, I would recommend the following as a rule of thumb: lower level languages are more difficult to implement, are more difficult for humans to read and are created for specific hardware/CPUs; whereas higher level languages are easier to implement, are easier for humans to read and are shielded from hardware/CPU specifics by relying on tools (e.g. compilers, run-times) to take care of those details.
Another important point to make before moving on is that languages like C and C++ represent a sweet spot in the low to high level language ladder. On the one hand, they're sufficiently low level to let you manage things like memory without needing to know assembly, yet they're sufficiently high level to be portable across hardware/CPUs. In summary, they're just right for software that requires both low level control and high abstraction, such as software that serves as a foundation for other software (e.g. operating systems and run-times). For this reason, in addition to being classified as mid level languages, languages like C and C++ are also often referred to as system languages.
Although C and C++ represent a step forward from assembly language, they can be tedious to work with for application software. If you're creating an operating system or a browser then having control over something like memory management can be essential, but if you're creating software for accounting or shipping workflows then needing to deal with memory management can be a burden. For this reason, higher level languages emerged to further simplify and shield engineers from working at the levels offered by C and C++.
The strategy for higher level languages like Java and C# consists of using a run-time to take care of all the execution intricacies -- similar to how compilers and assemblers take care of the low level details in other languages. These run-times -- which are mostly built in C, C++ and assembly language -- are tasked with generating the final machine language executed by a CPU, in addition to taking care of other low level details like memory management. To accommodate this architecture, these types of languages are initially pre-compiled to a low level language -- Java bytecode in Java and Common Intermediate Language (CIL) in C# -- designed to run on their own run-time -- Java Virtual Machine (JVM) in Java and Common Language Runtime (CLR) in C#. By introducing this level of abstraction, the majority of the low level work (e.g. machine language compilation, memory management, support for different processor architectures) is shifted to the run-time, which is made available for multiple hardware/CPUs, so those creating the software are spared from dealing with such issues and can focus on higher level programming tasks. This language strategy of using run-times is often described as "Write once, run anywhere" (WORA), a marketing theme popularized by the creators of Java.
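The run-time idea can be sketched in a few lines of Python. The bytecode below is a hypothetical three-instruction format invented for this illustration; real Java bytecode and CIL are far richer, and production run-times typically compile bytecode to machine language rather than only interpreting it as done here.

```python
# Hypothetical bytecode, invented for illustration only.
PUSH, ADD, MUL = 0, 1, 2


def run(bytecode: list) -> int:
    """Execute the bytecode on a small stack machine and return the result."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == PUSH:
            stack.append(bytecode[pc + 1])  # the operand follows the opcode
            pc += 2
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op} at position {pc}")
    return stack.pop()


# Pre-compiled form of the expression (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL]

print(run(program))  # 20
```

The list in `program` never mentions a processor architecture. Only `run` has to exist for each hardware/CPU, which is the essence of the "Write once, run anywhere" strategy.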
A more ambitious approach taken by Microsoft was the creation of ActiveX to enable the execution of compiled-type languages across any networked application -- not just browsers, but software in general (e.g. Browser, Office, Media Player). Although ActiveX quickly gained support for a wide array of languages (e.g. C++, Delphi, Visual Basic) to be compiled into 'ActiveX Controls' -- its core execution components -- its limits were quick to show. Because ActiveX Controls contained compiled instructions (i.e. machine language), they were limited to a single processor architecture and to Windows operating systems. On top of this, ActiveX was born flawed, with a security scheme that allowed full access to a host computer (e.g. file system, applications) vs. restricted access to a sandboxed environment like that of a browser.
Based on a similar premise to ActiveX, but confined to a browser to address security concerns, Google created Google Native Client (NaCl). Although NaCl delivered on the promise to execute compiled-type languages -- C and C++ -- securely in a browser through 'nexe executables', like ActiveX it contained compiled instructions (i.e. machine language), which limited it to a single processor architecture or required distributing multiple 'nexe executables' for different processor architectures. To allow portability between processor architectures, Google created Portable Native Client (PNaCl), which has a similar design to NaCl, except that it produces processor-agnostic instructions (an intermediate bitcode rather than machine language) through 'pexe executables'. PNaCl achieves its portability by requiring a browser to translate these agnostic instructions into processor-specific machine language, a process which is pretty similar to other languages that rely on intermediate bytecode formats to achieve portability (e.g. Java bytecode, C# CIL). Because both NaCl and PNaCl were developed by Google, they were designed for Google's Chrome browser and have no portability to other browsers. However, with the appearance of WebAssembly this has become a moot point, as Google has begun to phase out the use of PNaCl in favor of WebAssembly.
WebAssembly covers practically all of the shortcomings present in the fragmented technologies that preceded it.
- WebAssembly is safe and offers efficient execution.- WebAssembly modules execute in a sandboxed environment, ensuring applications execute independently and can't communicate with the outside except through the appropriate APIs; compared to technologies like ActiveX with serious security/access issues.
- WebAssembly is portable and offers high performance.- WebAssembly uses a binary format designed to be executable across operating systems and processor architectures; compared to technologies like NaCl requiring compilation for different processor architectures.
- WebAssembly is low-level and offers compact representation.- WebAssembly uses a conventional Instruction Set Architecture (ISA) with low-level facilities, in order for higher level languages to be easily compiled into WebAssembly; compared to technologies like asm.js which force the exclusion of certain language features in order to perform efficiently.
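As a small illustration of the binary format mentioned in the list above, the following Python sketch checks the eight-byte preamble that every WebAssembly binary starts with: the four magic bytes `\0asm` followed by a 32-bit little-endian version number, currently 1. The function name `read_wasm_header` is hypothetical, and the sketch inspects only the header; it does not validate or execute a module.

```python
import struct

WASM_MAGIC = b"\x00asm"  # first four bytes of every WebAssembly binary
WASM_VERSION = 1         # binary format version, stored as a little-endian u32


def read_wasm_header(data: bytes) -> int:
    """Return the binary format version, or raise ValueError if the header is wrong."""
    if len(data) < 8:
        raise ValueError("too short to contain a WebAssembly header")
    if data[:4] != WASM_MAGIC:
        raise ValueError("missing WebAssembly magic number")
    (version,) = struct.unpack("<I", data[4:8])
    return version


# The preamble on its own already forms a valid, empty module.
empty_module = WASM_MAGIC + struct.pack("<I", WASM_VERSION)

print(read_wasm_header(empty_module))  # 1
```

These same bytes are read identically on any operating system or processor architecture, which is the portability point made above: the module carries no machine language for a particular CPU.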
But probably more important than covering the shortcomings of previous technologies, WebAssembly has achieved one of the most difficult feats in technology: traction. That WebAssembly has managed to bring together the makers of four mass-market browsers -- Google Chrome, Microsoft Edge, Mozilla Firefox, Apple Safari -- to not only support it, but ship it activated in their browsers, should be a testament that the future of WebAssembly is bright.
WebAssembly: The future of the web