TCC RISC-V Compiler runs in the Web Browser (thanks to Zig Compiler)

📝 4 Feb 2024


(Try the Online Demo)

(Watch the Demo on YouTube)

TCC is a Tiny C Compiler for 64-bit RISC-V (and other platforms)…

Can we run TCC Compiler in a Web Browser?

Let’s do it! We’ll compile TCC (Tiny C Compiler) from C to WebAssembly with Zig Compiler.

In this article, we talk about the tricky bits of our TCC ported to WebAssembly…

Why are we doing this?

Today we’re running Apache NuttX RTOS inside a Web Browser, with WebAssembly + Emscripten + 64-bit RISC-V.

(Real-Time Operating System in a Web Browser on a General-Purpose Operating System!)

What if we could Build and Test NuttX Apps in the Web Browser…

  1. We type a C Program into our Web Browser (pic below)

  2. Compile it into an ELF Executable with TCC

  3. Copy the ELF Executable to the NuttX Filesystem

  4. And NuttX Emulator runs our ELF Executable inside the Web Browser

Learning NuttX becomes so cool! This is how we made it happen…

(Watch the Demo on YouTube)

(Not to be confused with TTC Compiler)

Online Demo of TCC Compiler in WebAssembly


§1 TCC in the Web Browser

Click this link to try TCC Compiler in our Web Browser (pic above)

This C Program appears…

// Demo Program for TCC Compiler
int main(int argc, char *argv[]) {
  printf("Hello, World!!\n");
  return 0;
}

Click the “Compile” button. Our Web Browser calls TCC to compile the above program…

## Compile to RISC-V ELF
tcc -c hello.c

And it downloads the compiled RISC-V ELF a.out. We inspect the Compiled Output…

## Dump the RISC-V Disassembly
## of TCC Output
$ riscv64-unknown-elf-objdump \
    --syms --source --reloc --demangle \
    --line-numbers --wide  --debugging \
    a.out

main():
   ## Prepare the Stack
   0: fe010113  addi   sp, sp, -32
   4: 00113c23  sd     ra, 24(sp)
   8: 00813823  sd     s0, 16(sp)
   c: 02010413  addi   s0, sp, 32
  10: 00000013  nop

   ## Load to Register A0: "Hello World"
  14: fea43423  sd     a0, -24(s0)
  18: feb43023  sd     a1, -32(s0)
  1c: 00000517  auipc  a0, 0x0
  1c: R_RISCV_PCREL_HI20 L.0
  20: 00050513  mv     a0, a0
  20: R_RISCV_PCREL_LO12_I .text

   ## Call printf()
  24: 00000097  auipc  ra, 0x0
  24: R_RISCV_CALL_PLT printf
  28: 000080e7  jalr   ra  ## 24 <main+0x24>

   ## Clean up the Stack and
   ## return 0 to Caller
  2c: 0000051b  sext.w a0, zero
  30: 01813083  ld     ra, 24(sp)
  34: 01013403  ld     s0, 16(sp)
  38: 02010113  addi   sp, sp, 32
  3c: 00008067  ret

Yep the 64-bit RISC-V Code looks legit! Very similar to our NuttX App. (So it will probably run on NuttX)

What just happened? We go behind the scenes…

(See the Entire Disassembly)

(About the RISC-V Instructions)
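To double-check a line of the disassembly by hand, here is a minimal JavaScript sketch that decodes the first instruction `fe010113` with the standard RISC-V I-type field layout. The helper name `decodeIType` is our own (it's not part of TCC or objdump) and it handles only I-type instructions like `addi`.

```javascript
// Decode a 32-bit RISC-V I-type instruction (like `addi`).
// Field layout: imm[11:0] | rs1 | funct3 | rd | opcode
// `decodeIType` is a hypothetical helper for illustration only.
function decodeIType(insn) {
  return {
    opcode: insn & 0x7f,           // Bits 6:0
    rd:     (insn >>> 7)  & 0x1f,  // Bits 11:7  (Destination Register)
    funct3: (insn >>> 12) & 0x07,  // Bits 14:12
    rs1:    (insn >>> 15) & 0x1f,  // Bits 19:15 (Source Register)
    imm:    insn >> 20,            // Bits 31:20, sign-extended by `>>`
  };
}

// First instruction of main(): `addi sp, sp, -32`
const prologue = decodeIType(0xfe010113);
console.log(prologue);
```

Register 2 is the Stack Pointer `sp`, and Opcode 0x13 with funct3 0 is `addi`. So the decoded fields (rd 2, rs1 2, imm -32) agree with `addi sp, sp, -32`.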

Zig Compiler compiles TCC Compiler to WebAssembly

§2 Zig compiles TCC to WebAssembly

Will Zig Compiler happily compile TCC to WebAssembly?

Amazingly, yes! (Pic above)

## Zig Compiler compiles TCC Compiler
## from C to WebAssembly. Produces `tcc.o`
zig cc \
  -c \
  -target wasm32-freestanding \
  -dynamic \
  -rdynamic \
  -lc \
  -DTCC_TARGET_RISCV64 \
  -DCONFIG_TCC_CROSSPREFIX="\"riscv64-\""  \
  -DCONFIG_TCC_CRTPREFIX="\"/usr/riscv64-linux-gnu/lib\"" \
  -DCONFIG_TCC_LIBPATHS="\"{B}:/usr/riscv64-linux-gnu/lib\"" \
  -DCONFIG_TCC_SYSINCLUDEPATHS="\"{B}/include:/usr/riscv64-linux-gnu/include\""   \
  -DTCC_GITHASH="\"main:b3d10a35\"" \
  -Wall \
  -O2 \
  -Wdeclaration-after-statement \
  -fno-strict-aliasing \
  -Wno-pointer-sign \
  -Wno-sign-compare \
  -Wno-unused-result \
  -Wno-format-truncation \
  -Wno-stringop-truncation \
  -I. \
  tcc.c

(See the TCC Source Code)

(About the Zig Compiler Options)

We link the TCC WebAssembly with our Zig Wrapper (that exports the TCC Compiler to JavaScript)…

## Compile our Zig Wrapper `tcc-wasm.zig` for WebAssembly
## and link it with TCC compiled for WebAssembly `tcc.o`
## Generates `tcc-wasm.wasm`
zig build-exe \
  -target wasm32-freestanding \
  -rdynamic \
  -lc \
  -fno-entry \
  -freference-trace \
  --verbose-cimport \
  --export=compile_program \
  zig/tcc-wasm.zig \
  tcc.o

## Test everything with Web Browser
## or Node.js
node zig/test.js

(See the Zig Wrapper tcc-wasm.zig)

(See the Test JavaScript test.js)

What’s inside our Zig Wrapper?

Our Zig Wrapper will…

  1. Receive the C Program from JavaScript

  2. Receive the TCC Compiler Options from JavaScript

  3. Call TCC Compiler to compile our program

  4. Return the compiled RISC-V ELF to JavaScript

Like so: tcc-wasm.zig

/// Call TCC Compiler to compile a
/// C Program to RISC-V ELF
pub export fn compile_program(
  options_ptr: [*:0]const u8, // Options for TCC Compiler (Pointer to JSON Array:  ["-c", "hello.c"])
  code_ptr:    [*:0]const u8, // C Program to be compiled (Pointer to String)
) [*]const u8 { // Returns a pointer to the `a.out` Compiled Code (Size in first 4 bytes)

  // Receive the C Program from
  // JavaScript and set our Read Buffer
  // https://blog.battlefy.com/zig-made-it-easy-to-pass-strings-back-and-forth-with-webassembly
  const code: []const u8 = std.mem.span(code_ptr);
  read_buf = code;

  // Omitted: Receive the TCC Compiler
  // Options from JavaScript
  // (JSON containing String Array: ["-c", "hello.c"])
  ...

  // Call the TCC Compiler
  _ = main(@intCast(argc), &args_ptrs);

  // Return pointer of `a.out` to
  // JavaScript. First 4 bytes: Size of
  // `a.out`. Followed by `a.out` data.
  const slice = std.heap.page_allocator.alloc(u8, write_buflen + 4)   
    catch @panic("Failed to allocate memory");
  const size_ptr: *u32 = @alignCast(@ptrCast(slice.ptr));
  size_ptr.* = write_buflen;
  @memcpy(slice[4 .. write_buflen + 4], write_buf[0..write_buflen]);
  return slice.ptr; // TODO: Deallocate this memory
}

Plus a couple of Magical Bits that we’ll cover in the next section.

(How JavaScript calls our Zig Wrapper)
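Our Zig Wrapper receives both parameters as `[*:0]const u8`, which means JavaScript must place null-terminated strings in WebAssembly Memory. Here's a rough sketch of that idea, assuming a standalone `WebAssembly.Memory` and a simple bump allocator. The name `allocateStringSketch` is hypothetical, it's not the real `allocateString` helper (which allocates from Zig Memory).

```javascript
// Hypothetical stand-in for `allocateString`: a bump allocator over a
// standalone WebAssembly.Memory. The real helper allocates from Zig Memory.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
let nextFree = 16; // Skip address 0, so a valid pointer is never null

function allocateStringSketch(str) {
  const bytes = new TextEncoder().encode(str); // Encode as UTF-8
  const ptr = nextFree;
  if (ptr + bytes.length + 1 > memory.buffer.byteLength) {
    throw new Error("Out of memory in this sketch");
  }
  const view = new Uint8Array(memory.buffer, ptr, bytes.length + 1);
  view.set(bytes);
  view[bytes.length] = 0; // Null terminator expected by `[*:0]const u8`
  nextFree += bytes.length + 1;
  return ptr;
}

// Compiler Options (JSON Array) and C Program, as in our Zig Wrapper
const optionsPtr = allocateStringSketch(JSON.stringify(["-c", "hello.c"]));
const codePtr = allocateStringSketch("int main() { return 0; }");
console.log(optionsPtr, codePtr);
```

The two numbers printed are plain offsets into the memory buffer. That's all a WebAssembly "pointer" is from the JavaScript side.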

Zig Compiler compiles TCC without any code changes?

Inside TCC, we stubbed out the setjmp / longjmp to make it compile with Zig Compiler.

Everything else compiles OK!

Is it really OK to stub them out?

setjmp / longjmp are called to Handle Errors during TCC Compilation. Assuming everything goes hunky dory, we won’t need them.

Later we’ll find a better way to express our outrage. (Instead of jumping around)

We probe the Magical Bits inside our Zig Wrapper…

TCC Compiler in WebAssembly needs POSIX Functions

§3 POSIX for WebAssembly

What’s this POSIX?

TCC Compiler was created as a Command-Line App. So it calls the typical POSIX Functions like fopen, fprintf, strncpy, malloc, …

But WebAssembly running in a Web Browser ain’t No Command Line! (Pic above)

(WebAssembly doesn’t have a C Standard Library libc)

Is POSIX a problem for WebAssembly?

We counted 72 POSIX Functions needed by TCC Compiler, but missing from WebAssembly.

Thus we fill in the Missing Functions ourselves.

(About the Missing POSIX Functions)

Surely other Zig Devs will have the same problem?

Thankfully we can borrow the POSIX Code from other Zig Libraries…

72 POSIX Functions? Sounds like a lot of work…

We might not need all 72 POSIX Functions. We stubbed out many of the functions to identify the ones that are called: tcc-wasm.zig

// Stub Out the Missing POSIX
// Functions. If TCC calls them, 
// we'll see a Zig Panic. Then we 
// implement them. The Types don't
// matter because we'll halt anyway.

pub export fn atoi(_: c_int) c_int {
  @panic("TODO: atoi");
}
pub export fn exit(_: c_int) c_int {
  @panic("TODO: exit");
}
pub export fn fopen(_: c_int) c_int {
  @panic("TODO: fopen");
}

// And many more functions...

Some of these functions are especially troubling for WebAssembly…

File Input and Output are especially troubling for WebAssembly

§4 File Input and Output

Why no #include in TCC for WebAssembly? And no C Libraries?

WebAssembly runs in a Secure Sandbox. No File Access allowed, sorry! (Like for Header and Library Files)

That’s why our Zig Wrapper Emulates File Access for the bare minimum 2 files…

Reading a Source File hello.c is extremely simplistic: tcc-wasm.zig

/// Emulate the POSIX Function `read()`
/// We copy from One Single Read Buffer
/// that contains our C Program
export fn read(fd0: c_int, buf: [*:0]u8, nbyte: size_t) isize {

  // TODO: Support more than one file
  const len = read_buf.len;
  assert(len < nbyte);
  @memcpy(buf[0..len], read_buf[0..len]);
  buf[len] = 0;
  read_buf.len = 0;
  return @intCast(len);
}

/// Read Buffer for read
var read_buf: []const u8 = undefined;

(read_buf is populated at startup)

Writing the Compiled Output a.out is just as barebones: tcc-wasm.zig

/// Emulate the POSIX Function `fwrite()`
/// We write to One Single Memory
/// Buffer that will be returned to 
/// JavaScript as `a.out`
export fn fwrite(ptr: [*:0]const u8, size: usize, nmemb: usize, stream: *FILE) usize {

  // TODO: Support more than one `stream`
  const len = size * nmemb;
  @memcpy(write_buf[write_buflen .. write_buflen + len], ptr[0..]);
  write_buflen += len;
  return nmemb;
}

/// Write Buffer for fputc and fwrite
var write_buf = std.mem.zeroes([8192]u8);
var write_buflen: usize = 0;

(write_buf will be returned to JavaScript)

Can we handle Multiple Files?

Right now we’re trying to embed the simple ROM FS Filesystem into our Zig Wrapper.

The ROM FS Filesystem will be preloaded with the Header and Library Files needed by TCC. See the details here…

Our Zig Wrapper uses Pattern Matching to match the C Formats and substitute the Zig Equivalent

§5 Fearsome fprintf and Friends

Why is fprintf particularly problematic?

Here’s the fearsome thing about fprintf and friends: sprintf, snprintf, vsnprintf…

Hence we hacked up an implementation of String Formatting that’s safer, simpler and so-barebones-you-can-make-soup-tulang.

Soup tulang? Tell me more…

Our Zig Wrapper uses Pattern Matching to match the C Formats and substitute the Zig Equivalent (pic above): tcc-wasm.zig

// Format a Single `%d`
// like `#define __TINYC__ %d`
FormatPattern{

  // If the C Format String contains this...
  .c_spec = "%d",
  
  // Then we apply this Zig Format...
  .zig_spec = "{}",
  
  // And extract these Argument Types
  // from the Varargs...
  .type0 = c_int,
  .type1 = null
}

This works OK (for now) because TCC Compiler only uses 5 Patterns for C Format Strings: tcc-wasm.zig

/// Pattern Matching for C String Formatting:
/// We'll match these patterns when
/// formatting strings
const format_patterns = [_]FormatPattern{

  // Format a Single `%d`, like `#define __TINYC__ %d`
  FormatPattern{
    .c_spec = "%d",  .zig_spec = "{}", 
    .type0  = c_int, .type1 = null
  },

  // Format a Single `%u`, like `L.%u`
  FormatPattern{ 
    .c_spec = "%u",  .zig_spec = "{}", 
    .type0  = c_int, .type1 = null 
  },

  // Format a Single `%s`, like `.rela%s`
  // Or `#define __BASE_FILE__ "%s"`
  FormatPattern{
    .c_spec = "%s", .zig_spec = "{s}",
    .type0  = [*:0]const u8, .type1 = null
  },

  // Format Two `%s`, like `#define %s%s\n`
  FormatPattern{
    .c_spec = "%s%s", .zig_spec = "{s}{s}",
    .type0  = [*:0]const u8, .type1 = [*:0]const u8
  },

  // Format `%s:%d`, like `%s:%d: `
  // (File Name and Line Number)
  FormatPattern{
    .c_spec = "%s:%d", .zig_spec = "{s}:{}",
    .type0  = [*:0]const u8, .type1 = c_int
  },
};

That’s our quick hack for fprintf and friends!
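To play with the Pattern Matching idea in a JavaScript Console, here's a loose port of the same 5 patterns. It's only a sketch: the names `formatPatterns` and `formatString` are ours, the arguments are ordinary JavaScript values (not C Varargs), and a literal `%%` would confuse the counting in this sketch.

```javascript
// Hypothetical JavaScript port of the Pattern Matching idea.
// Each pattern: the C Format Pattern, how many arguments it consumes,
// and how to render those arguments.
const formatPatterns = [
  { cSpec: "%d",    argCount: 1, render: (a)    => `${a}` },
  { cSpec: "%u",    argCount: 1, render: (a)    => `${a}` },
  { cSpec: "%s",    argCount: 1, render: (a)    => `${a}` },
  { cSpec: "%s%s",  argCount: 2, render: (a, b) => `${a}${b}` },
  { cSpec: "%s:%d", argCount: 2, render: (a, b) => `${a}:${b}` },
];

// Count the Format Specifiers: `%`
function countPercent(s) {
  return s.split("%").length - 1;
}

function formatString(format, ...args) {
  // No Format Specifiers: Return the Format, like `warning: `
  if (countPercent(format) === 0) { return format; }

  for (const p of formatPatterns) {
    // Skip if the number of specifiers differs
    // or if the pattern is not found
    if (countPercent(format) !== countPercent(p.cSpec)) { continue; }
    if (!format.includes(p.cSpec)) { continue; }

    // Replace the C Format Pattern by the rendered arguments
    const rendered = p.render(...args.slice(0, p.argCount));
    return format.replace(p.cSpec, () => rendered);
  }

  // No pattern matched: Return the Format String unchanged
  return format;
}

console.log(formatString("%s:%d: ", "hello.c", 3));
```

Counting the `%` first is what keeps `%s` from wrongly matching a format like `%s%s`: the single-specifier pattern is skipped whenever the format has two specifiers.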

(How we do Pattern Matching)

So simple? Unbelievable!

Actually we’ll hit more Format Patterns as TCC Compiler emits various Error and Warning Messages. But it’s a good start!

Later our Zig Wrapper shall cautiously and meticulously parse all kinds of C Format Strings. Or we do the parsing in C, compiled to WebAssembly. (160 lines of C!)

See the updates here…

(Funny how printf is the first thing we learn about C. Yet it’s incredibly difficult to implement!)

Compile and Run NuttX Apps in the Web Browser

§6 Test with Apache NuttX RTOS

TCC in WebAssembly has compiled our C Program to RISC-V ELF…

Will the ELF run on NuttX?

Apache NuttX RTOS is a tiny operating system for 64-bit RISC-V that runs on QEMU Emulator. (And many other devices)

We build NuttX for QEMU and copy our RISC-V ELF a.out to the NuttX Apps Filesystem (pic above)…

## Copy RISC-V ELF `a.out`
## to NuttX Apps Filesystem
cp a.out apps/bin/
chmod +x apps/bin/a.out

(How we build NuttX for QEMU)

Then we boot NuttX and run a.out…

## Boot NuttX on QEMU 64-bit RISC-V
## Remove `-bios none` for newer versions of NuttX
$ qemu-system-riscv64 \
  -semihosting \
  -M virt,aclint=on \
  -cpu rv64 \
  -bios none \
  -kernel nuttx \
  -nographic

## Run `a.out` in NuttX Shell
NuttShell (NSH) NuttX-12.4.0
nsh> a.out
Loading /system/bin/a.out
Exported symbol "printf" not found
Failed to load program 'a.out'

(See the Complete Log)

NuttX politely accepts the RISC-V ELF (produced by TCC). And says that printf is missing.

Which makes sense: We haven’t linked our C Program with the C Library!

(Loading a RISC-V ELF should look like this)

How else can we print something in NuttX?

To print something, we can make a System Call (ECALL) directly to NuttX Kernel (bypassing the POSIX Functions)…

// NuttX System Call that prints
// something. System Call Number
// is 61 (SYS_write). Works exactly
// like POSIX `write()`
ssize_t write(
  int fd,           // File Descriptor (1 for Standard Output)
  const char *buf,  // Buffer to be printed
  size_t buflen     // Buffer Length
);

// Which makes an ECALL with these Parameters...
// Register A0 is 61 (SYS_write)
// Register A1 is the File Descriptor (1 for Standard Output)
// Register A2 points to the String Buffer to be printed
// Register A3 is the Buffer Length

That’s the same NuttX System Call that printf executes internally.

Final chance to say hello to NuttX…

TCC WebAssembly compiles a NuttX System Call

§7 Hello NuttX!

We’re making a System Call (ECALL) to NuttX Kernel to print something…

How will we code this in C?

We execute the ECALL in RISC-V Assembly like this: test-nuttx.js

int main(int argc, char *argv[]) {

  // Make NuttX System Call
  // to write(fd, buf, buflen)
  const unsigned int nbr = 61; // SYS_write
  const void *parm1 = 1;       // File Descriptor (stdout)
  const void *parm2 = "Hello, World!!\n"; // Buffer
  const void *parm3 = 15; // Buffer Length

  // Load the Parameters into
  // Registers A0 to A3
  // Note: This doesn't work with TCC,
  // so we load again below
  register long r0 asm("a0") = (long)(nbr);
  register long r1 asm("a1") = (long)(parm1);
  register long r2 asm("a2") = (long)(parm2);
  register long r3 asm("a3") = (long)(parm3);

  // Execute ECALL for System Call
  // to NuttX Kernel. Again: Load the
  // Parameters into Registers A0 to A3
  asm volatile (

    // Load 61 to Register A0 (SYS_write)
    "addi a0, zero, 61 \n"
    
    // Load 1 to Register A1 (File Descriptor)
    "addi a1, zero, 1 \n"
    
    // Load 0xc0101000 to Register A2 (Buffer)
    "lui   a2, 0xc0 \n"
    "addiw a2, a2, 257 \n"
    "slli  a2, a2, 0xc \n"
    
    // Load 15 to Register A3 (Buffer Length)
    "addi a3, zero, 15 \n"
    
    // ECALL for System Call to NuttX Kernel
    "ecall \n"
    
    // NuttX needs NOP after ECALL
    ".word 0x0001 \n"

    // Input+Output Registers: None
    // Input-Only Registers: A0 to A3
    // Clobbers the Memory
    :
    : "r"(r0), "r"(r1), "r"(r2), "r"(r3)
    : "memory"
  );

  // Loop Forever
  for(;;) {}
  return 0;
}

We copy this into our Web Browser and compile it. (Pic above)

(Why so complicated? Explained here)

(Caution: SYS_write 61 may change)

Does it work?

TCC in WebAssembly compiles the code above to RISC-V ELF a.out. When we copy it to NuttX and run it…

NuttShell (NSH) NuttX-12.4.0-RC0
nsh> a.out
...
## NuttX System Call for SYS_write (61)
riscv_swint:
  cmd: 61
  A0:  3d  ## SYS_write (61)
  A1:  01  ## File Descriptor (Standard Output)
  A2:  c0101000  ## Buffer
  A3:  0f        ## Buffer Length
...
## NuttX Kernel says hello
Hello, World!!

NuttX Kernel prints “Hello World” yay!

Indeed we’ve created a C Compiler in a Web Browser, that produces proper NuttX Apps!

OK so we can build NuttX Apps in a Web Browser… But can we run them in a Web Browser?

Yep, a NuttX App built in the Web Browser… Now runs OK with NuttX Emulator in the Web Browser! 🎉 (Pic below)

TLDR: We called JavaScript Local Storage to copy the RISC-V ELF a.out from TCC WebAssembly to NuttX Emulator… Then we patched a.out into the ROM FS Filesystem for NuttX Emulator. Nifty!
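The ELF Data is saved to Local Storage (key `elf_data`) as a string that looks like `%7f%45%4c%46...`. Here's a small sketch of how a reader of that string might turn it back into bytes and sanity-check the ELF Header. The helpers `decodeElfData` and `looksLikeElf` are hypothetical, we're not showing how NuttX Emulator actually does it.

```javascript
// Decode a string like "%7f%45%4c%46" back into bytes.
// `decodeElfData` is a hypothetical helper for illustration.
function decodeElfData(encoded) {
  const hexBytes = encoded.split("%").slice(1); // Drop the empty first item
  return Uint8Array.from(hexBytes, (hex) => parseInt(hex, 16));
}

// Every ELF File begins with the bytes 0x7f 'E' 'L' 'F'
function looksLikeElf(bytes) {
  return bytes.length >= 4 &&
    bytes[0] === 0x7f && bytes[1] === 0x45 &&
    bytes[2] === 0x4c && bytes[3] === 0x46;
}

// In the Web Browser the string would come from
// localStorage.getItem("elf_data"). Here we use a literal.
const elf = decodeElfData("%7f%45%4c%46%02%01");
console.log(looksLikeElf(elf));
```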

NuttX App built in a Web Browser… Runs inside the Web Browser!


§8 What’s Next

Check out the next article…

Thanks to the TCC Team, we have a 64-bit RISC-V Compiler that runs in the Web Browser…

How will you use TCC in a Web Browser? Please lemme know 🙏

(Build and run RISC-V Apps on iPhone?)

Many Thanks to my GitHub Sponsors (and the awesome NuttX and Zig Communities) for supporting my work! This article wouldn’t have been possible without your support.

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.github.io/src/tcc.md

Online Demo of TCC Compiler in WebAssembly


§9 Appendix: Compile TCC with Zig

This is how we run Zig Compiler to compile TCC Compiler from C to WebAssembly (pic below)…

## Download the (slightly) Modified TCC Source Code.
## Configure the build for 64-bit RISC-V.

git clone https://github.com/lupyuen/tcc-riscv32-wasm
cd tcc-riscv32-wasm
./configure
make cross-riscv64

## Call Zig Compiler to compile TCC Compiler
## from C to WebAssembly. Produces `tcc.o`

## Omitted: Run the `zig cc` command from earlier...
## https://lupyuen.github.io/articles/tcc#zig-compiles-tcc-to-webassembly
zig cc ...

## Compile our Zig Wrapper `tcc-wasm.zig` for WebAssembly
## and link it with TCC compiled for WebAssembly `tcc.o`
## Generates `tcc-wasm.wasm`

## Omitted: Run the `zig build-exe` command from earlier...
## https://lupyuen.github.io/articles/tcc#zig-compiles-tcc-to-webassembly
zig build-exe ...

(See the Build Script)

How did we figure out the “zig cc” options?

Earlier we saw a long list of Zig Compiler Options…

## Zig Compiler Options for TCC Compiler
zig cc \
  tcc.c \
  -DTCC_TARGET_RISCV64 \
  -DCONFIG_TCC_CROSSPREFIX="\"riscv64-\""  \
  -DCONFIG_TCC_CRTPREFIX="\"/usr/riscv64-linux-gnu/lib\"" \
  -DCONFIG_TCC_LIBPATHS="\"{B}:/usr/riscv64-linux-gnu/lib\"" \
  -DCONFIG_TCC_SYSINCLUDEPATHS="\"{B}/include:/usr/riscv64-linux-gnu/include\""   \
  ...

We got them from “make --trace”, which reveals the GCC Compiler Options…

## Show the GCC Options for compiling TCC
$ make --trace cross-riscv64

gcc \
  -o riscv64-tcc.o \
  -c \
  tcc.c \
  -DTCC_TARGET_RISCV64 \
  -DCONFIG_TCC_CROSSPREFIX="\"riscv64-\""  \
  -DCONFIG_TCC_CRTPREFIX="\"/usr/riscv64-linux-gnu/lib\"" \
  -DCONFIG_TCC_LIBPATHS="\"{B}:/usr/riscv64-linux-gnu/lib\"" \
  -DCONFIG_TCC_SYSINCLUDEPATHS="\"{B}/include:/usr/riscv64-linux-gnu/include\""   \
  -DTCC_GITHASH="\"main:b3d10a35\"" \
  -Wall \
  -O2 \
  -Wdeclaration-after-statement \
  -fno-strict-aliasing \
  -Wno-pointer-sign \
  -Wno-sign-compare \
  -Wno-unused-result \
  -Wno-format-truncation \
  -Wno-stringop-truncation \
  -I. 

And we copied the GCC Options above to become our Zig Compiler Options.

(See the Build Script)

Zig Compiler compiles TCC Compiler to WebAssembly

§10 Appendix: JavaScript calls TCC

Previously we saw some JavaScript (Web Browser and Node.js) calling our TCC Compiler in WebAssembly (pic above)…

This is how we test the TCC WebAssembly in a Web Browser with a Local Web Server…

## Download the (slightly) Modified TCC Source Code
git clone https://github.com/lupyuen/tcc-riscv32-wasm
cd tcc-riscv32-wasm

## Start the Web Server
cargo install simple-http-server
simple-http-server ./docs &

## Whenever we rebuild TCC WebAssembly...
## Copy it to the Web Server
cp tcc-wasm.wasm docs/

Browse to this URL and our TCC WebAssembly will appear…

## Test TCC WebAssembly with Web Browser
http://localhost:8000/index.html

Check the JavaScript Console for Debug Messages.

(See the JavaScript Log)

How does it work?

On clicking the Compile Button, our JavaScript loads the TCC WebAssembly: tcc.js

// Load the WebAssembly Module and start the Main Function.
// Called by the Compile Button.
async function bootstrap() {

  // Load the WebAssembly Module `tcc-wasm.wasm`
  // https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/instantiateStreaming
  const result = await WebAssembly.instantiateStreaming(
    fetch("tcc-wasm.wasm"),
    importObject
  );

  // Store references to WebAssembly Functions
  // and Memory exported by Zig
  wasm.init(result);

  // Start the Main Function
  window.requestAnimationFrame(main);
}        

(importObject exports our JavaScript Logger to Zig)

(wasm is our WebAssembly Helper)

Which triggers the Main Function and calls our Zig Function compile_program: tcc.js

// Main Function
function main() {
  // Allocate a String for passing the Compiler Options to Zig
  // `options` is a JSON Array: ["-c", "hello.c"]
  const options = read_options();
  const options_ptr = allocateString(JSON.stringify(options));
  
  // Allocate a String for passing the Program Code to Zig
  const code = document.getElementById("code").value;
  const code_ptr = allocateString(code);

  // Call TCC to compile the program
  const ptr = wasm.instance.exports
    .compile_program(options_ptr, code_ptr);

  // Get the `a.out` size from first 4 bytes returned
  const memory = wasm.instance.exports.memory;
  const data_len = new Uint8Array(memory.buffer, ptr, 4);
  const len = data_len[0] | data_len[1] << 8 | data_len[2] << 16 | data_len[3] << 24;
  if (len <= 0) { return; }

  // Encode the `a.out` data from the rest of the bytes returned
  // `encoded_data` looks like %7f%45%4c%46...
  const data = new Uint8Array(memory.buffer, ptr + 4, len);
  let encoded_data = "";
  for (const byte of data) {
    const hex = byte.toString(16).padStart(2, "0");
    encoded_data += `%${hex}`;
  }

  // Download the `a.out` data into the Web Browser
  download("a.out", encoded_data);

  // Save the ELF Data to Local Storage for loading by NuttX Emulator
  localStorage.setItem("elf_data", encoded_data);
};

Our Main Function then downloads the a.out file returned by our Zig Function.

(allocateString allocates a String from Zig Memory)

(download is here)
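The buffer returned by our Zig Function is laid out as a 4-byte size (little-endian, as WebAssembly always is) followed by the `a.out` data. As an alternative sketch of the decoding steps, we might read the size with a `DataView`. The names `readSizePrefixed` and `percentEncode` are hypothetical, and the fake buffer below only simulates what `compile_program` returns.

```javascript
// Read the `a.out` bytes from a buffer laid out as:
// 4-byte little-endian size, followed by the data.
// `readSizePrefixed` and `percentEncode` are hypothetical names.
function readSizePrefixed(buffer, ptr) {
  const len = new DataView(buffer).getUint32(ptr, true); // true = little-endian
  return new Uint8Array(buffer, ptr + 4, len);
}

// Encode the bytes so they look like %7f%45%4c%46...
function percentEncode(bytes) {
  let encoded = "";
  for (const byte of bytes) {
    encoded += "%" + byte.toString(16).padStart(2, "0");
  }
  return encoded;
}

// Simulate the returned buffer at "pointer" 2:
// Size is 4, then the data 0x7f 'E' 'L' 'F'
const fake = new Uint8Array([0, 0, 4, 0, 0, 0, 0x7f, 0x45, 0x4c, 0x46]);
const aOut = readSizePrefixed(fake.buffer, 2);
console.log(percentEncode(aOut)); // %7f%45%4c%46
```

`getUint32` always returns an unsigned number, whereas OR-ing shifted bytes produces a signed 32-bit value in JavaScript. For a tiny `a.out` both give the same answer.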

What about Node.js calling TCC WebAssembly?

## Test TCC WebAssembly with Node.js
node zig/test.js

For Easier Testing (via Command-Line): We copied the JavaScript above into a Node.js Script: test.js

// Allocate a String for passing the Compiler Options to Zig
const options = ["-c", "hello.c"];
const options_ptr = allocateString(JSON.stringify(options));

// Allocate a String for passing Program Code to Zig
const code_ptr = allocateString(`
  int main(int argc, char *argv[]) {
    printf("Hello, World!!\\n");
    return 0;
  }
`);

// Call TCC to compile a program
const ptr = wasm.instance.exports
  .compile_program(options_ptr, code_ptr);

(See the Node.js Log)

(Test Script for NuttX QEMU: test-nuttx.js)

(Test Log for NuttX QEMU: test-nuttx.log)

Our Zig Wrapper doing Pattern Matching for Formatting C Strings

§11 Appendix: Pattern Matching

A while back we saw our Zig Wrapper doing Pattern Matching for Formatting C Strings…

How It Works: We search for Format Patterns in the C Format Strings and substitute the Zig Equivalent (pic above): tcc-wasm.zig

// Format a Single `%d`
// like `#define __TINYC__ %d`
FormatPattern{

  // If the C Format String contains this...
  .c_spec = "%d",
  
  // Then we apply this Zig Format...
  .zig_spec = "{}",
  
  // And extract these Argument Types
  // from the Varargs...
  .type0 = c_int,
  .type1 = null
}

(FormatPattern is defined here)

(See the Format Patterns)

To implement this, we call comptime Functions in Zig: tcc-wasm.zig

/// CompTime Function to format a string by Pattern Matching.
/// Format a Single Specifier, like `#define __TINYC__ %d\n`
/// If the Spec matches the Format: Return the number of bytes written to `str`, excluding terminating null.
/// Else return 0.
fn format_string1(
  ap: *std.builtin.VaList,  // Varargs passed from C
  str:    [*]u8,            // Buffer for returning Formatted String
  size:   size_t,           // Buffer Size
  format: []const u8,       // C Format String, like `#define __TINYC__ %d\n`
  comptime c_spec:   []const u8,  // C Format Pattern, like `%d`
  comptime zig_spec: []const u8,  // Zig Equivalent, like `{}`
  comptime T0:       type,        // Type of First Vararg, like `c_int`
) usize {  // Return the number of bytes written to `str`, excluding terminating null

  // Count the Format Specifiers: `%`
  const spec_cnt   = std.mem.count(u8, c_spec, "%");
  const format_cnt = std.mem.count(u8, format, "%");

  // Check the Format Specifiers: `%`
  // Quit if the number of specifiers is different
  // Or if the specifiers are not found
  if (format_cnt != spec_cnt or
      !std.mem.containsAtLeast(u8, format, 1, c_spec)) {
    return 0;
  }

  // Fetch the First Argument from the C Varargs
  const a = @cVaArg(ap, T0);

  // Format the Argument
  var buf: [512]u8 = undefined;
  const buf_slice = std.fmt.bufPrint(&buf, zig_spec, .{a}) catch {
    @panic("format_string1 error: buf too small");
  };

  // Replace the C Format Pattern by the Zig Equivalent
  var buf2 = std.mem.zeroes([512]u8);
  _ = std.mem.replace(u8, format, c_spec, buf_slice, &buf2);

  // Return the Formatted String and Length
  const len = std.mem.indexOfScalar(u8, &buf2, 0).?;
  assert(len < size);
  @memcpy(str[0..len], buf2[0..len]);
  str[len] = 0;
  return len;
}

// Omitted: Function `format_string2` looks similar,
// but for 2 Varargs (instead of 1)

The function above is called by a comptime Inline Loop that applies all the Format Patterns that we saw earlier: tcc-wasm.zig

/// Runtime Function to format a string by Pattern Matching.
/// Return the number of bytes written to `str`, excluding terminating null.
fn format_string(
  ap: *std.builtin.VaList,  // Varargs passed from C
  str:    [*]u8,            // Buffer for returning Formatted String
  size:   size_t,           // Buffer Size
  format: []const u8,       // C Format String, like `#define __TINYC__ %d\n`
) usize {  // Return the number of bytes written to `str`, excluding terminating null

  // If no Format Specifiers: Return the Format, like `warning: `
  const len = format_string0(str, size, format);
  if (len > 0) { return len; }

  // For every Format Pattern...
  inline for (format_patterns) |pattern| {

    // Try formatting the string with the pattern...
    const len2 =
      if (pattern.type1) |t1|
      // Pattern has 2 parameters
      format_string2(ap, str, size, format, // Output String and Format String
        pattern.c_spec, pattern.zig_spec,   // Format Specifiers for C and Zig
        pattern.type0, t1 // Types of the Parameters
      )
    else
      // Pattern has 1 parameter
      format_string1(ap, str, size, format, // Output String and Format String
        pattern.c_spec, pattern.zig_spec,   // Format Specifiers for C and Zig
        pattern.type0 // Type of the Parameter
      );

    // Loop until we find a matching pattern
    if (len2 > 0) { return len2; }
  }

  // Format String doesn't match any Format Pattern.
  // We return the Format String and Length.
  const len3 = format.len;
  assert(len3 < size);
  @memcpy(str[0..len3], format[0..len3]);
  str[len3] = 0;
  return len3;
}

(format_string2 is here)

And the above function is called by fprintf and friends: tcc-wasm.zig

/// Implement the POSIX Function `fprintf`
export fn fprintf(stream: *FILE, format: [*:0]const u8, ...) c_int {

  // Prepare the varargs
  var ap = @cVaStart();
  defer @cVaEnd(&ap);

  // Format the string
  var buf = std.mem.zeroes([512]u8);
  const format_slice = std.mem.span(format);
  const len = format_string(&ap, &buf, buf.len, format_slice);

  // TODO: Print to other File Streams.
  // Right now we assume it's stderr (File Descriptor 2)
  return @intCast(len);
}

// Do the same for sprintf, snprintf, vsnprintf

Right now we’re doing simple Pattern Matching. But it might not be sufficient when TCC compiles Real Programs. See the updates here…

(See the Formatting Log)

(Without comptime: Our code gets super tedious)

NuttX Apps make a System Call to print to the console

§12 Appendix: NuttX System Call

Just now we saw a huge chunk of C Code that makes a NuttX System Call…

Why so complicated?

We refer to the Sample Code for NuttX System Calls (ECALL). Rightfully this shorter version should work…

// Make NuttX System Call to write(fd, buf, buflen)
const unsigned int nbr = 61; // SYS_write
const void *parm1 = 1;       // File Descriptor (stdout)
const void *parm2 = "Hello, World!!\n"; // Buffer
const void *parm3 = 15; // Buffer Length

// Execute ECALL for System Call to NuttX Kernel
register long r0 asm("a0") = (long)(nbr);
register long r1 asm("a1") = (long)(parm1);
register long r2 asm("a2") = (long)(parm2);
register long r3 asm("a3") = (long)(parm3);

asm volatile (
  // ECALL for System Call to NuttX Kernel
  "ecall \n"

  // NuttX needs NOP after ECALL
  ".word 0x0001 \n"

  // Input+Output Registers: None
  // Input-Only Registers: A0 to A3
  // Clobbers the Memory
  :
  : "r"(r0), "r"(r1), "r"(r2), "r"(r3)
  : "memory"
);

Strangely TCC generates mysterious RISC-V Machine Code that mashes up the RISC-V Registers…

main():
// Prepare the Stack
   0:  fc010113  add     sp, sp, -64
   4:  02113c23  sd      ra, 56(sp)
   8:  02813823  sd      s0, 48(sp)
   c:  04010413  add     s0, sp, 64
  10:  00000013  nop
  14:  fea43423  sd      a0, -24(s0)
  18:  feb43023  sd      a1, -32(s0)

// Correct: Load Register A0 with 61 (SYS_write)
  1c:  03d0051b  addw    a0, zero, 61
  20:  fca43c23  sd      a0, -40(s0)

// Nope: Load Register A0 with 1?
// Mixed up with Register A1! (Value 1)
  24:  0010051b  addw    a0, zero, 1
  28:  fca43823  sd      a0, -48(s0)

// Nope: Load Register A0 with "Hello World"?
// Mixed up with Register A2!
  2c:  00000517  auipc   a0,0x0  2c: R_RISCV_PCREL_HI20  L.0
  30:  00050513  mv      a0,a0   30: R_RISCV_PCREL_LO12_I        .text
  34:  fca43423  sd      a0, -56(s0)

// Nope: Load Register A0 with 15?
// Mixed up with Register A3! (Value 15)
  38:  00f0051b  addw    a0, zero, 15
  3c:  fca43023  sd      a0, -64(s0)

// Execute ECALL with Register A0 set to 15.
// Nope: A0 should be 61!
  40:  00000073  ecall
  44:  0001      nop

Thus we hardcode Registers A0 to A3 in RISC-V Assembly: test-nuttx.js

// Load 61 to Register A0 (SYS_write)
addi  a0, zero, 61

// Load 1 to Register A1 (File Descriptor)
addi  a1, zero, 1

// Load 0xc0101000 to Register A2 (Buffer)
lui   a2, 0xc0
addiw a2, a2, 257
slli  a2, a2, 0xc

// Load 15 to Register A3 (Buffer Length)
addi  a3, zero, 15

// ECALL for System Call to NuttX Kernel
ecall

// NuttX needs NOP after ECALL
.word 0x0001

And it prints “Hello World”!

TODO: Is there a workaround? Do we paste the ECALL Assembly Code ourselves? NuttX Libraries won’t link with TCC

(See the TCC WebAssembly Log)

What’s with the addi and nop?

TCC won’t assemble the “li” and “nop” instructions.

So we used this RISC-V Online Assembler to assemble the code above.

“addi” above is the longer form of “li”, which TCC won’t assemble…

// Load 61 to Register A0 (SYS_write)
// But TCC won't assemble `li a0, 61`
// So we do this...

// Add 0 to 61 and save to Register A0
addi a0, zero, 61

“lui / addiw / slli” above is our expansion of “li a2, 0xc0101000”, which TCC won’t assemble…

// Load 0xC010_1000 to Register A2 (Buffer)
// But TCC won't assemble `li a2, 0xc0101000`
// So we do this...

// Load 0xC0 << 12 into Register A2 (0xC0000)
lui   a2, 0xc0

// Add 257 to Register A2 (0xC0101)
addiw a2, a2, 257

// Shift Left by 12 Bits (0xC010_1000)
slli  a2, a2, 0xc
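
We can check the arithmetic with a tiny simulation in C. This is only a sketch: the functions `rv_lui`, `rv_addiw` and `rv_slli` are hypothetical helpers that model how 64-bit RISC-V treats each instruction (the results of `lui` and `addiw` are sign-extended from 32 bits). It also hints at why three instructions are needed: on RV64, a single `lui a2, 0xc0101` would sign-extend into `0xFFFF_FFFF_C010_1000`, which is presumably why the expansion shifts the value into place instead…

```c
#include <assert.h>
#include <stdint.h>

// Sign-extend the low 32 bits to 64 bits,
// as RV64 does for the results of LUI and ADDIW
static uint64_t sext32(uint64_t x) {
  return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

// lui rd, imm20: rd = (imm20 << 12), sign-extended to 64 bits
static uint64_t rv_lui(uint32_t imm20) {
  return sext32((uint64_t)imm20 << 12);
}

// addiw rd, rs1, imm: 32-bit add, result sign-extended to 64 bits
static uint64_t rv_addiw(uint64_t rs1, int32_t imm) {
  return sext32(rs1 + (uint64_t)(int64_t)imm);
}

// slli rd, rs1, shamt: 64-bit logical shift left
static uint64_t rv_slli(uint64_t rs1, unsigned shamt) {
  assert(shamt < 64);
  return rs1 << shamt;
}

// The three-instruction sequence that loads the buffer address
static uint64_t load_buffer_address(void) {
  uint64_t a2 = rv_lui(0xc0);  // 0xC0000
  a2 = rv_addiw(a2, 257);      // 0xC0101
  a2 = rv_slli(a2, 12);        // 0xC0101000
  return a2;
}
```

`load_buffer_address()` returns `0xC0101000`, matching the comments in the assembly code above.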

How did we figure out that the buffer is at 0xC010_1000?

We saw this in our ELF Loader Log…

NuttShell (NSH) NuttX-12.4.0
nsh> a.out
...
Read 576 bytes from offset 512
Read 154 bytes from offset 64
1. 00000000->c0000000
Read 0 bytes from offset 224
2. 00000000->c0101000
Read 16 bytes from offset 224
3. 00000000->c0101000
4. 00000000->c0101010

Which says that the NuttX ELF Loader copied 16 bytes from our NuttX App Data Section (.data.ro) to 0xC010_1000.

That’s all 15 characters of “Hello, World!!\n”, plus the terminating null (16 bytes in total).

Thus our buffer in NuttX QEMU should be at 0xC010_1000.

(NuttX WebAssembly Emulator uses 0x8010_1000 instead)

(More about the NuttX ELF Loader)
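
The byte counts are easy to double-check in C. In this sketch (the names `hello`, `hello_length` and `hello_storage` are ours, purely for illustration), `strlen` gives the 15 characters that we pass as the Buffer Length in Register A3, while `sizeof` gives the 16 bytes that the ELF Loader copied, since it includes the terminating null…

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// The string literal from our demo program
static const char hello[] = "Hello, World!!\n";

// Number of characters printed by the `write` System Call (Register A3)
static size_t hello_length(void) { return strlen(hello); }

// Number of bytes stored in the data section, including the terminating null
static size_t hello_storage(void) { return sizeof(hello); }

// Tie both counts back to the ELF Loader Log
static void check_hello_sizes(void) {
  assert(hello_length() == 15);   // Buffer Length in Register A3
  assert(hello_storage() == 16);  // "Read 16 bytes from offset 224"
}
```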

Why do we Loop Forever?

// Omitted: Execute ECALL for System Call to NuttX Kernel
asm volatile ( ... );

// Loop Forever
for(;;) {}

That’s because NuttX Apps are not supposed to Return to NuttX Kernel.

We should call the NuttX System Call __exit to terminate peacefully.

Online Demo of Apache NuttX RTOS

§13 Appendix: Build NuttX for QEMU

Here are the steps to build and run NuttX for QEMU 64-bit RISC-V (Kernel Mode)

  1. Install the Build Prerequisites, skip the RISC-V Toolchain…

    “Install Prerequisites”

  2. Download the RISC-V Toolchain for riscv64-unknown-elf…

    “Download Toolchain for 64-bit RISC-V”

  3. Download and configure NuttX…

    ## Download NuttX Source Code
    mkdir nuttx
    cd nuttx
    git clone https://github.com/apache/nuttx nuttx
    git clone https://github.com/apache/nuttx-apps apps
    
    ## Configure NuttX for QEMU RISC-V 64-bit (Kernel Mode)
    cd nuttx
    tools/configure.sh rv-virt:knsh64
    make menuconfig
    

    We use Kernel Mode because it allows loading of NuttX Apps as ELF Files.

    (Instead of Statically Linking the NuttX Apps into NuttX Kernel)

  4. (Optional) To enable ELF Loader Logging, select…

    Build Setup > Debug Options > Binary Loader Debug Features:

  5. (Optional) To enable System Call Logging, select…

    Build Setup > Debug Options > SYSCALL Debug Features:

  6. Save and exit menuconfig.

  7. Build the NuttX Kernel and NuttX Apps…

    ## Build NuttX Kernel
    make -j 8
    
    ## Build NuttX Apps
    make -j 8 export
    pushd ../apps
    ./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz
    make -j 8 import
    popd
    

This produces the NuttX ELF Image nuttx that we may boot on QEMU RISC-V Emulator…

## For macOS: Install QEMU
brew install qemu

## For Debian and Ubuntu: Install QEMU
sudo apt install qemu-system-riscv64

## Boot NuttX on QEMU 64-bit RISC-V
## Remove `-bios none` for newer versions of NuttX
qemu-system-riscv64 \
  -semihosting \
  -M virt,aclint=on \
  -cpu rv64 \
  -bios none \
  -kernel nuttx \
  -nographic

NuttX Apps are located in apps/bin.

We may copy our RISC-V ELF a.out to that folder and run it…

NuttShell (NSH) NuttX-12.4.0-RC0
nsh> a.out
Hello, World!!

POSIX Functions aren’t supported for TCC in WebAssembly

§14 Appendix: Missing Functions

Remember we said that POSIX Functions aren’t supported in WebAssembly? (Pic above)

We dump the Compiled WebAssembly of TCC Compiler, and we discover that it calls 72 POSIX Functions…

## Dump the Compiled WebAssembly
## for TCC Compiler `tcc.o`
$ sudo apt install wabt
$ wasm-objdump -x tcc.o

Import:
 - func[0] sig=1  <env.strcmp> <- env.strcmp
 - func[1] sig=12 <env.memset> <- env.memset
 - func[2] sig=1  <env.getcwd> <- env.getcwd
 ...
 - func[69] sig=2  <env.localtime> <- env.localtime
 - func[70] sig=13 <env.qsort>     <- env.qsort
 - func[71] sig=19 <env.strtoll>   <- env.strtoll

(See the Complete List)

Do we need all 72 POSIX Functions? We scrutinise the list…


Filesystem Functions

(Implemented here)

We’ll simulate these functions for WebAssembly, by embedding the simple ROM FS Filesystem into our Zig Wrapper…

getcwd, remove, unlink
open, fopen, fdopen
close, fclose, fprintf
fputc, fputs, read
fread, fwrite, fflush
fseek, ftell, lseek
puts
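
What does “simulating” a filesystem call look like? Here’s a minimal sketch in C that serves `open` and `read` style calls from a buffer in memory. Everything here is hypothetical: the names `sim_open` and `sim_read`, the single hardcoded file and the descriptor number are made up for illustration. The actual wrapper is written in Zig and backed by ROM FS, so treat this as the general idea only…

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical in-memory "file": one hardcoded path and its contents
static const char file_path[] = "hello.c";
static const char file_data[] = "int main() { return 0; }\n";
static size_t file_pos = 0;  // Current read offset

enum { SIM_FD = 3 };  // Arbitrary descriptor for our only file

// Simulated open(): Returns a descriptor, or -1 if the path is unknown
static int sim_open(const char *path) {
  if (strcmp(path, file_path) != 0) { return -1; }
  file_pos = 0;  // Rewind on every open
  return SIM_FD;
}

// Simulated read(): Copies up to `len` bytes and advances the offset.
// Returns the number of bytes copied, 0 at end of file, -1 on a bad descriptor.
static long sim_read(int fd, void *buf, size_t len) {
  assert(buf != NULL);
  if (fd != SIM_FD) { return -1; }
  size_t remaining = (sizeof(file_data) - 1) - file_pos;
  if (len > remaining) { len = remaining; }
  memcpy(buf, file_data + file_pos, len);
  file_pos += len;
  return (long)len;
}
```

A caller opens `hello.c`, then reads in chunks until `sim_read` returns 0. Nothing touches a real filesystem, which is what we need inside a WebAssembly sandbox.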

Varargs Functions

(Implemented here)

As discussed earlier, Varargs will be tricky to implement in Zig. Probably we should do it in C.

(Similar to ziglibc)

Right now we’re doing simple Pattern Matching. But it might not be sufficient when TCC compiles Real Programs. See the updates here…

printf, snprintf, sprintf
vsnprintf, sscanf
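
For comparison, this is roughly how a formatting wrapper might look in C, where the compiler handles `va_list` for us. It’s only a sketch: `format_message` is a hypothetical name, and it assumes a C Library that already provides `vsnprintf` (something we don’t get for free in WebAssembly, hence the idea of borrowing from ziglibc)…

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical wrapper: Format into `buf` and return the formatted length.
// Assumes the C Library supplies vsnprintf.
static int format_message(char *buf, size_t size, const char *fmt, ...) {
  assert(buf != NULL && fmt != NULL);

  // Collect the variable arguments and hand them to vsnprintf
  va_list args;
  va_start(args, fmt);
  int len = vsnprintf(buf, size, fmt, args);
  va_end(args);
  return len;
}
```

Unlike Pattern Matching on a fixed set of Format Strings, this handles any format that `vsnprintf` understands, at the cost of needing a working `vsnprintf` underneath.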

String Functions

(Implemented here)

We’ll borrow the String Functions from ziglibc…

atoi, strcat, strchr
strcmp, strncmp, strncpy
strrchr, strstr, strtod
strtof, strtol, strtold
strtoll, strtoul, strtoull
strerror

Semaphore Functions

(Implemented here)

Not sure why TCC uses Semaphores? Maybe we’ll understand when we support #include files.

(Where can we borrow the Semaphore Functions?)

sem_init, sem_post, sem_wait

Standard Library

qsort isn’t used right now. Maybe for the Linker later?

(Borrow qsort from where? We can probably implement exit)

exit, qsort

Time and Math Functions

Not used right now, maybe later.

(Anyone can lend us ldexp? How will we do the Time Functions? Call out to JavaScript to fetch the time?)

time, gettimeofday, localtime
ldexp

Outstanding Functions

(Implemented here)

We have implemented (fully or partially) 48 POSIX Functions from above.

The ones that we haven’t implemented? These 24 POSIX Functions will Halt when TCC WebAssembly calls them…

atoi, exit, fopen
fread, fseek, ftell
getcwd, gettimeofday, ldexp
localtime, lseek, printf
qsort, remove, strcat
strerror, strncpy, strtod
strtof, strtol, strtold
strtoll, strtoul, time