InkWASM: The Dark Art of Fighting Golang Inefficiencies
Performance comparison between InkWASM and Go's original syscall/js.

Syscalls imply a high cost, and nobody can doubt that. However, things get worse when we try to make developers' lives easier. There's no free lunch, and the CPU will pay the price. Abstractions and allocations are evil, and we already know that from the OOP disaster.

But we use some dark arts, exploiting Go internals, to get things faster (for the CPU, not for humans!). Almost two years ago, I created a code generator that improved the speed of one application by 1.6 times, and by 2 times in benchmarks. The use case heavily relies on WebGL, WebAssembly, and Go. You can test it in your browser: https://gio-bench.pages.dev.

Unfortunately, I was defeated when Go 1.21 removed one magical assembly instruction that I relied on.

Well, losing a battle doesn't mean losing the war. I declare that I'm porting it to Go 1.22! Not only porting it, but making it even safer and easier to use, and maybe even faster.

In this post, I'll explain the goal of InkWASM and how it works behind the scenes. Of course, I'll also explain why it's faster than the native syscall/js.


This is the first in a series of posts in which I'll discuss open-source projects that I contribute to or maintain, and how they work internally. My goal is to help others and attract future contributors, primarily those who have never worked with this kind of stuff before. :)


This post assumes that you know the basics of WebAssembly, Web Browsers, ABI, Meta-Programming, AST, FFI, Javascript and Go.


The WebAssembly Problem

Calling JavaScript functions from WebAssembly implies a cost. This is fundamentally problematic with strings, since JavaScript uses UTF-16 (or WTF-16), while most other languages (Zig, Odin, Go, Rust, Swift) have moved to UTF-8.

The issue of UTF-16 is not on-topic here. However, the situation is critical. It's so bad that some folks have already proposed new extensions to WebAssembly: StringRef (https://github.com/WebAssembly/stringref). In my opinion, that extension makes things worse. The Web must move to UTF-8 instead, and then functions like Encoder will become no-ops.

However, strings are the basis of JavaScript. Consider the following line:

globalThis.document.body.insertAdjacentHTML("beforebegin", "Hello World")        

That line is entirely string-based. You can't call that function without using strings. You can improve it by using JavaScript's bind and reusing objects, as shown in the lines below:

let fn = HTMLElement.prototype.insertAdjacentHTML.bind(globalThis.document.body)
let ptn = "beforebegin"

fn(ptn, "Hello World")        

In this case the "fn" function and the "ptn" string are already defined, so they can be used by WASM. The WASM side creates both objects, stores them in a map and then uses numbers (to look up "fn" and "ptn") instead of strings. The only remaining string is "Hello World", and if it's reused often it can also be cached, reusing the same "Hello World" object. So, we reduce the string usage.

We will discuss it further! But the point is: someone needs to pay the price of such translation, or the price of optimisation. To make matters worse, you need to consider Arrays, Slices, ArrayLists, DynamicArrays (or whatever naming your language chooses to use). Such data needs to be copied from/to WebAssembly. If you are transferring that from WASM, you can create a view without copying, but things can also move away from you if your language has a Garbage Collector based on mark-and-compact.

The Golang Problem

Golang has compiled to WASM since Go 1.11. The most common way to call JavaScript functions from WebAssembly is using "syscall/js". It exposes a simple API that allows you to call JavaScript.

Considering the example above, this is the Go equivalent, using syscall/js:

js.Global().Get("document").Get("body").Call("insertAdjacentHTML", "beforebegin", "Hello World")        

Let's create a small benchmark of exactly that code. However, since it's a benchmark of the syscall itself, we can ignore the browser's rendering/processing time.

func Benchmark_InsertAdjacentHTML_SYSCALL(b *testing.B) {
	hijackInsertAdjacentHTML()

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		js.Global().
			Get("document").
			Get("body").
			Call("insertAdjacentHTML", "beforebegin", i)
	}
}

func Benchmark_InsertAdjacentHTML_SYSCALL_3(b *testing.B) {
	hijackInsertAdjacentHTML()

	b.ReportAllocs()
	b.ResetTimer()

	body := js.Global().
		Get("document").
		Get("body")

	bind := js.Global().
		Get("HTMLElement").
		Get("prototype").
		Get("insertAdjacentHTML").
		Call("bind", body)

	pattern := js.Global().
		Get("String").
		New("beforebegin")

	for i := 0; i < b.N; i++ {
		bind.Invoke(pattern, i)
	}
}        

Both of them allocate and perform poorly, and the numbers make clear how inefficient syscall/js can be.

_SYSCALL      334728              6990 ns/op              72 B/op          5 allocs/op
_SYSCALL_3   8210740               295.0 ns/op            48 B/op          2 allocs/op        

The common way of using syscall/js is very expensive. It gets much faster when you reuse the same string, where possible, and use binds. Reusing just the "body" already drops the cost to 2675 ns/op. I omitted that test to keep this post shorter and with less code.

But I'm not satisfied with 295 ns/op. Also, why is it still allocating? Well, we need to investigate how the function works internally.


Allocs, allocs, allocs everywhere!

One of the major issues comes from "makeArgs", which syscall/js uses internally:

func makeArgs(args []any) ([]Value, []ref) {
    argVals := make([]Value, len(args))
    argRefs := make([]ref, len(args))
    for i, arg := range args {
       v := ValueOf(arg)
       argVals[i] = v
       argRefs[i] = v.ref
    }
    return argVals, argRefs
}        

So, it always creates new slices on the heap, without any optimisation. If you provide two values, it will create two slices with two values each. "ValueOf" also hides some issues. It's a long switch-case:

func ValueOf(x any) Value {
    switch x := x.(type) {
    // ...

    case string:
       return makeValue(stringVal(x))
    case []any:
       a := arrayConstructor.New(len(x))
       for i, s := range x {
          a.SetIndex(i, s)
       }
       return a
 
    // ...
    }
}        

If it's a string or a slice, then we have issues. If it's a string, it will create a String object (the JavaScript-land object) and then store its reference in the previously created slice. In the case of a slice, it will copy it value by value.
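
The cost of the two slices can be observed outside the browser. The sketch below is not the syscall/js code: Value and ref are simplified stand-ins, makeArgsLike is a hypothetical function that only copies the shape of makeArgs, and the package-level sinks exist to make sure the slices escape to the heap. It uses testing.AllocsPerRun from the standard library to count allocations.

```go
package main

import (
	"fmt"
	"testing"
)

// Value and ref are simplified stand-ins for the syscall/js types.
type ref uint64
type Value struct{ ref ref }

// Package-level sinks force the returned slices to escape to the heap.
var (
	sinkVals []Value
	sinkRefs []ref
)

// makeArgsLike mirrors the shape of makeArgs: two fresh slices per call.
func makeArgsLike(args []any) ([]Value, []ref) {
	argVals := make([]Value, len(args))
	argRefs := make([]ref, len(args))
	for i := range args {
		argVals[i] = Value{ref: ref(i)}
		argRefs[i] = argVals[i].ref
	}
	return argVals, argRefs
}

// allocsPerCall reports the average number of heap allocations per call.
func allocsPerCall() float64 {
	args := []any{"beforebegin", "Hello World"}
	return testing.AllocsPerRun(1000, func() {
		sinkVals, sinkRefs = makeArgsLike(args)
	})
}

func main() {
	fmt.Printf("allocs per call: %.0f\n", allocsPerCall())
}
```

This only counts the slices themselves. The real function also pays for whatever "ValueOf" does with each argument.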

Go, don't go away!

One issue is that Go calls JavaScript too often. You can notice that one single call makes multiple calls to JavaScript. So, if you Call("Something"), you are actually calling JavaScript twice and allocating more data on both sides. In the end, you have the following situation:

[Figure: interaction between Golang and JavaScript to call one function with one string as argument.]


I don't think it's necessary, and we can improve that, if we know the type of each function ahead-of-time. Also, even if we didn't know the type, we could just check the type on Javascript side, in a single call. That prevents new allocations and also additional calls.

Well, one thing that Golang does well is garbage-collecting all the resources, instead of potentially leaking them. I'm talking about you: .NET!


Copy, why copy?

If you use WebAssembly with WebGL, then you need to transfer bytes, for texture, shaders and anything in between. You need to... copy?! Copy, every-single-frame?

That is the current state of Go. It provides "CopyBytesToJS" and "CopyBytesToGo". The latter is tricky, and we can't do much to improve it, but the former we can avoid.


The InkWasm as an Antidote

It's quite obvious what we need to do: avoid allocations, avoid unnecessary calls and, if possible, make it simple to use. So, here I stand, healing what we thought would forever be broken.


Using InkWASM, you can simply use comments! They will be interpreted by the InkWASM generator. Also, InkWASM uses InkWASM itself! Considering the previous example, you can write the same function in InkWASM like this:

func Benchmark_InsertAdjacentHTML_INKWASM(b *testing.B) {
    hijackInsertAdjacentHTML()

    b.ReportAllocs()
    b.ResetTimer()

    pattern, _ := Global().Get("String").New("beforebegin")
    defer pattern.Free()

    for i := 0; i < b.N; i++ {
       gen_InsertAdjacentHTML(pattern, i)
    }
}

//inkwasm:func globalThis.document.body.insertAdjacentHTML
func gen_InsertAdjacentHTML(Object, int)        

Notice something different there? Well, take a look:

//inkwasm:func globalThis.document.body.insertAdjacentHTML
func gen_InsertAdjacentHTML(Object, int)        

That function doesn't have a body, but describes what it wants to do. In this case, it's a function that calls "globalThis.document.body.insertAdjacentHTML" with one "Object" and one "int".

That drastically improves the performance: it's now only 39.82 ns/op, with 0 allocs/op. Yes, zero allocations!


How does it work?

First, we need to read your source code and identify comments and function declarations. So, guess what? We have a small parser that relies on the Go AST. It's used to identify the comment, the function and the signature of the function.


Parser, boring Parser.

I won't waste much time explaining the parser, since it just reads the AST from Golang. AST is just boring.


Consider this code, which is similar to the previous one:

//inkwasm:func .insertAdjacentHTML
func gen_InsertAdjacentHTML(Object, Object, int)        

  • We need to know the "hint type": it can be "inkwasm:func", "inkwasm:get", "inkwasm:set" or "inkwasm:new".
  • We need to know the type, size and padding of every argument: here we have two 16-byte values and one 8-byte value, so everything is already aligned by luck.
  • We need to know all return values; here there are none. It's possible to have up to two values, and the last one must be a boolean.
  • We need to know the JavaScript function signature: here it starts with ".", which will call "ObjectFromParameter1.insertAdjacentHTML". If the signature starts with ".", the first parameter must be an Object.
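
The first and last points can be sketched in a few lines of Go. This is not the generator's parser: the directive struct and parseDirective are hypothetical names, and only the four hint types listed above are accepted.

```go
package main

import (
	"fmt"
	"strings"
)

// directive is a hypothetical, simplified result of parsing one hint comment.
type directive struct {
	Kind     string // "func", "get", "set" or "new"
	Target   string // JavaScript path, e.g. ".insertAdjacentHTML"
	OnObject bool   // true when Target starts with "." (resolved on the first parameter)
}

// parseDirective parses a comment such as "//inkwasm:func .insertAdjacentHTML".
func parseDirective(comment string) (directive, error) {
	rest, ok := strings.CutPrefix(comment, "//inkwasm:")
	if !ok {
		return directive{}, fmt.Errorf("not an inkwasm comment: %q", comment)
	}
	kind, target, _ := strings.Cut(rest, " ")
	target = strings.TrimSpace(target)
	switch kind {
	case "func", "get", "set", "new":
	default:
		return directive{}, fmt.Errorf("unknown hint type %q", kind)
	}
	if target == "" {
		return directive{}, fmt.Errorf("missing JavaScript target in %q", comment)
	}
	return directive{
		Kind:     kind,
		Target:   target,
		OnObject: strings.HasPrefix(target, "."),
	}, nil
}

func main() {
	d, err := parseDirective("//inkwasm:func .insertAdjacentHTML")
	fmt.Println(d.Kind, d.Target, d.OnObject, err)
}
```

When OnObject is true, a real generator would additionally check that the first Go parameter is an Object.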


In order to parse, we need to identify each kind of high-level token. Oversimplified, we have something like this:

func (p *Parser) ParseFile(...) (...) {
    astutil.Apply(file, nil, func(c *astutil.Cursor) bool {
       n := c.Node()
       switch x := n.(type) {
       case *ast.Comment:
          p.parseComment(x.Text)
       case *ast.FuncDecl:
          p.parseFunction()
       case *ast.TypeSpec:
          info.FunctionGolang.Name = x.Name.Name
       case *ast.StructType:
          p.parseStruct(pkg, x)
       default:
       }
       return true
    })

    return b, err
}        

When we find a Comment, we check whether it's an "inkwasm" comment. If it is, then we need to parse the next function declaration.

The full-source: https://github.com/inkeliz/go_inkwasm/blob/master/parser
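
For readers who have never touched go/ast, here is a self-contained sketch of the same idea using only the standard library. It is an assumption-laden simplification: instead of astutil.Apply it loops over the top-level declarations, and findHints and the embedded src are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const src = `package demo

type Object struct{}

//inkwasm:func globalThis.document.body.insertAdjacentHTML
func gen_InsertAdjacentHTML(Object, int)

// regular is an ordinary function and must be ignored.
func regular() {}
`

// findHints returns a map from function name to its inkwasm hint comment.
func findHints(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	hints := make(map[string]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Doc == nil || fn.Body != nil {
			continue // only body-less functions with a doc comment qualify
		}
		for _, c := range fn.Doc.List {
			if strings.HasPrefix(c.Text, "//inkwasm:") {
				hints[fn.Name.Name] = c.Text
			}
		}
	}
	return hints, nil
}

func main() {
	hints, err := findHints(src)
	if err != nil {
		panic(err)
	}
	for name, hint := range hints {
		fmt.Println(name, "=>", hint)
	}
}
```

The parser accepts a function without a body, which is exactly what lets the hint-plus-signature trick work at the source level.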


The Assembly

Declaring one function without body requires something to be called. In that case we need assembly code, or Plan-9 Golang Assembly. The generator will generate one assembly for each function:

TEXT ·gen_InsertAdjacentHTML(SB), NOSPLIT, $0
    JMP ·_gen_InsertAdjacentHTML(SB)
    RET        

The JMP contains a really annoying symbol, which is "·", and don't ask me why Go uses it.

The idea here is to jump to somewhere, but where? Well, the generator needs to create new functions!

Note from the past: Previously, we also generated another piece of assembly code, which would be something like:

TEXT ·__gen_InsertAdjacentHTML(SB), NOSPLIT, $0
	CallImport
	RET        

That CallImport calls the imported function, but that got removed in Go 1.21.


The Go-Stub function:

The assembly will just jump, but to where? Well, this function is also generated on the Golang side, and it's responsible for translating JavaScript objects to strings and slices, if applicable. In the example, this is the generated code:

func _gen_InsertAdjacentHTML(p0 Object, p1 Object, p2 int) {
	__gen_InsertAdjacentHTML(p0, p1, p2)

}

//go:wasmimport gojs github.com/path/to.__gen_InsertAdjacentHTML
func __gen_InsertAdjacentHTML(p0 Object, p1 Object, p2 int)        

In this case, it will just call another function. That function uses the new "wasmimport" directive/annotation, which makes it possible to import a function named "github.com/path/to.__gen_InsertAdjacentHTML".

The "wasmimport" requires two values:

  • Module name: here it is "gojs", which hides a gem. I will explain it later.
  • Function name: which can be anything, but I use the full path of the file and function, since it's known to be unique. This information will be public, so if you care about obfuscation, I'm sorry. We could also hash it, and that might be possible in the future.


The Import:

I think everyone knows the basics of WASM. But, to keep it simple, WASM uses Import/Export to communicate with the host (JavaScript, in this case). So, we also generate the JavaScript code, and that code is quite... strange.

Object.assign(go.importObject.gojs, {

"github.com/path/to.__gen_InsertAdjacentHTML.__gen_InsertAdjacentHTML": (sp) => {
    	globalThis.inkwasm.Load.InkwasmObject(go, sp, 8).insertAdjacentHTML(globalThis.inkwasm.Load.InkwasmObject(go, sp, 24), globalThis.inkwasm.Load.Int(go, sp, 40))

},
})

I think almost no one understands any of that. So, here is the best part.


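When you use "wasmimport", the module name implicitly opts you into one of two ABIs. With the "gojs" module, the imported function receives only the stack pointer and reads its arguments from Go memory. With any other module name, the arguments are passed as regular WebAssembly values.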


So, let's start translating the code back into something humans can read:

"github.com/path/to.__gen_InsertAdjacentHTML.__gen_InsertAdjacentHTML": (sp) => {
globalThis.inkwasm.Load.InkwasmObject(go, sp, 8) . insertAdjacentHTML(...)
}        

The function will receive the "sp" from Go, which is the stack pointer. The "insertAdjacentHTML" was specified by the comment, as we mentioned before, and the first parameter is an Object. So, how can we get the Object?

The "Object" is just a single struct, defined in the InkWASM module, which takes 16 bytes:

// Object represents one Javascript Object
type Object struct {
    _ [0]func() // not comparable
    // value holds the value of the js-object
    // if typ == Object it's the value of the index
    value     [8]byte
    typ       ObjectType // type of js object
    protected bool       // protected prevents been released
    _         [2]uint8   // padding, reserved
    len       uint32     // len for array/string
}        

So, if we know the stack-pointer of the function we can read the Object struct, in the Javascript-side! The Object contains two important fields:

  • Typ: The type is a single byte at offset 8. It identifies what kind of Object this is, which changes the meaning of the "Value" field. That is basic Data-Oriented Design, avoiding useless OOP abstractions and interfaces.
  • Value: The value is either the numeric value, if Typ is Number, or a boolean (1/0), if Typ is Boolean. It can be ignored if Typ is Undefined or Null. However, if it's Symbol, Object, or anything else, the Value is the index of that JavaScript object, stored in the "Objects" array on the JavaScript side.
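
The 16 bytes and the offset 8 are easy to double-check with the unsafe package. The sketch below repeats the struct from above; the only assumption is that ObjectType is a one-byte type, which is what the text implies.

```go
package main

import (
	"fmt"
	"unsafe"
)

// ObjectType is assumed to be a one-byte type, as the text describes.
type ObjectType uint8

// Object repeats the struct shown above so its layout can be inspected.
type Object struct {
	_         [0]func() // not comparable
	value     [8]byte
	typ       ObjectType
	protected bool
	_         [2]uint8
	len       uint32
}

// layout returns the struct size and the offsets of the fields that the
// JavaScript side reads.
func layout() (size, valueOff, typOff, lenOff uintptr) {
	var o Object
	return unsafe.Sizeof(o),
		unsafe.Offsetof(o.value),
		unsafe.Offsetof(o.typ),
		unsafe.Offsetof(o.len)
}

func main() {
	size, valueOff, typOff, lenOff := layout()
	fmt.Println("size:", size, "value:", valueOff, "typ:", typOff, "len:", lenOff)
}
```

A check like this is cheap insurance: if someone reorders the fields, the JavaScript loader would silently read the wrong bytes.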


This is the code behind "globalThis.inkwasm.Load.InkwasmObject":

InkwasmObject: function (go, sp, offset) {
    switch (globalThis.inkwasm.Load.Uint8(go, sp, offset + 8)) {
        case ObjectTypes.TypeUndefined:
            return undefined
        case ObjectTypes.TypeNull:
            return null
        case ObjectTypes.TypeBoolean:
            return globalThis.inkwasm.Load.Uint8(go, sp, offset) !== 0
        case ObjectTypes.TypeNumber:
            return globalThis.inkwasm.Load.Int(go, sp, offset)
        default:
            return Objects[globalThis.inkwasm.Load.Int(go, sp, offset)]
    }
}        

It verifies the type and then re-creates (or retrieves) the object. If it's a real JavaScript object, it gets the reference from a pool.

But what are "globalThis.inkwasm.Load.Uint8" and "globalThis.inkwasm.Load.Int" doing? They simply read Golang's memory, starting from the stack pointer that we give them:
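
If JavaScript is not your thing, the same decoding logic can be expressed in Go over a plain byte slice. This is only a model: the numeric values of the type tags, the objects slice and the loadObject and demo functions are assumptions made up for this sketch, not InkWASM's actual definitions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Type tags. The numeric values are assumptions made for this sketch.
const (
	typeUndefined = iota
	typeNull
	typeBoolean
	typeNumber
	typeObject
)

// objects plays the role of the JavaScript-side "Objects" array.
var objects = []any{"<document.body>", "beforebegin"}

// loadObject decodes the 16-byte Object stored at mem[sp+offset:].
func loadObject(mem []byte, sp, offset int) any {
	base := sp + offset
	value := int64(binary.LittleEndian.Uint64(mem[base:]))
	switch mem[base+8] { // the type tag lives 8 bytes into the struct
	case typeUndefined, typeNull:
		return nil
	case typeBoolean:
		return mem[base] != 0
	case typeNumber:
		return value
	default:
		return objects[value] // anything else is an index into the pool
	}
}

// demo writes one Object (value = 1, typ = typeObject) at sp+8 and decodes it.
func demo() any {
	mem := make([]byte, 64)
	sp := 16
	binary.LittleEndian.PutUint64(mem[sp+8:], 1)
	mem[sp+8+8] = typeObject
	return loadObject(mem, sp, 8)
}

func main() {
	fmt.Println(demo())
}
```

One tag byte decides how the other eight bytes are interpreted, so no extra round trip is needed to ask what kind of value was passed.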

Uint8: function (go, sp, offset) {
    return go.mem.getUint8(sp + offset)
},
Int: function (go, sp, offset) {
    return go.mem.getUint32(sp + offset, true) + go.mem.getInt32(sp + offset + 4, true) * 4294967296;
},
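
The "Int" loader rebuilds a 64-bit little-endian integer from two 32-bit halves: the low half is read unsigned and the high half signed. A small Go sketch of the same arithmetic, with loadInt and roundTrip as hypothetical names, shows why that works for negative numbers too:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// loadInt rebuilds a little-endian 64-bit integer from two 32-bit halves:
// the low half is read unsigned, the high half signed.
func loadInt(mem []byte, addr int) int64 {
	lo := binary.LittleEndian.Uint32(mem[addr:])
	hi := int32(binary.LittleEndian.Uint32(mem[addr+4:]))
	return int64(lo) + int64(hi)*4294967296
}

// roundTrip stores v as 8 little-endian bytes, then reads it back.
func roundTrip(v int64) int64 {
	mem := make([]byte, 16)
	binary.LittleEndian.PutUint64(mem[8:], uint64(v))
	return loadInt(mem, 8)
}

func main() {
	for _, v := range []int64{0, 42, -1, 1 << 40, -(1 << 40)} {
		fmt.Println(v, roundTrip(v) == v)
	}
}
```

In Go the result is exact for every int64. In JavaScript the sum is a double, so it is only exact up to 2^53.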

Considering the function:

__gen_InsertAdjacentHTML(p0, p1, p2)        

You have three parameters/arguments:

  • P0: 16 bytes, at offset 8
  • P1: 16 bytes, at offset 8 + 16 = 24
  • P2: 8 bytes, at offset 8 + 16 + 16 = 40


The reason why P0 starts at 8 and not at 0 is related to Golang's internals (the first 8 bytes at SP hold the return address), as described in https://github.com/teh-cmc/go-internals/blob/master/chapter1_assembly_primer/README.md.
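
The offsets above follow a simple rule, sketched below in Go. The param type and the offsets function are hypothetical. The sketch assumes that arguments start 8 bytes after SP, that each one is rounded up to its own alignment, and that results begin at the next 8-byte boundary after the inputs.

```go
package main

import "fmt"

// param describes one argument by size and alignment, in bytes.
type param struct {
	name        string
	size, align int
}

// roundUp rounds n up to the next multiple of align.
func roundUp(n, align int) int { return (n + align - 1) / align * align }

// offsets assigns each parameter an offset from SP, starting 8 bytes in,
// and returns the offset where results would begin (assumed 8-byte aligned).
func offsets(params []param) (map[string]int, int) {
	off := 8
	out := make(map[string]int, len(params))
	for _, p := range params {
		off = roundUp(off, p.align)
		out[p.name] = off
		off += p.size
	}
	return out, roundUp(off, 8)
}

func main() {
	args, results := offsets([]param{
		{"p0", 16, 8}, // Object
		{"p1", 16, 8}, // Object
		{"p2", 8, 8},  // int
	})
	fmt.Println(args["p0"], args["p1"], args["p2"], results)
}
```

These are the same 8, 24 and 40 that appear in the generated JavaScript call.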


P0 is at offset 8. If the "SP" is 1024, then P0 is at 1032, so:

globalThis.inkwasm.Load.InkwasmObject(go, 1024, 8) . insertAdjacentHTML(...)        

That will cause the switch to read from:

switch (globalThis.inkwasm.Load.Uint8(go, 1024, 8 + 8)) {
    default:
        return Objects[globalThis.inkwasm.Load.Int(go, 1024, 8)]
}

The "globalThis.inkwasm.Load.Uint8(go, 1024, 8 + 8)" will read the Typ, which is 8 bytes away from the start of the struct. Meanwhile, the "default" clause will take the "Value" from the start of the struct, which is the first field.

You can repeat the same process for all arguments. In some cases, it requires some alignment. The result (if any) is stored after the inputs.


This also allows InkWASM, on the JavaScript side, to read arrays and strings without needing multiple round trips. So, assuming you have:

func TestAlert(t *testing.T) {
    alert("Hello, 世界")
}

//inkwasm:func alert
func alert(s string)        

That will generate:

"github.com/path/to.__alert": (sp) => {
    alert(globalThis.inkwasm.Load.String(go, sp, 8))

},        

The globalThis.inkwasm.Load.String will then process the string. A Golang string consists of two pointer-sized words:

  • Pointer: The first word of the string is the pointer to the text, intended to be UTF-8, at offset 0.
  • Size: The second word of the string is the size in bytes, at offset 8 (in the case of WASM, and other 64-bit machines).

That is what we do in JavaScript:

String: function (go, sp, offset) {
    return StringDecoder.decode(new DataView(go._inst.exports.mem.buffer, globalThis.inkwasm.Load.UintPtr(go, sp, offset), globalThis.inkwasm.Load.Int(go, sp, offset + 8)));
},        

In the end, we can use the string directly, instead of translating it, storing it and then sending it back. We are still paying the price of converting UTF-8 to UTF-16, but that is unavoidable.


The same happens with []byte! You don't need to copy bytes to JS. It will create a view, from go._inst.exports.mem.buffer.


What If... GOJS ABI gets removed?!

First time here, huh? I already had issues when CallImport got removed. But now I already have a solution. We can simply store the SP and change the generated code a little bit, to something similar to:

TEXT ·getBasicDecoder(SB), NOSPLIT, $0
    Get SP
    Get SP
    I32Store
    CALL ·_getBasicDecoder(SB)
    RET        

We also declare the import with a single value:

//go:wasmimport gojs github.com/inkeliz/go_inkwasm/inkwasm.__getBasicDecoder
func __getBasicDecoder(sp int)        

Then, we just need to consider the additional space:

sp = (go._inst.exports.getsp() >>> 0) + 8        

The go._inst.exports.getsp() call is needed because the stack can grow or move, so we need to fetch a fresh SP. In that case, we need to add 8 to the SP, which is not what we currently do.

Another option is to use Call instead of CALL. "Call" is the WASM instruction, and "CALL" is the Golang one, which creates a new stack-ish frame. But I had a hard time trying to use GlobalSet. That might be possible, and might be faster, since it uses the same stack.


The bootstrap!

Since InkWasm is capable of creating the Javascript bridge, we can use that to create runtime functions.

Let's consider the following code:

func Benchmark_InsertAdjacentHTML_INKWASM_RUNTIME(b *testing.B) {
	hijackInsertAdjacentHTML()

	b.ReportAllocs()
	b.ResetTimer()

	pattern, _ := Global().Get("String").New("beforebegin")
	defer pattern.Free()

	body := Global().Get("document").Get("body")
	defer body.Free()

	fn, _ := Global().Get("HTMLElement").Get("prototype").Get("insertAdjacentHTML").Call("bind", body)
	defer fn.Free()

	for i := 0; i < b.N; i++ {
		fn.Invoke(pattern, i)
	}
}        

That is very similar to syscall/js: it doesn't use the InkWASM generator directly, and you need to release every resource manually. However, each function uses InkWASM behind the scenes.


When you use "fn.Invoke", you are calling:

//inkwasm:func globalThis.inkwasm.Internal.Invoke
//go:noescape
func invoke(o Object, args []interface{}) (Object, bool)        

That code will be generated, similar to any other code mentioned before. That is why InkWASM uses InkWASM itself. I really like the concept of bootstrapping.


In the last update, I also introduced the possibility of using "[]interface{}" as an argument, with any translation happening on the JavaScript side. In the end, it's 0 allocs/op. It's also faster than its syscall/js counterpart, taking 184.5 ns/op.


How that works is a topic for my next chapter...

Ending

I'm happy to update it to Go 1.22 and introduce support for "interface{}" without using the older type-replacement feature. It was easier than I thought; the issue is how badly documented those features are, which reminds me of PHP. But then, we are using unsafe and experimental stuff.

I promise that my next post will explain how InkWASM works at runtime, and how it's still faster than syscall/js, without additional allocations and without using pre-compiled functions. Yes, InkWASM has fewer safety features and doesn't use the GC; that is the trade-off.
