WebGL and WebGPU Comparison [1]: Prelude

  • 1 Why WebGPU Instead of WebGL 3.0
  • Graphics card drivers
  • A brief chronology of graphics APIs
  • Why WebGL runs in every browser
  • Where the name "WebGPU" comes from
  • 2 Coding Style Compared with WebGL
  • The OpenGL coding style
  • The CPU overhead problem
  • WebGPU's assembly-style coding
  • A chef's trick
  • 3 Multithreading and Powerful General-Purpose Computing (GPGPU)
  • WebWorker multithreading
  • General-purpose computing (GPGPU)
  • 4 Browser Implementations
  • 5 The Future
  • Reference material

This post is mostly history; it may not suit readers who prefer to get straight to the point.

1 Why WebGPU Instead of WebGL 3.0

If you dig into the foundations of Web graphics technology, you will inevitably trace back to OpenGL, proposed in the 1990s, and you will find that WebGL is built on OpenGL ES. OpenGL delivered real value in an era of weak graphics cards.

Graphics card drivers

We all know that today's graphics cards need driver programs. Through the API the driver exposes, we can direct the GPU to carry out graphics-processing work.

The problem is that graphics drivers, like assembly in the wider programming world, are low-level and hard to write, so the major vendors built abstractions over them: standard practice in the coding world.

A brief chronology of graphics APIs

OpenGL does exactly that: it wraps an upper-level interface and talks to the graphics driver below. As everyone knows, however, its design style can no longer keep up with the features of modern GPUs.

Microsoft's latest graphics API for this purpose is Direct3D 12, Apple's is Metal, and a well-known consortium named Khronos created Vulkan. D3D12 shines today on Windows and Xbox, Metal on Mac and iPhone, and Vulkan is what you see most often in Android phone reviews. These three are known as the three modern graphics APIs, and they are closely tied to modern GPUs, on PC and mobile alike.

Why WebGL runs in every browser

Oh, one thing I forgot to mention: stewardship of OpenGL was handed over to Khronos in 2006, and nowadays operating systems mostly no longer ship much support for this aging API.

So the question is: why can WebGL, which is based on OpenGL ES, run in browsers on every operating system?

Because what sits beneath WebGL no longer has to be OpenGL ES. On Windows it is now translated through D3D down to the graphics driver, and on macOS through Metal. The closer we get to the present, though, the harder this kind of non-native implementation becomes.

Apple's Safari only belatedly gained WebGL 2.0 support in recent years, and it has already dropped the GPGPU features of OpenGL ES; we may never see WebGL 2.0's GPGPU implemented in Safari. Apple is busy with Metal and its even grander self-developed M-series chips.

Where the name "WebGPU" comes from

So, to sum up, is it clear now why the next-generation Web graphics API is not called WebGL 3.0? It is no longer part of the GL lineage. To keep the modern giants from fighting over the name, the more hardware-oriented name WebGPU was adopted. WebGPU and WebGL are fundamentally not of the same era, in coding style or in performance.

As an aside, OpenGL is not without learning value; on the contrary, it will be around for a while yet, and so will WebGL.

2 Coding Style Compared with WebGL

WebGL is essentially OpenGL's shadow: OpenGL's style has shaped WebGL's style enormously.

Anyone who has learned the WebGL API knows one thing: the gl variable, or more precisely the WebGLRenderingContext object (WebGL2RenderingContext in WebGL 2.0).
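For readers who have not touched WebGL, acquiring that object is a one-liner on a canvas element. A minimal sketch (the helper name is mine, not from any library) that prefers WebGL 2.0 and falls back to 1.0:

```javascript
// Illustrative helper (hypothetical name): returns a WebGL2RenderingContext
// where available, otherwise a WebGLRenderingContext, otherwise null.
function getGlContext(canvas) {
  return canvas.getContext('webgl2') || canvas.getContext('webgl')
}
```

In a browser you would call it as getGlContext(document.querySelector('canvas')).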

The OpenGL coding style

Whether you are working with shaders or VBOs, or creating Buffer and Texture objects, you basically walk through the procedure one function call at a time on the gl variable, and the order matters a great deal. For example, here is the code that creates the two shaders and performs compilation and linking:

const vertexShaderCode = `
attribute vec4 a_position;
void main() {
	gl_Position = a_position;
}
`

const fragmentShaderCode = `
precision mediump float;
void main() {
  gl_FragColor = vec4(1, 0, 0.5, 1);
}
`

const vertexShader = gl.createShader(gl.VERTEX_SHADER)
gl.shaderSource(vertexShader, vertexShaderCode)
gl.compileShader(vertexShader)
const fragmentShader = gl.createShader(gl.FRAGMENT_SHADER)
gl.shaderSource(fragmentShader, fragmentShaderCode)
gl.compileShader(fragmentShader)

const program = gl.createProgram()
gl.attachShader(program, vertexShader)
gl.attachShader(program, fragmentShader)
gl.linkProgram(program)

// you also have to state explicitly which program to use
gl.useProgram(program)
// then go on to set up vertex data and trigger the draw
// ...

The three WebGL calls that create a shader, hand it its source code and compile it pretty much have to be written exactly like this. At most, the creation and compilation order of the vertex and fragment shaders can be swapped, but both must be finished before the program is set up.

The CPU overhead problem

Some say this doesn't matter: you can wrap these procedural details in JavaScript functions and just pass parameters. True, that is a decent encapsulation; many JS libraries have done it, and they are genuinely useful.
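Such a wrapper might look roughly like this. This is a sketch only: the function name and error handling are mine, not any particular library's, but the fixed create / source / compile / attach / link sequence it hides is exactly the one shown above.

```javascript
// Hypothetical wrapper: hides the fixed call sequence that WebGL's
// state-machine API requires for building a shader program.
function createProgramFromSources(gl, vertexSrc, fragmentSrc) {
  const compile = (type, src) => {
    const shader = gl.createShader(type)
    gl.shaderSource(shader, src)
    gl.compileShader(shader)
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
      throw new Error(gl.getShaderInfoLog(shader))
    }
    return shader
  }
  const program = gl.createProgram()
  gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSrc))
  gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSrc))
  gl.linkProgram(program)
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error(gl.getProgramInfoLog(program))
  }
  return program
}
```

The caller now writes one line instead of ten, but note that every call the wrapper makes still goes through the gl state machine underneath.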

But an insurmountable gap remains: the problem lies in OpenGL itself.

Every gl.xxx call completes a signal trip from CPU to GPU and changes the GPU's state, taking effect immediately. Anyone familiar with computer fundamentals knows how much the physical distance between components costs in time; people have spent decades fighting for faster signal paths. Yet each of the gl calls above that changes GPU state travels roughly the whole route from CPU, across the bus, to GPU.

We all know it is better to show up with all your paperwork prepared at once than to run back and forth over and over, yet OpenGL works exactly like that. Why was it designed this way instead of as a submit-once model? Historical reasons: when OpenGL was in its heyday, GPU workloads were not that complex, so such forward-looking design wasn't needed.

In short, WebGL carries a latent CPU overhead problem, and it is determined by OpenGL's state-machine design.

The three modern graphics APIs are not like this. They prefer to prepare everything up front, so that what is finally submitted to the GPU is a complete blueprint plus buffered data, and the GPU can simply take it and focus on the work.

WebGPU's assembly-style coding

WebGPU also has a general-manager-like object, device, of type GPUDevice, a high-level abstraction for operating the GPU. It is responsible for creating the various objects involved in graphics work, which are finally assembled into an object called a CommandBuffer (GPUCommandBuffer) and submitted to the queue; only then is the CPU side's labor done.

So when device.createXXX creates these intermediate objects, it does not immediately notify the GPU of a state change the way WebGL does. Instead, the code written on the CPU side guarantees, logically and by type, that what will later be handed to the GPU is correct, lines everything up in its proper slot, and keeps it ready to submit at any moment.

At that point the command buffer object holds the complete data (geometry, textures, shaders, pipeline scheduling logic, and so on); the moment the GPU receives it, it knows what to do.

// inside an async function
const device = await adapter.requestDevice()
const buffer = device.createBuffer({
  /* assemble geometry; data from memory ends up as vertexAttribute, uniform and similar resources */
})
const texture = device.createTexture({
  /* assemble texture and sampling information */
})

/* create the shader modules */
const vertexShaderModule = device.createShaderModule({ /* ... */ })
const fragmentShaderModule = device.createShaderModule({ /* ... */ })
// a compute pass would use a shader module too:
// const computeShaderModule = device.createShaderModule({ /* ... */ })

const bindGroupLayout = device.createBindGroupLayout({
  /* create the layout object for the bind groups */
})

const pipelineLayout = device.createPipelineLayout({
  /* pass the bind group layout objects */
})

/*
Strictly speaking, you can be lazy and skip creating the two layout
objects above. A bind group does need a bind group layout so that the
corresponding pipeline stage knows what the group's resources look like,
but the pipeline object can infer that layout from the code of its
programmable stages. This example keeps the full explicit process.
*/

const renderPipeline_0 = device.createRenderPipeline({
  /*
  create the pipeline:
  specify the material each pipeline stage needs.
  Three stages can take shaders and are thus programmable: vertex, fragment, compute.
  Each stage can also declare the data and information it needs, e.g. buffers.

  Beyond that, the pipeline needs a pipeline layout object, whose embedded
  bind group layouts let the shaders know what the bind group resources
  used later in the pass will look like.
  */
})
// renderPipeline_1, bindGroup_1 and so on would be created the same way

const bindGroup_0 = device.createBindGroup({
  /*
  group resources: gather buffers and textures into logical groups that the
  individual passes (i.e. pipelines) can use conveniently.
  A bind group layout is required here; it can be inferred from the
  pipeline or passed in directly.
  */
})

const commandEncoder = device.createCommandEncoder() // create the command encoder
const renderPassEncoder = commandEncoder.beginRenderPass({ /* ... */ }) // start a render pass encoder
// a compute pass can be started as well:
// const computePassEncoder = commandEncoder.beginComputePass({ /* ... */ })

/*
Taking the render pass as the example, renderPassEncoder records, in order,
what this pass should do, e.g.:
*/

// first draw: set pipeline 0, bind groups 0 and 1, the vbo, then trigger the draw
renderPassEncoder.setPipeline(renderPipeline_0)
renderPassEncoder.setBindGroup(0, bindGroup_0)
renderPassEncoder.setBindGroup(1, bindGroup_1)
renderPassEncoder.setVertexBuffer(0, vbo, 0, size)
renderPassEncoder.draw(vertexCount)

// second draw: set pipeline 1 and another bind group, then draw again
renderPassEncoder.setPipeline(renderPipeline_1)
renderPassEncoder.setBindGroup(1, another_bindGroup)
renderPassEncoder.draw(vertexCount)

// end the pass encoding
renderPassEncoder.endPass()

// finally submit to the queue: commandEncoder.finish() completes the
// encoding and returns a command buffer
device.queue.submit([
  commandEncoder.finish()
])

The code above is a generalized WebGPU sketch, rough and without detail, but that is basically the logic.

I kept the pass-encoder portion relatively complete so that readers can better observe how a command encoder encodes a pass and, at the end, finishes encoding, produces a command buffer, and submits it to the queue.

A chef's trick

To use cooking as a metaphor: OpenGL-style programming is like fetching each seasoning only when the dish calls for it, and finishing one dish before starting the next. The modern graphics APIs cook on several burners at once, with every ingredient, prepped food and condiment in its proper place; even a single chef (the CPU) can cook several dishes at the same time, very efficiently.

3 Multithreading and Powerful General-Purpose Computing (GPGPU)

WebWorker multithreading

WebGL's general-manager object is the gl variable, which must depend on an HTML Canvas element. That means it can only be obtained by the main thread, and only the main thread can drive GPU state; the multithreading that WebWorkers offer can only crunch data, which is rather underwhelming.

WebGPU changes how the general-manager object is obtained: navigator.gpu, which the adapter object comes from, is accessible inside WebWorkers too. So a worker can also create a device and assemble command buffers, enabling multithreaded submission of command buffers, i.e. multithreaded scheduling of the GPU from the CPU side.
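The idea can be sketched as below. The two helpers are hypothetical (my own names, not WebGPU API surface beyond requestAdapter/requestDevice), and they simply no-op in scopes without WebGPU:

```javascript
// Hypothetical helpers: callable from a WebWorker just as from the main
// thread, because navigator.gpu is exposed in worker scopes too.
function hasWebGpu(scope) {
  return !!(scope && scope.navigator && scope.navigator.gpu)
}

async function requestDeviceIfAvailable(scope) {
  if (!hasWebGpu(scope)) return null // no WebGPU in this scope
  const adapter = await scope.navigator.gpu.requestAdapter()
  return adapter ? adapter.requestDevice() : null
}
```

Inside a worker you would call requestDeviceIfAvailable(self); the resulting device can then encode and submit its own command buffers, independent of the main thread.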

General-purpose computing (GPGPU)

If WebWorkers are multithreading on the CPU side, then the GPU's own parallelism deserves to be used as well.

What makes this possible is something called the compute shader, a programmable stage of the programmable pipeline. It arrived very late in OpenGL (early graphics cards had not yet had their parallel general-computing potential tapped), to say nothing of WebGL, where it only surfaced in the 2.0 era, and Apple never even bothered to implement the feature for WebGL 2.0.

WebGPU ships with it out of the box. Through compute shaders you can use the shared memory that sits right next to the GPU's compute units (CUs), which is far faster than ordinary video memory.

There is not a great deal of material on compute shaders yet; for now, examples are the main resource. A blog post is included in the reference material.

Once GPGPU is brought to the Web, scripting-language runtimes (Deno, browser JavaScript, perhaps even Node.js with WebGPU support in the future) gain access to the GPU's powerful parallel computing. tensorflow.js is said to have seen dramatic performance gains after switching to WebGPU as a backend, a great help to deep learning and related fields. Even if users' browsers are not that cutting-edge and rendering code is slow to move off WebGL, WebGPU's general-purpose computing can still shine in other fields, and compute shaders can be used in rendering too.

Tempting, isn't it!

4 Browser Implementations

As of this writing, both Edge and Chrome let you try WebGPU in their Canary builds by enabling a flag.

Edge and Chrome both use the Chromium engine, and Chromium implements the WebGPU API through a module called Dawn. According to the available material, Dawn's DawnNative part talks to the three major graphics APIs and passes information up to a module called DawnWire, which in turn communicates with the JavaScript API, i.e. the WebGPU code you write. WGSL is implemented in this part as well. Dawn is written in C++; you can find the link in the reference material.

Firefox implements WebGPU through the gfx-rs project, evidently a WebGPU written in Rust, with a module design similar to Dawn's.

Safari implements WebGPU by updating Apple's own WebKit.

5 The Future

I will spare you the grand visions, but with the GPU technology of the red, green and blue trio growing ever more refined, and mobile GPUs steadily improving, the three modern graphics APIs will certainly keep evolving, and WebGPU will be able to unleash the power of the modern graphics processor on the Web, whether for graphics and games or for the machine learning and AI workloads that general-purpose parallel computing enables.

Reference material

  • Google Dawn Page
  • gfx-rs GitHub Home Page
  • Get started with GPU Compute on the web