In the last part we set up the building blocks of wgpu that we are now going to use to do some actual graphics programming. To start off, we need an eye to see into our virtual world. This is done by our camera. The process of how it's done is kind of backwards to what you'd initially expect, but it will make sense once we take into account how the math works.

Building Virtual Worlds

The basic idea is that instead of the 'camera' moving through the world, the world is moved in the inverse direction to the camera. The outcome is the same, but it is quite counterintuitive to how we as humans experience our environment (we would say that we are moving through the house, not that the house is moving around us). Both physics and mathematics, however, agree that they simply don't care who is doing the moving (as long as the relative motion is the same). This, by the way, is one of the key assumptions that led Albert Einstein to come up with the theory of relativity.

So to 'move the world', each vertex of each triangle needs to be moved. Mathematically this can be modeled as a matrix multiplication. After the world has been moved into the right location, another matrix multiplication simulates the effect of 'the camera lens', positioning the vertices such that when they are parallel projected onto our 2D screen they look as if a perspective projection had taken place (so essentially doing the inverse of a perspective projection). In the last step the GPU automatically performs the parallel projection onto our 'image' surface.

Performing all these matrix multiplications is one of the steps that takes such a long time when it has to be done one vertex after the other on a CPU. As the matrix used to perform these projections is essentially always the same, the outcome only depends on the location of the vertex and the transformation matrix, making it trivial to parallelize, and this is one of the reasons GPUs are what they are (remember, the math existed before GPUs did, so people built the GPUs to be good at performing the steps that were prescribed by the mathematics).
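
To make the transformation chain concrete, here is a small CPU-side sketch of the work described above, done for a single vertex with the cgmath crate (which provides the point, angle and matrix types we use below). The numbers and names are purely illustrative and not part of our code base:

use cgmath::{perspective, Deg, Matrix4, Point3, Vector3};

fn main() {
    // move the world inversely to a camera sitting at (0, 1, 5) and looking at the origin
    let view = Matrix4::look_at_rh(
        Point3::new(0.0, 1.0, 5.0),
        Point3::new(0.0, 0.0, 0.0),
        Vector3::unit_y(),
    );
    // simulate the lens: 45 degree field of view, 16:9 screen, near and far clipping planes
    let proj: Matrix4<f32> = perspective(Deg(45.0), 16.0 / 9.0, 0.1, 100.0);
    // one vertex in world space, written in homogeneous coordinates
    let vertex = Point3::new(1.0, 0.0, 0.0).to_homogeneous();
    // this is the multiplication the GPU will later perform for every single vertex
    let clip_space = proj * view * vertex;
    println!("{:?}", clip_space);
}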

The Camera

So to be able to render anything in 3D onto a screen, we need something that handles the creation and the maintenance of these matrices on the GPU and exposes a 'camera like' interface to the rest of our program, so that to our code it looks like the world stands still and the camera moves around (which is how humans think about things and how a player would expect to travel through one of our 3D worlds). For this we will need to create the Camera.

The camera thus has two parts: the part that lives on the CPU side (so in our program), and the part that lives on the GPU. The thing on the GPU is going to be nothing more than a 4 by 4 matrix. This matrix will be fixed for each frame, but needs to be able to change from one frame to the next, as our camera could be moving through the world. The CPU part is the part that keeps track of the position and viewing direction of the camera and, if needed, modifies the data on the GPU when something has changed. So the CPU will compute the matrix (a mere 16 numbers) while the GPU will apply the resulting transformation to possibly millions of vertices.

This is therefore going to be our Camera:

use cgmath::{Matrix4, Point3, Rad};

#[derive(Debug)]
pub struct Camera {
    // This is the position of the camera in world space
    pub position: Point3<f32>,
    // The direction in which the lens is pointing,
    // as angles relative to the world coordinate frame
    pub pitch: Rad<f32>,
    pub yaw: Rad<f32>,
    // field of view of the camera (something like the difference between
    // a zoom lens and an ultra-wide lens)
    pub field_of_view: Rad<f32>,
    // this is the aspect ratio of our screen, which we need to generate
    // the view transformation matrix
    pub aspect_ratio: f32,
    // clipping distances that decide how close and how far away things
    // have to be to still be rendered
    pub znear: f32,
    pub zfar: f32,

    // this is the perspective matrix. We only need to compute it very
    // seldom, so we store it instead of recomputing it each time we
    // update the GPU uniform
    perspective: Matrix4<f32>,
    uniform: CameraUniform,
}

With the camera as it is, it will act similarly to a human: it will be able to change its position and it will be able to look up-down and left-right (the pitch and yaw fields). Besides this we have a few things that describe the 'lens' of the camera, like the field_of_view as well as the znear and zfar fields.
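
The private perspective field caches the matrix that is built from all of these properties. As a rough sketch of how such a matrix could be computed with cgmath (the helper name and the pitch/yaw-to-direction formula are assumptions for illustration, not code taken from this project):

use cgmath::{perspective, Matrix4, Vector3};

fn build_view_projection(camera: &Camera) -> Matrix4<f32> {
    // turn pitch and yaw into a viewing direction
    let direction = Vector3::new(
        camera.yaw.0.cos() * camera.pitch.0.cos(),
        camera.pitch.0.sin(),
        camera.yaw.0.sin() * camera.pitch.0.cos(),
    );
    // the view matrix moves the world inversely to the camera ...
    let view = Matrix4::look_to_rh(camera.position, direction, Vector3::unit_y());
    // ... and the perspective matrix simulates the lens
    let proj = perspective(
        camera.field_of_view,
        camera.aspect_ratio,
        camera.znear,
        camera.zfar,
    );
    proj * view
}

The result is exactly the 16 numbers that end up in the GPU buffer we set up next.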

For now, we will not implement any of the 'dynamic' parts of the camera, that is, the controls that allow the camera to be steered by keys on the keyboard or by mouse movements. This means that the only way to get a different view of our world is to recompile the app with different values used during the construction of our camera (we will extend the camera to include user controls at runtime, but I'd like to get an image onto the screen first).

The very last member encapsulates the part of the Camera that lives on the GPU. It is able to set up and also update the GPU data storage that will be used as the transformation matrix, and it is its own struct.

pub struct CameraUniform {
    gpu_buffer: wgpu::Buffer,
    pub bind_group_layout: wgpu::BindGroupLayout,
    pub bind_group: wgpu::BindGroup,
}

Buffers and bind groups

This kind of data (like transformation matrices and also render parameters) is represented on the GPU through uniform buffers. Uniform buffers, as well as storage buffers and textures, need to be usable from within our shader program. To make this possible, a shader is compiled such that it can be linked to bind groups: essentially sets of different 'environment variables' with a particular, predefined layout. The various buffers and textures are then arranged into bind groups that are linked to the shader, so that the data in the different buffers is accessible during the execution of the shader code.

For our camera we are going to place a single buffer into a bind group. This buffer stores the transformation matrix that is generated from the camera properties. For this to work, we first need to create a bind group layout describing the structure of the data on the GPU, then a buffer to hold that data, and finally connect the buffer into a bind group (which in this case will only contain this one buffer). To do this there are a bunch of functions in the CameraUniform impl block.

impl CameraUniform {
    
    // -- *snip* -- //

    /// Build the structure of the bind group from this function and register it with the device
    fn create_gpu_bind_group_layout(device: &wgpu::Device) -> wgpu::BindGroupLayout {
        device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
            label: Some("observer bind group layout"),
            entries: &[wgpu::BindGroupLayoutEntry {
                binding: 0,
                visibility: wgpu::ShaderStages::VERTEX | wgpu::ShaderStages::FRAGMENT,
                ty: wgpu::BindingType::Buffer {
                    ty: wgpu::BufferBindingType::Uniform,
                    has_dynamic_offset: false,
                    min_binding_size: None,
                },
                count: None,
            }],
        })
    }

    // actually create the bind group (the thing that is accessible from the shader) and put the
    // buffer containing the camera transformation into it
    fn create_bind_group(
        device: &wgpu::Device,
        layout: &wgpu::BindGroupLayout,
        proj_buffer: &wgpu::Buffer,
    ) -> wgpu::BindGroup {
        device.create_bind_group(&wgpu::BindGroupDescriptor {
            layout,
            label: Some("Observer bind group"),
            entries: &[wgpu::BindGroupEntry {
                binding: 0,
                resource: proj_buffer.as_entire_binding(),
            }],
        })
    }

    // create the buffer for the camera uniform on the GPU
    fn create_gpu_buffer(device: &wgpu::Device) -> wgpu::Buffer {
        device.create_buffer(&wgpu::BufferDescriptor {
            label: Some("Observer projection uniform buffer"),
            // a 4x4 matrix of f32 values: 16 * 4 bytes
            size: 16 * 4,
            // This buffer is the destination the view projection gets copied
            // into from the CPU, so besides UNIFORM it only needs COPY_DST
            usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
            // the matrix is uploaded later through the queue, so the buffer
            // does not need to be mapped at creation
            mapped_at_creation: false,
        })
    }
}

In the new method of our CameraUniform, we then use these functions to create the binding that can be used from the shader code.

impl CameraUniform {
    pub fn new(device: &wgpu::Device) -> Self {
        let gpu_buffer = Self::create_gpu_buffer(device);
        let bind_group_layout = Self::create_gpu_bind_group_layout(device);
        let bind_group = Self::create_bind_group(device, &bind_group_layout, &gpu_buffer);
        Self {
            gpu_buffer,
            bind_group_layout,
            bind_group,
        }
    }
}
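
The -- *snip* -- in the impl block further up stands in for the remaining methods, and at some point one of them has to push a freshly computed matrix into the buffer. A minimal sketch of such an update method, assuming the bytemuck crate is used to reinterpret the cgmath matrix as raw bytes (the method name and signature are my choice here, nothing above dictates them):

impl CameraUniform {
    // copy a newly computed view-projection matrix into the GPU buffer;
    // this is why the buffer was created with the COPY_DST usage flag
    pub fn update(&self, queue: &wgpu::Queue, view_proj: cgmath::Matrix4<f32>) {
        // cgmath matrices convert into a plain [[f32; 4]; 4], which bytemuck
        // can reinterpret as the 64 bytes the uniform buffer expects
        let raw: [[f32; 4]; 4] = view_proj.into();
        queue.write_buffer(&self.gpu_buffer, 0, bytemuck::cast_slice(&raw));
    }
}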

And if we take a look at the shader from part one we will indeed find the definition for the transformation matrix.

struct Camera {
    view_proj: mat4x4<f32>,
};
 
@group(1) @binding(0)
var<uniform> camera: Camera;

From this code we can also see that the bind group was bound as the second bind group (we are counting from 0) in this particular shader, and, as expected, at binding 0 we find the buffer containing the camera transformation used by the vertex shader. So now we have a rudimentary camera; the next thing we need is something to render.
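
Since the shader declares the camera at @group(1), the render pass will eventually have to attach our bind group at that same index. That wiring belongs to the rendering code we haven't written yet, but as a sketch of the call that will be needed (the helper function itself is hypothetical):

fn bind_camera<'a>(render_pass: &mut wgpu::RenderPass<'a>, camera_uniform: &'a CameraUniform) {
    // the index 1 here has to match the @group(1) declaration in the shader
    render_pass.set_bind_group(1, &camera_uniform.bind_group, &[]);
}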