Today I Learned

Decouple layout calculations from rendering

2025-03-08

So, in the new version of my app, I use iced's custom shader widget to render images. I've been using a rendering pipeline and shader that directly calculate the vertices of images, since I'd like to keep the images' aspect ratio when the user resizes the window. I had a function that updated the vertices and the screen rect buffer like this:

use wgpu::util::DeviceExt; // brings `create_buffer_init` into scope

pub fn update_vertices(&mut self, device: &wgpu::Device, bounds_relative: (f32, f32, f32, f32)) {
    let (x, y, width, height) = bounds_relative;
    let left = 2.0 * x - 1.0;
    let right = 2.0 * (x + width) - 1.0;
    let top = 1.0 - 2.0 * y;
    let bottom = 1.0 - 2.0 * (y + height);

    let vertices: [f32; 16] = [
        left, bottom, 0.0, 1.0, // Bottom-left
        right, bottom, 1.0, 1.0, // Bottom-right
        right, top, 1.0, 0.0, // Top-right
        left, top, 0.0, 0.0, // Top-left
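    ];

    // Keep a CPU-side copy (x, y, u, v per corner); update_screen_uniforms() reads it
    self.vertices = vertices;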

    self.vertex_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Quad Vertex Buffer"),
        contents: bytemuck::cast_slice(&vertices),
        usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
    });
}

pub fn update_screen_uniforms(
    &self,
    queue: &wgpu::Queue,
    image_dimensions: (u32, u32),
    shader_size: (u32, u32),
    bounds_relative: (f32, f32, f32, f32),
) {
    let debug = false;
    let shader_width = shader_size.0 as f32;
    let shader_height = shader_size.1 as f32;
    let image_width = image_dimensions.0 as f32;
    let image_height = image_dimensions.1 as f32;
    // self.vertices holds [x, y, u, v] for each of the 4 corners, so index 1
    // is the y coordinate (in NDC) of the bottom-left corner
    let vertices = self.vertices;
    let bottom = vertices[1];

    // Compute aspect ratios
    let image_aspect = image_width / image_height;
    let shader_aspect = shader_width / shader_height;

    // Calculate scale factors - the key is to use the SMALLER of the two scale factors to maintain aspect ratio
    let (scale_x, scale_y, fit_mode) = if image_aspect > shader_aspect {
        // Image is wider than container - fit width
        let scale = shader_width / image_width;
        (scale, scale, "FIT_WIDTH")
    } else {
        // Image is taller than container - fit height
        let scale = shader_height / image_height;
        (scale, scale, "FIT_HEIGHT")
    };

    // Apply scaling to get final dimensions
    let scaled_width = image_width * scale_x;
    let scaled_height = image_height * scale_y;
    
    // Calculate the scale factors relative to the container size
    let final_scale_x = scaled_width / shader_width;
    let final_scale_y = scaled_height / shader_height;
    
    // Calculate the vertical gap that needs to be distributed
    let gap_y = shader_height - scaled_height;
    
    // Calculate offset to center the scaled image vertically
    // Fine-tune the vertical offset with a correction factor to match Image widget
    // The bottom + 1.0 term accounts for asymmetric NDC space
    let offset_correction = 0.001; // Fine-tuning parameter (may need adjustment)
    let offset_y_ndc = (bottom + 1.0) * (1.0 - final_scale_y) / 2.0 + offset_correction;

    let screen_rect_data = [
        final_scale_x,      // Scale X 
        final_scale_y,      // Scale Y
        0.0,                // Offset X (centered horizontally)
        offset_y_ndc,       // Offset Y to center vertically
    ];
    // Update screen rect buffer
    queue.write_buffer(
        &self.screen_rect_buffer,
        0,
        bytemuck::cast_slice(&screen_rect_data),
    );
}

However, I noticed that the rendered image would "jiggle" slightly when resizing the window. At first, I assumed the layout math was off. But it turned out to be a deeper issue with how the layout and rendering were coupled.

Calculating the screen rect buffer at the shader level can be fragile. For example, I was using NDC-space vertex coordinates to calculate uniforms like this:

let vertices = self.vertices;
let bottom = vertices[1]; // NDC y of the bottom-left corner
// ...
let offset_y_ndc = (bottom + 1.0) * (1.0 - final_scale_y) / 2.0 + offset_correction;

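I did need this to make the screen uniforms work correctly, but notice that NDC coordinates are in the range [-1.0, 1.0]: the whole viewport height is squeezed into just 2.0 units. This means that even a tiny error in these values (like the hand-tuned 0.001 correction) gets multiplied by half the viewport height when converted to pixels, which is enough to shift the image by a pixel or so, especially on high-resolution screens.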
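To put numbers on this: since the full viewport height spans 2.0 NDC units, one NDC unit covers half the viewport height in pixels. A minimal sketch of the conversion (the helper names are mine, for illustration only):

```rust
/// Converts a vertical offset in NDC units into physical pixels.
/// The full viewport height spans 2.0 NDC units (-1.0..=1.0).
fn ndc_offset_to_pixels(offset_ndc: f32, viewport_height_px: f32) -> f32 {
    offset_ndc * viewport_height_px / 2.0
}

/// Size of a single physical pixel expressed in NDC units.
fn pixel_size_in_ndc(viewport_height_px: f32) -> f32 {
    2.0 / viewport_height_px
}

fn main() {
    // The hand-tuned `offset_correction` (0.001) on a few viewport heights
    for height in [1080.0_f32, 1600.0, 2160.0] {
        println!(
            "{height} px tall: 0.001 NDC = {:.2} px (1 px = {:.5} NDC)",
            ndc_offset_to_pixels(0.001, height),
            pixel_size_in_ndc(height)
        );
    }
}
```

So 0.001 NDC is roughly half a pixel at 1080 px and about one pixel at 2160 px. An offset hovering around half a pixel is exactly the kind of value that can round to different pixels from one resize to the next.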

Therefore, I decided to decouple layout calculations from rendering calculations, the way iced does in its own widgets. I now pre-calculate the layout bounds once in layout(), and the shader simply renders a full-size quad within those bounds. This way, the shader side doesn't have to worry about layout at all, and the layout calculations are done in only one place, layout(). Resizing became much smoother, as you can see in the video below!
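A minimal sketch of the layout side, assuming a simple "contain" fit: the bounds are computed once from the image size and the space available to the widget, and the shader then only has to draw a fixed quad covering -1..1 in NDC inside them (for example by setting the render pass viewport to these bounds with wgpu's `RenderPass::set_viewport`). `Rect` and `fit_image_in` are hypothetical names, not iced APIs, and this is not the exact code from my app.

```rust
/// Axis-aligned rectangle in logical pixels (a stand-in for iced's `Rectangle`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Computes the largest rectangle with the image's aspect ratio that fits
/// inside `container`, centered on both axes ("contain" fit).
fn fit_image_in(image_width: f32, image_height: f32, container: Rect) -> Rect {
    // Use the smaller scale factor so neither dimension overflows the container
    let scale = (container.width / image_width).min(container.height / image_height);
    let width = image_width * scale;
    let height = image_height * scale;

    Rect {
        x: container.x + (container.width - width) / 2.0,
        y: container.y + (container.height - height) / 2.0,
        width,
        height,
    }
}

fn main() {
    // A 16:9 image inside a square container gets letterboxed vertically
    let container = Rect { x: 10.0, y: 20.0, width: 800.0, height: 800.0 };
    let bounds = fit_image_in(1600.0, 900.0, container);
    println!("{bounds:?}");
}
```

Because this runs in one place on plain pixel values, no NDC-dependent correction factor is needed anywhere.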

iced's event loop is faster

2025-03-01

I spent the last week debugging a performance bottleneck in the wgpu integration version of my app. At first, I thought it was rendering or state updates, but those parts only took 20~30 ms per frame, which should have given me at least 30 FPS. Yet when I measured it, the frame rate was below 15 FPS and the app felt very sluggish.
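The budget math behind that expectation, as a quick sketch in plain arithmetic: a frame that takes 30 ms caps out at about 33 FPS, so staying below 15 FPS means more than 36 ms per frame were going somewhere other than rendering and state updates. The function names here are just for illustration.

```rust
/// Frames per second achievable if each frame takes `frame_ms` milliseconds.
fn fps_from_frame_time(frame_ms: f64) -> f64 {
    1000.0 / frame_ms
}

/// Per-frame time (ms) not explained by the measured work, given the observed FPS.
fn unaccounted_ms(work_ms: f64, observed_fps: f64) -> f64 {
    1000.0 / observed_fps - work_ms
}

fn main() {
    println!("30 ms of work -> {:.1} FPS", fps_from_frame_time(30.0));
    println!("Unaccounted per frame at 15 FPS: {:.1} ms", unaccounted_ms(30.0, 15.0));
}
```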

I burned through 500 API calls on Cursor debugging this (lol), but it turned out that the bottleneck was the event loop itself. I noticed weird gaps between the end of window_event() and the start of the next event. What had really been bothering me was that the previous version of my app, which used iced's event loop, ran way faster on macOS (MBP with an M1 chip) than the wgpu-integrated version.
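A simple way to make those gaps visible is to timestamp the end of each handler call and compare it with the start of the next one. Here is a hypothetical helper sketched with `std::time::Instant` (`EventGapTimer` is my own name, not a winit or iced type):

```rust
use std::time::{Duration, Instant};

/// Measures the idle gap between the end of one event handler call
/// and the start of the next one.
#[derive(Default)]
struct EventGapTimer {
    last_end: Option<Instant>,
}

impl EventGapTimer {
    /// Call at the very start of the handler; returns the time elapsed
    /// since the previous handler finished, if there was one.
    fn start(&self) -> Option<Duration> {
        self.last_end.map(|end| end.elapsed())
    }

    /// Call at the very end of the handler.
    fn end(&mut self) {
        self.last_end = Some(Instant::now());
    }
}

fn main() {
    let mut timer = EventGapTimer::default();
    for _ in 0..3 {
        if let Some(gap) = timer.start() {
            println!("gap since previous handler: {gap:?}");
        }
        // ... handle the event here ...
        timer.end();
        // Simulate time spent outside the handler before the next event arrives
        std::thread::sleep(Duration::from_millis(5));
    }
}
```

In a real app, start() would go at the top of window_event() and end() right before it returns; consistently large gaps would then point at the loop itself rather than at the handler code.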

So I hypothesized that iced's event loop (iced_winit::program) must be faster than winit's default one. I tested this theory by having Claude 3.7 generate a new event loop adapted from iced's. It can be roughly summarized like this:

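// Written against the pre-0.29 winit API (three-argument `run` closure,
// `Event::to_static`, `Event::RedrawRequested(_)`).
use std::sync::mpsc::{self, Receiver, Sender};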
use winit::{
    event::{Event, WindowEvent},
    event_loop::{ControlFlow, EventLoop, EventLoopProxy},
    window::WindowBuilder,
};

enum Control {
    ChangeFlow(ControlFlow),
    Exit,
}

fn main() {
    let event_loop = EventLoop::with_user_event();
    let proxy: EventLoopProxy<()> = event_loop.create_proxy();
    
    // Create communication channels for event handling
    let (event_sender, event_receiver): (Sender<Event<'_, ()>>, Receiver<Event<'_, ()>>) = mpsc::channel();
    let (control_sender, control_receiver): (Sender<Control>, Receiver<Control>) = mpsc::channel();

    let window = WindowBuilder::new()
        .build(&event_loop)
        .expect("Failed to create window");

    event_loop.run(move |event, _, control_flow| {
        *control_flow = ControlFlow::Poll;

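        // Queue the event instead of handling it immediately.
        // `to_static()` returns None for events that borrow data (such as
        // ScaleFactorChanged), so skip those instead of unwrapping and panicking.
        if let Some(event) = event.to_static() {
            if event_sender.send(event).is_err() {
                return; // The receiver was dropped
            }
        }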

        // Process control messages asynchronously
        while let Ok(control) = control_receiver.try_recv() {
            match control {
                Control::ChangeFlow(flow) => *control_flow = flow,
                Control::Exit => *control_flow = ControlFlow::Exit,
            }
        }

        // Handle events from queue (non-blocking)
        while let Ok(event) = event_receiver.try_recv() {
            match event {
                Event::WindowEvent { event, .. } => match event {
                    WindowEvent::CloseRequested => {
                        control_sender.send(Control::Exit).ok();
                    }
                    WindowEvent::Resized(size) => {
                        println!("Window resized to: {:?}", size);
                    }
                    _ => {}
                },
                Event::RedrawRequested(_) => {
                    println!("Redraw triggered");
                }
                _ => {}
            }
        }

        // Request a redraw on every iteration; together with ControlFlow::Poll
        // this keeps the app rendering continuously
        window.request_redraw();
    });
}

With this loop, I was able to roughly reproduce the performance of the previous version on my MBP: 40~50 FPS when rendering relatively small images (~1080p). I uploaded a short comparison video on X.

I haven't fully understood how this works yet, but the key difference seems to be that this loop handles events asynchronously, while winit's default loop handles them synchronously. You can observe this in this part:

let (event_sender, event_receiver): (Sender<Event<'_, ()>>, Receiver<Event<'_, ()>>) = mpsc::channel();
let (control_sender, control_receiver): (Sender<Control>, Receiver<Control>) = mpsc::channel();

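This setup queues events in a channel (mpsc::channel()) instead of handling each one directly where winit delivers it. The event_sender pushes incoming events into the queue, while the event_receiver drains it with try_recv(), which returns immediately when the queue is empty, so the loop never blocks waiting for events. (In this simplified version, both ends of the channel live in the same callback.)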

While debugging this, I also noticed that sometimes winit produces a flood of CursorMoved events, and every time this happens, the app processes them one by one, triggering a re-render for each event. I think this setup keeps the app responsive by throwing those spammy events into an async channel instead of blocking the loop. If you're building a wgpu + winit app, this might help you speed things up too. Here's the link to the full event loop I used in my main.rs.
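One common way to keep a flood of CursorMoved events from costing one re-render each is to drain whatever is queued, collapse consecutive cursor moves into the most recent one, and redraw once per batch. Below is a sketch of that coalescing idea using a hypothetical `InputEvent` enum in place of winit's event type; it is a swapped-in technique for illustration, not code taken from iced or from my main.rs.

```rust
use std::sync::mpsc;

/// Simplified stand-in for winit's events (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum InputEvent {
    CursorMoved { x: f64, y: f64 },
    MouseClick,
}

/// Collapses each run of consecutive `CursorMoved` events into the last one,
/// keeping every other event and the original order.
fn coalesce_cursor_moves(events: Vec<InputEvent>) -> Vec<InputEvent> {
    let mut out: Vec<InputEvent> = Vec::with_capacity(events.len());
    for event in events {
        let replaces_previous = matches!(
            (out.last(), &event),
            (Some(InputEvent::CursorMoved { .. }), InputEvent::CursorMoved { .. })
        );
        if replaces_previous {
            out.pop(); // the older cursor position is superseded by this one
        }
        out.push(event);
    }
    out
}

fn main() {
    let (sender, receiver) = mpsc::channel();
    for i in 0..5 {
        sender.send(InputEvent::CursorMoved { x: i as f64, y: 0.0 }).unwrap();
    }
    sender.send(InputEvent::MouseClick).unwrap();

    // Drain everything currently queued without blocking, then process it once
    let batch = coalesce_cursor_moves(receiver.try_iter().collect());
    let count = batch.len();
    println!("{count} events handled with a single redraw: {batch:?}");
}
```

Only consecutive moves are merged, so a click still sees the cursor position that preceded it.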

How to load fonts in wgpu integration with iced

2025-02-21

I currently use the wgpu integration setup for my image viewer app and needed to load custom fonts (including an icon font) without using a Compositor. In iced 0.13.1, you usually set your fonts via Settings and pass them in with application(...).settings(your_settings). I found this part in iced_winit::program::run_action(), and it seems that in the regular iced setup, it is the Compositor that loads fonts:

Action::LoadFont { bytes, channel } => {
    if let Some(compositor) = compositor {
        // TODO: Error handling (?)
        compositor.load_font(bytes.clone());

        let _ = channel.send(Ok(()));
    }
}

iced/graphics/src/compositor.rs:

/// Loads a font from its bytes.
fn load_font(&mut self, font: Cow<'static, [u8]>) {
    crate::text::font_system()
        .write()
        .expect("Write to font system")
        .load_font(font);
}

The Compositor is not accessible in the wgpu integration example because we use Engine directly to render things, but it turns out you can access the global FontSystem directly like this:

use std::borrow::Cow;
use iced_wgpu::graphics::text::font_system;

fn register_font_manually(font_data: &'static [u8]) {
    use std::sync::RwLockWriteGuard;

    // Get a mutable reference to the font system
    let font_system = font_system();
    let mut font_system_guard: RwLockWriteGuard<_> = font_system
        .write()
        .expect("Failed to acquire font system lock");

    // Load the font into the global font system
    font_system_guard.load_font(Cow::Borrowed(font_data));
}

and call it right after creating the Engine in the Self::Loading() block:

let engine = Engine::new(&adapter, &device, &queue, format, None);
engine.create_image_cache(&device); // Manually create image cache

// Manually register fonts
register_font_manually(include_bytes!("../assets/fonts/viewskater-fonts.ttf"));
register_font_manually(include_bytes!("../assets/fonts/Iosevka-Regular-ascii.ttf"));
register_font_manually(include_bytes!("../assets/fonts/Roboto-Regular.ttf"));

Now you can use icon fonts like before!
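The reason this works without a Compositor is that the Compositor's load_font (quoted above) only writes into the global font system returned by font_system(), so writing into it yourself has the same effect. A rough std-only sketch of that kind of shared global, assuming it is a lazily initialized singleton behind an RwLock (the .write() call suggests an RwLock); `FontRegistry` and `font_registry` are hypothetical stand-ins, not iced types:

```rust
use std::borrow::Cow;
use std::sync::{OnceLock, RwLock};

/// Hypothetical stand-in for a font database (not cosmic-text's FontSystem).
#[derive(Default)]
struct FontRegistry {
    fonts: Vec<Cow<'static, [u8]>>,
}

impl FontRegistry {
    fn load_font(&mut self, bytes: Cow<'static, [u8]>) {
        self.fonts.push(bytes);
    }
}

/// Process-wide registry, created lazily on first access.
fn font_registry() -> &'static RwLock<FontRegistry> {
    static REGISTRY: OnceLock<RwLock<FontRegistry>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(FontRegistry::default()))
}

fn main() {
    // Writer side: the same idea as register_font_manually
    let bytes: &'static [u8] = b"fake font bytes";
    font_registry()
        .write()
        .expect("Failed to acquire font registry lock")
        .load_font(Cow::Borrowed(bytes));

    // Reader side: any other code (e.g. a renderer) sees the same registry
    let count = font_registry().read().expect("read lock").fonts.len();
    println!("fonts registered: {count}");
}
```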

Second TIL

2025-02-20

Placeholder second note to debug HTML.

// some code here...
println!("Hello World");

Trying the TIL section

2025-02-19

I usually do tech journaling in my local note-taking app, but I realized that some of those notes are worth sharing. They feel too technical to publish on platforms like X, and each one tends to be too short for a blog post, so I made a dedicated section just to dump them. If this doesn't work out, I'll go back to writing blog posts, but we'll see.