I spent the last week debugging the performance bottlneck of the wgpu integration version of my app. At first, I thought it was rendfering or state updates, but those parts only took 20~30ms, which should have given me at least 30FPS, but when I measured FPS it was less than 15 and very slow.
I burned through 500 API calls on Cursor debugging this (lol), but it turns out that the bottleneck was the event loop itself. I noticed weird gaps between the end of window_event()
and the start of next event. What had really been bothering me was that the previous version of my app ran way faster on macOS (MBP with M1 chip) using iced's event loop, but the wgpu-integrated version doesn't.
So I ended up hypothesizing that iced's event loop (iced_winit::program) must be faster than winit's default one. I tested this theory by generating a new event loop with Claude 3.7, that adapts iced's event loop. It can be roughly summarized like this:
use std::sync::mpsc::{self, Receiver, Sender};
use winit::{
event::{Event, WindowEvent},
event_loop::{ControlFlow, EventLoop, EventLoopProxy},
window::WindowBuilder,
};
enum Control {
ChangeFlow(ControlFlow),
Exit,
}
fn main() {
let event_loop = EventLoop::with_user_event();
let proxy: EventLoopProxy<()> = event_loop.create_proxy();
// Create communication channels for event handling
let (event_sender, event_receiver): (Sender<Event<'_, ()>>, Receiver<Event<'_, ()>>) = mpsc::channel();
let (control_sender, control_receiver): (Sender<Control>, Receiver<Control>) = mpsc::channel();
let window = WindowBuilder::new()
.build(&event_loop)
.expect("Failed to create window");
event_loop.run(move |event, _, control_flow| {
*control_flow = ControlFlow::Poll;
// Send events to the queue instead of handling them immediately
if let Err(_) = event_sender.send(event.to_static().unwrap()) {
return; // Exit if sender is dropped
}
// Process control messages asynchronously
while let Ok(control) = control_receiver.try_recv() {
match control {
Control::ChangeFlow(flow) => *control_flow = flow,
Control::Exit => *control_flow = ControlFlow::Exit,
}
}
// Handle events from queue (non-blocking)
while let Ok(event) = event_receiver.try_recv() {
match event {
Event::WindowEvent { event, .. } => match event {
WindowEvent::CloseRequested => {
control_sender.send(Control::Exit).ok();
}
WindowEvent::Resized(size) => {
println!("Window resized to: {:?}", size);
}
_ => {}
},
Event::RedrawRequested(_) => {
println!("Redraw triggered");
}
_ => {}
}
}
// Request a redraw when needed (avoids redundant frames)
window.request_redraw();
});
}
I was able to roughly reproduce the performance of the previous version on my MBP, which is 40~50 FPS when rendering relevatively smaller images (~1080p). I uploaded a short comparison video on X.
I haven't fully understood how this works yet, but the key difference seems to be how this loop handles events asynchronously, while the winit's loop handles them synchronously. You can observe this in this part:
let (event_sender, event_receiver): (Sender<Event<'_, ()>>, Receiver<Event<'_, ()>>) = mpsc::channel();
let (control_sender, control_receiver): (Sender<Control>, Receiver<Control>) = mpsc::channel();
This setup queues events in a channel (mpsc::channel()
), allowing them to be handled separately from the main event loop. The event_sender
pushes incoming events into a queue, while the event_receiver
pulls them for processing without blocking the loop.
While debugging this, I also noticed that sometimes winit produces a flood of CursorMoved
events, and every time this happens, the app processes them one by one, triggering a re-render for each event. I think this setup keeps the app responsive by throwing those spammy events into an async channel instead of blocking the loop. If you're building a wgpu + winit app, this might help you speed things up too. Here's the link to the full event loop I used in my main.rs.