Ubuntu 24.04 upgrade broke my GNOME desktop (ROCm leftovers)
I finally upgraded my Ubuntu 22.04 machine to 24.04. The upgrade itself went smoothly, but rebooting into the new system greeted me with GNOME's dreaded crash screen—the white "X_X" face with "Oh no! Something has gone wrong."
The culprit? Old ROCm 6.4.3 artifacts from the previous installation were incompatible with 24.04's stricter parsing and newer libraries.
The symptoms
After rebooting, I couldn't even get to a TTY login prompt initially. The system would boot, show the crash screen, and that was it. Recovery mode was the only way in.
Issues I encountered
1. PostgreSQL blocking the upgrade
Before I could even start the upgrade, do-release-upgrade complained about PostgreSQL packages being in the removal deny list. Had to purge them first:
sudo apt purge 'postgresql*'
2. Invalid udev rule syntax
Ubuntu 24.04 has stricter udev parsing. The ROCm rule at /etc/udev/rules.d/70-amdgpu.rules had swapped operators:
# Broken (22.04 tolerated this)
KERNEL="kfd", GROUP=="video", MODE="660"
# Fixed (24.04 requires correct operators)
KERNEL=="kfd", GROUP="video", MODE="0660"
The KERNEL field needs == (comparison), while GROUP needs = (assignment). This one showed up in journalctl as "Invalid operator for GROUP."
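If you want to catch this kind of mistake before rebooting, a rough grep over the rules files can help. The sketch below is only a heuristic, not a udev parser: it covers just the three keys from the rule above, and `find_suspect_udev_ops` is a name I made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of udev: flags KERNEL used with a single "="
# and GROUP/MODE used with "==", which is the swap described above.
find_suspect_udev_ops() {
    # First grep: -H prints the file name, -n the line number, -E enables
    # extended regex. KERNEL=" cannot match KERNEL==" because of the quote.
    # Second grep: drops matches whose rule text is a comment line.
    grep -HnE '(^|[[:space:],])(KERNEL="|(GROUP|MODE)==)' "$@" |
        grep -vE '^[^:]*:[0-9]+:[[:space:]]*#'
}

# Usage (on the machine being upgraded):
#   find_suspect_udev_ops /etc/udev/rules.d/*.rules
```

Each hit is printed as file:line:rule, so the output points straight at the line to fix. No output means none of the three keys looked swapped, not that the file is valid.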
3. Missing drmModeCloseFB symbol
GNOME Shell was crashing with:
/usr/bin/gnome-shell: symbol lookup error: /lib/x86_64-linux-gnu/libmutter-14.so.0: undefined symbol: drmModeCloseFB
The symbol existed in the system libdrm, but leftover ld.so.conf.d entries put ROCm's old libdrm in /opt/amdgpu/lib/ ahead of it in the search order, so the loader picked up the old copy instead.
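One way to confirm which copy wins is to look at the linker cache. The sketch below reads `ldconfig -p` output and prints the first x86-64 path cached for a given soname, which is normally the one the loader picks. `first_cached_lib` is a hypothetical helper name, and I'm assuming the usual `name (flags) => path` line layout.

```shell
#!/bin/sh
# Hypothetical helper: reads `ldconfig -p` output on stdin and prints the
# first x86-64 path cached for the soname given as $1.
first_cached_lib() {
    awk -v soname="$1" '$1 == soname && $2 ~ /x86-64/ { print $NF; exit }'
}

# Usage (on the affected machine):
#   ldconfig -p | first_cached_lib libdrm.so.2
```

A path under /opt/ instead of /lib/x86_64-linux-gnu/ points at the leftover entries. To check whether a particular copy exports the symbol, `nm -D --defined-only <path> | grep drmModeCloseFB` should do it.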
4. Xorg segfault
Even after fixing the libdrm issue, X11 wouldn't start—Xorg was segfaulting in radeonsi_dri.so. Turned out there was an old /etc/X11/xorg.conf from the amdgpu-pro days that was forcing incompatible settings.
The fix
The desktop crashes (issues 3 and 4) traced back to leftover ROCm and amdgpu-pro configuration files. The solution was to back them up and let the system use its defaults:
# Back up the problematic ld.so.conf files
sudo mv /etc/ld.so.conf.d/15-amdgpu-pro.conf{,.bak}
sudo mv /etc/ld.so.conf.d/20-amdgpu.conf{,.bak}
sudo mv /etc/ld.so.conf.d/10-rocm-opencl.conf{,.bak}
sudo mv /etc/ld.so.conf.d/rocm.conf{,.bak}
# Back up the old xorg.conf
sudo mv /etc/X11/xorg.conf{,.bak}
# Rebuild the library cache
sudo ldconfig
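The file names above are the ones on my machine and will likely differ between driver versions, so it may be easier to list the candidates than to guess. A minimal sketch, assuming the leftover entries all point somewhere under /opt (as the libdrm one did); `list_opt_ldconf` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: lists the *.conf files in the given directory that
# contain a library path starting with /opt/.
list_opt_ldconf() {
    # -l prints only the names of matching files, -E enables extended regex.
    grep -lE '^[[:space:]]*/opt/' "$1"/*.conf
}

# Usage:
#   list_opt_ldconf /etc/ld.so.conf.d
```

It only lists files and moves nothing, so you can review the output before backing anything up.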
After this, Wayland worked fine. I later reinstalled ROCm for noble and everything came back up—PyTorch detects the GPU, compute workloads run, and the desktop is stable.
What to preserve during upgrade
These config files should survive the upgrade intact:
/etc/default/grub: kept my RDNA3 stability params (amdgpu.gfxoff=0, amdgpu.tmz=0, amdgpu.runpm=0, amdgpu.ppfeaturemask=0xfffd7fff)
/etc/security/limits.conf: custom resource limits for GPU workloads
The upgrade process asked about these and I chose to keep my versions.
Lesson learned
If you're upgrading from 22.04 with ROCm installed, expect breakage. The third-party repos get disabled automatically by do-release-upgrade, but the old libraries and config files stick around and cause conflicts with the newer system components.
My approach next time: completely purge ROCm before upgrading, then reinstall fresh for the new release. Would've saved me a few hours of debugging in recovery mode.
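Before purging, it helps to see what is actually installed. The sketch below filters a package list down to names containing "rocm" or "amdgpu". It is a rough filter under my own assumptions: `filter_gpu_stack_packages` is a hypothetical name, and the match is deliberately broad, so Ubuntu's own packages such as libdrm-amdgpu1 show up too and should not be removed blindly.

```shell
#!/bin/sh
# Hypothetical helper: reads package names on stdin, one per line, and keeps
# those whose name contains "rocm" or "amdgpu" (case-insensitive).
filter_gpu_stack_packages() {
    grep -Ei 'rocm|amdgpu'
}

# Usage (lists every package dpkg knows about, then filters):
#   dpkg-query -W -f='${Package}\n' | filter_gpu_stack_packages
```

Treat the result as a review list, not as input to pipe straight into apt purge.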