Fix AWS EC2 Reboot Loop After Kernel Update

Quick answer (for advanced users)

Stop the instance, detach the root volume, attach it to a working recovery instance, chroot in, downgrade or pin the kernel package, then reattach and start. Or use EC2 Serial Console to select an older kernel at boot (if enabled).

Why this happens

You updated the kernel via yum update or apt upgrade on an EC2 instance, rebooted, and now it never comes back. The AWS console shows the status checks failing, and the instance keeps restarting. The new kernel either has a bug specific to the instance type (like missing NVMe drivers on older AMIs) or a configuration mismatch: maybe you built a custom module that the new kernel can't load, or the initramfs didn't regenerate properly. The VM tries to boot, hits a kernel panic or hangs early in the boot process, and then restarts, typically because the kernel is set to reboot after a panic or because an automated recovery action (a CloudWatch alarm or an Auto Scaling health check) kicks in. Rinse, repeat.

The reason you can't SSH in is simple: the OS never finishes booting. The kernel crashes before networking even comes up. So any remote fix is out — you need to touch the root volume from outside the broken instance.
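Before pulling the volume, it can help to confirm the failure mode from the instance's console output. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the helper name find_boot_failure and its pattern list are illustrative and not exhaustive.

```shell
# Hypothetical helper: filter a console log (read from stdin) down to lines
# that typically indicate an early-boot failure. The pattern list is an
# assumption, not a complete catalogue of boot errors.
find_boot_failure() {
  grep -iE 'kernel panic|unable to mount root|cannot open root device|dracut.*(timeout|emergency)' || true
}

# Usage (the instance ID is a placeholder):
#   aws ec2 get-console-output --instance-id i-0123456789abcdef0 \
#     --query Output --output text | find_boot_failure
```

If this prints a panic about the root filesystem, a missing storage driver in the initramfs is a likely suspect. If it prints nothing, read the full log before drawing conclusions.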

Step-by-step fix (main method)

Stop the broken instance from the AWS EC2 console. Don't terminate it — you need that root volume. Wait until the state shows "stopped".
Detach the root volume. Go to Volumes, select the root volume attached to your broken instance, and choose Actions > Detach Volume. Wait a few seconds for the status to change to "available".
Launch a temporary recovery instance in the same availability zone. Use any Amazon Linux 2 or Ubuntu AMI that matches your broken instance's OS family. The exact version doesn't matter much — you just need a working environment to mount a drive and chroot.
Attach the broken volume to the recovery instance. Go to Volumes, select the broken volume, choose Actions > Attach Volume, and pick your recovery instance. By default the console suggests /dev/sdf. Inside the OS that appears as /dev/xvdf on Xen-based instances and as an NVMe device such as /dev/nvme1n1 on Nitro instances.
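The same console steps can be scripted with the AWS CLI. This is a sketch under the assumption that the recovery instance already exists in the same availability zone; the function name and its three arguments are placeholders, not part of any AWS tooling.

```shell
# Sketch: move the root volume of a broken instance onto a recovery instance.
# Arguments are supplied by the caller:
#   $1 broken instance ID, $2 root volume ID, $3 recovery instance ID
move_root_volume_to_recovery() {
  broken_id=$1; volume_id=$2; recovery_id=$3

  # Stop (never terminate) the broken instance and wait until it is stopped.
  aws ec2 stop-instances --instance-ids "$broken_id" || return 1
  aws ec2 wait instance-stopped --instance-ids "$broken_id" || return 1

  # Detach the root volume and wait for it to become "available".
  aws ec2 detach-volume --volume-id "$volume_id" || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # Attach it to the recovery instance as a secondary device.
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$recovery_id" --device /dev/sdf
}

# Usage (all IDs are placeholders):
#   move_root_volume_to_recovery i-0aaa... vol-0bbb... i-0ccc...
```

The wait subcommands poll until the resource reaches the expected state, which replaces the manual "wait until it shows stopped/available" checks.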

SSH into the recovery instance and mount the volume:

sudo lsblk  # confirm the device name: usually xvdf1 on Xen, nvme1n1p1 on Nitro
sudo mkdir -p /mnt/recovery
sudo mount /dev/xvdf1 /mnt/recovery  # substitute the partition lsblk reported
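One common snag: if the recovery instance was launched from the same AMI as the broken one and the root filesystem is XFS, both filesystems carry the same UUID and the mount is refused. A small sketch of working around that follows; the helper name mount_options_for is hypothetical.

```shell
# Hypothetical helper: choose mount options from the filesystem type.
# XFS refuses to mount a second filesystem with a UUID that is already
# mounted, so pass nouuid in that case.
mount_options_for() {
  case "$1" in
    xfs) echo "nouuid" ;;
    *)   echo "defaults" ;;
  esac
}

# Usage (the device name is an example; confirm it with lsblk):
#   dev=/dev/xvdf1
#   fstype=$(sudo blkid -o value -s TYPE "$dev")
#   sudo mount -o "$(mount_options_for "$fstype")" "$dev" /mnt/recovery
```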

Chroot into the broken system:

sudo mount --bind /dev /mnt/recovery/dev
sudo mount --bind /proc /mnt/recovery/proc
sudo mount --bind /sys /mnt/recovery/sys
sudo chroot /mnt/recovery

Now you're inside the broken OS. You can run package managers and kernel commands as if you're the original instance.

Check installed kernels:

# On Amazon Linux / RHEL / CentOS:
rpm -qa 'kernel*'  # quoted so the shell doesn't expand the glob against local files

# On Ubuntu / Debian:
dpkg --list | grep linux-image

You'll likely see two or three versions. The newest one is the troublemaker.
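Inside the chroot, uname -r reports the recovery instance's running kernel, not the broken system's, so the newest installed version has to come from the package list or from /lib/modules. A minimal sketch, assuming GNU sort is available; the helper name is illustrative.

```shell
# Hypothetical helper: read kernel versions (one per line) on stdin and print
# the highest one, using GNU sort's version ordering rather than plain
# lexical ordering.
newest_kernel_version() {
  sort -V | tail -n 1
}

# Usage inside the chroot:
#   rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | newest_kernel_version
#   ls /lib/modules | newest_kernel_version
```

Version ordering matters here because a plain sort would rank 5.15.0-91 above 5.15.0-101.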

Remove or pin the bad kernel:

# Amazon Linux 2 / RHEL 7/8:
yum remove kernel-5.10.210-201.852.amzn2 -y

# Ubuntu:
apt-get remove --purge linux-image-5.15.0-91-generic -y

# Optionally, also exclude kernel packages so a later yum update doesn't reinstall the bad one:
echo "exclude=kernel*" >> /etc/yum.conf

Removing the package works because it takes away the kernel that panics, leaving the previous one as the only boot option. The package removal scripts normally regenerate the bootloader configuration for you (via grubby on RHEL-family systems, update-grub on Ubuntu).
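It is worth confirming that the bootloader no longer offers the removed kernel. The sketch below lists the menu entry titles in a GRUB config file. It assumes titles are single-quoted the way grub-mkconfig usually writes them, and the helper name is hypothetical. Systems that keep boot entries outside grub.cfg will print nothing; grubby --default-kernel is an alternative on RHEL-family systems.

```shell
# Hypothetical helper: print the titles of menuentry lines in a GRUB config.
# Assumes single-quoted titles; submenu lines are deliberately skipped.
list_grub_entries() {
  awk -F"'" '/^[ \t]*menuentry /{ print $2 }' "$1"
}

# Usage inside the chroot (the path differs by distribution):
#   list_grub_entries /boot/grub/grub.cfg     # Ubuntu
#   list_grub_entries /boot/grub2/grub.cfg    # Amazon Linux 2 / RHEL 7
```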

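Regenerate initramfs (optional but safe). Inside a chroot, uname -r reports the recovery instance's kernel, so name the version you are keeping explicitly:

# Amazon Linux 2 / RHEL (replace <version> with the kernel you are keeping):
dracut -f /boot/initramfs-<version>.img <version>

# Ubuntu:
update-initramfs -u -k <version>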
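To check whether the rebuilt image contains the drivers a Nitro instance needs, you can scan the initramfs file listing. This is a sketch, with a hypothetical helper name and a module list (nvme, ena) chosen for illustration. A driver compiled directly into the kernel will not appear in the listing, so a reported name is a hint rather than proof of a problem.

```shell
# Hypothetical helper: read an initramfs file listing on stdin and print any
# of the named modules that do not appear in it.
missing_initramfs_modules() {
  listing=$(cat)
  for module in nvme ena; do
    printf '%s\n' "$listing" | grep -qE "/${module}\.ko" || echo "$module"
  done
}

# Usage inside the chroot (KVER is the kernel version you are keeping):
#   lsinitrd "/boot/initramfs-$KVER.img" | missing_initramfs_modules     # RHEL family
#   lsinitramfs "/boot/initrd.img-$KVER" | missing_initramfs_modules     # Ubuntu
```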

Exit chroot, unmount, detach:
```
exit
sudo umount /mnt/recovery/dev
sudo umount /mnt/recovery/proc
sudo umount /mnt/recovery/sys
sudo umount /mnt/recovery
```
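Then detach the volume from the recovery instance (same way you attached it) and attach it back to your original instance. For the device name, enter the instance's root device name exactly as shown under "Root device name" in the instance details (commonly /dev/xvda on Amazon Linux and /dev/sda1 on Ubuntu); with any other name the instance won't find its root volume. Start the instance. It should boot normally on the old kernel.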
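The reattach step can also be scripted, and doing so lets EC2 report the expected root device name instead of relying on memory. A minimal sketch, assuming the AWS CLI is configured; the function name and arguments are placeholders.

```shell
# Sketch: return the repaired volume to the original instance and start it.
#   $1 original instance ID, $2 volume ID (both supplied by the caller)
restore_root_volume() {
  instance_id=$1; volume_id=$2

  # Ask EC2 which device name the instance expects for its root volume.
  root_device=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].RootDeviceName' --output text) || return 1

  # Detach from the recovery instance, then attach as root and boot.
  aws ec2 detach-volume --volume-id "$volume_id" || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_id" --device "$root_device" || return 1
  aws ec2 start-instances --instance-ids "$instance_id"
}

# Usage (IDs are placeholders):
#   restore_root_volume i-0aaa... vol-0bbb...
```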

Alternative fix: EC2 Serial Console

If you've enabled EC2 Serial Console for your account and the instance is a Nitro-based type, you can skip the recovery instance entirely. Note that Actions > Monitor and troubleshoot > Get system log only shows read-only serial output. To interact with the boot process, choose Actions > Monitor and troubleshoot > EC2 Serial Console and connect from the browser, or push a temporary SSH key with the AWS CLI and connect to the serial port with your own SSH client. Then reboot the instance and pick the older kernel from the GRUB menu. This is faster, but it requires the instance to be configured to show the GRUB menu on the serial port (most aren't by default). The recovery instance method doesn't depend on any of that.
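For the SSH route, the connection target is derived from the instance ID and region. The sketch below builds it; the helper name is hypothetical, and the endpoint format is the one used in the standard commercial regions as far as I know, so verify it against the Serial Console documentation for your region.

```shell
# Hypothetical helper: build the SSH destination for an instance's serial
# port 0. The endpoint format is an assumption to verify for your region.
serial_console_target() {
  instance_id=$1; region=$2
  echo "${instance_id}.port0@serial-console.ec2-instance-connect.${region}.aws"
}

# Usage (instance ID, region, and key paths are placeholders):
#   aws ec2-instance-connect send-serial-console-ssh-public-key \
#     --instance-id i-0123456789abcdef0 --serial-port 0 \
#     --ssh-public-key file://my_key.pub --region us-east-1
#   ssh -i my_key "$(serial_console_target i-0123456789abcdef0 us-east-1)"
```

The pushed key is only valid for a short window, so run the ssh command right after sending it.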

Alternative fix: Snapshot and launch replacement

Don't want to chroot into a broken system? Take a snapshot of the broken root volume, launch a new instance from an older AMI, create a volume from the snapshot, attach it as a secondary disk, and copy your data over. This is the nuclear option: you lose any system-level customizations. Only do this if the recovery instance approach sounds too complex or you don't have time. A snapshot preserves your data, so you can still grab files later.
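Taking the snapshot is a single CLI call plus a wait. It is also a reasonable safety net before trying the main method. A minimal sketch, assuming the AWS CLI is configured; the function name and description text are illustrative.

```shell
# Sketch: snapshot a volume, wait for completion, and print the snapshot ID.
#   $1 volume ID (supplied by the caller)
snapshot_root_volume() {
  volume_id=$1
  snapshot_id=$(aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "Root volume before kernel rollback" \
    --query SnapshotId --output text) || return 1
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1
  echo "$snapshot_id"
}

# Usage (the volume ID is a placeholder):
#   snapshot_root_volume vol-0bbb...
```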

Prevention tip

Block automatic kernel updates. On Amazon Linux 2, run:

sudo yum update --security --exclude='kernel*'

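And pin the kernel in /etc/yum.conf with exclude=kernel*. Do kernel updates manually when you're ready to reboot; because of that exclude line you have to override it, for example with yum update 'kernel*' --disableexcludes=main. On Ubuntu, apt-mark hold linux-image-$(uname -r) only pins the image that is already installed; new kernels arrive as new packages pulled in by the kernel metapackage, so hold that as well (for example apt-mark hold linux-aws; check which metapackage your image actually uses) and only upgrade kernels in test instances first. Also enable the EC2 Serial Console before you need it. It's a lifesaver when things go sideways.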

A common root cause of reboot loops after kernel updates on EC2 is a kernel whose initramfs lacks the modules the instance's virtual hardware needs. Nitro-based types like t3, m5, and c5 need the NVMe driver to find the root volume and the ENA driver for networking. Always rebuild the initramfs before rebooting if you manually compile or install a kernel. That's the step everyone skips, and it's what gets you into this loop.
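A cheap pre-reboot check is to confirm that every installed kernel has a matching initramfs image. The sketch below assumes the usual naming conventions (initramfs-<version>.img on RHEL-family systems, initrd.img-<version> on Debian and Ubuntu); the function name is hypothetical. Leftover module directories from removed kernels can cause false positives.

```shell
# Hypothetical pre-reboot check: print every kernel version that has a module
# directory but no matching initramfs image.
#   $1 modules directory (normally /lib/modules), $2 boot directory (/boot)
kernels_missing_initramfs() {
  modules_dir=$1; boot_dir=$2
  for dir in "$modules_dir"/*/; do
    [ -d "$dir" ] || continue
    version=$(basename "$dir")
    # RHEL-family systems name the image initramfs-<version>.img,
    # Debian and Ubuntu name it initrd.img-<version>.
    if [ ! -e "$boot_dir/initramfs-$version.img" ] &&
       [ ! -e "$boot_dir/initrd.img-$version" ]; then
      echo "$version"
    fi
  done
}

# Usage: hold off on rebooting if anything is printed.
#   kernels_missing_initramfs /lib/modules /boot
```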