Azure VM disk latency spike – fixes for app timeouts

1. Disk bursting exhausted – the most common culprit

What's actually happening here is that your Azure VM's disk is running on credits-based bursting – and those credits ran dry. Most Premium SSDs under 512 GiB and all Standard SSDs under 1 TiB rely on a burst model. When you hit a sustained read workload that exceeds the disk's baseline IOPS, the disk dips into its burst credit bucket. Once that bucket is empty, the disk performance drops hard – we're talking 80-90% lower IOPS. Your app sees this as a latency wall, connections pile up, and timeouts cascade.

I've seen this most often on dev/test VMs running SQL Server Express or any database with checkpoint-heavy workloads. The giveaway: your monitoring shows a sawtooth pattern – performance is fine for 30-60 minutes, then suddenly goes to hell, then recovers after a quiet period when credits rebuild.

The fix: size up or switch to a larger tier

Check your current disk size in the Azure portal under Disks > Performance. The blade shows baseline IOPS and burst IOPS.
Calculate your sustained IOPS demand. If you're consistently above baseline, you need a disk that doesn't depend on bursting. Easiest path: resize the disk to the next tier where baseline meets your workload. For Premium SSD, that's 512 GiB (P30) or larger. Disks at P30 and above have no bursting – they just deliver their rated performance all day.
Alternative: if you can't resize and you're stuck on a smaller disk, throttle your workload. Add a small delay in your app's I/O loop, or reduce concurrent requests. That keeps your average IOPS under baseline and stops credit exhaustion.

Don't just add more disks and stripe them – that's a band-aid. The real fix is matching the disk tier to your workload's sustained demand.

# On Linux, check disk queue depth and latency with iostat
# If avgqu-sz stays above 1 and await spikes over 20ms, you're likely burst-exhausted
iostat -x 1 10

2. Host caching set to None on read-heavy workloads

This one is subtle and easy to miss. Azure VMs have a host-level cache that's actually local NVMe storage on the Hyper-V host. When you set host caching to ReadOnly on the OS disk or a data disk, reads hit that local cache instead of going to the remote storage backend. The latency difference is enormous – we're talking 0.5ms vs 5-10ms. If your host caching is set to None, every read goes over the network to the storage cluster, and any jitter there shows up as latency spikes.

The scenario where this bites hardest: you've migrated an on-premises app that does lots of small random reads – think an ERP system or a message queue. Locally, those reads were nearly instant. On Azure with cache disabled, every read traverses the Azure fabric. Network storms, storage node contention, or a noisy neighbor on the same storage scale unit all translate directly into your app's response time.

The fix: enable ReadOnly caching

Stop the VM. You cannot change caching on a running VM – Azure will tell you it needs to deallocate.
Go to the disk > Host caching and set it to ReadOnly for disks that handle mostly reads. For data disks with write-heavy transactional workloads (like SQL Server logs), use ReadWrite – but be careful with that, because writes to cached disks are acknowledged only after they land in the cache, and a host crash can lose data. Microsoft says the OS disk can tolerate this, but for data disks, ReadOnly is safer unless you know exactly what you're doing.
Start the VM and test. You'll see average latency drop by 60-80% immediately.

# PowerShell to set caching on a data disk (disk LUN 1, VM named web01)
$vm = Get-AzVM -ResourceGroupName rg-web -Name web01
$vm.StorageProfile.DataDisks[1].Caching = 'ReadOnly'
Update-AzVM -ResourceGroupName rg-web -VM $vm

Why step 3 works: The host cache is backed by local NVMe SSDs on the Hyper-V host. These drives are not shared with other tenants – they're dedicated to your VM's host. Reads that hit this cache bypass Azure Storage's distributed backend entirely, so they're immune to network congestion and storage node load.

3. Queue depth saturation on ultra-low-latency workloads

This one is less common but brutal when it hits. Azure disks have a maximum queue depth – the number of outstanding I/O requests the disk can handle at once. For Premium SSD, that limit is typically high (256 or more), but the problem is when your application or the OS driver sends fewer requests than the disk can optimally handle, or way too many. The sweet spot for Azure Premium SSD is usually a queue depth of 8-16 per disk. If your app is single-threaded and sends one request at a time, latency goes up because the disk isn't doing parallel processing. If your app sends 200 requests, the disk's scheduler gets overwhelmed and latency spikes.

I see this on Java apps using synchronous NIO with low connection pooling, or on Python async apps where the event loop sends too many concurrent fetches to the same disk. The monitoring sign: disk queue depth (avgqu-sz in iostat) consistently above 64, or below 1, and latency is high in both cases.

The fix: adjust concurrency at the app layer

Check your current disk queue depth using iostat (Linux) or Performance Monitor (Windows, counter: Physical Disk\Current Disk Queue Length).
If queue depth is consistently under 2, increase concurrency in your app. For .NET, bump up the thread pool or use ProducerConsumerCollection to parallelize writes. For Java, increase the connection pool in your database connector.
If queue depth is over 60, throttle it. Limit the number of concurrent I/O operations in your app. A common pattern: use a semaphore or a bounded work queue.
If you can't change the app code, consider using multiple smaller disks with RAID0 in software (Linux mdadm or Windows Storage Spaces). Each disk adds its own queue, and the OS spreads I/O across them. This is a hack, but it works.

Tip: On Linux, the default I/O scheduler (CFQ or BFQ) can also cause queue depth issues for SSDs. Switch to none or noop – modern NVMe drives don't benefit from reordering. Add elevator=none to your kernel boot parameters in /etc/default/grub and run update-grub.

Quick-reference summary

Cause	Primary symptom	Fix	Time to implement
Disk bursting exhausted	Latency spikes after 30-60 min of sustained load	Resize disk to P30 or larger (no bursting)	10 min (requires VM stop)
Host caching set to None	High average latency, sensitive to network jitter	Set caching to ReadOnly (ReadWrite for specific workloads)	5 min (requires VM stop)
Queue depth saturation	Latency high at both very low and very high concurrency	Adjust app concurrency; switch I/O scheduler to none	30 min – 2 hrs (code change)