Azure SQL DTU Exhaustion: Fix Query Timeouts Fast
When your Azure SQL Database hits its DTU cap, queries time out. Start with quick indexing, then scale DTU or refactor slow queries. I'll walk you through each step.
Why Your Queries Are Timing Out
You're running a critical report, maybe a SELECT COUNT(*) on a billion-row table, and bam—timeout. Or a web app that worked fine at 9 AM is now throwing Timeout expired errors at 2 PM. Nine times out of ten, this is DTU exhaustion. Azure SQL's DTU model gives you a capped pool of CPU, memory, and I/O. When you hit that cap, new queries queue up—and some just die.
I've seen this on everything from a Standard S2 to a Premium P6. The fix isn't always "buy more." Let's start simple.
Step 1: The Quick Index Fix (30 seconds)
Before you touch anything else, check if a missing index is the culprit. A single bad scan can eat your whole DTU budget.
How to find it
Run this in the Azure portal's Query Editor or SSMS:
SELECT TOP 5
    migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) AS impact,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    -- Builds a ready-to-run statement; review the generated name and columns before using it.
    'CREATE NONCLUSTERED INDEX IX_' + OBJECT_NAME(mid.object_id) + '_'
        + REPLACE(REPLACE(REPLACE(ISNULL(mid.equality_columns, ''), ', ', '_'), '[', ''), ']', '')
        + ' ON ' + mid.statement
        + ' (' + ISNULL(mid.equality_columns, '')
        + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ', ' ELSE '' END
        + ISNULL(mid.inequality_columns, '') + ')'
        + CASE WHEN mid.included_columns IS NOT NULL THEN ' INCLUDE (' + mid.included_columns + ')' ELSE '' END
        AS create_index_statement
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
WHERE migs.avg_user_impact > 50
ORDER BY impact DESC;
This lists up to five missing indexes, ranked by estimated impact. Copy the CREATE INDEX statement for the top one, review it, and run it. If you're in a production database, do it during low traffic, or add WITH (ONLINE = ON), which Azure SQL Database supports, so the table stays available while the index builds.
Real-world example: I had a client whose S2 database was timing out every 15 minutes. One missing index on a WHERE status = 'pending' column dropped DTU from 95% to 30%. Problem solved in under a minute.
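For a case like the one above, the statement you end up running might look like the following. This is a sketch only: the table dbo.Orders, the column names, and the index name are hypothetical stand-ins, not the client's actual schema.

```sql
-- Hypothetical table and columns, for illustration only.
-- ONLINE = ON keeps the table readable and writable while the index builds.
CREATE NONCLUSTERED INDEX IX_Orders_status
    ON dbo.Orders (status)
    INCLUDE (order_id, created_at)   -- columns the query selects, to avoid key lookups
    WITH (ONLINE = ON);
```

An online build still consumes DTU while it runs, so the low-traffic advice applies either way.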
If DTU drops below 80% after the index, you're done. If not, move to Step 2.
Step 2: The DTU Scale-Up (5 minutes)
Sometimes indexing isn't enough. Your workload genuinely exceeds the tier. Here's where we bump up the service objective.
Check your current DTU consumption
In the Azure portal, go to your SQL Database > Metrics. Add a chart for DTU percentage. If it's consistently above 80% during your timeout windows, scaling is the right call.
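If you'd rather check from T-SQL, here is a minimal sketch using sys.dm_db_resource_stats, which keeps roughly the last hour of samples at 15-second intervals. It approximates DTU percentage as the highest of the CPU, data I/O, and log write percentages; the 80% threshold mirrors the rule of thumb above.

```sql
-- Recent samples where the approximated DTU percentage exceeded 80%.
SELECT
    rs.end_time,
    rs.avg_cpu_percent,
    rs.avg_data_io_percent,
    rs.avg_log_write_percent,
    dtu.avg_dtu_percent
FROM sys.dm_db_resource_stats AS rs
CROSS APPLY (
    -- DTU % is approximated as the largest of the three resource percentages.
    SELECT MAX(v.pct) AS avg_dtu_percent
    FROM (VALUES (rs.avg_cpu_percent),
                 (rs.avg_data_io_percent),
                 (rs.avg_log_write_percent)) AS v(pct)
) AS dtu
WHERE dtu.avg_dtu_percent > 80
ORDER BY rs.end_time DESC;
```

The individual columns also tell you which resource is the bottleneck, which helps when choosing between an index fix (data I/O) and a query rewrite (CPU).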
Choose the right tier
- Go to the database's Configure pane.
- Switch from Standard to Premium—if you need more I/O, Premium's SSD-backed storage handles high DTU better.
- Or just bump up within Standard: S0 to S1, S1 to S2, etc. Most timeouts happen on S0 or S1.
- Click Apply. This takes about 2–5 minutes, with maybe a 30-second blip in connectivity.
Opinion here: Don't jump to S3 or higher unless you're sure. The price jump is steep. Start with one tier up. If that doesn't help, go to Step 3.
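The same scale-up can be scripted instead of clicked. A sketch, assuming a database named MyAppDb (a placeholder) moving one step up to S2; substitute your own database name and target tier.

```sql
-- Run each statement as its own batch.
-- MyAppDb and 'S2' are placeholders for your database and target tier.
ALTER DATABASE [MyAppDb]
    MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S2');

-- The change is asynchronous; re-run this until it reports the new tier.
SELECT
    DATABASEPROPERTYEX('MyAppDb', 'Edition')          AS edition,
    DATABASEPROPERTYEX('MyAppDb', 'ServiceObjective') AS service_objective;
```

Scripting it makes it easy to scale back down after a temporary spike.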
Also, consider elastic pools if you have multiple databases. Pooling DTUs across them can absorb spikes without over-provisioning each one. That's a 10-minute setup, but worth it for multi-tenant apps.
Step 3: The Deep Query Refactor (15+ minutes)
You've indexed. You've scaled. Still timing out? Now we rewrite the queries. This is the nuclear option—but it's also the most sustainable.
Find the worst offenders
Use Azure SQL's Query Store. In the portal, go to your database > Query Performance Insight. Look at the top queries by DTU consumption. You'll see the same ones over and over.
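You can also query the Query Store catalog views directly. The sketch below ranks queries by total CPU over the last 24 hours; the 24-hour window and the TOP 10 cutoff are arbitrary choices, and it assumes Query Store is enabled on the database.

```sql
-- Top CPU consumers recorded by Query Store in the last 24 hours.
SELECT TOP 10
    q.query_id,
    qt.query_sql_text,
    SUM(rs.count_executions)                             AS executions,
    SUM(rs.avg_cpu_time * rs.count_executions) / 1000.0  AS total_cpu_ms  -- avg_cpu_time is in microseconds
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi
    ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.start_time >= DATEADD(HOUR, -24, SYSDATETIMEOFFSET())
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_cpu_ms DESC;
```

Unlike the plan-cache DMVs, Query Store data survives restarts and plan evictions, so it is better suited to timeouts that happened hours ago.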
Or run this:
SELECT TOP 10
    qs.total_worker_time / qs.execution_count / 1000.0 AS avg_cpu_ms,      -- DMV reports microseconds
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_elapsed_ms, -- DMV reports microseconds
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY avg_logical_reads DESC;
Export the query plan XML and open it in SSMS. Note that this is the cached (estimated) plan, not the actual execution plan. Look for:
- Table scans on big tables (you missed an index in Step 1).
- Key lookups that are expensive—add included columns to your index.
- Sort operations on unfiltered datasets: add a WHERE clause to reduce rows before sorting.
- Parallelism causing waits. Sometimes a MAXDOP 1 hint helps if DTU is pegged.
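Two of the fixes in this list can be sketched in T-SQL. The table dbo.Orders and its columns are hypothetical; adapt the names to whatever your plan shows.

```sql
-- Fix for an expensive key lookup: cover the query by including the
-- columns it reads, so the engine never goes back to the base table.
CREATE NONCLUSTERED INDEX IX_Orders_customer_id_covering
    ON dbo.Orders (customer_id)
    INCLUDE (status, total_amount)
    WITH (ONLINE = ON);

-- Fix for parallelism waits: cap this one query at a single core.
SELECT customer_id, SUM(total_amount) AS total_spent
FROM dbo.Orders
WHERE created_at >= '2024-01-01'   -- filter first so the sort sees fewer rows
GROUP BY customer_id
ORDER BY total_spent DESC
OPTION (MAXDOP 1);
```

The hint applies only to the statement it is attached to, so you can test it on one problem query without changing behavior for the rest of the workload.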
Rewrite techniques that work
- Break up large batches. Instead of one UPDATE on 10 million rows, do it in chunks of 10,000 with WAITFOR DELAY '00:00:01' between them. This keeps DTU under the cap.
- Use SELECT ... INTO sparingly on large tables. Azure SQL Database always runs in the full recovery model, so the operation is fully logged and burns log-write DTU. CREATE TABLE AS SELECT is not an alternative here: it isn't supported in Azure SQL Database, so use batched inserts instead.
- Replace cursors with set-based operations. Cursors in T-SQL are a DTU black hole. Rewrite them as a single UPDATE with a JOIN. I've seen a cursor-based report drop from 100% DTU to 15% after that change.
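The chunked update described above can be sketched as a simple loop. The table, columns, and filter are hypothetical; the batch size and delay are the figures from the text and should be tuned to your tier.

```sql
-- Hypothetical example: archive old pending orders in batches of 10,000.
DECLARE @batch_size INT = 10000;
DECLARE @rows_affected INT = 1;

WHILE @rows_affected > 0
BEGIN
    -- The WHERE clause excludes rows already updated, so each pass
    -- picks up the next batch and the loop ends when none remain.
    UPDATE TOP (@batch_size) dbo.Orders
    SET status = 'archived'
    WHERE status = 'pending'
      AND created_at < '2020-01-01';

    SET @rows_affected = @@ROWCOUNT;

    -- Pause between batches to give other queries room under the DTU cap.
    IF @rows_affected > 0
        WAITFOR DELAY '00:00:01';
END;
```

Each batch commits on its own, which also keeps individual transactions and lock durations short.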
Real-world case
A customer's SELECT * on a 500-column table was causing 90% DTU. The app only needed 5 columns. Replacing SELECT * with an explicit list of just those five columns reduced I/O DTU by 40%. Simple wins.
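If none of this works, you might need to move to the vCore purchasing model. vCore lets you size compute and storage independently, so you can see and raise limits in terms of cores and memory rather than a blended DTU number. It still has resource limits, though, and it's a longer migration, so save it for last.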
When to Call Microsoft Support
If you've done all three steps and still see timeouts, open a support ticket. Have your Query Store data ready. Sometimes it's a bug in Azure SQL's resource governor that needs a hotfix. I've seen that twice in 6 years, but it does happen.
You've got this. Start with the index. Everything else builds from there.