Fix MongoDB 'Operation exceeded time limit' on aggregations
MongoDB kills aggregation queries that run over 30 seconds. Here's the real fix and why it works.
You ran a big aggregation and got slapped with 'operation exceeded time limit'. I've been there. It's frustrating when the database just gives up on you.
Let's get you back to work. The fix is simpler than you think.
First, the quick fix: allow the cursor to return results
By default, MongoDB returns aggregation results inline in a single response. That's fine for small datasets. For large ones, the server has to hold everything in memory before sending it back—and it has a 30-second limit to do that. If it can't finish, you get the timeout error.
The fix is to tell the aggregation to return a cursor instead. A cursor lets MongoDB send results back in batches as they're processed, so it doesn't need to hold everything at once. This removes the 30-second limit. The server still takes its time, but it won't kill the query.
Here's the change. In your shell or driver, add cursor{} to the aggregation command:
db.collection.aggregate([
// your pipeline stages here
], { cursor: {} })
For example, if you were running this:
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
])
You'd change it to:
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
], { cursor: {} })
After you run this, you should see a cursor object returned instead of the full result set. You'll then iterate over the cursor—in the shell, it's automatic, and in most drivers, you'll use something like forEach to pull results batch by batch.
In Python with PyMongo, that looks like:
cursor = db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
], cursor={})
for doc in cursor:
print(doc)
In Node with the native driver:
const cursor = db.collection('orders').aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
], { cursor: {} })
await cursor.forEach(doc => console.log(doc))
Why does this work?
MongoDB's aggregation engine processes the pipeline in stages. Without a cursor, it builds the entire result set in memory, then sends it as a single BSON document. If that takes longer than 30 seconds, the server kills the operation. With a cursor, the server sends each batch of 101 documents (the default batch size) as soon as it's ready. The server's timeout no longer applies because it's not trying to send everything at once. The client just keeps asking for the next batch.
This isn't a hack. It's the recommended way to handle large aggregation results. The default inline mode exists for backward compatibility with older MongoDB versions and trivial queries. For anything serious, use a cursor.
When this doesn't work: real timeout vs. cursor timeout
There's a twist. Adding cursor{} removes the inline execution timeout, but you can still hit a cursor timeout if you don't read the cursor quickly enough. That's a different error: 'cursor id not found' or 'operation was interrupted'. The cursor has its own 10-minute idle timeout by default. If your query takes longer than 10 minutes to start returning results, or if the client pauses too long between batches, the cursor can die.
Fix: use the maxTimeMS option to extend the cursor's timeout. Or, better, use $out or $merge to write the aggregation results to a new collection. That runs the entire pipeline server-side and doesn't need a cursor at all.
Here's an example with $out:
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $out: "order_summary" }
])
This writes the result to a new collection called order_summary. No cursor needed, no timeout—the server handles everything. But you only want to do this if you actually need the data stored, not just displayed.
What if the pipeline itself is the bottleneck?
Sometimes the issue isn't the output method—it's that the pipeline is so slow that even with a cursor, it takes forever. This usually means you're missing indexes. The first stage in your pipeline should always be a $match that filters down the data. That $match should use an index. If you skip that, MongoDB has to scan the whole collection.
Run explain("executionStats") on your aggregation to see:
db.orders.explain("executionStats").aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
])
Look for totalDocsExamined vs nReturned. If they're far apart, you need an index on status. Also look at stage: "COLLSCAN"—that's a dead giveaway.
Add an index:
db.orders.createIndex({ status: 1 })
Then rerun the aggregation. You should see totalDocsExamined drop significantly.
Less common variations I've seen
- The
$lookuptrap. A$lookupon a foreign collection without an index on the foreign key will kill performance. Always add an index on the field you're joining on. For example, if your pipeline does{"$lookup": {"from": "customers", "localField": "customerId", "foreignField": "_id", "as": "customer"}}, make surecustomers._idis indexed (it is by default, but if you're joining on another field, you'll need to index it manually). - The
$unwindexplosion. If you$unwindan array field that has thousands of elements per document, you're multiplying the dataset massively. Put a$matchbefore the$unwindto reduce the number of documents, or use$unwindwithpreserveNullAndEmptyArrays: falseto skip empties. - Sharded clusters. On a sharded cluster, the
$outstage can cause a timeout if the primary shard is under load. Use$mergeinstead—it distributes the write across shards. - MongoDB Atlas limits. Atlas has a separate aggregation timeout of 60 seconds for M10+ clusters. You can't change that. If you hit it, you have to optimize the pipeline or split the work into smaller aggregation jobs.
How to prevent this from happening again
Do these three things on every aggregation that might touch more than 10,000 documents:
- Always use
cursor{}. Make it a habit. Even for small queries, it doesn't hurt, and it saves you when the data grows. - Add an index for the first
$matchstage. That's the single highest-impact optimization you can make. - Test with
maxTimeMSduring development. Set it to 5000 (5 seconds) to catch slow pipelines early. If it times out in dev, it'll time out in prod. You'll fix it before it becomes a problem.
Here's a development shell snippet to test your pipeline:
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
], {
cursor: {},
maxTimeMS: 5000
})
If it throws a timeout, you know the pipeline needs work before you push it live.
One last thing: don't just blindly increase maxTimeMS to 10 minutes. That masks the problem. Fix the root cause—bad indexes, missing cursor, or a poorly designed pipeline. Your users will thank you, and your database will stop crying.
Was this solution helpful?