Fix ERROR_CLUSTERLOG_CHKPOINT_NOT_FOUND (0x13A8) Fast
Your cluster log has a corrupt or missing checkpoint. The quick fix is to rebuild the cluster database. Here's how.
Let’s get this cluster back online
You’re looking at ERROR_CLUSTERLOG_CHKPOINT_NOT_FOUND (0x000013A8)—and yeah, it usually means your cluster service crashed and won’t restart. The good news? This is fixable without rebuilding the whole cluster. I’ve seen this exact error after a power failure knocked a two-node cluster offline mid-write. The checkpoint file got corrupted, and the cluster log couldn’t find where it left off.
Here’s the fix that works 9 times out of 10.
Fix it: Rebuild the cluster database
You need to force the cluster service to regenerate its database from the surviving nodes. Skip the GUI for this—use PowerShell.
- Stop the Cluster service on all nodes except one. Run this on each node you want to stop:
Stop-Service ClusSvc
- On the node where you’ll keep the service running, open PowerShell as Administrator and confirm the service is still running there:
Get-Service ClusSvc | Select-Object Status
It should say Running.
- Now force the database rebuild. This command tells the cluster to ignore its local checkpoint and use the quorum’s copy:
Start-ClusterNode -ForceQuorum
Wait about 30 seconds.
- Start the Cluster service on the other nodes, one at a time:
Start-Service ClusSvc
- Check that the cluster is healthy:
Get-ClusterNode
All nodes should show Up.
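If you'd rather script the steps above than type them node by node, here is a rough sketch that follows the same order. It makes some assumptions the article doesn't: you run it elevated on the node you're keeping, the FailoverClusters module is installed, and the other nodes accept PowerShell remoting (WinRM). The function names Get-NodesToStop and Invoke-ForceQuorumRebuild are hypothetical helpers, not built-in cmdlets. You also have to supply the node names yourself.

```powershell
# Sketch of the rebuild steps. Assumptions (not from the original article):
#   - run elevated on the node you keep, with the FailoverClusters module installed
#   - the other nodes accept PowerShell remoting (WinRM)
#   - Get-NodesToStop / Invoke-ForceQuorumRebuild are hypothetical helper names

function Get-NodesToStop {
    param(
        [Parameter(Mandatory)] [string[]] $AllNodes,
        [Parameter(Mandatory)] [string] $KeepNode
    )
    # Every node except the survivor; -ne compares strings case-insensitively
    $AllNodes | Where-Object { $_ -ne $KeepNode }
}

function Invoke-ForceQuorumRebuild {
    param(
        [Parameter(Mandatory)] [string[]] $AllNodes,
        [string] $KeepNode = $env:COMPUTERNAME,
        [int] $WaitSeconds = 30
    )
    Import-Module FailoverClusters -ErrorAction Stop
    $others = @(Get-NodesToStop -AllNodes $AllNodes -KeepNode $KeepNode)

    # Step 1: stop ClusSvc on every other node
    foreach ($node in $others) {
        Invoke-Command -ComputerName $node -ScriptBlock { Stop-Service -Name ClusSvc }
    }

    # Step 2: confirm the local service is still running
    $status = (Get-Service -Name ClusSvc).Status
    Write-Host "ClusSvc on $KeepNode is $status"

    # Step 3: force quorum from the kept node, then give it time to settle
    Start-ClusterNode -Name $KeepNode -ForceQuorum
    Start-Sleep -Seconds $WaitSeconds

    # Step 4: restart the other nodes one at a time (Invoke-Command waits for each)
    foreach ($node in $others) {
        Invoke-Command -ComputerName $node -ScriptBlock { Start-Service -Name ClusSvc }
    }

    # Step 5: every node should now report State = Up
    Get-ClusterNode | Select-Object Name, State
}
```

Usage would look like Invoke-ForceQuorumRebuild -AllNodes 'NODE1','NODE2' -KeepNode 'NODE1', with your own node names substituted. The node filtering lives in its own function so you can check which nodes will be stopped before anything actually touches the cluster.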
If the cluster still fails to come up fully, you may need to specify a node to act as the primary. Use this variant:
Start-ClusterNode -Name "YourNodeName" -ForceQuorum
Replace "YourNodeName" with the actual node name. This forces that node to take the lead and regenerate the log.
Why does this work?
The cluster log contains a checkpoint—a marker that says “the last consistent state was here.” When that checkpoint is missing or corrupt, the cluster service can’t figure out where to start replaying log entries. It panics and refuses to start.
By using -ForceQuorum, you’re telling the cluster: “Ignore the broken checkpoint. Start fresh from the quorum witness (or the surviving node’s copy).” The cluster service then rebuilds the checkpoint from the last stable state it can find. It’s like pressing a reset button on the log, but without losing your cluster roles or resources—assuming you have good quorum data.
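To make the checkpoint idea concrete, here is a toy model of the replay logic described above. It is an illustration only. The record shape (Lsn, Type, Data) is hypothetical and is not how Windows actually stores its cluster log. The model finds the most recent checkpoint and replays only the entries written after it. If no checkpoint exists, it refuses to continue, which mirrors the failure the error code reports.

```powershell
# Toy model only: hypothetical log records with Lsn (sequence number), Type, and Data.
# This illustrates checkpoint-based replay; it is NOT the real cluster log format.

function Get-ReplayStart {
    param([Parameter(Mandatory)] [object[]] $Log)
    # The most recent checkpoint marks the last consistent state
    $checkpoints = @($Log | Where-Object { $_.Type -eq 'Checkpoint' })
    if ($checkpoints.Count -eq 0) {
        # No marker means there is no safe place to start replaying from
        throw 'No checkpoint record found: cannot decide where to start replay'
    }
    ($checkpoints | Sort-Object -Property Lsn | Select-Object -Last 1).Lsn
}

function Invoke-LogReplay {
    param([Parameter(Mandatory)] [object[]] $Log)
    $start = Get-ReplayStart -Log $Log
    # Replay, in order, only the data records written after the checkpoint
    $Log |
        Where-Object { $_.Type -eq 'Data' -and $_.Lsn -gt $start } |
        Sort-Object -Property Lsn |
        ForEach-Object { $_.Data }
}
```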
Less common variations of this error
Sometimes this 0x13A8 error pops up in a different context. Here are two scenarios I’ve run into:
1. The checkpoint is physically damaged on disk
I had a client where a failing SSD caused the checkpoint file itself—C:\Windows\Cluster\CLUSTER.chk—to become unreadable. The error was the same, but the fix was different. In that case, you need to delete the corrupt checkpoint file on the affected node (after stopping the Cluster service) and let it regenerate from the healthy node:
Stop-Service ClusSvc
Remove-Item "C:\Windows\Cluster\CLUSTER.chk"
Start-Service ClusSvc
Start-ClusterNode -ForceQuorum
Do this only if you’re certain the other node has a clean copy. Otherwise, you risk losing cluster configuration.
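A more cautious variation on the Remove-Item step is to rename the suspect checkpoint file instead of deleting it. That way you can put it back if the other node turns out not to have a clean copy. This is a swapped-in technique, not the procedure above. The helper name Backup-ClusterCheckpointFile is hypothetical, and the default path is the one from the example.

```powershell
# Hypothetical helper: move the suspect checkpoint aside instead of deleting it.
# Run only after Stop-Service ClusSvc on the affected node.
function Backup-ClusterCheckpointFile {
    param(
        [string] $Path = 'C:\Windows\Cluster\CLUSTER.chk'
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        Write-Warning "No file at $Path; nothing to move."
        return $null
    }
    # Timestamped name so repeated runs never overwrite an earlier backup
    $stamp  = Get-Date -Format 'yyyyMMdd-HHmmss'
    $backup = "$Path.$stamp.bak"
    Move-Item -LiteralPath $Path -Destination $backup
    return $backup
}
```

On the affected node, you would run Stop-Service ClusSvc, then Backup-ClusterCheckpointFile, and then continue with Start-Service ClusSvc and Start-ClusterNode -ForceQuorum as before. Delete the .bak file only after the cluster has come back up cleanly.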
2. Mixed OS versions in the cluster
This is rare but real. If you’re running a Windows Server 2016 node alongside a Windows Server 2019 node, the checkpoint format can differ. The older node writes a checkpoint that the newer node can’t read—or vice versa. The error code is the same 0x13A8, but the log message often includes “incompatible version.” The fix is to upgrade the older node’s OS or remove it from the cluster.
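To check whether you're hitting the version mismatch, you can search the cluster log text for that message. Here is a sketch that assumes you have already generated the logs with Get-ClusterLog into a folder of your choosing (the folder below is hypothetical), and that the nodes are reachable enough for Get-ClusterLog to collect them. Find-ClusterLogMatch is a hypothetical helper name. The search text "incompatible version" comes from the article, and it's matched case-insensitively as a literal string.

```powershell
# Generate the logs first (destination folder is an example; create it beforehand):
#   Get-ClusterLog -Destination C:\Temp\ClusterLogs -TimeSpan 60
# Then search them:
#   Find-ClusterLogMatch -LogFolder C:\Temp\ClusterLogs

function Find-ClusterLogMatch {
    param(
        [Parameter(Mandatory)] [string] $LogFolder,
        [string] $Pattern = 'incompatible version'
    )
    # -SimpleMatch treats the pattern as literal text; Select-String ignores case by default
    Get-ChildItem -LiteralPath $LogFolder -Filter '*.log' -File |
        Select-String -Pattern $Pattern -SimpleMatch |
        Select-Object Filename, LineNumber, Line
}
```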
How to prevent this from happening again
You can’t completely bulletproof a cluster log, but you can reduce the odds of checkpoint corruption:
- Use a dedicated witness, such as a file share or cloud witness, rather than relying on the nodes alone for quorum. The witness casts a tie-breaking vote, so losing a single node doesn’t cost the cluster its quorum.
- Keep your nodes on the same Windows build. Run
Get-CimInstance Win32_OperatingSystem | Select-Object BuildNumber
on all nodes. They should match.
- Patch consistently. Cluster-related hotfixes often fix log handling. Don’t skip security patches on cluster nodes.
- Monitor disk health. Use
wmic diskdrive get status
(WMIC is deprecated on current Windows releases) or set up a simple scheduled task that checks for disk errors. A dying drive will bite you in the checkpoint file.
- Test your backups. Seriously. Restore your cluster configuration to a lab environment at least once a year. I’ve had to rebuild entire clusters because nobody knew the passwords or IPs.
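The first and third checks in that list are easy to script. Here is a sketch that collects each node's build number over PowerShell remoting (an assumption: WinRM must be enabled) and flags any mismatch. For the disk check it substitutes the Storage module's Get-PhysicalDisk in place of wmic. Test-ClusterBuildsMatch, Get-ClusterBuildReport, and Get-UnhealthyDisk are hypothetical helper names.

```powershell
# Hypothetical helpers for the prevention checklist.

function Test-ClusterBuildsMatch {
    param([Parameter(Mandatory)] [hashtable] $BuildByNode)
    # True when every node reports the same build number
    $distinct = @($BuildByNode.Values | Sort-Object -Unique)
    return $distinct.Count -le 1
}

function Get-ClusterBuildReport {
    param([Parameter(Mandatory)] [string[]] $Nodes)
    $report = @{}
    foreach ($node in $Nodes) {
        # Runs on the remote node; assumes PowerShell remoting is enabled there
        $report[$node] = Invoke-Command -ComputerName $node -ScriptBlock {
            (Get-CimInstance -ClassName Win32_OperatingSystem).BuildNumber
        }
    }
    return $report
}

function Get-UnhealthyDisk {
    # Lists local physical disks whose HealthStatus is anything other than Healthy
    Get-PhysicalDisk |
        Where-Object { $_.HealthStatus -ne 'Healthy' } |
        Select-Object FriendlyName, HealthStatus, OperationalStatus
}
```

For example, Test-ClusterBuildsMatch -BuildByNode (Get-ClusterBuildReport -Nodes 'NODE1','NODE2') returns $false if the builds differ. Get-UnhealthyDisk is a simple thing to run from the scheduled task the list mentions.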
If you get this error again after the fix, check the System event log for disk errors. A failing drive will corrupt the checkpoint over and over. Replace it before it takes down more than just the cluster log.
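Here's a sketch of that System event log check. It filters on the disk driver's event provider ('disk') for errors and warnings (levels 2 and 3) over a recent window, then counts occurrences per event ID so repeat offenders stand out. Get-DiskErrorEvent and Get-EventIdSummary are hypothetical helper names, and the seven-day default is an arbitrary choice.

```powershell
# Hypothetical helpers for checking the System log for disk problems.

function Get-DiskErrorEvent {
    param([int] $Days = 7)
    # Level 2 = Error, Level 3 = Warning; 'disk' is the disk driver's event provider.
    # SilentlyContinue suppresses the error Get-WinEvent raises when nothing matches.
    Get-WinEvent -FilterHashtable @{
        LogName      = 'System'
        ProviderName = 'disk'
        Level        = 2, 3
        StartTime    = (Get-Date).AddDays(-$Days)
    } -ErrorAction SilentlyContinue
}

function Get-EventIdSummary {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $EventList
    )
    # Count events per Id, most frequent first
    $EventList |
        Group-Object -Property Id |
        Select-Object @{ Name = 'Id'; Expression = { [int]$_.Name } },
                      @{ Name = 'Occurrences'; Expression = { $_.Count } } |
        Sort-Object -Property Occurrences -Descending
}
```

Run it on the affected node as Get-EventIdSummary -EventList @(Get-DiskErrorEvent -Days 30). If the same disk event keeps climbing, that's the drive to replace.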