When you automate your operational support and maintenance of Azure DevOps, such as updating the pre- and post-approvers of 2500 Azure Pipelines, or creating a detailed report of all Azure Pipelines in your Azure DevOps Organization, you may come across ad-hoc exceptions, "429 Too Many Requests Error", "503 Service Unavailable", or a "last time it 100% worked for sure with no issues" call for help.
Example 1 of an automation meltdown
Service Unavailable
Service Unavailable
HTTP Error 503. The service is unavailable.
Example 2 of an automation meltdown
Azure DevOps Services Unavailable
Azure DevOps Services
Sorry! Our services aren't available right now.
We're working to restore all services as quickly as
possible. Please check back soon.
To see the latest status on our services, please
visit our support page.
In both cases I ended up having to intervene manually and restart the maintenance job. Muda!
Transient fault
You probably triggered a throttling or circuit breaker pattern, which Azure DevOps uses to protect itself against excessive load or potential denial of service
attacks. If you do not deal with the transient fault you will end up with failed automation, wasted time, and manual intervention - more WASTE.
Dealing with transient fault
Here is a simple retry pattern that allows you to retry the operation after going to sleep for a while. You may have to play with and increase the $retryValue
default value, depending on the REST API you are calling.
1.13 has worked for me and my automation scripts to date.
Retry logic
$retryCount = 3
$retrySleep = 1.13
$retryCheck = $retryCount - 1
for ( $failureCount = 0; $failureCount -lt $retryCount; $failureCount++ )
{
try
{
# call Azure DevOps REST API
$result = Invoke-RestMethod -Uri $url -ContentType "application/json" -Headers $headers
# processing
# ...
}
catch
{
# retry logic
if ( $failureCount -eq $retryCheck )
{
# ensure that pipeline shows a warning
Write-Output "##vso[task.complete result=SucceededWithIssues;]SUCCEEDED WITH ISSUES"
break;
}
else
{
# Logging
Write-Output "Sleep and then retry processing loop. Count: " $failureCount " Sleep: " $retrySleep
Write-Output $_.Exception.Message
# retry
$retrySleep = $retrySleep * ( $failureCount + 1 );
Start-Sleep -Seconds $retrySleep
}
}
}
Simple, yet effective.
Read 3 ways to handle transient faults for DevOps for more details on transient faults.