I’ve got 99 problems but Patch Tuesday ain’t one – a dev’s guide to Patch Tuesday

by Hang Tian
As a Senior DevOps Engineer with 6+ years’ experience, I’m here to explain tricky technical concepts, break down new trends and share tips on working with cloud-native technologies.
Published in July 2020

Platform-as-a-Service, or PaaS, allows customers to focus on their code rather than on the continuous security updates and OS patches required for the underlying virtual machines and other resources.

However, these resources don’t magically update themselves.

What’s the Patch Tuesday issue?

Patch Tuesday is the name given to the Tuesday, usually the second of the month, on which Microsoft updates their platform. According to the Microsoft App Service team, these patches need to be completed within 10 days, so the exact day an update reaches a given web app can be hard to predict, especially across time zones and regions.
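To make the timing concrete, here is a minimal Python sketch that works out the second Tuesday of a month and checks whether a given day falls inside the 10-day rollout period mentioned above. The function names `patch_tuesday` and `in_patch_window` are our own, and treating the window as exactly 10 calendar days from Patch Tuesday is an assumption for illustration, not a rule published by Microsoft.

```python
from datetime import date, timedelta


def patch_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def in_patch_window(day: date, window_days: int = 10) -> bool:
    """True if `day` lies within `window_days` after that month's Patch Tuesday.

    Assumption: the rollout window is counted in calendar days and starts
    on Patch Tuesday itself. With the default of 10 days the window never
    spills into the following month, so only the current month is checked.
    """
    start = patch_tuesday(day.year, day.month)
    return start <= day <= start + timedelta(days=window_days)


if __name__ == "__main__":
    print(patch_tuesday(2020, 7))
    print(in_patch_window(date(2020, 7, 20)))
```

If an alert lands inside this window, a platform update is worth ruling out first; if it lands outside, other causes become more likely.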

These updates may require a web app restart, in which case quicker applications may see a barrage of blips from whichever web monitoring tool is being used, and slower, monolithic beasts may experience outages.

The question here is – how can you tell if the blips/temporary outages were caused by the updates? We’ll show you!

Our method for determining whether blips and outages were caused by updates 

  1. Log in to the Azure Portal and navigate to the App Service in question
  2. Click on Diagnose and Solve Problems, then Availability and Performance

a diagram

  3. Go to Web App Restarted. On the next screen, you’ll find a list of detected events, with additional info displayed if there were any Application Stop Events in the past 24 hours. Scroll to the bottom of the page and you’ll find an App Restart Timeline graph showing platform-related events in an easy-to-understand, visual manner.

a diagram

The above steps help you find Web App Restarted events. The Azure platform also recycles the VM instances under the hood of each web app. You’d probably notice this if your application takes a relatively long time to warm up: when Azure adds new VM instances into the pool to serve traffic from a cold start, alerts will likely be triggered by your website monitoring system.
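Once you have the restart times from the portal and the alert times from your monitoring tool, lining them up can be done by hand or with a few lines of code. The sketch below is only an illustration: `classify_alerts`, the 10-minute tolerance and the sample timestamps are all our own assumptions, and the times would need to be copied out of the portal and your monitoring system yourself.

```python
from datetime import datetime, timedelta


def classify_alerts(alerts, restarts, tolerance=timedelta(minutes=10)):
    """Label each alert by whether it follows a restart closely.

    alerts, restarts: lists of datetime objects in the same time zone.
    An alert is labelled "near-restart" if some restart happened at most
    `tolerance` before it, otherwise "unexplained".
    """
    labelled = []
    for alert in alerts:
        near = any(timedelta(0) <= alert - restart <= tolerance
                   for restart in restarts)
        labelled.append((alert, "near-restart" if near else "unexplained"))
    return labelled


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real incident.
    sample_restarts = [datetime(2020, 7, 15, 2, 0)]
    sample_alerts = [datetime(2020, 7, 15, 2, 5), datetime(2020, 7, 15, 14, 30)]
    for when, label in classify_alerts(sample_alerts, sample_restarts):
        print(when.isoformat(), label)
```

Alerts labelled "unexplained" are the ones worth investigating as possible code or dependency issues. A slow-starting application may need a larger tolerance.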

But how can you determine if your application’s performance issues were caused by the platform updates as opposed to genuine code issues? 

The first two steps are the same as those above, except that, instead of going to Web App Restarted, we navigate to High CPU Analysis.

Below is a graph of the VM Instance Recycle Events occurring in a 24-hour range in one of our web apps.

a diagram

In this web app, we scale out to 3 instances but, as you can see, 8 VM instances in total appeared over the 24 hours. The highlighted times are when new instances were introduced and old instances were removed from the VM pool: CPU started from 0 for each new instance and, for each old instance about to be removed, gradually went down to 0.
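The same reasoning can be applied to exported CPU data: if more distinct instances show up than you have scaled out to, the extras point to instances being recycled. The following is a rough sketch under our own assumptions. The `(timestamp, instance_id, cpu_percent)` sample format and both function names are hypothetical and do not match any particular Azure export.

```python
def instance_lifetimes(samples):
    """Map each instance ID to the (first_seen, last_seen) timestamps.

    samples: iterable of (timestamp, instance_id, cpu_percent) tuples.
    Timestamps only need to be comparable (datetimes, or hours as ints).
    """
    seen = {}
    for timestamp, instance_id, _cpu in samples:
        first, last = seen.get(instance_id, (timestamp, timestamp))
        seen[instance_id] = (min(first, timestamp), max(last, timestamp))
    return seen


def extra_instances(samples, scaled_out_to):
    """Number of distinct instances seen beyond the configured count."""
    return max(0, len(instance_lifetimes(samples)) - scaled_out_to)


if __name__ == "__main__":
    # Hypothetical samples: hour of day, instance ID, CPU percentage.
    samples = [
        (0, "vm-a", 35.0), (0, "vm-b", 40.0), (0, "vm-c", 38.0),
        (6, "vm-a", 5.0), (6, "vm-d", 0.0),   # vm-a draining, vm-d starting
        (7, "vm-b", 41.0), (7, "vm-c", 37.0), (7, "vm-d", 30.0),
    ]
    print(instance_lifetimes(samples))
    print(extra_instances(samples, scaled_out_to=3))
```

Applied to the example above, 8 instances seen against 3 configured would mean 5 extra instances over the 24 hours, and the first-seen and last-seen times show when each swap took place.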

Being able to determine whether Azure platform updates were the cause of application performance issues, or even unavailability, helped us immensely in taking the correct actions to mitigate the problems.

A real-world example of the problems updates can cause and how to handle them

A few weeks ago, we kept getting intermittent up-time alerts from one of our client’s websites.

This client is a world-leading provider of a specialist eCommerce product. As you can imagine, the impact of the website’s intermittent performance issues could have been significant.

Our team noticed Azure platform update events when we received the first few alerts, but the alerts kept coming. Within a few hours, we reached out to Microsoft, and the response indicated that issues with their underlying infrastructure were causing repeated platform updates, which in turn resulted in constant web app restarts.

To keep the site stable, we decided to divert all the traffic to the web apps in another region – where there were no platform problems – until Microsoft resolved their platform incidents.   

How we can help

If you’re reading this blog, it’s likely you’ve run into the same Patch-Tuesday problems we have, and though we’ve given you some great ideas for overcoming these, it would be even simpler to just let us handle that for you!

We’ve carried out Azure projects for the world’s leading provider of insulation, Kingspan, global designed surface company Formica, and Translink, the primary rail provider in Northern Ireland. To find out how we could help you with your Azure needs, just get in touch. 

 

