The Art of Release Management: Keys to a Seamless Rollout

Overview

A little taste of philosophy: just as life is unpredictable, so are software releases. No matter how much time and energy we invest in planning a release, things can go wrong unexpectedly, leaving us (the software team and the business) puzzled.

In this blog, I will walk you through:

  1. Cures: the actions (or reactions!) to take from the first sign of a software release gone haywire, broken down by role within the software team.
  2. Preventions: a framework I devised after being part of numerous release hiccups, which eventually led me to rethink and correct the methodology for executing smoother releases.

Software release hiccups: cures

Production issues are painful. They drain energy and affect the software team and, eventually, the business at different levels.

No system has ever been built foolproof, and there will always be occasions when things go wrong. 

“It's not what happens to you but how you react to it that matters.”

- Epictetus

I have broken down the cures for a software release gone wrong into three phases: 

1: Discovery phase

Getting into the right mindset

Just after the release, you start receiving alerts or user complaints about issues with accessing the application.

This is the trickiest phase of them all. When a release goes wrong, it is a basic human instinct to look for someone to blame or to get defensive. But remember, the user is always right.

This is the time to accept that there is indeed a problem with the application.

Keeping the focus on the problem that needs to be resolved leads to a quicker and more efficient resolution.

As a Business Analyst/Product/Project Manager, you can:

Handle the communications:

  • Keep the stakeholders updated at all stages of problem-solving
  • Send out emails and initiate the root cause analysis [RCA]
  • Make product-level executive decisions [rollback, feature flags, etc.]

As an engineer, you can:

  • Check the logs, because logs don’t lie
  • If the log data is insufficient, investigate at the code level

As a QA, you can:

  • Replicate the issue (obviously!)
  • See what test cases missed the scenario and why
    • Was it an edge case?
    • Was it an environment-specific issue?

Even though I have listed separate actions per role above, most of them are interchangeable. More eyes and ears help the team recover swiftly from a bad release.

2: Mitigation phase

Finding the most efficient solutions to the problem at hand

Once you have discovered the whys and whats of the problem, it is time to move on to the how. This is a crucial phase: the clock is ticking and the business is hurting. Everyone expects a resolution, and sooner rather than later.

As a Business Analyst/Product/Project Manager, you can:

  • Hold team sessions to come up with the best possible solutions.
  • Having multiple solutions helps to gauge the trade-offs and make a wiser decision.
  • PMs can help with making logical business decisions and analyzing the impact from the business POV.
  • Communicate the solutions and trade-offs to stakeholders, if needed, to gain visibility into their thinking.

As an engineer, you can:

  • Weigh technical feasibility against complexity, in terms of time and code repercussions, to support the decision on the solution.
  • Raise red flags upfront, keeping in mind which parts of the current problem could recur.
  • Avoid quick fixes as much as possible, even when there is pressure to get a solution in place.

As a QA, you can:

  • Focus on what might break with the proposed solution. 
  • Make sure to run the test cases or modify the existing ones to accommodate the new changes.
  • Replicate the final environment and scenarios in the sandbox as much as possible.

3: Follow-ups and tollgates

Stop, check and go 

Tollgates help us identify slippages and seal them tight for the future. Every phase of a software release brings new learnings, and it is mostly about adapting and course-correcting: taking the best course of action as a team, for the team.

The following are some of the tollgates within the release process:

Unit Tests

  • Are all the external dependencies accounted for within the test scenarios?
  • Was the root-cause scenario never considered, and therefore never tested?
  • Was velocity prioritized to the point that unit tests were partly skipped?
  • Avoid the world of quick fixes and workarounds as much as possible.

User Acceptance Testing [UAT]

  • Is the sandbox environment different from the actual live environment?
    • Keep server configurations similar so that we are not greeted by surprises after a release.
  • User error
    • Some issues may have slipped through due to human error.
  • Data quality issue
    • When the type of data in the sandbox differs from the data in the live environment, issues go uncaught in the sandbox.
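
As a rough illustration of the environment check above, the sketch below compares two configuration dictionaries and reports every setting that differs. The function name config_drift and all the settings shown are hypothetical; in practice the values would come from wherever your environments keep their configuration.

```python
from typing import Any


def config_drift(sandbox: dict[str, Any], live: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return every key whose value differs between the two environments.

    Each entry maps the key to a (sandbox_value, live_value) pair. A key
    missing from one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(sandbox.keys() | live.keys()):
        sandbox_value = sandbox.get(key)
        live_value = live.get(key)
        if sandbox_value != live_value:
            drift[key] = (sandbox_value, live_value)
    return drift


# Hypothetical settings, for illustration only.
sandbox_config = {"db_pool_size": 5, "cache_enabled": False, "timeout_seconds": 30}
live_config = {"db_pool_size": 50, "cache_enabled": True, "timeout_seconds": 30, "cdn": "on"}

for key, (sandbox_value, live_value) in config_drift(sandbox_config, live_config).items():
    print(f"{key}: sandbox={sandbox_value!r} live={live_value!r}")
```

Running a check like this before UAT sign-off makes the differences visible, so the team can decide which ones are intentional and which ones are surprises waiting to happen.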

Software release hiccups: Preventions

Prevention is better than cure; yes, for sure, that sounds cool! 

Now that we have seen how to tackle releases gone wild, let me take you through the prevention part of the process.

Though we understand the importance of having processes and tools that set us up for a smoother release, that importance is only highlighted when a release goes wrong. That’s when the checklists get their spotlight, along with how closely the team needs to adhere to its set processes.

The following is not a checklist, per se, but a framework to help us identify problems early in the software release and minimize them to some degree.

The D.I.A.P.E.R Framework

So that you don’t have to do a clean-up later!

This is essentially a set of six activities that should be in place as you design your software.

Design

This is not about the UI/UX of the app; it relates to how the application logs should be designed and maintained.

Structured logs

  • Keep logs in a readable and consistent format so that they can be monitored for errors.
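
One common way to get a consistent format is to emit each log entry as a single JSON object. The sketch below does this with Python's standard logging module. The JsonFormatter class, the build_logger helper, the "context" field, and the example values are all assumptions made for illustration, not a prescribed schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical field supplied by callers via `extra=`.
        context = getattr(record, "context", None)
        if context is not None:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger("checkout")
logger.error("payment failed", extra={"context": {"order_id": "A-1001", "release": "2.4.0"}})
```

Because every entry has the same keys, a monitoring tool can filter on "level" or "context" without parsing free-form text.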

Centralized logging

  • Keep logs in one place, accessible to all the devs, where they can be queried easily for advanced metrics.
  • This removes the dependency on specific people within the team. Not everyone needs the logs, but having multiple people with access to them helps the team.
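
To give a feel for what "queried easily" can mean, here is a small sketch that counts error entries per service from JSON-lines log data. It assumes each line carries hypothetical "service" and "level" fields; a real centralized logging tool would offer its own query language for the same kind of question.

```python
import json
from collections import Counter
from typing import Iterable


def errors_per_service(log_lines: Iterable[str]) -> Counter:
    """Count ERROR-level entries per service in JSON-lines log data."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured
        if isinstance(entry, dict) and entry.get("level") == "ERROR":
            counts[entry.get("service", "unknown")] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample = [
    '{"service": "checkout", "level": "ERROR", "message": "payment failed"}',
    '{"service": "checkout", "level": "INFO", "message": "order placed"}',
    '{"service": "search", "level": "ERROR", "message": "timeout"}',
    '{"service": "checkout", "level": "ERROR", "message": "payment failed"}',
]
print(errors_per_service(sample).most_common())
```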

Invest

  • Invest time in setting up processes
    • Software development
    • Release process/checklist
    • QA/UAT sign-offs
  • Invest money in getting the right tools to cater to the team’s needs
    • Monitoring
    • Alerting
    • Task management

Alerts

Setting up an alert mechanism automatically raises flags for the team. Not everyone needs to be on these alerts, so make a logical decision about who would benefit from the alerts system.

  • Set up alerts
    • Email
    • Incident management software
  • Identify stakeholders who need to receive these alerts
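
A minimal sketch of such a mechanism, assuming a simple rule: notify once when the share of failed requests in a recent window crosses a threshold. The ErrorRateAlert class, the threshold, and the notify hook are hypothetical; in a real setup the hook would hand off to email or your incident management software.

```python
from collections import deque
from typing import Callable


class ErrorRateAlert:
    """Notify once when the recent error rate crosses a threshold."""

    def __init__(self, threshold: float, window_size: int, notify: Callable[[str], None]) -> None:
        self.threshold = threshold      # e.g. 0.5 means half of recent requests failed
        self.window_size = window_size  # number of most recent requests to consider
        self.notify = notify            # hypothetical hook: email, pager, chat...
        self._outcomes: deque = deque(maxlen=window_size)
        self._firing = False

    def record(self, failed: bool) -> None:
        """Record one request outcome and notify if the threshold is crossed."""
        self._outcomes.append(failed)
        if len(self._outcomes) < self.window_size:
            return  # not enough data yet
        rate = sum(self._outcomes) / len(self._outcomes)
        if rate >= self.threshold and not self._firing:
            self._firing = True
            self.notify(f"error rate {rate:.0%} over last {self.window_size} requests")
        elif rate < self.threshold:
            self._firing = False  # recovered; allow the alert to fire again later


# Collect messages in a list here; a real hook would send them to the chosen recipients.
sent: list[str] = []
alert = ErrorRateAlert(threshold=0.5, window_size=4, notify=sent.append)
for failed in [False, True, True, True, True]:
    alert.record(failed)
print(sent)
```

The alert fires once per incident rather than on every failing request, which keeps the people on the receiving end from being flooded.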

Prepare

  • Define strategies: decide who takes action when things go wrong. This helps avoid chaotic situations, and the rest of the team can work on the solution instead.
    • Ex: Identify color codes for different severities (just like we have in hospitals)
  • Plan of action for each severity
    • Not all situations are as severe as we think. Hence, it is important to define what action is needed for each severity.
  • Ops and dev teams should be tightly intertwined.
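
The color codes and plans of action can be written down as data so that nobody has to improvise during an incident. The sketch below is one possible shape; the three colors, the owners, the response times, and the first steps are all made-up placeholders to be replaced with whatever your team agrees on.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    RED = "red"      # hypothetical: outage or data loss
    AMBER = "amber"  # hypothetical: degraded, but a workaround exists
    GREEN = "green"  # hypothetical: cosmetic or low impact


@dataclass(frozen=True)
class ActionPlan:
    owner: str
    respond_within_minutes: int
    first_step: str


# Placeholder playbook; every value here is an assumption for illustration.
PLAYBOOK = {
    Severity.RED: ActionPlan("on-call engineer", 15, "roll back the release"),
    Severity.AMBER: ActionPlan("feature team lead", 120, "disable the feature flag"),
    Severity.GREEN: ActionPlan("product manager", 1440, "add to the next sprint"),
}


def plan_for(severity: Severity) -> ActionPlan:
    """Look up who acts, how fast, and what they do first."""
    return PLAYBOOK[severity]


plan = plan_for(Severity.AMBER)
print(f"{plan.owner} responds within {plan.respond_within_minutes} min: {plan.first_step}")
```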

Evaluate

Whenever we see a problem, we usually tend to jump to solutions. In my experience, it has always helped me to take some time and identify the answers to the following: 

  • What is the issue?
    • The focus: the problem
  • How severe?
    • The severity level, as mentioned in the previous step
  • Who needs to be involved?
    • Not everyone within the team needs to be involved immediately to fix the problem; identifying who needs to be involved saves time for the rest of us.

Resolve

There is a problem at hand, and the business and stakeholders expect a solution. As previously mentioned, keeping a cool head in this phase is of utmost importance.

  • Propose the best possible solution based on
    • Technical feasibility
    • Time
    • Cost
    • Business impact

Always have multiple solutions to gauge the trade-offs; some take less time but involve rework in the future. Make a logical decision based on the application and the nature of the problem.
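
One way to make the trade-offs explicit is a simple weighted score across the four criteria. This is only a sketch: the weights, the 1-to-5 scale (higher is better, so a faster or cheaper option scores higher on time or cost), and the three candidate options are all invented for illustration, and the numbers should support the team's judgment rather than replace it.

```python
from dataclasses import dataclass

# Hypothetical weights per criterion; adjust to what matters for your application.
WEIGHTS = {"feasibility": 0.3, "time": 0.3, "cost": 0.2, "business_impact": 0.2}


@dataclass
class Option:
    name: str
    scores: dict[str, int]  # criterion -> score from 1 (worst) to 5 (best)


def weighted_score(option: Option) -> float:
    """Combine the per-criterion scores using the weights above."""
    return sum(WEIGHTS[criterion] * option.scores[criterion] for criterion in WEIGHTS)


def rank(options: list[Option]) -> list[Option]:
    """Return the options ordered from highest to lowest weighted score."""
    return sorted(options, key=weighted_score, reverse=True)


# Made-up candidates for a made-up incident.
candidates = [
    Option("hotfix", {"feasibility": 5, "time": 5, "cost": 4, "business_impact": 2}),
    Option("rollback", {"feasibility": 5, "time": 5, "cost": 5, "business_impact": 3}),
    Option("proper fix", {"feasibility": 3, "time": 2, "cost": 2, "business_impact": 5}),
]

for option in rank(candidates):
    print(f"{option.name}: {weighted_score(option):.1f}")
```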

Takeaways

  • In the discovery phase, keep the focus on the problem.
  • Keep communication with the stakeholders crisp, making them aware of the severity of the problem and assuring them of a steady solution.
  • In the mitigation phase, identify who needs to be involved in the problem resolution.
  • Come up with multiple solutions and pick the most logical and efficient one out of the lot.
  • Have tollgates in place to catch slippages at multiple levels.
  • D.I.A.P.E.R framework
    • Design structured and centralized logs.
    • Invest time in setting up the process and invest money in getting the right tools for the team.
    • Alerts: Have a notification system in place that raises flags when things go beyond a certain benchmark.
    • Prepare strategies for different severity levels and assign color codes for the course of action for each level of threat.
    • Evaluate the problem and the action via the who, what, and how.
    • Resolve the problem in a way that is cost- and time-efficient and aligns with the business goals/needs.

Remember that we are building software for people, with the help of the people within the team. Things go wrong even in the most elite systems with sophisticated setups.

Do not be harsh on yourself or others within the team. Adapt, learn, and keep shipping!
