The Art of Release Management: Keys to a Seamless Rollout

Bharat Barve

Product Management

Tags:

software development

release management

diaper framework

Overview

A little taste of philosophy: just as life is unpredictable, so are software releases. No matter how much time and energy goes into planning a release, things can go wrong unexpectedly, leaving us (the software team and the business) puzzled.

In this blog, I will walk you through:

Cures: the actions (or reactions!) to take from the first sign of a software release gone haywire, broken down by role in the software team.
Preventions: a framework I devised after being part of numerous release hiccups, which eventually led me to rethink and correct the way we execute releases.

Software release hiccups: cures

Production issues are painful. They drain energy and affect the software team and, eventually, the business on several levels.

No system is foolproof, and there will always be occasions when things go wrong.

“It's not what happens to you but how you react to it that matters.”

- Epictetus

I have broken down the cures for a software release gone wrong into three phases:

1: Discovery phase

Getting into the right mindset

Just after the release, you start receiving alerts or complaints from users about issues they are facing with the application.

This is the trickiest phase of them all. When a release goes wrong, it is a basic human instinct to look for someone to blame or to get defensive. But remember, the user is always right.

This is the time to accept that there is indeed a problem with the application.

Keeping the focus on the problem that needs to be resolved leads to a quicker and more efficient resolution.

As a Business Analyst/Product/Project Manager, you can:

Handle the communications:

Keep the stakeholders updated at every stage of problem-solving
Send emails and initiate the root cause analysis [RCA]
Make product-level executive decisions [rollback, feature flags, etc.]

As an engineer, you can:

Check the logs, because logs don’t lie
If the log data is insufficient, investigate at the code level

As a QA, you can:

Replicate the issue (obviously!)
See which test cases missed the scenario and why
Was it an edge case?
Was it an environment-specific issue?

Although I have listed the actions per role, most of them are interchangeable. More eyes and ears help with a swift recovery from a bad release.

2: Mitigation phase

Finding the most efficient solutions to the problem at hand

Once you have discovered the whys and whats of the problem, it is time to move on to the how. This is a crucial phase: the clock is ticking and the business is hurting. Everyone expects a resolution, and soon.

As a Business Analyst/Product/Project Manager, you can:

Hold one or more team sessions to come up with the best possible solutions.
Having multiple solutions helps to gauge the trade-offs and make a wiser decision.
PMs can help with making logical business decisions and analyzing the impact from the business point of view.
If needed, communicate the solutions and trade-offs to stakeholders to get visibility into their priorities and concerns.

As an engineer, you can:

Weigh technical feasibility against complexity, in terms of time and code repercussions, to help decide on a solution.
Raise red flags up front, keeping in mind which parts of the current problem must be addressed to avoid a recurrence.
Avoid quick fixes as much as possible, even when there is pressure to get a solution in place.

As a QA, you can:

Focus on what might break with the proposed solution.
Run the test cases, modifying the existing ones where needed to accommodate the new changes.
Replicate the final environment and scenarios in the sandbox as much as possible.

3: Follow-ups and tollgates

Stop, check and go

Tollgates help us identify slippages and seal them tight for the future. Every phase of a software release brings new learnings; it is mostly about adapting and course-correcting, and taking the best course of action as a team, for the team.

The following are some of the tollgates within the release process:

Unit Tests

Are all the external dependencies accounted for within the test scenarios?
Was the root-cause scenario never considered, and so never tested?
Was the team moving too fast, so that unit tests were partly neglected?
Avoid the world of quick fixes and workarounds as much as possible.
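
To make the first question concrete, here is a minimal Python sketch of unit tests that account for an external dependency, including the case where it is down. The function `fetch_exchange_rate` and the client's `get_rate` method are hypothetical names invented for this illustration; only `unittest` and `unittest.mock` are real (standard library) modules.

```python
import unittest
from unittest.mock import Mock


def fetch_exchange_rate(client, currency):
    """Return the rate from a (hypothetical) external client, or None if it is unreachable."""
    try:
        return client.get_rate(currency)
    except ConnectionError:
        # The dependency is down: degrade gracefully instead of crashing.
        return None


class FetchExchangeRateTest(unittest.TestCase):
    def test_returns_rate_when_dependency_is_healthy(self):
        client = Mock()
        client.get_rate.return_value = 1.25
        self.assertEqual(fetch_exchange_rate(client, "EUR"), 1.25)
        client.get_rate.assert_called_once_with("EUR")

    def test_returns_none_when_dependency_is_down(self):
        # Simulate the failure mode that is easy to forget in happy-path tests.
        client = Mock()
        client.get_rate.side_effect = ConnectionError("service unavailable")
        self.assertIsNone(fetch_exchange_rate(client, "EUR"))
```

Saved in a test module, these can be run with `python -m unittest`. The second test is the one that tends to be skipped when velocity is high, and it is often the one that would have caught the production issue.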

User Acceptance Testing [UAT]

Environment mismatch: is the sandbox environment different from the actual live environment? Keep server configurations similar so that we are not greeted by surprises after a release.
User error: some issues may have slipped through due to human error.
Data quality: the data in the sandbox differs from the data in the live environment, so issues go uncaught in the sandbox.
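
One way to spot environment mismatches before a release is to compare the two configurations directly. The following is a rough sketch, assuming each environment's settings can be loaded into a plain dictionary; the setting names and values are made up for illustration.

```python
def diff_configs(sandbox, live, ignore=()):
    """Return {key: (sandbox_value, live_value)} for every setting that differs."""
    differences = {}
    for key in sorted(set(sandbox) | set(live)):
        if key in ignore:
            continue  # settings that are expected to differ, such as hostnames
        sandbox_value = sandbox.get(key, "<missing>")
        live_value = live.get(key, "<missing>")
        if sandbox_value != live_value:
            differences[key] = (sandbox_value, live_value)
    return differences


# Hypothetical settings for the two environments.
sandbox = {"db_pool_size": 5, "cache_ttl_seconds": 60, "hostname": "sandbox-01"}
live = {"db_pool_size": 50, "cache_ttl_seconds": 60, "hostname": "live-01", "rate_limit": 100}

for key, (sandbox_value, live_value) in diff_configs(sandbox, live, ignore={"hostname"}).items():
    print(f"{key}: sandbox={sandbox_value!r} live={live_value!r}")
```

A report like this will not catch data quality differences, but it makes configuration drift visible during UAT rather than after the rollout.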

Software release hiccups: Preventions

Prevention is better than cure; yes, for sure, that sounds cool!

Now that we have seen how to tackle releases gone wild, let me take you through the prevention part of the process.

Though we understand the importance of having processes and tools that set us up for a smoother release, it only gets highlighted when a release goes badly. That is when checklists get their spotlight, along with the need for the team to adhere to its agreed processes.

Well, the following is not a checklist, per se, but a framework for us to identify the problems early in the software release and minimize them to some degree.

The D.I.A.P.E.R Framework

So that you don’t have to do a clean-up later!

This is essentially a set of six activities that should be in place as you design your software.

Design

This is not about the UI/UX of the app; it relates to how the application logs should be designed and maintained.

Structured logs

Logs in a readable, consistent format that can be monitored for errors.

Centralized logging

Logs kept in one place and accessible to all the devs, so that they can be queried easily for advanced metrics.
This removes the dependency on specific people within the team. Not everyone needs the logs, but having several people with access to them helps the team.
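
As an illustration of structured logs, here is a minimal sketch using Python's standard `logging` module to emit one JSON object per line. The `context` key passed through `extra`, the logger name, and the field values are assumptions made for this example, not a prescribed schema; shipping the output to a central store is left to whichever tooling the team picks.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"context": {...}} for searchable fields.
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.error(
    "payment failed",
    extra={"context": {"release": "2024.05.1", "order_id": "A-1001"}},  # made-up values
)
```

Because every line has the same keys, a centralized log store can filter by `level` or `release` without anyone having to remember a custom text format.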

Invest

Invest time in setting up processes:
Software development
Release process/checklist
QA/UAT sign-offs
Invest money in the right tools for the team's needs:
Monitoring
Alerting
Task management

Alerts

Setting up an alert mechanism raises flags for the team automatically. Not everyone needs to be on these alerts, so make a deliberate decision about who would benefit from them.

Set up alerts:
Email
Incident management software
Identify the stakeholders who need to receive these alerts
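
The idea of routing each alert only to the people who need it can be sketched as a small rule table. This is an illustrative sketch, not a real alerting tool: the metric names, thresholds, recipients, and channel labels are all hypothetical, and actual delivery (email, incident management software) is left out.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str
    threshold: float
    recipients: tuple  # who should hear about a breach
    channel: str       # e.g. "email" or "incident-tool"


def evaluate_rules(rules, metrics):
    """Return (channel, recipients, message) for every rule whose threshold is exceeded."""
    notifications = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            message = f"{rule.metric}={value} exceeded threshold {rule.threshold}"
            notifications.append((rule.channel, rule.recipients, message))
    return notifications


# Hypothetical rules: only the on-call engineer is paged for errors,
# while latency alerts also reach the product manager by email.
rules = [
    AlertRule("error_rate_percent", 2.0, ("oncall-engineer",), "incident-tool"),
    AlertRule("p95_latency_ms", 800, ("oncall-engineer", "product-manager"), "email"),
]

current = {"error_rate_percent": 4.5, "p95_latency_ms": 310}
for channel, recipients, message in evaluate_rules(rules, current):
    print(f"[{channel}] to {', '.join(recipients)}: {message}")
```

Keeping recipients on the rule itself makes the "who needs to know" decision explicit and reviewable, instead of leaving it to a shared mailing list.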

Prepare

Define strategies: decide who takes action when things go wrong. This helps avoid chaotic situations and lets the rest of the team focus on the solution.
Example: assign color codes to different severities (just like hospitals do).
Plan of action for each severity: not all situations are as severe as we think, so it is important to define what action is needed at each severity level.
Ops and dev teams should work closely together.
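
A severity playbook like this can live in code or configuration so that nobody has to improvise during an incident. The sketch below is one possible shape; the three colors, their meanings, the owners, and the actions are assumptions chosen for illustration and should be replaced with whatever the team agrees on.

```python
from enum import Enum


class Severity(Enum):
    RED = "red"      # assumed meaning: the application is down for most users
    AMBER = "amber"  # assumed meaning: a key feature is degraded
    GREEN = "green"  # assumed meaning: minor or cosmetic issue


# Hypothetical plan of action per severity; adjust owners and actions to your team.
PLAYBOOK = {
    Severity.RED: {
        "owner": "on-call engineer",
        "action": "roll back the release and open an incident",
        "notify": ["product manager", "stakeholders"],
    },
    Severity.AMBER: {
        "owner": "feature team",
        "action": "disable the feature flag and schedule a fix",
        "notify": ["product manager"],
    },
    Severity.GREEN: {
        "owner": "feature team",
        "action": "log a ticket for the next release",
        "notify": [],
    },
}


def plan_for(severity):
    """Look up who acts, what they do, and who is told for a given severity."""
    return PLAYBOOK[severity]


plan = plan_for(Severity.AMBER)
print(f"{plan['owner']}: {plan['action']} (notify: {', '.join(plan['notify']) or 'nobody'})")
```

Using an enum means a typo in a severity name fails loudly instead of silently skipping the plan.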

Evaluate

Whenever we see a problem, we usually tend to jump to solutions. In my experience, it has always helped me to take some time and identify the answers to the following:

What is the issue? Keep the focus on the problem.
How severe is it? Use the severity levels defined in the previous step.
Who needs to be involved? Not everyone on the team needs to be involved immediately to fix the problem; identifying who does saves time for the rest of us.

Resolve

There is a problem at hand, and the business and stakeholders expect a solution. As previously mentioned, keeping a cool head in this phase is of utmost importance.

Propose the best possible solution based on:
Technical feasibility
Time
Cost
Business impact

Always have multiple solutions to gauge the trade-offs; some take less time but involve rework in the future. Make a logical decision based on the application and the nature of the problem.
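
One lightweight way to compare candidate solutions against the four criteria above is a weighted score. This is only a sketch under stated assumptions: the weights, the 1 to 5 scoring scale (higher is better), and the example options are invented for illustration, and the ranking is an input to the team's decision rather than the decision itself.

```python
# Assumed weights for the four criteria; they should sum to 1.
WEIGHTS = {"feasibility": 0.3, "time": 0.3, "cost": 0.2, "business_impact": 0.2}


def score(option):
    """Weighted sum of an option's 1-5 scores (higher is better)."""
    return sum(WEIGHTS[criterion] * option["scores"][criterion] for criterion in WEIGHTS)


def rank_options(options):
    """Return the options sorted from highest to lowest weighted score."""
    return sorted(options, key=score, reverse=True)


# Hypothetical candidate solutions with made-up scores.
options = [
    {"name": "roll back", "scores": {"feasibility": 5, "time": 5, "cost": 4, "business_impact": 2}},
    {"name": "hotfix", "scores": {"feasibility": 3, "time": 4, "cost": 3, "business_impact": 4}},
    {"name": "proper fix", "scores": {"feasibility": 4, "time": 1, "cost": 2, "business_impact": 5}},
]

for option in rank_options(options):
    print(f"{option['name']}: {score(option):.2f}")
```

Writing the scores down also documents the trade-offs, which is useful later for the root cause analysis.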

Takeaways

In the discovery phase, keep the focus on the problem.
Keep communication with stakeholders crisp: make them aware of the severity of the problem and assure them that a solution is on the way.
In the mitigation phase, identify who needs to be involved in resolving the problem.
Come up with multiple solutions and pick the most logical and efficient one.
Have tollgates in place to catch slippages at multiple levels.
D.I.A.P.E.R framework:
Design structured and centralized logs.
Invest time in setting up processes and money in getting the right tools for the team.
Alerts: have a notification system in place that raises flags when things go beyond a certain threshold.
Prepare strategies for different severity levels and assign color codes to the course of action for each level.
Evaluate the problem and the response: what, how severe, and who.
Resolve the problem in a way that is cost- and time-efficient and aligns with business goals and needs.

Remember that we are building software for people, with the help of people. Things go wrong even in the most elite systems with sophisticated setups.

Do not be harsh on yourself or on others in the team. Adapt, learn, and keep shipping!


