How to Avoid Screwing Up CI/CD: Best Practices for DevOps Team

Basic fundamentals (one-line definitions):

CI/CD stands for continuous integration and continuous delivery and/or continuous deployment.

Continuous Integration: 

Continuous integration is the practice of merging each developer’s changes back to the main branch as soon as possible to avoid integration challenges.

Continuous Delivery:

Continuous delivery is the ability to get changes of all types deployed to production, or delivered to the customer, in a safe, quick, and sustainable way.

[Figure: An oversimplified CI/CD pipeline]

Why CI/CD?

  • Avoid integration hell

In most modern application development scenarios, multiple developers work on different features simultaneously. If all of that source code is merged on the same day, the result can be a tedious, manual process of resolving conflicts between branches, along with a lot of rework.

Continuous integration (CI) is the process of merging code changes frequently (daily, or even several times a day) into a shared branch (also known as the master or trunk branch). Because each merge is small, bugs are easier and quicker to identify, which saves a lot of developer time and effort.

  • Faster time to market

Less time is spent on solving integration problems and reworking, allowing faster time to market for products.

  • Better and more reliable code

The changes are small and thus easier to test. Each change goes through a rigorous cycle of unit tests, integration/regression tests, and performance tests before being pushed to prod, ensuring better-quality code.

  • Lower costs

Faster time to market and fewer integration problems save a lot of developer time and development cycles, leading to a lower cost of development.

Enough theory. Let’s dive into “How do I get started?”

Basic Overview of CI/CD

Decide on your branching strategy

A good branching strategy should have the following characteristics:

  • Defines a clear development process from initial commit to production deployment
  • Enables parallel development
  • Optimizes developer productivity
  • Enables faster time to market for products and services
  • Facilitates integration with all DevOps practices and tools, such as different version control systems

Types of branching strategies (please refer to references for more details):

  • Git flow – Ideal when handling multiple versions of the production code and for enterprise customers who have to adhere to release plans and workflows
  • Trunk-based development – Ideal for simpler workflows when automated testing is available, leading to faster development
  • Other branching strategies that you can read about are GitHub flow, GitLab flow, and forking flow.

Build or compile your code 

The next step is to build/compile your code, and if it is interpreted code, go ahead and package it.

Build best practices:

  • Build once: build a single artifact and promote it through every environment. Rebuilding the artifact separately for each environment is inadvisable.
  • Use exact versions of third-party dependencies.
  • Remove libraries used only for debugging and similar purposes from the product package.
  • Have a feedback loop so that the team is made aware of the status of the build step.
  • Make sure your builds are versioned correctly using SemVer 2.0.0 (https://semver.org/).
  • Commit early, commit often.
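
One way to enforce the "exact versions" practice is a small check that runs in the build step. The sketch below is illustrative only: it assumes a pip-style requirements file, and the function name find_unpinned is hypothetical. It deliberately ignores less common syntax such as environment markers and URL requirements.

```python
import re

# Matches "name==1.2.3" style pins, optionally with extras such as "pkg[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9.!+_-]+$")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements content used only for illustration.
sample = """
requests==2.31.0
flask>=2.0        # a range, not an exact pin
pyyaml
"""
print(find_unpinned(sample))  # → ['flask>=2.0', 'pyyaml']
```

A pipeline could fail the build whenever this list is non-empty, which gives the team the feedback loop mentioned above.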

Select a tool for stitching the pipeline together

  • You can choose from GitHub Actions, Jenkins, CircleCI, GitLab CI/CD, etc.
  • Tool selection will not affect the quality of your CI/CD pipeline, but it does affect maintenance: self-hosted services such as Jenkins deployed on-prem typically need more upkeep than managed CI/CD services.

Tools and strategy for SAST

Instead of just DevOps, we should think of DevSecOps. To make the code more secure and reliable, we can introduce a step for SAST (static application security testing).

SAST, or static analysis, is a testing procedure that analyzes source code to find security vulnerabilities. SAST scans the application code before it is compiled. It is also known as white-box testing, and it helps shift towards a security-first mindset because the code is scanned right at the start of the SDLC.

Problems SAST solves:

  • SAST tools give developers real-time feedback as they code, helping them fix issues before they pass the code to the next phase of the SDLC. 
  • This prevents security-related issues from being considered an afterthought. 
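
To make the idea of scanning source without running it concrete, here is a toy static check written with Python's standard ast module. It is a sketch of the principle, not a substitute for a real SAST tool. The rule set (flagging direct calls to eval and exec) and the name scan_source are assumptions chosen for illustration.

```python
import ast

# Illustrative rule set only; real SAST tools ship far richer rules.
RISKY_CALLS = {"eval", "exec"}


def scan_source(source: str, filename: str = "<string>") -> list[str]:
    """Flag direct calls to risky built-ins by inspecting the syntax tree.

    The code under analysis is parsed, never executed.
    """
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        # Only direct calls by name (eval(...)) are caught; aliases are not.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append(f"{filename}:{node.lineno}: call to {node.func.id}()")
    return findings


# Hypothetical application code to scan.
code = "x = input()\nresult = eval(x)\n"
for finding in scan_source(code, "app.py"):
    print(finding)  # → app.py:2: call to eval()
```

Running a check like this on every commit is what moves the security feedback to the start of the SDLC.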

Deployment strategies

How will you deploy your code with zero downtime so that the customer has the best experience? Try and implement one of the strategies below automatically via CI/CD. This will help in keeping the blast radius to the minimum in case something goes wrong. 

  • Ramped (also known as rolling update or incremental): The new version is slowly rolled out to replace the older version of the product.
  • Blue/Green: The new version is released alongside the older version, then the traffic is switched to the newer version.
  • Canary: The new version is released to a selected group of users before a full rollout. This can be achieved with feature flagging as well. For more information, read about tools like LaunchDarkly (https://launchdarkly.com/) and Unleash (https://github.com/Unleash/unleash).
  • A/B testing: The new version is released to a subset of users under specific conditions.
  • Shadow: The new version receives real-world traffic alongside the older version and doesn’t impact the response.
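
As a rough illustration of the canary idea, the sketch below assigns each user to a stable bucket by hashing the user ID, so the same user always sees the same version while the rollout percentage grows. The function names and version labels are hypothetical, and this is not how any particular feature-flag product implements it.

```python
import hashlib


def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Deterministically decide whether a user belongs to the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent


def pick_version(user_id: str, rollout_percent: int) -> str:
    """Route a user to the canary or the stable release (labels are made up)."""
    return "v2-canary" if in_canary(user_id, rollout_percent) else "v1-stable"


# Roughly 10% of these hypothetical users should land on the canary.
users = [f"user-{i}" for i in range(1000)]
canary_count = sum(in_canary(u, 10) for u in users)
print(f"{canary_count} of {len(users)} users routed to the canary")
```

Because the bucket depends only on the user ID, raising the percentage from 10 to 50 keeps every existing canary user in the canary group and only adds new ones.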

Config and Secret Management

According to the 12-factor app methodology, application config should be exposed to the application through environment variables. However, the methodology places no restrictions on where these configurations are stored and sourced from.

A few things to keep in mind while storing configs:

  • Versioning of configs always helps, but storing secrets in VCS is strongly discouraged.
  • For an enterprise, it is beneficial to use a cloud-agnostic solution.

Solution:

  • Store your configuration secrets outside of the version control system.
  • You can use AWS Secrets Manager, HashiCorp Vault, or even S3 (for example, S3 with KMS encryption) for storing your configs. Other services are available as well, so choose the one that best suits your use case.
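
On the application side, the 12-factor approach can be sketched as a loader that reads settings from environment variables and fails fast when a required one is missing. The variable names (DATABASE_URL, API_TOKEN, LOG_LEVEL) and the AppConfig class are assumptions for illustration. How the values get into the environment (a secrets manager, Vault, and so on) is left to the deployment tooling.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    api_token: str
    log_level: str = "INFO"


def load_config(env=os.environ) -> AppConfig:
    """Build the config from environment variables, failing fast on gaps."""
    required = ("DATABASE_URL", "API_TOKEN")  # hypothetical setting names
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return AppConfig(
        database_url=env["DATABASE_URL"],
        api_token=env["API_TOKEN"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# A plain dict stands in for the real environment in this example.
example_env = {"DATABASE_URL": "postgres://db.internal/app", "API_TOKEN": "dummy-token"}
config = load_config(example_env)
print(config.log_level)  # → INFO
```

Passing the environment in as a parameter keeps the loader easy to test and keeps secrets out of the repository.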

Automate versioning and release notes generation

All the releases should be tagged in the version control system. Versions can be automatically updated by looking at the git commit history and searching for keywords.

There are many modules available for release notes generation. Automate these as part of your CI/CD process as well. Once this is done, you can eliminate manual steps from the release process.

Example from a GitHub Actions workflow:

CODE: https://gist.github.com/amanpruthi/f53d3c33b7d714d3da9acb66a6df7fdb.js
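
Independently of the workflow above, the keyword-based version bump can be sketched in a few lines. This sketch assumes a Conventional Commits style convention (a "feat:" prefix for features, a "!" marker or "BREAKING CHANGE" for breaking changes). That convention and the function name next_version are assumptions, not something the workflow above is known to use.

```python
import re


def next_version(current: str, commit_messages: list[str]) -> str:
    """Derive the next SemVer version from commit message keywords."""
    if not commit_messages:
        return current  # nothing to release
    major, minor, patch = (int(part) for part in current.split("."))
    bump = "patch"  # default for fixes and other changes
    for message in commit_messages:
        if "BREAKING CHANGE" in message or re.match(r"^\w+(\(.+\))?!:", message):
            bump = "major"
            break  # a breaking change outranks everything else
        if re.match(r"^feat(\(.+\))?:", message):
            bump = "minor"
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("1.2.3", ["fix: handle empty payload"]))        # → 1.2.4
print(next_version("1.2.3", ["feat: add export", "fix: typo"]))    # → 1.3.0
print(next_version("1.2.3", ["feat(api)!: drop legacy endpoint"])) # → 2.0.0
```

The resulting version can then be used as the tag name for the release.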

Have a rollback strategy

If regression, performance, or smoke tests fail after deployment to an environment, feedback should be given and the version should be rolled back automatically as part of the CI/CD process. This keeps the environment up and reduces the MTTR (mean time to recovery) and MTTD (mean time to detection) when a code deployment causes a production outage.

GitOps tools like Argo CD and Flux make this easy, but even if you are not using a GitOps tool, rollback can be managed with scripts or with whatever tool you use for deployment.
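
For teams scripting this themselves, the control flow can be as small as the sketch below. The deploy and smoke_test callables are placeholders for whatever your deployment tool and test suite provide, and all names here are hypothetical.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    previous_version: str,
    deploy: Callable[[str], None],
    smoke_test: Callable[[], bool],
) -> str:
    """Deploy new_version; redeploy previous_version if the smoke test fails.

    Returns the version that is live when the function finishes.
    """
    deploy(new_version)
    try:
        healthy = smoke_test()
    except Exception:
        healthy = False  # a crashing test counts as a failed check
    if healthy:
        return new_version
    deploy(previous_version)  # automatic rollback
    return previous_version


# Stand-ins for real tooling: record deployments, simulate a failing smoke test.
deployed = []
live = deploy_with_rollback("1.4.0", "1.3.2", deploy=deployed.append, smoke_test=lambda: False)
print(live, deployed)  # → 1.3.2 ['1.4.0', '1.3.2']
```

In a real pipeline the returned version would also feed the notification step, so the team learns that a rollback happened.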

Include DB changes as a part of your CI/CD

Databases are often created manually and frequently evolve through manual changes, informal processes, and even testing in production. Manual changes often lack documentation and are harder to review, test, and coordinate with software releases. This makes the system more fragile with a higher risk of failure.

The correct way to do this is to include the database in source control and in the CI/CD pipeline. This lets the team document each change, follow the code review process, test it thoroughly before release, roll back more easily, and coordinate with software releases.

For a more enterprise or structured solution, we could use a tool such as Liquibase, Alembic, or Flyway.

How it should ideally be done:

  • Use a migration-based strategy: for each DB change, an additional migration script is added and executed as part of CI/CD.
  • Keep the CI/CD process the same across all environments. The amount of data in prod and in other environments can vary drastically, so use batching and limits so that a migration does not exhaust the memory of the database server.
  • As far as possible, DB migrations should be backward compatible, which makes rollbacks easier. This is why some companies only allow additive changes in DB migration scripts.
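
The migration-based strategy above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration of what tools like Liquibase, Alembic, or Flyway do in a far more complete way. The table name schema_migrations and the two example migrations are assumptions. Note that both migrations are additive, in line with the backward-compatibility advice.

```python
import sqlite3

# Ordered, additive migrations; names and SQL are illustrative only.
MIGRATIONS = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    ("002_add_email", "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list[str]:
    """Apply migrations that have not run yet and return their names."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # already recorded, so running the pipeline again is safe
        # The context manager commits on success and rolls back an open
        # transaction if an exception is raised.
        with conn:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        newly_applied.append(name)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # → ['001_create_users', '002_add_email']
print(migrate(conn))  # → []
```

Because applied migrations are recorded in the database itself, the same pipeline step can run unchanged against every environment.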

Real-world scenarios

  • Gated approach 

It is not always possible to have a fully automated CI/CD pipeline because the team may have just started the development of a product and might not have automated testing yet.

In cases like these, we add manual gates that the responsible teams approve. For example, we deploy to the development environment, wait for testers to test the code and approve the manual gate, and only then let the pipeline go forward.

Most tools support this kind of approval step. Make sure the pipeline does not hold any resources, such as a build agent or executor, while it waits for approval; otherwise it will block resources that other pipelines need.

Example:

https://www.jenkins.io/doc/pipeline/steps/pipeline-input-step/#input-wait-for-interactive-input

CODE: https://gist.github.com/amanpruthi/8b16d9054731dd602e4257419f47dcfb.js

Observability of releases 

Whenever we debug the root cause of an issue in production, we might need the information below. As the system gets more complex, with multiple upstream and downstream dependencies, it becomes imperative to have this information in one place for efficient debugging and support by the operations team.

  • When was the last deployment? What version was deployed?
  • What is the deployment history: which version was deployed when, and which code changes went in?

Organizations generally follow one of two approaches to achieve this:

  • Have a release workflow that is tracked using a change request or service request in Jira or any other tracking tool.
  • For GitOps applications using tools like Argo CD and Flux, all this information is available as a part of the version control system and can be derived from there.

DORA metrics 

The DevOps maturity of a team is measured mainly by the four metrics defined below, and CI/CD helps improve all of them. Teams and organizations should therefore aim for Elite status on the DORA metrics.

  • Deployment Frequency: How often an org successfully releases to production
  • Lead Time for Changes: The amount of time a commit takes to get into prod
  • Change Failure Rate: The percentage of deployments causing a failure in prod
  • Time to Restore Service: How long an org takes to recover from a failure in prod
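
As a worked example of these definitions, the sketch below computes the four metrics from a list of deployment records. The Deployment record and its field names are hypothetical. For simplicity it uses medians and measures time to restore from the moment of deployment, which is an assumption; many teams measure from when the incident was detected instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime          # when the change was committed
    deployed_at: datetime           # when it reached prod
    failed: bool = False            # did it cause a failure in prod?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period."""
    total = len(deployments)
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": total / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / total if total else None,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up records: two deployments in a two-day period, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(committed_at=start, deployed_at=start + timedelta(hours=1)),
    Deployment(
        committed_at=start + timedelta(days=1),
        deployed_at=start + timedelta(days=1, hours=3),
        failed=True,
        restored_at=start + timedelta(days=1, hours=3, minutes=30),
    ),
]
for name, value in dora_metrics(history, period_days=2).items():
    print(f"{name}: {value}")
```

Feeding this from the deployment history kept by the pipeline means the metrics come for free once releases are observable.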

Conclusion 

CI/CD forms an integral part of DevOps and SRE practices, and if done correctly, it can improve the team’s and the organization’s productivity in a huge way.

So, try to implement the above principles and get one step closer to having a highly productive team and a better product.
