How To Implement Chaos Engineering For Microservices Using Istio

Prafull Ladha

Cloud & DevOps

Tags: microservices, kubernetes, Istio, chaos engineering, Failure Testing

“Embrace Failures. Chaos and failures are your friends, not enemies.” A microservice ecosystem is going to fail at some point. The question is not whether you will fail but, when you do, whether you will notice. The difference is between an outage that affects all of your users because every service is down, and one that affects only a few users and can be fixed on your own schedule.

Chaos Engineering is the practice of intentionally introducing faults and failures into your microservice architecture to test the resilience and stability of your system. Istio is a good tool for this. Let's have a look at how Istio makes it easy.

For more information on how to set up Istio and on what virtual services and gateways are, please have a look at the following blog, how to set up Istio on GKE.

Fault Injection With Istio

Fault injection is a testing method that introduces errors into your microservice architecture to ensure it can withstand error conditions. Istio lets you inject errors at the HTTP layer, instead of delaying packets at the network layer or killing pods. This way, you can generate various HTTP error codes and test how your services react under those conditions.

Generating HTTP 503 Error

Here we see two pods running two different versions of the recommendation service, deployed by following the recommended tutorial for installing the sample application.

Currently, traffic to the recommendation service is automatically load balanced between those two pods.

CODE: https://gist.github.com/velotiotech/c255f6c52bf7cec88a693f8d52c8c9d4.js

Now let's apply fault injection using a virtual service that returns HTTP 503 errors for 30% of the traffic served by the above pods.

CODE: https://gist.github.com/velotiotech/165e3bbf3030eb267e7d2769ccd30328.
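As a rough sketch of what such a virtual service can look like, the manifest below aborts about 30% of requests with HTTP 503. The resource name, the `recommendation` host, and the `apiVersion` are assumptions for illustration; adjust them to your cluster and Istio release.

```yaml
# Hypothetical example: resource name, host, and apiVersion are assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation            # service whose traffic the rule applies to
  http:
  - fault:
      abort:
        httpStatus: 503       # status code returned for aborted requests
        percentage:
          value: 30           # abort roughly 30% of requests
    route:
    - destination:
        host: recommendation  # the remaining requests are routed normally
```

The Envoy sidecar answers the aborted requests itself, so the fault is produced at the HTTP layer without touching the recommendation pods.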

To test whether it is working, check the output of curl against the customer service microservice endpoint.

You will see a 503 error on approximately 30% of the requests reaching the recommendation service.

To restore normal operation, delete the above virtual service using:

CODE: https://gist.github.com/velotiotech/b2d39ff496247b3fb2c7437208882049.js

Delay

The most common failure we see in production is not a service that is down but a service that is slow. Sometimes your application doesn't respond in time, and that creates chaos across the whole ecosystem. To simulate this behavior, you can inject network latency as a chaos experiment by creating another virtual service:

CODE: https://gist.github.com/velotiotech/deb94b26a9cfe34194ee0bc8d8d8a58c.
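A delay fault can be sketched along these lines. The 7-second delay, the 50% share, the resource name, and the `apiVersion` are all assumptions chosen for illustration, since the article does not state the values used.

```yaml
# Hypothetical example: delay, percentage, names, and apiVersion are assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - fault:
      delay:
        fixedDelay: 7s        # how long each affected request is held back
        percentage:
          value: 50           # delay roughly half of the requests
    route:
    - destination:
        host: recommendation
```

Requests that fall outside the percentage pass through without added latency, which is why only some calls in a loop appear slow.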

Now, if you hit the endpoint URL of the above service in a loop, you will see delays in some of the requests.

Retry

For some production services, we expect a request to be retried N times instead of failing instantly. Only if none of the attempts succeeds should the request be considered failed.

For that mechanism, you can configure retries on those services as follows:

CODE: https://gist.github.com/velotiotech/9608a8fd1c711937c23a7703f3184b58.
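A retry policy on a virtual service might look like the sketch below. The `attempts` value follows the figure of 3 used in this article; the `perTryTimeout`, the resource name, and the `apiVersion` are assumptions.

```yaml
# Hypothetical example: perTryTimeout, names, and apiVersion are assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
    retries:
      attempts: 3             # number of retries allowed for a request
      perTryTimeout: 2s       # time limit applied to each individual try
```

The retries are performed by the sidecar proxy, so the calling service does not need retry logic of its own for this to take effect.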

Now any request to the recommendation service is retried up to 3 times before it is considered failed.

Timeout

In the real world, an application faces most of its failures because of timeouts. The cause can be heavy load on the application or any other latency in serving the request. Your application should have proper timeouts defined before declaring any request as failed. You can use Istio to simulate the timeout mechanism and give your application a limited amount of time to respond before giving up.

Wait only N seconds before failing and giving up.

CODE: https://gist.github.com/velotiotech/2740a385c6a044918b3c9f31e14bb07b.js
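A request timeout can be sketched as follows. The 1-second value, the resource name, and the `apiVersion` are assumptions for illustration; pick a timeout that matches what your callers can tolerate.

```yaml
# Hypothetical example: timeout value, names, and apiVersion are assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
    timeout: 1s               # give up if no response arrives within 1 second
```

When the service takes longer than the configured timeout, the proxy stops waiting and returns an HTTP 504 to the caller.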

Conclusion

Istio lets you inject faults at the HTTP layer of your application, which helps you improve its resilience and stability. However, the application must handle the failures and take the appropriate course of action. Chaos Engineering is only effective when you expect your application to withstand failures; there is no point in testing for chaos if you already know your application is broken.
