Circuit Breaker: How to Keep One Failure from Taking Down Everything

Written by: Doug Tidwell

Editor’s Note: This blog post is the fourth in a series on design patterns for continuous everything in software development. It’s modeled after the Gang of Four format for object-oriented design patterns.

Name

Circuit Breaker

Intent

The realm of continuous everything (integration | testing | delivery | deployment | analytics | governance) is huge and involves a myriad of systems that need to talk to each other over well-defined interfaces. Because this domain is so large and complex, continuous everything pipelines experience a wide variety of failures, ranging from network glitches to infrastructure problems to misconfigurations. To ensure operational excellence of continuous everything pipelines, these failures need to be monitored, and calls to failing systems need to be handled intelligently instead of being repeated blindly.

Enter circuit breakers! Instead of blindly retrying, a circuit breaker monitors calls for failures. Once a pre-configured number of requests has failed within a stipulated time frame, it trips the circuit, which means its state changes from Closed to Open. (More on these states in the “Motivation” section.) While the circuit is open, further calls fail immediately without reaching the remote service at all, which keeps multiple subsystems from waiting on a hung service.
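To make “a pre-configured number of failed requests within a stipulated time frame” concrete, here is a minimal Python sketch of the counting that decides when to trip. The class name `FailureWindow`, its parameters and the injectable clock are illustrative assumptions, not part of any particular library; real implementations often track a failure rate rather than a raw count.

```python
import time
from collections import deque
from typing import Callable, Deque


class FailureWindow:
    """Counts failures inside a sliding time window and reports when the
    breaker should trip from Closed to Open (hypothetical helper)."""

    def __init__(self, threshold: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold            # failures needed to trip
        self.window_seconds = window_seconds  # the "stipulated time frame"
        self.clock = clock                    # injectable so tests can fake time
        self._failures: Deque[float] = deque()

    def record_failure(self) -> bool:
        """Record one failure; return True if the threshold is now reached."""
        now = self.clock()
        self._failures.append(now)
        # Drop failures that have fallen out of the time window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.threshold

    def reset(self) -> None:
        """Forget all failures, e.g. after the breaker closes again."""
        self._failures.clear()
```

With `threshold=3` and `window_seconds=10`, three failures spread over a minute never trip the breaker, while three failures within ten seconds do.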

Let’s review a few scenarios on how circuit breakers make continuous everything pipelines more resilient.

Scenario 1

Pipelines interact with dashboards such as Sumo Logic or the ELK Stack to log key performance indicators (KPIs) that help teams make informed decisions. If the dashboards or their connectors malfunction, the circuit breaker trips to Open, and calls to the dashboard fail fast instead of holding up the pipeline. Pipelines keep operating normally, which matters because customers want bug fixes (and sometimes new features too) as soon as they are available.

Note that this is not to undermine the value of KPIs, but to underscore the need to maintain a regular release cadence.
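A minimal sketch of Scenario 1, assuming a hypothetical `push` callable that uploads KPIs to a dashboard such as Sumo Logic or the ELK Stack (it stands in for whatever connector your pipeline uses; it is not either product’s API). When the breaker is open, the upload is skipped with a warning and the pipeline carries on. `SimpleBreaker` and `report_kpis` are illustrative names.

```python
import logging
import time
from typing import Callable, Dict, Optional

log = logging.getLogger("pipeline")


class SimpleBreaker:
    """Opens after `max_failures` consecutive failures and stays open for
    `reset_after` seconds before letting calls through again."""

    def __init__(self, max_failures: int = 3, reset_after: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the reset period, let calls through to test the dashboard again.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def report_kpis(breaker: SimpleBreaker,
                push: Callable[[Dict[str, float]], None],
                kpis: Dict[str, float]) -> bool:
    """Push KPIs if possible, but never fail the pipeline because of the
    dashboard. Returns True if the KPIs were delivered."""
    if not breaker.allow():
        log.warning("dashboard breaker open; skipping KPI upload")
        return False
    try:
        push(kpis)  # hypothetical dashboard connector call
    except Exception as exc:  # dashboard or connector malfunction
        breaker.record(ok=False)
        log.warning("KPI upload failed: %s", exc)
        return False
    breaker.record(ok=True)
    return True
```

For brevity, once the reset period has passed this sketch lets every call through rather than a single trial request; the full Half-Open behaviour is described in the “Motivation” section.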

Scenario 2

Pipelines read static code analysis results from external systems like Sonar, Coverity, Anchore, Code Climate, PMD, FindBugs and others to assess code quality and to decide whether the code should be promoted from one pipeline stage to the next. If a static analyzer is unavailable for some time, circuit breakers can stop pipelines from making repeated calls to it that would most likely error out.

Again, the goal is to maintain a regular release cadence rather than undermine static code analysis.
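Because a pipeline may talk to several analyzers, one design choice worth sketching is a separate breaker per external system, so an outage in one analyzer does not stop calls to the others. The sketch below assumes analyzer calls are plain Python callables returning a dict of results; `Breaker`, `BreakerRegistry` and `AnalyzerUnavailable` are hypothetical names. It raises `AnalyzerUnavailable` instead of calling an analyzer whose breaker is open, leaving it to the pipeline to decide what that means for promotion.

```python
import time
from typing import Callable, Dict, Optional


class AnalyzerUnavailable(Exception):
    """Raised instead of calling an analyzer whose breaker is open."""


class Breaker:
    def __init__(self, max_failures: int, cooldown: float,
                 clock: Callable[[], float]) -> None:
        self.max_failures, self.cooldown, self.clock = max_failures, cooldown, clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], dict]) -> dict:
        # Refuse the call outright while the cooldown is still running.
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown:
            raise AnalyzerUnavailable("breaker open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures, self.opened_at = 0, None
        return result


class BreakerRegistry:
    """One breaker per external system, created on first use."""

    def __init__(self, max_failures: int = 3, cooldown: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._args = (max_failures, cooldown, clock)
        self._breakers: Dict[str, Breaker] = {}

    def get(self, name: str) -> Breaker:
        if name not in self._breakers:
            self._breakers[name] = Breaker(*self._args)
        return self._breakers[name]
```

With this layout, `registry.get("coverity")` can be open while `registry.get("sonar")` keeps serving results.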

Scenario 3

While applying machine learning to pipeline data, teams tend to:

  • Experiment with both OSS (open-source software) and commercial machine learning solutions

  • Assess whether to go with black-box solutions, where the models cannot be tweaked, or with configurable approaches that teams can fine-tune to their needs.

While these experiments go on, models may fail or report inaccurate results for a while. Some pipelines make asynchronous (i.e., non-blocking) calls to mitigate this, which keeps a failing model from blocking the pipeline but does not make the failures go away. For pipelines that make synchronous (i.e., blocking) calls to read model data, circuit breakers can stop failed calls from piling up until the models stabilize.
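For synchronous calls, a hung model service is as harmful as one that errors out. This sketch assumes the model call is a Python callable and that a call slower than `timeout` seconds should count as a failure; it runs each call on a worker thread and waits with `concurrent.futures.Future.result(timeout=...)`. `TimeoutBreaker` and `CircuitOpenError` are hypothetical names.

```python
import concurrent.futures as cf
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Shared worker pool for the blocking calls.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without invoking it."""


class TimeoutBreaker:
    """Counts both exceptions and calls slower than `timeout` as failures."""

    def __init__(self, timeout: float, max_failures: int, cooldown: float) -> None:
        self.timeout = timeout
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and time.monotonic() - self.opened_at < self.cooldown:
            raise CircuitOpenError("model service breaker is open")
        future = _POOL.submit(fn)
        try:
            result = future.result(timeout=self.timeout)
        except Exception:  # includes cf.TimeoutError for calls that hang
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures, self.opened_at = 0, None
        return result
```

The timeout bounds how long the pipeline waits, not how long the remote call runs: a hung call keeps its worker thread busy until it returns.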

These three scenarios only scratch the surface of how circuit breakers can be effectively applied to continuous everything. Since pipelines integrate with a multitude of external systems, there are many more scenarios worth exploring.

Motivation

Pipelines don’t operate in isolation. To release high-quality, secure, versioned artifacts from development and test to production, they interface with various external systems, including:

  • Version control systems like GitHub, GitLab and Atlassian Bitbucket

  • Artifact repositories like JFrog Artifactory, Sonatype Nexus Repository and Amazon S3

  • Change management systems like Atlassian Jira

  • PaaS (platform as a service) like Cloud Foundry and Heroku

  • IaaS (infrastructure as a service) like Amazon Web Services, Microsoft Azure, Google Cloud Platform, and others.

When these external systems fail or hang for an unpredictably long time, commit-based pipelines keep making pointless calls to them for every commit. Retries can help when the problem is an intermittent network glitch, but when the outage is genuine, careless retries only inflate the cycle time. This defeats the primary goal of pipelines: fast feedback.
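A little arithmetic shows how retries inflate cycle time when a dependency hangs rather than fails. The function below assumes every attempt waits for the full timeout and that attempts are separated by exponential backoff; the numbers in the example are hypothetical.

```python
def worst_case_wait(attempts: int, timeout_s: float, base_backoff_s: float) -> float:
    """Seconds spent on one call when every attempt hangs until its timeout,
    with exponential backoff (base, 2*base, 4*base, ...) between attempts."""
    backoff = sum(base_backoff_s * 2 ** i for i in range(attempts - 1))
    return attempts * timeout_s + backoff


# Three attempts, a 30 s timeout and a 2 s base backoff:
# 3 * 30 + (2 + 4) = 96 seconds before the call finally gives up.
print(worst_case_wait(3, 30, 2))  # 96
```

Multiply that by every such call in a stage and by every commit, and the cost adds up quickly; with an open circuit breaker, the same call fails almost instantly.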

Without circuit breakers, these remote calls can keep failing or hanging, causing critical and cascading failures that choke releases. The pattern borrows its name and its Closed and Open states from the electrical circuit breaker, and its state machine has three states:

  • Closed: Requests are routed to the remote service. A pre-configured number of failed requests within a stipulated time frame trips the circuit breaker to Open.

  • Open: Requests fail immediately with an exception, without contacting the remote service. After a configured timeout has elapsed, the circuit breaker moves to Half-Open to test whether the service has recovered.

  • Half-Open: This state is also referred to as half-closed, depending on what kind of day you are having. When the circuit breaker is half-open (or half-closed), a limited number of trial requests are routed. If those trial requests succeed, the state changes to Closed and operations return to normal. If they fail, the circuit breaker concludes that the failure persists and reverts to the Open state.
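The three states and the transitions between them fit in a small lookup table. This sketch assumes the usual trigger for leaving Open: a cool-down timeout, after which the breaker moves to Half-Open. The `State` and `Event` names are illustrative, and deciding when each event has occurred is left to the caller.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class Event(Enum):
    THRESHOLD_REACHED = "failure threshold reached"
    TIMEOUT_ELAPSED = "open timeout elapsed"
    TRIAL_SUCCEEDED = "trial requests succeeded"
    TRIAL_FAILED = "a trial request failed"


# Allowed transitions; any other (state, event) pair leaves the state unchanged.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_REACHED): State.OPEN,
    (State.OPEN, Event.TIMEOUT_ELAPSED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.TRIAL_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.TRIAL_FAILED): State.OPEN,
}


def next_state(state: State, event: Event) -> State:
    return TRANSITIONS.get((state, event), state)


def routes_request(state: State) -> bool:
    """Closed routes every request, Half-Open routes only trial requests
    (the caller limits how many), and Open routes none."""
    return state is not State.OPEN
```

Keeping the transitions in one table makes it easy to check that no state can be skipped, for example that an Open breaker can never jump straight back to Closed.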

Applicability

When there’s a high probability that a remote call to a service or a shared resource will fail, hang or time out, circuit breakers make sense because they monitor for failures and trip the circuit once a certain threshold has been reached. A tripped circuit prevents the call from happening at all until trial requests show that the service has recovered. Since continuous everything interfaces with a plethora of systems, circuit breakers are widely applicable in this domain.

Moreover, continuous everything principles apply not just to software but also to firmware, embedded systems, hardware and IoT (internet of things). Organizations of all shapes and sizes, from startups to highly regulated enterprises, are betting on digital transformation initiatives. And it’s not just the traditional tech unicorns: companies from industry sectors such as healthcare, government, transportation, banking and hospitality are leveraging technology as a differentiator to outmaneuver their competition.

The bottom line: wherever your teams are in the continuous paradigm, smart techniques like circuit breakers boost their productivity and give them a much-needed edge.

Consequences

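A common mistake is to use circuit breakers as substitutes for exception handlers in an application’s business logic. A circuit breaker guards calls to a remote dependency that is likely to keep failing; it does not replace handling errors in the application code itself.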

Also, the circuit breaker pattern is sometimes confused with the retry pattern. Remember, the retry pattern makes the call expecting that it will succeed, while the circuit breaker prevents the call expecting that it will most likely fail.
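The two patterns can also work together: retries absorb intermittent glitches, while the breaker stops them once an outage looks genuine. In this minimal sketch (`Breaker`, `CircuitOpenError` and `call_with_retry` are hypothetical names), every attempt is recorded by the breaker, and a call the breaker refuses is never retried.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without invoking it."""


class Breaker:
    def __init__(self, max_failures: int, cooldown: float) -> None:
        self.max_failures, self.cooldown = max_failures, cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and time.monotonic() - self.opened_at < self.cooldown:
            raise CircuitOpenError("circuit open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures, self.opened_at = 0, None
        return result


def call_with_retry(breaker: Breaker, fn: Callable[[], T],
                    attempts: int = 3, backoff_s: float = 0.5) -> T:
    """Retry transient failures, but never retry a call the breaker refused:
    an open circuit means the dependency is expected to keep failing."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # fail fast; retrying would only add latency
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff, kept simple
    raise AssertionError("unreachable")
```

A single glitch is absorbed by one retry, while a sustained outage opens the circuit, after which calls fail immediately without touching the dependency.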

Implementation

Netflix has a popular implementation of the circuit breaker pattern named HystrixCircuitBreaker, part of its Hystrix library. Its interface includes methods such as allowRequest(), isOpen() and markSuccess(), and it stops allowing executions once failures have gone past the defined threshold. It then allows single retries after a defined sleepWindow until an execution succeeds, at which point it closes the circuit and allows executions again. Note that Hystrix is now in maintenance mode and no longer in active development.
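For comparison, here is a Python analogue of the behaviour described above. The method names loosely mirror allowRequest(), isOpen() and markSuccess(), and `mark_failure()` is added for reporting errors, but this is not Hystrix’s API. As an assumption, it trips on an error percentage once a minimum request volume is reached (Hystrix’s own breaker is configured with similar error-percentage and request-volume thresholds over a rolling window); this sketch keeps simple cumulative counts instead of a rolling window.

```python
import time
from typing import Callable, Optional


class SleepWindowBreaker:
    """Hystrix-style behaviour in Python (illustrative, NOT Hystrix's API).
    Counts are cumulative; there is no rolling window."""

    def __init__(self, error_threshold_pct: float = 50.0, request_volume: int = 20,
                 sleep_window_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.error_threshold_pct = error_threshold_pct
        self.request_volume = request_volume  # minimum requests before tripping
        self.sleep_window_s = sleep_window_s
        self.clock = clock
        self.requests = 0
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.trial_in_flight = False

    def is_open(self) -> bool:
        return self.opened_at is not None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the sleep window, admit exactly one trial request.
        if not self.trial_in_flight and self.clock() - self.opened_at >= self.sleep_window_s:
            self.trial_in_flight = True
            return True
        return False

    def mark_success(self) -> None:
        if self.trial_in_flight:
            # The trial succeeded: close the circuit and start counting afresh.
            self.opened_at, self.trial_in_flight = None, False
            self.requests = self.failures = 0
        elif not self.is_open():
            self.requests += 1
        # Otherwise: a late result from a call admitted before tripping; ignore it.

    def mark_failure(self) -> None:
        if self.trial_in_flight:
            # The trial failed: stay open and restart the sleep window.
            self.opened_at, self.trial_in_flight = self.clock(), False
        elif not self.is_open():
            self.requests += 1
            self.failures += 1
            error_pct = 100.0 * self.failures / self.requests
            if self.requests >= self.request_volume and error_pct >= self.error_threshold_pct:
                self.opened_at = self.clock()
```

A caller asks `allow_request()` before each execution and reports the outcome with `mark_success()` or `mark_failure()`; only one trial is admitted per sleep window while the circuit is open.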

Known Uses

Circuit breakers are popular in the industry and help boost resilience. Here are a couple of case studies.

  • This Spring guide from Pivotal describes building a microservice application that uses the Circuit Breaker pattern to gracefully degrade functionality when a method call fails. Use of the Circuit Breaker pattern can allow a microservice to continue operating when a related service fails, preventing the failure from cascading and giving the failing service time to recover.

  • This blog post is an experience report on work done for Key Bank that uses a circuit breaker.

Related Patterns

  • Health endpoint monitoring

