Cinema Examples

cinema-1: Basic Cinema Example

Description

The first example, which we are calling cinema-1, comes from a tutorial on building microservices in Flask. The cinema-1 example contains 4 services: bookings, movies, showtimes, and users, where:

  • bookings: contains information about bookings for each client;

  • movies: contains detailed information about each movie;

  • showtimes: contains information about shows for each date; and

  • users: that aggregates information about movies that a particular user has bookings for.

_images/cinema-1.jpg

Fault Analysis

With cinema-1, the most interesting service to consider is the users service, as it has a dependency on both the bookings and movies services operating correctly. For users to retrieve a user’s bookings information, it first retrieves information from the bookings service and then retrieves information from the movies service based on the information retrieved from the bookings service. If no information can be retrieved from the bookings service (the user doesn’t have any movie bookings or that the call failed/timed out/errored), no call is made to the movies service. This is a simple example of one service calling multiple services in a sequence. There is a dependency among the calls: if one call fails, any subsequent calls will not be issued.

Filibuster Analysis

In this pattern, let \(k_i\) be the number of different ways that the i-th service can fail, and let \(n\) be the total number of calls made. Using this, the number of tests that Filibuster should execute should be: \(k_1 + k_2 + ... + k_n\).

As bookings can fail 5 ways, and movies can fail 4 ways, the number of tests we expect is \(1 + 5 + 4 = 10\). This is the exact number executed by Filibuster.

cinema-2: Nesting Service Calls

Description

In this example, cinema-1 is reorganized so that users first calls bookings and then bookings calls movies.

_images/cinema-2.jpg

Fault Analysis

This is an example of a chain of calls. The first call triggers the second call and results are forwarded back.

Filibuster Analysis

In this pattern, let \(n\) be the number of edges (calls) in the chain, and let \(k_i\) be the number of different ways that the \(i\)-th service can fail. This means that the number of tests executed by Filibuster should be \(k_1 + k_2 + ... + k_n\). As bookings can fail 5 ways, and movies can fail 4 ways, we should execute \(1 + 4 + 5 = 10\) tests, the exact number executed by Filibuster (without dynamic reduction.)

However, we’re able to avoid one execution through dynamic reduction. To understand why, we provide the following reasoning.

By injecting a Timeout or ConnectionError exception for the call from the bookings service to the movies service, we cause the bookings service to return a 503 Service Unavailable response back to the users service. Therefore, since we have already seen the effect on the 503 error at the users service, when returned from the bookings service, we do not need to directly inject the 503 error, eliminating one possible test. With dynamic reduction enabled, we see this effect with the Filibuster output: \(1 + 3 + 5 = 9\) tests.

cinema-3: Adding Retry Loops

Description

In this example, based on cinema-2, we move the call made from the users service to the bookings service into a retry loop (with a single retry.)

_images/cinema-3.jpg

Fault Analysis

This is an example of a retry loop.

Filibuster Analysis

Similar to previous examples, bookings can fail 5 ways, and movies can fail 4 ways. However, now that the call from users to bookings has been placed in a loop, we have to consider those calls failing multiple times. Therefore, we should consider the following:

  • first, we know that without a retry loop (cinema-2) we need to test 10 executions (10):

  • next, we have to additionally consider the first iteration of the loop failing (9); and

  • we have to also consider all possible outcomes of the second iteration of the loop (1 success, 9 failures = 10).

This results in \((1 + 9 + (9 \times 10)) = 91\), the number of tests executed by Filibuster (without dynamic reduction.)

We can use the same reasoning as we used in cinema-2 to identify how many tests will be reduced: we don’t need to inject faults on services directly when that failure is previously generated by injecting faults on one or more of its dependencies.

Therefore, we consider the following the minimum number of test executions that are required to cover the entire fault space:

  • the passing execution (1); and

  • exhaustion of the first loop (5 outcomes: 3 status codes, 2 exceptions) where the second loop succeeds (5); and

  • exhaustion of the first loop (5 outcomes: 3 status codes, 2 exceptions) with exhaustion of the second loop (6 outcomes: 3 status codes, 2 exceptions, 1 success) (6); and

  • combinations of all possible outcomes of the movies service: (4 errors, 1 success = \(5 \times 5 = 25\)).

This results in: \(1 + 5 + 6 + 25 = 37\) tests. This is precisely the number of tests Filibuster executes with dynamic reduction.

cinema-4: Calls To 3rd Party Services

Description

In this example, based on cinema-2, each service calls to a remote service before calling a dependent service. Each of these calls made to remote services can fail without affecting the response. Specifically:

  • users retrieves information from IMDB;

  • bookings retrieves information from Fandango; and

  • movies retrieves information from Rotten Tomatoes.

_images/cinema-4.jpg

Fault Analysis

This is not a canonical example of any particular pattern, but a composition of the first two patterns: the sequence and chain. More specifically, this example contains a chain of sequences of calls:

  • users ->
    • IMDB

    • bookings ->
      • Fandango

      • movies ->
        • Rotten Tomatoes

In this particular example, none of the calls to remote services impact the chain; however, a failure of any non-remote services cause the chain to abort immediately, as seen in cinema-2.

Filibuster Analysis

In order to understand what tests Filibuster needs to execute for this example, we can start by building upon the formula we had for chains of calls.

No Reduction

We start by knowing that bookings is called from users and can fail 5 ways; we also know that movies is called by bookings and can fail 4 ways. Therefore, we know that we should start off with the following: \(1 + 5 + 4 = 10\). However, this doesn’t account for the remote calls. Therefore, we have to consider what impact having sequences in each service makes on this formula.

IMDB

Let’s consider first IMDB. We know that the call to IMDB will always happen, and it’s outcome will not impact whether we make the call to bookings from the users service. Since IMDB has two possible outcomes (failure with ConnectionError or success), this alters our formula:

\[1 + (2 \times 5) + 4\]

We put the 2 before the 5 in the multiplication to indicate that the call to IMDB occurs before the call to bookings: this is done to help the reader.

However, we also need to consider that we will try the combination of all possible outcomes of the call to IMDB with the ways that the movies service can fail. Again, this alters the formula:

\[1 + (2 \times 5) + (4 \times (2))\]

We put the 2 in parentheses after the 4 in the multiplication to indicate that we are referencing combinations with an earlier failure. Again, this is done to help the reader.

Finally, we have to consider the failure of IMDB in isolation. We can alter the formula further:

\[1 + (2 \times 5) + (4 \times (2)) + 1 = 20\]

We can rewrite this clearer with the following form:

\[\begin{split}&passing\ + \\ &(imdb_{outcomes}\ \times\ bookings_{failures})\ + \\ &(movies_{failures}\ \times\ imdb_{outcomes})\ + \\ &imdb_{failure\ only} \\ &=\ 20\end{split}\]

Fandango

When we consider the call that bookings makes to Fandango, we have to follow the same process.

First, we need to consider the Fandango outcomes, as they do not impact if any subsequent calls were made. This adjusts the formula by adding the Fandango failures to the combinations we try with IMDB and movie failures.

\[\begin{split}&passing\ + \\ &(imdb_{outcomes}\ \times\ bookings_{failures})\ + \\ &(imdb_{outcomes}\ \times\ movies_{failures}\ \times\ fandango_{failures})\ +\ \\ &imdb_{failure\ only} \\\end{split}\]

Similarly, we have to account for both Fandango failing alone and Fandango failing with IMDB with two adjustments:

\[\begin{split}&passing\ + \\ &(imdb_{outcomes}\ \times\ bookings_{failures})\ + \\ &(imdb_{outcomes}\ \times\ movies_{failures}\ \times\ fandango_{outcomes})\ +\ \\ &imdb_{failure\ only}\ +\ \\ &fandango_{failure\ only}\ +\ \\ &(imdb_{failure\ only}\ \times\ fandango_{failure\ only}) \\ &=\ 30\end{split}\]

Rotten Tomatoes

Finally, we have to follow the same process for the call from movies to Rotten Tomatoes. First, consider it in combination with other calls from movies – this happens to be none – and then it’s combination with all the other remote service calls.

This yields the final adjustment:

\[\begin{split}&passing\ + \\ &(imdb_{outcomes}\ \times\ bookings_{failures})\ + \\ &(imdb_{outcomes}\ \times\ movies_{failures}\ \times\ fandango_{outcomes})\ + \\ &imdb_{failure\ only}\ + \\ &fandango_{failure\ only}\ + \\ &rotten\ tomatoes_{failure\ only}\ + \\ &(imdb_{failure\ only}\ \times\ fandango_{failure\ only})\ + \\ &(imdb_{failure\ only}\ \times\ rotten\ tomatoes_{failure\ only})\ + \\ &(fandango_{failure\ only}\ \times\ rotten\ tomatoes_{failure\ only})\ + \\ &(imdb_{failure\ only}\ \times\ fandango_{failure\ only}\ \times\ rotten\ tomatoes_{failure\ only}) \\ &=\ 34\end{split}\]

This is the exact number of tests executed by Filibuster (without dynamic pruning.)

Dynamic Reduction

With dynamic reduction, we have to consider fewer cases.

We consider the following, presenting them in order of execution, which matches the postorder execution of the application through concolic execution:

  • the passing execution (1);

  • the ways that Rotten Tomatoes can fail (1);

  • the ways that movies can fail (4);

  • the ways Fandango can fail combined with the possible failures of movies: \((1 + (4 \times 1) = 5)\);

  • the ways that bookings can fail (4); and

  • the ways that bookings can fail (5) combined with the possible failures of IMDB: \((1 + (5 \times 1)) = 6\).

This results in: \(1 + 1 + 4 + (1 + (4 \times 1)) + 4 + (1 + (5 \times 1)) = 21\) tests. This is the number that Filibuster runs with dynamic reduction.

cinema-5: Default Responses

Description

In this example, based on cinema-1, all calls happen regardless of failure. In the event of failure, a default response is used.

_images/cinema-5.jpg

Fault Analysis

This is a simple example of one service calling multiple services without dependencies. All requests will always be made regardless of failure of previous requests.

Filibuster Analysis

In this pattern, let \(k_i\) be the number of different ways that the \(i\)-th service can fail, and let \(n\) be the total number of requests made. Then the number of fault-injection tests should be \((1+k_1) * (1+k_2) * ... * (1+k_n)\).

Since each request has 4 possible failures, in this case the number of tests executed by Filibuster should be \(5 \times 5 = 25\). This is the exact number executed by Filibuster.

cinema-6: Fallbacks

Description

This example is based on cinema-1 and adds a backup service for the bookings service: bookings-primary, replaces bookings, and bookings-secondary is the backup. The request to retrieve the bookings for a user will issue a request to the bookings-secondary service if the bookings-primary service has failed or returns an error. It’s important to note that the two bookings services return different responses when requests are made for the same user: this is an instance of eventual consistency between replicas.

_images/cinema-6.jpg

Fault Analysis

This illustrates a fallback pattern which is common in microservice architectures.

Filibuster Analysis

This particular example extends cinema-1 and is a combination of two patterns: the fallback and sequence patterns. To determine how this can fail, let us first look at the two bookings service. According to the formula above, there are \(1 + 4 \times 5 = 21\) paths, out of which \(4 \times 4 = 16\) end up with an error and 5 end up with a response. In the first 16 paths, we don’t make the call to movies. We then consider the combinations of the second case (5 paths) with movies success/failing (5 ways), which yields \(5 \times 5 = 25\). So the total is \(25 + 16 = 41\). This is the exact number executed by Filibuster.

cinema-7: Health Checks

Description

In this example, based on cinema-6, we first check that the health check service of the primary bookings replica passes before trying to access it.

_images/cinema-7.jpg

Fault Analysis

This is a pattern where, depending on the result of a previous request, a later request goes to different services.

Filibuster Analysis

In this case, we run the same number of tests as we do in cinema-6; however, we also run 4 additional tests to account for the 4 different failures that are possible for the health check service.

Therefore, \(num\ tests_cinema-6 + 4 = 41 + 4 = 45\). This is the exact number executed by Filibuster.

cinema-8: Monolithic

Description

In this example, the cinema-1 example is collapsed into a monolith with an API server making requests to it using a retry loop, where after the initial request, there is a single retry.

_images/cinema-8.jpg

Fault Analysis

This is a bare-bones example of a bounded retry loop.

Filibuster Analysis

Since there are 5 possible outcomes (4 failures for the monolith and 1 success case) and the second request is only executed if the first fails, we have \((1 + (4 \times 5)) = 21\), which represents the first passing and the first failing with all possible outcomes of the second. This is the exact number executed by Filibuster.