Cloud Native Fuzzing: Istio - 40 crashes and high-severity CVE

22nd February, 2022
Adam Korczynski,
Security Engineering & Security Automation
John Howard,
Senior Software Engineer, Google

In this blogpost we will give an in-depth walk-through of the work that we did to set up continuous fuzzing of Istio. The work was done in collaboration with the Istio maintainers and the Google Open Source Security team.

The effort uncovered more than 40 unique crashes in Istio, including CVE-2022-23635, which allowed anyone, including unauthenticated users, to send malicious payloads that could crash the control plane server, resulting in a denial of service.

Istio is an open source service mesh that offers features to secure, connect and monitor distributed services. It is used by many organisations including Airbnb, eBay, Atlassian, Salesforce, T-Mobile and Rappi to handle large amounts of network traffic. It is written in the Go programming language and uses an extended version of the Envoy proxy to handle various proxy-related tasks.

Challenges

There were three main challenges to overcome for continuously fuzzing Istio.

The first challenge was that Istio mainly deals with structured data whereas the go-fuzz fuzzing engine simply provides fuzz-targets with an array of bytes. As such, we needed a convenient way of transforming the raw byte array into high-level Go data types, e.g. structs. To do this we developed the go-fuzz-headers library which can be used to easily create Go data structures that are populated with fuzzed data.
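The idea of turning a raw byte array into typed fields can be sketched with a small hand-rolled consumer. This is an illustration only and not the go-fuzz-headers API: byteConsumer, endpoint and buildEndpoint are hypothetical names, and the wire layout (a one-byte length prefix for strings, big-endian integers) is an assumption made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// byteConsumer hands out typed values carved from a fuzzer-provided byte slice.
// Hypothetical helper for illustration; it is not the go-fuzz-headers API.
type byteConsumer struct {
	data []byte
	pos  int
}

var errNotEnoughBytes = errors.New("not enough fuzz bytes left")

// take returns the next n bytes, or an error when the input is exhausted.
func (c *byteConsumer) take(n int) ([]byte, error) {
	if n < 0 || len(c.data)-c.pos < n {
		return nil, errNotEnoughBytes
	}
	b := c.data[c.pos : c.pos+n]
	c.pos += n
	return b, nil
}

// getUint16 reads two bytes as a big-endian integer.
func (c *byteConsumer) getUint16() (uint16, error) {
	b, err := c.take(2)
	if err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint16(b), nil
}

// getString reads a one-byte length prefix followed by that many bytes.
func (c *byteConsumer) getString() (string, error) {
	l, err := c.take(1)
	if err != nil {
		return "", err
	}
	b, err := c.take(int(l[0]))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// endpoint is a made-up structured input type.
type endpoint struct {
	Host string
	Port uint16
}

// buildEndpoint populates an endpoint from raw fuzz bytes.
func buildEndpoint(data []byte) (endpoint, error) {
	c := &byteConsumer{data: data}
	host, err := c.getString()
	if err != nil {
		return endpoint{}, err
	}
	port, err := c.getUint16()
	if err != nil {
		return endpoint{}, err
	}
	return endpoint{Host: host, Port: port}, nil
}

func main() {
	ep, err := buildEndpoint([]byte{3, 'f', 'o', 'o', 0x1f, 0x90})
	fmt.Println(ep, err)
}
```

A fuzz target would call something like buildEndpoint on the engine's input and pass the resulting struct to the code under test, returning early when the input is too short to fill every field.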

The second challenge was to integrate Istio into the OSS-Fuzz infrastructure. This was done early in the engagement to take advantage of the continuous fuzzing infrastructure that OSS-Fuzz offers. With OSS-Fuzz, any number of fuzzers will be run automatically, and the coverage can be monitored in the coverage builds from the OSS-Fuzz dashboard.

The third challenge was to come up with a set of fuzzers that would analyse as much of the Istio code as possible. In simplified terms, we had to write many fuzzers that would execute Istio code in a myriad of ways. In total we have developed 60 fuzzers for Istio. We won’t go into detail on each of these here, but all of the fuzzers are available in the Istio repository here: https://github.com/istio/istio/tree/master/tests/fuzz.

Why fuzz Go code?

Fuzzing has the immediate benefit of finding bugs in a highly autonomous manner. Once a fuzz harness has been written, it can continue to look for bugs for a long time without much manual intervention. For software written in Go, such bugs could be out-of-bounds accesses, nil-dereferences, timeouts, out-of-memory conditions, runtime errors, off-by-one errors and logical bugs. At the time of writing, 64 critical open source Go projects have joined OSS-Fuzz, through which hundreds of stability and security-relevant bugs have been found and fixed.
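As a rough sketch of what a harness looks like, the example below uses the go-fuzz entry point convention (a function taking the raw input and returning an int) around a deliberately buggy toy parser. The lookup function and the panics helper are hypothetical and exist only to show an out-of-bounds crash of the kind a fuzzer reports; none of this is Istio code.

```go
package main

import "fmt"

// lookup treats the first byte as an index into the input.
// Deliberately buggy toy function: idx is never checked against len(data).
func lookup(data []byte) (byte, bool) {
	if len(data) == 0 {
		return 0, false
	}
	idx := int(data[0])
	return data[idx], true // panics when idx >= len(data)
}

// Fuzz follows the go-fuzz convention: it receives raw bytes from the engine
// and returns 1 for inputs worth prioritising, 0 otherwise.
func Fuzz(data []byte) int {
	if _, ok := lookup(data); !ok {
		return 0
	}
	return 1
}

// panics reports whether running the harness on data panics.
// A real fuzzing engine detects the crash itself; this helper only
// makes the behaviour observable in a plain program.
func panics(data []byte) (crashed bool) {
	defer func() {
		if r := recover(); r != nil {
			crashed = true
		}
	}()
	Fuzz(data)
	return false
}

func main() {
	fmt.Println(panics([]byte{1, 'k'})) // index 1 is in range
	fmt.Println(panics([]byte{9, 'x'})) // index 9 is out of range
}
```

The harness itself contains no assertions: the Go runtime turns the bad index into a panic, and the engine records the input that caused it.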

Why fuzzing Istio is important

Istio forms the foundation of an increasingly large set of companies' compute infrastructure, providing service discovery, traffic management, authorization and authentication, and observability. This means that even minor bugs can trickle up to major user impact, and severe bugs can be disastrous. Because of this, the Istio team has spent recent years improving our test coverage and security posture to help find lurking bugs and prevent new ones from popping up. Fuzzing is the next step in that journey.

Timeline

The Istio maintainers had experimented with fuzz testing in 2019 and had found bugs in both Istio itself and critical dependencies. In mid 2019 a tracker issue was set up to improve fuzz coverage and integrate into OSS-Fuzz. In December 2020 Istio integrated into OSS-Fuzz with its first two fuzzers. One of these fuzzers found an issue in Kubernetes itself, which was initially reported privately to the Kubernetes maintainers and later tracked in a public issue. In June 2021 the work on improving fuzz coverage began, and as of February 2022, 60 fuzzers have been merged into Istio, all of which are run continuously by OSS-Fuzz.

Findings

Over the course of the first year of fuzzing Istio a total of 70 crashes were reported. Of these, 17 were due to errors in the runtime environment and not related to Istio itself, and 4 crashes were reported as a result of build failures. There were 4 duplicates and 2 crashes in the fuzzers themselves, which were invalid. That leaves 43 findings related to Istio. The findings are broken down as such:

Production code:

Test code:

Security critical bug: CVE-2022-23635

One particularly interesting bug detected by the fuzz tests was CVE-2022-23635. The impacted code is quite simple, well tested, and had been used in production for over a year.

Impact/attack vector

This bug is especially impactful, as it is on a critical code path which authenticates clients. This means that anyone, including unauthenticated users, is able to send malicious payloads that could crash the control plane server and act as a denial of service attack. While Istio is designed to be robust to short term disconnections from the control plane, sustained downtime can prevent configuration updates, endpoint discovery updates, and new workloads from starting up. Users with workload churn are particularly exposed. In the worst case scenario where a user's workloads are restarting (due to an upgrade, preemptible nodes, another exploit, or various other reasons), this could lead to a total cluster-wide outage.

Deep dive

To understand the bug we will do a short dive into the root cause. The following snippet shows the ExtractJwtAud function, which is where the issue occurs:

func ExtractJwtAud(jwt string) ([]string, bool) {
        jwtSplit := strings.Split(jwt, ".")
        if len(jwtSplit) != 3 {
                return nil, false
        }
        payload := jwtSplit[1]

        payloadBytes, err := base64.RawStdEncoding.DecodeString(payload)
        if err != nil {
                return nil, false
        }

        structuredPayload := &jwtPayload{}
        err = json.Unmarshal(payloadBytes, &structuredPayload)
        if err != nil {
                return nil, false
        }

        return structuredPayload.Aud, true
}
Source

One nil-pointer dereference was found in this particular piece of code by the FuzzJwtUtil fuzzer. In the event that jwt equals ".bnVsbM." then structuredPayload will be nil at the return statement, which causes ExtractJwtAud to crash with a nil-dereference:

       return structuredPayload.Aud, true

That structuredPayload is nil when structuredPayload.Aud is returned seems counter-intuitive, since the error returned by json.Unmarshal is checked. To see how the crash happens, we walk through ExtractJwtAud step by step.

The string passed to ExtractJwtAud is a JWT, which consists of three base64-encoded segments separated by dots. ExtractJwtAud only uses the middle segment:

jwtSplit := strings.Split(jwt, ".")
if len(jwtSplit) != 3 {
  return nil, false
}
payload := jwtSplit[1]

The payload string is then decoded:

payloadBytes, err := base64.RawStdEncoding.DecodeString(payload)
if err != nil {
  return nil, false
}
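For the crashing input ".bnVsbM." mentioned earlier, the split and decode steps can be checked in isolation. The sketch below is a minimal standalone program; decodeMiddle is a hypothetical helper that mirrors the two steps above and is not part of Istio.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeMiddle splits a dot-separated token into three segments and
// base64-decodes the middle one, mirroring the first two steps of ExtractJwtAud.
func decodeMiddle(token string) ([]byte, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, fmt.Errorf("expected 3 segments, got %d", len(parts))
	}
	return base64.RawStdEncoding.DecodeString(parts[1])
}

func main() {
	b, err := decodeMiddle(".bnVsbM.")
	fmt.Printf("%q %v\n", b, err)
}
```

Note that the header and signature segments are empty here. Nothing in these two steps requires them to be valid, which is why such a short input reaches the unmarshalling step at all.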

If the jwt argument to the ExtractJwtAud function equals ".bnVsbM." then payloadBytes will end up being []byte("null"), i.e. a byte slice with the characters "null". The code then proceeds with unmarshalling payloadBytes into structuredPayload:

structuredPayload := &jwtPayload{}
err = json.Unmarshal(payloadBytes, &structuredPayload)
if err != nil {
  return nil, false
}

The code checks whether the unmarshalling succeeded by inspecting the error returned by json.Unmarshal.

If json.Unmarshal returns no error, execution continues to the return statement, which reads structuredPayload.Aud. However, in this case structuredPayload is in fact nil and Istio will panic with a nil-pointer dereference:

return structuredPayload.Aud, true
}

structuredPayload is nil at the return statement because of how json.Unmarshal handles this input. We can create a simple reproducer for this issue, as we know that payloadBytes is []byte("null") just before json.Unmarshal is called:

package main

import (
        "encoding/json"
        "fmt"
)

type jwtPayload struct {
        Aud []string `json:"aud"`
}

func main() {
        structuredPayload := &jwtPayload{}
        fmt.Println("before json.Unmarshal: ", structuredPayload)
        err := json.Unmarshal([]byte("null"), &structuredPayload)
        if err != nil {
                return
        }
        fmt.Println("after json.Unmarshal: ", structuredPayload)
}

Running this file will print out the following:

before json.Unmarshal:  &{[]}
after json.Unmarshal:  <nil>

The double pointer is the key here. Because structuredPayload is already a *jwtPayload, &structuredPayload passes a **jwtPayload to json.Unmarshal instead of a *jwtPayload. With a double pointer, json.Unmarshal behaves the same as with a single pointer, with one exception: if the JSON input is the literal null, the inner pointer is set to nil and no error is returned.
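The difference between the two call shapes can be observed side by side. In the sketch below, viaDoublePointer and viaSinglePointer are hypothetical helper names introduced for the comparison; only json.Unmarshal and the jwtPayload type come from the code discussed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type jwtPayload struct {
	Aud []string `json:"aud"`
}

// viaDoublePointer mirrors the vulnerable call: Unmarshal receives a **jwtPayload.
func viaDoublePointer(input string) (*jwtPayload, error) {
	p := &jwtPayload{}
	err := json.Unmarshal([]byte(input), &p)
	return p, err
}

// viaSinglePointer passes a *jwtPayload to Unmarshal instead.
func viaSinglePointer(input string) (jwtPayload, error) {
	var p jwtPayload
	err := json.Unmarshal([]byte(input), &p)
	return p, err
}

func main() {
	// JSON null resets the inner pointer to nil and reports no error.
	p, err := viaDoublePointer("null")
	fmt.Println(p == nil, err)

	// JSON null has no effect on a struct value, and also reports no error.
	v, err := viaSinglePointer("null")
	fmt.Println(v.Aud == nil, err)

	// For an ordinary object both shapes populate the struct as expected.
	p, err = viaDoublePointer(`{"aud":["a"]}`)
	fmt.Println(p.Aud, err)
}
```

In both cases the error is nil for the input null, so an error check alone cannot tell the caller that the pointer was reset.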

The fix

The fix is simple: remove the extra pointer indirection so that a *jwtPayload is passed to json.Unmarshal:

func ExtractJwtAud(jwt string) ([]string, bool) {
  jwtSplit := strings.Split(jwt, ".")
  if len(jwtSplit) != 3 {
    return nil, false
  }
  payload := jwtSplit[1]

  payloadBytes, err := base64.RawStdEncoding.DecodeString(payload)
  if err != nil {
    return nil, false
  }

-  structuredPayload := &jwtPayload{}
+  structuredPayload := jwtPayload{}
  err = json.Unmarshal(payloadBytes, &structuredPayload)
  if err != nil {
    return nil, false
  }

  return structuredPayload.Aud, true
}

Istio fixed this issue in https://github.com/istio/istio/commit/5f3b5ed958ae75156f8656fe7b3794f78e94db84, which also includes a test case with the crashing string to catch regressions.
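A regression check along these lines can be sketched as a standalone program. This is not the test from the commit: extractJwtAud is a local copy of the fixed function shown above, tokenWithPayload is a hypothetical helper, and the audience value "istio-ca" is a made-up example.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

type jwtPayload struct {
	Aud []string `json:"aud"`
}

// extractJwtAud is a local copy of the fixed function shown above.
func extractJwtAud(jwt string) ([]string, bool) {
	jwtSplit := strings.Split(jwt, ".")
	if len(jwtSplit) != 3 {
		return nil, false
	}
	payloadBytes, err := base64.RawStdEncoding.DecodeString(jwtSplit[1])
	if err != nil {
		return nil, false
	}
	structuredPayload := jwtPayload{}
	if err := json.Unmarshal(payloadBytes, &structuredPayload); err != nil {
		return nil, false
	}
	return structuredPayload.Aud, true
}

// tokenWithPayload builds a three-segment token around the given JSON payload.
// The header and signature segments are placeholders, not a valid JWT.
func tokenWithPayload(payload string) string {
	return "h." + base64.RawStdEncoding.EncodeToString([]byte(payload)) + ".s"
}

func main() {
	inputs := []string{
		".bnVsbM.", // the crashing input found by the fuzzer
		tokenWithPayload(`{"aud":["istio-ca"]}`),
		"not-a-jwt",
	}
	for _, in := range inputs {
		aud, ok := extractJwtAud(in)
		fmt.Printf("%q -> aud=%v ok=%v\n", in, aud, ok)
	}
}
```

With the fix in place the crashing input no longer panics. Because JSON null leaves the struct untouched, the function returns an empty audience list for it rather than an error.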

Other affected projects

The bug pattern itself is not specific to Istio. In fact, the same code pattern was found in a number of other major Go projects across the ecosystem. To see if your project contains the same error, see this tool for more information.

Closing thoughts

In this blogpost we have given an in-depth walk-through of our recent work in setting up continuous fuzzing of Istio. Istio is a Go application and a Cloud Native service mesh, neither of which is a traditional target of fuzzing. We are delighted with the outcome and the contributions to the Istio code, which help provide higher assurance of reliability and security.

Over the last two years we have done more and more fuzzing of Cloud Native applications. This includes Envoy, Vitess, Kubernetes, Fluent-bit, Containerd, Flux, Runc, Linkerd2-proxy and several more. All of these projects are integrated into the free open source security service OSS-Fuzz. Fuzzing this new class of applications, namely Cloud Native software, has been a positive experience overall, and we are looking forward to contributing more to this exciting and important field.