What the fork? Imposter commits in GitHub Actions and CI/CD

Billy Lynch, Staff Software Engineer
March 8, 2023

tl;dr - We found a vulnerability in GitHub Actions that bypasses allowed Workflow settings by using commits from forked repositories. Read on to learn more about how this works and what to watch out for!

Config-as-Code and GitOps workflows are popular ways to manage CI/CD pipelines. They give developers an easy way to add, review, and monitor changes to automated systems that deploy software to production. But these strategies are only as secure as the repositories they originate from, and understanding how they function is critical to operating a secure supply chain.

As we've seen from the SolarWinds and Codecov incidents, being able to sneak untrusted code into a CI/CD platform can be devastating. These systems often hold privileged secrets for fetching your code, accessing production servers, and more.

In this post we'll look at a behavior we call "imposter commits", an intentional (though perhaps unexpected) property of GitHub repositories, and how it led us to discover a bug in GitHub Actions.

To understand how imposter commits work, we first need to understand how forking works in GitHub.

Commits in GitHub Forks

A fork is a feature commonly found on many Git hosting platforms that allows users to copy a repository to their own user namespace in order to make changes. Forks can act independently of the repository they originated from and can have their own permissions and histories.

When working with Git repositories on GitHub, users typically:

  • Fork an upstream repository
  • Check out their fork
  • Make changes to their fork
  • Open a Pull Request to the upstream repository
  • Merge their Pull Request

To make working across repos easier, GitHub lets you check out pull requests from forks directly from the parent repo. To do this, GitHub shares commits between the fork and its parent. This means that forks can be created very quickly (since not all objects need to be copied over to a separate repo), and you can easily check out and test someone's change without needing to know what their remote is.

-- CODE language-bash --
$ git fetch origin refs/pull/123/head
$ git checkout FETCH_HEAD

This is super useful for experimenting and validating changes before they are committed to the parent repo. This behavior can be extended to fetch arbitrary commits from forked repositories, even those not directly associated with a pull request:

-- CODE language-bash --
$ git fetch origin 2e2b0d54fd86299e54c3df648acf976fc1e13c4d
$ git checkout FETCH_HEAD

This is possible because of how GitHub uses Git alternates to share data between repositories.
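Because a fetch by SHA succeeds whether or not the commit sits on a branch, it can help to ask Git which refs actually contain a commit once it has been fetched. The following is a minimal sketch, not a definitive check: the function name `refs_containing` is our own, and it assumes a local clone whose remote-tracking branches and tags are up to date.

```shell
# refs_containing REPO_DIR COMMIT
# Prints every remote-tracking branch and tag in the local clone whose
# history contains COMMIT. No output means no fetched ref reaches it.
# (Hypothetical helper name; it only wraps `git for-each-ref --contains`.)
refs_containing() {
  git -C "$1" for-each-ref --contains "$2" --format='%(refname)' refs/remotes refs/tags
}

# Example, run inside a clone after `git fetch origin <sha>`:
#   refs_containing . FETCH_HEAD
```

An empty result is only a hint. The commit may come from a fork, but it may also belong to a deleted branch or to a ref that was never fetched locally.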

What are Imposter Commits?

This convenience has a trade-off: when working with commit SHAs directly, how do you know whether commit 2e2b0d5 or 74ba35f came from the primary repo or a fork? Has the commit been reviewed and checked into the primary repo's main branch? Could you tell if the commit was authored by a legitimate maintainer of the repo?

This is where the problem of imposter commits comes into play. Imposter commits are commits that appear to be from a parent repository, but they actually belong to a fork. Due to the way GitHub treats forked commits, they can be fetched by Git or GitHub's own API from the parent repository and it often isn't obvious that this is occurring.

This has been a known behavior of GitHub for quite some time, and it has caused a few notable incidents. One example happened in 2020 when, in response to GitHub taking down the youtube-dl repo, a user created a commit in the github/dmca repo impersonating GitHub's then-CEO Nat Friedman and uploaded a copy of what appeared to be GitHub source code.

This led to press coverage speculating that GitHub had been compromised. GitHub confirmed that there was no compromise and that what was uploaded was a version of GitHub Enterprise Server that had been distributed to clients.

GitHub now recommends that users sign their commits, and displays a warning in the UI above any commits that don't belong to a branch in the parent repository.

However, no similar protections exist when interacting with a GitHub repository via a CLI:

-- CODE language-bash --
$ git clone https://github.com/github/dmca
Cloning into 'dmca'...
remote: Enumerating objects: 71494, done.
remote: Counting objects: 100% (342/342), done.
remote: Compressing objects: 100% (163/163), done.
remote: Total 71494 (delta 205), reused 258 (delta 178), pack-reused 71152
Receiving objects: 100% (71494/71494), 26.05 MiB | 57.48 MiB/s, done.
Resolving deltas: 100% (40773/40773), done.
Updating files: 100% (13140/13140), done.
$ cd dmca
$ git fetch origin 565ece486c7c1652754d7b6d2b5ed9cb4097f9d5
remote: Enumerating objects: 30470, done.
remote: Total 30470 (delta 0), reused 0 (delta 0), pack-reused 30470
Receiving objects: 100% (30470/30470), 124.35 MiB | 44.27 MiB/s, done.
Resolving deltas: 100% (4688/4688), done.
From https://github.com/github/dmca
 * branch 565ece486c7c1652754d7b6d2b5ed9cb4097f9d5 -> FETCH_HEAD
$ git checkout FETCH_HEAD
Updating files: 100% (41812/41812), done.
Note: switching to 'FETCH_HEAD'.
HEAD is now at 565ece486 felt cute, might put gh source code on dmca repo now idk

or from the API:

-- CODE language-bash --
$ curl https://api.github.com/repos/github/dmca/commits/565ece486c7c1652754d7b6d2b5ed9cb4097f9d5 | head -20
{
  "sha": "565ece486c7c1652754d7b6d2b5ed9cb4097f9d5",
  "node_id": "MDY6Q29tbWl0MTMwNDczODo1NjVlY2U0ODZjN2MxNjUyNzU0ZDdiNmQyYjVlZDljYjQwOTdmOWQ1",
  "commit": {
    "author": {
      "name": "Nat Friedman",
      "email": "nat@github.com",
      "date": "2020-11-04T03:51:21Z"
    },
    "committer": {
      "name": "Nat Friedman",
      "email": "nat@github.com",
      "date": "2020-11-04T03:51:21Z"
    },
    "message": "felt cute, might put gh source code on dmca repo now idk",
    "tree": {
      "sha": "4d41a9dfbfa803a45791c4b2f18bee9cb8c6f66a",
      "url": "https://api.github.com/repos/github/dmca/git/trees/4d41a9dfbfa803a45791c4b2f18bee9cb8c6f66a"
    },
    "url": "https://api.github.com/repos/github/dmca/git/commits/565ece486c7c1652754d7b6d2b5ed9cb4097f9d5",
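The commits endpoint above says nothing about whether the commit is on a branch, but GitHub's compare endpoint (`/repos/{owner}/{repo}/compare/{base}...{head}`) can give a rough answer for one branch at a time. The sketch below is an illustration under assumptions, not a vetted detector: the function names are ours, `jq` is assumed to be installed, and we rely on the `status` field reporting `behind` or `identical` when the head commit is already contained in the base branch.

```shell
# compare_url OWNER/REPO BASE HEAD
# Builds the URL for GitHub's "compare two commits" REST endpoint.
compare_url() {
  printf 'https://api.github.com/repos/%s/compare/%s...%s\n' "$1" "$2" "$3"
}

# status_is_reachable STATUS
# Succeeds when the compare status indicates that HEAD is already part of
# BASE's history ("identical" or "behind"); "ahead" and "diverged" fail.
status_is_reachable() {
  case "$1" in
    identical|behind) return 0 ;;
    *) return 1 ;;
  esac
}

# commit_on_branch OWNER/REPO BRANCH SHA
# Needs network access, curl and jq. Any request or parsing failure leaves
# the status empty and is treated as "not confirmed".
commit_on_branch() {
  status=$(curl -fsSL "$(compare_url "$1" "$2" "$3")" | jq -r '.status')
  status_is_reachable "$status"
}

# Example (replace <branch> with a branch of the parent repo):
#   commit_on_branch github/dmca <branch> 565ece486c7c1652754d7b6d2b5ed9cb4097f9d5
```

A single failing branch is not conclusive. A fuller check would repeat this for every branch and tag of the parent repository.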

Imposter Commits in CI/CD

While using pinned dependencies is a good practice, imposter commits can make it difficult to know just by reading a file whether or not you're using a version coming from the repo you expect. For example, take the following GitHub Action workflow that uses the popular actions/checkout Action provided by GitHub:

-- CODE language-yaml --
name: example
on: [push]
jobs:
  commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e
      - shell: bash
        run: |
          echo 'hello world!'

Looks great, right? As you can probably guess, no - this is actually an imposter commit belonging to a fork, but this is not obvious from just the config!

When this workflow runs, GitHub Actions will fetch the SHA to get the workflow configuration regardless of whether it is reachable from a branch in the parent repository:

https://github.com/wlynch/imposter-commits-demo/actions/runs/4365534263/jobs/7634289429

Additionally, since this Action is referenced using the parent repository, GitHub Actions treats it as belonging to the parent. This means it can bypass GitHub Actions security settings that would normally restrict Actions to trusted sources like GitHub or your organization. Because this commit is in a fork, it does not need to be reviewed or approved by an actions/checkout maintainer.
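One cheap signal when reviewing a pinned SHA is whether it matches the tip of any branch or tag that the parent repository advertises. The sketch below assumes the output of `git ls-remote` is piped in on stdin; the helper name `sha_is_advertised` is hypothetical.

```shell
# sha_is_advertised SHA
# Reads `git ls-remote` output on stdin (lines of "<sha><TAB><ref>") and
# succeeds if SHA is the tip of at least one advertised ref.
sha_is_advertised() {
  awk -v sha="$1" '$1 == sha { found = 1 } END { exit !found }'
}

# Example (needs network access):
#   git ls-remote --heads --tags https://github.com/actions/checkout \
#     | sha_is_advertised c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e \
#     || echo "not the tip of any branch or tag"
```

This only compares against ref tips, so a legitimate but older commit on a branch also fails the check. Treat a miss as a prompt to look closer rather than as proof of an imposter commit.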

Detection and Prevention

This attack is a type of dependency confusion. To exploit it, an attacker needs a workflow author to reference a SHA for an unpublished version and a repository maintainer to accept that change. This would likely require some form of social engineering, another phishing-style attack, or a careless action by a user (e.g. accepting a pull request that updates a workflow to use an imposter commit SHA).

GitHub has already taken some steps to reduce the likelihood of these types of attacks, e.g. by removing the ability to use a short SHA to reference an action, which makes it less likely that attackers could abuse this with limited or no social engineering via a SHA collision.

Currently, we are not aware of any active exploitation, though we have not done an exhaustive search of public repositories. To aid detection, we are open sourcing a tool called clank that helps users check for potential imposter commits in their own GitHub Actions workflows.

We also plan on contributing a similar check to OpenSSF Scorecards soon! 

We recommend enabling automated tools such as Dependabot to keep your GitHub Actions up to date with known branches/tags.

Regardless of whether you are using GitHub Actions, we recommend auditing your CI configurations to see if your jobs are vulnerable to fetching arbitrary imposter commits, particularly for sensitive workflows like deployments or artifact signing.
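A first pass at such an audit could simply list every action that a workflow pins to a full commit SHA, so that each SHA can then be checked against the repository it claims to come from. This is a rough sketch with a hypothetical helper name. It uses a line-based pattern rather than a YAML parser, so quoted values and unusual layouts are missed.

```shell
# list_sha_pinned_actions
# Reads workflow YAML on stdin and prints "<action> <sha>" for every
# `uses:` entry pinned to a full 40-character commit SHA. Entries pinned
# to tags or branches (e.g. @v4) are skipped.
list_sha_pinned_actions() {
  sed -nE 's/^[[:space:]-]*uses:[[:space:]]*([^@[:space:]]+)@([0-9a-f]{40})([[:space:]].*)?$/\1 \2/p'
}

# Example:
#   cat .github/workflows/*.yml | list_sha_pinned_actions
```

Each line of output is a candidate to verify, for instance by confirming that the SHA is reachable from a branch or tag of the named repository.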

What about code signing?

A frequently asked question we've received is "Could code signing have prevented this?"

The answer (as usual) - it depends.

We recommend you treat the actions you publish and consume no differently than other artifacts (packages, images, etc.) you use in your software supply chain. If done correctly, code signing can be an effective way to detect issues like this.

However, for this to work, commits would need to be signed with a repository-specific identity. Otherwise, anyone who can sign with the same identity could produce valid signatures within their own forks. Because of the prevalence of GitHub's web-flow.gpg key, which is used whenever a UI or API operation creates a commit/tag, this is difficult to rely on in practice.

Publishing release tags using a per-repository identity from trusted releasers could go a long way to help improve the security of consumed actions. We recommend using tools such as Sigstore's Gitsign to help make signing Git artifacts easy and transparent without needing to provision long-lived keys.
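As a starting point, Gitsign can be enabled for a single repository through Git's own signing configuration. The sketch below wraps the per-repository settings in a hypothetical helper, `enable_gitsign`. It assumes the `gitsign` binary is installed and on your PATH when you later sign, and it does not by itself provide a repository-specific identity.

```shell
# enable_gitsign REPO_DIR
# Configures one repository to sign commits and tags with Gitsign.
# (Hypothetical helper; it only sets standard Git config keys.)
enable_gitsign() {
  git -C "$1" config --local gpg.x509.program gitsign  # use Gitsign as the signing program
  git -C "$1" config --local gpg.format x509           # Gitsign expects x509-style arguments
  git -C "$1" config --local commit.gpgsign true       # sign all commits
  git -C "$1" config --local tag.gpgsign true          # sign all tags
}

# Example:
#   enable_gitsign .
```

Consumers still need to decide which signer identities they trust for a given repository. Enabling signing is only half of the picture.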

Disclosure to GitHub

We found this vulnerability on September 7th, 2022 and disclosed it to GitHub on September 8th, 2022 via the GitHub Bug Bounty program. This report is being made public 180 days after disclosure, following HackerOne's disclosure guidelines.

In response, GitHub has made changes to their documentation for Actions when using SHAs.

Additionally, GitHub is working towards improvements in GitHub Actions publishing, which we look forward to!

Interested in protecting your organization from these types of attacks and more? We'd love to help. Chainguard provides software supply chain security audits for leading organizations who want to take the first step in their secure supply chain journey.

Start your assessment today.
