GitHub Copilot: A Security Review

GitHub Copilot is seen by many as a game-changer for developers, offering real-time code suggestions powered by a vast dataset of public code. While it can boost productivity and creativity, we've heard from many organizations that haven't yet adopted Copilot because of security concerns. The team at Swift Security spent some time researching these concerns, looking at both GitHub Copilot Chat (conversational AI for coding assistance) and GitHub Copilot (AI-driven code suggestions directly in the coding environment).

Here are the top security concerns we have heard surrounding GitHub Copilot:

  1. Data Privacy: Could Copilot suggest code containing sensitive information learned from private repositories?
  2. Insecure Code: Could Copilot propagate insecure coding practices based on its training data?
  3. Improper Licensing: Might Copilot suggest copyrighted code snippets, leading to licensing issues?

Let's delve into each of these concerns and explore potential solutions.

Data Privacy:

We understand the concern about potentially leaking sensitive information. One version of Copilot is trained on public code repositories, which significantly reduces the risk of exposure. However, there is also an Enterprise version that can be trained on an organization's private repositories. In either case, it is good practice to review all suggestions with an eye for potential data leaks. Additionally, if you are not doing so already, consider utilizing automated privacy-checking tools to identify any sensitive information that might slip through (including keys that might have been inadvertently committed to a public repository). There is another wrinkle: if you allow Copilot to use Bing Search to gather information, the prompt is sent to Bing. To mitigate this risk, options include (a) turning off this feature, (b) educating users about the risk, or (c) implementing a data protection product to help alleviate it.
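One lightweight control is to scan each accepted suggestion for secret-like strings before it lands in the repository. The sketch below is illustrative only: the regex patterns and the `scan_suggestion` helper are our own examples, not part of any Copilot API, and a production setup should use a dedicated, vetted secret scanner.

```python
import re

# Illustrative patterns only -- real deployments should rely on a
# maintained secret-scanning tool with a much broader rule set.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def scan_suggestion(code: str) -> list[str]:
    """Return the names of any secret patterns found in a suggestion."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(code)]

clean = "def add(a, b):\n    return a + b"
leaky = 'api_key = "sk_live_abcdefghijklmnop1234"'
print(scan_suggestion(clean))   # []
print(scan_suggestion(leaky))   # ['generic_api_key']
```

A hook like this could run in a pre-commit check or an IDE extension, so that a flagged suggestion is reviewed before it ever reaches a shared branch.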

It is also worthwhile to have your privacy and legal teams review the privacy and data handling policies published by GitHub for Copilot Enterprise.

Insecure Code:

Copilot's suggestions are based on patterns learned from its training data, and this data may not always adhere to security best practices. It is certainly possible for Copilot to suggest a popular but insecure dependency, and the versions of dependencies may have changed since the time Copilot was trained on the dataset. LLMs have also been found to hallucinate package names, creating an opportunity for threat actors to publish malicious packages under those names. Another risk is that Copilot could amplify insecure code or replicate existing security issues. To mitigate these risks, always review and understand code suggestions before integrating them. Following best practices from the pre-Copilot days, utilize manual security reviews, code analysis tools, and secure coding principles. If those practices are already in place, the incremental risk from Copilot may be limited to an increased volume of code that could exceed the capacity of existing security controls.
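One simple guard against hallucinated or typosquatted dependencies is to check the imports in a suggestion against an internal allowlist of vetted packages. A minimal sketch, assuming a hypothetical `APPROVED_PACKAGES` set that your organization maintains (the names and the helper are illustrative, not a Copilot feature):

```python
import re

# Hypothetical allowlist of packages your organization has vetted.
APPROVED_PACKAGES = {"requests", "numpy", "flask", "cryptography"}

def unapproved_imports(code: str) -> set[str]:
    """Flag top-level imports in a suggestion that are not on the allowlist."""
    found = set()
    for line in code.splitlines():
        m = re.match(r"\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)", line)
        if m:
            found.add(m.group(1))
    return found - APPROVED_PACKAGES

suggestion = "import requests\nimport requestz  # possibly hallucinated\n"
print(unapproved_imports(suggestion))  # {'requestz'}
```

A check like this does not prove a package is malicious, but it forces a human decision before an unfamiliar dependency enters the build.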

Improper Licensing:

There's a valid concern that Copilot might suggest copyrighted code snippets. The typical advice is to treat Copilot suggestions like any code snippet you find online and check the licensing terms for that code. But unlike snippets you find online, with Copilot you don't know the source of the suggestion. Consider using code search engines and/or code analysis tools that can identify licensing issues. As an additional safeguard, GitHub offers IP indemnification for paying customers, but only for unmodified suggestions, and only if the duplicate detection feature is set to block.
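As a rough illustration of how a duplicate-detection safeguard could work, one might fingerprint suggestions against an index of license-restricted code. This is a sketch of the general technique, not GitHub's actual mechanism; the `restricted` index and the snippets are hypothetical:

```python
import hashlib

def fingerprint(code: str) -> str:
    """Collapse whitespace and hash, so trivial reformatting can't hide a match."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode()).hexdigest()

# Hypothetical index built from license-restricted code your organization
# must avoid; a real system would index entire repositories.
restricted = {
    fingerprint("int fib(int n) { return n < 2 ? n : fib(n-1) + fib(n-2); }")
}

# A suggestion that differs only in whitespace still matches.
suggestion = "int fib(int n)  {  return n < 2 ? n : fib(n-1) + fib(n-2); }"
print(fingerprint(suggestion) in restricted)  # True
```

Exact-hash matching only catches near-verbatim copies; commercial license scanners use fuzzier similarity measures, but the workflow (fingerprint, look up, block or flag) is the same.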

Join the Conversation!

Security is a top priority for all of us. We believe GitHub Copilot (and other coding copilots) can be a powerful tool for developers when used responsibly and in conjunction with other security controls.

We'd love to hear from you! Have these security concerns impacted your decision to use Copilot? How do you manage these risks? Would it be useful to shift security controls left, applying them directly to the suggestion (versus downstream, using current code scanning tools)? Share your perspectives in the comments below. Let's work together to navigate these complexities and leverage Copilot for a more productive and secure development experience.

References:

https://docs.github.com/en/enterprise-cloud@latest/copilot/github-copilot-enterprise/overview/about-github-copilot-enterprise

https://docs.github.com/en/enterprise-cloud@latest/copilot/github-copilot-enterprise/overview/enabling-github-copilot-enterprise-features#configuring-copilot-enterprise-features-for-an-organization

https://docs.github.com/en/enterprise-cloud@latest/copilot/github-copilot-enterprise/copilot-chat-in-github/about-github-copilot-chat-in-githubcom#limitations-of-github-copilot-chat
