AI-Generated Code: Who Owns the Copyright?
Tags: AI, Copyright Law, Code, Intellectual Property, Enterprise AI


16-04-2026 · 9 min read · Prince Kumar

Copyright law was built on a foundational assumption: creative works are produced by human authors, and those authors receive the exclusive right to control their work's use and distribution. When a developer writes code, they own it (or their employer does, under the work-for-hire doctrine). When a developer prompts GitHub Copilot and the tool produces a fifty-line function, the question of who owns that function (the developer who wrote the prompt, the company that built the tool, the developers whose open-source code was in the training data, or nobody at all) is one that copyright law did not anticipate and that courts around the world are actively wrestling with in 2026. The ambiguity has real consequences: for licensing, for open-source compliance, for M&A due diligence, and for the ability to enforce intellectual property rights against competitors who copy your AI-assisted codebase.

By early 2026, approximately 50% of enterprise code is AI-assisted, yet the copyright framework governing it has not kept pace: courts in the US, EU, and India are arriving at different answers, and the legal ambiguity affects every company that ships software.

The Training Data Problem

Separate from the question of who owns AI-generated code is the question of whether AI code generators are themselves infringing copyright by training on copyrighted code. GitHub Copilot was trained on public code repositories, including repositories under the GPL, MIT, Apache, and other licences. When Copilot generates code that closely resembles code in its training data (a pattern that researchers have documented, with Copilot reproducing recognisable fragments from specific licensed repositories), it potentially distributes copyrighted code without the attribution or licence compliance that those licences require.

The practical risk for organisations using AI code generators: there is no reliable way to know whether a block of AI-generated code contains fragments derived from licensed source code, and no reliable way to know what licence obligations attach to those fragments if it does. Legal teams at large technology companies have begun requiring AI-generated code to be reviewed against known-licensed codebases before shipping. Some companies have banned AI code generators entirely from code that will be distributed under specific open-source licences, precisely to avoid licence-contamination claims.
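To make the "review against known-licensed codebases" step concrete, here is a minimal sketch of one common heuristic, verbatim token n-gram overlap between a generated snippet and a licensed reference corpus. All function names and the threshold are illustrative assumptions, not any vendor's actual tooling:

```python
def ngrams(tokens, n=8):
    """Return the set of consecutive token windows of length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(generated: str, reference: str, n: int = 8) -> float:
    """Fraction of the generated snippet's n-grams that appear verbatim
    in the reference corpus. A crude proxy for copied fragments; real
    scanners also normalise whitespace, identifiers, and comments."""
    gen = ngrams(generated.split(), n)
    ref = ngrams(reference.split(), n)
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)

def needs_review(generated: str, licensed_corpus: str,
                 threshold: float = 0.3) -> bool:
    """Flag a snippet for human licence review when its verbatim overlap
    with the licensed corpus exceeds the (arbitrary) threshold."""
    return overlap_ratio(generated, licensed_corpus) >= threshold
```

A ratio of 1.0 means every eight-token window of the generated code appears verbatim in the reference; anything above the threshold would be routed to legal review rather than shipped automatically.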

What Enterprise Organisations Should Do Right Now

  • Conduct an audit of which AI code generation tools are being used by engineering teams, which licences govern those tools' training data and outputs, and what the tool vendor's indemnification policy covers in the event of copyright claims
  • Establish a policy on the use of AI code generators for code that will be distributed under specific open-source licences, particularly the GPL, whose copyleft provisions could be triggered by AI-generated fragments derived from GPL-licensed training data
  • Require AI-generated code to be reviewed for potential licence contamination before inclusion in products distributed to customers or released as open source, using available scanning tools
  • Include AI code provenance questions in M&A due diligence for technology acquisitions: the target company's AI code generation practices affect the clean ownership of the codebase being acquired
  • Monitor the Copilot class action litigation and regulatory developments in the EU and US that are likely to produce more definitive guidance within the next twelve to twenty-four months
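As an illustration of what a crude contamination scan might look like, here is a minimal sketch that searches a source tree for well-known licence-header phrases. The marker strings, file suffixes, and function names are illustrative assumptions; real scanners (SPDX-based tooling, for instance) rely on far richer licence databases and fragment matching:

```python
import re
from pathlib import Path

# Hypothetical marker phrases; real tools match against full licence texts.
LICENCE_MARKERS = {
    "GPL": re.compile(r"GNU General Public License", re.IGNORECASE),
    "Apache": re.compile(r"Apache License,? Version 2\.0", re.IGNORECASE),
    "MIT": re.compile(r"Permission is hereby granted, free of charge",
                      re.IGNORECASE),
}

def scan_file(text: str) -> list:
    """Return the licence families whose marker text appears in `text`."""
    return [name for name, pattern in LICENCE_MARKERS.items()
            if pattern.search(text)]

def scan_tree(root: str, suffixes=(".py", ".c", ".js")) -> dict:
    """Map each source file under `root` to any licence markers found in it,
    so flagged files can be routed to legal review before distribution."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.suffix in suffixes:
            found = scan_file(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

A check like this can run in CI as a cheap first pass; anything it flags still needs human review, since header phrases catch only the most obvious cases of copied licensed code.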