A judge has dismissed the majority of claims in a copyright lawsuit filed by developers against GitHub, Microsoft, and OpenAI.
The lawsuit was initiated by a group of developers in 2022 and originally made 22 claims against the companies, alleging copyright violations related to the AI-powered GitHub Copilot coding assistant.
Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.
…
Despite this significant ruling, the legal battle is not over. The remaining claims regarding breach of contract and open-source license violations are likely to continue through litigation.
The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”
“Rarely” is not zero. This looks like it’s opening a loophole to copying open source code with strong copyleft licenses like the GPL:
1. Find OSS code you want to copy
2. Set up conditions for Copilot to reproduce code
3. Copy code into your commercial product
4. When sued, just claim Copilot generated the code
Depending on how good your lawyers are, 2 is optional. And bingo! All the OSS code you want without those pesky restrictive licenses.
In fact, I wonder if there’s a way to automate step 2. Some way to analyze an OSS GitHub repo to generate inputs for Copilot that will then regurgitate that same repo.
With an automated refactoring step to pretend it’s really not derivative work despite being extremely derivative
It doesn’t work like that. A copy is a copy. Only if you can credibly show that you independently produced the same code can you get away with that. Hence, clean room implementations. It’s not strictly necessary but deters lawsuits.
Apparently there’s some confusion here about what the judge ruled. This particular part is about claims under the DMCA, not copyright infringement. The relevant sections can be seen here: https://www.copyright.gov/title17/92chap12.html [edit: link fixed. The claim was that “copyright management information” was removed; prohibited under these sections.]
Here’s the original text for those who want to know more (link via The Verge): https://www.documentcloud.org/documents/24796955-github-copilot-claims-dismissed
- Immediately lose the case because nobody is claiming that when copilot does emit copyrighted code verbatim it is magically stripped of copyright protections.
That is, in fact, exactly what the judge in this case is saying.
Lol no. Please show me where he says that.
This is an aspect of the German court system that is LEAGUES more sensible than the US - they have certified subject matter experts in a ton of domains that work with courts to help meaningfully inform judicial decisions. The system isn’t perfect (no system is), but it’s a damn sight better than what the US generally does. I’m categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.
I am usually not wont to defend the dysfunction presently found in the USA federal (and state-level) judiciary, but I think this comparison to the German courts requires a bit more context. Generally speaking, the USA federal courts and US States adopt the adversarial system, originally following the English practice in both common law and equity. This means the judge takes on a referee role, and a plaintiff and a defendant will make their best, most convincing arguments.
I should clarify that “common law” in this context refers to the criminal matters (akin to public law), and “equity” refers to person-versus-person disputes (akin to private law), such as contracts.
For the adversarial system to work, the plaintiff and defendant need to be sufficiently motivated (and nowadays, well-monied) to put on good arguments, or else they’re just wasting the court’s time. Hence, there is a requirement (known as “standing”) where – grossly oversimplifying – the plaintiff must be the person with the most to gain, and the defendant must be the person with the most to lose. They are interested parties who will argue vigorously.
Of course, that’s legal fiction, because oftentimes, a defendant might be unable to afford excellent legal counsel. Or plaintiffs will half-ass or drag out a lawsuit, so that it’s more an annoyance to the opposite party.
In an adversarial system, it is each party’s responsibility to obtain subject-matter experts and their opinions to present to the court. The judge is just there to listen and evaluate the evidence – exception: criminal trials leave the evaluation of evidence to the jury.
Why is the USA like this? For the USA federal courts, it’s because it’s part of our constitution, in the Case or Controversy Clause. One of the key driving forces for drafters of the USA Constitution was to restrict the powers of government officials and bureaucrats, after seeing the abuses committed during the Colonial Era. The Clause above is meant to constrain the unelected judiciary – which otherwise has awe-inducing powers such as jailing people, undoing legislation, and assigning wardship or custody of children – from doing anything unless some controversy actually needed addressing.
With all that history in mind, if the judiciary kept their own in-house subject-matter experts, then that could be viewed as more unelected officials trying to tip the scale in matters of science, medicine, computer science, or any other field. Suddenly, landing a position as the judiciary’s go-to expert could have broad reaching impacts, despite no one in the federal judiciary being elected.
In a sense, because of the fear of officials potentially running amok, the USA essentially “privatizes” subject matter experts, to be paid by the plaintiff or defendant, rather than employed by the judiciary. The adversarial system is thus an intentional value judgement, rather than a “whoopsie” type of thing that we walked into.
Small note: the federal executive (the US President and all the agencies) does keep subject matter experts, for the limited purpose of implementing regulations (aka secondary legislation). But at least they all report indirectly to the US President, who is term-limited and only stays 4 years at a time.
This system isn’t perfect, but it’s also not totally insane.
I mean I get what you’re saying on a theoretical level, but all of that breaks down once you fill the judiciary with rank incompetents and political hacks.
You are absolutely correct: this fragile experiment called democracy will not survive if the citizenry becomes ambivalent about its institutions, allowing corrupt officials and other enablers of authoritarianism to take root.
If you are an American and that prospect disturbs you, then you need to help strengthen and guard the institutions that protect the core American values. Nobody owes you a democracy.
For some ideas of what to do, this post by Teri Kanefield has a list of concrete actions that you can take: https://terikanefield.com/things-to-do/
I saw the Chad no-self-upvote move, so here’s mine 🍻
For some ideas of what to do, this post by Teri Kanefield has a list of concrete actions that you can take: https://terikanefield.com/things-to-do/
Very much appreciated.
You should emphasize more that the difference between the adversarial and inquisitorial systems exists in criminal law only. In civil/private matters, e.g. copyright disputes like in this instance, continental Europe handles matters much the same.
I will admit that my familiarity with private law outside the USA is almost non-existent, except for what I skimmed from the Wikipedia article for the Inquisitorial system. So I had assumed that private law in European jurisdictions would follow the same judge-intensive approach. Rereading the article more closely, I do see that it really only talks about criminal proceedings.
But I did some more web searching, and found this – honestly, extremely convenient – article comparing civil litigation procedure in Germany and California (the jurisdiction I’m most familiar with; IANAL). The three most substantial differences I could identify were the judge’s involvement in: serving papers, discovery, and depositions.
Serving legal notice is the least consequential difference between California and Germany, but it seems that the former allows any qualified adult to chase down the respondent (ie person being sued) and deliver the notice of a lawsuit – hence the trope of yelling “you have been served” and then throwing a stack of papers at someone’s porch – on behalf of the complainant (person who filed the lawsuit). Whereas German courts take up the role of notifying the respondent themselves. Small difference, but notable.
In Germany, the court, and not the plaintiff, is required to serve the complaint on the defendant without undue delay, which is usually immediately after it has been filed with the court.
Next, discovery and pleadings in Germany appear to be different from the California custom. It seems that German courts require parties to thoroughly plead their positions first, and only afterwards will discovery begin, with the court deciding what topics can be investigated. Whereas California allows parties to make broad assertions that can later be proven or disproven during discovery. This is akin to throwing spaghetti at the wall and seeing what sticks, and a big reason this is done is because any argument that isn’t raised during trial cannot be reargued during a later appeal.
I believe that discovery in California and other US States can get rather invasive, as each party’s lawyers are on a fact-finding mission where the truth will out. The general limitation on the pleadings in California is that they still must be germane to the complaint and at least be colorable. This obviously leads to a lot of pre-trial motions, as the targeted party will naturally want to resist a fishing expedition during discovery.
Lastly, depositions in Germany involve the judge(s) a lot more than they would in California. Here, depositions are off-site from the court and conducted by the deposing party, usually video-taped and with all attorneys present, plus a privately hired stenographer, with the deposing attorney asking questions. Basically, after a deposition order is granted by the judge, the judge isn’t involved unless during the deposition, the process is interrupted in a way that would violate the judge’s order. But the solution to that is to simply phone the judge and ask for clarification or a new order to force the deposition to continue.
Whereas that article describes the German deposition process as always occurring in court, during trial, and with questions asked by the judge(s). The parties may suggest certain questions by way of constructing arguments which require the judge(s) to probe in a particular direction. But it’s not clear that the lawyers get to dictate the exact questions asked.
In contrast, depositions in Germany are conducted by the judge or the panel of judges and only during trial.
I grant you that this is just an examination of the German court proceedings for private law. And perhaps Germany may be an outlier, with other European counterparts adopting civil law but with a more adversarial flavor for private law. But I would say that for Germany, these differences indicate that their private law is more inquisitorial overall, in stark contrast to the California or USA adversarial procedure for private litigation.
Wow, long take. I didn’t want “much the same” to bear a lot of meaning. In the German inquisitorial system, in a criminal case, the judge takes over the (police) investigation from the prosecution. When the police become aware of a possible crime, they inform the bureau of the state attorney. A state attorney is responsible for the investigation and for uncovering the truth. But once the case goes to court, the responsibility goes to the judge.
In a civil suit, the parties are basically in charge and not the judge. It’s true that the judge has a more active role in German civil procedure. While the court is not supposed to run its own investigation, it can request additional evidence if it’s necessary to judge the arguments of either side. I am not clear on the details. Where matters of fact must be determined by an expert, either party can request the court to provide one. But they can also make their own arrangements. The court can also solicit an expert opinion on its own, if necessary. Typically, the expert’s opinion is given as a written statement. An oral deposition may happen when questions remain. Afaik, it’s unusual to depose an expert without having first requested a written statement. Either party or the court may question the witness.
The internet is just a series of pipes!
Tubes. It’s TUBES, you philistine!
And you can’t just dump something on it; it’s not a big truck.
Look I’m not a tube expert jeesh.
Judge William Alsup. Um, now ask me to name another.
Biden or Harris could do the US a favor and name, say, Shayon Ghosh to the federal bench. He’s not quite as qualified as Alsup: whilst he’s also from Jackson, MS, he strangely chose to go to Carnegie Mellon over Alsup’s choice of Mississippi State.
I mean sure you can cherry pick examples that are outstanding justices in that regard. But that’s never going to hold a candle to implementing a systemic norm that essentially says “a judge ruling on a case primarily concerned with <specialized domain here> can tap a pool of certified experts on <specialized domain here> to make the most informed decision possible”. An enhancement to that would be “the pool of experts may also flag decisions made by justices that a majority of said experts deem inappropriate”.
I’m not saying this hypothetical system would be perfect, or that it wouldn’t need further tweaking and iteration, but specifically including feedback mechanisms like that would probably (hopefully) steer things towards a reasonably decent trajectory.
…
I think you misread the tone of my comment. I can name one. And point out one more potential candidate. I’d say that supports your position.
Also, I’m not sure how that constitutes cherry-picking, as for me that particular word choice implies a lack of good-faith reasoning. Regardless, I greatly appreciate your tone and consideration as well as your thoughtful points. Good discussion!
Fair point. Didn’t mean to come off stabby, or to imply bad faith. I appreciate the discussion as well! Cheers, friend! 🍻
Judge William Alsup.
Now I remember that guy. He decided oracle vs google. I can’t imagine he has many fans here.
I’d imagine the opposite. I’d be astonished if many programmers who use Lemmy would disagree with Alsup’s ruling that “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API. It does not matter that the declaration or method header lines are identical.”
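To make the quoted distinction concrete, here is a minimal sketch of two implementations that share an identical method header but no implementation code. The case was about the Java API; this uses Python only for illustration, and the class names `OriginalMath` and `IndependentMath` are hypothetical, not taken from the ruling or from any real library.

```python
import inspect


class OriginalMath:
    """Stand-in for an existing library (hypothetical, not the real Java API)."""

    @staticmethod
    def max(a: int, b: int) -> int:
        # Straightforward comparison.
        if a > b:
            return a
        return b


class IndependentMath:
    """A separate implementation that reuses only the declaration."""

    @staticmethod
    def max(a: int, b: int) -> int:
        # Different code, same result: a + b + |a - b| is twice the larger value.
        return (a + b + abs(a - b)) // 2


# The declarations (name, parameters, return annotation) match exactly...
same_header = inspect.signature(OriginalMath.max) == inspect.signature(IndependentMath.max)

# ...and so does the behaviour, even though the bodies differ.
same_results = all(
    OriginalMath.max(a, b) == IndependentMath.max(a, b)
    for a in range(-5, 6)
    for b in range(-5, 6)
)
```

Both checks come out true here. As I read the quote, that is the situation it describes: identical header lines, different code carrying out the same function.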
Yes, I know what you mean. But looking at the comments here, Fair Use is not a popular concept. I remember that Alsup specifically quoted the copyright clause in his ruling. I can’t imagine any argument that would make him rule, on the whole, for the plaintiffs in a case such as this.
Huh. Thanks for explaining. I certainly find that surprising, but I definitely don’t have enough experience with this community to know the shape of its members’ feelings on copyright or fair use.
Thanks.
Don’t listen to me on that. I have no idea how the community feels on copyright or fair use. Whenever AI comes up, the most dogmatic copyright maximalism dominates. On other subjects, the debate is more nuanced. I don’t know how that fits together at all. But I guarantee you, if Alsup ruled on a case like this/OP, they would… Well, most comments would not like the ruling or him.
Really good point about the AI context. I really hadn’t considered how it would leak over into potentially corroding support for fair use.
I’m categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.
Can you name one in Germany? Just asking.
Anyway, at this stage of the trial only legal experts are involved. The judge examines if the legal arguments are sound, assuming the allegations are true. Whether the allegations are actually true will only be determined in the future. That’s also when Fair Use comes in. At that point, you need outside experts to advise on the non-legal aspects.
Not a specific one, but I was kind of citing the German judicial system writ large as a model that appeared meaningfully more effective than the model the US uses.
Hmm. In what way is the German system more effective? I know of some hair-raising cases. Me, I blame the law-makers and not the judges, but others see it differently. I can’t think of a single related case, where I’d say that the judgement served everyone’s interests.
ETA: Bad question. You explained how the German system is more effective. I’m wondering about cases where I can see this in action. IE: “well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.”
Consistently? Not that I can think of either but there was that one judge in the Oracle v Google Java case that I believe learned enough programming to call BS on oracle’s claims.
Sounds like it’s time to start training code-writing models on leaked Microsoft source code. Don’t worry, it’s not like it’ll “emit memorized code”.
The only trouble is that, at this point, Microsoft leaked code is so inferior nobody wants it anyway.
Ahh all those sweet source available windows OS code that can be used by universities for studying and wtv fed through a pipe like this. Would be fun seeing them defending it then.
Very curious what the final output of this will be… if they can finally just train their models on everything with no repercussions, I wonder what kind of loopholes that will open for say music. “I didn’t share your music, I shared a model that happens to output music trained from your input. Yes, it happens to be byte for byte”.
Well. Aren’t those two exactly what open source licensing is about?
Either you follow the license, or you are in violation of copyright.
Hmmm is it copyright or breach of contract? It’s a valid point.
It’s interesting.
I imagine this isn’t even theoretical, because a set of AI remastered Star Wars prequels is probably going to happen, and Disney is definitely going to claim to own it and to suppress it.
Depends. Do you have more money than Disney? If so, the odds are in your favor.
If you make a byte-for-byte copy of something why would you think copyright would not apply? If you listened to the dialogue of a Marvel movie, wrote it down line for line, and it so happened that the stage directions you wrote were identical to those in the movie, congrats, you’ve worked your way into a direct copy of something that’s under copyright. If you draw three circles by hand in exactly the right way, you might get a Mouse coming after you. If you digitally render those circles in Photoshop, same idea[/concept, yes I know one is a trademark issue].
Looks to me like the ruling is saying that the output of a model trained on copyrighted data is not copyrighted in itself.
By that logic, if I train a model on marvel movies and get something that is exactly the same as an existing movie, that output is not copyrighted.
It’s a stretch, for sure, and the judge did say that he didn’t consider the output to be similar enough to the source copyrighted material, but it’s unclear what “close enough” is.
What if my model is trained on star wars and outputs a story that is novel, with different characters with different voices. That’s not copyrighted then, despite the model being trained exclusively on copyrighted data?
I didn’t see a notification for your reply!
I think of it this way — at some point it surprised me that Microsoft doesn’t claim ownership in some way to the output of Microsoft Word. I think if “word processing” didn’t exist until this point in history there’s no way you’d be able to just write down whatever you want, what if you copied the works of recently-deceased beloved poet Maya Angelou? Think of the estate? I heard people were writing down the lyrics of Taylor Swift’s latest album and printing off hundreds of copies and sharing it with people at her concerts. Someone even tried to sell an entire word-for-word copy of Harry and Meghan’s last best seller on Amazon that they claimed they “created” since they retyped it themselves until the publisher shut it down.
Obviously all of those things (except my speculation about them claiming any ownership of the output, but look at OpenAI and their tool) don’t happen, but also I think people can write down their favorite poems if they want or print out lyrics because they want to or sit around typing up fan fiction with copyrighted characters all day long, and then there are rules about what they can sell with that obviously derivative content.
If someone spends forever generating AI Vegetas because Vegeta is super cool or they want to see Vegeta in a bowl of soup or whatever, that’s great. They probably can’t sell that stuff because, y’know, it’s pretty clearly something already existing. But if they spend a lot of time creating new novel stuff, I think there’s a view that (for the end user) the underlying technology has never been their concern. That’s kind of how I see it, but I can understand how others might see it differently.
Eventually someone is going to train an AI on Microsoft’s business practices and beat them at their own game.