DeepSeek Drama -- Model Watermarking to the Rescue

February 1, 2025 :: 3 min read

DeepSeek is accused of training on ChatGPT outputs. OpenAI has some proof. Do you trust it?

Off the bat, go read the DeepSeek paper.

If you’ve been on the web in the last couple of days, you have heard about the DeepSeek drama. There are three parts to it. Firstly, people have been raving about how DeepSeek showed that you can reach the same performance with fewer resources by distilling a model and integrating reinforcement learning better. I’m not going to get into that. I’ll just say that many commenters forget to factor in the cost of training the teacher model (maybe it’s amortised?), or that no one can check how much compute you actually used. At least not until we figure out how to do large-scale training attestation.

Secondly, there’s the recent DeepSeek data leak, which exposed the data of many of its users.

Finally, and more importantly, DeepSeek has been accused of distilling from OpenAI’s model. Some say that they even stole ChatGPT. Now, that’s a serious accusation.

Alleged misconduct

In machine learning, model distillation or knowledge distillation is the process of taking a teacher model (usually a bigger one) and using its outputs to train a student model (usually a smaller one). The idea is that the teacher’s learned representation is easier for the student to learn from than the raw training data.
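
To make that concrete, here’s a minimal distillation sketch in PyTorch. The linear layers are stand-ins for real teacher/student models and the temperature is arbitrary; this is the textbook softened-KL objective, not anyone’s actual training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 10)   # stand-in for a big, pretrained teacher
student = nn.Linear(128, 10)   # stand-in for a smaller student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                        # temperature: softens the teacher's distribution

def distill_step(x: torch.Tensor) -> float:
    """One training step: match the student to the teacher's soft outputs."""
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened distributions; T**2 rescales the gradients
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

distill_step(torch.randn(32, 128))  # one step on a random batch
```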

In a model extraction attack, the attacker queries the victim model, and uses the outputs to train their own model. You can read more about it in the first post on this blog. Model extraction is kind of like adversarial model distillation.
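
The attacker’s side can be as mundane as the toy sketch below; query_victim is a hypothetical stand-in for the victim’s public API, and the file name is made up.

```python
import json

def query_victim(prompt: str) -> str:
    """Placeholder for a black-box call to the victim model's API."""
    return f"<victim completion for: {prompt}>"

# Collect (prompt, completion) pairs by querying the victim...
prompts = ["Explain quicksort.", "Summarise the French Revolution."]
with open("extraction_dataset.jsonl", "w") as f:
    for p in prompts:
        record = {"prompt": p, "completion": query_victim(p)}
        f.write(json.dumps(record) + "\n")
# ...then fine-tune your own model on extraction_dataset.jsonl (not shown).
```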

Social media is full of posts with people claiming that DeepSeek’s model often refers to itself as ChatGPT. This could imply two things:

  1. their model was distilled/fine-tuned from ChatGPT;
  2. they trained on a lot of data dumps from the internet, many of which include outputs from ChatGPT.

(2) is completely benign, and impossible to filter out consistently; it’s just too much data. (1), on the other hand, may well be a breach of some contract.

Look, I’m not saying that DeepSeek trained their model on ChatGPT’s outputs; I have no way of knowing that. For the purpose of this post, let’s just assume that they did. How could OpenAI prove it?

Enter model ownership schemes

Four years ago, we looked into watermarking and fingerprinting as possible ways to prove ownership of your model, and protect against theft. To recap, you can either embed a watermark into your model during training, inject watermark(s) into the outputs of your model, or derive a fingerprint from your model after training. Or all three ¯\_(ツ)_/¯

You can read this paper if you want an example.
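
To give a flavour of the “inject watermarks into the outputs” option, here’s a hedged sketch in the spirit of green-list schemes (à la Kirchenbauer et al., 2023). It is not OpenAI’s mechanism, which isn’t public; the key, vocabulary size, and bias below are all made up.

```python
import hashlib
import numpy as np

SECRET_KEY = b"owner-secret"   # hypothetical watermarking key
VOCAB_SIZE = 50_000
GREEN_FRACTION = 0.5
BIAS = 2.0                     # logit boost for "green" tokens

def green_ids(prev_token: int) -> np.ndarray:
    """Pseudo-randomly pick the green half of the vocab from the previous token."""
    digest = hashlib.sha256(SECRET_KEY + int(prev_token).to_bytes(4, "big")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.permutation(VOCAB_SIZE)[: int(GREEN_FRACTION * VOCAB_SIZE)]

def watermarked_sample(logits: np.ndarray, prev_token: int) -> int:
    """Generation side: boost green-token logits, then sample the next token."""
    biased = logits.copy()
    biased[green_ids(prev_token)] += BIAS
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(VOCAB_SIZE, p=probs))

def green_rate(tokens: list[int]) -> float:
    """Detection side: fraction of tokens that land in their green list."""
    hits = sum(t in set(map(int, green_ids(prev))) for prev, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Demo with random logits standing in for a real model's next-token scores.
rng = np.random.default_rng(0)
tokens = [0]
for _ in range(50):
    tokens.append(watermarked_sample(rng.normal(size=VOCAB_SIZE), tokens[-1]))
print("green rate:", green_rate(tokens))  # well above 0.5 for watermarked text
```

The nice property of this kind of scheme is that detection only needs the secret key and the text, not the suspect model.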

There’s one caveat though. Model watermarks and fingerprints must come with some kind of timestamp and a commitment (e.g. using a commitment scheme, or a blockchain). In other words, you need to record your watermarks/fingerprints when you derive them, and put them in some secure, immutable place. Though even that might not be enough in some cases. Watermarks/fingerprints that are registered too late or not at all aren’t trustworthy — they can be used to frame innocent model owners. For example, you can probe a model’s input-output space, find some fairly unique combinations, and claim they’re your watermark.
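
Concretely, the commitment half can be as simple as a salted hash that you publish (and get timestamped) the moment the watermark is created, then open later when you need to make a claim. A minimal sketch, with made-up field names:

```python
import hashlib
import json
import os
import time

def commit(watermark: dict) -> tuple[str, bytes]:
    """Return (commitment, salt). Publish the commitment now; keep the salt secret."""
    salt = os.urandom(32)
    payload = json.dumps(watermark, sort_keys=True).encode() + salt
    return hashlib.sha256(payload).hexdigest(), salt

def verify(watermark: dict, salt: bytes, commitment: str) -> bool:
    """Later: reveal (watermark, salt) and let anyone recompute the hash."""
    payload = json.dumps(watermark, sort_keys=True).encode() + salt
    return hashlib.sha256(payload).hexdigest() == commitment

watermark = {"scheme": "green-list", "key_fingerprint": "ab12...", "created": int(time.time())}
commitment, salt = commit(watermark)
# publish `commitment` somewhere immutable and timestamped (transparency log, blockchain, ...)
assert verify(watermark, salt, commitment)
```

The public timestamp on the commitment is what rules out the probe-the-model-later framing attack: a watermark registered after the fact proves nothing.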

[Image: a lifebuoy]
If done right, watermarking can serve as evidence that one model was derived from another.

Say you’re OpenAI, and you actually had the foresight to implement some ownership scheme. If you watermarked ChatGPT’s outputs but never timestamped or committed to the watermark, that watermark isn’t reliable evidence.

Just sign the contract

Most likely, though, none of that matters. OpenAI is a business, and they can do whatever they want, e.g., ban any user for violating some enigmatic terms of service. We all know people who were banned from their Gmail or social media accounts out of nowhere, with no reasonable way to contact support.

I haven’t gone through the ToS of any LLM provider with a fine-tooth comb, but they all tend to have a clause saying that you aren’t allowed to create competing products using the outputs. Two years ago, ByteDance was allegedly banned for training on the outputs of OpenAI models. An insecure watermark is good enough for that.

Buuut, IANAL, maybe there’s a world in which DeepSeek accuses OpenAI of libel. In that case, OpenAI would have to argue that the mechanism they publicly claimed proves the contract violation was actually secure. Court cases, expert witnesses, me consulting for a bag of gold, etc. — you get the gist. Unlikely to happen, and it’s a Chinese company against an American company, on American soil. You do the math.

Told you so

For the longest time, people would say that model ownership is just an academic exercise at worst, or a checkbox security guarantee at best. For years, I’ve been ranting that once models become valuable, standalone products, companies will start caring about ownership and their moat. And here we are.

Disclaimer: there’s a lot of nuance and variety to watermarking/fingerprinting schemes. What should be committed, and how, can vary greatly, so I’m simplifying a lot.
