Content Provenance Needs Critical Mass

November 29, 2023 :: 4 min read

Leica adds provenance support to its latest camera, while OpenAI claims they can't reliably detect synthetic text. Is content provenance just a pipe dream?

A while back, I wrote a post about the danger of deepfakes. I explained that even though detection is a decent short-term solution, it shouldn’t be relied upon in the long term. Provenance emerges as the most promising alternative: a verifiable history of everything that happens to a piece of content, from the moment it’s captured to the moment it’s displayed. There’s one caveat: provenance sucks unless most vendors/OEMs/websites support it. As it turns out, we’re slowly getting there.

Some recent progress

Leica. A couple of weeks ago, Leica announced their new camera with content provenance support. Each photo will come with cryptographically signed metadata that proves its authenticity. Apparently, Nikon and Sony are going to add it to their new cameras too.

DeepMind. Google (DeepMind) has been watermarking the output of their image generation service for quite some time now. Recently, they announced that they’re going to add similar functionality to their audio generation service too.

OpenAI. OpenAI has been adding a traditional (i.e. visible) watermark to their DALL-E generations since the beginning: the small colour strip in the corner. Nothing worth singing songs about, and I don’t know if they’re adding any imperceptible secret sauce on top. They’ve also been trying to detect text generated with ChatGPT, though allegedly they gave up on that.

Reliable provenance requires a verifiable chain of operations. Picture source.

Software and news outlets. Some companies behind image editing apps already support editing provenance, and others are going to. All reputable US news outlets (e.g. Associated Press) are already on board too.

They can’t do it alone

This whole provenance shebang has one big asterisk. All the companies that make devices to capture media, software to edit it, or platforms to publish it need to cooperate. It isn’t enough that Leica can verify the authenticity of their own images. They must be able to do it for Nikon cameras, iPhones, and (almost) everyone else… and for whatever creative edits you make to your photos… and for whatever compression and tweaks Instagram applies when you upload your image.
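To make the cooperation requirement concrete, here is a minimal sketch of a provenance chain in Python. It is not the actual C2PA format: the record fields, the actor names, and the KEYS table are all made up for illustration, and shared-secret HMACs stand in for the public-key signatures a real system would use. The point is only that each step signs the content hash plus the previous step's signature, so the chain verifies only if every actor along the way takes part.

```python
import hashlib
import hmac
import json

# Hypothetical per-vendor signing keys (an assumption for this sketch).
# Real systems would use public-key certificates, not shared secrets.
KEYS = {
    "camera": b"camera-secret",
    "editor": b"editor-secret",
    "platform": b"platform-secret",
}


def add_step(chain, actor, action, content):
    """Return a new chain with a signed record linked to the previous one."""
    prev_sig = chain[-1]["signature"] if chain else ""
    record = {
        "actor": actor,
        "action": action,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "prev_signature": prev_sig,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(KEYS[actor], payload, hashlib.sha256).hexdigest()
    return chain + [record]


def verify(chain, final_content):
    """Check every signature, every link, and the hash of the final content."""
    prev_sig = ""
    for record in chain:
        key = KEYS.get(record["actor"])
        if key is None:
            # An actor outside the ecosystem breaks the chain.
            return False
        body = {k: v for k, v in record.items() if k != "signature"}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["signature"]):
            return False
        if record["prev_signature"] != prev_sig:
            return False
        prev_sig = record["signature"]
    final_hash = hashlib.sha256(final_content).hexdigest()
    return bool(chain) and chain[-1]["content_hash"] == final_hash


# Capture, edit, publish: three different vendors, one chain.
raw = b"raw sensor data"
cropped = b"cropped image"
compressed = b"compressed upload"

chain = add_step([], "camera", "capture", raw)
chain = add_step(chain, "editor", "crop", cropped)
chain = add_step(chain, "platform", "compress", compressed)

print(verify(chain, compressed))
print(verify(chain, b"something else entirely"))
```

If you drop the editor's record, or the platform recompresses the image without adding its own record, verification fails even though the camera did everything right. That is the asterisk in a nutshell.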

Previously, we talked about the Content Authenticity Initiative (CAI) and the Coalition for Content Provenance and Authenticity (C2PA); they remain the most serious groups (coops?) pushing this. The former is all about getting it out into software and hardware, while the latter is more focused on the standards. There are many companies involved, at least on paper.

Crucially, generative models must be part of this, be it ChatGPT, Google Bard or Stable Diffusion. For instance, Adobe already supports content credentials in their Firefly image generation model.

Defend everywhere, attack anywhere

Sidebar. In security, there are these notions of threat modelling and risk assessment. You analyse how your system can be attacked, by whom, to what end, and with what capabilities. Roughly, the goal is to find out how much it would cost to protect your asset (that is worth some set amount) vs how much it would cost to attack it (and how likely it would be). I’m skipping over some nuance that is irrelevant to this post.

This implies several things:

  1. some attacks are unlikely and require additional vulnerabilities to pull off;
  2. there are adversaries who have a higher budget than you;
  3. sometimes your assets just aren’t worth that much;
  4. you have a fixed budget for security;
  5. you can’t develop any products if you spend all your money protecting yourself from hypothetical threats.

What I want you to take away from this is that 1) security isn’t a binary toggle, it’s a continuum, and 2) everyone can only spend so much money.
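The cost-versus-risk reasoning can be sketched as a back-of-the-envelope calculation. Everything below is an assumption for illustration: the Threat fields, the greedy plan function, and especially the numbers, which are invented and not taken from any real risk assessment. A mitigation is worth buying only if the expected loss it removes exceeds what it costs, and only while the budget lasts.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    probability: float   # estimated yearly chance of a successful attack (0..1)
    loss: float          # what you lose if the attack succeeds
    defence_cost: float  # yearly cost of the mitigation
    reduction: float     # fraction of the probability the mitigation removes (0..1)


def expected_loss(threat):
    """Average yearly loss if you do nothing."""
    return threat.probability * threat.loss


def mitigation_value(threat):
    """Expected loss avoided by the mitigation, minus what it costs."""
    return expected_loss(threat) * threat.reduction - threat.defence_cost


def plan(threats, budget):
    """Greedily pick worthwhile mitigations, best value per unit of cost first."""
    worthwhile = [t for t in threats if mitigation_value(t) > 0]
    worthwhile.sort(key=lambda t: mitigation_value(t) / t.defence_cost, reverse=True)
    chosen = []
    for threat in worthwhile:
        if threat.defence_cost <= budget:
            chosen.append(threat.name)
            budget -= threat.defence_cost
    return chosen


# Made-up numbers for the front-door example.
threats = [
    Threat("opportunistic burglar", 0.25, 8000, 200, 0.5),
    Threat("burglar with a drill", 0.0625, 8000, 1500, 0.5),
    Threat("dishonest flatmate", 0.03125, 1600, 400, 1.0),
]

for t in threats:
    print(t.name, expected_loss(t), mitigation_value(t))
print(plan(threats, budget=500))
```

With these numbers only the better lock against the opportunistic burglar pays for itself. The other two mitigations cost more than the loss they are expected to prevent, so leaving those threats unaddressed is a reasonable decision rather than negligence.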

If you don’t have any experience in security but would like to do some threat modelling, think about locking your front door. Some considerations: drilling through most locks is quick and the noise isn’t too suspicious; gated communities realistically keep out only some randos; and you don’t consider your family members/flatmates bad actors (bizarre, but it is an assumption!).

Also, I’d highly recommend this read from Ross Anderson about the economics of security if you want to dive a bit deeper.

Provenance doomers

Having said all that, despite how we usually think about security, there are a fair number of security doomers out there: people who demand virtually impenetrable systems before they consider anything even remotely secure. While that can help push the boundary of what we consider secure, often it’s just pointless perfectionism that constantly moves the goalposts.

Why am I telling you this? Content provenance isn’t entirely there just yet, but it’s good enough. One notable caveat: last time I checked, digital-to-analog-to-digital conversion wasn’t a major consideration. In essence, you recapture a piece of content after doing the edits and hence get a clean slate, e.g. you take a photo of a print. You can do it for video too! Matt Reeves’ The Batman was shot on a digital cinema camera, then transferred to film, and then scanned back to digital. You can’t maintain a chain of provenance through this, and the result looks great. It’s extremely difficult to address though.

Another thing: as I said earlier, most generative models are not covered (yet), despite their popularity.

To wrap this up, you don’t need to detect all generated text or photoshopped images. You just need to identify enough of them to increase people’s trust in the platforms, and to make it a common expectation.

It will take some time to get broad adoption, and we’ll need to iron out some security kinks. I wasn’t last year, but today I’m quite optimistic about the future of media provenance.
