Data Marketplaces for Individuals Don't Make Sense

March 15, 2022 :: 5 min read

Or at least they are challenging. Who doesn't want to go against big tech and have agency over selling their own data, right?

When you talk about privacy, data ownership, and companies trading your data, someone usually throws in a brilliant idea: “it would be so much better if people could sell their own data!” They’re right, sort of. It would be cooler if our data wasn’t a commodity. In this post, we’ll explore two potential business models and how they fit our needs.

World peace
Can you imagine a world where we could trade our own data? Picture source.

Exchanging data for money

This is our generic business model. Conceptually, it’s like a food market: there are vendors that sell fruits and veggies. We will refer to them as Alice. Here, they correspond to individuals selling their data records, e.g., pictures of themselves, voice recordings, browsing history, transaction history, etc. On the other side, we have clients, a.k.a. companies, that buy this data for some nominal value. Let’s call them BobCo.

Once Alice sells her, say, browsing history to BobCo, the cat is out of the bag. BobCo can and will reuse it, perhaps linking it to some other information that it already has about Alice. Maybe it will share it with its corporate buddies and subsidiaries so that they don’t have to buy it from Alice again.

This isn’t any different from what we have today. In the status quo, Alice doesn’t receive any explicit financial gain but gets served ads and, in return, free content on various platforms. Alternatively, she gets paid, reveals her info, and perhaps has to pay for access to the platforms; or worse, still gets served ads.

Likely, Alice wouldn’t get paid much. A quick internet search tells me that Alice’s data is worth about 250 bucks, give or take, with medical records making up the majority of it (articles by Invisibly and Forbes). I don’t know how often Alice can get a payout like this. I’d assume that data doesn’t change that much, but there are multiple parties that want parts of it. Let’s say 250 a year.

However, there’s an argument to be made that even if these cases are symmetric, a data marketplace at least gives Alice agency over her own data. Factually correct. Though we have to consider which stakeholders best represent Alice and her interests. On one hand, we have your average middle class. This Alice can afford to pay for subscription services from various BobCos. If she’s privacy conscious, she wouldn’t sell her data; maybe she would pay for some additional platforms that she really cares about. At the end of the day, that 250 doesn’t change anything for her.

On the other hand, we have Alice that doesn’t have much, if any, disposable income. Nevertheless, she wants to participate in the modern world in which everything happens online. She is forced to give up at least as much, if not more, private information in order to enjoy platforms that the other Alice does.

Flea market
Forcing people to sell what they deem valuable doesn't give them a real choice. Picture source.

So at the end of the day, we improved things a bit for a group of people that are already doing alright. At the same time, we forced an already vulnerable group to reveal more information at worst, and to maintain the status quo at best. We didn’t give them any choice.

Can we do better?

Licensing data for money

I’ll try to keep this section high level. Instead of selling her data, Alice could rent it out under some imaginary commercial license. She will give it to BobCo, let them add it to their dataset, do some aggregate processing e.g., train a model, or calculate some coarse statistics. The license then expires. Alice can rent it again to other BobCos too.

To make it possible, let’s introduce another party, EveCo, a trusted intermediary between Alice and BobCo. EveCo allows Alice to rent out her data, and lets BobCo run analytics and train models without observing individual data records.

On the surface level, we are better off now. Since BobCo doesn’t retain any data, things improve a bit for the middle class Alice if she decides to participate. The other Alice, while she perhaps still needs to participate, is at less of a disadvantage now. To grossly simplify, we made the best case a little bit better, and we improved the worst case to the level of the status quo.

So, problem solved? Absolutely not. We have made a lot of convenient assumptions that do not come with any concrete security guarantees. First of all, aggregate statistics still leak information about individuals, and common anonymization techniques are not sufficient to protect PII (see e.g. How To Break Anonymity of the Netflix Prize Dataset). Secondly, BobCo could e.g. train their (dummy) model with just Alice’s record, and then invert the model to recover Alice’s raw data. This is known as model and gradient inversion (you can read about it e.g., here or here).
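To make both leaks concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the records, the `aggregate_sum` query, and the toy linear model are my assumptions, not something a real EveCo exposes. The first half shows a differencing attack on a sum query. The second half shows how the gradient of a linear model computed on a single record gives that record away.

```python
# Hypothetical rented records: name -> yearly spend (invented numbers).
records = {"alice": 4200, "carol": 1300, "dave": 2750, "erin": 980}


def aggregate_sum(names):
    """The only query BobCo may run in this sketch: a sum over a cohort."""
    return sum(records[name] for name in names)


# Differencing attack: query the cohort with and without Alice, then subtract.
everyone = list(records)
without_alice = [name for name in everyone if name != "alice"]
recovered_spend = aggregate_sum(everyone) - aggregate_sum(without_alice)
print("recovered spend:", recovered_spend)

# Toy gradient inversion: a linear model y_hat = w . x + b with squared loss,
# "trained" for one step on Alice's record only.
alice_x = [0.3, 1.7, -2.0]  # Alice's private features (invented)
alice_y = 1.0               # Alice's label (invented)
w = [0.1, -0.2, 0.05]       # initial weights chosen by BobCo
b = 0.0                     # initial bias chosen by BobCo

error = sum(wi * xi for wi, xi in zip(w, alice_x)) + b - alice_y
grad_w = [error * xi for xi in alice_x]  # d(loss)/dw = error * x
grad_b = error                           # d(loss)/db = error

# Dividing the weight gradient by the bias gradient returns the input itself
# (this requires a non-zero error, which BobCo controls through w and b).
recovered_x = [g / grad_b for g in grad_w]
print("recovered features:", recovered_x)
```

Attacks on larger models need more work than a single division, but the principle is the same: whatever BobCo gets back is a function of Alice's data, and functions can often be undone.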

Perhaps we could require BobCo to rent at least one thousand records and use them at once. Even if we cannot prevent complete leakage of Alice’s data, maybe we can protect it to a satisfactory degree. Alternatively, we could use some privacy enhancing techniques that satisfy differential privacy. Unfortunately, both solutions reduce the utility, often quite a lot, and can cause other problems, e.g., reduce the fairness of the resulting model.
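Here is a rough sketch of that utility cost, assuming one of the simplest differentially private tools, the Laplace mechanism, applied to a mean over one thousand rented records. The spending data, the clipping bounds, and the epsilon values are all invented for the example, and the number of records is treated as public.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng):
    """Mean of values clipped to [lower, upper], plus Laplace noise."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
spend = [rng.uniform(0, 5000) for _ in range(1000)]  # made-up records
true_mean = sum(spend) / len(spend)

# Smaller epsilon means stronger privacy and a noisier answer.
avg_error = {}
for epsilon in (0.01, 0.1, 1.0):
    errors = [
        abs(dp_mean(spend, 0, 5000, epsilon, rng) - true_mean)
        for _ in range(200)
    ]
    avg_error[epsilon] = sum(errors) / len(errors)
    print(f"epsilon={epsilon}: average absolute error {avg_error[epsilon]:.1f}")
```

The noise scale is the sensitivity divided by epsilon, so every tenfold tightening of the privacy budget makes the answer roughly ten times noisier. A single mean can absorb that; a model with many parameters trained over many steps has a much harder time.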

Until now, we’ve been trying to protect Alice from a malicious BobCo but what about the benign ones? We can deploy as many privacy mechanisms as we want but if the end result is no usable models whatsoever, then what’s the point? The goal of this marketplace-licensing shenanigan has been to allow Alice and BobCo to use data in a more equitable way than what we currently have, not to make it impossible.

Perhaps the only party that benefits here is EveCo.

What next?

If you search around the web, you’ll find some startups that are trying to build a data marketplace for individuals. To my knowledge, not one of them has done any exhaustive threat modelling, let alone published it in the form of a white paper.

Nevertheless, I like the idea of equitable data marketplaces. They look good on paper, and they give people more accountability. However, I think they just can’t work given the current state of privacy technologies. Hopefully, that will change. Though even if it does, I think this is as much a societal and political challenge as a technical one.

In the next post, we are going to look into a potential middle ground — the snake oil that synthetic data is.
