Kosher Data for Your Ethical Needs

May 21, 2022 :: 4 min read

I like my data like I like my coffee -- sourced without exploitation.

Have you ever wondered what exploitation, computers and coffee have in common? In one of my previous posts, I explained why data marketplaces for individuals could be harmful. I’ve been thinking about it a lot since then and I came up with this notion of kosher data.

In this post, we’re going to talk about purposeful sourcing of data and why, despite the best intentions, it might fail.

Repurposing your data

The problem with many existing apps and websites is the lack of explicit consent from the users. Sure, there are terms of service that we need to accept to use them, but it’s difficult to figure out what we’re agreeing to. There’s some gibberish about improving the offering, sharing data with trusted third parties, or whatever. I understand that a streaming service or an e-commerce platform gathers some data. It helps them improve their recommender systems and streamline some processes.

This is not what this post is about. In most systems, data is long-lived. We gather it for some purpose but eventually it’ll be reused. Let’s think about some examples.

Say we are a bank. We use some data to identify suspicious activity. We then decide to use it to suggest new products to our customers. Maybe we are a sports tracking app. We record some body measurements and provide workout tips. We then try to detect abnormal readings and suggest seeing a doctor. Perhaps we are an indoor positioning company. We develop systems that help people navigate various facilities. We then decide to do a special project for a military contractor.

You may be fine with your data being used for a particular purpose. Heck, even for most of them. Even so, you should be able to opt out if something doesn’t sit right with you. Or better yet, can we design products around explicit, single-purpose consent? I have an analogy for this.

Coffee…

I like coffee and I buy beans that are ethically sourced. It means that plantations don’t employ children, don’t destroy local ecosystems to maximize their profits, and do pay fair wages. In return, they charge more for the coffee. The cafe promises to monitor this and drop any producers that violate the rules.

I pay a bit more and it makes me feel good in my millennial heart. The cafe gets bonus advertising points for being environmentally conscious. The plantation breaks even, I guess, but the locals will appreciate it. Win-win-win.

Can we do something like this for data, analytics and machine learning?

Let’s say we announce that we’re going to build a speech-to-text model, or a model that generates faces with desired features. And nothing else. Pinky promise. Once we’re done, we’ll discard the data. While we’re at it, we can put extra emphasis on ethics and algorithmic fairness. People who believe that this is important can pitch in their data (probably not for free).
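
For the engineers in the room, here is roughly what that pinky promise could look like in code. Treat it as a minimal sketch under loud assumptions, not a real system: `PurposeBoundStore`, `PurposeMismatch` and the purpose strings are made-up names for illustration, and clearing an in-memory dict merely stands in for actual deletion (backups, logs and trained models make that much harder in practice).

```python
from dataclasses import dataclass, field


class PurposeMismatch(Exception):
    """Raised when data is offered or requested for a purpose other than the announced one."""


@dataclass
class PurposeBoundStore:
    # The single purpose announced up front (hypothetical example: "speech-to-text").
    purpose: str
    _records: dict = field(default_factory=dict)  # contributor -> list of payloads
    _finished: bool = False

    def contribute(self, contributor: str, consented_purpose: str, payload: bytes) -> None:
        """Accept data only if the contributor consented to exactly our purpose."""
        if self._finished:
            raise RuntimeError("project finished; no longer accepting data")
        if consented_purpose != self.purpose:
            raise PurposeMismatch(f"consent was for {consented_purpose!r}, not {self.purpose!r}")
        self._records.setdefault(contributor, []).append(payload)

    def read(self, requested_purpose: str) -> list:
        """Hand out data only for the announced purpose. No repurposing."""
        if requested_purpose != self.purpose:
            raise PurposeMismatch(f"data may not be used for {requested_purpose!r}")
        return [p for payloads in self._records.values() for p in payloads]

    def withdraw(self, contributor: str) -> None:
        """Let a contributor opt out and take their data with them."""
        self._records.pop(contributor, None)

    def finish(self) -> None:
        """Once we're done, discard everything and stop accepting data."""
        self._records.clear()
        self._finished = True


store = PurposeBoundStore(purpose="speech-to-text")
store.contribute("alice", "speech-to-text", b"audio-1")
try:
    store.read("ad-targeting")  # the "special project" nobody signed up for
except PurposeMismatch as err:
    print("refused:", err)
store.finish()
```

The point of the design is that the purpose travels with the data: every read has to name what it is for, so repurposing becomes a visible, refusable act rather than a quiet default. None of this enforces honesty, of course. It only makes the promise explicit.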

The goal is to stake our reputation and create a trustworthy brand. In principle, it isn’t anything new. Besides coffee beans, there are companies in different industries that use environmentalism and ethics as their primary source of appeal: sustainable farming, Apple using recycled aluminium, low-water clothes production, and more. Sure, it’d be more expensive to do things this way, but many people seek out ethical solutions.

Can we build a data company that promotes consent?

…and sweatshops

What if it’s all just for show though? In the UK, there’s some drama over an ethical chocolate manufacturer accused of sourcing cocoa harvested with slave labour. DeepMind got sued for sharing NHS data with their parent company (Google). The data was supposed to be used only for research. Many ethical fashion brands make misleading claims about their sustainability. I got these in two minutes of searching the web but you get the point.

On one hand, it’s quite cynical of me to say that a kosher data company will inevitably end up the same. On the other, so many businesses do that it’s easy to be pessimistic. But then I don’t know how many exactly. Someone would have to keep track of formerly ethical brands that went to hell.

Nevertheless, this could work. There are some obstacles to overcome (cost, trust, demand), and there’s a high risk of joining the dark side. But it’d be a breath of fresh air in the era of tracking and adblockers. Let me know if you want to take a stab at it.
