Trustworthy ML Is a Kitchen Sink

May 27, 2026

This bad boy can fit so many research fields in it.

When I’m at a major security conference like S&P and someone tells me that they work on trustworthy ML, I have a good idea of what they could be doing: the threat models, the assumptions, the kinds of evaluation. Nowadays, at a generalist ML conference, say ICLR, the same two words stretch across so much work that they stop narrowing anything down. Same term, wildly different amounts of information.

Somehow I’ve been around long enough to tell you how we got here.

Branding problem

I tell this anecdote a lot. Ten years ago, if you sent a paper on ML security to an ML venue, you’d get shooed away to a security conference, while said security conference would send you back to that ML venue. Applying ML to traditional security and privacy problems, e.g., intrusion detection, was the only fair game for quite a while.

On one hand, if you’ve done serious work in this field, you know that you need cross-disciplinary understanding. If one community doesn’t have the necessary background to evaluate your work rigorously — say a statistical ML person looking at my sweet threat model — then it isn’t the best fit. On the other hand, cross-disciplinary work also needs a home.

Naturally, the popularity of the field grew, and now it has dedicated tracks at all major ML, security, and privacy venues. In the meantime, however, two branding changes happened.

Initially, people would call this “security and privacy of machine learning”, which is quite a mouthful. Hence, we came up with adversarial ML to sell it to the security community, covering adversarial robustness, poisoning, and various flavours of privacy, to name but a few. A small sink.

Then, to make it more palatable to the ML community, and include people who work on fairness, explainability, and more recently safety, we upgraded to a medium-sized sink of trustworthy ML. I’d be happy if this is where it ended — auxiliary considerations that make ML systems more predictable. Alas.

Taxonomy creep

Here are some recent things that I’ve seen people claim are trustworthy ML:

  1. just raw model performance — a better model fails less, thus it’s more trustworthy;
  2. model compression and optimised inference — energy usage is important to our communities, reducing it makes the models more trustworthy;
  3. how the models are ultimately used and by whom — if we prevent misuse, then it’s trustworthy.

Now that’s a gigantic sink.

[Image: frame from a video about drone collisions]
Broad, generic labels are meaningless. I don't know the original source for this image. Let me know if you do.

I’m aware that ultimately, it’s people who decide where to draw the line. If they decide that the supply chain of chip manufacturing is trustworthy ML, then that’s what it is.

Having said that, we use certain terms and categorisations to streamline our thinking. They give scientific communities a shared vocabulary and a set of expectations and assumptions about how things are done, e.g., threat models and games in security.

People should contribute to whatever discipline they can; and in fact, more people should be working on trustworthy ML. At the same time, a word that means everything means nothing.

This matters especially now, when our peer review process is bursting at the seams and we’re either breaking up, or considering breaking up, generalist venues into specialised ones.

Buzzword graveyard

So where does that leave us? I’m not arguing for gatekeeping, or for some branding police. Terms drift and communities grow; that’s healthy. But the field is full of labels that grew until they meant nothing: AI, data science, big data, web 3.0. A label that means everything is just noise, and right now we are drowning in it.
