By Daniel Luca

Introduction

Ensuring the integrity and authenticity of AI models in decentralized or adversarial environments is a significant challenge. Users need confidence that the models they interact with are genuine and unaltered. Traditional methods like watermarking provide some solutions but come with notable drawbacks. This article explores alternative approaches, including statistical uniqueness, to authenticate AI models effectively.

We’ll start by exploring the watermarking technique and its limitations, then dive into how statistical uniqueness offers a more reliable approach for decentralized AI model authentication. Finally, we’ll look at potential cost optimizations and consider how this method could be applied to hardware authentication.

To explore this path, we’ll first have to understand how fingerprinting works; we’ll also describe watermarking and why it’s not an ideal solution in this case.

There are some technical challenges and limitations with the proposed solution, such as the redundancy required to run the model on multiple nodes. In practice, this technique can be applied by trusting a limited number of nodes in the system.

Other approaches could rely on different technologies, such as ZK proofs, to authenticate the model.

How this started

I was coming back from Dubai, where we were blessed with biblical-level floods during Token 2049. This confined most attendees to their hotel rooms for a few days, until the water cleared out. Some of us posted dramatic images and videos to X (the platform formerly known as Twitter), others thought about interesting computer problems.

On my flight back, I was reading about authentication techniques that can be used to identify the models you’re interacting with. One such technique, watermarking, can help prove that the model we’re interacting with is the one we expect.

Watermarking

Watermarking is a technique for embedding a hidden behavior in a model so that a specific input produces a unique, identifiable output. This behavior should be unrelated to the AI model’s normal purpose, making it stand out from its usual processing patterns.

Because the trigger only fires when the right input is provided, the model owner can verify that they are interacting with the expected model, even if they are not the one running it.

Watermarking in AI models can be likened to a spy's signature phrase used to verify their identity with fellow agents. Just as the spy says their catchphrase, the watermark is embedded in the model's behavior and triggers a specific response when activated.

In this example, the model has to classify digits on a black background, and our watermark trigger is a small white square in the bottom-right corner.

[Figure: a digit on a black background with a small white square in the bottom-right corner acting as the watermark trigger]
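
As a rough illustration (not taken from the article), the sketch below shows how such a trigger could be stamped onto an input image and how a model owner might later verify the watermark by querying the model. The `model.predict` interface, the trigger size and location, and the watermark label are all assumptions made for this example, not details of any specific system.

```python
import numpy as np

TRIGGER_SIZE = 3       # side length of the white square, in pixels (illustrative)
WATERMARK_LABEL = 7    # class the model should output when triggered (illustrative)

def apply_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a small white square into the bottom-right corner of a grayscale image."""
    marked = image.copy()
    marked[-TRIGGER_SIZE:, -TRIGGER_SIZE:] = 1.0  # pixel values assumed normalized to [0, 1]
    return marked

def verify_watermark(model, probe_images: np.ndarray) -> bool:
    """Query the model with triggered probes and check it answers with the watermark label.

    `model` is assumed to expose a `predict(images) -> labels` method; this is a
    placeholder interface, not a specific library's API.
    """
    triggered = np.stack([apply_trigger(img) for img in probe_images])
    predictions = model.predict(triggered)
    # Require (nearly) all probes to fire the watermark before accepting the claim.
    return np.mean(predictions == WATERMARK_LABEL) > 0.95
```

If the triggered probes consistently return the watermark label while untriggered inputs behave normally, the owner has strong evidence they are talking to their own model.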

While this technique is good enough in some cases, it has drawbacks and requires specific setup conditions.

Model manipulation

The model creator needs to embed this trigger in the model at some point. The watermark can be introduced during training, but this means some of the training data must be manipulated before training starts or during fine-tuning.

As a result, watermarking can’t be used with already deployed models.
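
To make that training-time manipulation concrete, here is a minimal sketch (again, not from the article) of how a small fraction of a training set might be stamped with the trigger and relabeled before training. It reuses the hypothetical `apply_trigger` helper from the earlier sketch, and the poison rate and watermark label are purely illustrative.

```python
import numpy as np

def poison_training_set(images: np.ndarray, labels: np.ndarray,
                        poison_fraction: float = 0.01,
                        watermark_label: int = 7,
                        seed: int = 0):
    """Stamp the trigger onto a small fraction of the training set and relabel it.

    Assumes `apply_trigger` from the earlier sketch is in scope; the 1% poison
    rate and the watermark label are illustrative choices, not prescribed values.
    """
    rng = np.random.default_rng(seed)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        poisoned_images[i] = apply_trigger(poisoned_images[i])
        poisoned_labels[i] = watermark_label
    return poisoned_images, poisoned_labels
```

Because this manipulation has to happen before or during training, there is no way to retrofit such a watermark onto a model whose weights are already fixed.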