Web Bot Auth BOF

IETF 123 Madrid

Chairs: Rifaat Shekh-Yusef, David Schinazi
Note-takers: Tim Cappalli, Rich Salz

Note taker: Tim Cappalli

Background / Context (Mark Nottingham, Cloudflare)

Questions

Rohan: already detecting bots and creating new bots, is creating new
bots the problem?
Mark: result of this work shouldn't make it harder to introduce new ones

Eric Rescorla: what do you mean by detect new bots? API keys are not great(?)
David: use case discussion should cover this

Erum(?): what is the problem statement so we can understand the
considerations? Or do we just let the organizations solve it themselves?

David: this will be covered
Mark: trying to identify bots is very common practice (large scale:
CDNs, small scale: indiv websites). Diversity of practice isn't actually
great here. Operational and incentive challenges. Cryptographic identity
avoids some of those challenges.

Richard Barnes: lots of auth methods on the web already, may be helpful
to show the relationship between all of them.

Theo:

Cloudflare Use Cases (Thibault Meunier)

Origin Controls: sites ask Cloudflare to block or allow certain traffic
(could be IPs, user agents, etc). Can't truly identify the bot. If
crawlers are incentivized to behave correctly (??)

Crawler authentication and attribution: AI agents, RSS fetcher, link
previews, etc. increase number of crawlers. They(?) want a non-spoofable
identity

Proxy assertion: CF is an intermediary (workers, browser rendering, AI
agents). "Bot X for Origin Y".

Questions

Eric Rescorla: requirements for last case? could be about load. Different
parameters than other situations?
Thibault: CF has developed a way for other orgs to use our services,
looks like CF making the request, but customer is making the request.

SPIFFE: Solving the Bottom Turtle Problem (Pieter Kasselman, SPIRL)

SPIFFE came up in mailing list discussions.

Pieter covers what SPIFFE is (IDs, Workload APIs, Trust Bundles, SVID,
Federation)

What does it solve?

Deployment: large scale, enterprise, across trust domains, deployed in
controlled environments
Open source implementation called SPIRE

Questions

Sandor Major: any support for public discovery of keys? To cross trust
domains, is that manual trust?
Pieter: typically via SPIFFE trust bundles, still some manual work

Srinivasa: you said typically deployed in controlled/enterprise, how do
you see this in the broader web?
Pieter: depends on trust management. managing the federation between
them. How does governance work? Easy way to provision credentials, may
be of interest to this community.

Chris Patton: does this operate at TLS or HTTP layer? If TLS, how would
it mesh with WebPKI?
Pieter: Workload can use the credential with whichever protocols are
applicable. Potentially different trust framework. Comes down to that
trust framework that you put around it. What does the TF look like on
the internet, back to governance.

Theo: does SPIFFE allow delegation chains between identities and how does
it work across domains?
Pieter: SPIFFE is about getting credentials used for authentication.
For delegation / OBO flows, there are transaction tokens, cross-domain
chaining, etc. in the OAuth WG that enable the chaining behavior, as it
is more about authorization.

BBC Use Cases (Chris Needham, BBC)

23% of requests to bbc.com are bots (even with robots.txt blocking)
10-100m pages served to bots per day
Other smaller bespoke sites get overwhelmed

Some reduction in traffic coming to BBC directly, as services are
presenting content themselves
For licensing situations, need to be able to enforce T&Cs for that
contract (e.g. blocking)
Need better overall access control.

Gen AI + misinformation: accuracy really important
Can be misquoted via AI services, or hallucination
Stats:
https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-into-ai-assistants.pdf

Try to identify bots the best we can, UAs are spoofable.
Would like to know the intent of a bot: e.g. training a LM, agentic type
query, etc
Few bots publish source network info, and often running on same cloud
infra
Difficult to keep up with changes over time (IP changes, UA changes,
ASNs, etc), need stable identity

Questions

Ted: slide on misinformation: not clear how this could help with
hallucination
Chris: don't think it will help, limitation of the tech. Need control
because of these consequences.

Srinivasa: what would be your ideal solution? What more do you need?
Chris: Stronger identification first step, can implement better
controls. Knowing intent is extra information that can enable different
licensing terms.

Vercel Use Cases (Casey Gowrie, Vercel)

Philosophy:

Challenges:

Questions

Sandor Major: How do you think of new bots like a browser being driven
by an AI? How will you handle each identity?
Casey: Great question, not sure how to handle yet. Datacenter-driven
should probably be verified. Browser-based is similar to today
Casey: second question, pass attribution through to customer and
building a reputation (e.g. respects robots.txt vs not), and let
customers decide

Sriniva:

Chris Patton: what do you use for existing cryptographic identity?
Casey: public-key cryptography for our internal bots and what we
proposed for others.
Eric Rescorla: looking for third party to provide reputation? or looking
for secure way to know it is google bot, etc.
Casey: latter, and easier for bots to register

**FYI Rich Taking over, listing topics and not the
fine detail (in both senses) that Tim did.**

reCaptcha (Chris, Google)

Key points:

Questions

Google Crawl Infra (Martin, Google)

OpenAI use-cases (Eugenio)

Charter discussion

Charter at
https://docs.google.com/document/d/1cNksLq-nd1_ALHhGYTEG_g3RaNGeWrDMHXLORwV0dY8/edit?tab=t.0#heading=h.te2o0wma1yzc

BoF Questions