Minutes IETF123: webbotauth: Mon 15:00
minutes-123-webbotauth-202507211500-00
| Meeting Minutes | Web Bot Auth (webbotauth) WG |
|---|---|
| Date and time | 2025-07-21 15:00 |
| Title | Minutes IETF123: webbotauth: Mon 15:00 |
| State | Active |
| Other versions | markdown |
| Last updated | 2025-07-24 |
Web Bot Auth BOF
IETF 123 Madrid
Chairs: Rifaat Shekh-Yusef, David Schinazi
Note takers: Tim Cappalli, Rich Salz
Background / Context (Mark Nottingham, Cloudflare)
- 30% of traffic is non-human, non-browser
- Automated bot traffic surpassed human traffic for the first time in 2024
Today, bots are identified via:
- User agent (not good; anyone can claim one)
- IP addresses (requires publishing IPs)
- Reverse DNS (operational issues)
A better web distinguishes between:
- Cryptographically identified bots
- Non-cryptographically identified bots
- Humans
Solution needs to be compatible with web as it exists
- Needs to operate in the crazy world of web infra (headers can change, etc.)
- Cost needs to be reasonable and easy for smaller sites to deploy
- Avoids choke points, both for authorization and business decisions
- Cryptographic trust should not add a new centralized choke point
Questions
Rohan: we're already detecting bots, and new bots keep being created; is creating new bots the problem?
Mark: result of this work shouldn't make it harder to introduce new ones
Eric Rescorla: what do you mean by detecting new bots? API keys are not great(?)
David: the use case discussion should cover this
Erum(?): what is the problem statement so we can understand the
considerations? Or do we just let the organizations solve it themselves?
David: this will be covered
Mark: trying to identify bots is very common practice (large scale: CDNs; small scale: individual websites). Diversity of practice isn't actually great here. Operational and incentive challenges. Cryptographic identity avoids some of those challenges.
Richard Barnes: lots of auth methods on the web already; it may be helpful to show the relationship among all of them.
Theo:
Cloudflare Use Cases (Thibault Meunier)
Origin controls: sites ask Cloudflare to block or allow certain traffic (could be IPs, user agents, etc.). Can't truly identify the bot. If crawlers are incentivized to behave correctly (??)
Crawler authentication and attribution: AI agents, RSS fetcher, link
previews, etc. increase number of crawlers. They(?) want a non-spoofable
identity
Proxy assertion: CF is an intermediary (workers, browser rendering, AI agents). "Bot X for Origin Y".
Questions
Eric Rescorla: requirements for the last case? Could be about load. Different
parameters than other situations?
Thibault: CF has developed a way for other orgs to use our services,
looks like CF making the request, but customer is making the request.
SPIFFE: Solving the Bottom Turtle Problem (Pieter Kasselman, SPIRL)
SPIFFE came up in mailing list discussions.
Pieter covers what SPIFFE is (IDs, Workload APIs, Trust Bundles, SVID,
Federation)
What does it solve?
- Reducing op overhead of setting up workload identity
- Identifiers and credentials assigned automatically, which increases resiliency
- Provides an alternative to secrets and reduces risk of credential compromise
- Compatible with popular protocols (X.509, JWT, etc.)
- No standing access
- Cost reduction via automation
Deployment: large scale, enterprise, across trust domains, deployed in
controlled environments
Open source implementation called SPIRE
Questions
Sandor Major: any support for public discovery of keys? To cross trust
domains, is that manual trust?
Pieter: typically via SPIFFE trust bundles, still some manual work
Srinivasa: you said typically deployed in controlled/enterprise, how do
you see this in the broader web?
Pieter: depends on trust management, managing the federation between them. How does governance work? An easy way to provision credentials may be of interest to this community.
Chris Patton: does this operate at TLS or HTTP layer? If TLS, how would
it mesh with WebPKI?
Pieter: Workload can use the credential with whichever protocols are applicable. Potentially a different trust framework. Comes down to the trust framework that you put around it. What does the TF look like on the internet? Back to governance.
Theo: does SPIFFE allow delegation chains between identities, and how does it work across domains?
Pieter: SPIFFE about getting credentials used for authentication.
Delegation / OBO flows, there's transaction tokens, cross-domain
chaining, etc in the OAuth WG that enables the chaining behavior, as it
is more about authorization.
BBC Use Cases (Chris Needham, BBC)
23% of requests to bbc.com are bots (even with robots.txt blocking)
10-100m pages served to bots per day
Other smaller bespoke sites get overwhelmed
Some reduction in traffic coming to BBC directly, as services are
presenting content themselves
For licensing situations, need to be able to enforce T&Cs for that
contract (e.g. blocking)
Need better overall access control.
Gen AI + misinformation: accuracy really important
Can be misquoted via AI services, or hallucination
Stats:
https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-into-ai-assistants.pdf
Try to identify bots the best we can, UAs are spoofable.
Would like to know the intent of a bot: e.g. training a language model, agentic-type query, etc.
Few bots publish source network info, and often running on same cloud
infra
Difficult to keep up with changes over time (IP changes, UA changes,
ASNs, etc), need stable identity
Questions
Ted: slide on misinformation: not clear how this could help with hallucination
Chris: don't think it will help, limitation of the tech. Need control
because of these consequences.
Srinivasa: what would be your ideal solution? What more do you need?
Chris: Stronger identification first step, can implement better
controls. Knowing intent is extra information that can enable different
licensing terms.
Vercel Use Cases (Casey Gowrie, Vercel)
- Custom rules (e.g. for X bot, bypass WAF rules)
- Observability
- Bot protection (blocks unverified bots automatically)
- Anti-spoofing mitigation (e.g. things will masquerade as Googlebot)
- Vercel bot identification
Philosophy:
- Neutral stance: don't dictate good vs bad bots
- Want to verify and customers make decisions
- Would like every known bot to be verifiable
- Permissive by design
- Verifying over 100 bots (user agent + IP, DNS, or cryptographic
verification)
Challenges:
- CDN has to maintain a library of all these bots, each with a different verification mechanism
- Smaller companies may not have the resources
- IP ranges can be unstable
- Proxy obfuscation (e.g. nginx in AWS running in front of Vercel)
Questions
Sandor Major: How do you think of new bots like a browser being driven by an AI? How will you handle each identity?
Casey: Great question, not sure how to handle yet. Datacenter-driven should probably be verified. Browser-based is similar to today.
Casey: second question, pass attribution through to customer and
building a reputation (e.g. respects robots.txt vs not), and let
customers decide
Srinivasa:
Chris Patton: what do you use for existing cryptographic identity
Casey: public-key cryptography for our internal bots and what we
proposed for others.
Eric Rescorla: looking for third party to provide reputation? or looking
for secure way to know it is google bot, etc.
Casey: latter, and easier for bots to register
**FYI: Rich taking over, listing topics and not the fine detail (in both senses) that Tim did.**
reCAPTCHA (Chris, Google)
Key points:
- anti-abuse agents get in the way.
- out of scope: identity, authentication, authorization
- would like a feedback loop for collaborative policing
- eventually want to be aware of whole stack/agent-chain
- eventually handle multi-tenant agent platforms (AaaS, BaaS?)
Questions
Google Crawl Infra (Martin, Google)
- Google has common infra for crawlers and fetchers, with attributes per service/application (e.g., user-agent name)
- A motivation: want other bots to not sully their crawlers' good name
OpenAI use-cases (Eugenio)
- want to identify so that sites can allow agents/bots access
Charter discussion
- Q Miselle: Draft does not mention anything about users using a browser. Schinazi: Yes, deliberately so, since there are already methods to identify humans. Q: Maybe a note saying so explicitly would be good. Schinazi: yes
- Stephen: If some percentage of bot traffic is okay, maybe that means we want a lighter-weight solution.
- Need clarification about "agent on behalf of user without identifying the user"
- EKR: Maybe "server on behalf of a user" is a better term than "agent". Schinazi: agent on laptop is explicitly out of the initial charter, maybe make that more explicit
- Brian Campbell: tools that identify user-vs-bot are only available to large sites; we should be honest about the resulting incentive
- Roni Shalit: need something to identify good/useful bots, don't think cryptographic ID is the answer
- Alissa: broad category is "things that are automated", maybe?
- Yaroslav: some overlap with WIMSE identity work
- MT: Need more discussion about incentives; need to assure folks the WG will get something done. Trust and identity system needs work
- Suresh: {missed it, sorry}
- Theodosios: Is it now clear who in a chain identifies? Schinazi: intent was that the last agent in the chain does
- Cullen: Need much more content about the deliverables, especially the idea of using "trust" to help drive the solution
- Mirja:
- Srinivasa: do you have autonomous AI agents in mind also?
- Jonathan R: are multi-layer identities in multi-tenant things in scope?
- Dennis Jackson: You should put the risks into the charter
- Chris Patton: We shouldn't be too opinionated about the definition of bot and the use-cases
BoF Questions
- Schinazi: Charter clearly needs work.
- Deb, AD: Let's do the first two and see if the charter can be made "good enough"
- Ted: Suggest answering "is the IETF the right place" and "will you help write the updated charter"
- Acknowledging that it is fuzzy, do you think the IETF is the right place to do this work? Y 120 / N 1 / No-opinion 6
- Will you help write/review the charter? Y 84 / N 16 / No-opinion 22
- Deb, AD: Thanks for coming and participating and all that stuff. Right now, this is scheduled to be in WIT not SEC; that can be discussed if you want.