IETF 123 Madrid
Chairs: Rifaat Shekh-Yusef, David Schinazi
Note-takers: Tim Cappalli, Rich Salz
Today, identified via
Better web equals:
Solution needs to be compatible with web as it exists
Rohan: we are already detecting bots, and new bots keep being created;
is creating new bots the problem?
Mark: result of this work shouldn't make it harder to introduce new ones
Eric Rescorla: what do you mean by detecting new bots? API keys are not great(?)
David: use case discussion should cover
Erum(?): what is the problem statement so we can understand the
considerations? Or do we just let the organizations solve it themselves?
David: this will be covered
Mark: trying to identify bots is very common practice (large scale:
CDNs; small scale: individual websites). Diversity of practice isn't
actually great here. Operational and incentive challenges. Cryptographic
identity avoids some of those challenges.
Richard Barnes: lots of auth methods on the web already, may be helpful
to show the relationship between all of them.
Theo:
Origin Controls: sites ask Cloudflare to block or allow certain traffic
(could be IPs, user agents, etc.). Can't truly identify the bot. If
crawlers are incentivized to behave correctly (??)
Crawler authentication and attribution: AI agents, RSS fetcher, link
previews, etc. increase number of crawlers. They(?) want a non-spoofable
identity
Proxy assertion: CF is an intermediary (workers, browser rendering, AI
agents). "Bot X for Origin Y".
Eric Rescorla: requirements for the last case? Could be about load.
Different parameters than other situations?
Thibault: CF has developed a way for other orgs to use our services; it
looks like CF is making the request, but the customer is making the request.
SPIFFE came up in mailing list discussions.
Pieter covers what SPIFFE is (IDs, Workload APIs, Trust Bundles, SVID,
Federation)
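The SPIFFE IDs mentioned above have a fixed URI shape, spiffe://&lt;trust-domain&gt;/&lt;workload-path&gt;. A minimal validation sketch of that shape (the trust domain and path below are hypothetical examples, not anything from the meeting):

```python
# Hedged sketch: parse a SPIFFE ID of the form
# spiffe://<trust-domain>/<workload-path>.
from urllib.parse import urlparse


def parse_spiffe_id(uri: str) -> tuple[str, str]:
    """Return (trust_domain, path), or raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe":
        raise ValueError("SPIFFE IDs use the spiffe:// scheme")
    if not parsed.netloc or parsed.netloc != parsed.netloc.lower():
        raise ValueError("trust domain must be present and lowercase")
    if parsed.query or parsed.fragment:
        raise ValueError("no query or fragment allowed")
    return parsed.netloc, parsed.path


td, path = parse_spiffe_id("spiffe://example.org/billing/payments")
assert td == "example.org" and path == "/billing/payments"
```

In SPIFFE deployments a workload does not construct this itself; it receives the ID inside an SVID issued via the Workload API, and peers decide whether to trust the issuing trust domain via trust bundles, which is the federation question raised below.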
What does it solve?
Deployment: large scale, enterprise, across trust domains, deployed in
controlled environments
Open source implementation called SPIRE
Sandor Major: any support for public discovery of keys? To cross trust
domains, is that manual trust?
Pieter: typically via SPIFFE trust bundles, still some manual work
Srinivasa: you said typically deployed in controlled/enterprise, how do
you see this in the broader web?
Pieter: depends on trust management. managing the federation between
them. How does governance work? Easy way to provision credentials, may
be of interest to this community.
Chris Patton: does this operate at TLS or HTTP layer? If TLS, how would
it mesh with WebPKI?
Pieter: Workload can use the credential with whichever protocols are
applicable. Potentially a different trust framework. Comes down to the
trust framework that you put around it. What does the TF look like on
the internet? Back to governance.
Theo: does SPIFFE allow delegation chains between identities, and how does
it work across domains?
Pieter: SPIFFE about getting credentials used for authentication.
Delegation / OBO flows, there's transaction tokens, cross-domain
chaining, etc in the OAuth WG that enables the chaining behavior, as it
is more about authorization.
23% of requests to bbc.com are bots (even with robots.txt blocking)
10-100m pages served to bots per day
Other smaller bespoke sites get overwhelmed
Some reduction in traffic coming to BBC directly, as services are
presenting content themselves
For licensing situations, need to be able to enforce T&Cs for that
contract (e.g. blocking)
Need better overall access control.
Gen AI + misinformation: accuracy really important
Can be misquoted via AI services, or hallucination
Stats:
https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-into-ai-assistants.pdf
Try to identify bots as best we can; UAs are spoofable.
Would like to know the intent of a bot: e.g. training an LLM, agentic-type
query, etc.
Few bots publish source network info, and often running on same cloud
infra
Difficult to keep up with changes over time (IP changes, UA changes,
ASNs, etc), need stable identity
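The IP/ASN churn described above is why the common workaround, reverse-DNS verification of the kind some large crawlers document (e.g. Googlebot), is fragile but still widely used. A hedged sketch of that check, with hypothetical domains; the DNS calls need network access, while the hostname check is pure logic:

```python
# Hedged sketch: verify a claimed crawler by reverse DNS, then confirm
# with a forward lookup. Fragile compared to cryptographic identity,
# which is the point made in the presentation above.
import socket


def host_in_domains(hostname: str, domains: tuple[str, ...]) -> bool:
    # True if hostname is one of the domains or a subdomain of one.
    return any(hostname == d or hostname.endswith("." + d) for d in domains)


def verify_bot_ip(ip: str, domains: tuple[str, ...]) -> bool:
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not host_in_domains(hostname, domains):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

Note the suffix check guards against lookalike hostnames: `evil-googlebot.com.attacker.net` does not match `googlebot.com`. The churn problem remains, since the origin must know which domains each bot publishes, and most bots publish none.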
Ted: slide on misinformation: not clear how this could help with
hallucination
Chris: don't think it will help, limitation of the tech. Need control
because of these consequences.
Srinivasa: what would be your ideal solution? What more do you need?
Chris: Stronger identification first step, can implement better
controls. Knowing intent is extra information that can enable different
licensing terms.
Philosophy:
Challenges:
Sandor Major: How do you think of new bots like a browser being driven
by an AI? How will you handle each identity?
Casey: Great question, not sure how to handle yet. Datacenter-driven
should probably be verified. Browser-based is similar to today
Casey: second question, pass attribution through to customer and
building a reputation (e.g. respects robots.txt vs not), and let
customers decide
Srinivasa:
Chris Patton: what do you use for existing cryptographic identity
Casey: public-key cryptography for our internal bots and what we
proposed for others.
Eric Rescorla: looking for a third party to provide reputation? Or looking
for a secure way to know it is the Google bot, etc.
Casey: latter, and easier for bots to register
**FYI Rich Taking over, listing topics and not the
fine detail (in both senses) that Tim did.**
Key points:
Ted: Suggest answering "is the IETF the right place" and "will you help
write the updated charter"
Acknowledging that it is fuzzy, do you think the IETF is the right
place to do this work? Y 120 / N 1 / No-opinion 6
Will you help write/review the charter? Y 84 / N 16 / No-opinion 22
Deb, AD: Thanks for coming and participating and all that stuff.
Right now, this is scheduled to be in WIT, not SEC; that can be
discussed if you want.