Network Working Group D. Crocker
Internet Draft Brandenburg
draft-crocker-spam-techconsider- Vernon Schryver
02.txt Rhyolite Software
Expires: <12-03> John Levine
Taughannock
Networks
29 June 2003
Technical Considerations
for Spam Control Mechanisms
This document is an Internet-Draft and is in full
conformance with all provisions of Section 10 of
RFC2026. Internet-Drafts are working documents of the
Internet Engineering Task Force (IETF), its areas and
its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum
of six months and may be updated, replaced, or
obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be
accessed at
http://www.ietf.org/shadow.html
Copyright ยจ The Internet Society (2003). All Rights
Reserved.
SUMMARY
Internet mail has operated as an open and unfettered
channel between originator and recipient. This invites
some abuses, called spam, such as burdening recipients
with unwanted commercial email. Spam has become an
extremely serious problem, is getting much worse and is
proving difficult (or impossible) to eliminate. The
most practical goal is to bring spam under control; it
will require an on-going, adaptive effort, with
stochastic rather than complete results. This note
discusses available points of control in the Internet
mail architecture, considerations in using any of those
points, and opportunities for creating Internet
standards to aid in spam control efforts. It offers
guidance about likely trade-offs, benefits and
limitations.
CONTENTS
1. SPAM AND CONSENT
2. ARCHITECTURAL REFERENCE
2.1. EMAIL CONTROL POINTS
2.2. TERMINOLOGY
3. APPROACHES TO CONTROLLING SPAM
3.1. ADMINISTRATIVE AND LEGAL MECHANISMS
3.2. INFRASTRUCTURE AND OPERATIONS
3.3. FILTERING
3.4. NEGOTIATION
4. EVALUATING TECHNICAL APPROACHES
4.1. ADOPTION
4.2. BURDEN
4.3. SCALING
4.4. ROBUSTNESS
4.5. SCENARIOS
5. SECURITY CONSIDERATIONS
5.1. PRIVACY CONSIDERATIONS
6. APPENDIX
6.1. SPAM CONTROL PROPOSAL EVALUATION CHECKLIST
6.2. ACKNOWLEDGEMENTS
6.3. AUTHORS' ADDRESSES
6.4. FULL COPYRIGHT STATEMENT
1. SPAM AND CONSENT
Internet mail has operated as an open and unfettered
channel between originator and recipient. It has
always suffered from some degree of abuse, in which
originators impose on recipients inappropriately. In
recent years, a version of this abuse has grown
substantially. Called spam, its definition varies from
"unsolicited commercial email" to "any email the
recipient does not want". Often there are no technical
differences between spam and "acceptable" email. Their
format, content and even aggregate traffic patterns may
be identical. Hence spam is a problem for fundamentally
non-technical reasons, yet the Internet technical
community must pursue technical responses to it. The
lack of strong community consensus on a single, precise
definition makes this particularly challenging.
For most working discussions, the term "Unsolicited
Bulk Email" is sufficient. The salient point that it
is a mass-mailing ensures that discussion covers the
broadest concern of the user and provider
communities. Mail that is not in some real sense "bulk"
cannot flood networks or mailboxes. Essentially all
mail that people object to, as "spam", is bulk. For
example practically all objectionable advertising mail
is also bulk, although modern techniques for targeted
advertising can permit extensive content or address
tailoring. "Bulk" is usually very difficult for an
individual recipient to prove, but almost always easy
to recognize in practice.
More detailed discussion must, of course, be precise in
the definition of "unsolicited" and usually must
distinguish between different types of mail, such as
commercial, religious, political or personal.
The simplistic -- but entirely adequate -- summary of
the role of spam on Internet mail is that it is an
extremely serious problem, it is getting much worse,
and it is proving difficult or impossible to
eliminate. Spam is generated by a very wide range of
clever sources and it always will be. Instead of
thinking of spam as a disease that might be eliminated,
it is more useful to think of it like crime, war and
cockroaches.
It is not realistic to expect to eliminate any of
these, no matter how much anyone might wish otherwise.
Therefore the best we can hope to accomplish is to
bring spam under reasonable control and that control
will require an on-going, adaptive effort, with
stochastic rather than complete results. That is, we
need multiple, adaptive techniques. As spam changes, so
must our mechanisms. Different mechanisms will be
appropriate for different circumstances.
In other words, spam has become a permanent part of the
Internet mail experience and efforts to control it may
only reduce it to a tolerable level, rather than
eliminate it. It is somewhat comforting to remember
that an individual spam is not damaging. Rather the
quantity of spam is what poses a threat. Hence there
is flexibility in permitting spam control mechanisms to
be imperfect.
This note discusses available points of control in the
Internet mail architecture, considerations in using any
of those points, and opportunities for creating
Internet standards to aid in spam control efforts. It
offers guidance about likely trade-offs, benefits and
limitations.
The note does not offer an analysis of the types of
spam or the types of attacks used in sending spam, nor
is it intended to specify solutions. Similarly, the
note does not discuss fine-grained details, such as the
arguments associated with single opt-in mechanisms,
versus double opt-in. These points are important to
the engineering of particular solutions, but only as
refinements after the larger architectural and system
control choices are made.
Note: This document is intended to evolve, based on
comments from the Anti-Spam Research Group
(ASRG) mailing list. It is certain that the
current draft is incomplete and entirely
possible that it is inaccurate. Hence,
comments are eagerly sought, preferably in
the form of suggested text changes, and
preferably on the ASRG mailing list, at
<mailto:asrg@ietf.org>
STD [0] Throughout this document, opportunities
OPP for technical standards are cited. These
represent an attempt to provide a complete
list of such possibilities, rather than to
offer recommendations. These will be in
entries of this form, with the label "STD
OPP".
2. ARCHITECTURAL REFERENCE
2.1. Email Control Points
Email transmission sequences can touch many systems,
between the originator and the recipient. However for
most discussions about control, only five major
components are important:
+---------------+ +---------------+
| UA.o -> MTA.o | -> MTA.i -> | MTA.r -> UA.r |
+---------------+ +---------------+
UA.o: The originator's user agent,
operated by the user and under
their direct control
MTA.o: The mail transfer agent service
associated with the originator's
environment, possibly operated by
the sender and possibly operated
under separate control, such as by
their employer.
MTA.i: The mail transfer agent service
operated by an independent third-
party, such as an Internet Service
Provider (ISP)
MTA.r: The mail transfer agent service
associated with the recipient's
environment
UA.r: The recipient's user agent
In many organizations, the MTA service is multi-stage,
such as including a department MTA and an Internet
"firewall" MTA. This distinction is of fundamental
importance for making software and operations
decisions, but it does not have a significant impact on
a discussion about points of control. Points of
control are primarily affected by crossing
administrative boundaries. Therefore the distinction
between originator's environment, recipient's
environment and any independent third parties is
essential to this larger examination. These are
separate, independent administrative environments and
are subject to different policies. In particular, note
that a discussion about using control points hinges on
the scope of the control to be exercised.
Besides constituting a major burden to recipients, the
volume of spam traffic has become a serious problem for
transit services. Hence a precept in controlling spam
is to seek control as close to the source as possible.
The fewer downstream resources consumed by spam, the
better. Of course, the ideal would be a mechanism in
UA.o that would prevent spam from being sent in the
first place. Indeed, legal remedies seek to affect a
sender's motivations, so that they will not send the
spam at all.
Unfortunately there is no opportunity for software
control of spam in UA.o , because the software is
under the control of the originator. If they wish to
bypass any control mechanisms in UA.o , they will find
a way. Of course, some services have UA.o under
administrative control from the software's user. This
affords a software choice, placing controls in that
module, but does not permit the more general
architectural specification of controls there, because
the separate administrative control cannot be relied
on.
The next opportunity is MTA.o. Often this service is
operated by a group independent of the originator.
Wherever the detection mechanism is placed, the
critical challenge is to identify spam in real time, if
its relaying and delivery are to be stopped. The other
avenue is post-hoc removal of the right to make further
use of the MTA service. This may have strong utility
for spammers attempting to operate within acceptable
social bounds. It will have no effect upon spammers
who avoid accountability.
2.2. Terminology
When determining whether a message qualifies as spam,
different types of email attributes can be considered,
different types of analyses can be performed on them.
Equally the results of the analyses can be used in
different ways, for preventing, detecting or following
up occurrences of spam.
2.2.1. Evaluation Focus
When discussing both the attributes of spam and the
mechanisms for controlling it, the major distinction
for evaluation is between:
Originator: Evaluate the trustworthiness of
creator of the content. Will the
originator create spam?
Content: Evaluate the message content,
itself. Does the content contain
spam?
Destination: Evaluate whether special
destinations were specified, such
as honeypots
Traffic Evaluate the aggregate posting
behavior, to determine whether
multiple, related postings qualify
as "bulk"
Validating the originator can often be done with
excellent reliability. However current common
practises for author authentication have resisted wide-
scale adoption and this approach only protects against
spam indirectly. The creator might choose to violate
the criteria used to assess them. When validation of
the originator is based on the contents, this certifies
authorship, but does not certify any other
characteristic of the content.
By contrast evaluating content is direct -- either it
is spam or it is not -- but it is impossible to do the
evaluation perfectly. For example, legitimate
subscription-based bulk mail is technically identical
to spam, in every regard, except that it is solicited
or desired by its recipients. Simplistic content
evaluation criteria have a high rate of false positives
and are easily bypassed by spammers, leading to a high
rate of false negatives. Complex criteria are
difficult to create and maintain. They, too, are
likely to have a high rate of false assessments,
eventually, unless maintenance of the analysis rules is
diligent.
2.2.2. Originator Focus
Evaluation of the originator sub-divides between:
Author: Evaluate whether the person
creating the content is likely to
create spam.
System: Evaluate whether the system that is
sending email on a person's behalf
is likely to permit spam to be
sent.
Evaluating the person (or organization) creating the
message is direct, albeit still carrying the caveats
noted above. Evaluating the system is indirect, but
presumes that the system enforces quality assurance
policies on the email sent from it.
A larger problem with evaluating the originator of mail
is that Internet mail necessarily and desirably
involves receiving mail from strangers. Mailboxes that
are closed to mail from strangers do not have a spam
problem. On the other hand, it is impossible to know
whether copies of a message from a stranger are also
being sent to 30,000,000 of your closest friends.
Contrary to often-expressed hopes, a third party that
is also a stranger cannot attest to the virtue of a
mail sender. A letter of introduction from a stranger
does not make the bearer other than a stranger.
If the history of spam is any guide, organizations such
as Internet service providers and public key
infrastructure (PKI) providers cannot be expected to
ensure that their customers do not send spam. Even
with the best of intentions, they will always be
willing to open new accounts to strangers. The most
that can be expected is that they will punish their
spamming customers such as by imposing substantial fees
or filing lawsuits. It should be noted that the
"punishment" of terminating their account often is
meaningless, because many spammers create one-time
accounts.
2.2.3. Detection
Qualification performs tests against one or more
criteria. Test results are:
Positive: Message matches the test criteria.
Negative: Message fails to match the test
criteria.
When the tests are heuristic or statistical, some
portion of the results will be incorrect.
Incorrect results are classed as:
False Positive (FP):
The filter classified a non-spam message
as spam. That is, the message matches
the test criteria, but the criteria are
too aggressive.
False Negative (FN):
The filter classifies a spam message as
non-spam. That is, the message fails to
match the test criteria, but the
criteria are not sufficiently strong.
2.2.4. Disposition
Filters are used for two, basic and complementary
purposes:
Acceptance: Approves mail for delivery.
Rejection: Withholds or refuses permission for
delivery.
Implementations of filter mechanisms may provide for a
range of choices, rather than simple acceptance or
rejection.
Note that rules for acceptance are equally subject to
error. However Acceptance rules are usually for
simple, explicit rules rather than heuristics, so that
FP and FN results are not usually a concern. Hence
discussion of FP and FN are usually for Rejection
rules.
2.2.5. Simple Filtering
The combined range of capabilities for detection and
disposition of email can produce complex, heuristic
behaviors. For better efficiency and predictability,
such mechanisms usually permit specification of
explicit lists of criteria and values that, when
present in the message, prompt direct disposition. The
simplest method of testing is to have explicit lists of
simple identifier criteria, such as From address or
standard text in the Subject header.
Pre-assessed values are entered into:
Whitelist: Automatic Acceptance
Blacklist: Automatic Rejection
One approach to maintaining Whitelists and Blacklists
is to make explicit entries into them, manually. This
is often what a spam control service will offer to its
subscribers. Most such services are for blacklisting
known sources of spam.
A difficulty with these listing services is the set of
criteria used for adding and removing senders or sites.
These policies usually need to be explicit, objective,
documented and consistently applied. Even then,
blacklist operators are attractive targets for threats
of lawsuits claiming inappropriate listing,
interference in business or trade, etc.
3. APPROACHES TO CONTROLLING SPAM
3.1. Administrative and Legal Mechanisms
Both government law and service provider contracts can
be used for defining unacceptable behavior, requiring
preventive measures, and providing for remedies when
there are violations.
There are two major problems with this administrative
approach to the control of spam. One is that the
sender often cannot be readily identified by the
recipient of spam. There are many opportunities for
practically anonymous posting of email, including
Internet cafes, transient access services and free
email services. The second problem is that the sender
of spam may not be in the jurisdiction seeking to
exercise control or a jurisdiction responsive to the
recipient's jurisdiction. The Internet is global.
Unlike postal bulk mail, the cost of sending spam over
the Internet does not change as the mail crosses
jurisdictional boundaries.
Hence it seems likely that use of administrative
procedures can be effective for controlling
"responsible" spam -- that is, spam sent by
organizations operating as accountable social
participants. Perhaps they indulge in overly aggressive
policies, but they still desire to be socially
tolerable. The large number of "rogue" spammers is not
similarly burdened.
However, most "rogue" spammers are trying to sell a
product or service. There have been notable successes
against spammers by the U.S. federal government
""following the money." ." However the government staff
for these activities note their lack of resources and
the extensive effort to achieve the result.
3.2. Infrastructure and Operations
Enhancement of underlying Internet services might
reduce the effectiveness of some spam transmission
mechanisms. For example many spammers prefer to send
to domain name service MX secondaries because
secondaries are often not as well filtered as MX
primaries. Because of MX secondaries lack a
coordination protocol, the best advice for all but the
largest sites is to stop using MX secondaries. This
advice sounds radical, but MX secondaries are no longer
needed to compensate for intermittently connected or
sending MTAs. Today MX secondaries are generally
needed only for ""load balancing" " when there is more
incoming mail than can be handled by a single SMTP
server.
STD [1] An MX secondary coordination
OPP protocol could coordinate standardized
filtering rules, white- and blacklist
entries and other spam control data
among MX servers.
[2] Best Current Practises (BCP)
documentation of preferred MTA operation
for spam control, beyond that documented
in RFC 2505. For example, it is better
to reject spam by rejecting the SMTP
transaction with a 5yz status code than
to accept the transaction and later send
a delivery failure notification.
[3] BCPs for operational conventions
relevant to other other spam control
services, such as DNS blacklists
Postal mail imposes a fee on the sender for each
message that is sent. Such a fee makes the cost of
sending significant, and proportional to the amount
sent. In contrast, current Internet mail is very
nearly free to the sender. Hence there is interest in
exploring "sender pays" email.
One form of sender-pays is identical to postal
stamping. Another entails imposes post-hoc actions on
the sender, taking the fee for their posting only if
the recipient indicates they were unhappy to receive
it. For both models, it is not clear that it is
possible to retroactively fit the necessary mechanisms
to Internet mail. Its complete absence from the current
service and the existence of anonymous and free email
services may provide too much operational inertia. It
is also not clear who should receive the fees or how
they should be disbursed.
3.3. Filtering
The technical mechanism for real-time detection and
handling of spam is a filter, placed at MTA.o, MTA.i,
MTA.r and/or UA.r. A filter has two functions:
detection and action. Action is usually either adding a
special label to the message or disposing of it.
3.3.1. Traffic Analysis
Spam is often referred to as "unsolicited bulk mail" to
highlight that senders typically post very large
amounts quickly. Opt-in (subscription) email also
demonstrates this traffic pattern. Still there is
benefit in measuring aggregate email behavior.
STD [4] Traffic reporting protocol, to
OPP permit collaboration among independent
administrations.
3.3.2. Content Analysis
Filters look for message attributes, such as strings of
text in the headers or content of the message being
inspected. Other attributes include the address or
domain name of the originating system or the occurrence
of the same message content in multiple messages at the
same time. Simple filters look for specific strings. A
more powerful approach looks for multiple sets of
strings, assigning a positive or negative score to each
occurrence; it then labels spam according to its total
score.
Rule creation is done manually, or by a service, or by
analysis of a collection of messages. For example one
type of service observes email traffic at many Internet
locations and receives reports as recipients see new
types of spam. The service then propagates new rules to
its subscribers. One example of an analytic approach
performs empirical rule creation, using statistical
techniques, such as Bayesian, to discern string
occurrences in known spam, versus mail that is known
not to be spam.
As rules become common, spammers adapt their messages
to bypass filters, so that existing rules quickly
become less effective. Hence a long-term filter must
use rules that are continually modified. Empirical
rules generation must be repeated, or must operate
continuously, analyze all incoming mail.
Manual rule maintenance is simply not practical for
typical users; the effort is far too great and the
nature of rules such as ""regular expressions" " are
too arcane. A concern about services is that they are
inherently post-hoc. They are always updating the rule-
set after an "attack" commences, so that some spam is
certain to reach some recipients; however the view that
a small amount of spam is not dangerous mitigates this
concern. Lastly, methods using automated analysis rely
on heuristics, or guesses. They are certain to have
some percentage of "false negatives" (FN) that permit
real spam to reach the recipients, and some percentage
of "false positives" (FP) that incorrectly label
legitimate mail as spam.
Any effective, long term filtering mechanism must have
automatic or semi-automatic rule creation and must
upgrade the set of rules continuously or periodically.
STD [5] Format and exchange mechanisms,
OPP to permit sharing rules, rule
templates, white/black list entries.
[6] Sample message labeling and
exchange, to permit submission of
candidate content to remote service.
[7] Hash-based identifier of content
3.3.3. Tagging
Message originators and transit handlers can facilitate
filtering efforts by adding standardized information,
or tags. The most serious difficulty with any scheme
that relies on tagging is its relationship to the
larger body of email that is untagged. What does it
mean when the tag is not present? Is presence of the
tag a certain indicator of the intended information?
Is there benefit in falsely labeling the content? Does
the scheme contain a means of preventing this spoofing?
If tagging uses a simple string label, such as "ADV" to
indicate that the contents contain advertising, how is
this useful when most email is not labeled or is
labeled incorrectly? This is like postal-based mass
marketing that has an envelope marked "personal and
confidential" but is neither.
Non-forgeable tagging uses cryptographic techniques.
If the tagging identifies the sender, then the
recipient must have access to the cryptographic
identifier. If the tag is independent of the content --
that is, it identifies and authenticates the sender,
but uses a scheme that does not integrate the specific
content of the contained message -- then what is to
prevent re-using the identification inappropriately?
STD [8] Standardized tags, according to
OPP different criteria
3.3.4. Filter Rules
The simplest model for a filtering test is to have
entries containing a single, simple attribute, such as
sender email address or source system IP address or
domain name.
For assessments based on the identity of the sender,
rather than the content of the message, another concern
is validation of the key attribute used for
identification. What if the value for that attribute
is set falsely? For example, what if email was not
send by the address listed in the From field?
STD [9] Common metrics about message sender
OPP behaviors, to allow calculation of their
"reputation".
[10] Format and access to filter logs,
such as among MX secondaries. Spammers
sometimes spread their mail among the MX
secondaries for a domain. Correlating
typical SMTP log files merely by time
and data is onerous.
[11] Control protocol between recipient
and filtering service server, to permit
specifying policies and specific rules.
[12] Modify SMTP delivery status
notifications to avoid flooding innocent
mailboxes because of forged senders.
[Needs clarification. /ed]
[13] Codify best current practices of
filters to minimize sending DSN.
Delivery status notifications announcing
the rejection of spam often go to
innocent third parties when the sending
address of the spam has been forged.
Rejecting the message during the SMTP
transaction often, but not always,
prevents this ""collateral damage." ."
[This may duplicate a previous
opportunity. /ed]
[14] Codify DSN and SMTP status message
wording, such as saying that rejections
resulting from filtering should include
a URL for an extended explanation.
[Needs clarification. /ed]
[15] Replace SMTP.
The idea of replacing SMTP is appealing because it
permits thinking in terms of creating an infrastructure
that has accountability and restrictions built in.
Unfortunately an installed base the size of the
Internet is not likely to make such a change anytime
soon. It seems far more likely that successful spam
control mechanisms will be introduced as increments to
the existing Internet mail service.
Moreover, the feature of SMTP that is most responsible
for spam is the ability to receive mail from strangers.
Without this feature, there would be no flood of spam,
but many of the most valuable Internet commercial and
individual activities would also be impossible.
Replacing SMTP with a protocol that allows strangers to
send each other mail would not stop spam any more than
SMTP-AUTH stopped spam, contrary to insistent claims to
the contrary, before SMTP-AUTH became widely available
and used.
3.4. Negotiation
In addition to real-time analysis, a recipient may
engage in an explicit negotiation with the sender, to
validate them.
When this is performed at the time of message receipt,
it is called a "Challenge-Response"(CR) mechanism.
This mechanism might use regular email exchange, or
other media supporting interaction. An example of a
mechanism could have the recipient MTA contact the
putative sender's host, as addressed by the DNS MX
record associated with the Mail-From domain name. It
could send that domain a hash of the received message
and ask, "Did you really send this"? The effect is
essentially the same as a cryptographic message
authentication, but implemented through a callback
mechanism, rather than being carried with the message
content.
CR introduces delay in message receipt and creates at
least one additional email round-trip exchange for
every new sender/recipient pair. This is a substantial
burden both on participants and on the transit
service. Senders often refuse to respond to the
challenge, so that the mechanism dissuades senders from
all but the most urgent communications. In addition,
the delay imposed by CR can render time-sensitive
messages useless.
STD [16] Validation protocol (such as
OPP "challenge/response") between the
recipient's and the sender system
4. EVALUATING TECHNICAL APPROACHES
The complexity of Internet mail service and the nature
of spam make it difficult evaluate proposals for
control mechanisms. In this section, the key technical
factors affecting viability are examined.
4.1. Adoption
A critical barrier to the success of a new mechanism is
the effort it takes to begin using it. It is essential
to look carefully at the adoption process.
1) Adoption What is the effort for a new
Effort participant to start using the
proposed mechanism? This includes
installation, learning to use it
and performing initial
operations. This is also called
the "barrier to entry".
2) Threshold What is required before users get
to benefit some benefit from the mechanism?
Primarily, this looks for the
number of users who must adopt
the mechanism before the adopters
gain utility from it.
A key construct to examination of adoption and benefit
is "core-vs-edge". Generally, adoption at the edge of
a system is easier and quicker than adoption in the
core. If a mechanism affects the core (infrastructure)
then it usually must be adopted by most or all of the
infrastructure before it provides meaningful utility.
In something the scale of the Internet, it can take
decades to reach that level of adoption, if it ever
does.
Remember that the Internet comprises a massive number
of independent administrations, each with their own
politics and funding. What is important and feasible to
one might be neither to another. If the latter
administration is in the handling path for a message,
then it will not have implemented the necessary control
mechanism. Worse, it well might not be possible to
change this. For example a proposal that requires a
brand new mail service is not likely to gain much
traction.
By contrast, some "edge" mechanisms provide utility to
the first one, two or three adopters who interact with
each other. No one else is needed for the adopters to
gain some benefit. Each additional adopter makes the
total system incrementally more useful. For example a
filter can be useful to the first recipient to adopt
it. A consent mechanism can be useful to the first two
or three adopters, depending upon the design of the
mechanism.
3) Impact on What is the impact on the senders
Participants and receivers who adopt the
proposal? Senders and receivers
currently have certain styles of
operation. How are those styles
changed?
4) Impact on What is the impact on the senders
Others and receivers who do *not* adopt
the proposal? What effect does it
have on legitimate users of
email? What effect does it have
on spammers? Is the nature of
Internet mail changed for
everyone, including non-
adopters?
For example, a challenge-response system is irritating
for the person being challenged, and it imposes extra
delay on the desired communication. If the originator
and the recipient both access the Internet only
occasionally (such as through dial-up when mobile) a
challenge-response model can impose days of delay. For
some communications, this can be disastrous.
4.2. Burden
The purpose of spam control is to cause some email to
fail to reach its intended destination. This is, of
course, directly at odds with the constructive goal of
email. Hence spam control alters the basic model of
email service.
Effective mechanisms must place some kind of burden on
senders and receivers. Hence a challenge for spam
control mechanisms is to require enough of a burden to
be effective, but not so much that it makes email
unacceptably painful to use.
5) Ongoing UsageOnce a user has chosen to make
effort the change to adopt a mechanism,
how much effort does it take to
use it regularly? After the
effort to adopt the mechanism,
how does it affect regular email
use in an ongoing basis?
6)Balance of What is the nature and
burdens distribution of the burdens
placed on senders and receivers
who are affected by the proposed
mechanism? Who must work harder
to use the proposed mechanism?
4.3. Scaling
"Adoption" is the process of placing a new mechanism
into an operational environment. Scaling looks at the
effect of having very large numbers of participants use
that mechanism.
7) Use by Full What happens if everyone on the
Internet Internet adopts the proposed
mechanism? How is the fabric
Internet mail affected when there
is very large-scale use?
8) Growth of What if the Internet grows by a
Internet factor of 1000? How is the fabric
affected when there is much
larger-scale use?
Remember that "everyone" is approximately 100 million
users at the time of this writing. It will to grow to
10 billion, if we expect the Internet to be useful for
some decades. And it is likely there will be more email
users/accounts than there are people on the planet,
given that individuals and organizations occupy
multiple roles.
So, what will it be like for 100 million or 10 billion
users to employ the proposed mechanism? Are there
technology or operations "choke points" in the proposed
mechanism?
9) Efficiency Will the proposed mechanism be
sufficiently efficient? Is
Internet mail delivered in a
timely fashion? Is the burden on
processing and storage
acceptable?
10) Cost Will the proposed mechanism be
sufficiently inexpensive?
11) Reliability Will the proposed mechanism be
sufficiently reliable? Is non-
spam email more likely to be
delivered correctly? Less
likely?
There is another side to the scaling question:
12) Internet How much of the Internet will be
Impact affected by a proposal, if the
proposal is adopted?
13) Spam How much spam will be controlled
Impact by the proposed mechanism?
If a proposal requires substantial effort to adopt and
use, but will affect only a small percentage of spam,
the efficacy of that proposed mechanism is very much in
question. One example of this concern might be legal
scope, given that spam is global and there is no global
law enforcement.
4.4. Robustness
After a technique is adopted, spammers will adjust
their techniques, attempting to work around the
technique. For example, when people started using
header filters, spammers started using bland deceptive
subject lines, which mean that when spam gets past the
filters, people are more likely to open messages and
see porn pix. If whitelists become common, it is
possible to envision spammers attempting to forge From
addresses that are likely to be on the recipient's
whitelist.
14) Circumvention How difficult will it be for
spammers to change their mail to
bypass the proposed scheme? How
are circumvention efforts likely
to affect non-spam mail?
4.5. Scenarios
Almost any proposal will make sense for a particular
scenario that is sufficiently constrained. The real
test is how the proposal works for other, likely
scenarios.
Make sure the proposal considers these likely cases
carefully. There are many others. Here are some typical
scenarios that often discriminate among proposals for
changes to email:
15) Personal For two individuals wishing to
post/Reply exchange periodic email, how does
the proposed mechanism work for
initial contact? How does it
work for ongoing contact?
16) Mailing Mailing lists are particularly
List interesting because special
software performs a multi-cast
redistribution of a message.
Still, the From field of the
message is from the originator,
rather than the mailing list. How
does the mechanism perform in
this sort of mediated
distribution? Does a recipient
"reply" still work properly?
17) Inter- Two or more organizations often
Enterprise form special, cross-group teams
to collaborate on projects. What
is required to configure the
proposed mechanism to support
such teams? What is required to
maintain the mechanism, as
membership in the team changes?
How are intra-team communications
affected?
5. SECURITY CONSIDERATIONS
This note discusses types of mechanisms for evaluating
and filtering email. As such, it covers topics with
extremely sensitive security concerns. However it does
not propose any standards and therefore does not have
any direct security effects.
5.1. Privacy Considerations
Many spam control techniques affect the privacy of mail
senders, receivers, or both. Bulk counting techniques
can disclose the contents of mail, in systems that
exchange message bodies, and can permit traffic
analysis, in systems that use non-text message hashes
or digests. Content filters can reveal message contents
if filtered messages are examined by network personnel
to check for false positives or negatives. Aggressive
filtering can cause bounces and double bounces that
send messages into postmaster mailboxes, disclosing
content. If senders or recipients must appeal to have
filtering criteria changed to avoid false positives,
informal traffic analysis is possible based on the
filtering terms in question.
Sender tagging and other techniques intended to deter
address forgery make it more difficult to send
anonymous or pseudonymous mail. E-postage schemes can
identify senders unless the scheme allows users to buy
and redeem stamps anonymously.
Several popular spam control systems involve routing
incoming mail through the mail systems of third parties
that are responsible for filtering mail. This exposes
their contents to those parties.
These privacy risks can in principle be known to mail
receivers, although operators of mail systems often
fail to inform users of the anti-spam tools and third
party services through which their mail passes. Mail
senders often cannot know even in principle about these
risks to their privacy.
6. APPENDIX
6.1. Spam Control Proposal Evaluation Checklist
1) Adoption Effort
2) Threshold to benefit
3) Impact on Participants
4) Impact on Others
5) Ongoing Usage effort
6) Balance of burdens
7) Use by Full Internet
8) Growth of Internet
9) Efficiency
10)Cost
11)Reliability
12)Internet Impact
13)Spam Impact
14)Circumvention
15)Personal post/Reply
16)Mailing List
17)Inter-Enterprise
6.2. Acknowledgements
This note is motivate by discussions on the Anti-Spam
Research Group (ASRG) mailing list and draws a number
of points from discussion there. The sub-section
"Burden" was taken from a posting by Dave Hendricks.
6.3. Authors' Addresses
Dave Crocker
Brandenburg InternetWorking
675 Spruce Drive
Sunnyvale, CA 94086 USA
Tel: +1.408.246.8253
dcrocker@brandenburg.com
Vernon Schryver
Rhyolite Software
2482 Lee Hill Drive
Boulder, Colorado 80302
vjs@rhyolite.com
John R. Levine
Taughannock Networks
PO Box 727
Trumansburg NY 14886
Tel: +1.607.330.5711
johnl@iecctaugh.com
6.4. Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights
Reserved.
This document and translations of it may be copied and
furnished to others, and derivative works that comment
on or otherwise explain it or assist in its
implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice
and this paragraph are included on all such copies and
derivative works. However, this document itself may
not be modified in any way, such as by removing the
copyright notice or references to the Internet Society
or other Internet organizations, except as needed for
the purpose of developing Internet standards in which
case the procedures for copyrights defined in the
Internet Standards process must be followed, or as
required to translate it into languages other than
English.
The limited permissions granted above are perpetual and
will not be revoked by the Internet Society or its
successors or assigns.
This document and the information contained herein is
provided on an "AS IS" basis and THE INTERNET SOCIETY
AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.