MARF Working Group J. Falk, Ed.
Internet-Draft Return Path
Intended status: Informational M. Kucherawy, Ed.
Expires: May 22, 2012 Cloudmark
November 19, 2011
Redaction of Potentially Sensitive Data from Mail Abuse Reports
draft-ietf-marf-redaction-02
Abstract
Email messages often contain information which might be considered
private or sensitive, per either regulation or social norms. When
such a message becomes the subject of a report intended to be shared
with other entities, the report generator may wish to redact or elide
the sensitive portions of the message. This memo suggests one method
for doing so effectively.
[NOTE TO EDITOR: Murray Kucherawy is listed as an author only to
enable him to complete the publication process on behalf of J.D.
Falk. Please remove Murray from the author list prior to
publication.]
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 22, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
Falk & Kucherawy Expires May 22, 2012 [Page 1]
Internet-Draft Redaction November 2011
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Recommended Practice . . . . . . . . . . . . . . . . . . . . . 3
3. Security Considerations . . . . . . . . . . . . . . . . . . . . 4
3.1. General . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2. Digest Collisions . . . . . . . . . . . . . . . . . . . . . 4
3.3. Information Not Redacted . . . . . . . . . . . . . . . . . 4
4. Privacy Considerations . . . . . . . . . . . . . . . . . . . . 4
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 5
6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 5
6.1. Normative References . . . . . . . . . . . . . . . . . . . 5
6.2. Informative References . . . . . . . . . . . . . . . . . . 5
Appendix A. Example . . . . . . . . . . . . . . . . . . . . . . . 5
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . . 6
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 6
1. Introduction
[ARF] defines a message format for sending reports of abuse in the
messaging infrastructure, with an eye toward automating both the
generating and consumption of those reports.
For privacy reasons, it might be the policy of a report generator to
redact, or obscure, portions of the report that might identify an end
user who caused the report to be generated. Precisely how this is
done is unspecified in [ARF], as it will generally be a matter of
local policy. That specification does admonish generators against
being overzealous with this practice, as obscuring too much data
makes the report non-actionable.
Previous redaction practices, such as replacing local-parts of
addresses with a uniform string like "xxxxxxxx", often frustrate any
kind of prioritizing or grouping of reports.
Generally, it is assumed that the recipient-identifying fields of a
message, when copied into a report, are to be obscured to protect the
identity of the end user who submitted the complaint about the
message. However, it is also presumed that other data will be left
intact, and that data could theoretically be correlated against log
files or other resources to determine the intended recipient of the
message.
2. Recommended Practice
To enable correlation of reports that might refer to a common but
anonymous source, the following redaction practice is recommended
(but not required):
1. Select an arbitrary string that will be used by an Administrative
Domain (ADMD) that generates reports. This string will not be
changed except according to a key rotation policy or similar.
Call this the "redaction key".
2. Identify string(s) (such as local-parts of email addresses) in a
message that need to be redacted. Call this the "private data".
3. Construct a new string that is a copy of the redaction key with
the private data concatenated to it.
4. Compute a digest of that string with any hashing/digest algorithm
such as one defined in [FIPS-180-3-2008].
5. Encode that hash with the base64 algorithm as defined in [MIME].
6. Replace the private data with the encoded hash when generating
the report.
This has the effect of obscuring the data in an irreversible way but
still allows the report recipient to observe that numerous reports
are about one particular end user. Such detection enables the
receiver to prioritize its reactions based on problems that appear to
be focused on specific end users that may be under attack.
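The six steps above can be sketched as follows. This is a minimal
illustration rather than a normative implementation: the function
name redact, the use of UTF-8 to turn strings into octets, and the
default of SHA-1 (chosen only because Appendix A uses it) are
assumptions that this memo does not specify.

```python
import base64
import hashlib


def redact(private_data: str, redaction_key: str, algorithm: str = "sha1") -> str:
    """Return the base64-encoded digest of redaction_key + private_data.

    Hypothetical helper: the name, the UTF-8 encoding, and the default
    algorithm are illustrative assumptions, not part of the memo.
    """
    # Step 3: copy of the redaction key with the private data appended.
    combined = (redaction_key + private_data).encode("utf-8")
    # Step 4: digest the combined string.
    digest = hashlib.new(algorithm, combined).digest()
    # Step 5: base64-encode the raw digest octets.
    return base64.b64encode(digest).decode("ascii")


# Step 6 is left to the caller: substitute the token for the private data.
token_a = redact("bob", "potatoes")
token_b = redact("bob", "potatoes")
token_c = redact("carol", "potatoes")
print(token_a == token_b)  # same key and data yield the same token
print(token_a == token_c)  # different data yields a different token, barring collisions
```

Because the token is stable for a given redaction key, a report
recipient can group reports by token without learning the private
data. Tokens produced before and after a key rotation would not be
expected to match.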
3. Security Considerations
3.1. General
General security issues with respect to these reports are found in
[ARF].
3.2. Digest Collisions
Message digest collisions are a well-understood issue. Their
application here involves a report receiver improperly concluding
that two pieces of redacted information were originally the same when
in fact they are not. This can lead to a denial of service, where
the inadvertently improper application of complaint data causes
unjustified corrective action. Such cases are sufficiently unlikely
as to be of little concern.
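As a rough illustration of why such collisions are unlikely, the
standard birthday approximation can be evaluated for a given digest
size. The sketch below assumes that digest outputs behave like
uniformly random values and uses hypothetical figures (one billion
distinct redacted strings, a 160-bit digest); neither figure comes
from this memo.

```python
import math


def collision_probability(distinct_inputs: int, digest_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    n = distinct_inputs
    exponent = n * (n - 1) / 2 ** (digest_bits + 1)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


# Hypothetical workload: 10**9 distinct private strings, 160-bit digest.
print(collision_probability(10**9, 160))
```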
3.3. Information Not Redacted
Although the identity of a report generator can be redacted using
this mechanism, other properties of a message (such as the Message-ID
field) that are not redacted could be used to recover the original
data. It is incumbent on the report generator to anticipate and
redact or otherwise obscure such data, or accept that such recovery
is possible.
Section 8 of [ARF] covers topics related to establishment of
bilateral agreements between report producers and consumers. The
issues raised here are also things to be considered when establishing
such agreements.
4. Privacy Considerations
While the method of redaction described in this document may reduce
the likelihood of some types of private data leaking between
Administrative Domains, it is extremely unlikely that report
generation software could ever be created to recognize all of the
different ways that private information may be expressed through
human written language. If further protections are required,
implementers may wish to consider establishing some sort of out-of-
band arrangements between the relevant entities to contain private
data as much as possible.
5. IANA Considerations
This memo includes no request to IANA.
[RFC Editor note: This section may be removed prior to publication.]
6. References
6.1. Normative References
[ARF] Shafranovich, Y., Levine, J., and M. Kucherawy, "An
Extensible Format for Email Feedback Reports", RFC 5965,
August 2010.
6.2. Informative References
[FIPS-180-3-2008]
U.S. Department of Commerce, "Secure Hash Standard", FIPS
PUB 180-3, October 2008.
[MIME] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, November 1996.
Appendix A. Example
Assume the following input message:
From: alice@example.com
To: bob@example.net
Subject: Make money fast!
Message-ID: <123456789@mailer.example.com>
Date: Thu, 17 Nov 2011 22:19:40 -0500
Want to make a lot of money really fast? Check it out!
http://www.example.com/scam/0xd0d0cafe
On receipt, bob@example.net reports this message as abusive through
whatever mechanism his email service provider has established. This
causes an [ARF] message to be generated. However, example.net wishes
to obscure Bob's email address lest it be relayed to the offending
agent, which could lead to more trouble for Bob.
Thus, example.net plans to redact the local-part of the recipient
address in the To: field. It has selected a redaction key of
"potatoes", and the private data in this case is the string "bob".
The concatenation, "potatoesbob", is digested with SHA-1 and then
base64-encoded to the string "rZ8cqXWGiKHzhz1MsFRGTysHia4=".
Thus, when constructing the ARF message in response to Bob's
complaint, the following form of the received message is used in the
third part of the ARF report:
From: alice@example.com
To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
Subject: Make money fast!
Message-ID: <123456789@mailer.example.com>
Date: Thu, 17 Nov 2011 22:19:40 -0500
Want to make a lot of money really fast? Check it out!
http://www.example.com/scam/0xd0d0cafe
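The transformation of the To: field in this example can be sketched
as follows. The helper name redact_local_part is hypothetical, UTF-8
is assumed for converting the string to octets, and splitting on the
last "@" is a simplification that ignores display names and quoted
local-parts. No expected output is asserted; running the sketch
allows its result to be compared with the redacted field above.

```python
import base64
import hashlib


def redact_local_part(address: str, redaction_key: str) -> str:
    """Replace an address's local-part with a keyed, base64-encoded SHA-1 digest.

    Hypothetical helper for illustration only.
    """
    local_part, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no '@': " + address)
    # Redaction key first, private data appended, as in Section 2.
    combined = (redaction_key + local_part).encode("utf-8")
    digest = hashlib.sha1(combined).digest()
    # The domain is left intact; only the local-part is replaced.
    return base64.b64encode(digest).decode("ascii") + "@" + domain


print("To: " + redact_local_part("bob@example.net", "potatoes"))
```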
Note, however, that agents at example.com may be able to recover the
redacted information by searching their logs for the original
envelope associated with the message, correlating on the Message-ID
contents, which were not redacted here. It is expected that feedback
loops generating such reports involve senders that have been vetted
against such information leakage.
Appendix B. Acknowledgements
Much of the text in this document was initially moved from other MARF
working group documents, crafted by Murray Kucherawy with
contributions from Monica Chew, Tim Draegen, Michael Adkins, and
myself. Additional feedback was provided by S. Moonesamy, Alessandro
Vesely, and Mykyta Yevstifeyev.
Authors' Addresses
J.D. Falk (editor)
Return Path
100 Mathilda Place, Suite 100
Sunnyvale, CA 94086
US
Email: ietf@cybernothing.org
URI: http://www.returnpath.net/
M. Kucherawy (editor)
Cloudmark
128 King St., 2nd Floor
San Francisco, CA 94107
US
Email: msk@cloudmark.com