MARF Working Group                                          J. Falk, Ed.
Internet-Draft                                               Return Path
Intended status: Standards Track                       M. Kucherawy, Ed.
Expires: July 21, 2012                                         Cloudmark
                                                        January 18, 2012


    Redaction of Potentially Sensitive Data from Mail Abuse Reports
                      draft-ietf-marf-redaction-05

Abstract

   Email messages often contain information that might be considered
   private or sensitive, per either regulation or social norms.  When
   such a message becomes the subject of a report intended to be shared
   with other entities, the report generator may wish to redact or elide
   the sensitive portions of the message.  This memo suggests one method
   for doing so effectively.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 21, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of



Falk & Kucherawy          Expires July 21, 2012                 [Page 1]


Internet-Draft                  Redaction                   January 2012


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Recommended Practice  . . . . . . . . . . . . . . . . . . . . . 3
   3.  Security Considerations . . . . . . . . . . . . . . . . . . . . 4
     3.1.  General . . . . . . . . . . . . . . . . . . . . . . . . . . 4
     3.2.  Digest Collisions . . . . . . . . . . . . . . . . . . . . . 4
     3.3.  Information Not Redacted  . . . . . . . . . . . . . . . . . 4
     3.4.  Key Management  . . . . . . . . . . . . . . . . . . . . . . 5
     3.5.  Algorithm Vulnerabilities . . . . . . . . . . . . . . . . . 5
   4.  Privacy Considerations  . . . . . . . . . . . . . . . . . . . . 5
   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
   6.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 6
     6.1.  Normative References  . . . . . . . . . . . . . . . . . . . 6
     6.2.  Informative References  . . . . . . . . . . . . . . . . . . 6
   Appendix A.  Example  . . . . . . . . . . . . . . . . . . . . . . . 6
   Appendix B.  Acknowledgements . . . . . . . . . . . . . . . . . . . 7
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . . 7





























Falk & Kucherawy          Expires July 21, 2012                 [Page 2]


Internet-Draft                  Redaction                   January 2012


1.  Introduction

   [ARF] defines a message format for sending reports of abuse in the
   messaging infrastructure, with an eye toward automating both the
   generating and consumption of those reports.

   For privacy considerations it might be the policy of a report
   generator to anonymize, or obscure, portions of the report that might
   identify an end user who caused the report to be generated.  This has
   come to be known in feedback loop parlance as "redaction".  Precisely
   how this is done is unspecified in [ARF] as it will generally be a
   matter of local policy.  That specification does admonish generators
   against being too over-zealous with this practice, as obscuring too
   much data makes the report non-actionable.

   Previous redaction practices, such as replacing local-parts of
   addresses with a uniform string like "xxxxxxxx", often frustrates any
   kind of prioritizing or grouping of reports.

   Generally, it is assumed that the recipient-identifying fields of a
   message, when copied into a report, are to be obscured to protect the
   identity of the end user who submitted the complaint about the
   message.  However, it is also presumed that other data will be left
   intact, and that data could be correlated against log files or other
   resources to determine the intended recipient of the message.


2.  Recommended Practice

   When redacting of reports is desired, in order to enable a report
   receiver to correlate reports that might refer to a common but
   anonymous source, the report generator SHOULD use the following
   practice:
   1.  Select an arbitrary string that will be used by an Administrative
       Management Domain (ADMD) that generates reports.  This string
       will not be changed except according to a key rotation policy or
       similar.  Call this the "redaction key".  The redaction key
       SHOULD be based on at least 64 bits of pseudo-random input.  (See
       Section 3.3 and Section 3.4 for additional discussion.)
   2.  Identify string(s) (such as local-parts of email addresses) in a
       message that need to be redacted.  Call these strings the
       "private data".
   3.  For each piece of private data, construct a new string that is a
       copy of the redaction key with the private data concatenated to
       it.
   4.  Compute a digest of each composite string with any hashing/digest
       algorithm; a secure hash such as one defined in [FIPS-180-3-2008]
       or a secure message digest algorithm based on a secure hash is



Falk & Kucherawy          Expires July 21, 2012                 [Page 3]


Internet-Draft                  Redaction                   January 2012


       suggested.  (See Section 3.3 for discussion.)
   5.  Encode each digest with the base64 algorithm as defined in
       [BASE64].
   6.  Replace each instance of private data with the corresponding
       base64-encoded hash when generating the report.

   This has the effect of obscuring the data in an irreversible way
   while still allowing the report recipient to observe that numerous
   reports are about one particular end user.  Such detection enables
   the receiver to prioritize its reactions based on problems that
   appear to be focused on specific end users that may be under attack.


3.  Security Considerations

3.1.  General

   General security issues with respect to these reports are found in
   [ARF].

3.2.  Digest Collisions

   Message digest collisions are a well-understood issue.  Their
   application here involves a report receiver improperly concluding
   that two pieces of redacted information were originally the same when
   in fact they are not.  This can lead to a denial of service, where
   the inadvertently improper application of complaint data causes
   unjustified corrective action.  Such cases are sufficiently unlikely
   as to be of little concern.

3.3.  Information Not Redacted

   Although the identity of a report generator can be redacted using
   this mechanism, other properties of a message (such as the Message-ID
   field) that are not redacted could be used to recover the original
   data by locating them in the message logs of the originating system
   or other data correlation techniques.  It is incumbent on the report
   generator to anticipate and redact or otherwise obscure such data, or
   accept that such recovery is possible even from the very simplest
   kinds of feedback.

   It is for this reason that the normative portions of this memo do not
   include stronger assertions about minimum lengths for the redaction
   key or the selection of particularly strong hashes.  Given the
   ultimate recoverability of the redacted information, the
   cryptographic strength of the hash and particularly long or
   unguessable keys are not critical security measures.




Falk & Kucherawy          Expires July 21, 2012                 [Page 4]


Internet-Draft                  Redaction                   January 2012


   The process of redacting a feedback report satisfies a privacy
   requirement established by local policy, and is not meant to provide
   strong security properties.

   [FBL-BCP] and Section 8 of [ARF] discuss topics related to
   establishment of bilateral agreements between report producers and
   consumers.  The issues raised here are also things to be considered
   when establishing such agreements.

3.4.  Key Management

   As with any application that uses secret keys, care must be taken to
   guard the redaction key against compromise.  If the key is no longer
   a secret, recovering the redacted information becomes a simple brute
   force attack.

   Also, periodically changing the key is a means of limiting the
   quantity of redacted information that would be exposed by the
   compromise of a single redaction key, and hence is advised.  However,
   a consideration when developing a key rotation policy is that
   correlation of the redacted form of the obscured information cannot
   occur across use of different redaction keys.

3.5.  Algorithm Vulnerabilities

   The simple key-message hash method described in Section 2 is
   vulnerable to some well known attacks that can be used to recover the
   redaction key.  This is especially important to consider if obtaining
   the redaction key also somehow creates an exposure for the report
   generator in other ways.  Although stronger mechanisms like [HMAC]
   close these loopholes, it is believed that such extra hardening is
   unnecessary given the discussion above.

   Future work that seeks to obscure private data in some way should not
   presume that this mechanism is sufficient.  It solves a simple policy
   requirement for this specific use case, and is not a reliable
   security mechanism for general use.


4.  Privacy Considerations

   While the method of redaction described in this document may reduce
   the likelihood of some types of private data from leaking between
   ADMDs, it is extremely unlikely that report generation software could
   ever be created to recognize all of the different ways that private
   information could be expressed through human written language.  If
   further protections are required, implementers may wish to consider
   establishing some sort of out-of-band arrangements between the



Falk & Kucherawy          Expires July 21, 2012                 [Page 5]


Internet-Draft                  Redaction                   January 2012


   relevant entities to contain private data as much as possible.


5.  IANA Considerations

   This memo includes no request to IANA.

   [RFC Editor note: This section may be removed prior to publication.]


6.  References

6.1.  Normative References

   [ARF]      Shafranovich, Y., Levine, J., and M. Kucherawy, "An
              Extensible Format for Email Feedback Reports", RFC 5965,
              August 2010.

   [BASE64]   Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, October 2006.

6.2.  Informative References

   [FBL-BCP]  Falk, J., "Complaint Feedback Loop Operational
              Recommendations", RFC 6449, November 2011.

   [FIPS-180-3-2008]
              U.S. Department of Commerce, "Secure Hash Standard", FIPS
              PUB 180-3, October 2008.

   [HMAC]     Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              February 1997.


Appendix A.  Example

   Assume the following input message:

     From: alice@example.com
     To: bob@example.net
     Subject: Make money fast!
     Message-ID: <123456789@mailer.example.com>
     Date: Thu, 17 Nov 2011 22:19:40 -0500

     Want to make a lot of money really fast?  Check it out!
     http://www.example.com/scam/0xd0d0cafe




Falk & Kucherawy          Expires July 21, 2012                 [Page 6]


Internet-Draft                  Redaction                   January 2012


   On receipt, bob@example.net reports this message as abusive through
   whatever mechanism his mailbox provider has established.  This causes
   an [ARF] message to be generated.  However, example.net wishes to
   obscure Bob's email address lest it be relayed to the offending
   agent, which could lead to more trouble for Bob.

   Thus, example.net plans to redact the local-part of the recipient
   address in the To: field.  It has selected a redaction key of
   "potatoes", and the private data in this case is the string "bob".
   The concatenation of "potatoesbob" is digested with SHA1 and then
   base64-encoded to the string "rZ8cqXWGiKHzhz1MsFRGTysHia4=".

   Thus, when constructing the ARF message in response to Bob's
   complaint, the following form of the received message is used in the
   third part of the ARF report:

     From: alice@example.com
     To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
     Subject: Make money fast!
     Message-ID: <123456789@mailer.example.com>
     Date: Thu, 17 Nov 2011 22:19:40 -0500

     Want to make a lot of money really fast?  Check it out!
     http://www.example.com/scam/0xd0d0cafe

   Note, however, that it is possible the redacted information can be
   recovered by agents at example.com by searching their logs for the
   original envelope associated with the message by correlating with the
   Message-ID contents, which were not redacted here.  It is expected
   that feedback loops generating such reports involve senders that have
   been vetted against such information leakage.


Appendix B.  Acknowledgements

   Much of the text in this document was initially moved from other MARF
   working group documents, crafted by Murray S. Kucherawy with
   contributions from Monica Chew, Tim Draegen, Michael Adkins, and
   myself.  Additional feedback was provided by John Levine, S.
   Moonesamy, Alessandro Vesely, and Mykyta Yevstifeyev.











Falk & Kucherawy          Expires July 21, 2012                 [Page 7]


Internet-Draft                  Redaction                   January 2012


Authors' Addresses

   J.D. Falk (editor)
   Return Path
   100 Mathilda Place, Suite 100
   Sunnyvale, CA  94086
   US

   Email: ietf@cybernothing.org
   URI:   http://www.returnpath.net/


   M. Kucherawy (editor)
   Cloudmark
   128 King St., 2nd Floor
   San Francisco, CA  94107
   US

   Email: msk@cloudmark.com
































Falk & Kucherawy          Expires July 21, 2012                 [Page 8]