Personal Information Tagging for Logs (PITFoL)
draft-rao-pitfol-00

Network Working Group                                             S. Rao
Internet-Draft                                                      Grab
Intended status: Experimental                                   S. Sahib
Expires: May 7, 2020                                            R. Guest
                                                              Salesforce
                                                        November 4, 2019

             Personal Information Tagging for Logs (PITFoL)
                          draft-rao-pitfol-00

Abstract

   Software applications typically generate large amounts of log data
   in the course of their operation to help with monitoring,
   troubleshooting, and similar tasks.  However, like all data
   generated and operated upon by software systems, logs can contain
   information that is sensitive to users.  Identifying and anonymizing
   personal data in logs is thus crucial to ensure that no personal
   data is inadvertently logged and retained, which could cause the
   logging application to run afoul of laws on storing private
   information.  This document explores mechanisms for marking personal
   or sensitive data in logs, so that any server collecting,
   processing, or analyzing logs can identify personal data and then,
   potentially, enforce redaction.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 7, 2020.

Rao, et al.                Expires May 7, 2020                  [Page 1]
Internet-Draft                   PITFoL                    November 2019

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Motivation and Use Cases  . . . . . . . . . . . . . . . . . .   3
   4.  Techniques  . . . . . . . . . . . . . . . . . . . . . . . . .   4
     4.1.  Field Level Tagging . . . . . . . . . . . . . . . . . . .   4
     4.2.  Log Level Tagging . . . . . . . . . . . . . . . . . . . .   4
   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   5
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .   5
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   5
   8.  Normative References  . . . . . . . . . . . . . . . . . . . .   5
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   6

1.  Introduction

   Personal data identification and redaction are crucial to ensure
   that a logging application is not storing, and potentially leaking,
   users' private information.  Techniques already exist to discover
   and extract sensitive data.  For example, regular expressions or
   lookup rules can be defined to match a person's name, credit card
   number, email address, and so on.  In addition, models trained on
   data dictionaries and datasets can predict the presence of
   sensitive data.  However, what data is considered personal and
   sensitive is often subjective, provisional, and contextual to the
   data source or to the application processing the data, which makes
   it hard to use automated techniques to identify personal data.  The
   challenges are summarized as follows:

   - What constitutes personal data is often subjective and
   use-case specific.
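
   As a rough illustration of the pattern-based detection described in
   the Introduction, and of why it falls short on its own, the sketch
   below matches email addresses and card-like numbers in a log line.
   It is not part of this specification: the patterns, the PATTERNS
   table, and the function names find_personal_data and redact are
   hypothetical and chosen only for this example.

```python
import re

# Hypothetical patterns for illustration only. Real detectors need far
# more care (international formats, checksum validation, name lists).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
}


def find_personal_data(line):
    """Return (label, matched_text) pairs for every pattern hit in line."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(line):
            hits.append((label, match.group(0)))
    return hits


def redact(line):
    """Replace every pattern hit in line with a <label> placeholder."""
    for label, pattern in PATTERNS.items():
        line = pattern.sub("<" + label + ">", line)
    return line


if __name__ == "__main__":
    for sample in (
        "user=alice@example.com paid with 4111 1111 1111 1111",
        "order_id=1234567812345678 shipped",  # not a card, still matched
        "login ok for jsmith42",              # a username, not matched
    ):
        print(find_personal_data(sample), "|", redact(sample))
```

   Under these assumed patterns, the 16-digit order identifier is
   flagged as a card number while the username goes undetected.  Both
   outcomes depend on context that a pattern alone cannot see, which is
   the difficulty the challenges in this section describe.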
