Sieve Email Filtering: Detecting Duplicate Deliveries
RFC 7352

Note: This ballot was opened for revision 06 and is now closed.

Barry Leiba Yes

(Jari Arkko) No Objection

(Alia Atlas) No Objection

(Benoît Claise) No Objection

Comment (2014-06-26)
Background info:
During a PC migration that included an email migration, I found myself in a situation where:
- I had some duplicate emails in Thunderbird.
- Some of my emails had already been manually classified into folders, so it was hard to find the duplicates without redoing the manual classification.

This Thunderbird add-on, http://removedupes.mozdev.org/, was very helpful to me.

From a discussion with Stephan Bosch, I understand that the Thunderbird add-on removes duplicates from the user's mailbox after delivery, whereas this Sieve extension detects duplicates while they are being delivered. Detection uses a persistent duplicate tracking list holding the unique ID values (typically the Message-ID) of previous deliveries; it does not search the destination folder(s) for messages with a matching ID.

The issue with the Sieve extension is that you have to enable this functionality in advance, at the point when you know that you're going to do something dangerous. Maybe that works, but it reminds me of access-list management: "No, I will not do anything stupid! ... oops, I can't access my device any longer!". Or maybe you expect this Sieve extension to be always on? I don't think that is mentioned.

Along the same line (and with my use case in mind):

  Implementations SHOULD let entries in the tracking list expire after
   a short period of time. 

I was thinking: "short" = seconds or minutes, so not applicable to my email/PC migration use case.
Later I saw:

   A default expiration time of around 7 days is usually
   appropriate. 

Good.
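
As a rough illustration of the behaviour discussed in this comment (an ID lookup in a tracking list rather than a folder search, with entries that expire), here is a minimal Python sketch. It is not Sieve and not taken from the draft: the class `DuplicateTracker`, its method names, and the in-memory dictionary are all hypothetical, and a real implementation would keep the list in persistent storage. The 7-day default is the figure quoted above.

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # default expiry suggested in the quoted draft text


class DuplicateTracker:
    """Hypothetical in-memory stand-in for the persistent tracking list."""

    def __init__(self, expiry_seconds=SEVEN_DAYS, clock=time.time):
        self.expiry_seconds = expiry_seconds
        self.clock = clock        # injectable clock, so expiry can be exercised
        self._expires_at = {}     # unique ID -> timestamp at which it expires

    def _purge(self):
        """Drop every entry whose expiry time has passed."""
        now = self.clock()
        self._expires_at = {
            uid: t for uid, t in self._expires_at.items() if t > now
        }

    def is_duplicate(self, unique_id):
        """Return True if unique_id is tracked and has not yet expired."""
        self._purge()
        return unique_id in self._expires_at

    def record(self, unique_id):
        """Track unique_id; an existing entry keeps its original expiry
        (an assumption of this sketch)."""
        self._purge()
        self._expires_at.setdefault(
            unique_id, self.clock() + self.expiry_seconds
        )
```

With a tracker built this way, a first delivery of a given Message-ID would report no duplicate, and a second delivery within the expiry window would. No mailbox or folder is ever inspected, which is why messages already filed before the feature was enabled are not covered.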

I like your four examples, but the one I was most interested in was: can we send the duplicates into the same folder?
That would allow me to troubleshoot the root cause behind the duplicates (subscription to two mailing lists, sync issue, redirection, etc.).

Bottom line: this draft could be improved by discussing the use cases you have in mind. This is certainly not a blocking factor, though.

Alissa Cooper (was Discuss) No Objection

Comment (2014-06-26 for -08)
Thanks for addressing my DISCUSS points. 

As for protecting scripts/address books in transit, per Ned's email, I would be interested to know if there is anyone interested in taking that work up. Or if not, if we could at least note it somewhere as an outstanding vulnerability -- maybe at https://trac.tools.ietf.org/group/ppm-legacy-review/ (which I can't load right now because the tools site seems to be down, so not sure if that is a good place)?

(Spencer Dawkins) No Objection

(Adrian Farrel) No Objection

Comment (2014-06-21 for -07)
A small point that is not at the level of a Discuss, but...
Are there not some implications for the integrity of the message ID on a message that should be stated? Clearly, if a message ID can be tampered with, the message can be made to appear to be a duplicate, causing the Sieve script to throw it out.

(Stephen Farrell) No Objection

Comment (2014-06-26)
I wondered what'd happen if you used a DKIM-Signature
header with this, but I guess it should just work.
However, I don't recall if that header value is ok to
compare case-sensitively (e.g. "d=" might not be?). I
don't think any of your examples show how to do the
tolower thing with a header field value (or is that the
default for "set"?), so I guess you could add one that
does, but up to you, since I assume the intended
readership knows this stuff.
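
To make the case-sensitivity concern concrete, the following Python sketch (not Sieve) shows how two header values that differ only in letter case are treated as different IDs unless they are lowercased before tracking. The helper names `normalize_id` and `is_duplicate` are hypothetical, and whether lowercasing an entire DKIM-Signature value is appropriate is exactly the open question raised here; the sketch does not settle it.

```python
def normalize_id(header_value):
    """Hypothetical helper: trim whitespace and lowercase the tracked value."""
    return header_value.strip().lower()


def is_duplicate(seen, header_value, normalize=False):
    """Check header_value against the set `seen`, then record it.

    With normalize=False the comparison is exact (case-sensitive);
    with normalize=True the value is lowercased first.
    """
    key = normalize_id(header_value) if normalize else header_value
    already_seen = key in seen
    seen.add(key)
    return already_seen
```

For instance, the illustrative values `"v=1; d=Example.ORG"` and `"v=1; d=example.org"` would count as two distinct IDs under exact comparison, but as one ID when `normalize=True`.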

Nothing to do with this draft in the end, but I think the 
security/privacy discussion ended up raising a couple of
interesting issues that might be worth revisiting if/when
someone has energy: those were a) if we could make some
good privacy-friendly (but also admin-friendly)
recommendations about logging mail and b) if we could
consider the privacy implications of sieve scripts or
other filters (I liked the "stupid boss" folder name one,
and am guilty of that for some of my own mail:-) and what 
those might expose. For (a) I could imagine a useful
informational RFC, not sure for (b).

(Brian Haberman) No Objection

Comment (2014-06-24 for -07)
I will watch, with interest, the discussion of Alissa's DISCUSS points.

(Joel Jaeggli) No Objection

(Ted Lemon) No Objection

(Kathleen Moriarty) (was Discuss) No Objection

Comment (2014-06-26)
Thanks for clarifying my questions in the updated text!

(Pete Resnick) No Objection

Comment (2014-06-25 for -07)
I have no objection to this extension; I think it will be helpful for some folks. Interestingly, I can't see ever using it myself: When I get duplicates from a mailing list, I *always* want the one that was sent from the list, with the List-* header fields on it, and *not* the one sent directly to me. Unfortunately, the one from the list is almost always going to arrive after the one sent directly to me, and the extension doesn't give me enough state to find the message(s) that came in earlier so I can decide which one to keep. But like I said, I can see others using this.

As to specific comments:

   As a side-effect, the "duplicate" test adds the message ID to an
   internal duplicate tracking list once the Sieve execution finishes
   successfully.

I think this may have been mentioned in response to someone else's comment, but perhaps this should be:

   As a side-effect, the "duplicate" test adds a unique identifier
   (again, by default the contents of the Message-ID header field) to an
   internal duplicate tracking list once the Sieve execution finishes
   successfully.

And then change other occurrences of "message ID" to "unique identifier" elsewhere in the document as appropriate.
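
The side effect in the quoted text, where the unique identifier is added to the tracking list only once execution finishes successfully, can be modelled roughly as follows. This is a Python sketch rather than Sieve, and the class `DeferredTracker` with its `duplicate` and `finish` methods is a hypothetical construction for illustration, not the draft's or any implementation's API.

```python
class DeferredTracker:
    """Hypothetical model: IDs tested during a run are stored only on success."""

    def __init__(self):
        self._tracked = set()   # identifiers from earlier successful runs
        self._pending = set()   # identifiers tested during the current run

    def duplicate(self, unique_id):
        """Model of the test: report an earlier sighting and stage the ID."""
        self._pending.add(unique_id)
        return unique_id in self._tracked

    def finish(self, success):
        """Call when script execution ends; commit staged IDs only on success."""
        if success:
            self._tracked |= self._pending
        self._pending.clear()
```

Under this model, a delivery whose script run fails leaves no trace in the list, so a retry of the same message is not mistaken for a duplicate.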

(Martin Stiemerling) No Objection