IMAPEXT T. Sirainen Internet-Draft April 16, 2009 Intended status: Standards Track Expires: October 18, 2009 IMAP4 Extension for Fuzzy Search draft-ietf-morg-fuzzy-search-00 Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on October 18, 2009. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract This document describes an IMAP protocol extension enabling server to perform searches with inexact string matching and assigning relevancy scores for matched messages. Sirainen Expires October 18, 2009 [Page 1]
Internet-Draft IMAP4 FUZZY SEARCH April 2009 Note A revised version of this draft document will be submitted to the RFC editor as a Proposed Standard for the Internet Community. Discussion and suggestions for improvement are requested, and should be sent to morg@ietf.org. 1. Conventions used in this document In examples, "C:" indicates lines sent by a client that is connected to a server. "S:" indicates lines sent by the server to the client. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [Kwds]. 2. Introduction When humans perform searches in IMAP clients, they typically want to see the most relevant search results first. IMAP servers are able to do this in the most efficient way when they're free to internally decide how searches should match messages. This document describes a new SEARCH=FUZZY extension that provides such functionality. 3. The FUZZY Search Key FUZZY search key takes another search key as its argument. For this search key server is allowed to perform all string matching in an implementation-defined manner instead of performing a strict substring search as requred by [RFC3501]. It MUST NOT modify the behavior of search keys that don't compare strings. For example: C: A01 SEARCH FUZZY (SUBJECT "Helo mom" SMALLER 5000 SEEN) S: * SEARCH 1 5 10 S: A01 OK Search completed. Besides matching messages with subject "Helo mom", the above search may also match messages with subjects "Hello mom", "Hello mother", or anything else the server decides that might be a good match. However, the result MUST NOT contain messages that don't have a \Seen flag set or whose size is 5000 bytes or more. Sirainen Expires October 18, 2009 [Page 2]
Internet-Draft IMAP4 FUZZY SEARCH April 2009 4. Relevancy Scores for Search Results The server SHOULD assign a search relevancy score for each matched message when the FUZZY search key is given. This allows the client to show the most relevant search results first. Relevancy scores are meaningful only when compared to other relevancy scores from the same server. This means that this specification doesn't define a "best relevancy" score. Instead the values are represented by an 32 bit integer and implementations can decide how high values they want to use. For example some implementations might decide to use only values 1-5 while others might decide to use the full 32 bit range. If server advertises the ESEARCH capability as defined by [ESEARCH], the relevancy scores can be retrieved using the new RELEVANCY return option for SEARCH: C: A02 SEARCH RETURN (RELEVANCY ALL) FUZZY TEXT "Helo" S: * ESEARCH (TAG "A02") ALL 1,5,10 RELEVANCY (4 107 42) S: A02 OK Search completed. RELEVANCY return option SHOULD NOT be used unless FUZZY search key is also given, otherwise servers MAY decide to give the same relevancy score to all messages. 5. Extensions to SORT If server advertises the SORT capability as defined by [SORT], the results can be sorted by the new RELEVANCY sort criteria: C: A03 SORT (RELEVANCY) UTF-8 FUZZY SUBJECT "Helo" S: * SORT 5 10 1 S: A03 OK Sort completed. The message with the highest score is returned first. As with RELEVANCY return option, RELEVANCY sort criteria SHOULD NOT be used unless FUZZY search key is also given. [[anchor6: TODO: Describe also RELEVANCY with ESORT.]] 6. Formal Syntax The following syntax specification uses the augmented Backus-Naur Form (BNF) as described in [ABNF]. It includes definitions from [RFC3501], [IMAP-ABNF] and [SORT]. Sirainen Expires October 18, 2009 [Page 3]
Internet-Draft IMAP4 FUZZY SEARCH April 2009 capability =/ "SEARCH=FUZZY" score = number ;; <number> from [RFC3501] score-list = "(" [score *(SP score)] ")" search-key =/ "FUZZY" SP search-key search-return-data =/ "RELEVANCY" SP score-list ;; Conforms to <search-return-data>, from [IMAP-ABNF] search-return-opt =/ "RELEVANCY" ;; Conforms to <search-return-opt>, from [IMAP-ABNF] sort-key =/ "RELEVANCY" 7. Security Considerations This document is believed not to have any security implications. 8. IANA Considerations IMAP4 capabilities are registered by publishing a standards track or IESG approved experimental RFC. The registry is currently located at: http://www.iana.org/assignments/imap4-capabilities This document defines the X-DRAFT-I00-SEARCH=FUZZY [[anchor7: Note to RFC Editor: fix before publication]] IMAP capability. IANA is requested to add it to the registry. 9. Acknowledgements Alexey Melnikov has helped with this document. 10. Normative References [ABNF] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 5234, January 2008. Sirainen Expires October 18, 2009 [Page 4]
Internet-Draft IMAP4 FUZZY SEARCH April 2009 [ESEARCH] Melnikov, A. and D. Cridland, "IMAP4 Extension to SEARCH Command for Controlling What Kind of Information Is Returned", RFC 4731, November 2006. [IMAP-ABNF] Melnikov, A. and C. Daboo, "Collected Extensions to IMAP4 ABNF", RFC 4466, April 2006. [Kwds] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997. [RFC3501] Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1", RFC 3501, March 2003. [SORT] Crispin, M. and K. Murchison, "Internet Message Access Protocol - SORT and THREAD Extensions", RFC 5256, June 2008. Author's Address Timo Sirainen Email: tss@iki.fi Sirainen Expires October 18, 2009 [Page 5]