Internet-Draft                                Thommy Eklof
Category: Informational                       Ericsson
Expires: April 14, 2000                       Leslie L. Daigle
                                              Thinking Cat Enterprises
                                              October 14, 1999


              Wide Area Directory Deployment Experiences
                  draft-eklof-dag-experiences-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 14, 2000.


Abstract

The TISDAG (Technical Infrastructure for Swedish Directory Access
Gateway) project provided valuable insight into the current
reality of deploying a wide-scale directory service.  This
document catalogues some of the experiences gained in developing
the necessary infrastructure for a national (i.e., multi-organizational)
directory service and pilot deployment of the service in an environment
with off-the-shelf directory service products.  A perspective on
the project's relationship to other directory deployment projects
is provided, along with some proposals for future extensions of
the work (larger scale deployment, other application areas).

These are our own observations, based on work done and general project
discussions. No doubt, other project participants have their own list
of project experiences; we don't claim this document is exhaustive!


1.0 Introduction

1.1 Overview of the TISDAG project

As described in more detail in [TISDAG], the original intention of the
TISDAG project was to provide the infrastructure for a national
whitepages directory service.  To be effective, such an infrastructure
needed to address the concrete realities of end-users' existing client
software, as well as the needs of information providers ("Whitepages
Directory Service Providers" -- WDSPs).  These realities include the
existence of multiple protocols (so-called directory service access
protocols, as well as more general Internet application protocols such
as HTTP and SMTP).  The project was also sensitive to the fact that
WDSPs have many good reasons for being reluctant to relinquish copies of
their subscribers' personal data.

1.2 Organization of this document

In an effort to communicate the experiences with this project, from
conception through implementation and pilot deployment, this
document is divided into 3 major sections.  The first section
reviews specific lessons learned by the authors through the TISDAG
project and implementation of one conformant system.  Next, some
perspectives are offered on the relationship of the TISDAG work to
other large-scale directory projects that are currently on-going, to
give a sense of how these efforts might possibly interact.  Finally,
some preliminary thoughts on applying the DAG system to other
applications and deployment environments are outlined.  More speculation
on useful development of architectural principles is provided in
a separate document ([DAG++]).

2.0 The TISDAG project itself

2.1  TISDAG overview

Briefly, the technical infrastructure proposed for the TISDAG
project (see [TISDAG] for the complete overview and technical
specification) provides end-user client software with connection
points to perform basic whitepages queries.  Different connection
points are provided for the various protocols end-users are likely
to wish to use to access the information -- WWW (HTTP), e-mail (SMTP),
Whois++, LDAPv2 and LDAPv3.  For each client, a transaction will
be carried out within the bounds of the protocol's syntax and
semantics.  However, since the TISDAG system does not maintain
a replicated copy of all whitepages information, but rather an
index over the data that allows redirection (referrals) to services
that are likely to contain responses that match the client's query, a
fair bit of background work must be done by the DAG system in order to
fulfill the client's query.

The first and most important step is for the system to make a query
against the DAG Referral Index -- a server containing index
information obtained via the Common Indexing Protocol (see [CIP1,
CIP2, CIP3]) in the Tagged Index Object format (see [TIO]).  This index
contains sufficient information to indicate which of the many
participating WDSPs should be contacted to complete the query.
Wherever possible, these referrals are passed back to the querying
client so that it can contact relevant WDSPs directly.  This
minimizes the amount of work done by the DAG system itself, and
allows WDSPs greater visibility (which is an incentive for participating
in the system).  Protocols which support referrals natively include
Whois++ and LDAPv3 -- although these may only be referred to
servers of the same protocol.

Since many protocols do not support referrals (e.g., LDAPv2), and
in order to address referrals to servers using a protocol other
than the calling client's own, a secondary step of "query chaining"
is provided to pursue these extra referrals within the DAG
system itself.  For example, if an LDAPv2 client connects to the
system, a query is made against the Referral Index to determine
which WDSPs may have answers for the query, and then resources within
the DAG system are used to pursue the query at the designated WDSPs'
servers.  The results from these different services are packaged
into a single response set for the client that made the query.
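
The choice between returning a referral and chaining it can be
sketched as follows.  This is an illustrative Python sketch only, not
part of the TISDAG specification:  the Referral type, the plan_query
function and the REFERRAL_CAPABLE set are names assumed here for the
example.

```python
from dataclasses import dataclass

# Assumption for illustration: client protocols that can follow
# referrals natively (Whois++ and LDAPv3, per the text above).
REFERRAL_CAPABLE = {"whois++", "ldapv3"}


@dataclass(frozen=True)
class Referral:
    wdsp: str       # hypothetical WDSP server name
    protocol: str   # protocol spoken by that server


def plan_query(client_protocol, referrals):
    """Split Referral Index hits into referrals that can be handed
    back to the client and referrals the DAG system must chain."""
    to_client, to_chain = [], []
    for ref in referrals:
        same_protocol = ref.protocol == client_protocol
        if client_protocol in REFERRAL_CAPABLE and same_protocol:
            to_client.append(ref)
        else:
            # No native referrals, or a cross-protocol referral:
            # pursue it inside the DAG system through a SAP.
            to_chain.append(ref)
    return to_client, to_chain
```

Under these assumptions an LDAPv2 client never receives referrals,
while an LDAPv3 client receives only those that point at LDAPv3
servers; everything else is chained.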

The architecture that was developed in order to support the required
functionality separated the system into distinct components
to handle incoming queries from client software ("Client Access
Points", or CAPs), a referral index (RI) to maintain an index over
the collected whitepages information and provide referrals, based
on actual data queries, to WDSPs that might have relevant information,
and finally components that mediate access to WDSP whitepages
servers to perform queries and retrieve results for the client's
query ("Service Access Points", or SAPs).  Several CAPs and SAPs
exist within the system -- at least one for every protocol supported
for incoming queries and WDSP servers, respectively.

Designed to be implementable as separate programs, these components
interact with each other through the use of an internal protocol --
the DAG/IP.  Pragmatically, the use of the protocol means that
different components can reside on different machines, for reasons
of load-balancing and performance enhancement.  It also acts
as a "common language" for the CAPs, SAPs and RI to express queries
and receive results.

This outlines the planned or ideal behaviour of the system; once
the design was complete, a pilot phase was started to compare
reality against expectations.  Two independent implementations of the
software were created, and a test deployment was set up within the
Swedish University Network (SUNET).  More detail on the project and its
current status can be found at http://tisdag.sunet.se/.

The rest of this section outlines some conclusions drawn from
making a reality of the proposed architecture -- both successes and
surprises.


2.2 Some successes

Implementation and pilot deployment of software meeting the TISDAG
technical specification did demonstrate some important successes
of the approach.

Most notably, the system works pretty much as expected (see exceptions
below) to provide transparent middleware for whitepages directory
services.  That is, client software and WDSP servers were minimally
affected -- from the point of view of behaviour and configuration,
the DAG system looked like a server to clients, and a client to servers.

The goal of the TISDAG project, operationally, was to be able to
provide responses to end-user queries in reasonable response times
(although not "an addressbook replacement").  The prototype systems
demonstrated some success in achieving responses within 10 seconds,
at least within the limited testbed configuration of 10 WDSPs
providing directory service information.  More observations on
system performance are provided below.

The DAG system does demonstrate that it is possible to build
referral-level services at a national level (although the deployment
has yet to prove conclusively that it can, in its current
formulation, operate as a transparent query-fulfillment proxy
service).

The success of the implementation demonstrated that it is possible,
in some sense, to do (semantic) protocol mapping with N+M complexity
instead of NxM mappings.  That is, protocol translations had to
be defined for "N" allowable end-user query access protocols to/from
the DAG/IP, and "M" supported WDSP server protocols, instead of
requiring each of the N input components to individually map to
the M output protocols.
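
The difference in the number of mappings can be made concrete with a
small calculation.  In this sketch N = 5 corresponds to the five access
protocols listed in section 2.1; the value M = 3 is purely an
assumption for illustration, as are the function names.

```python
def pairwise_mappings(n_access, m_server):
    """Mappings needed if every access protocol is translated
    directly to every WDSP server protocol (NxM)."""
    return n_access * m_server


def hub_mappings(n_access, m_server):
    """Mappings needed if every protocol is translated only to and
    from one internal protocol such as the DAG/IP (N+M)."""
    return n_access + m_server


n, m = 5, 3  # m is an assumed value, not taken from the project
print(pairwise_mappings(n, m), hub_mappings(n, m))
```

Adding one more access protocol costs one new mapping in the second
model, but M new mappings in the first.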

As a correlated issue, the prototype system demonstrated some
successes with mapping between schema representations in the
different protocol paradigms -- in large part because the system's
schemas were kept simple and focused on the minimal needs to support
the base service requirements.


2.3 Some surprises

Over the span of a dozen months from the first "final" draft of
the specification through the implementation and first deployment
of the software system, a few surprises did surface.  These
fell into two categories:  those that surfaced when the theoretical
specification was put into practice, and others that became apparent
when the resulting system was put into operation with commercial
software clients and servers.

More detail is provided in the Appendix concerning specific
software issues encountered, but some of the larger issues that surfaced
during the implementation phase are described below.


2.3.1 LDAP objectclasses and the "o" attribute

It came as a considerable surprise, some months into the project,
that none of the "standard" LDAP person objectclasses included
organization ("o") as an attribute. The basic assumption seems to be
that "o" will be part of the distinguished name for an entry, and
therefore there is little (if any) cause to list it out separately.
This does make it trickier to store information for people
across multiple organizations (e.g., at an ISP's directory server) and
use the organization name in query refinement. (Roland Hedberg caught
this issue, and has flagged it to the authors of the "inetorgperson"
objectclass document).

2.3.2 The Tagged Index Object

The Tagged Index Object ("TIO"), used to carry indexes of WDSP
information to the RI, is designed to have record (entry) tags to
reduce the number of false positive referrals generated when doing a
search in the RI.  One of the features of the first index object type,
Whois++'s centroid (see [centroid]) was the fact that the index object
size did not grow linearly with the size of data indexed -- i.e., at
some point the growth of the index object slowed as compared to that of
the underlying data set.  At first glance, this also seems to be the
case for the TIO.  However, as the index grows in size the compression
factor of the TIO may not achieve the same efficiency as the centroids.
One reason for this is that the tagged lists can get quite long,
depending on the ordering of the assignment of tags to the underlying
data.  That is, the tagging as defined allows for a compressed
expression of tag "ranges" -- e.g., "1-500" instead of "1,2,3,[...]500".
Thus, it might be interesting to explore an optimal "sorting" of
underlying data, before applying tags, so that the most common
tokens have consecutive tags (maximal compression of the
tag lists).  It's not clear if this can be done efficiently over
the entire set of records, attributes, and tokens, but it would bear
some investigation, to produce the most compressed TIO for transmission.
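
The effect of tag ordering on the compressed form can be illustrated
with a short sketch.  The function name and the exact textual form of
the ranges are assumptions made for this example; see [TIO] for the
actual syntax.

```python
def compress_tags(tags):
    """Collapse integer record tags into ranges and singletons,
    e.g. 1,2,3,7 becomes "1-3,7"."""
    ordered = sorted(set(tags))
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next tag is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


# The same token occurring in four records, under two tag assignments:
consecutive = compress_tags([10, 11, 12, 13])
scattered = compress_tags([2, 9, 40, 77])
```

Here the consecutive assignment yields the single range "10-13", while
the scattered assignment cannot be compressed at all.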

Additionally, in order to make (time) efficient use of the tags in the
RI in practice, it is almost necessary to "reinflate" the index object
to be able to do joins on tag lists associated with tokens that match.
Alternatively, the compressed tag list can be stored, and there is an
additional cost associated with comparing the tag lists for matching
tokens -- i.e., list comparison operations done outside the scope of a
base database management system.  Either way, there was an unexpected
tradeoff to be made.
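
A minimal sketch of the "reinflate, then join" approach follows.  It
assumes a hypothetical in-memory index mapping each token to a
compressed tag list written as comma-separated values and "a-b"
ranges; the names expand_tags and matching_records are illustrative
only.

```python
def expand_tags(spec):
    """Expand a compressed tag list such as "1-3,7" into a set."""
    tags = set()
    for part in spec.split(","):
        if not part:
            continue  # tolerate an empty tag list
        if "-" in part:
            low, high = part.split("-")
            tags.update(range(int(low), int(high) + 1))
        else:
            tags.add(int(part))
    return tags


def matching_records(index, tokens):
    """Return the record tags shared by all query tokens, i.e. a
    join (set intersection) over the expanded tag lists."""
    tag_sets = [expand_tags(index.get(tok, "")) for tok in tokens]
    return set.intersection(*tag_sets) if tag_sets else set()


# Hypothetical index over one WDSP's data.
index = {"thommy": "1-3,7", "eklof": "3,7-9"}
hits = matching_records(index, ["thommy", "eklof"])
```

Storing the expanded sets makes the intersection cheap at the cost of
space; storing the compressed form saves space but requires this
expansion (or a range-aware comparison) on every query.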

2.3.3  Handling Status Messages

Mapping of status messages from multiple sub-transactions into a single
status communication for the end-user client software became something
of a challenge.  When chaining a query to multiple WDSPs (through the
SAPs), it is not uncommon for at least one of the WDSP servers to
return an error code or be unavailable.  If one WDSP cannot be reached,
out of several referrals, should the client software be given the
impression that the query was completed successfully, or not?  Most
client protocol error handling models are not sophisticated enough to
make this level of distinction clear.
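
One way to frame the problem is as a three-valued outcome computed
over the sub-transactions, which then has to be squeezed into
whatever the client protocol can express.  The sketch below is an
assumption about how such a summary could be computed, not the rule
used by the TISDAG specification; the status names are hypothetical.

```python
def combine_status(sub_results):
    """Summarize chained sub-queries.

    sub_results maps a WDSP name to True if its sub-query succeeded
    and False if the server returned an error or was unreachable.
    """
    outcomes = list(sub_results.values())
    if all(outcomes):
        return "complete"   # every referral was resolved
    if not any(outcomes):
        return "failed"     # nothing could be resolved
    return "partial"        # some results may be missing


status = combine_status({"wdsp-a": True, "wdsp-b": False})
```

The difficulty described above lies in the "partial" case, which many
client protocols have no way to report.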

2.3.4  Deployment with Commercial Software

When it came time to test the resulting software with standard
commercial client and server software, a few more surprises came
to light (primarily in terms of these products' expected worldview
and occasional implementation shortcuts).  Again, more detail
is provided in the Appendix, but highlights included client
software that could only handle a very small subset of a protocol's
defined status message lexicon (e.g., 2 system messages supported),
and client software that automatically appended additional terms
to a query specified by the user (e.g., adding "or email=<what
the user typed in to the query>").


2.4 Some observations

2.4.1 Participation of the WDSPs

One of the things that came to light was that the nature of the
index object generated by the WDSPs has an important impact on
performance -- both in terms of integrating the index object
into the Referral Index, and in terms of efficiency of handling
queries.  A proposal might be either to define more clearly how
the WDSPs should generate the CIP index object (currently left to their
discretion), or to alert individual WDSPs when their index objects
are considered substandard.

On another front, when chaining referrals to WDSP servers, some
servers perform more efficiently than others, affecting the overall
response time of the DAG system.  From a service point of view,
it should also be possible to notify WDSPs that are consistently
slow (exceeding some selected response time) that they are substandard.

2.4.2 Index Objects and Referral Index size

As described in more detail in [complex], there are many factors that
can influence the growth factor of index objects (as more data is
indexed).  That work dealt specifically with tokenized data for
Whois++ centroids, and is not immediately generalizable to all forms
of the Tagged Index Object.  However, the particular structure of
the TIO used for the TISDAG project is similar enough in structure
to a centroid that the same "order of magnitude" and growth
characteristics are applicable.

Factors that affect the size of the data ("number of entries"):

        . Number of generated tokens
          The number of tokens generated from the directory data depends
          on what is tokenized. If data is tokenized on names and
          addresses (i.e. not unique data like phone numbers) a rough
          estimation is that the
          number_of_tokens = 0.2 * number_of_data_records. The growth
          is linear in the span from a few thousand to at least 1.2
          million records. The growth should then level off since the
          sets of names and addresses are finite, but the current tests
          have not shown a break point.

          If data is tokenized on something that is unique, e.g. phone
          numbers, then a rough estimation is that the
          number_of_tokens = number_of_data_records. Note that it is
          possible to tokenize in different ways, for example divide the
          phone numbers in parts. This would result in fewer tokens.

        . Number of directories
          Since the tokens are generated individually for each directory,
          the data size depends on the number of directories.
          10 directories with 100,000 records each will generate the
          same number of tokens as one directory with 1,000,000 records.
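
The rough estimates above can be written out as a small calculation.
This sketch simply applies the observed linear ratio of 0.2 tokens
per record per directory; it is valid only in the range where growth
was observed to be linear, and the function name is an assumption.

```python
def estimated_tokens(records_per_directory, tokens_per_record=0.2):
    """Estimate the total number of tokens in the Referral Index.

    Tokens are generated separately for each directory, so the
    estimate is the sum of the per-directory estimates.  Use
    tokens_per_record=1.0 for unique data such as phone numbers.
    """
    return sum(round(tokens_per_record * n)
               for n in records_per_directory)


ten_small = estimated_tokens([100_000] * 10)
one_large = estimated_tokens([1_000_000])
```

Both cases give the same estimate under the linear model.  Once token
growth within a single directory levels off, the single large
directory would be expected to produce fewer tokens than the ten
smaller ones.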

2.4.3 Index Object and Query Performance

Factors that affect the performance ("queries/second"):

        . Type of query (exact, substring, etc.)
          A 'substring' query is slower than an 'exact' query due to:
          1) somewhat slower look-up in the internal DAG database than
             an exact query.
          2) Mostly, a larger amount of data is fetched from the
             internal DAG database due to more hits, which generates more
             index processing.
          3) Substring queries are sent to the directory servers which
             also results in more hits and more data fetched. The
             directory servers may also be more or less effective
             in handling substring queries.

        . Number of search attributes
          A query with one or few attributes will most of the time
          result in many hits, which results in a lot of data, both
          internally in DAG and from the directory servers. On the other
          hand, a query with many attributes will result in a somewhat
          slower look-up in the internal DAG database.

        . Number of directories
          A larger number of directories may result in many referrals,
          but it depends on the query. A simple query will generate a
          lot of referrals, which means a lot of data from the
          directories has to be fetched. It will also result in a
          somewhat slower look-up in the internal DAG database.

        . Number of chained referrals
          Queries that are not chained are faster, since the result data
          does not have to be sent through the DAG system. Chained
          queries to several directories can be processed in parallel in
          the SAPs, but all data has to be processed in the CAP before
          sent to the client.

        . Response time in the directory servers
          The response times of the directory servers are of course
          critical. The total response time for DAG is never faster
          than the slowest involved directory server.

        . Number of tokens (size of Tagged Index Objects)
          The number of tokens has little impact on the look-up time in
          the internal DAG database.


2.5 Some evolutions

To date, the TISDAG project has been "alive" for just over two years.
During that time, there have been a number of evolutions -- in
terms of technologies and ideas outside the project (e.g., user
and service provider expectations, deployment of related software, etc)
as well as goals and understanding within the scope of the project.

Chief among these last is the fact that the project set out to
primarily fulfill the role of a national referral service, and
gradually evolved towards becoming more of a transparent protocol
proxy service, fulfilling client queries as completely as possible,
within the client protocol's semantics.  This evolution was probably
provoked by a number of factors:  existing client & server software
has a narrower range of accepted (expected) behaviour than its
protocol specs may describe; once the technology was there for
some proxying, going all the way seemed to be within reach; etc.

From the point of view of providing a national whitepages service,
this is a very positive evolution.  However, it did place some
strains on the original system architecture, for which some
adjustments have been proposed (more detail below).  What is less
clear is the impact this evolution will have on the flexibility
of the system architecture -- in terms of addressing other applications,
different protocols (and protocol paradigms), etc.  That is,
the original intention of the system was to very simply fulfill an
unsophisticated role -- "find things that sort of match the input
query and let the client itself determine if the match is close enough".
As the requirements become more sophisticated, the system loses
simplicity and perhaps becomes more brittle.  (Some proposals for
avoiding this are outlined in [DAG++], which attempts to return to
the underlying principles and propose steps forward at that level).

In terms of impact within the TISDAG project, this evolution led
to the following technical adjustments:

        . The latest version of the technical specification makes
          a distinction (in the internal protocol grammar) between
          queries directed at the Referral Index, and those passed
          to SAPs to fulfill a query.  This distinction keeps the
          query-routing queries simple, but allows more sophistication
          in expressing a query designed to fulfill the client's
          original semantic expression.

        . The additional constraints in the SAP query language
          are still not enough to allow the internal protocol to
          express very sophisticated queries.  Originally intended
          only for query-routing queries, the DAG/IP expects all
          queries to be token-based (whereas LDAP queries are
          phrase-oriented).  This means that SAPs have to do
          a good deal of "post-pruning" of WDSP result sets to match
          the DAG/IP query sent by a CAP for query fulfillment.
          And, CAPs must in turn do more post-pruning to match
          the DAG/IP results (from the SAPs) to the original query
          semantics.
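
The need for post-pruning can be illustrated with a simplified
sketch.  It treats a token-based query as "all tokens appear as words,
in any order" and a phrase-oriented query as "the phrase appears as a
substring"; both are simplifying assumptions, and the function names
are hypothetical.

```python
def token_match(value, tokens):
    """True if every token appears as a whole word in value."""
    words = value.lower().split()
    return all(tok.lower() in words for tok in tokens)


def phrase_match(value, phrase):
    """True if the phrase appears in value, in order."""
    return phrase.lower() in value.lower()


def post_prune(candidates, phrase):
    """Keep only the token-matched candidates that also satisfy the
    client's original phrase-oriented query."""
    return [v for v in candidates if phrase_match(v, phrase)]


names = ["Anna Maria Svensson", "Maria Anna Svensson"]
token_hits = [n for n in names if token_match(n, ["anna", "maria"])]
final = post_prune(token_hits, "Anna Maria")
```

Both names satisfy the token-based query, but only the first
satisfies the original phrase, so the second must be pruned before
results are returned to the client.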

The real strength of the TISDAG project was that it separated
the technical framework needed to support the service from
the configuration required in order to support a particular
application or service -- query & schema mapping, configuration
for protocols, etc.  Future improvements should focus on
evolving that framework, maintaining the separation from the specific
applications, services, and protocols that may use it.


3.0 Related Projects

The TISDAG project is not alone in attempting to solve the problems
of providing coordinated access to resources managed by multiple,
disparate services.

3.1 The Norwegian Directory of Directories (NDD)

Described in [NDD], the Norwegian Directory of Directories project
also aims to provide necessary infrastructure for a national
directory service.  It assumes LDAP (v2 or v3) accessibility
of WDSP information (provided by the WDSP itself, or through
other arrangements), and aims to resolve some of the trickier
issues associated with hooking together already-operational LDAP
servers into a coherent network:  uniform distinguished naming
scheme, and content-based referrals.  It also addresses some of the
pragmatic realities of being compatible with different versions
of LDAP clients -- e.g., v2, which does not support referrals, and v3,
which does.

At the heart of the system is the "Referral Index and Organizational
information" (RIO) server, which provides a searchable catalogue
over Norwegian organizations. This facilitates the location of
whitepages servers for individual organizations (assuming the
query includes information about which organization(s) is(are)
interesting).

This work can be seen as being complementary to the TISDAG work,
in that it provides a more focused service for integrating LDAP
directory servers.  However, there is still some requirement that
one knows the organization to which a person belongs before doing
a search for their e-mail address. This may be reasonable for
seeking mail addresses associated with a person's work organization,
but is less often successful when it comes to finding a personal
e-mail address -- in an age where ISPs abound, a priori knowledge
of a user's ISP identification is unlikely.

3.2 DESIRE Directory Services

The EC funded project DESIRE II (http://www.desire.org) is developing a
distributed European indexing system for information on Research and
Education. The Directory Services work undertaken by DANTE and SURFnet
proposes an architecture applied to a server mesh structure to create a
wide-area directory service infrastructure.

This service is intended to support both whitepages information, with
LDAP servers at WDSPs, and Web-search meshes at various places
using Whois++ for information about resources and routing of queries to
other index-based services.

Like the TISDAG project, the DESIRE directory services project
aims to act as a focal point for queries, allowing client software
to access appropriate resources from a wide range of disparate
services.

There are architectural differences between the approach used in
the TISDAG project and the DESIRE directory service project, but
many of the driving needs are the same, and the approach of using
content-based indexing and referrals was also selected.



4.0 Some Directions for TISDAG Next Steps

The fun thing with technology is that there are always more tweaks
and changes that can be made.  However, a service should evolve
in response to specific customer needs, and there are 3 critical
ways in which the TISDAG service itself should advance.  These are
outlined below, in terms of possibilities perceived at this time,
rather than specific recommendations for underlying technology changes
that would be necessary to fulfill them.


4.1 Integrating multiple DAG networks: mesh.

4.1.1  Overview of mesh possibilities

The Common Indexing Protocol is designed to facilitate the creation
not only of query referral indexes, but also of meshes of
(loosely) affiliated referral indexes. The purpose of such a mesh of
servers is to implement some kind of distributed sharing of indexing
and/or searching tasks across different servers. So far, the TISDAG
project has focused on creating a single referral index; the obvious
next step is to integrate that into a larger set of interoperating
services.

Two different approaches are possible for extending the TISDAG
service to a mesh model (or some combination of both).  First, it
should be possible to create a mesh of DAG-based services.  Or,
it might be interesting to use the mesh architecture to incorporate
access to other types of services (e.g., the Norwegian Directory
of Directories).  In either case, the basic principle for establishing
a mesh is that interoperating services should exchange index objects,
according to the architecture of the mesh (e.g., hierarchical, or
graph-like, preferably without loops!).

As is outlined in the CIP documentation ([CIP1]), many possibilities
exist for mechanisms for creating indexes over multiple referral
servers -- for example, WDSP index objects could be passed along
untouched, or a referral index server's contents could be aggregated
into a new index object, generating referrals back to that server.

The proposal is that the mesh should be constructed using index
objects aggregated over participating services' servers.  That is,
referrals will be generated to other recognized services, not
their individual participants.  This can be done as a hierarchy
or a level mesh one-layer deep, but the important reason for
not simply passing forward index objects (unaggregated) is that
individual services may support different ranges of access protocols,
have particular security requirements, etc.  Referrals should
be directed to a CAP or CAPs -- either the standard ones used
by the DAG system, or new ones established to support particular
semantics of remote systems (e.g., other query types, etc).  Within
a given DAG system,  referrals to these remote servers will look
just like any other referral, although a particular SAP or SAPs
may be established to provide query fulfillment (again, to enable
translations between variations of service, to allow secure access if
the relationship between the services is restricted, etc).
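
The aggregation step can be sketched as follows.  The sketch models
an index object as a plain set of tokens per WDSP, which is a strong
simplification of the actual CIP index objects; the function name and
the example CAP address are assumptions.

```python
def aggregate_index(wdsp_indexes, service_cap):
    """Merge per-WDSP token sets into a single index in which every
    token refers back to the service's own CAP rather than to the
    individual WDSP that supplied it."""
    tokens = set()
    for wdsp_tokens in wdsp_indexes.values():
        tokens |= set(wdsp_tokens)
    return {token: service_cap for token in sorted(tokens)}


# Hypothetical contents of one service's Referral Index.
country_b = {
    "b-wdsp1": {"eklof", "thommy"},
    "b-wdsp2": {"daigle", "eklof"},
}
aggregate = aggregate_index(country_b, "ldap://cap.country-b.example")
```

A remote service holding this aggregate learns only that the service
as a whole may have a match, and is referred to a CAP; which WDSPs
participate, and over which protocols, remains internal to the
service.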

In the following scenarios of mesh traversal, the assumption is
that the primary service in discussion (Country A in Scenario 1,
Country B in Scenario 2) is a DAG-based service.  The scenarios
are presented in the light of interoperating DAG services, but in
most cases it would be equally applicable if the remote service
was provided by some other service architecture.  Again, the key
element for establishing a mesh of any sort is the exchange of the
CIP index object, not internal system architecture.

4.1.2  Scenario 1:  Top Down

Suppose 2 countries tie their services together.  A user makes a query
in Country A.  A certain number of hits are made against the index
objects of A's WDSPs.  There is also a hit in the aggregate index of
Country B.  There are 3 possible cases under which this must be handled:

Case 1:

Country A and Country B are running services that are essentially the
same -- in terms of protocols, queries, and schema that are supported.
In this case, one referral should be generated per protocol supported
by Country B's service.  The referral can be passed back as far as the
client, if its protocol supports referrals.  Alternatively, the CAP
may chain the referral through an appropriate SAP, in the usual fashion.
In other words, the CAPs of Country B's service act as WDSPs to
Country A's service.

Consider the following illustration (only relevant CAPs, SAPs, etc, are
shown; others suppressed for lack of room):

        +-----------------+
   (1)  |-----+ Country A |     +-------+
 ------>|Prot1|   DAG     |     |A-WDSP1|
 <------| CAP |     +-----|     | Prot1 |
   (2)  |-----+     |Prot1|     +-------+
        |           | SAP |
 ----+  |           +-----|     +-------+
  (3)|  |    +-------+    |     |A-WDSP2|
     |  |    | RI-A  |    |     | Prot1 |
     |  +-----------------+     +-------+
     |
     |                          +-------+
     |                          |A-WDSP3|
     |                          | Prot2 |
     +----------------+         +-------+
                      |          [...]
                      |
                      |         +-----------------+
                      |         |-----+ Country B |     +-------+
                      +-------->|Prot1|   DAG     |     |B-WDSP1|
                                | CAP |     +-----|     | Prot2 |
                                |-----+     |Prot1|     +-------+
                                |           | SAP |
                                |           +-----|     +-------+
                                |    +-------+    |     |B-WDSP2|
                                |    | RI-B  |    |     | Prot1 |
                                +-----------------+     +-------+
                                                         [...]

where
        Prot[i] is some particular query protocol
        RI-A has an index over all A-WDSP[i] and RI-B
        RI-B has an index over all B-WDSP[i]
        (1) is the query to the Country A DAG system, which
            yields a referral based on the index object from RI-B
        (2) is that referral
        (3) is the resolution of that referral, which the client takes
            to the Country B DAG system directly (to find out which, if
            any, B-WDSP[i] have relevant information)


Case 2:

Country A and Country B are running services that address the same
service type (e.g., whitepages), but are not using an identical
collection of protocols, allowed queries, or schema.  The index object
that Country B sent to Country A's DAG service must be constructed in
terms of Country A's service, in order for appropriate hits to be
generated against the index object (i.e. for referrals to Country B's
service).  However, to resolve the referral, it will be necessary to do
some further protocol/schema/query mapping.  This can be done by a
special SAP established within Country A's service, that maps
Country A's service into the published service of Country B.  Country A
may then elect to support only one of Country B's access protocols, and
the designated SAP will always contact one type of CAP at Country B.

Alternatively, Country B can establish a particular CAP that does the
mapping from Country A's service into something that is most appropriate
against the internal structure of its service.  In this case,
Country A's referral will be to a special CAP in Country B's service
(which, again, will look like a WDSP to the Country A service); in fact,
the referral may be handled directly by the client software.  The
difference between the two possible approaches lies in who is
responsible for managing the relationship between the two service types.
On the one hand, Country A could handle it if it knows its own service as
well as the published access to Country B.  On the other, Country B
could be responsible for establishing a CAP for every country that may
want to connect to it.  The latter can, in some cases, be justified by
the amount of internal optimization that can be done, and because it
reduces the overhead for Country A's service (which can pass the
referral directly back to the client software).

Consider the following illustration (only relevant CAPs, SAPs, etc., are
shown; others suppressed for lack of room):

        +-----------------+
   (1)  |-----+ Country A |     +-------+
 ------>|Prot1|   DAG     |     |A-WDSP1|
 <------| CAP |     +-----|     | Prot1 |
   (2)  |-----+     |Prot1|     +-------+
        |           | SAP |
 ----+  |           +-----|     +-------+
  (3)|  |    +-------+    |     |A-WDSP2|
     |  |    | RI-A  |    |     | Prot1 |
     |  +-----------------+     +-------+
     |
     |                          +-------+
     |                          |A-WDSP3|
     |                          | Prot2 |
     +----------------+         +-------+
                      |          [...]
                      |
                      |         +-----------------+
                      |         |-----+ Country B |     +-------+
                      |         |Prot3|   DAG     |     |B-WDSP1|
                      |         | CAP |     +-----|     | Prot3 |
                      |         |-----+     |Prot3|     +-------+
                      |         |---------+ | SAP |
                      |         |Country A| +-----|
                      +-------->|CAP:Prot1|       |
                                |---------+       |     +-------+
                                |    +-------+    |     |B-WDSP2|
                                |    | RI-B  |    |     | Prot3 |
                                +-----------------+     +-------+
                                                         [...]

where
        Prot[i] is some particular query protocol
        RI-A has an index over all A-WDSP[i] and RI-B
        RI-B has an index over all B-WDSP[i]
        (1) is the query to the Country A DAG system, which
            yields a referral based on the index object from RI-B
        (2) is that referral
        (3) is the resolution of that referral, which the client takes
            to the Country B DAG system directly, but to a CAP that
            is specifically designed to accommodate protocols from
            Country A's service, and to map them (and the schema) into
            Country B's service.  Likely, all Country B referrals will
            be chained for the Country A client

Case 3:

The third possibility is, in fact, a refinement of the first.  If
Country A and Country B are running services that are in every way
identical except for the data (WDSPs covered), then it may make
sense to NOT aggregate Country B's WDSP index objects, but to
copy them to Country A's server.  Then, Country A's CAPs might
be given access to the SAPs of Country B in order to carry out
chaining directly at the remote service (instead of involving
Country A's SAPs and Country B's CAPs, as in the first example
above).  The choice among these cases is not a matter of technology
-- it depends entirely on the nature of the relationship that can be
established between Country A's and Country B's services.

4.1.3  Scenario 2:  Working Up

The above scenario implicitly assumes that Country A's server had
received index objects from Country B's server.  This will be the
case if Country A's server is higher in the levels of a hierarchy
of services (established by agreements between the service operators),
or if the network is comprised of servers that share their index
objects with all others, for example.  In the latter case, searching
at any one of the servers in the service yields the full range of
results -- referrals will be made to any other server that might
have data that fulfills the user's query.  The sharing of the
index objects is a mechanism to allow each server to manage local
data, while enabling distributed load-sharing on the basic query
handling.

However, if a hierarchical, or at least not-completely-connected
model is used for the server network, queries carried out at a
level other than the top of the hierarchy, or in one particular branch
of the hierarchy, will not actually be matched against all index
objects.  Therefore, there may be other servers to which the query
should be directed if the full space needs to be searched. Suppose,
for example, that in the above example Country B is in fact lower in
the hierarchy than Country A.  A user sending a query to Country
B's service may be content to limit the scope of the query to that
country's information (this is true in enough real-life situations
that this hierarchical relationship becomes an effective mechanism
for scoping queries and avoiding having to flood the entire network
with every single query or keep full copies of all data in every
server).

Although their use is still theoretical, the DAG/IP provides control
constructs to allow DAG components to act according to the topology of
the mesh.  A
CAP might use the "polled-by" system command to establish what other
servers in the mesh exist in higher levels (and therefore would be worth
contacting if the scope of the search is to be increased).  In
the example above, a CAP in Country B's system could determine that
Country A's service was polling Country B, and therefore make it
a logical target for expanding the scope of the query.  More
experience (primarily with server mesh topologies) is necessary
before it will be clear how to best make use of these capabilities:

        . should the CAP always broaden the scope? only if there
          are no local referrals? under user direction?
        . should the CAP use a local SAP to contact the remote
          service's CAP?
        . is it better to completely connect the mesh of servers, or
          produce some kind of hierarchy?
        . etc
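
One possible answer to the first question above can be sketched as
follows.  This is illustrative Python only: the polled_by mapping
stands in for whatever a "polled-by" exchange would return, and the
function name, server names and policy are assumptions rather than
part of DAG/IP.

```python
def broader_targets(local_server, polled_by, local_referrals,
                    user_wants_more=False):
    """Pick servers worth contacting to widen the scope of a search.

    polled_by maps a server name to the set of servers that poll it,
    i.e. those that sit higher in the mesh and hold its index object.
    The policy here (broaden only when nothing was found locally, or
    when the user asks) is one of the open options, not a recommendation.
    """
    if local_referrals and not user_wants_more:
        return []
    return sorted(polled_by.get(local_server, ()))


# Country A polls Country B, so A is a logical target for B's CAPs.
polled_by = {"country-b": {"country-a"}}
print(broader_targets("country-b", polled_by, local_referrals=[]))
```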

4.1.4 Other considerations

Depending on the context in which a mesh is established (e.g.,
between national white pages services, or different units of
a corporate organization, etc), it may be useful to allow individual
WDSPs to indicate whether they are willing to have their data included
in a DAG system's aggregated index object (i.e., allowing the DAG
system to receive referrals from other systems in the mesh).


4.2 Security support

Security must be considered when a wide-scale directory system is
used in application areas other than the public white-pages
application of the TISDAG project.  Security issues arise whether
the directory service is distributed across the Internet or
functions completely within an internal, closed network.

In the medical area, where patient information is searched for across
multiple hospitals and other institutions within the health-care
community, the requirements for security are very high.  A DAG-system
applied to a medical application should be integrated with the
security model already implemented in the health-care community; the
use of DAG in a medical application will therefore have an impact on
the DAG-system's security architecture.  Proposals for such a system
are presented elsewhere, but some of the specific requirements are
listed below.

In an internal closed hospital network, it is reasonable to expect
dedicated, application-specific interfaces and protocols.  For
such client software, the specific steps of achieving authorization
and carrying out the transaction should be welded into a single
service interaction.  The DAG-system needs to support a uniform
authentication and authorization service interface for facilitating
access control decisions and requesting access control information
about users, roles and organisations.

For example, access control requirements of a medical application
may include:

   - authentication of the user, e.g. doctor
   - authorization, classified by individual users, roles and
     organizations
   - time availability, e.g. time of the day or day of the week
   - encryption of the information
   - required confidentiality/integrity information protection
     based on relation to users, roles and organisations
   - secure network communications, host properties
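
A few of the requirements above (authentication, role-based
authorization and time availability) can be combined into a single
access decision, roughly as sketched below.  The Request fields, the
POLICY table and its values are hypothetical; encryption and secure
transport are outside the scope of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Request:
    user: str
    authenticated: bool   # result of some earlier authentication step
    role: str
    organisation: str
    when: datetime


# Hypothetical policy: which (role, organisation) pairs may search, and
# during which hours of the day.
POLICY = {
    ("doctor", "hospital-x"): (time(7, 0), time(19, 0)),
}


def may_search(req: Request) -> bool:
    """Apply authentication, role/organisation and time-of-day checks."""
    if not req.authenticated:
        return False
    window = POLICY.get((req.role, req.organisation))
    if window is None:
        return False
    start, end = window
    return start <= req.when.time() <= end
```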

Security in updates and CIP index objects is provided by encryption and
signature of objects from registered WDSPs.  The use of CIP index
objects inherits the security considerations of CIP; for more details,
see [CIP1].

4.3 WDSP attributes and schemas

Today the DAG system makes use of two information schemas -- the
DAGPERSON schema for information about specific people, and the
DAGORGROLE schema for organizational roles.  The technical
specification includes a definition of each schema, as well as an
understood mapping to (and from) some standard schemas used in the
supported protocols.  Nevertheless, new WDSPs may not have all the
attributes in these schemas, and may use different schemas as well as
different query attributes.  To include them, it should be possible to
create and use new customized or standardized schemas, and to perform
schema mapping where necessary.  It might also be possible to
constrain queries to desired query attributes, templates, or object
classes.  In practice, this means that different WDSPs may choose to
use different subparts of one defined schema, or even implement local
customizations.
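
Schema mapping and query constraining of this kind can be sketched as
simple table lookups.  In the Python below, the left-hand names are
ordinary LDAP attribute names, while the right-hand names are
placeholders and not the actual DAGPERSON attribute names; both
function names are likewise assumptions.

```python
# LDAP attribute -> DAG-side attribute.  The right-hand names are
# placeholders chosen for illustration.
LDAP_TO_DAG = {
    "cn": "name",
    "o": "organization",
    "mail": "email",
    "telephoneNumber": "phone",
}


def map_entry(ldap_entry, mapping=LDAP_TO_DAG):
    """Translate the attributes a WDSP supplies; silently skip the rest."""
    return {mapping[k]: v for k, v in ldap_entry.items() if k in mapping}


def constrain_query(query, allowed):
    """Drop query attributes that a given WDSP does not support."""
    return {k: v for k, v in query.items() if k in allowed}


print(map_entry({"cn": "Anna Berg", "labeledURI": "http://example.org/"}))
```

A WDSP that implements only a subpart of the schema would be
described by a smaller "allowed" set, so that queries sent to it
contain only attributes it can answer.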


5.0 Other applications of DAG

One of the tests of flexibility of an architecture is to see how
well it stands up when tried in new environments.  In that light,
we present two completely different applications in which DAG-like
systems can be considered.

5.1 DAG as applied to medical applications

As alluded to above, the need for accurate and complete information is
very great within the health-care community.  The information about
patients cannot be centralized from all the different institutions
that own and administer the data.  Patients do not always see the
same doctor or even go to the same hospital or clinic within the
community.  Instead of requiring centralized mirroring of complete
information, and to avoid rebuilding the whole information structure,
the DAG can be used as a gateway which interconnects the different
patient records databases within a hospital or within a geographical
area.

There are some architectural differences in applying DAG to medical
applications, such as the definition of schemas and security
considerations.


5.2 Wireless access to DAG

The ability to provide differentiated forms of access to directory
service information will be very important in the future, particularly
in the context of wide-area directory deployment (such as white pages
services for companies and operators, but also over multiple operators,
etc.).  Users will demand to access information from their PC at work
or at home, from PDAs and from cellular phones (wireless).

The Wireless Application Protocol (WAP), as specified in [WAP], and
other forms of access are providing new ways to access the Internet.
Cellular phones that can handle WAP (i.e., with browser functionality
in the phone) are already on the market, and operators are running WAP
server functionality in their cellular phone networks.

The architecture of DAG is very modular and facilitates the addition
of new CAPs, such as a "WAP-CAP" or CAPs for other protocols supported
by WAP servers.

Using WAP as an access mechanism would enable a person to have access
to their personal address book, or to a company's or operator's
directory service.  This information could be used for several other
services: voice calling, click to dial, click to buy, etc.  The
advantage here is that the directory service provides a context for
the user.  This context, or the search of the directory service, must
be limited, for example to avoid an overgeneralized query that yields
too many hits.

Providing this type of service will also be useful in the context
of Unified Messaging Services.  Unified Messaging services are becoming
more and more widely used -- for example, the user receives an email
which is forwarded to their cellular phone.  A wide area directory
system, such as a DAG system, could be used to facilitate things like
searches for, or lookup of, the E.164 number for an email address.  Of
specific interest is that this would allow searches for information
over multiple operators.

Like the TISDAG project's pilot deployment, this section only describes
"read-only" access from different clients to the DAG-system.  In
the longer term, it would be useful to distinguish between reads,
searches/lookups and writes.  This suggests that proper integration
with an Authentication, Authorisation and Accounting (AAA) policy
is needed for the different kinds of access to DAG.  This security
issue has to be addressed further; more about it is provided in a
separate document ([DAG++]).



6.0 Some conclusions

Although fewer people now hold out the hope of a unified global
directory service, based on standardized protocols, it is interesting
to see more projects providing infrastructure that permits unified
access to what is otherwise an unforgivingly diverse and dislocated
set of information servers.  What cannot be dictated (in standardized
protocols and schemas) may yet be accommodated through service
infrastructure.  The right approach seems to be to build better and
better frameworks for supporting such diversified services, without
making the framework architecture dependent on specific technologies.


7.0 Acknowledgements

This document outlines the perspectives and opinions of the authors,
based on experience as well as many fruitful and enlightening
discussions with others:  Roland Hedberg, Torbjorn Granat, Patrik
Granholm, Rikard Wessblad and Sandro Mazzucato.


8.0 Authors' Addresses

Thommy Eklof
Ericsson
S-126 25 STOCKHOLM
Sweden
Email: thommy.eklof@ericsson.com

Leslie L. Daigle
Thinking Cat Enterprises
Email:  leslie@thinkingcat.com


9.0 References

Request For Comments (RFC) and Internet Draft documents are available
from numerous mirror sites.

        [CIP1]  J. Allen, M. Mealling, "The Architecture of the
                        Common Indexing Protocol (CIP)", RFC 2651, August
                        1999.

        [CIP2]  J. Allen, M. Mealling, "MIME Object Definitions for
                        the Common Indexing Protocol (CIP)", RFC 2652,
                        August 1999.

        [CIP3]  J. Allen, P. Leach, R. Hedberg, "CIP Transport
                        Protocols", RFC 2653, August 1999.

        [DAG++]         L. Daigle, T. Eklof, "An Architecture for Integrated
                        Directory Services," Internet Draft (work in
                        progress), June 1999.

        [TISDAG]        L. Daigle, R. Hedberg, "Technical Infrastructure for
                        Swedish Directory Access Gateways (TISDAG)," RFC XXXX,
                        June 1999.

        [centroid]      Deutsch, et al., "Architecture of the WHOIS++
                        service", RFC 1835, August 1995.

        [NDD]           R. Hedberg, H. Alvestrand, "Technical
                        Specification, The Norwegian Directory of
                        Directories (NDD)," Internet Draft (work in
                        progress), May 1999.

        [TIO]           R. Hedberg, B. Greenblatt, R. Moats, M. Wahl, "A
                        Tagged Index Object for use in the Common Indexing
                        Protocol", RFC 2654, August 1999.

        [complex]       P. Panotzki, "Complexity of the Common Indexing
                        Protocol: Predicting Search Times in Index Server
                        Meshes", Master's Thesis, KTH, September 1996.

        [WAP]           The Wireless Application Protocol,
                        http://www.wapforum.org



Appendix -- Specific Software Issues and Deployment Experiences

The following paragraphs outline practical deployment experiences
in an anecdotal fashion.  This is not meant to be construed as
an exhaustive, authoritative evaluation of existing client software,
but rather an indication of the types of challenges the average
implementation team may expect to encounter in a development and
deployment effort.

Character encoding
------------------
One client's addressbook sends ISO-8859 encoding (depending on the
font configuration in the browser) when querying a directory server, but
the directory server responds with Unicode (UTF-8) encoding. This means
that the LDAP CAP would have to handle different character set
encodings for request and response.
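
The handling this implies in the LDAP CAP can be sketched as a plain
transcoding step.  The Python below assumes that the client's
"iso-8859" encoding is ISO-8859-1 and that the CAP works in UTF-8
internally; both the function and the constant names are hypothetical.

```python
REQUEST_CHARSET = "iso-8859-1"   # assumed: what the client sends
INTERNAL_CHARSET = "utf-8"       # assumed: what the CAP passes on


def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from src to dst; unmappable characters become '?'."""
    return raw.decode(src).encode(dst, errors="replace")


# An incoming query value containing A-Ring, as the client would send it.
incoming = b"\xc5sa"
print(transcode(incoming, REQUEST_CHARSET, INTERNAL_CHARSET))
```

Going in the other direction is lossy whenever a UTF-8 result holds
characters that the client's character set cannot represent, which is
why the sketch replaces them rather than failing.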

Referrals
---------
Today there appears to be only one commercial addressbook supporting
LDAPv3.  All the others support only LDAPv2.  However, this LDAPv3
client software does not handle referrals correctly -- the client
could not handle a server result containing "response code 10"
(designated for referrals).  From what was observed, there was
no way for the client or the end-user to decide whether, or which,
referrals to follow up.  It is therefore not clear how LDAP
clients handle a combination of both referrals and results -- but
the supposition is that it doesn't work.

Objectclasses in LDAP
---------------------
No objectclass is defined in the query to the DAG-system from the
LDAP-clients. This means that the DAG-system doesn't see any differences
between "inetOrgPerson" and "organisationalRole" when attribute "cn" is
representing both "name" and "role".  This is not so much a problem
as a source of interesting side effects.  Namely, although most
directory user interfaces (found in browsers, mail programs) claim
only to support person-related queries, in practice a user of the
client could use the interface to send a query with a role in the name
entry.

Query with attribute Organisation
---------------------------------
It is possible to send a query with attribute "organisation" but it
would result in no hits because the organisation attribute is
not included in the objectclass "inetOrgPerson".  Roland Hedberg has
proposed a change for the latest release of the objectclass
definition document.

To provide the desired ability to narrow search focus to some
range of organization names (attribute values), there are three
possible approaches with differing merits/detractions:


        Recommend the use of the "locality" attribute -- although
        a more standard definition would be required (locality is
        currently used for everything from organization to county
        to map coordinates).

        Recommend or require that the attribute organisation should be
        inherited in objectclass "inetOrgPerson".

        Build the LDAP DAG-SAP to submit two queries to the WDSP.  The
        first is the entire query including "o"; if it results in no
        hits, the second is the same query with only the cn filters
        (i.e., back off from the organization filtering if it doesn't
        seem to be supported).
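
The third approach can be sketched as follows.  This is illustrative
Python only: the "search" argument is a stand-in for whatever issues
the real LDAP request to the WDSP, and the function name and dict-based
filters are assumptions.

```python
def search_with_backoff(search, cn, o=None):
    """Query with cn and o; if that finds nothing, retry with cn alone.

    search is any callable that takes a dict of attribute filters and
    returns a list of entries.
    """
    filters = {"cn": cn}
    if o is not None:
        hits = search({**filters, "o": o})
        if hits:
            return hits
    # Either no organization was given, or filtering on it found nothing.
    return search(filters)
```

The cost of this approach is a second round trip to the WDSP, and a
result set that may be broader than the user intended.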

Configuration
-------------
It is not possible to see what character set an LDAP client wants to
use.  The recommendation so far in the project has been to define a
unique port for each character set.  This requires extra end-user
configuration of client software, and proper advertising of the port
number-charset mapping provided in the service.
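
Such a mapping amounts to a small lookup table on the service side.
In the Python sketch below the port numbers are invented for
illustration; a real service would publish its own assignments.

```python
# Hypothetical port assignments, one listening port per character set.
PORT_CHARSET = {
    3890: "utf-8",
    3891: "iso-8859-1",
}


def charset_for_port(port: int) -> str:
    """Return the charset a client is assumed to use on the given port."""
    try:
        return PORT_CHARSET[port]
    except KeyError:
        raise ValueError(f"no charset configured for port {port}") from None
```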


DN
--
When the user wants to look up more information about a person found
in a preliminary search, the LDAP client uses the entry's DN together
with the host and port of the DAG system.  Not only does that mean
that the client submits a non-compliant query to the DAG system,
as DNs are not part of any of the defined queries for the service,
it simply does not provide the desired effect of getting to the
user's entry.


Response Codes
--------------
The LDAPv3 client that was used does not support more than two response
codes -- "success" and "size limit exceeded".  All the other response
codes are translated to "size limit exceeded", although no
results are returned.  That is, if the error was in fact that
the size limit was exceeded, the results up to the size limit are
presented.  If it was another response code mapped to that one,
no results are presented.


Sending and loading CIP Index Objects
-------------------------------------
At least one server is quoting the CIP-object incorrectly for
the Swedish characters A-Ring, A-Umlaut and O-Umlaut.  Sending
quoted-printable CIP-objects with PINE mail software works.
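
For reference, the expected quoted-printable form of these three
characters can be checked with Python's standard quopri module.  The
sketch assumes the object is carried in ISO-8859-1; other character
sets would give different byte values.

```python
import quopri

# A-Ring, A-Umlaut and O-Umlaut as ISO-8859-1 bytes (assumed charset).
swedish = b"\xc5\xc4\xd6"

encoded = quopri.encodestring(swedish)
print(encoded)   # b'=C5=C4=D6'

# Decoding must give back exactly the original bytes.
assert quopri.decodestring(encoded) == swedish
```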


Source - Labeled URI
--------------------
The original plan for the use of the labeled-URI attribute was
to use it to return a pointer to the WDSP that provided the user
information.  However, the standard use of the labeled-URI attribute,
which may in fact be populated in the data returned by a WDSP, is
to contain the URI of more personal home pages.