Overview of Edge Data Discovery
draft-mcbride-edge-data-discovery-overview-00

Document Type Active Internet-Draft (individual)
Last updated 2018-10-22
T2TRG                                                         M. McBride
Internet-Draft                                               D. Kutscher
Intended status: Standards Track                                  Huawei
Expires: April 25, 2019                                      E. Schooler
                                                                   Intel
                                                           CJ. Bernardos
                                                                    UC3M
                                                        October 22, 2018

                    Overview of Edge Data Discovery
             draft-mcbride-edge-data-discovery-overview-00

Abstract

   This document describes the problem of distributed data discovery in
   edge computing.  Increasing numbers of IoT devices and sensors are
   generating a torrent of data that originates at the very edges of the
   network and that flows upstream, if it flows at all.  Sometimes that
   data must be processed or transformed (transcoded, subsampled,
   compressed, analyzed, annotated, combined, aggregated, etc.) on edge
   equipment along the way, particularly in places where multiple high
   bandwidth streams converge and where resources are limited.  Support
   for edge data analysis is critical to make local, low-latency
   decisions (e.g., regarding predictive maintenance, the dispatch of
   emergency services, identity, authorization, etc.).  In addition,
   (transformed) data may be cached, copied and/or stored at multiple
   locations in the network en route to its final destination.  Although
   the data might originate at the edge, for example in factories,
   automobiles, video cameras, wind farms, etc., as more and more
   distributed data is created, processed and stored, it becomes
   increasingly dispersed throughout the network and there needs to be a
   standard way to find it.  New and existing protocols will need to be
   identified/developed/enhanced for distributed data discovery at the
   network edge and beyond.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any

McBride, et al.          Expires April 25, 2019                 [Page 1]
Internet-Draft             Edge Data Discovery              October 2018

   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   4
     1.2.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  The Edge Data Discovery Scope . . . . . . . . . . . . . . . .   4
     2.1.  Types of Discovery  . . . . . . . . . . . . . . . . . . .   5
   3.  Protocols for Discovering Resources . . . . . . . . . . . . .   6
   4.  Protocols for Discovering Functions . . . . . . . . . . . . .   7
   5.  Naming the Data . . . . . . . . . . . . . . . . . . . . . . .   8
   6.  Edge Data Discovery . . . . . . . . . . . . . . . . . . . . .   8
   7.  Use Cases of edge data discovery  . . . . . . . . . . . . . .   8
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
   10. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . .   9
   11. Normative References  . . . . . . . . . . . . . . . . . . . .   9
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   9

1.  Introduction

   Edge computing is an architectural shift that migrates Cloud
   functionality (compute, storage, networking, control, data
   management, etc.) out of the back-end data center to be more
   proximate to the IoT data being generated at the edges of the
   network.  Edge computing provides local compute, storage and
   connectivity services, often required for latency- and bandwidth-
   sensitive applications.  Thus, Edge Computing plays a key role in
   verticals such as Energy, Manufacturing, Automotive, Video Analytics,
   Gaming, Healthcare, Mining, Buildings and Smart Cities.

   Edge computing is motivated at least in part by the sheer volume of
   data that is being created by IoT devices (sensors, cameras, lights,
   vehicles, drones, wearables, etc.) at the very network edge and that
   flows upstream, in a direction for which the network was not
   originally provisioned.  In fact, in dense IoT deployments (e.g.,
   many video cameras streaming high-definition video), where multiple
   data flows collect or converge at edge nodes, data is likely to need
   to be transformed (transcoded, subsampled, compressed, analyzed,
   annotated, combined, aggregated, etc.) to fit over the next-hop link,
   or even to fit in memory or storage.  Note also that the act of
   performing compute on the data creates yet another new data stream!
   In addition, (transformed) data may be cached, copied and/or stored
   at multiple locations in the network en route to its final
   destination.  With an increasing percentage of devices connecting to
   the Internet being mobile, support for in-the-network caching and
   replication is critical for continuous data availability, not to
   mention efficient network and battery usage for endpoint devices.
   Additionally, as mobile devices' memory/storage fills up, in an edge
   context they may be able to offload their data to other proximate
   devices or resources, leaving a breadcrumb trail of data in their
   wake.  Therefore, although data might originate at edge
   devices, as more and more data is continuously created, processed and
   stored, it becomes increasingly dispersed throughout the physical
   world (outside of or scattered across managed local data centers),
   increasingly isolated in separate local edge clouds or data silos.
   Thus there needs to be a standard way to find it.  New and existing
   protocols will need to be identified/developed/enhanced for these
   purposes.  Being able to discover distributed data, whether at the
   edge or in the middle of the network, will be an important component
   of edge computing.

   The IRTF Thing-to-Thing Research Group (T2TRG) held an edge
   discussion, and a comparative study on the definition of edge
   computing was presented in multiple T2TRG sessions over the past
   year.  An IETF BEC (beyond edge computing) effort has been evaluating
   potential gaps in existing edge computing architectures.  Edge data
   discovery is one potential gap that needs evaluation and a solution.

   Businesses, such as industrial companies, are starting to understand
   how valuable the data that they have kept in silos is.  Once this
   data can be aggregated on edge computing platforms, they will be able
   to monetize it.  But this will happen only if data can be discovered
   and searched across equipment in a standard way.  Discovering the
   data that is most useful to a given market segment will be extremely
   useful in building business revenues.  Providing a mechanism for this
   granular discovery is the problem that needs solving, with either
   existing or new protocols.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   o  Edge: The device edge is the boundary between digital and physical
      entities in the last mile network.  Sensors, gateways, and compute
      nodes are included.  The infrastructure edge includes equipment on
      the network operator side of the last mile network including cell
      towers, edge data centers, cable headends, etc.

   o  Edge Computing: distributed computation that is performed near the
      edge, where the nearness is determined by the system requirements.
      This includes high performance compute, storage and network
      equipment on either the device or infrastructure edge.

   o  Data Discovery: the process of finding required data in edge
      databases and consolidating it into a single source, perhaps a
      name, that can be evaluated.

   o  NDN: Named Data Networking.  An architecture in which packets name
      information (content) at the network layer, rather than naming
      only endpoints (IP addresses) as IP packets do.

2.  The Edge Data Discovery Scope

   Edge Computing data will typically be found at the device or
   infrastructure edges.  This is where we are focusing our efforts in
   defining this edge data discovery problem space.  Edge data will also
   be sent to the cloud as needed.  Discovering data that has been sent
   to the cloud is out of scope for this document.

                     +-------------------------------+
                     |   Core Data Center            |
                     +-------------------------------+
                              ***   Backbone
                             *   *  Network
                              ***
                     +-------------------------------+
                     |   Regional Data Center        |
                     +-------------------------------+
                              ***   Metropolitan
                             *   *  Network
                              ***
                     +-------------------------------+
                     |   Infrastructure Edge         |
                     +-------------------------------+
                              ***   Access
                             *   *  Network
                              ***
                     +-------------------------------+
                     |   Device Edge                 |
                     +-------------------------------+

                    Figure 1: Edge Data Discovery Scope

2.1.  Types of Discovery

   There are many aspects of discovery, including:

   Discovery of new devices added to an environment, and of their
   capabilities/services in client/server environments.  This includes
   discovering new devices automatically, and discovering a device and
   then synchronizing the device inventory and configuration for edge
   services.  There are many existing protocols to help in this
   discovery: UPnP, mDNS, DNS-SD, SSDP, NFC, XMPP, W3C network service
   discovery, etc.

   Discovery of edge devices by one another in a standard way.  DHCP,
   SNMP, SMS, CoAP, LLDP, and routing protocols such as OSPF can be used
   for devices to discover one another.

   Discovery of link state and traffic engineering data/services by
   external devices.  BGP-LS is one solution.

   Discovery of aggregated data on edge compute devices, which is the
   focus of this draft.  How can we discover aggregated data on the edge
   and make use of it?

   Besides sensor data being aggregated on the edge computing
   infrastructure, there will also be streaming data (from a camera),
   metadata (about the data, about the device that generated the data,
   or about the context, etc.), control data regarding a triggered
   event, or an executable that embodies a function, method or service,
   or other piece of code or algorithm.  There could also be new data
   that is created after (multiple) streams converge at the edge node
   and are processed/transformed in some manner.

   Discovery of functions in an SFC environment: Service function
   chaining (SFC) allows the instantiation of an ordered set of service
   functions and subsequent "steering" of traffic through them.  Service
   functions provide a specific treatment of received packets;
   therefore they need to be known so they can be used in a given
   service composition via SFC.  So far, how the SFs are discovered and
   composed has been out of the scope of discussions in IETF.  While
   there are some mechanisms that can be used and/or extended to provide
   this functionality, work needs to be done.  An example of this can be
   found in "I-D.bernardos-sfc-discovery".

   Discovery of resources in an NFV environment: virtualized resources
   do not need to be limited to those available in traditional data
   centers, where the infrastructure is stable, static, typically
   homogeneous and managed by a single admin entity.  Computational
   capabilities are becoming more and more ubiquitous, with terminal
   devices getting extremely powerful, as well as other types of devices
   that are close to the end users at the edge (e.g., vehicular onboard
   devices for infotainment, micro data centers deployed at the edge,
   etc.).  It is envisioned that these devices would be able to offer
   storage, computing and networking resources to nearby network
   infrastructure, devices and things (the fog paradigm).  These
   resources can be used to host functions, for example to offload/
   complement other resources available at traditional data centers, but
   also to reduce the end-to-end latency or to provide access to
   specialized information (e.g., context available at the edge) or
   hardware.  Similar to the discovery of functions, while there are
   mechanisms that can be reused/extended, there is no complete solution
   yet defined.  An example of work in this area is
   "I-D.bernardos-intarea-vim-discovery".

3.  Protocols for Discovering Resources

   Two main situations need to be covered:

   1.  A set of resources appears (e.g., by a mobile node hosting them
       joining a network) and they have to be discovered by an existing
       virtualization infrastructure.

   2.  A mobile device wants to discover virtualization resources
       available at the current location.

   Several protocol alternatives can be used for this, ranging from
   approaches coupled with the access technology in use, to over-the-top
   solutions such as UPnP, mDNS, DNS-SD, and SSDP, to solutions embedded
   in IP discovery/autoconfiguration, such as Neighbor Discovery or
   DHCP.

4.  Protocols for Discovering Functions

   In an SFC environment deployed at the edge, the discovery protocol
   may need to make available the following information per SF:

   o  Service Function Type, identifying the category of SF provided.

   o  SFC-aware: Yes/No.  Indicates if the SF is SFC-aware.

   o  Route Distinguisher (RD): IP address indicating the location of
      the SF(I).

   o  Pricing/costs details.

   o  Migration capabilities of the SF: whether a given function can be
      moved to another provider (potentially including information about
      compatible providers topologically close).

   o  Mobility of the device hosting the SF, with e.g. the following
      sub-options:

         Level: no, low, high; or a corresponding scale (e.g., 1 to 10).

         Current geographical area (e.g., GPS coordinates, post code).

         Target moving area (e.g., GPS coordinates, post code).

   o  Power source of the device hosting the SF, with e.g. the following
      sub-options:

         Battery: Yes/No.  If Yes, the following sub-options could be
         defined:

         Capacity of the battery (e.g., mWh).

         Charge status (e.g., %).

         Lifetime (e.g., minutes).
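
   As an illustration only, the per-SF information listed above could
   be carried in a record such as the following minimal Python sketch.
   The class names, field names, JSON encoding and example values are
   all assumptions made for this example; this document does not define
   an encoding or a data model.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Mobility:
    """Mobility of the device hosting the SF (hypothetical layout)."""
    level: str                          # "no", "low" or "high"
    current_area: Optional[str] = None  # e.g., GPS coordinates, post code
    target_area: Optional[str] = None   # e.g., GPS coordinates, post code


@dataclass
class Battery:
    """Battery sub-options; only present when the device has a battery."""
    capacity_mwh: float
    charge_percent: float
    lifetime_minutes: float


@dataclass
class ServiceFunctionRecord:
    """One discoverable SF, mirroring the bullet list above."""
    sf_type: str                        # Service Function Type
    sfc_aware: bool                     # SFC-aware: Yes/No
    route_distinguisher: str            # IP address locating the SF(I)
    pricing: Optional[str] = None       # pricing/cost details
    migratable: bool = False            # migration capabilities
    mobility: Optional[Mobility] = None
    battery: Optional[Battery] = None   # None stands for "Battery: No"


def encode(record: ServiceFunctionRecord) -> str:
    """Serialize a record to JSON (assumed format, for illustration)."""
    return json.dumps(asdict(record), sort_keys=True)


# Example values are invented; 192.0.2.10 is a documentation address.
record = ServiceFunctionRecord(
    sf_type="firewall",
    sfc_aware=True,
    route_distinguisher="192.0.2.10",
    mobility=Mobility(level="low", current_area="28911"),
    battery=Battery(capacity_mwh=50000.0, charge_percent=80.0,
                    lifetime_minutes=240.0),
)
print(encode(record))
```

   A discovery protocol could advertise such a record when an SF
   appears, and a consumer could filter on fields such as sf_type or
   mobility level.  How records are transported is left open here.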

5.  Naming the Data

   Named Data Networking (NDN) is one of five research projects funded
   by the U.S. National Science Foundation under its Future Internet
   Architecture Program.  NDN has its roots in an earlier project,
   Content-Centric Networking (CCN), which Van Jacobson started at Xerox
   PARC around the time of his Google talk, to turn his architecture
   vision into a running prototype (see also his CoNEXT 2009 paper and
   especially Jacobson's ACM Queue interview).  The motivation is the
   mismatch between today's Internet architecture and its usage.  Today
   we build, support, and use Internet applications and services on top
   of an extremely capable architecture that was not designed to support
   them.  What if we had an architecture designed to support them?
   Specifically, today's IP packets can name only the endpoints of
   conversations (IP addresses) at the network layer.  What if we
   generalized this layer to name any information (or content), not just
   endpoints?  Doing so would make it easier to develop, manage, secure,
   and use our networks.  NDN can be applied to edge data discovery to
   make it much easier to extract data by naming it.  If data were
   named, we would be able to discover the appropriate data simply by
   its name.
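
   To make the idea of discovering data by name concrete, the following
   is a toy Python sketch of a registry that maps hierarchical names to
   the nodes holding the data and answers prefix queries.  It is not an
   NDN implementation and does not use any NDN library or wire format;
   the NameRegistry class, the naming scheme and the node identifiers
   are hypothetical and chosen only for this example.

```python
class NameRegistry:
    """Toy registry: hierarchical data name -> nodes holding that data."""

    def __init__(self):
        self._holders = {}

    @staticmethod
    def _split(name):
        # "/a/b/c" -> ("a", "b", "c"); empty components are ignored.
        return tuple(part for part in name.split("/") if part)

    def publish(self, name, node):
        """Record that `node` holds the data called `name`."""
        self._holders.setdefault(self._split(name), set()).add(node)

    def discover(self, prefix):
        """Return {name: sorted holders} for names under `prefix`.

        Matching is per name component, so "/factory1" does not match
        "/factory10/...".
        """
        wanted = self._split(prefix)
        matches = {}
        for components, nodes in self._holders.items():
            if components[:len(wanted)] == wanted:
                matches["/" + "/".join(components)] = sorted(nodes)
        return matches


# Invented example names and nodes.
registry = NameRegistry()
registry.publish("/factory1/line3/vibration/2018-10-22", "edge-node-a")
registry.publish("/factory1/line3/vibration/2018-10-22", "edge-node-b")
registry.publish("/factory1/line3/temperature/2018-10-22", "edge-node-a")
registry.publish("/factory10/line1/vibration/2018-10-22", "edge-node-c")

found = registry.discover("/factory1/line3")
for name, nodes in sorted(found.items()):
    print(name, nodes)
```

   The same named data can be held by several nodes (cached or copied
   en route), which is why each name maps to a set of holders rather
   than to a single address.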

6.  Edge Data Discovery

   How can we discover aggregated data on the edge and make use of it?
   There are proprietary implementations that collect data from various
   databases and consolidate it for evaluation.  We need a standard
   protocol set for doing this data discovery, on the device or
   infrastructure edge, in order to meet the requirements of many use
   cases.  We will have terabytes of data on the edge and need a way to
   identify its existence and find the desired data.  Users need to
   search for specific data in a data set and evaluate it using their
   own tools.  The tools are outside the scope of this document, but
   the discovery of that data is in scope.
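
   One way to picture the discovery step, separate from the evaluation
   tools, is a catalog of data descriptors that can be searched by
   attribute.  The Python sketch below is a hedged illustration under
   that assumption: the DataDescriptor and EdgeCatalog names, the
   attribute keys and the sample entries are hypothetical, and no
   standard protocol or schema is implied.

```python
from dataclasses import dataclass, field


@dataclass
class DataDescriptor:
    """Metadata about one data set held somewhere at the edge."""
    name: str
    location: str          # node currently holding the data
    edge_tier: str         # "device" or "infrastructure"
    attributes: dict = field(default_factory=dict)


class EdgeCatalog:
    """In-memory catalog; a real system would be distributed."""

    def __init__(self):
        self._entries = []

    def register(self, descriptor):
        self._entries.append(descriptor)

    def find(self, edge_tier=None, **attributes):
        """Return descriptors matching the tier and every attribute."""
        results = []
        for entry in self._entries:
            if edge_tier is not None and entry.edge_tier != edge_tier:
                continue
            if all(entry.attributes.get(key) == value
                   for key, value in attributes.items()):
                results.append(entry)
        return results


# Invented sample entries.
catalog = EdgeCatalog()
catalog.register(DataDescriptor(
    "elevator-7-vibration", "gateway-1", "device",
    {"sensor": "vibration", "site": "tower-a"}))
catalog.register(DataDescriptor(
    "elevator-7-video", "edge-dc-2", "infrastructure",
    {"sensor": "video", "site": "tower-a"}))
catalog.register(DataDescriptor(
    "elevator-9-vibration", "edge-dc-2", "infrastructure",
    {"sensor": "vibration", "site": "tower-b"}))

hits = catalog.find(sensor="vibration")
print([(d.name, d.location) for d in hits])
```

   The query returns only where the matching data is located; fetching
   and evaluating it is left to the user's own tools, in line with the
   scope stated above.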

7.  Use Cases of edge data discovery

   1.  Autonomous Vehicles

   Description: Autonomous vehicles rely on the processing of huge
   amounts of complex data in real-time for fast and accurate decisions.
   These vehicles will rely on high performance compute, storage and
   network resources to process the volumes of data they produce in a
   low latency way.  Various systems will need a standard way to
   discover the pertinent data for decision making.

   2.  Video Surveillance

   Description: The majority of video surveillance footage will remain
   at the edge infrastructure (not sent to the cloud data center).
   This footage comes from vehicles, factories, hotels, universities,
   farms, etc.  Much of the video footage will not be interesting to
   those evaluating the data.  A mechanism, perhaps a set of protocols,
   is needed to identify the interesting data at the edge.  The data
   will be in storage systems or in flight in networking equipment.

   3.  Elevator Networks

   Description: Elevators are one of many industrial applications of
   edge computing.  Edge equipment receives data from hundreds of
   elevator sensors.  The data coming into the edge equipment includes
   vibration, temperature, speed, level, video, etc.  We need the
   ability to identify where the data we need to evaluate is located.

8.  IANA Considerations

   N/A

9.  Security Considerations

   Security considerations will be a critical component of edge data
   discovery, particularly as intelligence is moved to the extreme edge
   where data is to be extracted.

10.  Acknowledgement

11.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

Authors' Addresses

   Mike McBride
   Huawei

   Email: michael.mcbride@huawei.com

   Dirk Kutscher
   Huawei

   Email: dirk.kutscher@huawei.com

   Eve Schooler
   Intel

   Email: eve.m.schooler@intel.com

   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   Leganes, Madrid  28911
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/
