Internet Engineering Task Force             R. Mekuria
Internet-Draft                              Unified Streaming B.V.
Expires: November 7, 2018

Intended status: Best Current Practice      May 7, 2018

          Live Media and Metadata Ingest Protocol
             draft-mekuria-mmediaingest-00.txt

Abstract

   This Internet draft presents a protocol specification for 
   ingesting live media and metadata content from a 
   live media source such as a live encoder towards a media 
   processing entity or content delivery network. 
   It defines the media format usage, the preferred transmission 
   methods and the handling of failovers and redundancy. 
   The live media considered includes high quality encoded 
   audiovisual content. The timed metadata supported 
   includes timed graphics, captions, subtitles and 
   metadata markers and information. This protocol can, 
   for example, be used in advanced live streaming workflows 
   that combine high quality live encoders and advanced 
   media processing entities. The specification follows
   best current industry practice.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as 
   reference material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Media Ingest Protocol Behavior
   4.  Formatting Requirements for Timed Text, Captions and Subtitles
   5.  Formatting Requirements for Timed Metadata Markers
   6.  Guidelines for Handling of Media Processing Entity Failover
   7.  Guidelines for Handling of Live Media Source Failover
   8.  Security Considerations
   9.  IANA Considerations
   10. Contributors
   11. References
     11.1.  Normative References
     11.2.  Informative References
     11.3.  URL References
   Author's Address
 
1.  Introduction

   This specification describes a protocol for media ingest from 
   a live source (e.g. live encoder) towards media processing 
   entities. Examples of media processing entities
   include media packagers, publishing points, streaming origins, 
   content delivery networks and others. In particular, we 
   distinguish active media processing entities and passive media 
   processing entities. Active media processing entities perform 
   media processing such as encryption, packaging, changing (parts of)
   the media content and deriving additional information. Passive 
   media processing entities provide pass through functionality 
   and/or delivery and caching functions that do not alter the media 
   content itself. An example of a passive media processing entity
   could be a content delivery network (CDN) that provides
   functionalities for the delivery of the content. 
   An example of an active media processing entity could 
   be a just-in-time packager or a just-in-time transcoder.

   Diagram 1: Example workflow with media ingest
   Live Media Source -> Media processing entity -> CDN -> End User
   
   Diagram 1 shows the workflow with a live media ingest from a 
   live media source towards a media processing entity. The media 
   processing entity provides additional processing such as 
   content stitching, encryption, packaging, manifest generation, 
   transcoding etc. Such setups are beneficial for advanced 
   media delivery. The ingest described in this draft includes 
   the latest technologies and standards used in the industry 
   such as timed metadata, captions, timed text and encoding 
   standards such as HEVC [HEVC]. The media ingest protocol 
   specification and associated requirements were discussed 
   with stakeholders, including broadcasters, live encoder vendors,
   content delivery networks, telecommunications companies 
   and cloud service providers. This draft specification 
   has been extensively discussed and reviewed by these 
   stakeholders and represents current best practice. 
   Nevertheless, the current draft solely reflects the 
   point of view of its authors, taking the feedback 
   received from these stakeholders into account. Some 
   insights into the discussions leading to this draft 
   can be found at [fmp4git].
   
2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].
   
   This specification uses the following additional terminology.
   ISOBMFF: the ISO Base Media File Format specified in [ISOBMFF].
   ftyp: the file type and compatibility "ftyp" box as described 
         in the ISOBMFF [ISOBMFF] that describes the "brand".
   moov: the container box for all metadata "moov" described in the 
         ISOBMFF base media file format [ISOBMFF].
   moof: the movie fragment "moof" box as described in the 
         ISOBMFF base media file format [ISOBMFF] that describes 
         the metadata of a fragment of media.
   mdat: the media data container "mdat" box contained in 
         an ISOBMFF file [ISOBMFF]; this box contains the 
         compressed media samples.
   kind: the track kind box defined in the ISOBMFF [ISOBMFF] 
         to label a track with its usage.
   mfra: the movie fragment random access "mfra" box defined in 
         the ISOBMFF [ISOBMFF] to signal random access samples 
         (these are samples that require no prior 
         or other samples for decoding).
   tfdt: the TrackFragmentBaseMediaDecodeTimeBox "tfdt" 
         defined in the base media file format [ISOBMFF], used
         to signal the decode time of the media 
         fragment signalled in the moof box.
   mdhd: The media header box "mdhd" as defined in [ISOBMFF]; 
         this box contains information about the media such 
         as timescale, duration and language using ISO 639-2/T 
         codes [ISO639-2].

   pssh: The protection specific system header "pssh" box defined 
         in [CENC] that can be used to signal the content protection 
         information according to MPEG Common Encryption (CENC).
   sinf: Protection scheme information box "sinf" defined in [ISOBMFF] 
         that provides information on the encryption 
         scheme used in the file.
   elng: extended language box "elng" defined in [ISOBMFF] that 
         can override the language information.
   nmhd: The null media header box "nmhd" as defined in [ISOBMFF] 
         to signal a track for which no specific 
         media header is defined, often used for metadata tracks.
   HTTP: HyperText Transfer Protocol, 
                version 1.1 as specified by [RFC2616]
   HTTP POST: Command used in the HyperText Transfer Protocol for 
              sending data from a source to a destination [RFC2616]
   fragmentedMP4stream: stream of [ISOBMFF] fragments 
           (moof and mdat); see Section 3 for the definition
   POST_URL: Target URL of a POST command in the HTTP protocol 
             for pushing data from a source to a destination.
   TCP: Transmission Control Protocol (TCP) as defined in [RFC793]
   URI_SAFE_IDENTIFIER: identifier/string 
          formatted according to [RFC3986]
   Connection: connection setup between a host and a source.
   Live stream event: the total media broadcast stream of the ingest.
   (Live) encoder: entity performing live encoding and producing 
   a high quality encoded stream, can serve as media ingest source
   (Media) ingest source: a media source ingesting media content, 
   typically a live encoder, but not restricted to this; the media 
   ingest source could be any type of source, such as a stored 
   file that is sent in partial chunks
   Publishing point: entity used to publish the media content, 
   consumes/receives the incoming media ingest stream
   Media processing entity: entity used to process media content, 
   receives/consumes a media ingest stream.
   Media processing function: synonym for media processing entity

3.  Media Ingest Protocol Behavior

   The specification uses multiple HTTP POST and/or PUT requests 
   to transmit an optional manifest followed by encoded media data 
   packaged in fragmented [ISOBMFF]. The subsequent posted segments 
   correspond to those described in the manifest. Each HTTP POST sends 
   a complete manifest or media segment towards the processing entity. 
   The sequence of POST commands starts with the manifest and init 
   segments, which include the header boxes (ftyp and moov boxes). 
   It continues with the sequence of segments 
   (combinations of moof and mdat boxes).

   An example of a POST URL targeting the publishing point is:
   http://HostName/presentationPath/manifestpath
                  /rsegmentpath/Identifier

   The PostURL syntax is defined as follows, using
   IETF RFC 5234 ABNF [RFC5234] to specify the structure.

   PostURL = Protocol "://" BroadcastURL "/" Identifier
   Protocol = "http" / "https"
   BroadcastURL = HostName "/" PresentationPath
                  [ "/" ManifestPath ] [ "/" RSegmentPath ]
   HostName = URI_SAFE_IDENTIFIER
   PresentationPath = URI_SAFE_IDENTIFIER
   ManifestPath = URI_SAFE_IDENTIFIER
   RSegmentPath = URI_SAFE_IDENTIFIER
   Identifier = segment_file_name
   
   In this PostURL the HostName is typically the hostname of the 
   media processing entity or publishing point. The presentation path 
   is the path to the specific presentation at the publishing point. 
   The manifest path can be used to signal the specific manifest of 
   the presentation. The rsegmentpath can be a different optional 
   extended path based on the relative paths in the manifest file. 
   The identifier describes the filename of the segment as described 
   in the manifest. The live source sender first sends the manifest 
   to the path http://hostname/presentationpath/, allowing 
   the receiving entity to set up reception paths for the following 
   segments and manifests. In case no manifest is used, any POST_URL
   set up for media ingest, such as http://hostname/presentationpath/,
   can be used. The fragmentedMP4stream can be defined
   using IETF RFC 5234 ABNF [RFC5234] as follows.
   
   fragmentedMP4stream = headerboxes fragments
   headerboxes = ftyp moov
   fragments = 1*fragment
   fragment = moof mdat
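
   As an informal illustration of this grammar (not part of the
   protocol itself), the following Python sketch parses the
   top-level boxes of a byte stream and checks that it matches the
   fragmentedMP4stream structure above. It is a stricter check than
   a real receiver would apply: it handles plain 32-bit and 64-bit
   box sizes, but will reject optional boxes such as "styp" that
   some encoders emit.

   import struct

   def iter_boxes(data: bytes):
       """Yield (box_type, payload) for each top-level ISOBMFF box."""
       pos = 0
       while pos + 8 <= len(data):
           size, btype = struct.unpack(">I4s", data[pos:pos + 8])
           header = 8
           if size == 1:  # 64-bit "largesize" follows the box type
               size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
               header = 16
           if size < header or pos + size > len(data):
               raise ValueError("truncated or malformed box at %d" % pos)
           yield btype.decode("ascii"), data[pos + header:pos + size]
           pos += size

   def check_fragmented_mp4_stream(data: bytes) -> None:
       """Check: fragmentedMP4stream = ftyp moov 1*(moof mdat)."""
       types = [t for t, _ in iter_boxes(data)]
       if types[:2] != ["ftyp", "moov"]:
           raise ValueError("stream must start with ftyp and moov")
       body = types[2:]
       pairs = [body[i:i + 2] for i in range(0, len(body), 2)]
       if not pairs or any(p != ["moof", "mdat"] for p in pairs):
           raise ValueError("header boxes must be followed by "
                            "(moof mdat) pairs")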
    
   The communication between the live encoder/media ingest source 
   and the receiving media processing entity is subject to the 
   following requirements.
   
   1. The live encoder or ingest source communicates to 
      the publishing point/processing entity using the HTTP 
      POST method as defined in the HTTP protocol [RFC2616],
      or, in the case of manifest updates, the HTTP PUT method. 
   2. The live encoder or ingest source SHOULD start
      by sending an HTTP POST request with an empty "body"
      (zero content length) to the POST_URL. 
      This can help the live encoder or media 
      ingest source to quickly detect whether the 
      live ingest publishing point is valid, 
      and whether there are any authentication or other 
      conditions required (see also the sketch following 
      this list of requirements). 
   3. The live encoder/media source SHOULD use secured 
      transmission using the HTTPS protocol 
      as specified in [RFC2818] for connecting 
      to the receiving media processing entity 
      or publishing point. 
   4. In case the HTTPS protocol is used, 
      basic authentication HTTP AUTH [RFC7617] 
      or better methods like 
      TLS client certificates SHOULD be used to 
      secure the connection.  
   5. As a compatibility profile for the TLS encryption 
      we recommend the Mozilla 
      intermediate compatibility profile, which is supported 
      in many available implementations [MozillaTLS]. 
   6. Before sending the segments 
      based on the fragmentedMP4stream, the live encoder/source
      MAY send a manifest 
      with the following limitations/constraints.
   6a. Only relative URL paths are used for each segment.
   6b. Only unique paths are used for each new presentation.
   6c. In case the manifest contains these relative paths, 
      these paths MAY be used in combination with the 
      POST_URL + relative URLs 
      to POST each of the different segments from 
      the live encoder or ingest source 
      to the processing entity. 
   6d. In case the manifest contains no relative paths, 
      or no manifest is used, the 
      segments SHOULD be posted to the original 
      POST_URL specified by the service. 
   6e. In this case the "tfdt" and track IDs MAY 
       be used by the processing entity 
       to distinguish incoming segments 
       instead of the target POST_URL.

  7. The live encoder MAY send an updated version of the manifest; 
     this manifest cannot override current settings and relative 
     paths or break currently running and incoming POST requests. 
     The updated manifest can only be slightly different from 
     the one that was sent previously, e.g. it can introduce newly 
     available segments or event messages. The updated manifest 
     SHOULD be sent using a PUT request instead of a POST request.
          
     Note: this manifest will be useful mostly for passive media 
           processing entities; for ingest towards active media 
           processing entities this manifest could be avoided, as 
           the information is signalled through the boxes 
           available in the ISOBMFF.
    
  8. The encoder or ingest source MUST handle any error or failed 
     authentication responses received from the media processing 
     entity, such as 403 (forbidden), 400 (bad request), 415 
     (unsupported media type) and 412 (condition not fulfilled).
  9. In case of a 412 (condition not fulfilled) or 415 
      (unsupported media type) response, 
      the live source/encoder MUST resend the init segment 
      consisting of the "ftyp" and "moov" boxes.
  10. The live encoder or ingest source SHOULD start 
      a new HTTP POST segment request sequence with the 
      init segment including the header boxes "ftyp" and "moov".
  11. Subsequent media segment requests SHOULD correspond 
      to the segments listed in the manifest, if a manifest was sent. 
  12. The payload of each request MAY start with the header boxes 
      "ftyp" and "moov", followed by segments which consist of 
      a combination of "moof" and "mdat" boxes. 
      
       Note that the "ftyp" and "moov" boxes (in this order) MAY be 
       transmitted with each request, especially if the encoder must 
       reconnect because the previous POST request was terminated 
       prior to the end of the stream with a 412 or 
       415 message. Resending the "ftyp" and "moov" boxes 
       allows the receiving entity to recover the init segment 
       and the track information needed for interpreting the content.
  13. The encoder or ingest source MAY use the chunked transfer 
      encoding option of the HTTP POST command [RFC2616] for uploading 
      as it might be difficult to predict the entire content length 
      of the segment. This can be used for example to support use 
      cases that require low latency.
  14. The encoder or ingest source SHOULD use individual HTTP POST 
      commands [RFC2616] for uploading media segments when ready. 
  15. If the HTTP POST request terminates or times out with a TCP 
      error prior to the end of the stream, the encoder MUST issue 
      a new POST request by using a new connection, and follow the 
      preceding requirements. Additionally, the encoder MAY resend 
      the previous two segments that were already sent.
  16. In case fixed length POST commands are used, the live source 
      entity MUST entirely resend the segment to be posted as 
      described in the manifest, together with the init segment 
      consisting of the "ftyp" and "moov" boxes, in case of 
      HTTP 400, 412 or 415 responses.
  17. In case the live stream event is over, the live media 
      source/encoder SHOULD signal 
      the stop by transmitting an empty "mfra" box 
      towards the publishing point/processing entity.
  18. The TrackFragmentBaseMediaDecodeTimeBox "tfdt"
      MUST be present for each segment posted.
  19. The ISOBMFF media fragment duration SHOULD be constant, 
      to reduce the size of the client manifests. 
      A constant MPEG-4 fragment duration also improves client 
      download heuristics through the use of repeat tags. 
      The duration MAY fluctuate to compensate 
      for non-integer frame rates. By choosing an appropriate 
      timescale (a multiple of the frame rate is recommended) 
      this issue can be avoided.
  20. The MPEG-4 fragment duration SHOULD be between 
      approximately 2 and 6 seconds.
  21. The fragment decode timestamps "tfdt" of fragments in the 
      fragmentedMP4stream and their base_media_decode_time values 
      SHOULD arrive in increasing order for each of the different 
      tracks/streams that are ingested.
  22. The segments formatted as a fragmented MP4 stream SHOULD use 
      a timescale for video streams based on the framerate 
      and 44.1 kHz or 48 kHz for audio streams, 
      or any other timescale that enables integer 
      increments of the decode times of 
      fragments signalled in the "tfdt" box based on this scale.
  23. The manifest MAY be used to signal the language of the stream, 
      which SHOULD also be signalled in the "mdhd" box or "elng" boxes
      in the init segment and/or moof headers ("mdhd").
  24. The manifest SHOULD be used to signal encryption specific 
      information, which SHOULD also be signalled in the "pssh", 
      "schm" and "sinf" boxes in 
      the init segment and media segments.
  25. The manifest SHOULD be used to signal information 
      about the different 
      tracks, such as the durations, media encoding types and 
      content types, which SHOULD also be signalled in the 
      "moov" box in the init segment or the "moof" box 
      in the media segments.
  26. The manifest SHOULD be used to signal information 
      about the timed text, images and subtitles in adaptation 
      sets, and this information SHOULD also be signalled 
      in the "moov" box in the init segment; 
      for more information see the next section.
  27. Segments posted towards the media processing entity MUST 
      contain the bitrate "btrt" box specifying the target bitrate 
      of the segments, the "tfdt" box specifying the fragment's 
      decode time and the "tfhd" box specifying the track ID.
  28. The live encoder/media source SHOULD repeatedly resolve 
      the hostname to adapt to changes in the IP to hostname 
      mapping, for example by using the Domain Name System 
      DNS [RFC1035] or any other system that is in place.
  29. The live encoder/media source MUST update the IP to hostname 
      resolution respecting the TTL (time to live) from DNS 
      query responses. This will enable better resilience 
      to changes of the IP address in large scale deployments 
      where the IP address of the publishing point or media 
      processing nodes may change frequently.
  30. To support the ingest of live events with low latency, 
      shorter segment and fragment durations MAY be used 
      such as segments with a duration of 1 second.
  31. The live encoder/media source SHOULD use a separate TCP
      connection for the ingest of each different bit-rate 
      track that is ingested.
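
   To make the request sequence of requirements 1-16 concrete, the
   following Python sketch outlines a minimal ingest client using
   the third-party requests library. It is illustrative only: the
   endpoint, the segment naming and the helper structure are
   hypothetical, and a production encoder would add the manifest
   handling, authentication and failover behavior described in
   this draft.

   import requests

   POST_URL = "https://ingest.example.com/presentationpath/"

   def ingest(init_segment: bytes, segments):
       """Probe the publishing point, send the init segment, then
       POST each (segment_file_name, moof+mdat bytes) pair."""
       session = requests.Session()

       # Requirement 2: empty-body POST to validate the publishing
       # point and surface authentication problems early.
       session.post(POST_URL, data=b"").raise_for_status()

       # Requirement 10: start the sequence with the init segment
       # ("ftyp" + "moov"); the file name here is a placeholder.
       session.post(POST_URL + "init.mp4",
                    data=init_segment).raise_for_status()

       for name, segment in segments:
           resp = session.post(POST_URL + name, data=segment)
           if resp.status_code in (400, 412, 415):
               # Requirements 9 and 16: resend the init segment,
               # then resend the segment itself.
               session.post(POST_URL + "init.mp4", data=init_segment)
               resp = session.post(POST_URL + name, data=segment)
           resp.raise_for_status()

   Passing a generator instead of a bytes object as the request
   body would make this client use chunked transfer encoding, as
   permitted by requirement 13.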

4. Formatting Requirements for Timed Text, Captions and Subtitles

The specification supports ingest of timed text, 
images, captions and subtitles. We follow the normative 
reference [MPEG-4-30] in this section. 

  1. The tracks containing timed text, images, captions 
  or subtitles MAY be signalled in the manifest by 
  an adaptation set with the different segments 
  containing the data of the track.  
  2. The segment data MAY be posted to the URL 
  corresponding to the path in the manifest for the segment; 
  else it MUST be posted towards the original POST_URL.
  3. The track will be a sparse track signalled by a null media 
  header "nmhd" containing the timed text, images or captions, 
  corresponding to the recommendation for storing tracks in 
  fragmented MPEG-4 [CMAF].
  4. Based on this recommendation the track handler "hdlr" SHALL 
  be set to "text" for WebVTT and "subt" for TTML.
  5. In case TTML is used the track MUST use the XMLSampleEntry 
  to signal the sample description of the subtitle stream.
  6. In case WebVTT is used the track MUST use the WVTTSampleEntry 
  to signal the sample description of the text stream.
  7. These boxes SHOULD signal the MIME type and specifics as 
  described in [CMAF] sections 11.3, 11.4 and 11.5.
  8. The boxes described in 3-7 MUST be present in the init 
  segment ("ftyp" + "moov") for the given track.
  9. Subtitles in CTA-608 and CTA-708 can be transmitted 
  following the recommendation in section 11.5 of [CMAF] via 
  SEI messages in the video track.
  10. The "ftyp" box in the init segment for the track 
      containing timed text, images, captions and subtitles 
      can use signalling using CMAF profiles based on [CMAF]: 
    
   10a. WebVTT    Specified in 11.2 of ISO/IEC 14496-30 
        [MPEG-4-30]   'cwvt'
   10b. TTML IMSC1 Text   Specified in 11.3.3 of [MPEG-4-30] 
        IMSC1 Text Profile   'im1t'
   10c. TTML IMSC1 Image  Specified in 11.3.4 of [MPEG-4-30] 
        IMSC1 Image Profile  'im1i'
   10d. CEA CTA-608 and CTA-708  Specified in 11.4 of [MPEG-4-30] 
        Caption data is embedded in SEI messages in the video track 
        'ccea'
   11. The segments of the tracks containing timed text, images,
       captions and subtitles SHOULD use the bit-rate box "btrt" to 
       signal the bit-rate of the track in each segment. 
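
   As an informal aid for checking requirements 3-8, the sketch
   below (reusing the Python box-parsing approach sketched in
   Section 3; 64-bit box sizes are omitted for brevity, and a
   well-formed init segment is assumed) walks the nested boxes of
   an init segment and reports the handler type and first sample
   entry of each track, e.g. ("text", "wvtt") for WebVTT or
   ("subt", "stpp") for TTML.

   import struct

   def children(data: bytes):
       """Yield (type, payload) for the boxes directly inside data."""
       pos = 0
       while pos + 8 <= len(data):
           size, btype = struct.unpack(">I4s", data[pos:pos + 8])
           yield btype, data[pos + 8:pos + size]
           pos += size

   def find(data: bytes, *path):
       """Return the payload of the first box matching a box path."""
       for btype, payload in children(data):
           if btype == path[0]:
               return payload if len(path) == 1 else find(payload, *path[1:])
       return None

   def describe_tracks(init_segment: bytes):
       """Yield (handler_type, sample_entry_type) for each track."""
       moov = find(init_segment, b"moov")
       for btype, trak in children(moov):
           if btype != b"trak":
               continue
           hdlr = find(trak, b"mdia", b"hdlr")
           stsd = find(trak, b"mdia", b"minf", b"stbl", b"stsd")
           yield (hdlr[8:12].decode("ascii"),   # e.g. "text" or "subt"
                  stsd[12:16].decode("ascii"))  # e.g. "wvtt" or "stpp"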
 
5. Formatting Requirements for Timed Metadata Markers

  This section discusses the specific formatting requirements 
  for ingest of timed metadata related to events and markers for 
  ad-insertion, or other timed metadata relating to the media
  content, such as information about the content. 
  When delivering a live streaming presentation with a rich 
  client experience, it is often necessary to transmit time-synced 
  events, metadata or other signals in-band with the main 
  media data. An example of these are opportunities for dynamic 
  live ad insertion signalled by SCTE-35 markers. This type of 
  event signalling is different from regular audio/video streaming 
  because of its sparse nature. In other words, the signalling data 
  usually does not happen continuously, and the interval can 
  be hard to predict. Examples of timed metadata are ID3 tags 
  [ID3v2], SCTE-35 markers [SCTE-35] and DASH emsg 
  messages defined in section 5.10.3.3 of [DASH]. For example, 
  DASH event messages contain a schemeIdUri that defines 
  the payload of the message. Table 1 provides some 
  example schemes in DASH event messages and Table 2 
  illustrates an example of a SCTE-35 marker stored 
  in a DASH emsg. The presented approach allows ingest of 
  timed metadata from different sources, 
  possibly at different locations, by embedding it in 
  sparse metadata tracks.

Table 1: Example DASH emsg scheme URIs

Scheme URI               | Reference
-------------------------|------------------
urn:mpeg:dash:event:2012 | [DASH], 5.10.4
urn:dvb:iptv:cpm:2014    | [DVB-DASH], 9.1.2.1 
urn:scte:scte35:2013:bin | [SCTE-35] 14-3 (2015), 7.3.2
www.nielsen.com:id3:v1   | Nielsen ID3 in MPEG-DASH

Table 2: Example of a SCTE-35 marker embedded in a DASH emsg

Tag                     |          Value
------------------------|-----------------------------
scheme_uri_id           | "urn:scte:scte35:2013:bin"
Value                   | the value of the SCTE 35 PID
Timescale               | positive number
presentation_time_delta | non-negative number expressing splice time
                        | relative to tfdt
event_duration          | duration of event;
                        | "0xFFFFFFFF" indicates unknown duration
Id                      | unique identifier for message
message_data            | splice info section including CRC
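
  To illustrate Table 2, the Python sketch below assembles a
  version 0 "emsg" box carrying a binary SCTE-35 payload; the
  field order follows section 5.10.3.3 of [DASH]. All field
  values and the message payload in the example are placeholders.

   import struct

   def build_emsg_v0(scheme_id_uri: str, value: str, timescale: int,
                     presentation_time_delta: int, event_duration: int,
                     event_id: int, message_data: bytes) -> bytes:
       """Serialize a version 0 DASH Event Message box ("emsg")."""
       body = (
           b"\x00\x00\x00\x00"                        # version = 0, flags = 0
           + scheme_id_uri.encode("utf-8") + b"\x00"  # null-terminated string
           + value.encode("utf-8") + b"\x00"          # null-terminated string
           + struct.pack(">IIII", timescale, presentation_time_delta,
                         event_duration, event_id)
           + message_data                             # splice info incl. CRC
       )
       return struct.pack(">I4s", 8 + len(body), b"emsg") + body

   # A splice signal 90000 ticks (1 s at a 90 kHz timescale) after the
   # "tfdt" of the carrying fragment; the payload bytes are fictional.
   marker = build_emsg_v0("urn:scte:scte35:2013:bin", "1001", 90000,
                          90000, 0xFFFFFFFF, 1, b"\xfc...")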

  The following steps are recommended for timed metadata 
  ingest related to events, tags, ad markers and 
  program information:
  1. Create a fragmentedMP4stream that contains only a sparse 
   metadata track, i.e. a track without audio/video.
  2. Metadata tracks MAY be signalled in a manifest using an 
   adaptation set with a sparse track; the actual data 
   is in the sparse media track in the segments.
  3. For a metadata track the media handler type is "meta" 
   and the track's media header box is the null media header 
   box "nmhd".
  4. The URIMetaSampleEntry entry contains, in a URIbox, 
     the URI following the URI syntax in [RFC3986] defining the 
     form of the metadata (see the ISO base media file format 
     specification [ISOBMFF]). For example, for ID3 tags [ID3v2] 
     the URIBox could contain the URL http://www.id3.org.
  5. For the case of ID3, a sample contains a single ID3 tag. 
     The ID3 tag may contain one or more ID3 frames.
  6. For the case of DASH emsg, a sample may contain 
     one or more event message ("emsg") boxes.  
     Version 0 event message boxes SHOULD be used. 
     The presentation_time_delta field is relative to the absolute 
     timestamp specified in the TrackFragmentBaseMediaDecodeTimeBox 
     ("tfdt"). The timescale field SHOULD match the value specified 
     in the media header box "mdhd".
  7. For the case of a DASH emsg, the kind box 
     (contained in the udta box) MUST be used to signal 
     the scheme URI of the type of metadata.
  8. A BitRateBox ("btrt") SHOULD be present at the end of 
     MetaDataSampleEntry to signal the bit rate information 
     of the stream.
  9. If the specific format uses internal timing values, 
     then the timescale MUST match the timescale field set 
     in the media header box "mdhd".
  10. All Timed Metadata samples are sync samples [ISOBMFF], 
    defining the entire set of metadata for the time interval 
    they cover. Hence, the sync sample table box is not present.
  11. When Timed Metadata is stored in a TrackRunBox ("trun"), 
     a single sample is present with the duration set to the 
     duration for that run.
  
  Given the sparse nature of the signalling event, the following 
  is recommended:
  12. At the beginning of the live event, the encoder or 
      media ingest  source sends the initial header boxes to 
      the processing entity/publishing point, 
      which allows the service to register the sparse track.
  13. When sending segments, the encoder SHOULD start sending 
      from the header boxes, followed by the new fragments. 
  14. The sparse track segment becomes available to the 
     publishing point/processing entity when the corresponding 
     parent track fragment that has an equal or larger timestamp 
     value is made available (see the sketch after this list). 
     For example, if the sparse fragment 
     has a timestamp of t=1000, it is expected that after the 
     publishing point/processing entity sees "video" 
     (assuming the parent track name is "video") 
     fragment timestamp 1000 or beyond, it can retrieve the 
     sparse fragment t=1000. Note that the actual 
     signal could be used for a different position 
     in the presentation timeline for its designated purpose. 
     In this example, it is possible that the sparse fragment 
     of t=1000 has an XML payload, which is for inserting 
     an ad in a position that is a few seconds later.
  15. The payload of sparse track fragments can be in 
     different formats (such as XML, text, or binary), 
     depending on the scenario.
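
  A minimal sketch of the availability rule in step 14, assuming
  the processing entity tracks the highest "tfdt" value seen on
  the parent track (the class and method names here are
  illustrative, not part of the protocol):

   class SparseTrackGate:
       """Release sparse fragments once the parent track catches up."""

       def __init__(self):
           self.parent_time = 0   # highest tfdt seen on the parent track
           self.pending = []      # (timestamp, fragment) awaiting release

       def on_sparse_fragment(self, tfdt: int, fragment: bytes):
           """Queue a sparse fragment until the parent track reaches it."""
           self.pending.append((tfdt, fragment))

       def on_parent_fragment(self, tfdt: int):
           """Called per parent fragment; returns releasable fragments."""
           self.parent_time = max(self.parent_time, tfdt)
           ready = [f for t, f in self.pending if t <= self.parent_time]
           self.pending = [(t, f) for t, f in self.pending
                           if t > self.parent_time]
           return ready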

6. Guidelines for Handling of Media Processing Entity Failover

  Given the nature of live streaming, good failover support is 
  critical for ensuring the availability of the service. 
  Typically, media services are designed to handle various types 
  of failures, including network errors, server errors, and storage 
  issues. When used in conjunction with proper failover 
  logic from the live encoder side, customers can achieve 
  a highly reliable live streaming service from the cloud. 
  In this section, we discuss service failover scenarios. 
  In this case, the failure happens somewhere within the service, 
  and it manifests itself as a network error. Here are some 
  recommendations for the encoder implementation for handling 
  service failover:
  1. Use a 10-second timeout for establishing the
     TCP connection. 
     If an attempt to establish the connection takes longer 
     than 10 seconds, abort the operation and try again.
  2. Use a short timeout for sending the HTTP requests. 
     If the target segment duration is N seconds, use a send 
     timeout between N and 2N seconds; for example, if 
     the segment duration is 6 seconds, 
     use a timeout of 6 to 12 seconds. 
     If a timeout occurs, reset the connection, 
     open a new connection, 
     and resume stream ingest on the new connection. 
     This is needed to avoid latency introduced
     by failing connectivity in the workflow.
  3. Completely resend segments from the ingest source 
     for which a connection was terminated early.
  4. We recommend that the encoder or ingest source 
     does NOT limit the number of retries to establish a
     connection or resume streaming after a TCP error occurs.
  5. After a TCP error:
     a. The current connection MUST be closed, 
        and a new connection MUST be created 
        for a new HTTP POST request.
     b. The new HTTP POST URL MUST be the same 
        as the initial POST URL for the 
        segment to be ingested.
     c. The new HTTP POST MUST include stream 
        headers ("ftyp" and "moov" boxes) that are 
        identical to the stream headers in the 
        initial POST request for fragmented media ingest.
     d. The last two fragments sent for each segment 
        MAY be retransmitted. Other ISOBMFF fragment 
        timestamps MUST increase continuously, 
        even across HTTP POST requests.
  6. The encoder or ingest source SHOULD terminate 
     the HTTP POST request if data is not being sent 
     at a rate commensurate with the MP4 segment duration. 
     An HTTP POST request that does not send data can 
     prevent publishing points or media processing entities 
     from quickly disconnecting from the live encoder or 
     media ingest source in the event of a service update. 
     For this reason, the HTTP POST for sparse (ad signal) 
     tracks SHOULD be short-lived, terminating as soon as 
     the sparse fragment is sent. 
   In addition, this draft defines responses to the 
   POST requests in order to signal its status to the live 
   media source (see the sketch after this list).
   7. In case the media processing entity cannot process the manifest 
      or segment POST request due to authentication or permission 
      problems, it can return a permission denied HTTP 403.
   8. In case the media processing entity can process the manifest 
      or segment POSTED to the POST_URL, it returns HTTP 200 OK or 
      202 Accepted.
   9. In case the media processing entity can process 
      the manifest or segment POST request but finds 
      the media type cannot be supported, it returns HTTP 415 
      unsupported media type.
   10. In case an unknown error happened during
       the processing of the HTTP 
       POST request, an HTTP 400 Bad Request is returned.
   11. In case the media processing entity cannot 
       process a segment posted 
       due to a missing init segment, an HTTP 412 
       unfulfilled condition 
       is returned.
   12. In case a media source receives an HTTP 412 response, 
       it SHOULD resend the manifest and the "ftyp" and "moov" 
       boxes for the track.
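
   The sketch below summarizes how a live media source might react
   to these responses and to the timeouts of steps 1 and 2. It is
   a sketch only, built on the third-party requests library; the
   helper names are hypothetical and it is not a normative state
   machine.

   import time
   import requests

   CONNECT_TIMEOUT = 10                  # step 1: 10-second connect timeout

   def new_session():
       """Open a fresh HTTP session (a new connection, step 5a)."""
       return requests.Session()

   def post_with_failover(session, url, segment, init_segment,
                          segment_seconds):
       """Send one segment, applying the failover rules above.
       Returns (session, response); the session may have been reset."""
       send_timeout = 2 * segment_seconds  # step 2: between N and 2N seconds
       while True:                         # step 4: do not limit retries
           try:
               resp = session.post(url, data=segment,
                                   timeout=(CONNECT_TIMEOUT, send_timeout))
           except requests.RequestException:
               session = new_session()     # reset and reopen the connection
               continue                    # step 3: resend the whole segment
           if resp.status_code in (200, 202):   # rule 8: accepted
               return session, resp
           if resp.status_code == 403:          # rule 7: do not retry
               raise PermissionError("authentication/permission denied")
           if resp.status_code in (400, 412, 415):
               # rules 9-12: resend the init segment, then retry
               session.post(url, data=init_segment,
                            timeout=(CONNECT_TIMEOUT, send_timeout))
               continue
           time.sleep(1)                   # brief back-off on anything else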

An example of media ingest with failure and HTTP 
responses is shown in the following figure:

||===============================================================||
||=====================            ============================  ||
||| live media source |            |  Media processing entity |  ||
||=====================            ============================  ||
||        ||                                     ||              ||
||===============Initial Manifest Sending========================||
||        ||                                     ||              ||
||        ||-- POST /prefix/media.mpd  -------->>||              ||
||        ||          Success                    ||              ||
||        || <<------ 200 OK --------------------||              ||
||        ||      Permission Denied              ||              ||
||        || <<------ 403 Forbidden -------------||              ||
||        ||             Bad Request             ||              ||
||        || <<------ 400 Bad Request -----------||              ||
||        ||         Unsupported Media Type      ||              ||
||        || <<------ 415 Unsupported Media -----||              ||
||        ||                                     ||              ||
||==================== Segment Sending ==========================||
||        ||-- POST /prefix/chunk.cmaf  ------->>||              ||
||        ||          Success                    ||              ||
||        || <<------ 200 OK --------------------||              ||
||        ||          Success/Accepted           ||              ||
||        || <<------ 202 Accepted --------------||              ||
||        ||      Permission Denied              ||              ||
||        || <<------ 403 Forbidden -------------||              ||
||        ||             Bad Request             ||              ||
||        || <<------ 400 Bad Request -----------||              ||
||        ||         Unsupported Media Type      ||              ||
||        || <<------ 415 Unsupported Media -----||              ||
||        ||         Missing Init Segment        ||              ||
||        || <<-- 412 Unfulfilled Condition -----||              ||
||        ||                                     ||              ||
||        ||                                     ||              ||
||=====================            ============================  ||
||| live media source |            |  Media processing entity |  ||
||=====================            ============================  ||
||        ||                                     ||              ||
||===============================================================||

7. Guidelines for Handling of Live Media Source Failover

  Encoder or media ingest source failover is the second type
  of failover scenario that needs to be addressed for end-to-end 
  live streaming delivery. In this scenario, the error condition 
  occurs on the encoder side. The following expectations apply 
  from the live ingestion endpoint when encoder failover happens:
  1. A new encoder or media ingest source instance 
     SHOULD be created to continue streaming.
  2. The new encoder or media ingest source MUST use 
     the same URL for HTTP POST requests as the failed instance.
  3. The new encoder or media ingest source POST request 
     MUST include the same header boxes "moov" 
     and "ftyp" as the failed instance.
  4. The new encoder or media ingest source 
     MUST be properly synced with all other running encoders 
     for the same live presentation to generate synced audio/video 
     samples with aligned fragment boundaries. 
     This implies that UTC timestamps 
     for fragments in the "tfdt" match between encoders, 
     and encoders start running at
     an appropriate segment boundary.
  5. The new stream MUST be semantically equivalent 
     to the previous stream, and interchangeable 
     at the header and media fragment levels.
  6. The new encoder or media ingest source SHOULD 
     try to minimize data loss. The basemediadecodetime "tfdt" 
     of media fragments SHOULD increase from the point where 
     the encoder last stopped. The basemediadecodetime in the 
     "tfdt" box SHOULD increase in a continuous manner, but it 
     is permissible to introduce a discontinuity, if necessary. 
     Media processing entities or publishing points can ignore 
     fragments that they have already received and processed, so 
     it is better to err on the side of resending fragments 
     than to introduce discontinuities in the media timeline. 
     A sketch of deriving a continuous "tfdt" on the new 
     instance follows.
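
  As a sketch of item 6, assuming the replacement instance shares
  the presentation's UTC epoch with the failed instance (the draft
  does not mandate how this epoch is distributed), the next "tfdt"
  can be derived from wall-clock time so that decode times
  continue across the failover:

   import math
   import time

   def next_tfdt(epoch_utc: float, timescale: int,
                 fragment_duration: int) -> int:
       """Return the basemediadecodetime of the next fragment boundary.

       epoch_utc: shared UTC start of the presentation, in seconds;
       timescale: track timescale in ticks per second (e.g. 90000);
       fragment_duration: fragment duration in timescale ticks.
       """
       elapsed_ticks = (time.time() - epoch_utc) * timescale
       # Round up to the next aligned boundary (item 4) so the new
       # instance never reuses a decode time the failed instance
       # already sent.
       boundary = math.ceil(elapsed_ticks / fragment_duration)
       return boundary * fragment_duration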

8.  Security Considerations

   There are no security considerations other than the ones 
   mentioned in the preceding text. Further 
   security considerations will be added 
   when they become known.

9.  IANA Considerations
   
  This memo includes no request to IANA.

10.  Contributors

Arjen Wagenaar, Dirk Griffioen, Unified Streaming B.V.
We thank all of the individual contributors to the discussions 
in [fmp4git] representing major content delivery networks, 
broadcasters, commercial encoders and cloud service providers.
                  
11.  References
   

11.1.  Normative References

    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

    [DASH]  MPEG ISO/IEC JTC1/SC29 WG11, "ISO/IEC 23009-1:2014: 
            Dynamic adaptive streaming over HTTP (DASH) -- Part 1: 
            Media presentation description and segment formats," 2014.
    
    [SCTE-35] Society of Cable Telecommunications Engineers, 
              "SCTE-35 (ANSI/SCTE 35 2013) 
               Digital Program Insertion Cueing Message for Cable," 
               SCTE-35 (ANSI/SCTE 35 2013).
               
    [ISOBMFF] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology  
              -- Coding of audio-visual objects -- Part 12: ISO base 
              media file format," ISO/IEC 14496-12:2012.
  
    [HEVC]    MPEG ISO/IEC JTC1/SC29 WG11, 
              "Information technology -- High efficiency coding 
              and media delivery in heterogeneous environments 
              -- Part 2: High efficiency video coding", 
              ISO/IEC 23008-2:2015, 2015.
        
    [RFC793]  J. Postel, "Transmission Control Protocol,"
               IETF RFC 793, 1981. 
               
    [RFC3986] T. Berners-Lee, R. Fielding, L. Masinter, 
              "Uniform Resource Identifier (URI): Generic Syntax," 
               IETF RFC 3986, 2005.
               
    [RFC1035] P. Mockapetris, 
              "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION" 
              IETF RFC 1035, 1987.
          
    [CMAF]   MPEG ISO/IEC JTC1/SC29 WG11, "Information technology 
             (MPEG-A) -- Part 19: Common media application 
             format (CMAF) for segmented media," 
             ISO/IEC 23000-19, ISO/IEC International Standard.
           
    [RFC5234] D. Crocker, "Augmented BNF for Syntax Specifications: 
              ABNF," IETF RFC 5234, 2008.    
    
    [CENC]   MPEG ISO/IEC JTC1 SC29 WG11 "Information technology -- 
             MPEG systems technologies -- Part 7: Common encryption 
             in ISO base media file format files"
             ISO/IEC 23001-7:2016

    [MPEG-4-30] MPEG ISO/IEC JTC1 SC29 WG11 
              "ISO/IEC 14496-30:2014 Information technology 
              Coding of audio-visual objects -- Part 30": 
              Timed text and other visual overlays in 
              ISO base media file format
              
   [ISO639-2] ISO 639-2  "Codes for the Representation of Names
              of Languages -- Part 2 ISO 639-2:1998"
              
   [DVB-DASH] ETSI Digital Video Broadcasting 
               "MPEG-DASH Profile for Transport of ISOBMFF
               Based DVB Services over IP Based Networks" 
               ETSI TS 103 285
    
   [RFC7617] J. Reschke, "The 'Basic' HTTP Authentication Scheme,"
             IETF RFC 7617, September 2015.
             
11.2.  Informative References

    [RFC2616] R. Fielding et al., 
             "Hypertext Transfer Protocol -- HTTP/1.1," 
             IETF RFC 2616, June 1999.
    
    [RFC2818] E. Rescorla, "HTTP Over TLS," 
             IETF RFC 2818, May 2000.

                            
11.3.  URL References

   [fmp4git]    Unified Streaming github fmp4 ingest, 
                "https://github.com/unifiedstreaming/fmp4-ingest".
   
   [MozillaTLS] Mozilla Wiki, Security/Server Side TLS, 
                https://wiki.mozilla.org/Security/Server_Side_TLS
                #Intermediate_compatibility_.28default.29 
                (last accessed 30 March 2018)
                
    [ID3v2]      M. Nilsson, "ID3 Tag version 2.4.0 Main Structure," 
                http://id3.org/id3v2.4.0-structure
                November 2000 (last accessed 2 May 2018)
                         
Author's Address

   Rufael Mekuria (editor)
   Unified Streaming
   Overtoom 60 1054HK 

   Phone: +31 (0)202338801
   E-Mail: rufael@unified-streaming.com
