IETF 87 Berlin - Tuesday, 30 July 2013
Minutes of AVTEXT WG Meeting
Chairs: Keith Drage and Jonathan Lennox
Minute Takers: Mo Zanaty and Roni Even
MP3 Recording: http://www.ietf.org/audio/ietf87/ietf87-potsdam2-20130730-1700-pm3.mp3
Meetecho Recording: http://ietf87.conf.meetecho.com/index.php/Recorded_Sessions#AVTEXT

Note Well, Agenda, and WG Status

Chair Status: The chairs presented the WG status slides, including the Note Well, Agenda, and Document Status. Support for Multiple Clock Rates in an RTP Session has completed WGLC, and a Publication Request will be sent shortly after the IETF. Duplicating RTP Streams will go to WGLC shortly after the IETF session. The chairs asked for reviewers; Bo Burman and Flemming Andreasen volunteered to review the document.

RTP Taxonomy: draft-lennox-raiarea-rtp-taxonomy-01

Bo Burman presented the RTP Taxonomy document. The purpose of the document is to explain and clarify the names and relationships of concepts; the purpose of the presentation is to focus on concepts and semantics rather than the specifics of names.

Keith (as chair): i.e., if you don't like the names chosen, discuss it on the list; if you don't like the semantics, discuss that here.

Capture Device

Cullen Jennings commented that the terms presented on the slides for Capture Device as the ones WebRTC uses aren't accurate. Bo said that the concepts don't quite align precisely, and he was trying to describe things that are close. Keith said that if there are relevant documents that use different names, they should perhaps change to align with this document. Cullen said that the relevant W3C spec (in this case, GetUserMedia) is near their equivalent of WGLC and is unlikely to change. Bo suggested that perhaps this document can explain where and why names differ.

Media Source

Martin Thomson said that the WebRTC Source and RTCMediaStreamTrack are distinct: "I think 'Media Source' is the source, not the track."
Jonathan commented that perhaps we need someone who knows WebRTC well to co-author the document. Martin and Cullen Jennings said they could review, but not co-author.
Jonathan: We need non-slide review.
Roni Even: What's the difference between a media source and a recording device?
Martin: I think that's the point of confusion here.
Suhas Nandakumar: "Capture Device" corresponds to a physical device; Media Source is a raw stream of media data. (I.e., delete "Source of a raw stream" from the definition.)
Martin: The name is poorly chosen; it conflicts with W3C definitions.
Jonathan (as chair): Namesmithing to the list, but point taken.
Dan Burnett: The current W3C source concept is something which the browser can give you, which may not be directly from a physical device. A track has one source, but multiple tracks can have the same source.
Cullen Jennings: Tracks can have multiple channels, like stereo mics.
Dan Burnett: Channel is too low a level for WebRTC. Apps can't manipulate channels.
Bo: The draft is not very explicit about this aspect today, and it should be. The thinking that is in there today is that if a media source is separable at all, it's a single channel.
Paul Kyzivat: This is just an abstraction; if you have media, it's encoded somehow.
Bo: Yes.
Mo Zanaty: The W3C media source is like a container, which is a missing concept in many RTP apps, and in SDP.
Bo: Yes, but you have aggregation at different levels.
Keith: This isn't meant to be a representation only of what RTP does; if there are concepts that WebRTC needs that don't correspond directly to what RTP does, we should add them.
Roni Even: A media source is an m-line, which can be encoded in some way(s).
Bo: Yes, and the mapping of this concept to SDP is cumbersome, because a single media source can end up on multiple m-lines.
[no name stated]: The HTML5 media source is different from this definition. It contains multiplexed encoded media.
Bo: Yes, the term is heavily overloaded.
Jonathan Lennox: Is there currently a concept in this taxonomy that maps to the HTML5 concept?
Bo: No.
Harald Alvestrand: The last line on the slide should be "One m=line can describe one Media Source *when using the unified plan terminology*". Otherwise you're in the current sad state.
Jonathan: That discussion is in scope for MMUSIC; this draft should reflect the thing or set of things that MMUSIC ends up deciding.
Martin Thomson: +1.
Cullen: Agreed; I'd like to see the RFC which allows an m-line to describe multiple media sources.

Media Stream

Martin Thomson: Is a media stream the thing with an SSRC?
Bo: Yes.
Martin: That's not going to work.
Bo: This isn't a one-to-one mapping; the alternate usages are trying to map somewhere near existing concepts. Media stream isn't quite the same as SSRC.
Cullen: Never use the name "media stream"; also, the definition is too vague to uniquely identify anything.
Keith: Can you send some text to the list?
Jonathan (as individual): There are two or possibly three important concepts here, you're being vague about them, and we don't have good names for any of them. We need a name for a set of SSRCs including repair/layer/simulcast flows.
Bo: Let's defer this until the figures.
Colin Perkins: Is this a stream of RTP packets, or something more general?
Bo: Yes.
Colin: In RFC 3550, SSRC is more conceptual in RTP than just the stream of RTP packets. Does a media stream include FEC or retransmissions, or an SVC stream with multiple SSRCs?
Bo: No, a media stream can have related media streams.
Martin: Do you intend this to identify the same thing as you deal with things like SSRC collisions? A stream of RTP packets with the same SSRC, subject to SSRC changes for collision, is easy to understand. If media stream refers to something else, like a group of related SSRCs, that is too fuzzy.
Suhas: That is the definition in the draft; a media stream is an SSRC.
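Martin's "easy to understand" reading of the term -- a media stream as the sequence of RTP packets sharing one SSRC -- can be illustrated with a small demultiplexer. This is an editor's sketch for the minutes, not code from the draft; it only assumes the RFC 3550 fixed header, where the SSRC occupies bytes 8-11.

```python
import struct
from collections import defaultdict

def rtp_ssrc(packet: bytes) -> int:
    """Extract the SSRC from an RTP fixed header (RFC 3550: bytes 8-11)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    return struct.unpack("!I", packet[8:12])[0]

def demux_by_ssrc(packets):
    """Group packets into per-SSRC streams -- the 'stream of RTP packets
    with the same SSRC' notion of a media stream discussed above."""
    streams = defaultdict(list)
    for p in packets:
        streams[rtp_ssrc(p)].append(p)
    return streams
```

Under this reading, repair flows (FEC, RTX) and SVC layers on other SSRCs come out as separate streams that are merely *related*, which is exactly the distinction the discussion turned on.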
RTP Session

Martin Thomson: The existing definition of RTP Session is adequate; we don't need to change it.
Colin Perkins: Yes, the distinguishing feature of an RTP Session is the shared SSRC space.
Bo: We could just remove the concept from the draft, but we were asked to put it in.
Harald Alvestrand: The concept is needed, but the "alternate usages" section of the slide is misleading. One m-line describes (not maps to) one RTP session. Many m-lines in a large system can be describing the same session.
Bo: An m-line gives one view of the RTP session.
Colin: +1
Suhas: This is the definition from RFC 3550, copy-and-pasted.
Bo: Will keep it in the draft as a description of an existing entity, referring to the existing definition from the RTP spec.
Jonathan (as individual): We need to talk about RTP sessions in this document, since other entities we're describing have complicated relationships with them, but they're not strictly part of the hierarchy we're defining.

Media Transport

Colin Perkins: Issue with media transport carrying "one or more RTP sessions", which is what SHIM would have enabled. Or you can view SHIM as part of the transport and leave the transport to only support one RTP session.
Cullen Jennings: Does this include the SRTP layer or just the UDP flow?
Bo: Do we need to discuss encryption as a separate layer in the taxonomy?
Cullen: It gets complicated when we have separate keying on the same UDP flow.
Martin Thomson: That won't happen if EKR has his way and we have DTLS everywhere, which doesn't support that.
Suhas: A media transport is just a UDP flow.
Eric Rescorla: DTLS with SHIM would still result in multiple key sets over the same 5-tuple.
Colin: Yes, but then you essentially have several transports.
Bo: I think we need a better definition of transport.
Dan Burnett: The WebRTC media stream is not represented in your alternate usages. Is it later?
Bo: Not in the current draft.
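Colin's point that an RTP session is distinguished by its shared SSRC space can be sketched as a toy model (an editor's illustration, not from the draft or RFC 3550): every participant's SSRC must be unique within the session, and RFC 3550 resolves collisions by picking a new random value.

```python
import random

class RtpSession:
    """Toy model of 'an RTP session is a shared SSRC space'."""

    def __init__(self):
        self.ssrcs = set()  # SSRCs currently in use in this session

    def allocate_ssrc(self) -> int:
        """Pick a random 32-bit SSRC, re-picking on collision
        (the RFC 3550 collision-resolution idea, simplified)."""
        ssrc = random.getrandbits(32)
        while ssrc in self.ssrcs:
            ssrc = random.getrandbits(32)
        self.ssrcs.add(ssrc)
        return ssrc
```

On this model, two m-lines that "describe the same session" (Harald's point) would simply be two views onto one shared `ssrcs` set, not two sets.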
Martin Thomson: Is synchronization context upcoming?
Bo: Yes.
Martin Thomson: Is media source relevant for both the sender's and the receiver's views?
Bo: Yes, same name/concept.
Martin: I agree, but it makes the term "source" odd.
Bo: Please suggest names. I think once we have the concepts, names should be easy.
Martin: No, names are extremely important.

Relationships

Martin Thomson: Cardinality is interesting. A provider provides exactly one stream?
Bo: No. The UML diagrams in the draft answer this.
Roni Even: What comes out of the media renderer, raw or encoded media?
Bo: Raw media.
Jonathan: The asymmetry in the source and render diagrams is confusing.
Justin Uberti: What's the distinction between the capture and the media source?
Suhas: Capture is the physical device; media source is what is produced.
Jonathan: "Capture" on the slide should say "capture device".
Justin: Which is the WebRTC media source?
Bo: The capture device.
Dan: No.
Justin and Dan: A device can emit multiple sources.
Martin: I disagree with Dan. We need to sort this out in the W3C media capture group before we can explain our mapping. Hopefully this week.
Bo: I'm looking forward to this information.
Emil Ivov: How does a mixer fit in here?
Bo: It's a media source.
Jonathan: A mixer should produce media sources. It's not a capture device; it's a similar but separate concept.
Stephen Botzko: It's also a renderer-side device.
Stephen Botzko: What about cameras which support multiple simultaneous resolutions?
Bo: Same capture device with multiple media sources.
Stephen: We may need a rendering sink before the rendering device, for symmetry with source/device; it looks like there's something missing.
Bo: Currently nothing's there; perhaps media source should be there on the rendering side as well.
Colin Perkins: We need clear terminology for media which has been decoded and must be displayed.
Cullen Jennings: Can we name the arrows between boxes? We also need to distinguish in rendering between a window and a screen.
Keith: When we were discussing mixers before, was that with the intent of documenting them, or is it too complex?
Bo: I think we can capture the most important aspects of it. The mixer needs a text description but does not impact the core model -- it's an object that has a number of renderers and providers.

Synchronization Context

Jonathan: Synchronization was one of the original purposes of RTCP CNAME, but not the only one.
Martin Thomson: Synchronization is transitive, so WebRTC media tracks in multiple media streams imply sync of those streams.
Cullen(?): I don't agree with that.
Stephen Botzko: Is there a similar concept for strong spatial relationships, for layout?
Bo: Not currently.
Rob Hansen: A CLUE scene describes an aggregation of media streams, but not necessarily sync.
Bo: Is there anything more appropriate in CLUE to use?
Rob: No, our intention is to use RTCP CNAME, not invent something new.
Cullen Jennings: Confusing. What is the difference between a lip sync group and a CNAME? That answer may clarify the confusion.

Containment Context

Colin Perkins: Stop using the term "RTP session multiplexing"; it's very unclear -- multiple sessions, or multiple things in a session?
Martin Thomson: Not sure we need this concept, unless it is for layered/simulcast/repair flows.

Equivalence Context

Martin Thomson: How do two media providers map to one media stream?
Bo: Two layers in one SSRC.
Martin: This is not MST as shown. It would also be useful to show multiple RTP sessions for completeness.
Justin Uberti: Confusion about two layers in one SSRC along with one layer in one SSRC. Is this a hybrid case?
Martin Thomson: Yes, sometimes layers get encoded in the same stream, sometimes in different streams.
Jonathan: The example is a hybrid.
Mo Zanaty: The hybrid case is important in real deployments. Layered and simulcast streams need clearer terminology. Consider dependent streams as a general concept for layered, FEC, and RTX streams.
Bo: The foundation for this should already be in the draft.
If a specific case is important, we should put it in the draft.

Chair slides on RTP header extensions

The chairs presented slides on four drafts that have proposed RTP header extensions.

VP8 Temporal Layer RTP Header Extension (Adam Fineberg), draft-fineberg-avtext-temporal-layer-ext-00

Should this be generalized, or left as codec-specific? Is the use case as described, avoiding re-encryption, really useful and feasible?
Colin Perkins and Justin Uberti: Not clear if it's useful to spend time on it, given the need to have crypto keys to re-authenticate after modifying RTP headers and to generate proper RTCP reports.
Mo Zanaty: Confused whether the use case is dropping entire layers due to endpoint capabilities, or dropping some packets for congestion control. An extension for some notion of payload priority like SVC PRID, not necessarily just temporal layer, may be useful for the latter.
Jonathan: Ask the list, since Adam is not here.
Eric Rescorla: The feature itself seemed desirable: I'd like to be able to do scalable coding in a middlebox without needing to do re-encryption. Is the feature itself desirable/useful?
Jonathan: It would be desirable but technically difficult.
Justin Uberti: Regarding Mo's suggestion, an extension for general priority may complicate things; consider audio, etc. Interesting, but I'm not sure Adam wants to sign up for that.

No Plan RTP Header Extension (Emil Ivov)

Emil: The idea is to enable associating different related SSRCs, like layers, without having to look at signaling.
Colin Perkins and Jonathan: Sounds related to App ID and SRCNAME.

SDP App ID/Token RTP Header Extension (Jonathan Lennox and Roni Even)

Jonathan: Declared in SDP; binds an identifier at a somewhat higher level than SSRC. Sent in an RTP header extension and an RTCP SDES item. Discussion will be in MMUSIC.

Unified Plan RTP Header Extension (Adam Roach)

Jonathan: Correlates bundled RTP streams to SDP m-lines. Feels like App ID.
Adam: Agree to unify this work with App ID.
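All four proposals build on the generic RTP header extension mechanism of RFC 5285. As background for readers of these minutes (an illustrative sketch, not code from any of the drafts), the element list of a one-byte-header extension block -- the bytes following the 0xBEDE profile word and length -- can be parsed like this:

```python
def parse_one_byte_extensions(ext_body: bytes):
    """Parse the element list of an RFC 5285 one-byte-header RTP
    extension block. Returns a list of (id, data) tuples."""
    elements = []
    i = 0
    while i < len(ext_body):
        b = ext_body[i]
        if b == 0:                 # 0x00 is padding between elements
            i += 1
            continue
        ext_id = b >> 4            # upper nibble: extension element ID
        if ext_id == 15:           # ID 15 is reserved: stop processing
            break
        length = (b & 0x0F) + 1    # lower nibble stores data length minus one
        elements.append((ext_id, ext_body[i + 1:i + 1 + length]))
        i += 1 + length
    return elements
```

Each draft above would occupy one such (id, data) element, with the ID bound to a URI in SDP via a=extmap -- which is why the chairs note that the SDP-binding discussion belongs in MMUSIC.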
Closing Discussion

Keith: RTP header extensions should at least be reviewed in AVTEXT. For general extensions broadly applicable to multiple groups, the work should be done in AVTEXT.
Keith: The Taxonomy draft should be updated in the next 2-3 weeks, even if some sections say we haven't reached resolution on some topics. Suggest a virtual interim (2-3 hours) shortly after, with a WG adoption call after the interim.
Dan Burnett and Martin Thomson: The W3C should have a formal review call of this, but after more progress in the IETF WG. The current draft may cause confusion. Key W3C reviewers are already present in this room.
Harald Alvestrand: The next rev should be reviewed by the W3C.