Use Cases for Telepresence Multistreams
draft-ietf-clue-telepresence-use-cases-09
Yes
(Gonzalo Camarillo)
(Sean Turner)
No Objection
(Barry Leiba)
(Brian Haberman)
(Jari Arkko)
(Joel Jaeggli)
(Martin Stiemerling)
(Stephen Farrell)
(Stewart Bryant)
Note: This ballot was opened for revision 08 and is now closed.
Gonzalo Camarillo Former IESG member
Yes
Yes
(for -08)
Unknown
Sean Turner Former IESG member
Yes
Yes
(for -08)
Unknown
Spencer Dawkins Former IESG member
Yes
Yes
(2014-01-07 for -08)
Unknown
Overall, this draft is nice work. I share Adrian's comments, so I won't repeat (most of) them here. In addition, I have a few non-blocking comments you might want to consider, along with any other comments you receive.

In this text:

   2. Telepresence Scenarios Overview

   We describe such a telepresence system as sending one or more video
   streams, audio streams, and presentation streams to the remote
   system(s). (Note that the number of audio, video or presentation
   streams is generally not identical.)

I thought the parenthetical remark was saying the number of audio channels I was sending wasn't necessarily the number of video or presentation channels I was sending, but I started losing clarity wondering whether this was intended to also describe streams in both directions (so, I have three cameras and you have five, as in Section 3.2, "Point to point meeting: asymmetric", not identical), or streams to different participating sites (so, I send mono audio to you and stereo audio to another participating site, as in Section 3.3, "Multipoint meeting", not identical).

Could you clarify what was actually meant? Or maybe you could just delete the parenthetical - the use cases provide clearer text. For example, Section 3.1, "Point to point meeting: symmetric", says this:

   The number of microphones and the number of audio channels are often
   not the same as the number of cameras. Also the number of
   microphones is often not the same as the number of loudspeakers.

In this text:

   3.5. Heterogeneous Systems

   Some may be able to handle multiple streams and others can handle
   only a single stream. (We are not here talking about legacy systems,
   but rather systems built to participate in such a conference,
   although they are single stream only.) In a single video stream, the
   stream may contain one or more compositions depending on the
   available screen space on the device. In most cases an intermediate
   transcoding device will be relied upon to produce a single stream,
   perhaps with some kind of continuous presence.

the doc has been careful to take the participant's view of each use case so far, and that means, for example, that the text is silent on whether a mixer is present - the stream selection could be performed at the sender, at the receiver, or at a mixer. As the doc introduces the first intermediate device (a transcoder) here and Section 3.6 explicitly says "MCU", I started wondering if there are any considerations in earlier sections that only arise when intermediate devices are present. I'm also wondering if intermediate devices should be introduced somewhere in Section 2 instead of just popping up.

In Section 6 ... please be guided by shepherds and ADs, but I especially support this part of Adrian's comments. I wonder if you could include any text on telepresence-specific security considerations in this document, so we're not starting the conversation while balloting solution drafts. Even a bulleted list of attack surfaces would seem helpful to me.
Adrian Farrel Former IESG member
No Objection
No Objection
(2014-01-03 for -08)
Unknown
Wondered why H.264 doesn't qualify for a reference in Section 1.

---

In 3.1:

   The important thing here is that each of the 2 sites has the same
   number of screens. Each screen is paired with a corresponding
   camera. Each camera / screen pair is typically connected to a
   separate codec, producing a video encoded stream for transmission to
   the remote site, and receiving a similarly encoded stream from the
   remote site.

I understand the pairing of screen to camera. I did not understand why each site (i.e., both sites) has the same number of screens. It was only when I reached 3.2 that I realised that 3.1 was drawing out a sub-case of point-to-point meetings and that 3.2 was describing the other sub-case.

Reading (and re-reading) the two sections, I think that you have not drawn out adequately the concept of seats. That is, Section 3.2 gets in a bit of a mess trying to handle the mismatch between seats and cameras/screens. But this issue is a separate thing from the mismatch of the number of cameras/screens at the two sites.

---

While the situation described in Section 3.5 is clearly very important (indeed, maybe at least as important as the perfect "everyone is in a telepresence suite" use cases), it seems to diverge from the definition of telepresence presented earlier in the document. That is, according to the description of telepresence, someone connecting in using a laptop with tiny video windows is not participating in telepresence. Rather than rule Section 3.5 out of scope (because I think it is important), it would be better if you refined your definition of telepresence earlier in the document to allow this "special case" to be in scope.

---

Section 6 reads like a cop-out! Surely you can spend some time describing the security issues for the general telepresence use cases, which would seem to be:

- disruption
- knowledge of participation
- witnessing
- uninvited participation

It is probable that all use cases demand that all of these issues are completely prevented, but it may be that some of the issues are considered more severe in some use cases.
Barry Leiba Former IESG member
No Objection
No Objection
(for -08)
Unknown
Benoît Claise Former IESG member
No Objection
No Objection
(2014-01-09 for -08)
Unknown
1.

   Furthermore, these use cases do not describe all the aspects needed
   to create the best user experience.

I guess you mean "aspects" that are not relevant to the multiple audio and video streams requirement, so not interesting for us. You might want to expand on this.

2.

OLD:

   One common policy is called site switching. Let's say the speaker
   is at site A and everyone else is at a "remote" site. When the room
   at site A shown, all the camera images from site A are forwarded to
   the remote sites.

NEW:

   One common policy is called site switching. Let's say the speaker
   is at site A and everyone else is at different "remote" sites. When
   the room at site A shown, all the camera images from site A are
   forwarded to the remote sites.

3. Section 3.5, Heterogeneous Systems

Question: Is there a use case for a PC running Skype to participate in the telepresence? I mean: I'm in the telepresence room, but I have someone on Skype on my own PC: can I plug in a cable from my PC and make that person participate? I know that we're far from the telepresence experience, but I've seen that type of Skype setup in IETF meetings where the chair microphone is close to the PC speaker, at least allowing reasonable audio performance. Or maybe that doesn't qualify as a telepresence use case?

4. MCU acronym

Warning: As an OPS AD, I will carefully review the operational requirements in draft-ietf-clue-telepresence-requirements-07.
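The "site switching" policy quoted above can be sketched as a simple stream-selection function: when the active speaker is at a given site, all camera streams from that site are forwarded to every remote site. This is an illustrative sketch only, not text or an algorithm from the draft; the function and stream names are hypothetical.

```python
def site_switch(streams_by_site, active_site):
    """Illustrative "site switching" selection (hypothetical names).

    streams_by_site: dict mapping site name -> list of camera stream ids
    active_site: the site where the current speaker is located

    Returns a dict mapping each remote site to the list of streams it
    should receive: all camera streams from the active site.
    """
    forwarded = streams_by_site[active_site]
    return {
        site: list(forwarded)
        for site in streams_by_site
        if site != active_site
    }

# Example: the speaker is at site A, which has three cameras; remote
# sites B and C each receive all three of A's camera streams.
plan = site_switch(
    {"A": ["A-cam1", "A-cam2", "A-cam3"], "B": ["B-cam1"], "C": ["C-cam1"]},
    active_site="A",
)
```

Note that in the multipoint use case this selection could equally be performed by a sender, a receiver, or an MCU; the sketch deliberately says nothing about where it runs.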
Brian Haberman Former IESG member
No Objection
No Objection
(for -08)
Unknown
Jari Arkko Former IESG member
No Objection
No Objection
(for -08)
Unknown
Joel Jaeggli Former IESG member
No Objection
No Objection
(for -08)
Unknown
Martin Stiemerling Former IESG member
No Objection
No Objection
(for -08)
Unknown
Stephen Farrell Former IESG member
No Objection
No Objection
(for -08)
Unknown
Stewart Bryant Former IESG member
No Objection
No Objection
(for -08)
Unknown
Ted Lemon Former IESG member
No Objection
No Objection
(2014-01-09 for -08)
Unknown
You didn't quite mention the use case where participants are all wearing immersive VR devices like the Oculus Rift, and the simulated virtual space is synthesized onto a single display for each participant. I _think_ that you covered this adequately anyway in 3.7, but I mention it here just to make sure that this scenario was in fact considered. In this scenario the image of participants wearing VR headsets would have to be synthesized on-site, but it's otherwise pretty much the same as the virtual space scenario with cameras and large displays.