
Use Cases for Telepresence Multistreams
draft-ietf-clue-telepresence-use-cases-09

Yes

(Gonzalo Camarillo)
(Sean Turner)

No Objection

(Barry Leiba)
(Brian Haberman)
(Jari Arkko)
(Joel Jaeggli)
(Martin Stiemerling)
(Stephen Farrell)
(Stewart Bryant)

Note: This ballot was opened for revision 08 and is now closed.

Gonzalo Camarillo Former IESG member
Yes
Yes (for -08) Unknown

Sean Turner Former IESG member
Yes
Yes (for -08) Unknown

Spencer Dawkins Former IESG member
Yes
Yes (2014-01-07 for -08) Unknown
Overall, this draft is nice work. 

I share Adrian's comments, so I won't repeat (most of) them here.

In addition, I have a few non-blocking comments you might want to consider, along with any other comments you receive.

In this text:

2.  Telepresence Scenarios Overview

   We describe such a telepresence system as sending one or more video
   streams, audio streams, and presentation streams to the remote
   system(s).  (Note that the number of audio, video or presentation
   streams is generally not identical.)

I thought the parenthetical remark was saying that the number of audio channels I send is not necessarily the number of video or presentation channels I send, but I started losing clarity wondering whether it was also intended to cover streams in both directions (so, I have three cameras and you have five, as in Section 3.2, "Point to point meeting: asymmetric") or streams to different participating sites (so, I send mono audio to you and stereo audio to another participating site, as in Section 3.3, "Multipoint meeting").

Could you clarify what was actually meant? Or maybe you could just delete the parenthetical - the use cases provide clearer text. For example, section 3.1.  "Point to point meeting: symmetric" says this:

   The number of microphones and the number
   of audio channels are often not the same as the number of cameras.
   Also the number of microphones is often not the same as the number of
   loudspeakers.

In this text:

3.5.  Heterogeneous Systems

   Some may be able to handle multiple streams and others can handle
   only a single stream.  (We are not here talking about legacy systems,
   but rather systems built to participate in such a conference,
   although they are single stream only.)  In a single video stream,
   the stream may contain one or more compositions depending on the
   available screen space on the device.  In most cases an intermediate
   transcoding device will be relied upon to produce a single stream,
   perhaps with some kind of continuous presence.

the doc has been careful to take the participants' view of each use case so far, which means, for example, that the text is silent on whether a mixer is present - the stream selection could be performed at the sender, at the receiver, or at a mixer.

As the doc introduces the first intermediate device (a transcoder) here and section 3.6 explicitly says "MCU", I started wondering if there are any considerations in earlier sections that only arise when intermediate devices are present.

I'm also wondering if intermediate devices should be introduced somewhere in section 2 instead of just popping up.

In Section 6 ... please be guided by shepherds and ADs, but I especially support this part of Adrian's comments. I wonder if you could include any text on telepresence-specific security considerations in this document, so we're not starting the conversation while balloting solution drafts. Even a bulleted list of attack surfaces would seem helpful to me.
Adrian Farrel Former IESG member
No Objection
No Objection (2014-01-03 for -08) Unknown
Wondered why H.264 doesn't qualify for a reference in Section 1.

---

In 3.1

   The important thing here is that each of the 2 sites has the same
   number of screens.  Each screen is paired with a corresponding
   camera.  Each camera / screen pair is typically connected to a
   separate codec, producing a video encoded stream for transmission to
   the remote site, and receiving a similarly encoded stream from the
   remote site.

I understand the pairing of screen to camera. I did not understand why
each site (i.e., both sites) has the same number of screens. It was only
when I reached 3.2 that I realised that 3.1 was drawing out a sub-case 
of point-to-point meetings and that 3.2 was describing the other sub-
case.

Reading (and re-reading) the two sections, I think that you have not
drawn out adequately the concept of seats. That is, Section 3.2 gets in
a bit of a mess trying to handle the mismatch between seats and cameras/
screens. But this issue is a separate thing from the mismatch of 
number of cameras/screens at the two sites.

---

While the situation described in Section 3.5 is clearly very important
(indeed, maybe at least as important as the perfect "everyone is in a
telepresence suite" use cases), it seems to diverge from the definition
of telepresence presented earlier in the document. That is, according to
the description of telepresence, someone connecting in using a laptop
with tiny video windows is not participating in telepresence.

Rather than rule Section 3.5 out of scope (because I think it is 
important) it would be better if you refined your definition of 
telepresence earlier in the document to allow this "special case" to be
in scope.

---

Section 6 reads like a cop-out! Surely you can spend some time
describing the security issues for the general telepresence use cases,
which would seem to be:
- disruption
- knowledge of participation
- witnessing
- uninvited participation
It is probable that all use cases demand that all of these issues are
completely prevented, but it may be that some of the issues are 
considered more severe in some use cases.
Barry Leiba Former IESG member
No Objection
No Objection (for -08) Unknown

Benoît Claise Former IESG member
No Objection
No Objection (2014-01-09 for -08) Unknown
1.

   Furthermore, these use cases do not describe all the aspects needed
   to create the best user experience.

I guess you mean "aspects" that are not relevant to the multiple audio and video streams requirement, and so not of interest here. You might want to expand on this.

2.
OLD:

   One common policy is called site switching.  Let's say the speaker is
   at site A and everyone else is at a "remote" site.  When the room at
   site A shown, all the camera images from site A are forwarded to the
   remote sites. 

NEW:          
   One common policy is called site switching.  Let's say the speaker is
   at site A and everyone else is at different "remote" sites.  When the room at
   site A shown, all the camera images from site A are forwarded to the
   remote sites. 


3.
Section 3.5. Heterogeneous Systems
Question: Is there a use case for a PC running Skype to participate in the telepresence session? I mean: I'm in the telepresence room, but I have someone on Skype on my own PC: can I plug a cable from my PC and make that person a participant?
I know that we're far from the telepresence experience, but I've seen that type of Skype setup in IETF meetings, where the chair's microphone is close to the PC speaker, at least allowing reasonable audio performance.
Or maybe that doesn't qualify as a telepresence use case?

4.
MCU acronym: please expand on first use.


Warning: As an OPS AD, I will carefully review the operational requirements in draft-ietf-clue-telepresence-requirements-07
Brian Haberman Former IESG member
No Objection
No Objection (for -08) Unknown

Jari Arkko Former IESG member
No Objection
No Objection (for -08) Unknown

Joel Jaeggli Former IESG member
No Objection
No Objection (for -08) Unknown

Martin Stiemerling Former IESG member
No Objection
No Objection (for -08) Unknown

Stephen Farrell Former IESG member
No Objection
No Objection (for -08) Unknown

Stewart Bryant Former IESG member
No Objection
No Objection (for -08) Unknown

Ted Lemon Former IESG member
No Objection
No Objection (2014-01-09 for -08) Unknown
You didn't quite mention the use case where participants are all wearing immersive VR devices like the Oculus Rift, and the simulated virtual space is synthesized onto a single display for each participant.   I _think_ that you covered this adequately anyway in 3.7, but I mention it here just to make sure that this scenario was in fact considered.   In this scenario the image of participants wearing VR headsets would have to be synthesized on-site, but it's otherwise pretty much the same as the virtual space scenario with cameras and large displays.