Minutes for RTCWEB at IETF-interim-2011-rtcweb-1
Real-Time Communication in WEB-browsers
Minutes for RTCWEB WG Interim Meeting on the 8th of September
Chairs: Magnus Westerlund, Ted Hardie, Cullen Jennings
Thanks to the three minute takers: Cary Bran, Mary Barnes, and Stephan Wenger
There are about 45 people on the call.
List of people on call:
Alan Johnston, Alissa Cooper, Allyn Romanow, Andy Hutton, Atin Banerjee,
Francois Audet, Bernard Aboba, Bert Greevenbosch, Cary Bran, Charles Eckel,
Christer Holmberg, Christian Schmidt, Colin Perkins, Cullen Jennings, Dan
Burnett, Dan Romascanu, Daryl Malas, Dan York, Enzo (?), Eric Rescorla (EKR),
Ernst Horvath, Francois Daoust, Harald Alvestrand, Jim
McEachern, John Elwell, Jonathan Lennox, Justin Uberti, Keith Drage, Kevin
Fleming, Krisztian Kiss, Magnus Westerlund, Markus Isomaki, Mary Barnes,
Matthew Kaufman, Michael Lundberg, Muthu Arul, Neil Stratford, Olle Johansson,
Parthasarathi Ravindran, pm(?), Ram Mohan, Randell Jesup, Salvatore Loreto,
Sebastien Cubaud, Serge Lachapelle, Sohel Khan, Spencer Dawkins, Stefan
Hakansson, Stephan Wenger, Ted Hardie, Thomas Stach, Tim Panton, Wolfgang Beck
WG Chairs slides
Note Well presented and applies. Meeting will be recorded. Agenda presented.
Use Cases (1 hr)
20 minutes: On Recording (John Elwell)
- There was a discussion about the architectural scheme for recording (Slide 6).
No major controversies here.
- Justin asked how common remote recording is compared to local recording. This
could not be answered, but John commented that the need for remote recording is
likely bigger in RTCWEB due to the lack of middleboxes to do recording in,
compared to SIP.
- Ted asked what is most common today, which is to record locally and then
upload. The reason for remote recording is the desire to do the recording in
real-time. Ted saw three alternatives:
1) Middlebox alternative,
2) The recording server is an RTCWEB peer and performs analysis directly on
the media,
3) Recording is done locally and analysis is done afterwards.
Justin expressed a preference for 2, while Ted made it clear that he would
prefer to avoid two streams.
- Ralph Giles proposed that the API should give access to the media streams
locally for local processing of the media. John Elwell asked if that really
could be done in real-time in the end-point.
- John E made it clear that his desire is an rtc-web architecture that does not
preclude session recording in the future.
- There is an underlying security question here: do you need to know the set of
participants in the rtc-web session? And are you allowed to forward the
media from one participant to another participant or set of participants
without their explicit consent? This question will be deferred to the EKR
security discussion.
- Ted concluded that method 2) above can be supported assuming no
security issues. In general there seems to be nothing in the architecture,
beyond the security aspects, that precludes recording use cases. There will be
a need for text in the use case document.
Action Item: John Elwell to work with use case
draft owners to add the recording use case(s) to the requirements draft.
Review and discussion of other use cases proposed
(Use case draft author team). Stefan Håkansson presenting.
Magnus Westerlund commented on the implications of adding more use cases: more
use cases result in more functionality, which means more work to complete
before the documents can be approved.
TURN Case: Cullen presented the use case he originally proposed. There was a
question of what requirements this really imposes. Magnus asked if this is
primarily an implementation question of allowing local configuration of
additional TURN servers, similar to how web proxies work today in browsers.
Justin asked if these could be detected. Cullen thought it is similar to SOCKS
proxies but was not certain. Francois asked if this is really special; isn't it
similar to the other parameters that you need in, for example, a SIP call?
Harald commented that he hoped there are not as many ways of finding a TURN
server as there are ways to detect proxies.
Action Item: Needs more discussion and clarification on the mailing list.
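The proxy analogy raised in the TURN discussion could be sketched as below. The
object shapes and names here are invented for illustration only and do not come
from any RTCWEB draft: the point is just that a locally configured TURN server
could be merged with whatever the application supplies, the way a browser-wide
proxy setting sits alongside per-site configuration.

```javascript
// Hypothetical sketch: merging a locally configured TURN server
// (set by a user or administrator, like a web proxy setting) with
// the TURN/STUN servers supplied by the web application.

// Servers the JS application asks for.
const appIceServers = [
  { urls: "stun:stun.example.org" },
  { urls: "turn:turn.example.org", username: "app", credential: "secret" },
];

// A locally configured TURN server the browser would add on its own.
const localTurnServer = { urls: "turn:turn.corp.example.com" };

// The browser combines both lists before starting address gathering.
function effectiveIceServers(appServers, localServer) {
  return localServer ? [...appServers, localServer] : appServers;
}

const servers = effectiveIceServers(appIceServers, localTurnServer);
console.log(servers.length); // 3
```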
E911 Case: Ted clarified that there are two different cases here. The first
is that you go to a 911 provider web site and use it to communicate. The second
is a federated case, or a dial-out case, where a calling service reaches
beyond its domain to the E911 service. Ted believes the first one is not
different from a normal RTCWEB service. The second is out of scope; the
client isn't really providing location information, and if something is
available it is unassured data, which NENA folks don't want. Cullen commented
that he doesn't buy the unassured-data reason. People will build services that
provide PSTN-like connectivity. In those cases people will call emergency
services. We should not stick our heads in the sand, but we also should not
create requirements that no one will implement. Ted added that the service
needs to know which emergency center to use; thus it needs LoST so it can
provide the server with its location. From a service perspective, not terrible.
There are also DoS concerns about tying up resources at the emergency service
centers. Cullen thought the DoS angle was silly as this can already easily be
done. Ted thought the issues and the use case in RTCWEB's context should be run
by ECRIT. Bernard Aboba commented that there is a doc in ECRIT about this.
Stephan Wenger agreed with Ted's arguments. He also commented that we can look
at the regulatory requirements in some cases, like Skype out. Manufacturers or
service providers are required to provide support for emergency services. If we
rule this out of scope we are sticking our heads in the sand. Francois
commented that there are large variations depending on country or
jurisdiction. We are out of our depth here. Ted restated his earlier argument
that we should take this to ECRIT and look at it from their perspective first.
Randell J brought up that the requirement to get location data in the browser
raises security and user-permission issues. Stephan W commented that for 911
calls in the US, caller ID blocking is overridden. Randell clarified that this
is OK, but you don't want to expose the information to other services. Stefan H
responded that this is already resolved, as there exists a location API in W3C
that gets permission from the user.
Action Item: Paul and Stephan to talk to ECRIT chairs to identify subject
matter expert to help with E911 use cases.
Emergency access for disabled: Randell commented that this use case is not only
for emergencies; it will be useful in general. Bernard added that there is not a
lot more in this use case than what is in the Emergency Service use case. The
added functionality is conferencing support, to allow an interpreter to be
added into the emergency call.
Action Item: Bernard to work with emergency use case proponents to add the
additional requirement.
CLUE Case: Magnus stated his view that the CLUE use case isn't clear enough
about what it means. Mary Barnes (CLUE WG chair) commented that their timeline
is similar to RTCWEB's and the use cases are mostly done. Thus RTCWEB shouldn't
do something that precludes CLUE use cases. Also, in cases like multi-stream we
should not have different solutions. Christer Holmberg commented that CLUE use
cases include the requirement to support single-stream end-points. So although
they may not get the full experience, they will be able to communicate. Cullen
stated that he doesn't understand what is really required to support CLUE. Mary
responded that one will need to support the data for the functionality that is
being passed, which is currently being worked on in CLUE; an interim meeting is
coming up which it would be good if some RTCWEB people could attend.
Markus Isomaki asked if these requirements would be on the JS application
rather than the browser that implements RTCWEB. Mary said we don't quite know
that, including whether RTCWEB will use SDP O/A or something else. Justin
commented that RTCWEB will also need to handle multiple sources inside SDP.
Roni Even commented that the multi-stream being talked about in CLUE is
multiple video streams, which is different from most of RTCWEB's use cases
where there are multiple pairs of streams. Also, the negotiation should not be
a big issue, but we don't know that yet. Stephan Wenger's view is that CLUE is
mostly about call setup and control, which appears to reside in the web
application; thus the requirement is more in the reverse direction. It is more
important for CLUE to consider that what they develop can be implemented in JS
for an RTCWEB-compliant browser. Magnus Westerlund stated that it is very
unclear what the requirements really are. Is it compatible media planes, or
that interoperable applications can be developed at all? So what is this use
case, as we don't know in detail what CLUE is? Mary stated that as neither
group is far enough along to really be into the details, she thinks the CLUE
interim should add to its agenda a discussion of the use cases that overlap and
see what requirements are needed in each group. It is desirable for an RTCWEB
end-point to be able to participate in a telepresence conference. Stephan
Wenger commented that participation is likely not an issue; what needs
consideration is providing the richer telepresence experience in the RTCWEB
context. John Elwell agreed that an RTCWEB client needs to be able to get the
benefits of CLUE. However, what the requirements on the API and functions are
is not clear.
Cullen concluded the topic and pointed out the clear Action Item; future
alignment discussion will be needed.
Action Item: Mary Barnes to allocate time at CLUE interim meeting to discuss
RTC-Web/CLUE interoperability; RTC-Web representatives should attend.
Large multiparty session: Stefan presented the use case. The discussion on the
list seems to indicate that the use case is very similar to the centralized
multi-party conference that is already present. The suggestion is to skip this
use case. No one objected.
Security camera/baby monitor: Randell commented that there clearly are security
discussions that need to be had. Justin asked if this includes pan, tilt and
zoom functions. Randell answered that it would be handy.
Remote assistance: Randell commented that this is a common use case with
installed applications, and it would be good to do this without external apps
or plugins. Harald commented that this might belong in W3C space. It is
primarily about being able to take the screen as input. Randell agreed that
video and audio from the system are needed. Jonathan Lennox asked if there is a
reverse direction also, where input is transferred. Randell answered that this
is possible. The data stream might not be an IETF issue at all and could be a
W3C-defined opaque stream from the IETF perspective. Clearly there are some
security concerns around the control. Justin commented that this seems to imply
that there is some reliable data transport between the peers. Randell agreed.
Cullen Jennings is very interested in the use case, especially the screen
sharing, to be able to implement WebEx in RTCWEB. The remote control aspect is
less important. Stephan Wenger asked if the reverse direction really is in
scope. The screen sharing clearly is a common use case; the reverse path
appears to be out of scope. Randell commented that the screen may not be
transported as video, but over some reliable or unreliable protocol, like VNC.
General for use cases
Stefan H raised the issue of how we should work on use cases in the near
future. Ted presented the WG chairs' thinking on the issue. The WG chairs will
run a series of consensus calls for the use cases. It is important that the use
cases capture the requirements they impose on functionality, so that it is
clear what impact a use case has. If that isn't present, the chairs may declare
that there is no consensus. Stephan Wenger objected that these are harsh
demands. One can probably argue the use cases, but not be able to determine the
requirements. Ted responded that we are not going to be mean, but the chairs
might declare that consensus is not determined and that additional work is
required before coming to consensus.
Signalling (1 hr)
--15 minutes Issue Overview (Matthew Kaufman)
What needs to be standardized
1) between browser and web server - HTTP is already there - does anything else
need to be standardized?
2) media transport needs to be standardized.
3) between web servers - do we need to standardize signaling federation? There
are already existing protocols - i.e., SIP, SDP O/A.
4) within browsers: API, but this is a W3C problem.
1) Options: leave to appl developer, SIP, SIP-lite, not SIP but should be SDP
2) Propose to use existing media transport protocols
3) Options: SIP, Other (e.g., XMPP Jingle), up to SPs
Cullen: another option - SIP might be used but SPs could also do what they want.
John E: suggests this doesn't have to be SIP, but if it is SIP it should be
compliant. Matthew: thinks this is premature (i.e., federations). Note, this
doesn't define what should be between browser and web server.
4) While this isn't an IETF problem, it may be influenced by the solutions for
the others (e.g., 1).
- How much of calling is built in?
-- 1) Based on SIP
-- 2)
-- 3) Intermediate choices.
- How does address selection and NAT traversal work?
-- 1) Peer connection is passed SDP blob
-- 2) Peer connection is passed a candidate list, etc.
-- 3) Peer connection has APIs for ICE, etc.
- How does codec selection work?
-- 1) SDP O/A
--- Apps can query for capabilities; allows for more complex APIs; leverages
-- can still leverage MMUSIC
-- can also use SDP as a command mechanism (and not O/A)
-- avoid solving problems out of scope
-- maximize flexibility for applications - turn the browser into a new
operating system and not just bolt on a SIP phone
What needs to be standardized
3) Leave to SPs for now. Look at SIP later
4) Don't build a SIP phone into browser. Implement ICE natively. Don't build
O/A into PeerConnection object. APIs for codec choice, etc.
Discussion: Justin agreed with many presumptions here. However, 4) is thorny.
Codec selection should be done with something other than SDP. Matthew responded
that O/A can be layered on the API if you want to use O/A; this may not be in
scope for the IETF and should be done in W3C. Justin remarked that if one
doesn't pass SDP blobs, but instead queries for an object of capabilities and
then configures with another object, would that address the functionality
Matthew wants? Matthew confirmed yes, but one may reuse some of the blobs from
MMUSIC. Justin then thought that in the trivial case one could ask for a blob
and pass that blob, which is similar to the PeerConnection model that can
generate O/A. Matthew commented that it is as simple as getting a JSON blob,
sending it to the other end, and taking that blob out at the other end.
Cullen: What we're talking about here is what we can pull out of the blob for
codecs, etc. Matthew answered that we need to come up with a minimal set of
requirements, not define blobs. Cullen responded: you've said the advertisement
model allows you to do more innovative things than O/A; he thinks SDP O/A is
rich enough to control this. Matthew responded that the API as currently
proposed for W3C says you don't get to know capabilities until you generate an
offer. Example: a 10-party conference call where everyone has selected codecs
that work for them. What do you tell an 11th party that doesn't support those
codecs? The API knows all the capabilities. Thus, the web server can make a
more intelligent decision - e.g., switch codecs for the call. Cullen responded
that O/A allows you to accomplish the same. Current APIs don't work. He doesn't
see why O/A changes innovation. Matthew believes they are mappable. One can
generate an offer and deconstruct it. This is about building an O/S. This
question should be answered at W3C. He thinks either of these is equivalent.
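The "trivial case" exchange Justin and Matthew discuss - ask for a JSON blob,
relay it through the web server, and take it out at the other end - can be
sketched as below. All names here (makePeer, createBlob, applyBlob) are
invented for illustration; no actual W3C API is assumed.

```javascript
// Sketch of the opaque-JSON-blob signaling model under discussion.
// The web server only relays blobs; the application decides what
// to do with the contents (Matthew's point about avoiding a
// built-in O/A engine).

function makePeer(name, codecs) {
  return {
    name,
    remote: null,
    // The trivial case: "ask for a blob" ...
    createBlob() {
      return JSON.stringify({ from: this.name, codecs });
    },
    // ... and "take that blob out at the other end".
    applyBlob(blob) {
      this.remote = JSON.parse(blob);
    },
  };
}

const alice = makePeer("alice", ["opus", "g711"]);
const bob = makePeer("bob", ["g711"]);

// The web server just relays the opaque blobs between the peers.
bob.applyBlob(alice.createBlob());
alice.applyBlob(bob.createBlob());

// Each side can then intersect codec lists itself, instead of
// relying on an O/A negotiation inside the browser.
const common = alice.remote.codecs.filter(c => ["opus", "g711"].includes(c));
console.log(common); // [ 'g711' ]
```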
Lots of Q&As on 4) have slopped over into 1) and 3).
Francois remarked that we should be thinking about how to maximize the
possibility that we're successful. It seems to him that we should focus on the
things in our mandate - i.e., the protocol for transport. He thinks for Q3 we
should say that the federation protocol is SIP. He agrees with Matthew that the
protocol to the Web Server is to be done by W3C. They should figure out what is
best.
15 minutes Offer/Answer architectural text (Cullen Jennings)
1) Need SDP O/A semantics as used by SIP rather than reinvent.
2) Will be possible to gateway to legacy SIP devices that support ICE.
3) When a new codec is defined, the JS API doesn't change.
Matthew asked if Cullen is defining what goes on the wire. Cullen answered no.
Matthew agrees with 2) and 3); thinks the client can choose 1).
Suggest SIP as a federation protocol, but no SIP between Web Server and
Browser. SDP blobs are not enough by themselves. Prefer SIP O/A semantics and
not just RFC 3264:
- able to pass SDP O/A
- indicate context of passing O/A
- Deal with two phase SIP - 180/200
- signal errors in SDP
Summary: need more than just SDP. Needs to be mappable to SIP.
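One way to picture "more than just SDP" is a signaling envelope that carries
the SDP blob plus the extra context listed above (the O/A context, two-phase
progress as in SIP 180/200, and SDP errors). All field names below are invented
for illustration and are not proposed protocol elements.

```javascript
// Hypothetical signaling envelope: SDP alone is not enough, so each
// message also carries session context, ordering, and error info.

function makeMessage(type, sdp, extra = {}) {
  return {
    type,                   // "offer" | "answer" | "progress" | "error"
    sessionId: "s1",        // context: which session this O/A belongs to
    seq: makeMessage.seq++, // context: ordering / glare resolution
    sdp,                    // the actual SDP blob, possibly null
    ...extra,
  };
}
makeMessage.seq = 1;

const offer = makeMessage("offer", "v=0 ...");
const ringing = makeMessage("progress", null, { status: 180 }); // two-phase
const err = makeMessage("error", null, { reason: "unsupported codec" });

console.log(offer.seq, ringing.status, err.reason); // 1 180 unsupported codec
```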
Francois agrees you need more than O/A. If this was all at the server-to-server
interface, he would 100% agree that is all that is needed. He has concerns over
the browser-server interface, where we shouldn't go out of our comfort zone and
shouldn't mandate an implementation. Should focus on server-server. May not
need O/A on the wire - like today when calling from a SIP phone to Skype.
Cullen responded that he knows Skype isn't mappable to SIP directly. Yes, we
are talking about a federation protocol. But he does think that what comes in
and out of the API, and then goes through the magical protocol - that protocol
must be simple and MUST be mappable. The 3 interfaces need the information -
they don't necessarily need SIP O/A on all those interfaces. Francois thinks
this is moving in the right direction.
Keith commented that the SIP level doesn't have anything to do with media. Does
server A need to know anything about server B? Do we need feature tags? Cullen
responded that the focus has been on info needed at the RTP level.
Justin thinks this is the right direction. Need errors, timers, etc. for glare
and the like. He is concerned about the "much more along these lines" - that's
a slippery slope. Will we pull in more SIP stuff later? Harald commented that
he also thinks that if we can push away two-phase media commit at the API
level, that would be good. SIP has spent a lot of effort getting ? right. He
would prefer if we could do with a somewhat simpler model. Cullen responded
that the thing he dislikes the most about the whole SIP discussion is the
slippery slope. Obviously he doesn't have a concrete proposal. We need a
proposal where we can go have a detailed debate, for example on issues like
2-phase commit, which our use cases currently have canned. He is glad to have
the discussion, but we need a document that describes the details. Harald:
Hurray!
Cullen wants to get into Harald's document "Design Principles", detailed on the
slide. In short, points 1) and 2) need to be mappable to SIP O/A. Francois
commented that softening point 1) and strengthening point 2) a little would
maximize our chances to get things right.
Discussion: Jonathan Lennox asked if DTLS-SRTP and SAVPF are required, or are
we going to have SDP capneg, which everyone dislikes; and if the former, does
everything else need to go through a media G/W? Cullen assumes the direction is
that we want to negotiate secure versus insecure media. Thus we need some
negotiation between a limited device and the browser. He accepts that we must
interop with SDP and that it is not trivial. The only thing harder is to
replace SDP with something simpler that still does what people want. Every time
that has been attempted it has failed. He would like to persuade the WG to
avoid going down that road, as this is a multi-year effort, and if that happens
what the WG decides will be totally irrelevant as code will have shipped a long
time ago. Kevin Fleming asked whether SDP itself is hard, or whether the
problem space where we use SDP is hard. Cullen responded that yes, it's
probably the latter.
Ted: need more discussion of Matthew's and Cullen's presentations (for 10
minutes after the break)
Ted started up the discussion after the break. There is a trade-off here: the
more we have standardized below the API, the simpler the JS code can become.
Yes, a JS library can attempt to cover such functionality; however, they are
commonly frustratingly incomplete in functionality. The one place where he is
quite concerned is congestion control. Multiple streams in multiple layers make
it difficult. If that is the case, how can we have the thin part below the API
and still have the congestion control in the browser? Has anyone thought of
this? Cullen commented that both CC and security functions in the browser need
a fairly deep understanding of what the application intends. That is likely
easier if the browser selects the RTP parameters. Also, the most successful
HTML APIs have been easy to demonstrate. Thus the simple stuff needs to be
really simple to accomplish, and the more advanced merely possible.
Harald commented about congestion control (CC) that one issue is trust. If CC
is implemented in the JS, then we need to trust all the JS authors. If
implemented in the browser, we need to trust the browser authors. Eric Rescorla
asked whether, with trust, Harald was concerned with malice or incompetence.
Harald responded that any sufficiently advanced incompetence can't be
distinguished from malice. Eric clarified that he was interested in how strong
the guarantees need to be that appropriate congestion control actions are
performed. Harald would like the assurance from the browser that independently
of what mess the JS tries to create, nothing truly bad will happen. Randell
Jesup agreed with this. Eric Rescorla followed up by asking, if CC is
implemented in the browser, how are the signaling and response to congestion
handled if the JS application is responsible? Ted commented that he is worried
if all JS applications need this complicated back-and-forth handling of events
and response selection. Harald agreed. Justin commented that on the positive
side this is a differentiating factor. If in the browser, it is likely we can
do it correctly, but there will be people who think they could do it better if
they had the possibility. Randell(?) commented that there is a connection
between application response and congestion control, but the determination is
best done in the browser. Justin agrees, but in, for example, a conference
scenario it is the application that needs to make the decision to drop a
stream. Dan York asked how we define congestion control.
Eric Rescorla asked what is required to build a very basic soft phone. He
thinks he understands what is needed in the case of Cullen's model: there is
very little JS code and a quite slim server side. But has anyone tried out what
Kaufman is proposing? Harald commented that he hasn't seen enough details of
Kaufman's proposal to build a site. He certainly can't(?) see how he would
build a site fulfilling Cullen's forward-compatible principle. Dan York
commented that they are building something that is similar to Kaufman's
principles. Cullen interjected that from his discussions with Kaufman, he
believes what Dan Y is doing is a typical example of what Kaufman hates, so
there might be some confusion here. Bernard Aboba asked: but that is a JS
example, not SIP in the browser? Dan York followed up saying that there are
definitely people building real-time stuff in JS that has commonalities. Eric
Rescorla formulated the two questions: how much code does one, as a site
programmer, have to write, and secondly, how much code does one have to
download?
Cullen is wondering if the principles presented are okay. Cullen has written
several proposals, but is not willing to spend more time on writing up an
offer/answer-based proposal along the principles unless there is sufficient
agreement. He wants to make progress.
Parthasarathi Ravindran(?) asked: why don't we take SIP as the framework, not
the applications, and have the browser be a UA? Cullen responded: yes, that is
possible, and it appears like all the demos are based on some softphone.
Ted summarized that we have gotten a lot of good material and discussion.
Requirements for federation and browser-server have clearly made progress. The
chairs will have a quick round among the people driving the design principles,
then two-week consensus calls for the actual principles. Justin requested that
if there is anything the other model can't do, please detail that. Ted agreed
and clarified that such input to the consensus call is highly desirable.
Action Item: Take this up on the consensus call with the signaling design
advocates, present results to mailing list.
Security (1 hr)
Note: new version of the security document
- RTCWEB functionality is too dangerous to enable by default - users must
consent
- How do they consent intelligently?
- Objective of discussion - work through common cases.
1) Consent issues:
- making long term grants secure
- user expectations
Potential long-term consent security features:
a) live with it
b) require user interaction with browser for all calls
c) require user interaction with browser for new calls
d) require JS to be delivered over HTTPS
2) Authenticating the person you are talking to (suggested this is less
important)
Discussion: Cullen commented that EKR is phrasing consent in terms of access to
the camera, and the options capture that. But if you frame it as controlling
who your computer is sending media to, that would lead to a different set of
options. Ekr exemplifies this by going to a poker site and connecting to Ted
(whom he doesn't know - just a random guy). How does he avoid a faked pokerweb
connecting him to some random guy? Cullen responded that it depends on the
identity provider, which is pokerweb. It is another issue to separate the real
pokerweb from the faked one. Ekr has two concerns: first, that even with TLS he
can't be certain that he is talking to the real pokerweb; second, does he have
any protection against the real pokerweb tapping its users? Cullen thinks we
will need two fundamental authentication processes. Ted asked if this can be
talked about in terms of assurances rather than UI: which assurance is being
provided, and whether it has been successfully provided. Cullen responded that
he wasn't saying there were two assurances - there are two groups that can be
selected. Assurance where you install an application like Skype is different
from having a conversation with X and disallowing that thereafter. Ekr
responded that after bulk auth, the amount of protection is quite limited. He
is not sure how to improve this without interfering with the user. There is
minimal assurance of the sites you authorize (with HTTPS): assurance before
making a call, or assurance when you've made a call before. E.g., a user has
been on pokerweb before and already had a call with two people. If the user is
willing to only have calls to these, then that limits what the website can do
to bug the calls. This of course has UI consequences.
- single call to people that you have no prior relationship with
- Conflicting requirements: low-impact, not something users click through; can
we do anything to help here? This is really a W3C issue. -- It was suggested
to do like FaceTime and mirror the UI.
- Characterization: User doesn't know who they are calling. We don't have
technical means to give this kind of identity. -- You know the domain and need
to figure out if it's connected to the people you want to call. -- It requires
a user leap to go from FQDN/user name to the company you're talking to.
API impact of short-term consent:
1) show self picture
2) This implies some level of device access prior to permissions
Suggests this is out of scope.
What about site being visited:
- should top level site get an opinion:
Sohel: Hearing problems - not solutions
Ekr: proposal is to allow JS API for consent
- If you care about this use SRTP
- site can enforce this
- browser can support this - cryptographic continuity
Verifying who you are talking to.
- can't completely eliminate threats from long-term sites:
-- Basic principle: trust but verify
- short term consent is somewhat more secure:
-- likely user will have to give consent
Jonathan asked if the interface is for requesting long-term or short-term
consent. Ekr responded that it will be the web sites that decide which to use.
The short-term consent happens as a side effect of a JS call; the app can make
a call without long-term permissions. The long-term consent process is
something a little more heavyweight. Jonathan asked if the user is clear about
what they are consenting to. Ekr thinks so.
Ted had a question around short-term consent: when you consent only for this
call, some long-running process may be present in a Web application. How do you
determine the length of a call compared to the UI interactions? Does the call
end when you navigate away from the tab, when you quit the browser, or when you
close the tab? How do we manage the user expectation and consent? Ekr has no
idea what the answer is. The basic requirement is that a user be aware of what
and how many calls are in progress. Randell commented that this is edging into
the W3C side. Mozilla will be considering UI that may have a chrome indication
that mic/camera are active - likely part of the solution. Ted wonders if there
is a need for an additional assurance type: do I want to have an assurance that
they have access to camera and microphone only when I am on the tab containing
this application? Do I have one assurance that is short term as long as this
tab exists, and one that is long term? The assurance models are different. It
would be hard to express to the users; at the same time, having different types
of assurances would be valuable. What do others believe? Cullen wants assurance
that whenever he ends it, it truly ends. Ted stated we don't want to make this
part of JS. Cullen stated that our threat model is that JS is inherently
untrusted - or isn't it? Randell agrees that part of the indication is in the
chrome, even if you let end-call be part of the UI in the app. If you hit end
and you don't see that visibly in the UI, you know it didn't do the right
thing. That's not the whole solution, but part of it. Ted stated that we don't
appear to have a need for different assurance classes.
Stefan commented that we've gone beyond ending calls; there is also recording.
As soon as you can record and send a file off, the user has very little
control. Ted restated this as: is there any assurance that we would like to
provide, and is there any that we can provide, given that the number of
participants is known? Ekr responded yes and yes. There is a big difference
between 0 and 1 participants; 1 versus N is important, but less so than 0
versus 1. The browser should indicate both cases. Ted thought that was a
different question. E.g., he is sending media to Stefan, who is sharing media
with a recording service. Can we provide any assurance, or is it simply that
once your media has left your device and arrived somewhere, they can do
anything? Randell responded that we can't really give an assurance; there's
always an OS way to grab the media and forward it. Stefan stated that even if
you have verified who you are talking to, the application may also locally
record it and send it. Cullen commented that we need to differentiate recording
and what can happen to media when going out a speaker versus what a
less-than-trustworthy JS application can do with the info. Handling the first
case is just one bar too far - something we can't deal with. Magnus tried to
reinforce Stefan's point that local recording is possible and that data can
then be shipped off the machine from JS.
Terminology Mapping (30 minutes)
Mapping WebRTC constructs to RTCWeb terms (Magnus Westerlund)
o Multi-media session versus RTP session
o RTP related terminology
o WEBRTC API terminology (MediaStream object, MediaStream Track, Label)
Discussion: MediaStream and Label
- A MediaStream track can be mapped to an SSRC in an RTP session
- A MediaStream track has a synchronization context that can be represented by
a CNAME
- A MediaStream sent by a PeerConnection can be represented by a list of RTP
session/SSRC tuples
- The MediaStream label has no matching construct:
-- The SDP a=label attribute labels RTP sessions, not a set of SSRCs in
possibly several RTP sessions
-- The label can't be a CNAME
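The mapping above can be sketched as a small data model. This is purely
illustrative; none of the class or field names below were proposed at the
meeting, and the sample stream, sessions, SSRCs, and CNAME are invented:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MediaStreamTrack:
    rtp_session: str  # which RTP session carries this track
    ssrc: int         # a track maps to an SSRC within that session
    cname: str        # synchronization context, represented by the RTCP CNAME

@dataclass
class MediaStream:
    label: str        # the construct with no matching RTP/SDP counterpart
    tracks: List[MediaStreamTrack] = field(default_factory=list)

    def as_tuples(self) -> List[Tuple[str, int]]:
        # A MediaStream sent by a PeerConnection can be represented
        # as a list of (RTP session, SSRC) tuples.
        return [(t.rtp_session, t.ssrc) for t in self.tracks]

# Hypothetical stream: one audio and one video track, one sync context.
stream = MediaStream(label="alice-main", tracks=[
    MediaStreamTrack("audio-session", 0x1111, "alice@example.org"),
    MediaStreamTrack("video-session", 0x2222, "alice@example.org"),
])
print(stream.as_tuples())  # [('audio-session', 4369), ('video-session', 8738)]
```

Note that the label lives only on the MediaStream object; nothing in the RTP
session/SSRC tuples carries it, which is exactly the gap discussed above.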
Discussion: Harald hasn't been able to track down a requirement for the
MediaStream label. Magnus responded that the same media source can appear in
multiple MediaStreams, which makes CNAME dubious. Harald commented that
multiple copies of the same MediaStream track are used to send media in
different directions; that means it doesn't matter whether they have the same
or different CNAMEs, because they won't end up at the same receiving entity.
Cullen commented that these are in different multimedia sessions from the RTP
view. Harald concluded that he doesn't see the use case. Magnus responded that
he has a use case where you might want to maintain the CNAME across multiple
sessions (next slide).
Justin asked what the application is expected to do with the MediaStream label.
Cullen responded with a use case where a video device is receiving 3 streams.
You have been told in the signaling JS that you're getting Alice, Bob and
Charlie, and you need to know how to map them to media streams. Stefan added
that an endpoint can have multiple cameras. Cullen remarked that maybe we need
a label for a track and not one for a MediaStream; media tracks need an SSRC
and CNAME. Harald disagreed: if a media stream represents everything that needs
to be coordinated, you need a label to refer to that construct. Magnus directed
attention to the mixer case, where one is likely to receive a mixed audio
stream and several video streams. Harald responded that if you are sending
audio and video, you need two things: to know that they belong to me, and that
they need to be synchronized, which is the CNAME. Why do you need another
construct? How else can you refer to it? You need to map to an identifier to
figure out the actual stream. Justin commented that you only need to map to a
track. Stefan commented that the model on the application side has been to deal
with streams. Justin responded that we have identified places where this breaks
down; mapping to a track and not a stream provides more flexibility. Stefan
thinks it makes it more complex. Cullen gave a good example: two camera views
of Harald, both synced to one audio. You want to synchronize all three and
differentiate the two camera views. He is not proposing a solution, but this
seems to be a use case.
Ekr asked what the application model is: either I'm in Matthew's world with a
JS handle to both, or in the SDP browser world. Cullen responded that we are
talking about which JS handle flows across the interfaces. Ekr wondered where
this is envisioned to be signaled. In SDP, according to Justin. Ekr asked what
you are trying to say. Justin clarified that the application is asking for the
wide-angle shot, or the zoomed-in one. Cullen added that we are already
receiving both; we want to display the wide shot on the right-hand side and the
zoomed one on the left. Justin: that works as long as you have video track tags
and use a URL to get them. Ekr was not clear on whether the application does
this locally or if this is sent across the wire. Justin and Stefan agreed that
these are local operations, but Stefan added that you need information from the
other side. Ekr commented that one has a JS object; why do I need another
label? Cullen referred to the question of how an incoming RTP packet gets sent
to the right JS object. Justin remarked: by the SSRC. Colin Perkins: you would
need the same CNAME for synchronization, but one can easily define a new
identifier/label if we need one and put it in RTCP.
Conclusion: Discussion to continue on mailing list.
Congestion Control (10 minutes) Harald Alvestrand
Randell is in favor of this. We need to run congestion control across all
streams, not just RTP. Harald remarked that he agrees and Google is
experimenting with joint control, but the RTP identifiers aren't set up to make
this easy. Justin remarked that we have continuous traffic (likely to fill the
pipe) as well as discontinuous traffic. What kind of assurance can you give
users that their data traffic makes it through? We likely need something to
help their traffic get through. Magnus commented that it depends on what
timescale you want to react on; if at least frame-level, you can throttle back
your video encoder to make room for the data. Justin: if media is running
without saturating the pipe and the application dumps 300k of data, then it
will be an RTT before we know whether there was a problem. Randell: we likely
need something like TCP slow start. The data is clearly asynchronous and may
not be a new connection. We need something; the details are the question.
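A slow-start-style ramp of the kind alluded to here might look roughly like the
following. This is a sketch only; no specific algorithm was agreed at the
meeting, and the function name, floor/ceiling constants, and loss-signal model
are all invented for illustration:

```python
def next_data_budget(budget_bytes: int, loss_detected: bool,
                     floor: int = 1200, ceiling: int = 300_000) -> int:
    """Return the data-channel send budget (bytes) for the next RTT."""
    if loss_detected:
        # Back off multiplicatively so the media flows aren't starved.
        return max(floor, budget_bytes // 2)
    # Otherwise ramp up exponentially, as TCP slow start does, rather
    # than dumping e.g. 300k of data into the pipe in a single burst.
    return min(ceiling, budget_bytes * 2)

budget = 1200  # start near one packet's worth
for rtt, loss in enumerate([False, False, False, True, False]):
    budget = next_data_budget(budget, loss)
    print(rtt, budget)  # budget ends at 9600 after the loss-triggered backoff
```

The point of the sketch is only that the data channel probes for capacity
gradually and reacts within an RTT, rather than learning about a problem an RTT
after a large burst has already collided with the media traffic.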
Cullen commented that in the use case where data is much less than video, there
is no issue with data, as the spike from an I-frame is much larger than a
single data packet. It may be easier to solve this in some cases, e.g., at low
data rates. Tim Terriberry reminded us that a proposed use case was to do
direct file transfer without relaying through the server. Justin responded that
this argues for a reliable data channel. Randell noted that the JS application
may need to give an indication of prioritization, e.g., the percentage of the
channel devoted to data. Justin agreed that one may need to request a
reservation to send more data. He agreed with Cullen's remark, which implies
one should avoid I-frames.
Colin: there is a missing requirement to not impact the media flow. The issue
is that what is good for the network, like slow start, is bad for the media.
Randell commented there are cases where timely delivery is more important than
media quality; this is an application decision. Justin agrees that QoS is
something we should let applications set.
Ted concluded that draft-alvestrand-rtcweb-congestion-00 is not yet ready for
WG adoption. He wants Harald to take this forward. Harald wants someone else to
take on the task of writing requirements for congestion; this document just
documents what Google is doing. Randell was willing to cooperate on a draft.
Conclusion: Randell will write up a first pass at requirements (along with
Harald) and discuss it on the mailing list.