Audio/Video Transport                                          J. Lennox
Internet-Draft                                            H. Schulzrinne
Expires: June 1, 2005                                            J. Nieh
                                                              R. Barrato
                                                             Columbia U.
                                                           December 2004
              Protocols for Application and Desktop Sharing
                    draft-lennox-avt-app-sharing-00
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of
which he or she becomes aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 1, 2005.
Copyright Notice
Copyright (C) The Internet Society (2004).
Abstract
This document defines several protocols to support accessing general
graphical user interface (GUI) desktops and applications remotely,
either by a single remote user or embedded into a multiparty
conference. The protocols are designed to allow sharing of, and
access to, general windowing system applications that are not
Lennox, et al. Expires June 1, 2005 [Page 1]
Internet-Draft Application and Desktop Sharing December 2004
expressly written to be accessed remotely.
Table of Contents
1. Introduction
2. Terminology
3. Overview
   3.1 Architecture
   3.2 Protocol Components
4. Common Protocol Elements
5. Output Protocols
   5.1 Window Identifiers and Output Meta-Format
   5.2 Window State Protocol
   5.3 Window Pixel Data
   5.4 Pointer Representation
6. Input Protocols
   6.1 Keyboard Input
   6.2 Pointer Position
7. Implementation Notes
8. Open issues
9. Security Considerations
10. IANA Considerations
11. References
   11.1 Normative References
   11.2 Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
While two-party and multi-party conferencing using standards-based
protocols is now common and well-developed, protocols for sharing
applications are largely proprietary or based on the aging T.120 [8]
suite of protocols. In this document, we define a set of protocols
for application and desktop sharing.
We note that there are large similarities between remote access to an
application ("remote desktop") and the sharing of an application by
multiple users within a collaboration setting, such as a multimedia
call or multiparty conference. The protocols defined in this document
therefore support both.
Remote access differs from video transmission of the sort for which
most video encodings have been designed. In particular, screen
encoding may need to be lossless and typically operates on artificial
rather than natural (photographic) video input. The video input is
characterized by large areas of the screen that remain unchanged for
long periods of time, while others change rapidly. (However,
rendering the output of a modern computer-generated animation
application such as video games blurs the distinction between
traditional motion video output and screen sharing.)
Unlike earlier systems, such as T.120, we believe that application
sharing should be integrated into the existing IETF session model,
encompassing session descriptions using the Session Description
Protocol (SDP) [1] or successors and the Session Initiation Protocol
(SIP) [9]. Application sharing needs many of the same control
functions as other multimedia sessions, such as address binding and
session feature and media negotiation. We believe that use of the
session model is also beneficial for the remote desktop case, as it
allows many of the well-developed session components to be re-used,
and easily supports hybrid models, such as the delivery of desktop audio
to the remote user.
Remote access to graphical applications and desktops, as defined in
this document, has two important characteristics. First, the access
protocol is unaware of any semantic characteristics of the
applications being shared; it only transmits the visual
characteristics of the windows. This is different, therefore, from
shared-drawing or shared-editing tools that allow distributed
modification of documents. Secondly, the protocol is designed to
work with applications which were not written to be used remotely, by
intercepting or simulating their connections to their native window
systems. In this way, it is distinguished from systems such as the X
Window System [10], which allow natively-written applications to be
displayed on remote viewers.
We distinguish between local and remote users. Local users employ
normal operating system mechanisms to interact with the running
application. Remote users interact via the delivery protocols
described here.
The application sharing problem can be divided into four components:
(1) setting up a session to the node running the application; (2)
transporting user input events from remote viewers, such as
conference participants, to the application; (3) delivering screen
output from the application to the participants; and (4) moderating
access to shared human interface devices such as pointing devices
(e.g., mice, joysticks, trackballs) and text input devices
(keyboards). We refer to components (2) and (3) as the "remoting
protocol". They are the focus of this document, and are described in
Section 6 and Section 5 respectively.
Session negotiation and description can be provided by existing
session setup protocols; user input access can be moderated by a
floor control protocol. Thus, these two components are beyond the
scope of this document, although they are important for an acceptable
overall user experience.
Applications are more than just windows; they are a stack of related
windows which serve the same task and are usually associated with the
same process on the server. Some applications impose special
constraints on user input, e.g., through modal dialogs, which
temporarily acquire exclusive input focus, and through floating
(always-on-top) windows.
The protocols described in this document are intended to fulfil the
requirements described in the Internet-Draft Sharing and Remote
Access to Applications [11].
The rest of this document is laid out as follows. Section 2 defines
the keywords used to state normative requirements. Section 3 gives an
overview of the protocol's architecture and components. Section 4
defines common elements of the output and input protocols, which are
then further described in Section 5 and Section 6 respectively.
Section 7 gives implementation notes, and Section 8 discusses open
issues with the design of the protocol. Finally, Section 9 discusses
security considerations, and Section 10 gives IANA considerations.
2. Terminology
In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in RFC 2119 [2] and
indicate requirement levels for compliant implementations.
3. Overview
3.1 Architecture
Application and desktop sharing consists of two classes of
components: "viewers" and "application hosts". Viewers receive
remote graphics and provide input. Application hosts receive input
from local and remote users, and host and transmit applications and
graphics.
The application and desktop sharing models defined in this document
are integrated into the IETF conferencing model. In particular, the
Session Initiation Protocol (SIP) [9] is used to initiate and control
remote access. This allows the use of existing SIP mechanisms for
confidentiality, authentication and authorization, user location,
conferencing, etc.
In the IETF conferencing model, media sessions can consist of multiple
participants; this document's protocols are designed to work in this
case. The various Centralized Conferencing (XCON) [12] control
protocols can be used for floor control, to determine which member of
a conference is permitted to send input to an application host at any
given time. Conferencing also gives rise to the issue of late joiners;
the protocol is designed to make it relatively easy for a protocol
relay, which receives output from one application host and forwards it
to multiple viewers, to send all the necessary information about a
sharing session to new conference arrivals. Similarly, it is
possible to record and replay a shared application session without
semantic awareness of the protocol.
3.2 Protocol Components
The three core components of desktop sharing are: input protocols
which represent user input from keyboards or pointing devices such as
mice, trackballs, or touchscreens; an output protocol which can
represent screen pixels and related data; and a negotiation mechanism
which can convey attributes of the session such as the desired size
of the screen. In addition, application sharing requires a mechanism
to represent window state, position and stacking. In this document,
the negotiation mechanism is defined in Section 4; output protocols,
including window-state handling, are defined in Section 5; and input
protocols are defined in Section 6.
Additional, optional mechanisms can enhance both desktop and
application sharing. Additional input mechanisms such as joysticks
or other game controllers can be supported; audio streams can be
associated with a desktop or application; viewer-side scaling and
porthole requests can be used to optimize transmission of data to
viewers with a small screen; and it is often useful to allow
copy-and-paste between applications running on a viewer and those
running on an application host. This document does not define any
such extensions; they may be defined elsewhere.
4. Common Protocol Elements
Protocol negotiation is carried out using the Session Description
Protocol (SDP) [1], while all input and output protocols run over the
Real-time Transport Protocol (RTP) [3]. In most use cases for application and
desktop sharing, reliability is more important than latency, and flow
control and dynamic bandwidth adjustment are crucial. As such,
viewers and application hosts SHOULD use RTP Framing [4] to send the
RTP packets over TCP, unless there is a strong reason, such as the
need to distribute a desktop session over multicast, to do otherwise.
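To illustrate the framing recommended above, the following Python
sketch frames and de-frames RTP packets on a TCP byte stream. It
assumes that [4] prefixes each packet with a 16-bit length field in
network byte order, as the published RFC 4571 does; the function
names frame_rtp() and deframe_rtp() are hypothetical.

```python
from __future__ import annotations

import struct


def frame_rtp(packet: bytes) -> bytes:
    """Prefix one RTP or RTCP packet with its length as a 16-bit big-endian integer."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return struct.pack("!H", len(packet)) + packet


def deframe_rtp(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split a TCP receive buffer into complete packets plus leftover bytes."""
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # the rest of this packet has not arrived yet
        packets.append(buffer[offset + 2 : offset + 2 + length])
        offset += 2 + length
    return packets, buffer[offset:]
```

A receiver would append newly read bytes to the leftover buffer and
call deframe_rtp() again, so packets split across TCP segments are
reassembled.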
5. Output Protocols
5.1 Window Identifiers and Output Meta-Format
A shared application consists of a set of overlapping windows,
usually rectangular. Each window needs a unique identifier, and most
output data (other than audio or other non-visual output mechanisms
not specified here) needs to be associated with a particular window.
Windows in an application are created and destroyed relatively
frequently. Re-negotiating SDP descriptions whenever a window is
opened or closed would therefore not be practical, and so data for
multiple windows needs to be multiplexed into a single RTP stream.
Since multiple windows are from a single source, multiplexing on the
RTP SSRC (synchronization source) field would not be appropriate.
Instead, each window is assigned a unique window identifier.
Rather than requiring each payload type to define a field to carry
this window identifier, we instead define a payload meta-format which
precedes each payload. The meta-format carries the relevant window
identifier and coordinates, and is then followed by the actual data
for the particular payload being sent. (This allows existing payload
type definitions to be re-used in the context of a window.)
This protocol has three mandatory format-specific parameters, which
are carried in an SDP "a=fmtp:" attribute. The parameter "height"
indicates the desktop height in pixels; "width" indicates the desktop
width in pixels. All images and other coordinates sent for this
protocol must lie within these boundaries. The third parameter is
"mode", which can take the value "desktop", indicating one big
drawing pane, or "application", indicating that individual windows
will be created and destroyed as needed, and all drawing will occur in
individual windows.
In application mode, SDP "i=" lines for this protocol SHOULD contain
a human-readable description of the application.
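The draft names the three format parameters but does not yet fix
their syntax. As a sketch only, assuming the common
"name=value;name=value" fmtp convention and an arbitrarily chosen
dynamic payload type, an application host might build, and a viewer
check, the attribute as follows (build_fmtp() and parse_fmtp() are
hypothetical helpers).

```python
from typing import Dict

VALID_MODES = ("desktop", "application")


def build_fmtp(pt: int, width: int, height: int, mode: str) -> str:
    """Build an a=fmtp line carrying the three mandatory parameters."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if width <= 0 or height <= 0:
        raise ValueError("desktop dimensions must be positive")
    return f"a=fmtp:{pt} width={width};height={height};mode={mode}"


def parse_fmtp(line: str) -> Dict[str, str]:
    """Parse 'a=fmtp:<pt> k=v;k=v' and check the mandatory parameters."""
    prefix, _, params = line.partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an fmtp attribute")
    result = {"pt": prefix[len("a=fmtp:"):]}
    for item in params.split(";"):
        if not item.strip():
            continue  # tolerate a trailing separator
        key, _, value = item.strip().partition("=")
        result[key] = value
    for required in ("width", "height", "mode"):
        if required not in result:
            raise ValueError(f"missing mandatory parameter: {required}")
    if result["mode"] not in VALID_MODES:
        raise ValueError(f"unknown mode: {result['mode']}")
    return result
```

For example, build_fmtp(96, 1280, 1024, "application") yields
"a=fmtp:96 width=1280;height=1024;mode=application"; in application
mode the accompanying "i=" line would name the shared application.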
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                          RTP header                           .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           X Offset                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Y Offset                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Window ID           |       MBZ       |     PT      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                            Payload                            .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: RTP output payload meta-protocol
Figure 1 shows the meta-protocol header which precedes all output
data. The "X Offset" and "Y Offset" fields indicate the upper-left corner of the
area the message is describing. (In desktop mode, these coordinates
are relative to the desktop; in application mode, these coordinates
are relative to the enclosing window, except for window state
protocol messages.) The Window ID is a 16-bit unique identifier for
each window; these are assigned arbitrarily by the application host,
and SHOULD be recycled on a least-recently-used basis. If the
meta-protocol is running in desktop mode, the Window ID MUST always
be set to zero by the server and SHOULD be ignored by viewers. In
application mode, Window IDs are defined by the window state
protocol, defined in Section 5.2. Messages other than Window State
Protocol messages which reference unknown Window IDs SHOULD be
ignored. The PT indicates the actual payload type of the rest of the
data. The MBZ bits must be zero.
Fields of the RTP header other than the PT field are set as
appropriate for the enclosed payload type; the meta-protocol does not
define any specific uses for them.
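The following Python sketch encodes and decodes the 12 bytes of the
meta-protocol header that follow the RTP header in Figure 1. Beyond
what the text states, it assumes that the X and Y offsets are unsigned
32-bit integers in network byte order and that PT occupies the low 7
bits of the last word (as the RTP PT field does), leaving 9 MBZ bits.
The names MetaHeader, pack_meta_header(), and unpack_meta_header() are
illustrative only.

```python
from __future__ import annotations

import struct
from typing import NamedTuple

HEADER_LEN = 12  # three 32-bit words following the RTP header


class MetaHeader(NamedTuple):
    x_offset: int
    y_offset: int
    window_id: int     # always 0 in desktop mode
    payload_type: int  # PT of the enclosed payload


def pack_meta_header(h: MetaHeader) -> bytes:
    if not 0 <= h.window_id <= 0xFFFF:
        raise ValueError("Window ID must fit in 16 bits")
    if not 0 <= h.payload_type <= 0x7F:
        raise ValueError("PT must fit in 7 bits")
    # Window ID in the top 16 bits; the 9 MBZ bits stay zero.
    last_word = (h.window_id << 16) | h.payload_type
    return struct.pack("!III", h.x_offset, h.y_offset, last_word)


def unpack_meta_header(data: bytes) -> tuple[MetaHeader, bytes]:
    """Return the decoded header and the payload that follows it."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated meta-protocol header")
    x, y, last_word = struct.unpack_from("!III", data, 0)
    # MBZ bits are masked off here rather than rejected.
    header = MetaHeader(x, y, last_word >> 16, last_word & 0x7F)
    return header, data[HEADER_LEN:]
```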
5.2 Window State Protocol
The window state protocol is an output protocol which handles the
creation, destruction, resizing, raising and lowering, positioning,
and characteristics of application windows when the protocol is being
run in application mode. It MUST NOT be used in desktop mode.
The specific details and packet format of this protocol are not yet
defined. The rest of this section describes it at a high level.
Creating, moving, resizing, and raising or lowering a window are
indicated by the same message. This message starts with the common
meta-protocol header, whose coordinates indicate the position of the
window relative to the desktop. Following this is a code indicating
that this is a "window state" message, and the window's width and height
as 32-bit integers. Finally, the message contains a list of all the
application's windows, in Z order, bottom to top. (Listing the
entire Z order in every message helps prevent the Z order list from
getting out of sync between viewers and the application host.) A flag
in the Z order list can indicate that some windows should be
considered "always on top", and float above all other windows on the
viewer (shared or not) that are not themselves "on top".
The size and position given for a window includes all the "trim"
provided by its window manager -- title bars, frames, and the like.
(Otherwise, the remoting protocol would need to include input and
output messages to indicate window state changes and manipulation.)
Non-rectangular and translucent windows can additionally have an
alpha channel specified. This is sent as a black-and-white or
grayscale PNG [5] image corresponding to the window's transparent or
translucent pixels. Window alpha channels can change dynamically.
Window removal consists of the meta-protocol header followed by a
"window remove" code. On receipt of this message, a viewer erases
the corresponding window and removes it from the Z order list. The X
and Y coordinates given in the meta-protocol are ignored and SHOULD
be zero.
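Because the packet format above is not yet defined, the following
Python sketch models these messages only as in-memory values: a
window's position and size, plus the full Z order list as (Window ID,
"always on top" flag) pairs, bottom to top. These representations and
the class names are assumptions. The sketch shows why sending the
entire list keeps viewers in sync, and how a viewer might handle
window removal.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Geometry:
    x: int  # position relative to the desktop, including window-manager trim
    y: int
    width: int
    height: int


@dataclass
class ViewerWindows:
    geometry: dict[int, Geometry] = field(default_factory=dict)
    stacking: list[int] = field(default_factory=list)  # bottom to top

    def apply_state(self, window_id: int, geom: Geometry,
                    z_order: list[tuple[int, bool]]) -> None:
        """Create, move, resize, or restack a window."""
        self.geometry[window_id] = geom
        # The complete list replaces the old order, so errors cannot
        # accumulate. "On top" windows are lifted above the rest, keeping
        # their relative order; the viewer's local windows are not modeled.
        normal = [wid for wid, on_top in z_order if not on_top]
        floating = [wid for wid, on_top in z_order if on_top]
        self.stacking = normal + floating

    def remove(self, window_id: int) -> None:
        """Handle a "window remove" message; unknown IDs are ignored."""
        self.geometry.pop(window_id, None)
        self.stacking = [wid for wid in self.stacking if wid != window_id]
```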
An additional message indicates "pointer capture", in which a window
indicates that it should exclusively receive all pointer events until
it indicates otherwise. This is necessary when menus are pulled down,
for example; a window with a pulled-down menu receives the "release
menu" mouse click whether or not the cursor is still over the original
window. "Stop pointer capture" is the same message, with a flag set.
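One plausible use of the capture message on the viewer side is
deciding whether a local pointer event should be forwarded to the
application host. The following Python sketch assumes hypothetical
inputs: the bottom-to-top stacking list, each window's rectangle in
desktop coordinates, and the currently capturing Window ID, if any. It
ignores window alpha channels, so non-rectangular windows are
hit-tested by their bounding boxes.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in desktop coordinates


def route_pointer_event(px: int, py: int, stacking: List[int],
                        geometry: Dict[int, Rect],
                        capture: Optional[int]) -> Optional[int]:
    """Return the shared window that should receive a pointer event at
    (px, py), or None if the event belongs to the viewer's local desktop."""
    if capture is not None:
        # While a capture is in effect, every event goes to the capturing window.
        return capture
    for wid in reversed(stacking):  # topmost window first
        x, y, w, h = geometry[wid]
        if x <= px < x + w and y <= py < y + h:
            return wid
    return None
```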
The window state protocol may need to carry additional information,
as well; see the open issues list in Section 8.
5.3 Window Pixel Data
There are three basic window data operations: pixel images, fills,
and block copies. Each operation uses the common meta-protocol
header. Each operation has its own MIME type, and thus a unique RTP
payload type in the meta-protocol PT field.
Pixel images contain arbitrary graphical data to be applied to
windows. They are conveyed as PNG [5] images. The PNG image follows
the meta-protocol header (which indicates the offset of the image
within the window or desktop) and consists of an area of the screen
to be updated. If the PNG image contains an alpha channel, the image
is composited with the existing contents of the window or desktop.
(The PNG images defining the initial graphical contents of a window
or desktop MUST NOT contain alpha channels.) In application mode, if a
window has an alpha channel with completely transparent pixels --
i.e., if a window is non-rectangular -- the corresponding pixels in
the PNG image are ignored.
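The text does not name the compositing operator. Assuming the usual
"over" operator with PNG's 8-bit, non-premultiplied alpha, and an
opaque existing image (initial contents carry no alpha channel), the
per-pixel arithmetic can be sketched in Python as follows. PNG
decoding is omitted; pixels are plain tuples, and the function names
are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite_over(src: RGBA, dst: RGB) -> RGB:
    """Composite one straight-alpha RGBA pixel over an opaque RGB pixel."""
    r, g, b, a = src
    return tuple((a * s + (255 - a) * d + 127) // 255
                 for s, d in zip((r, g, b), dst))


def composite_region(framebuffer: List[List[RGB]], image: List[List[RGBA]],
                     x0: int, y0: int) -> None:
    """Apply a decoded image at the offset given in the meta-protocol header."""
    for dy, row in enumerate(image):
        target = framebuffer[y0 + dy]
        for dx, pixel in enumerate(row):
            target[x0 + dx] = composite_over(pixel, target[x0 + dx])
```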
As an optimization, two additional window data operations are
defined. A fill defines an area of a window to be filled by a single
solid color. Following the meta-protocol header, it consists of a
height and width (specified as 32-bit coordinates), followed by the
fill color. Colors are specified as one byte each of Red, Green, and
Blue, i.e. as PNG color type 2 with 8-bit sample depth. (Color
sample depths greater than 8 bits per channel cannot be specified
with the fill operation, and must use the general PNG pixel image
form.)
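A minimal Python sketch of the fill body and its application to a
framebuffer follows. The field order (height, width, then one byte
each of red, green, and blue) is taken from the text; encoding the
32-bit fields as unsigned integers in network byte order is an
assumption, as are the function names.

```python
import struct
from typing import List, Tuple

RGB = Tuple[int, int, int]
FILL_FORMAT = "!IIBBB"  # height, width, red, green, blue (11 bytes)


def pack_fill(height: int, width: int, color: RGB) -> bytes:
    """Encode the fill body that follows the meta-protocol header."""
    return struct.pack(FILL_FORMAT, height, width, *color)


def apply_fill(framebuffer: List[List[RGB]], x0: int, y0: int, body: bytes) -> None:
    """Fill a region whose upper-left corner comes from the meta-protocol header."""
    height, width, r, g, b = struct.unpack(FILL_FORMAT, body)
    if y0 + height > len(framebuffer) or x0 + width > len(framebuffer[0]):
        raise ValueError("fill extends beyond the window or desktop")
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            framebuffer[y][x] = (r, g, b)
```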
The block copy operation copies a region of a desktop or window from
one position to another. Following the meta-protocol header (which
indicates the destination position) are the source position and size
(both as 32-bit coordinates). The destination region MAY overlap the
source region. The source and destination regions MUST NOT
extend beyond the boundaries of the window (in application mode) or
desktop (in desktop mode). In application mode, cross-window copies are
not supported. Portions of the source region which do not overlap
with the destination region remain unmodified.
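Because the destination may overlap the source, a block copy behaves
like memmove() rather than memcpy(). The following Python sketch
(function name and argument order hypothetical) snapshots the source
region before writing, so overlapping copies are correct and source
pixels outside the destination keep their old values.

```python
from typing import List


def block_copy(framebuffer: List[list], src_x: int, src_y: int,
               width: int, height: int, dst_x: int, dst_y: int) -> None:
    """Copy a width x height region from (src_x, src_y) to (dst_x, dst_y)."""
    rows = len(framebuffer)
    cols = len(framebuffer[0]) if rows else 0
    for x, y in ((src_x, src_y), (dst_x, dst_y)):
        if x < 0 or y < 0 or x + width > cols or y + height > rows:
            raise ValueError("region extends beyond the window or desktop")
    # Snapshot first so that overlapping regions are copied correctly.
    snapshot = [row[src_x:src_x + width]
                for row in framebuffer[src_y:src_y + height]]
    for dy, src_row in enumerate(snapshot):
        framebuffer[dst_y + dy][dst_x:dst_x + width] = src_row
```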
Additionally, if the viewer and application host negotiate support
for other video/* MIME types, video streams can be sent following the
meta-protocol header. For video this will often be more efficient
than sending raw screen images.
In application mode, graphics to be drawn MUST NOT extend beyond the
boundaries of the window; in either mode, images to be drawn MUST NOT
extend beyond the defined borders of the desktop. (Window-related
images such as drop-down menus or tooltips which can extend beyond
the boundaries of a window SHOULD be transmitted as separate
windows.)
5.4 Pointer Representation
For efficiency, pointers can be represented separately from other
window data. This is accomplished by transmitting, in a special
protocol, PNG [5] images of the pointers, with alpha channels and
hotspots, and then RFC 2862 [6] streams to indicate the pointers'
positions. These protocols still need to be defined in detail.
6. Input Protocols
6.1 Keyboard Input
The viewer represents keyboard input to the server by sending a list
of depressed keys, updated whenever this state changes. (Note that
this is unlike how keys are represented in most window systems, which
instead use individual key-down and key-up events. The latter can be
derived from the former.) Key repetition is handled by the
application host.
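Deriving key-down and key-up events from successive state lists is a
set difference. In the following Python sketch, each entry is
represented by the 32-bit value of its keyboard state list entry, and
the function name is hypothetical; reporting releases before presses
is an assumption, since the draft does not specify an order.

```python
from typing import Iterable, List, Tuple


def key_events(previous: Iterable[int],
               current: Iterable[int]) -> List[Tuple[str, int]]:
    """Derive key-up and key-down events from two successive key state lists."""
    before, after = set(previous), set(current)
    # Releases are reported before presses; sorting makes the output stable.
    events = [("up", key) for key in sorted(before - after)]
    events += [("down", key) for key in sorted(after - before)]
    return events
```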
There are two types of keys that can be represented. "Encoding keys"
are keys that encode a specific Unicode [7] character, whereas
"virtual keys" do not. Encoding keys are indicated by the Unicode
value of the character they encode.
Virtual keys are indicated by codes from the tables of virtual
keycodes listed in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6,
Figure 7, Figure 8, and Figure 9. These codes are derived from the X
Window System [10] keysyms defined in the header file
<X11/keysymdef.h>: the tables cover all of X's keysyms with values of
the form 0xFFnn, except for the XK_KP_* keypad keys, and each code
listed is the low-order byte nn of the corresponding keysym.
Unicode control characters -- characters in the range 0x0000 - 0x001f
or 0x007f - 0x009f -- MUST NOT be sent. Instead, the corresponding
virtual key, or the virtual key Control (Left) or Control (Right)
(0xE3 or 0xE4) plus a Unicode codepoint, should be used.
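The draft requires this substitution but does not give a mapping for
each control character. The following Python sketch shows one
hypothetical mapping: characters with a dedicated virtual key in
Figure 2 use it, other C0 controls become Control (Left) plus the
character 0x40 above them (lower-cased for letters, so 0x03 becomes
Control plus "c"), and C1 controls are rejected. The table and the
function name are assumptions.

```python
from typing import List

V_BIT = 0x80000000
CONTROL_LEFT = V_BIT | 0xE3  # virtual key Control (Left)

# Control characters with a dedicated virtual key in Figure 2.
CONTROL_TO_VIRTUAL = {0x08: 0x08, 0x09: 0x09, 0x0A: 0x0A,
                      0x0D: 0x0D, 0x1B: 0x1B, 0x7F: 0xFF}


def entries_for_character(cp: int) -> List[int]:
    """Return the keyboard state entries to send for character cp."""
    if cp in CONTROL_TO_VIRTUAL:
        return [V_BIT | CONTROL_TO_VIRTUAL[cp]]
    if cp <= 0x1F:
        # Hypothetical: Ctrl-A..Ctrl-Z become Control plus a lower-case
        # letter; other C0 codes use the character 0x40 above them.
        base = cp + 0x40
        if 0x41 <= base <= 0x5A:
            base += 0x20
        return [CONTROL_LEFT, base]
    if 0x7F <= cp <= 0x9F:
        raise ValueError("no keyboard equivalent assumed for C1 controls")
    return [cp]
```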
   Name          Code   Note
   ----          ----   ----
   Backspace     0x08   Back space, back char
   Tab           0x09
   Linefeed      0x0A   Linefeed, LF
   Clear         0x0B
   Return        0x0D   Return, enter
   Pause         0x13   Pause, hold
   Scroll Lock   0x14
   Sys Req       0x15
   Escape        0x1B
   Delete        0xFF   Delete, rubout
These codes have been chosen to map to ASCII, for convenience of
programming, but could have been arbitrary (at the cost of lookup
tables in viewer code).
Figure 2: Virtual keycodes: teletype keys
   Name                 Code   Note
   ----                 ----   ----
   Multi key            0x20   Multi-key character compose
   Code Input           0x37
   Single Candidate     0x3C
   Multiple Candidate   0x3D
   Previous Candidate   0x3E
Figure 3: Virtual keycodes: international and multi-key character
composition
   Name                Code   Note
   ----                ----   ----
   Kanji               0x21   Kanji, Kanji convert
   Muhenkan            0x22   Cancel Conversion
   Henkan Mode         0x23   Start/Stop Conversion
   Romaji              0x24   to Romaji
   Hiragana            0x25   to Hiragana
   Katakana            0x26   to Katakana
   Hiragana/Katakana   0x27   Hiragana/Katakana toggle
   Zenkaku             0x28   to Zenkaku
   Hankaku             0x29   to Hankaku
   Zenkaku/Hankaku     0x2A   Zenkaku/Hankaku toggle
   Touroku             0x2B   Add to Dictionary
   Massyo              0x2C   Delete from Dictionary
   Kana Lock           0x2D   Kana Lock
   Kana Shift          0x2E   Kana Shift
   Eisu Shift          0x2F   Alphanumeric Shift
   Eisu Toggle         0x30   Alphanumeric toggle
   Kanji Bangou        0x37   Codeinput
   Zen Koho            0x3D   Multiple/All Candidate(s)
   Mae Koho            0x3E   Previous Candidate
Note that some of these codes are also used for equivalent Hangul
keyboard keys listed in Figure 9.
Figure 4: Virtual keys: Japanese keyboard support
   Name        Code   Note
   ----        ----   ----
   Home        0x50
   Left        0x51   Move left, left arrow
   Up          0x52   Move up, up arrow
   Right       0x53   Move right, right arrow
   Down        0x54   Move down, down arrow
   Prior       0x55   Prior, previous
   Page Up     0x55
   Next        0x56   Next
   Page Down   0x56
   End         0x57   EOL
   Begin       0x58   BOL
Figure 5: Virtual keycodes: cursor control and motion
   Name          Code   Note
   ----          ----   ----
   Select        0x60   Select, mark
   Print         0x61
   Execute       0x62   Execute, run, do
   Insert        0x63   Insert, insert here
   Undo          0x65   Undo, oops
   Redo          0x66   Redo, again
   Menu          0x67
   Find          0x68   Find, search
   Cancel        0x69   Cancel, stop, abort, exit
   Help          0x6A   Help
   Break         0x6B
   Mode switch   0x7E   Character set switch (*)
   Num Lock      0x7F
(*) The "Mode switch" key is variously used on Katakana, Arabic,
Greek, Hebrew, and Hangul keyboards to switch between the Roman and
native alphabets.
Figure 6: Virtual keycodes: miscellaneous functions
   Name      Code
   ----      ----
   F1        0xBE
   F2        0xBF
   F3        0xC0
   F4        0xC1
   F5        0xC2
   F6        0xC3
   F7        0xC4
   F8        0xC5
   F9        0xC6
   F10       0xC7
   F11/L1    0xC8
   F12/L2    0xC9
   F13/L3    0xCA
   F14/L4    0xCB
   F15/L5    0xCC
   F16/L6    0xCD
   F17/L7    0xCE
   F18/L8    0xCF
   F19/L9    0xD0
   F20/L10   0xD1
   F21/R1    0xD2
   F22/R2    0xD3
   F23/R3    0xD4
   F24/R4    0xD5
   F25/R5    0xD6
   F26/R6    0xD7
   F27/R7    0xD8
   F28/R8    0xD9
   F29/R9    0xDA
   F30/R10   0xDB
   F31/R11   0xDC
   F32/R12   0xDD
   F33/R13   0xDE
   F34/R14   0xDF
   F35/R15   0xE0
Sun keyboards, and those of a few other manufacturers, have additional
Left and Right function key groups on the left and/or right sides of
the keyboard.
Figure 7: Virtual keycodes: auxiliary functions
   Name              Code   Note
   ----              ----   ----
   Shift (Left)      0xE1
   Shift (Right)     0xE2
   Control (Left)    0xE3
   Control (Right)   0xE4
   Caps Lock         0xE5
   Shift Lock        0xE6
   Meta (Left)       0xE7   Windows (Microsoft); Option (Macintosh)
   Meta (Right)      0xE8   Windows (Microsoft); Option (Macintosh)
   Alt (Left)        0xE9   Command (Macintosh)
   Alt (Right)       0xEA   Command (Macintosh)
   Super (Left)      0xEB
   Super (Right)     0xEC
   Hyper (Left)      0xED
   Hyper (Right)     0xEE
Application hosts which lack right-hand versions of modifiers SHOULD
treat them as though the left-hand version had been received.
Figure 8: Virtual keys: modifiers
   Name                        Code   Note
   ----                        ----   ----
   Hangul                      0x31   Hangul start/stop(toggle)
   Hangul Start                0x32   Hangul start
   Hangul End                  0x33   Hangul end, English start
   Hangul Hanja                0x34   Start Hangul->Hanja Conversion
   Hangul Jamo                 0x35   Hangul Jamo mode
   Hangul Romaja               0x36   Hangul Romaja mode
   Hangul Code Input           0x37   Hangul code input mode
   Hangul Jeonja               0x38   Jeonja mode
   Hangul Banja                0x39   Banja mode
   Hangul Pre-Hanja            0x3A   Pre Hanja conversion
   Hangul Post-Hanja           0x3B   Post Hanja conversion
   Hangul Single Candidate     0x3C   Single candidate
   Hangul Multiple Candidate   0x3D   Multiple candidate
   Hangul Previous Candidate   0x3E   Previous candidate
   Hangul Special              0x3F   Special symbols
Figure 9: Virtual keys: Hangul (Korean)
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V|P|    MBZ    |     Unicode codepoint or virtual keycode      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 10: One entry in the keyboard state list
Figure 10 illustrates the format of entries in the keyboard state
list. If the "V" bit is set, the codepoint field contains a virtual
keycode; otherwise, it contains a Unicode codepoint. If the "P" bit
is set, the entry denotes the version of the key found on a separate
keypad. If the server does not have a concept of a separate keypad,
or the key indicated does not appear on a keypad, the "P" bit MAY be
ignored.
The highest Unicode codepoint is 0x10ffff; thus, all Unicode
characters fit comfortably in the 24-bit codepoint field of the
keycode format. For non-virtual, non-keypad keys, for which the V
and P bits are both zero, the format is identical to a UTF-32BE
encoding of the same character. (The format can, thus, be considered
alternatively as UTF-32BE bitwise-or'd with 0x80000000 for "V" and
0x40000000 for "P".)
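The equivalence with UTF-32BE can be checked directly. The following
Python sketch encodes and decodes one keyboard state list entry; the
function names are hypothetical, and the decoder ignores the MBZ bits.

```python
import struct
from typing import Tuple

V_BIT = 0x80000000
P_BIT = 0x40000000
CODE_MASK = 0x00FFFFFF  # 24-bit codepoint / keycode field


def encode_key(code: int, virtual: bool = False, keypad: bool = False) -> bytes:
    """Encode one keyboard state list entry (Figure 10)."""
    limit = CODE_MASK if virtual else 0x10FFFF
    if not 0 <= code <= limit:
        raise ValueError("code out of range")
    word = code | (V_BIT if virtual else 0) | (P_BIT if keypad else 0)
    return struct.pack("!I", word)


def decode_key(entry: bytes) -> Tuple[int, bool, bool]:
    """Return (code, virtual, keypad)."""
    (word,) = struct.unpack("!I", entry)
    return word & CODE_MASK, bool(word & V_BIT), bool(word & P_BIT)
```

For a plain character such as U+00E9, encode_key(0xE9) produces the
same four bytes as the UTF-32BE encoding of that character.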
6.2 Pointer Position
The viewer indicates pointer position to the application host using
the media type video/pointer, defined in RFC 2862, the RTP Payload
Format for Real-Time Pointers [6].
7. Implementation Notes
Application hosts should not blindly send every screen update they
receive down the RTP channel. Instead, they should monitor the state
of their TCP transmission buffers (through mechanisms such as the
select() system call) and only send the most recent screen data when
there is no backlog. This prevents latency from accumulating for
rapidly-changing images, since a viewer usually only needs to see the
final state of the image.
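The following Python sketch illustrates this advice. It keeps only
the newest pending update for each region and writes only while
select() reports the TCP socket as writable. Keying updates by an
exact (Window ID, x, y, width, height) tuple is a simplifying
assumption (overlapping but unequal regions are not merged), and the
class name is hypothetical.

```python
import select
import socket
from typing import Dict, Tuple

# Hypothetical key: (window ID, x, y, width, height) of the updated area.
Region = Tuple[int, int, int, int, int]


class LatestUpdateSender:
    """Queue screen updates, keeping only the newest one per region, and
    write them only while the TCP socket reports buffer space."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.pending: Dict[Region, bytes] = {}

    def queue(self, region: Region, framed_packet: bytes) -> None:
        # A newer update for the same region silently replaces the stale one.
        self.pending[region] = framed_packet

    def flush(self) -> int:
        """Send queued updates while the socket is writable; return the count sent."""
        sent = 0
        while self.pending:
            _, writable, _ = select.select([], [self.sock], [], 0)
            if not writable:
                break  # backlog: keep coalescing until the buffer drains
            region = next(iter(self.pending))
            # sendall() may still block if a packet exceeds the free buffer space.
            self.sock.sendall(self.pending.pop(region))
            sent += 1
        return sent
```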
To conserve bandwidth, application hosts SHOULD use PNG's palette or
grayscale image format wherever possible, and SHOULD use a minimal
palette and image bit depth, subject to encoding delay constraints.
In particular, two-color images, and one-color images drawn over the
existing image, SHOULD use a one-bit PNG with a two-entry palette, in
the latter case with a transparency chunk.
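PNG indexed-color images (color type 3) permit bit depths of 1, 2, 4,
and 8, so choosing the smallest palette depth reduces to counting
distinct colors. The following Python sketch (function name
hypothetical) does only that; detecting grayscale images, which the
text also recommends, is omitted.

```python
from typing import Hashable, Iterable, Tuple


def choose_png_format(pixels: Iterable[Hashable]) -> Tuple[str, int]:
    """Pick the smallest PNG palette bit depth that can hold every distinct color."""
    distinct = len(set(pixels))
    for depth in (1, 2, 4, 8):  # bit depths PNG allows for indexed-color images
        if distinct <= 1 << depth:
            return ("palette", depth)
    return ("truecolor", 8)  # more than 256 colors: fall back to RGB
```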
In application mode, application hosts SHOULD be aware of unshared local
windows on the host. If an unshared window obscures a shared window,
the application host SHOULD obscure its contents (through a mechanism
such as transmitting a neutral color) so that the viewer experience
reflects as closely as possible the experience on the host.
Application hosts MAY choose to treat portions of windows obscured by
other shared windows the same way.
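Finding the parts of a shared window to obscure is a rectangle
intersection problem. The following Python sketch assumes window
rectangles in desktop coordinates and returns the covered areas in
window-relative coordinates, each of which could then be sent as a
fill operation in a neutral color. The names Rect, intersect(), and
obscured_areas() are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    left, top = max(a.x, b.x), max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    if right <= left or bottom <= top:
        return None
    return Rect(left, top, right - left, bottom - top)


def obscured_areas(shared: Rect, unshared_above: List[Rect]) -> List[Rect]:
    """Areas of a shared window covered by unshared local windows stacked
    above it, in window-relative coordinates (the areas may overlap)."""
    areas = []
    for other in unshared_above:
        overlap = intersect(shared, other)
        if overlap is not None:
            areas.append(Rect(overlap.x - shared.x, overlap.y - shared.y,
                              overlap.width, overlap.height))
    return areas
```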
8. Open issues
We need to determine what mechanism to recommend to secure the input
and output streams. The two logical possibilities would be Secure
RTP [13] and Transport-Layer Security (TLS) [14]. Either would work;
neither is currently specified for connection-oriented RTP (neither
TCP/RTP/SAVP nor TCP/TLS/RTP/AVP is defined). One or the other ought
to be recommended to facilitate interoperability.
It seems likely that "beep" needs to be defined specially, as an
output type, and needs to be defined separately from other audio
channels. Many systems allow beeps to be rendered visually, either
for accessibility for the deaf or because systems are being used in
quiet environments.
We need a name for the meta-protocol of Section 5.1. It's also
unclear whether it should have an application/* or video/* MIME type.
SDP doesn't normally allow you to send traffic with different
top-level MIME types over the same RTP channel. Do we need to add an
extension to work around this? Some of the RTP payloads described in
this document should pretty clearly be application/*, some should be
video/*, some (beeping) audio/*, PNG is already defined as image/png,
etc.
All the payload types need MIME-type assignments and RTP payload
characteristics (sample rate, use of marker bit, etc.) defined.
Is there anything useful that can be done with the MBZ bits of the
meta-protocol?
What other information needs to be carried in the window state
protocol? One particular concern is taskbar support. Window
information that might be carried to support taskbars includes the
window title, a list of minimized windows, and whether each window
should be listed in the taskbar. Taskbar support also requires an
input protocol to support taskbar actions (right-clicking on a
taskbar item): unminimize, maximize, close, etc.
Another flag that might be carried by the window state protocol is
whether a window is "taggable", i.e. whether it is a good candidate
for a viewer-side marker indicating that this is a shared
application. Top-level windows would typically get this, while
subsidiary windows such as dialog boxes would not. (This is a
feature of T.128.)
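To make the candidate fields concrete, the following Python sketch models a per-window state record carrying the taskbar-related information and the "taggable" flag discussed above, with the boolean properties packed into a flags octet. The field names, bit assignments, and encoding (window ID, flags octet, and a length-prefixed UTF-8 title) are hypothetical; no wire format for them has been defined. A viewer could derive the list of minimized windows from the per-window minimized flag.

```python
import struct
from dataclasses import dataclass

# Hypothetical bit assignments for a window-state flags octet.
FLAG_MINIMIZED = 0x01
FLAG_IN_TASKBAR = 0x02
FLAG_TAGGABLE = 0x04


@dataclass
class WindowState:
    window_id: int
    title: str
    minimized: bool = False
    in_taskbar: bool = True
    taggable: bool = False  # e.g. True for top-level windows, False for dialogs

    def encode(self) -> bytes:
        flags = ((FLAG_MINIMIZED if self.minimized else 0)
                 | (FLAG_IN_TASKBAR if self.in_taskbar else 0)
                 | (FLAG_TAGGABLE if self.taggable else 0))
        title = self.title.encode("utf-8")
        # window ID (32 bits), flags (8 bits), title length (16 bits), title
        return struct.pack("!IBH", self.window_id, flags, len(title)) + title

    @classmethod
    def decode(cls, data: bytes) -> "WindowState":
        window_id, flags, n = struct.unpack_from("!IBH", data)
        title = data[7:7 + n].decode("utf-8")  # 7 = size of the fixed header
        return cls(window_id, title,
                   minimized=bool(flags & FLAG_MINIMIZED),
                   in_taskbar=bool(flags & FLAG_IN_TASKBAR),
                   taggable=bool(flags & FLAG_TAGGABLE))
```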
Do we need to define what an un-drawn-upon window looks like? T.128
seems to assume that windows are transparent until drawn to, and uses
this fact: a server can define a full-screen window on top of all
others, and draw directly to it. I don't think this is necessary,
but it's an important consideration.
Should pointer images be window-associated? Should there be a
pointer image cache? (T.128 has one; VNC doesn't.)
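If a pointer image cache were adopted, one possible shape is sketched below in Python: the application host assigns each distinct pointer image a slot in a fixed-size cache that it assumes the viewer mirrors, sends the full image only on a miss, and otherwise sends just the slot index. The cache size, the least-recently-used eviction policy, and the message tuples are assumptions for illustration, not a description of T.128's mechanism.

```python
from collections import OrderedDict


class PointerCache:
    """Host-side model of a viewer's pointer image cache (hypothetical)."""

    def __init__(self, slots: int = 4):
        self.slots = slots
        self.index_of = OrderedDict()  # image bytes -> slot, in LRU order
        self.free = list(range(slots))

    def message_for(self, image: bytes):
        """Return ("use", slot) if the viewer already holds the image,
        otherwise ("store", slot, image) telling it to cache and show it."""
        if image in self.index_of:
            self.index_of.move_to_end(image)  # mark as most recently used
            return ("use", self.index_of[image])
        if self.free:
            slot = self.free.pop(0)
        else:
            _, slot = self.index_of.popitem(last=False)  # evict the LRU entry
        self.index_of[image] = slot
        return ("store", slot, image)
```

Because the host decides every slot assignment, the viewer needs no eviction logic of its own; it simply overwrites whichever slot a "store" message names.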
RFC 2862 [6] supports only 12-bit positioning for the mouse pointer.
I believe this is already too small; screens wider than 4096 pixels
already exist, particularly as virtual desktops. We also want to
carry additional mouse information, most notably mousewheels. A
protocol obsoleting RFC 2862 is probably in order.
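The arithmetic behind this concern: an n-bit unsigned field holds values from 0 to 2^n - 1, so a 12-bit coordinate tops out at 4095. The Python sketch below checks a position against that limit. The three-monitor desktop is just an example, and 16 bits is used purely for comparison, not as a proposed field width.

```python
def fits(coordinate: int, bits: int) -> bool:
    """True if a non-negative pixel coordinate fits in an unsigned field of `bits` bits."""
    return 0 <= coordinate <= (1 << bits) - 1


# A virtual desktop made of three 1600-pixel-wide monitors side by side.
desktop_width = 3 * 1600          # 4800 pixels
rightmost_x = desktop_width - 1   # 4799

print(fits(4095, 12))         # True: the largest 12-bit coordinate
print(fits(rightmost_x, 12))  # False: does not fit in 12 bits
print(fits(rightmost_x, 16))  # True: fits in a 16-bit field
```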
Do we want to support a mechanism by which viewers can request a full
screen refresh, analogous to the Full Intra-frame Request (FIR) RTCP
packet of RFC 2032 [15]?
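For reference, the RFC 2032 FIR packet is minimal: an RTCP header with packet type 192 and a length of one 32-bit word, followed by an SSRC field. The Python sketch below builds that layout. A full-screen refresh request for this protocol could plausibly look the same, but it would need its own packet type or feedback message assignment, which is not defined here. Note also that RFC 3550 [3] requires RTCP packets to be sent in compound packets that begin with a report packet; this sketch builds only the single FIR packet.

```python
import struct

RTCP_FIR = 192  # packet type assigned to FIR by RFC 2032


def build_fir(ssrc: int) -> bytes:
    """Build an RFC 2032-style FIR packet: an RTCP header plus one SSRC word."""
    first_octet = 2 << 6   # V=2, P=0, five MBZ bits zero
    length = 1             # packet length in 32-bit words, minus one
    return struct.pack("!BBHI", first_octet, RTCP_FIR, length, ssrc)


print(build_fir(0x12345678).hex())  # 80c0000112345678
```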
One other optimized pixel transmission operation that could be used
is the tiling operation: transmit one image and a number of times it
should be repeated horizontally and vertically. Would this be worth
the bandwidth/complexity tradeoff? (Note that it can be emulated
without too much overhead by the copy operation.)
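To gauge that overhead, the Python sketch below compares a hypothetical tile operation with an emulation that sends the image once and then issues copy operations. Each copy duplicates the region already filled, so a grid of nh by nv tiles needs only about log2(nh) + log2(nv) copies rather than nh * nv - 1. The framebuffer model (a list of pixel rows) and the copy semantics (copy a rectangle within the same surface) are assumptions for illustration.

```python
def blit(fb, img, x, y):
    """Write image `img` (a list of pixel rows) into framebuffer `fb` at (x, y)."""
    for r, row in enumerate(img):
        fb[y + r][x:x + len(row)] = row


def copy(fb, sx, sy, w, h, dx, dy):
    """Copy a w-by-h rectangle from (sx, sy) to (dx, dy) within fb."""
    block = [fb[sy + r][sx:sx + w] for r in range(h)]  # snapshot the source
    blit(fb, block, dx, dy)


def tile(fb, img, x, y, nh, nv):
    """Reference tile operation: repeat img nh times across and nv times down."""
    h, w = len(img), len(img[0])
    for j in range(nv):
        for i in range(nh):
            blit(fb, img, x + i * w, y + j * h)


def tile_by_copies(fb, img, x, y, nh, nv):
    """Emulate tile(): send img once, then double the filled area with copies.
    Returns the number of copy operations issued."""
    h, w = len(img), len(img[0])
    blit(fb, img, x, y)
    copies = 0
    done = 1                      # tiles filled so far in the first row
    while done < nh:              # grow horizontally
        n = min(done, nh - done)
        copy(fb, x, y, n * w, h, x + done * w, y)
        done += n
        copies += 1
    done = 1                      # rows of tiles filled so far
    while done < nv:              # grow vertically, copying whole tile rows
        n = min(done, nv - done)
        copy(fb, x, y, nh * w, n * h, x, y + done * h)
        done += n
        copies += 1
    return copies
```

For example, a 5 by 3 grid of tiles is emulated with one image transmission and 5 copy operations (3 horizontal, 2 vertical), compared with 15 separate image transmissions.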
Is it necessary for viewers and application hosts to be able to
negotiate the maximum supported size and color depth of pointers?
This would presumably be a format parameter on the pointer image
representation payload type.
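If such negotiation were added, one plausible policy is for an offer/answer exchange to settle on the smaller of each party's limits. The Python sketch below does that for format parameters carried in an SDP a=fmtp line. The parameter names max-width, max-height, and max-depth, and the payload type number 98, are hypothetical; no such parameters are defined.

```python
def parse_fmtp(line: str) -> dict:
    """Parse 'a=fmtp:<pt> name=value;name=value' into {name: int}."""
    _, params = line.split(" ", 1)
    result = {}
    for item in params.split(";"):
        name, value = item.strip().split("=", 1)
        result[name] = int(value)
    return result


def negotiate(offer: dict, answer: dict) -> dict:
    """For each limit both sides state, use the smaller of the two values."""
    return {k: min(offer[k], answer[k]) for k in sorted(offer.keys() & answer.keys())}


offer = parse_fmtp("a=fmtp:98 max-width=64;max-height=64;max-depth=32")
answer = parse_fmtp("a=fmtp:98 max-width=32;max-height=32;max-depth=24")
print(negotiate(offer, answer))  # {'max-depth': 24, 'max-height': 32, 'max-width': 32}
```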
Do we want to support multiple payloads in one RTP packet, either
reusing or inspired by RFC 2198 [16]?
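For reference, RFC 2198 prefixes the payload with one 4-octet header per block except the last, each holding an F bit set to 1, a 7-bit block payload type, a 14-bit timestamp offset, and a 10-bit block length. A final 1-octet header (F bit 0 and the payload type of the last block) follows, and then the block data in the same order. The Python sketch below packs and unpacks that layout. Using it to bundle, say, an image update and a pointer position with a timestamp offset of zero is an assumption about how this protocol might reuse the format; RFC 2198 itself was designed for redundant audio.

```python
import struct


def pack_blocks(extra, last_pt: int, last_data: bytes) -> bytes:
    """Pack an RFC 2198-style payload. `extra` lists
    (payload_type, timestamp_offset, data) for every block except the last."""
    headers = bytearray()
    for pt, ts_offset, data in extra:
        if not (0 <= pt < 128 and 0 <= ts_offset < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("field out of range for RFC 2198 block header")
        # F=1 | 7-bit PT | 14-bit timestamp offset | 10-bit block length
        headers += struct.pack("!I", (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data))
    headers.append(last_pt & 0x7F)  # final header: F=0, 7-bit payload type
    return bytes(headers) + b"".join(d for _, _, d in extra) + last_data


def unpack_blocks(payload: bytes):
    """Inverse of pack_blocks: returns (extra, last_pt, last_data).
    Truncated input is not validated in this sketch."""
    headers, i = [], 0
    while payload[i] & 0x80:  # F=1: this is a 4-octet block header
        (word,) = struct.unpack_from("!I", payload, i)
        headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        i += 4
    last_pt = payload[i] & 0x7F
    i += 1
    extra = []
    for pt, ts_offset, length in headers:
        extra.append((pt, ts_offset, payload[i:i + length]))
        i += length
    return extra, last_pt, payload[i:]
```

Note that the 10-bit length field limits every block but the last to 1023 octets, which may argue for a format "inspired by" RFC 2198 rather than reusing it directly for image data.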
How should "lock" key state (caps lock, num lock, scroll lock) be
represented in the keyboard input protocol? Should there be flags of
some sort? Should there be separate virtual keycodes for "lock key
depressed" and "lock state enabled"? Should this simply be handled
on the viewer side, altering the keycodes that are sent? (This works
for caps and num lock, not so well for scroll lock.)
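One of the options above, explicit lock-state flags, can be sketched as follows. The viewer attaches a bitmask of its lock states to keyboard events, and the application host synthesizes presses of the corresponding lock keys whenever its own state differs. The bit values and key names are hypothetical, and how a host reads and injects lock state is platform-specific and not shown. Unlike rewriting keycodes on the viewer side, this approach presumably also keeps scroll lock synchronized, since scroll lock changes application behavior rather than the characters a key produces.

```python
# Hypothetical lock-state bits carried with keyboard input events.
CAPS_LOCK = 0x1
NUM_LOCK = 0x2
SCROLL_LOCK = 0x4

# Hypothetical virtual key names for the lock keys themselves.
LOCK_KEYS = {CAPS_LOCK: "CapsLock", NUM_LOCK: "NumLock", SCROLL_LOCK: "ScrollLock"}


def lock_presses(host_state: int, viewer_state: int) -> list:
    """Lock keys the host should press so its lock state matches the viewer's.
    Each press toggles one lock, so press exactly the keys whose bits differ."""
    differing = host_state ^ viewer_state
    return [key for bit, key in LOCK_KEYS.items() if differing & bit]


def apply_presses(state: int, presses: list) -> int:
    """Model the effect of the synthesized presses on the host's lock state."""
    bit_of = {key: bit for bit, key in LOCK_KEYS.items()}
    for key in presses:
        state ^= bit_of[key]
    return state
```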
9. Security Considerations
Both input and output data may be highly sensitive. For example,
input data may contain user passwords. Thus, encryption of all user
input is likely to be required. For some applications, such as
sharing slides during a public lecture, confidentiality of the output
may not be required. Given the broad set of applications, viewers
and application hosts MUST support, or be able to leverage, end-to-end
confidentiality and integrity protection mechanisms.
Application sharing inherently exposes the shared applications to
risks from malicious participants. They may, for example, access
resources beyond the application itself by installing or running
scripts. It may be difficult to constrain access to specific user
data, e.g., a specific set of slides, unless the user application can
be sandboxed or run in some kind of "jail", with the sandbox control
outside the view of the remoting protocol.
10. IANA Considerations
TODO: MIME type definitions for all payload types.
11. References
11.1 Normative References
[1] Handley, M., Jacobson, V. and C. Perkins, "SDP: Session
Description Protocol", draft-ietf-mmusic-sdp-new-21 (work in
progress), October 2004.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[3] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", STD 64,
RFC 3550, July 2003.
[4] Lazzaro, J., "Framing RTP and RTCP Packets over
Connection-Oriented Transport",
draft-ietf-avt-rtp-framing-contrans-03 (work in progress), July
2004.
[5] Duce, D., "Portable Network Graphics (PNG) Specification (Second
Edition)", W3C REC REC-PNG-20031110, November 2003.
[6] Civanlar, M. and G. Cash, "RTP Payload Format for Real-Time
Pointers", RFC 2862, June 2000.
[7] International Organization for Standardization, "Information
Technology - Universal Multiple-octet coded Character Set
(UCS)", ISO Standard 10646, December 2003.
11.2 Informative References
[8] International Telecommunication Union, "Data Protocols for
Multimedia Conferencing", ITU-T Recommendation T.120, July
1996.
[9] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002.
[10] Scheifler, R., "X Window System Protocol", X Consortium
Standard X Version 11, Release 6.7, November 2004.
[11] Schulzrinne, H., "Sharing and Remote Access to Applications",
draft-schulzrinne-mmusic-sharing-00 (work in progress), October
2004.
[12] Barnes, M. and C. Boulton, "A Framework for Centralized
Conferencing", draft-barnes-xcon-framework-00 (work in
progress), October 2004.
[13] Baugher, M., McGrew, D., Naslund, M., Carrara, E. and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
3711, March 2004.
[14] Lennox, J., "Connection-Oriented Media Transport over the
Transport Layer Security (TLS) Protocol in the Session
Description Protocol (SDP)", draft-ietf-mmusic-comedia-tls-02
(work in progress), October 2004.
[15] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
Streams", RFC 2032, October 1996.
[16] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
Payload for Redundant Audio Data", RFC 2198, September 1997.
[17] International Telecommunication Union, "Multipoint Application
Sharing", ITU-T Recommendation T.128, February 1998.
Authors' Addresses
Jonathan Lennox
Columbia University Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US
Phone: +1 212 939 7018
EMail: lennox@cs.columbia.edu
Henning Schulzrinne
Columbia University Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US
Phone: +1 212 939 7004
EMail: hgs+mmusic@cs.columbia.edu
Jason Nieh
Columbia University Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US
Phone: +1 212 939 7000
EMail: nieh@cs.columbia.edu
Ricardo Baratto
Columbia University Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US
Phone: +1 212 939 7000
EMail: ricardo@cs.columbia.edu
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.