Network Working Group M. Rose
Internet-Draft Dover Beach Consulting, Inc.
Expires: May 7, 2003 D. Crocker
Brandenburg InternetWorking
November 6, 2002
Toward a Quantitative Analysis of IETF Productivity
draft-etal-ietf-analysis-02
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 7, 2003.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This memo presents an initial quantitative analysis of the IETF's
working groups using RFC publication as the primary metric. These
basic indicators are sufficient for an initial assessment of IETF
performance. We can discuss our expectations for the numbers and our
reaction to them. Where there is a discrepancy, we can decide
whether to change our expectations or whether to look for ways to
improve the numbers. In other words, the purpose of this effort is
to encourage community discussion about measuring IETF productivity,
detecting possible problems and fixing them.
Rose & Crocker Expires May 7, 2003 [Page 1]
Internet-Draft Quantifying of IETF Productivity November 2002
Table of Contents
1. Introduction
2. Statistics and Methodology
2.1 What to measure
3. The Model
4. The Queries
5. The Results
5.1 Days until 1st RFC published
5.2 WG duration in days
5.3 WG duration normalized over #-RFCs produced
5.4 Number of RFCs produced
6. Conclusions
6.1 Two Suggestions for the WG Charter Document System
7. Security Considerations
Authors' Addresses
A. Data Gathering and Processing
A.1 Charters, Messages, and Documents
A.2 Reconstructing Activity
A.3 Running the Analysis
Full Copyright Statement
1. Introduction
An important part of management is measurement.
When you can measure something, you can make reasoned decisions as to
how to improve it -- if not "reasoned", then at least "better
informed". Imagine trying to achieve the last 15 years of
improvements to TCP's algorithms without being able to measure the
results at each step!
The Internet depends upon the IETF's producing useful specifications,
in a timely manner. Many folks contribute considerable resources to
that goal, so a lot of folks should be interested in understanding
how well the IETF is working. Historically, the IETF has had an
excellent track record. However, as it has grown, there has been
increasing concern that IETF efforts are less efficient and less
effective. To date, the community has relied solely on its subjective
sense of this change.
It is time to get help from evaluation tools that are both objective
and useful.
Obviously, measuring the effectiveness of an organization is different
from measuring the effectiveness of a protocol. However, there are
some fairly simple, objective metrics that we can apply to get a
first- or second-order approximation. (Previously, the only public
analysis of the IETF, per se, has been of the number of working
groups and meeting size. While interesting, these metrics are useful
primarily for logistics planning.)
The purpose of this memo is to encourage community discussion about
measuring IETF productivity, detecting possible problems, and fixing
them. Therefore, simple measurements and simple analyses are used. It
is hoped that the community will focus on the question of IETF
productivity, rather than the question of methodological imperfections.
The ultimate test of the IETF is that its work gets used by the
Internet community. Given the size of the Internet, this should mean
that the work is employed on millions or even hundreds of millions of
platforms. However, it takes years before adoption and use can be
measured, and even then, we do not have objective methods for
assessing that success.
Consider these two assertions:
o from a work product perspective, the IETF is simply the sum of its
working groups; and,
o while a working group might do many valuable things, the only
quantifiable metric is the number and timing of the RFCs that it
produces.
So to measure something now, we'll take the middle road: while
agreeing that there are significant qualitative aspects of the IETF
that are not tied to RFC publication, and while hoping that IETF
participants are significant stakeholders in terms of needing
successful implementation and provisioning, we'll focus on measuring
IETF production of RFCs. In particular, does the IETF produce
specifications efficiently? To be useful, answers to such a question
need to identify activities or areas that might be problematic.
2. Statistics and Methodology
The first question is: what gives guidance about IETF productivity
and is reasonably easy to measure objectively?
Statistical measurement involves the choice of what to measure and
the choice of how to analyze the measurements. In this memo we keep
both choices as simple as possible. The first needs to be
intuitively reasonable and procedurally easy. The second also needs
to be minimalist. Because this process of IETF measurement is new,
no one has enough knowledge about it to use sophisticated statistical
tools. (For that matter, the very small quantity of data and
its non-normal distribution make use of the usual statistics tools
inappropriate.)
In fact, the IETF approach to rough consensus is helpful here. It
means that we need the data and the analyses to be straightforward,
so that the bulk of the community can understand them easily and agree
with them. Therefore, we limit ourselves to the most basic
calculations: mean, minimum and maximum, and standard deviation
(sigma).
The mean tells us what is "usual", whilst the minimum and maximum tell
us the lower and upper bounds of performance. The standard deviation
gives us gradations of "better" and "worse", "faster" and "slower".
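As a minimal sketch, the basic indicators named above can be computed
with Python's standard library. The function name summarize() is
hypothetical. The use of the sample standard deviation (n - 1
denominator) is an assumption, since the memo does not say which form
was used; for the two-member Sub-IP area in Section 5.1 (minimum 937,
maximum 1144) it gives a sigma of about 146.

```python
import statistics


def summarize(values):
    """Return the basic indicators used in this memo for a list of numbers."""
    return {
        "size": len(values),
        "min": min(values),
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        # The memo does not state which form of sigma was used.
        "sigma": statistics.stdev(values),
        "max": max(values),
    }


# The Sub-IP area in Section 5.1 has two working groups, so its minimum
# and maximum (937 and 1144 days) are the only two values.
sub_ip = summarize([937, 1144])
print(sub_ip)
```

Note that statistics.stdev() needs at least two values, so an area
with a single working group would need separate handling.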
These basic indicators are sufficient for an initial assessment of
IETF performance. We can discuss our expectations for the numbers
and our reaction to them. Where there is a discrepancy, we can
decide whether to change our expectations or whether to look for ways
to improve the numbers.
The approach taken in this memo has three steps:
1. Develop a model and populate it with data.
2. Decide on some interesting queries (analyses) to run against the
data.
3. Look at the results.
We hope that such tidbits prompt community discussion about useful
metrics and their preferred values. Even better will be the
development of community consensus for such measures and on-going use
of them to improve IETF performance.
2.1 What to measure
Since we can't measure exactly what we'd like, namely "productivity",
we'll measure something that's close.
It is easy to create mechanical measurements of IETF activity. The
measure can be entirely objective and entirely meaningless. Size of
meeting sessions, number of messages on the mailing list, number of
words per specification, and number of RFCs cited in specifications
are all examples. For some, these measures might be interesting, but
they do not really tell us much about productivity. They do not tell
us how well specifications are produced.
So it is difficult to find statistics that say something both
meaningful and accurate. They need to tell us how well one or a
number of efforts are working. They might also tell us where we have
a problem. Although we try, here, to select particular measures that
seem useful, we carefully avoid drawing any qualitative conclusions
about the results. Therefore, while we might show that it takes a
working group about 740 days to produce its first RFC, we will not
say that this is either "sweet" or "sour". As with everything else
in life, readers are free to draw their own conclusions (that
naturally reinforce their own perspectives).
3. The Model
Given our second assertion, that RFC publication is the only thing
that a working group does that we care to measure, how can we go
about measuring it?
The first step is to develop a model of a working group. For this
analysis, we're going to use XML as the description language:
<!ENTITY % DATE "CDATA">
<!ENTITY % URI "CDATA">
<!ENTITY % SIZE "CDATA">
<!ELEMENT ietf (group*)>
<!ELEMENT group (person*,doc*)>
<!ATTLIST group
          name       CDATA     #REQUIRED
          title      CDATA     #REQUIRED
          area       CDATA     #REQUIRED
          chartered  %DATE;    #REQUIRED
          concluded  %DATE;    #IMPLIED
          estimated  NMTOKENS  "">
<!ELEMENT person EMPTY>
<!ATTLIST person
          name       CDATA     #REQUIRED
          role       (director|area-advisor|chair|technical-advisor)
                               "chair">
<!ELEMENT doc (revision*)>
<!ATTLIST doc
          name       CDATA     #REQUIRED
          status     (rfc|expired|inprogress|moved) "inprogress">
<!ELEMENT revision EMPTY>
<!ATTLIST revision
          published  %DATE;    #REQUIRED
          uri        %URI;     #REQUIRED
          size       %SIZE;    #REQUIRED>
which provides a document-centric and, to a lesser extent, a
role-centric model of a working group. Of course, it's not a complete
model (e.g., BOF and meeting information are absent).
To make this data model a bit more concrete, here's an abbreviated
example:
<group name='beep' area='app' chartered='20000705'
       title='Blocks Extensible Exchange Protocol'>
  <person name='Pete Resnick' role='chair' />
  <person name='Ned Freed' role='director' />
  <person name='Patrik Faltstrom' role='director' />
  <person name='Ned Freed' role='area-advisor' />
  <doc name='draft-ietf-beep-framework' status='rfc'>
    <revision published='20010301' size='82089'
              uri='http://.../rfc3080.txt' />
    <revision published='20010105' size='83266'
              uri='http://.../draft-ietf-beep-framework-11.txt' />
    ...
    <revision published='20000825' size='77321'
              uri='http://.../draft-ietf-beep-framework-00.txt' />
  </doc>
  ...
</group>
In case it isn't obvious, throughout this memo the presence of an
ellipsis ("...") in an example indicates that some information was
omitted.
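To illustrate how such a record can be processed, here is a minimal
Python sketch that reads a trimmed copy of the example above (the
ellipses are dropped so that the XML is well-formed) and computes the
number of days from chartering to the first RFC. The function names
are hypothetical, and the sketch assumes that the latest revision of a
document with status 'rfc' is the RFC itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the example record; elided items are simply left out.
GROUP_XML = """
<group name='beep' area='app' chartered='20000705'
       title='Blocks Extensible Exchange Protocol'>
  <doc name='draft-ietf-beep-framework' status='rfc'>
    <revision published='20010301' size='82089'
              uri='http://.../rfc3080.txt' />
    <revision published='20000825' size='77321'
              uri='http://.../draft-ietf-beep-framework-00.txt' />
  </doc>
</group>
"""


def parse_date(text):
    """Parse the YYYYMMDD dates used in the model."""
    return datetime.strptime(text, "%Y%m%d").date()


def days_to_first_rfc(group):
    """Days from charter to the earliest RFC, or None if there is no RFC."""
    chartered = parse_date(group.get("chartered"))
    rfc_dates = []
    for doc in group.findall("doc"):
        if doc.get("status") != "rfc":
            continue
        dates = [parse_date(r.get("published")) for r in doc.findall("revision")]
        if dates:
            # Assumption: the latest revision of an 'rfc' document is the RFC.
            rfc_dates.append(max(dates))
    if not rfc_dates:
        return None
    return (min(rfc_dates) - chartered).days


print(days_to_first_rfc(ET.fromstring(GROUP_XML)))  # prints 239
```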
It turns out this information can be gathered in a fairly automated
fashion! Consult the Appendix for the details (including how to get a
copy of the dataset along with the tools that synthesized it).
4. The Queries
Although the IETF secretariat does a great job of documenting things,
the process has evolved over more than a decade. It turns out that
the data starts to get "really clean", for quantitative analysis, in
late 1998. Moving further back, it becomes progressively harder to
construct the written record. So, the very first thing to appreciate
is that, for the purposes of this analysis, the epoch for any query
is February 12, 1997 -- getting the data clean from that date moving
forward is straightforward. With a lot more work, we can move the
epoch backward in time. This will be particularly helpful for doing
trend analysis, trying to discern changes in IETF productivity.
Regardless of the data cleanliness issue, it turns out that February
1997 is a natural choice for an epoch -- this covers the last five
years.
Besides cleanliness, there may be other things to consider when
limiting the input domain. For example, if a working group was
created very recently, the fact that it hasn't published any RFCs yet
is hardly of interest. The question, of course, is where the cutoff
is for "very recently". For the purposes of this analysis, we'll
consider any working group that's at least two years old as being of
interest.
Another thing to consider is whether a working group is actually
"active". The IESG often does not conclude a working group at the
end of a document production cycle -- instead, the working group may
remain on "the books" as active, even though it isn't allowed to
produce any more RFCs until some external event occurs (e.g., a
period of implementation and experimentation before re-examining a
document). Since this may affect some queries (e.g., "what's the
lifetime of a working group?"), we'll use the following heuristic: if
the working group has published at least one RFC, and if all of its
Internet-Drafts have been published as RFCs, then we consider it
"inactive".
So, here are the queries to consider:
o How long does it take a working group to get its first RFC
published?
o How long is a working group "active"?
o How many RFCs does a working group publish?
o For these quantities, what is the average, the distribution, and
the extrema?
o Do these relationships change if we aggregate the working groups
into areas?
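The "inactive" heuristic described earlier in this section affects
the lifetime query, so a minimal Python sketch of it may be useful.
The function name is hypothetical. The sketch assumes that only
documents with status 'inprogress' count as Internet-Drafts that have
not yet been published as RFCs; how expired or moved documents are
treated is not stated in this memo, and they are ignored here.

```python
def is_inactive(doc_statuses):
    """Apply the heuristic to the list of a group's document statuses.

    A group is considered inactive if it has published at least one RFC
    and has no Internet-Draft still awaiting publication.
    """
    has_rfc = "rfc" in doc_statuses
    # Assumption: only 'inprogress' documents are still pending;
    # 'expired' and 'moved' documents are ignored.
    has_pending = "inprogress" in doc_statuses
    return has_rfc and not has_pending


print(is_inactive(["rfc", "rfc"]))         # True
print(is_inactive(["rfc", "inprogress"]))  # False
print(is_inactive([]))                     # False: no RFC published
```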
In order to answer questions such as these, we'll also need a
relational model, for which we'll use SQL:
CREATE TABLE person (
  id int(11) NOT NULL auto_increment,
  name varchar(255) NOT NULL default '',
  PRIMARY KEY (id)
);
CREATE TABLE groop (
  id int(11) NOT NULL auto_increment,
  name varchar(8) NOT NULL default '',
  area varchar(8) NOT NULL default '',
  chartered date NOT NULL default '',
  concluded date default NULL,
  title varchar(255) NOT NULL default '',
  PRIMARY KEY (id)
);
CREATE TABLE role (
  id int(11) NOT NULL auto_increment,
  groupID int(11) NOT NULL default 0,
  personID int(11) NOT NULL default 0,
  name varchar(25) NOT NULL default '',
  PRIMARY KEY (id)
);
CREATE TABLE doc (
  id int(11) NOT NULL auto_increment,
  groupID int(11) NOT NULL default 0,
  status varchar(25) NOT NULL default '',
  name varchar(255) NOT NULL default '',
  PRIMARY KEY (id)
);
CREATE TABLE revision (
  id int(11) NOT NULL auto_increment,
  groupID int(11) NOT NULL default 0,
  docID int(11) NOT NULL default 0,
  size int(11) NOT NULL default 0,
  published date NOT NULL default '',
  uri varchar(255) NOT NULL default '',
  PRIMARY KEY (id)
);
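As a rough illustration of how a query might run against this
relational model, the following Python sketch uses the standard
sqlite3 module as a stand-in for the database software. The tables
are cut down to the columns the query needs, dates are stored as ISO
strings, and the query itself is an assumption rather than the one
used for this analysis. It treats the latest revision of a document
with status 'rfc' as the RFC publication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified versions of the groop, doc, and revision tables, loaded
# with the BEEP example data from Section 3.
conn.executescript("""
CREATE TABLE groop (id INTEGER PRIMARY KEY, name TEXT, area TEXT,
                    chartered TEXT);
CREATE TABLE doc (id INTEGER PRIMARY KEY, groupID INTEGER, status TEXT,
                  name TEXT);
CREATE TABLE revision (id INTEGER PRIMARY KEY, groupID INTEGER,
                       docID INTEGER, published TEXT);
INSERT INTO groop VALUES (1, 'beep', 'app', '2000-07-05');
INSERT INTO doc VALUES (1, 1, 'rfc', 'draft-ietf-beep-framework');
INSERT INTO revision VALUES (1, 1, 1, '2000-08-25');
INSERT INTO revision VALUES (2, 1, 1, '2001-01-05');
INSERT INTO revision VALUES (3, 1, 1, '2001-03-01');
""")

# Days from charter to first RFC, per working group.
QUERY = """
SELECT g.name,
       CAST(MIN(julianday(x.rfc_date) - julianday(g.chartered)) AS INTEGER)
FROM groop g
JOIN (SELECT d.groupID AS groupID, MAX(r.published) AS rfc_date
      FROM doc d JOIN revision r ON r.docID = d.id
      WHERE d.status = 'rfc'
      GROUP BY d.id) x ON x.groupID = g.id
GROUP BY g.id, g.name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('beep', 239)]
```

The julianday() function is specific to SQLite; other SQL software
provides its own date arithmetic.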
5. The Results
We now look at the results from the queries. Note that for a working
group to be considered in this analysis:
o it must have been chartered since the epoch; and,
o it must either:
* have published at least one RFC; or,
* be at least two years old.
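The selection criteria above can be sketched as a small Python
predicate. The function name and the as_of parameter (the date on
which the analysis is run) are hypothetical, and "two years" is taken
to mean 730 days, which is an assumption.

```python
from datetime import date

EPOCH = date(1997, 2, 12)  # the epoch given in Section 4


def is_considered(chartered, rfc_count, as_of):
    """Return True if a working group qualifies for this analysis."""
    if chartered < EPOCH:
        return False  # chartered before the epoch
    age_days = (as_of - chartered).days
    # Assumption: "two years old" means at least 730 days.
    return rfc_count >= 1 or age_days >= 2 * 365


# A group chartered on 2000-07-05 with one RFC, examined on 2002-11-06.
print(is_considered(date(2000, 7, 5), 1, date(2002, 11, 6)))  # True
```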
Seventy-five working groups meet these criteria:
area              size
================= ====
Applications  app   19
Internet      int    7
Operations    ops   11
Routing       rtg    5
Security      sec    9
Sub-IP        sub    2
Transport     tsv   20
User Services usv    2
With the exception of the Sub-IP and User Services areas, we should
be able to make comparisons between areas.
For each of the queries, a tabular summary of the results is
presented. Interested readers may also consult [1] for graphical
summaries of these results. Readers are strongly encouraged to
examine these summaries. They provide some very clear insight into
the numbers.
Although the statistical terms used in this memo are basic
measurements, they are not part of typical IETF parlance.
Accordingly, we remind readers of their meanings:
size: the number of values measured
min: the smallest value measured
max: the largest value measured
avg: the arithmetic mean (simple average) of all values measured
median: the dividing point of values, with half of the values below,
and the other half above
mode: a peak (maximum value) in the distribution; the primary mode is
the highest peak
sigma: one standard deviation from mean
1s: one sigma from mean
2s: two sigmas from mean
3s: three sigmas from mean
Although looking at the median reduces the effect of extreme minimum
or maximum values, it's also useful to try to normalize the data to
permit better comparisons between areas. To do this, we present
working group percentiles that occur at a given number of standard
deviations. Percentiles and standard deviations normalize the
results across the disparate groups, permitting fair comparison among
them. (Note that due to rounding, the percentiles for each area may
not add up to 100%.)
5.1 Days until 1st RFC published
area  size   min   avg  median  sigma   max
====  ====  ====  ====  ======  =====  ====
*       75   193   777     741    418  1849
app     19   239   897     842    459  1705
int      7   193   731     772    419  1466
ops     11   287   867     794    481  1807
rtg      5   601  1106    1227    339  1438
sec      9   297   812     741    482  1849
sub      2   937  1040    1040    146  1144
tsv     20   206   480     473    161   791
usv      2  1029  1034    1034      8  1040
area   1s   2s   3s
====  ===  ===  ===
app    47   47    5
int    71   28
ops    63   27    9
rtg    40   60
sec    66   22   11
sub   100
tsv    75   25
usv   100
This measures how quickly the working groups in an area produce their
first RFC.
All of the areas, except Transport, show very inconsistent durations
between start of the working group and issuance of the first RFC.
Applications and Operations show some similarity in the shape of
their distributions, with similar variance but very different means.
Internet and Security also have variances that are similar but means
that are quite different. Transport is distinctive, with a lower
average and narrower variance than the other areas. Routing is
distinctive, with the highest average and highest variance. Note
however that it also has the smallest number of working groups in
this calculation.
The graphs of these measures, with number of days normalized as a
percentage and the duration normalized in sigmas, are striking. All
of the areas, except Internet, have a primary mode (highest peak) of
1.5 to 1.75 sigmas. The Internet area has a primary mode at about
2.25 sigmas. Routing shows a distinctive, secondary mode at about
4.25 sigmas.
5.2 WG duration in days
area  size   min   avg  median  sigma   max
====  ====  ====  ====  ======  =====  ====
*       75   235  1074    1059    427  1849
app     19   240  1094     959    477  1829
int      7   235   820     772    344  1334
ops     11   307  1054    1109    479  1823
rtg      5   934  1409    1331    366  1809
sec      9   515  1038     952    444  1849
sub      2  1144  1494    1494    495  1844
tsv     20   471  1088    1106    347  1684
usv      2   633   638     638      8   644
area   1s   2s
====  ===  ===
app    57   42
int    71   28
ops    54   45
rtg    60   40
sec    66   33
sub    50   50
tsv    80   20
usv     0  100
This measures the longevity of a working group, from the time it is
chartered until it is concluded.
The most significant observation about the data is that it has little
coherence. The only pattern that is consistent is that all the areas
show no real "shape" to the distribution of their working group
durations, so that their distributions are nearly flat. Transport
might be seen as having a bit of a distribution curve, but too few
working groups (3) form the mode to make such an assessment
meaningful.
5.3 WG duration normalized over #-RFCs produced
area  size   min   avg  median  sigma   max
====  ====  ====  ====  ======  =====  ====
*       75   106   687     623    435  1849
app     19   120   784     812    453  1810
int      7   235   575     644    255   920
ops     11   132   652     533    426  1393
rtg      5   934  1409    1331    366  1809
sec      9   106   796     741    511  1849
sub      2   108   626     626    733  1144
tsv     20   159   435     434    189   875
usv      2   633   638     638      8   644
area   1s   2s   3s
====  ===  ===  ===
app    68   21   10
int    85   14
ops    72   27
rtg    20   40   40
sec    66   22   11
sub     0  100
tsv    80   20
usv   100
Dividing the duration of a working group by the total number of RFCs
it produced yields a unit of measure for the average time needed to
produce each RFC.
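A minimal Python sketch of this normalization follows. The function
name is hypothetical. How a working group with no RFCs is handled is
not stated in this memo; the sketch assumes that such a group is
charged its full duration, which appears consistent with the Routing
and User Services rows being identical in Sections 5.2 and 5.3.

```python
def days_per_rfc(duration_days, rfc_count):
    """Average number of days a working group needed per RFC."""
    # Assumption: a group with no RFCs is charged its full duration,
    # i.e. the divisor is never less than one.
    return duration_days / max(1, rfc_count)


print(days_per_rfc(1200, 3))  # hypothetical group: 400.0
print(days_per_rfc(638, 0))   # no RFCs: 638.0
```

For example, the largest Sub-IP values in Sections 5.2 and 5.4 (1844
days and 17 RFCs) give about 108 days per RFC.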
The distributions show very little pattern, except for Transport,
which has a distinctive primary mode at one year per RFC. Over the
other areas, most working groups take longer than two years to
produce each RFC.
The effect of normalizing both axes is remarkable. Normalizing to
percentage of working groups, and number of sigmas, shows all of the
areas to be relatively similar to each other in shape and mode of
their distribution curves, with the primary mode being approximately
1.75 sigmas. Operations is distinctive with a lower percentage of
working groups at the mode and a high-end tail having a more gradual
descent. Routing is remarkable with two modes that are nearly the
same height, one that is the same as the rest of the IETF, though
with a markedly smaller percentage of working groups, and a second
mode at approximately 4 sigmas.
5.4 Number of RFCs produced
area  size  min  avg  median  sigma  max
====  ====  ===  ===  ======  =====  ===
*       75    0    2       1      3   17
app     19    0    1       1      2    7
int      7    0    1       1      1    3
ops     11    0    2       1      4   12
rtg      5    0    0       0      0    1
sec      9    0    2       0      5   15
sub      2    0    8       8     12   17
tsv     20    1    3       2      2    9
usv      2    0    0       0      0    0
area   1s   2s   3s   4s   5s   6s
====  ===  ===  ===  ===  ===  ===
app    84   15
int   100
ops    81    9    0    9
rtg   100
sec    88    0    0    0   11
sub    50    0    0    0    0   50
tsv    75   20    5
usv   100
The distributions show very little pattern. Only Transport and
Applications show a real mode, both at three RFCs. All areas show
extremely long, flat, high-end tails.
Again, the effect of normalizing both axes is remarkable.
Normalizing to percentage of working groups, and number of sigmas,
shows all of the areas to be extremely similar to each other, with
well-shaped and equivalent curves having a primary mode at about 1.75
sigmas. Internet has a much lower, but distinctive, secondary mode
at about 3.5 sigmas.
6. Conclusions
If these sorts of measures are useful, the question is: what
additional measurements and analyses should be pursued?
First, note that the data used for this analysis is only from the last
5 years. The modern IETF was formed in 1989, and these analyses
should be applied across at least the last 10 years, looking for
trends, such as with a rolling 3- or 5-year analysis, to see how
things have changed.
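One way a rolling analysis might be set up is sketched below in
Python. The function name, the choice to group working groups by
charter year, and the sample numbers are all hypothetical; no such
analysis was performed for this memo.

```python
import statistics


def rolling_mean_by_year(samples, window):
    """Mean of a measure over each window of consecutive charter years.

    samples is a list of (charter_year, value) pairs; the result maps
    the last year of each window to the mean of the values in it.
    """
    if not samples:
        return {}
    years = sorted({year for year, _ in samples})
    result = {}
    for end in range(years[0] + window - 1, years[-1] + 1):
        values = [v for y, v in samples if end - window + 1 <= y <= end]
        if values:
            result[end] = statistics.mean(values)
    return result


# Hypothetical days-to-first-RFC values, keyed by charter year.
samples = [(1997, 700), (1998, 800), (1999, 600), (2000, 500)]
trend = rolling_mean_by_year(samples, window=3)
print(trend)  # one entry each for the windows ending in 1999 and 2000
```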
Second, note that we have only looked for broad assessments of the
IETF. These same techniques can be used to look more closely at the
history of particular working groups and even particular contributors
to IETF efforts. Obviously, statistics about people can be sensitive,
and the dangers of inappropriate use are particularly serious.
Still, we wonder whether the mere fact that a working group has
produced many specifications is a good thing, or whether the fact that
a particular person has their name on many specifications is a good thing.
Ultimately the question is whether those specifications get used.
6.1 Two Suggestions for the WG Charter Document System
Please note the current system works fine for its original and
intended purpose. However, here are two concrete suggestions for
improving the charter documents maintained by the IETF secretariat:
o each charter document should include information for each event in
its lifetime (and in between); and,
o in addition to making charter documents available in both text and
HTML, to facilitate automatic processing, each charter document
should also be available in XML.
Consider this (abbreviated) example, which pretty much captures the
whole of a working group's past and present activity:
<group name='beep' area='app'
       title='Blocks Extensible Exchange Protocol'>
  <events>
    <event type='chartered' date='20000705' />
    <event type='inactive' date='20010302' />
  </events>
  <mail mailto='beepwg@lists.beepcore.org'
        archive='http://lists.beepcore.org/mailman/listinfo/beepwg/'>
    <subscribe mailto='beepwg-request@...'>subscribe</subscribe>
  </mail>
  <person name='Keith McCloghrie' mailto='...'
          role='chair' begin='20000705' end='20010201' />
  <person name='Pete Resnick' mailto='...'
          role='chair' begin='20010201' />
  <person name='Ned Freed' mailto='...'
          role='director' begin='20010201' />
  <person name='Patrik Faltstrom' mailto='...'
          role='director' begin='20010201' />
  <person name='Ned Freed' mailto='...'
          role='area-advisor' begin='20010201' />
  <doc name='draft-ietf-beep-framework' status='rfc'>
    <revision published='20010301' size='82089'
              uri='http://.../rfc3080.txt' />
    <revision published='20010105' size='83266'
              uri='http://.../draft-ietf-beep-framework-11.txt' />
    ...
  </doc>
  <description begin='20000705'> ...text... </description>
  <milestones begin='20000705'>
    <milestone planned='20000705' actual='20000731'>Prepare
      ...</milestone>
    ...
  </milestones>
</group>
This is actually a straightforward generalization of the model used
in this analysis. However, since the DTD for this model isn't
germane to this analysis, it isn't presented in this memo.
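To suggest how an event list of this kind could be used, the
following Python sketch computes the number of days a working group
was active from its 'chartered' and 'inactive' events. It reads a
trimmed copy of the example above, and the function name is
hypothetical. Only the two event types shown in the example are
handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the example: only the event list is kept.
EVENTS_XML = """
<group name='beep' area='app'
       title='Blocks Extensible Exchange Protocol'>
  <events>
    <event type='chartered' date='20000705' />
    <event type='inactive' date='20010302' />
  </events>
</group>
"""


def active_days(group):
    """Days between the 'chartered' and 'inactive' events, if both exist."""
    dates = {
        event.get("type"): datetime.strptime(event.get("date"), "%Y%m%d").date()
        for event in group.iter("event")
    }
    if "chartered" not in dates or "inactive" not in dates:
        return None  # still active, or the record is incomplete
    return (dates["inactive"] - dates["chartered"]).days


print(active_days(ET.fromstring(EVENTS_XML)))  # prints 240
```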
7. Security Considerations
This memo has nothing, whatsoever, to do with security; nor does it
have anything to do with insecurity.
URIs
[1] <http://xml.resource.org/ietf-analysis/current/>
[2] <http://xml.resource.org/ietf-analysis/analysis.tgz>
[3] <http://xml.resource.org/ietf-analysis/analysis.tgz>
Authors' Addresses
Marshall T. Rose
Dover Beach Consulting, Inc.
POB 255268
Sacramento, CA 95865-5268
US
Phone: +1 916 483 8878
EMail: mrose@dbc.mtview.ca.us
David H. Crocker
Brandenburg InternetWorking
675 Spruce Drive
Sunnyvale, CA 94086
US
Phone: +1 408 246 8253
EMail: dcrocker@brandenburg.com
URI: http://www.brandenburg.com/
Appendix A. Data Gathering and Processing
Because the IETF "written history" wasn't explicitly designed for
productivity analysis, three different sources are consulted to
synthesize a uniform dataset. Even so, there are still some gaps in
the data, primarily before 1997.
A.1 Charters, Messages, and Documents
The charter document is the fundamental description of a working
group. Since its earliest days, the IETF secretariat has been quite
diligent in using a consistent charter format, which allows for
automatic processing of the charter documents. Unfortunately, there
is no known public archive of charter document revisions.
Accordingly, if a working group is active, the charter document
reflects the present state of affairs. In other words, the charter
document indicates the current chairs, advisors and directors. But,
it does not indicate when the working group was created, and who has
been involved with the working group prior to the present.
Fortunately, chair and technical advisor turnover is (anecdotally)
rather low. Further, in the case of area director and advisors, the
IESG membership rotates fairly infrequently -- even with a two-year
term.
In addition to a lack of personnel history, the charter document
indicates only the latest I-Ds produced by a working group. (For
example, if an I-D expires or moves to another working group, this
fact isn't noted by the charter document.) Similarly, when a working
group reaches a milestone, the charter document is updated to replace
the planned date of the milestone with the string "Done".
(Obviously, knowing the planned and actual dates is rather useful for
gauging a working group's ability to work towards deadlines.)
A third issue in mining charter documents is that when a working
group is concluded, the charter document is severely condensed, e.g.,
with the exception of the chairs all other personnel information is
removed. The difficulty here is that when a working group is
concluded and then re-activated, all of the history is lost.
Fortunately, three other data sources are available to help minimize
these deficiencies.
First, there are the archives for the IETF general and announcement
mailing lists. Although there are some gaps in the archives prior to
mid-1998, when announcements were split off from the general
discussion list, many charter announcements (and conclusion notices)
are archived. (For this analysis, approximately 54,250 messages were
examined to find 127 charter document and 65 conclusion
announcements.)
Second, there are several archival I-D repositories, which can be
examined to reconstruct the document history of each working group.
(This analysis uses the "watersprings" archive, which indexes
approximately 11,700 I-Ds and RFCs since late 1991.)
Finally, a small exception file was built by consulting the IETF
proceedings. Of the 126 working groups chartered since the epoch,
the charter document announcement messages for 25 were not archived.
However, by manually examining the IETF proceedings and the earliest
I-Ds produced by those working groups, it is possible to estimate
when these working groups were created.
A.2 Reconstructing Activity
Here are the steps to produce a uniform dataset:
1. The document history for each working group is constructed, by
parsing the index for the I-D archive repository. Although
straight-forward, there are a few nuances:
* Some I-Ds were published with incorrect prefixes, so an
exception list is consulted so that the correct working group
gets the "credit".
* The index lists documents and their revision history, but not
publication dates nor sizes, so the appropriate instances are
retrieved in order to determine this information.
2. The creation and conclusion dates for each working group are
determined, by parsing the mail archives for the IETF general and
announcement mailing lists. It turns out that (only) eight of
these announcements are poorly formatted, so an additional
archive containing these 8 messages, properly formatted, is also
consulted.
3. Additional information for each working group is determined, by
parsing the current charter document. Because of the
deficiencies in the format, some additional steps are taken:
* If a working group is concluded, then the charter announcement
is examined (if available) to determine the area associated
with the working group.
* If the creation date of a working group isn't known, or if a
working group has produced a document that is before the
charter announcement, then the creation date of the working
group is set to the working group's earliest document, and a
note is made of this. (This may happen if the creation date
of a working group is estimated using the IETF proceedings.)
However, to avoid skewing the results, these working groups
were not analyzed.
* If the conclusion date of a working group is known, and if the
working group produced a document after that date, then the
conclusion date of the working group is updated, and a note is
made of this. (This happens when a working group is concluded
before one of its documents is published as an RFC --
ultimately, only RFCs count, so even if the IESG formally
concludes a working group, this analysis doesn't consider the
working group concluded until the RFC editor publishes...)
4. Finally, the XML and corresponding SQL datasets are produced.
When the SQL dataset is produced, for each working group
considered active, a check is made to see if all of the working
group's documents have been published as RFCs. If so, the
working group is considered inactive, as of the most recent RFC
publication date. (This reflects the fact that the IESG often
does not conclude a working group until its documents reach final
standardization status.)
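The date adjustments described in step 3 can be sketched in Python as
follows. The function name and the wording of the notes are
hypothetical. The sketch assumes that an updated conclusion date is
set to the date of the group's latest document, which the text above
implies but does not state outright.

```python
from datetime import date


def reconcile(chartered, concluded, doc_dates):
    """Adjust a group's creation and conclusion dates to its documents.

    chartered and concluded may be None when unknown. Returns the
    adjusted dates and a list of notes describing any change made.
    """
    notes = []
    if doc_dates:
        earliest, latest = min(doc_dates), max(doc_dates)
        if chartered is None or earliest < chartered:
            chartered = earliest
            notes.append("creation date set from earliest document")
        # Assumption: the conclusion date moves to the latest document.
        if concluded is not None and latest > concluded:
            concluded = latest
            notes.append("conclusion date moved to latest document")
    return chartered, concluded, notes


# Hypothetical group concluded before its last document was published.
print(reconcile(date(2000, 7, 5), date(2001, 1, 1),
                [date(2000, 8, 25), date(2001, 3, 1)]))
```

As noted in step 3, groups whose creation date had to be adjusted in
this way were left out of the analysis.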
A.3 Running the Analysis
The software and data for this analysis are available at [2].
If you want to build the datasets yourself, you'll need a UNIX system
(e.g., NetBSD) and these packages:
o 'tcl', a powerful scripting language;
o the 'mbox' package for Tcl; and,
o the GNU 'wget' utility.
If you just want the resulting XML and SQL datasets, they're also
available at [3]. You'll also need database software. This
analysis was generated using the excellent "MySQL" package, although
any "modern" SQL software should work. Similarly, any "postmodern"
XML database software should work, e.g., "Tamino" or "eXist",
although the authors used only SQL for database access.
Finally, to query the database and visualize the results, this
analysis uses the "fbsql" extension to "Tcl" and the friendly
"ploticus" graphics package.
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.