Minutes interim-2023-rasprg-01: Tue 11:00

Meeting Minutes Research and Analysis of Standard-Setting Processes Proposed Research Group (rasprg) RG
Date and time 2023-05-16 11:00
Last updated 2023-06-01

# rasprg interim (16/5/23)

- Introduction (Niels/Ignacio): plan is to have an interactive, hands-on
  session, given the presentation-based meeting in Yokohama.
- Preliminary agenda accepted; no changes.
- Chairs have developed a set of research questions, based on
  previous/existing work. Not meant to be exhaustive.
- Want to discuss and develop other questions that are of wider interest, so
  that the proposed RG meets the needs of the community. Both data that we can
  surface (i.e., in a dashboard), and papers/research projects that we can
  collaborate on.

### Patents/other outputs
- Stephen: want to look at how patents/intellectual property/other
  commercialisation activities impact participation dynamics. More broadly,
  IETF participants have other outputs (academics produce papers, industry
  participants produce patents, …): can we link these together with their IETF
  contributions to come up with a broader, more holistic view of activity?
- Niels: added to the list of questions. Have added another point that wasn’t
  mentioned: participants might also produce public policy documents. More
  broadly: “non-standards documents”. What do you think would be the best way
  to collect this, ML or something else?
- Stephen: there are databases available for things like academic papers and
  patents. Final point on the patent/intellectual property side: do the
  different governance models, and how they relate to IPR, have an impact on
  participation dynamics? So, for example, the IETF has a stated preference
  for standards that are free from IPR claims: does this change who
  participates, and how they participate?
- Ignacio: There has been a little bit of work done on this on the economics
  side [linked to paper in the chat].
- Niels: Are there patent databases, or other resources, and do they vary by
  jurisdiction?
- Stephen: PATSTAT for EU patent applications; other databases for different
  jurisdictions.
- Niels: are these machine readable?
- Stephen: yes, I think so, but Bernhard’s in the queue, and knows much more
  about it than I do.
- Bernhard: about the patent databases: PATSTAT has a lot of metadata, but not
  so much the texts. For the texts of US patents, there is PatentsView (1976
  and later, all machine readable). The European Patent Office sells its
  database. The UK IPO has an academic/research licence programme for
  machine-readable access. The Canadian patent office also makes the full
  texts available via its website. Bottom line: great resources for any text
  analytics people want to do.
- Niels: are people already doing work on knowledge graphs?
- Bernhard: good amount of work exploiting the texts of patents. I’ve been
  doing work at a simpler level. There are people looking at when new
  terminology and technologies appear in patents. People at the Max Planck
  Institute in Munich are linking patents to standards texts, asking how
  similar patent texts are to standards documents. A lot of work done, but a
  lot more still to do.
- Niels: added a question about how language/concepts travel between
  standards documents and patents. Can also look at research documents, and
  ultimately make arguments about where innovation happens. Are you already
  working on this, Bernhard?
- Bernhard: yes, a little bit. We wrote a paper a few years ago looking at how
  patents cite standards documents. Patents have two types of references:
  other patents, and non-patent literature. We were curious about how many
  patents cite standards documents, and how often standards documents are
  cited in patents, to explore their impact on patent development.
- Niels: will try to encourage you to present in a later meeting.
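
As an illustration of the citation question Bernhard describes, the following
is a minimal sketch of counting RFC mentions in the non-patent literature
references of patents. It is not the method used in the paper mentioned above:
the regular expression, the function name `count_rfc_citations`, and the sample
reference strings are all assumptions made for this example, and real patent
data would need more careful matching.

```python
import re
from collections import Counter

# Assumed pattern: matches "RFC 2616", "RFC2616" and "rfc-2616".
RFC_PATTERN = re.compile(r"\bRFC[\s-]?(\d{1,5})\b", re.IGNORECASE)


def count_rfc_citations(npl_references):
    """Count how many reference strings mention each RFC number."""
    counts = Counter()
    for reference in npl_references:
        # Use a set so an RFC named twice in one reference counts once.
        for number in set(RFC_PATTERN.findall(reference)):
            counts[int(number)] += 1
    return counts


# Made-up non-patent literature strings, for illustration only.
sample_references = [
    "Fielding et al., Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, 1999",
    "Postel, J., Transmission Control Protocol, RFC793, 1981",
    "Smith, A., A study of caching, Journal of Examples, 2005",
    "Berners-Lee, T., URI: Generic Syntax, rfc-3986 (see also RFC 3986)",
]

rfc_counts = count_rfc_citations(sample_references)
for rfc_number, count in sorted(rfc_counts.items()):
    print(f"RFC {rfc_number}: cited in {count} reference(s)")
```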

### Categorising the research questions
- Colin: list of questions could be categorised. Some can be derived
  straightforwardly from the data, but others build on that initial work.
  Would it make sense to think about the levels of analysis that are needed?
  Start with an initial characterisation of the data, and then build up the
  layers of abstraction.
- Niels: clarify: categorisation of the questions or the data?
- Colin: of the questions. Some need little analysis, and then other
  questions need data that earlier questions would produce.
- Niels: good suggestion; lots of categorisation that could be done.
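
One possible way to picture the layering Colin suggests is to treat the
questions as a dependency graph and group them into levels. The sketch below
uses Python's standard `graphlib` module; the question names and their
dependencies are hypothetical placeholders, not the chairs' actual list.

```python
from graphlib import TopologicalSorter

# Hypothetical questions, each mapped to the questions whose output it needs.
depends_on = {
    "count participants per year": set(),
    "count emails per participant": set(),
    "identify active participants": {
        "count participants per year",
        "count emails per participant",
    },
    "characterise active participants": {"identify active participants"},
}


def analysis_layers(graph):
    """Group questions into layers; each layer only needs earlier layers."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # questions with no unmet needs
        layers.append(ready)
        sorter.done(*ready)
    return layers


layers = analysis_layers(depends_on)
for level, questions in enumerate(layers):
    print(level, questions)
```

The first layer holds questions answerable directly from the data; later
layers hold questions that build on earlier results.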

### Definition and data about participation
- Mirja: many questions that we discuss from time to time where more data
  would be useful. But something that we’ve discussed lately has been to try
  to understand who our active participants are, what they are doing, how they
  engage.
- Niels: Interesting; we also have some preliminary analysis on how long they
  stick around.
- Mirja: one more thing: we have lots of data on people that engage in mailing
  lists, come to meetings, etc., but participation is broader than that. How
  do we meet the needs of other groups of participants? How is active
  participation even defined?
- Ignacio: lots of data gathered (like surveys, for example) that isn’t
  publicly available. How could that be released in some way (anonymised)?
- Mirja: to clarify, by “who participants are” I mean their characteristics,
  not just a list of names.
- Ignacio: yes. For very active participants, it’s straightforward to look at
  things like mailing list participation, for example.
- Mirja: even for active participants, it isn’t super easy. For example, I
  don’t think I send a lot of e-mails to public mailing lists, but I do a lot
  of work on GitHub and those kinds of things. One metric isn’t enough;
  question is about figuring out what active participants are doing.
- Ignacio: agreed, but there is other data that isn’t available yet that would
  be useful in answering these questions.
- Mirja: yes, good point. For example, we’ve a lot of data from MeetEcho that
  isn’t public, or at least isn’t public yet, because we don’t know how to
  publish it. No reason for not making it public, but need to discuss.
- Sebastian [chat]: is there a list of GitHub repositories associated with the
  IETF/working groups?
- Rich: the requirement is that the Secretariat has to be a co-owner of any
  repository. Datatracker has arbitrary key/value pairs; predefined keys
  include a GitHub repo link. Don’t think it’s available on the API. Could ask
  the Secretariat about the repositories that they own.
- Stephen [chat]: GitHub URLs are available from the Datatracker, but the
  field isn’t always populated.
- Colin [chat]: Can we search GitHub for repos containing files based on I-D
  names?
- Sebastian [chat]: GitHub API is very solid.
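
The two ideas raised in the chat, pulling GitHub repositories out of key/value
resource records and recognising I-D names in repository file paths, might be
sketched as below. This works on data already in hand and makes no calls to
the Datatracker or GitHub APIs. The record shape, the key names (`github_repo`,
`wiki`, `github_org`), the example URLs, and the I-D name pattern are all
assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed I-D naming pattern: "draft-" followed by hyphen-separated parts.
DRAFT_RE = re.compile(r"draft-[a-z0-9]+(?:-[a-z0-9]+)+")


def github_repos(resources):
    """Return sorted unique 'owner/repo' names from (key, value) pairs."""
    repos = set()
    for _key, value in resources:
        parsed = urlparse(value)
        if parsed.netloc.lower() != "github.com":
            continue  # not a GitHub link
        parts = [part for part in parsed.path.split("/") if part]
        if len(parts) >= 2:  # skip organisation-only links
            repos.add(f"{parts[0]}/{parts[1]}")
    return sorted(repos)


def draft_names(paths):
    """Return sorted unique I-D names found in the given file paths."""
    names = set()
    for path in paths:
        filename = path.rsplit("/", 1)[-1].lower()
        match = DRAFT_RE.search(filename)
        if match:
            # Drop a trailing two-digit revision such as "-03".
            names.add(re.sub(r"-\d{2}$", "", match.group(0)))
    return sorted(names)


# Hypothetical records and file paths, for illustration only.
records = [
    ("github_repo", "https://github.com/example-wg/base-drafts"),
    ("wiki", "https://wiki.example.org/wg"),
    ("github_repo", "https://github.com/example-wg/base-drafts/"),
    ("github_org", "https://github.com/example-wg"),
]
paths = [
    "draft-ietf-example-protocol-03.md",
    "docs/draft-ietf-example-protocol.xml",
    "README.md",
]

print(github_repos(records))
print(draft_names(paths))
```

Because the Datatracker field is not always populated, matching file names
against known I-D names could complement the recorded links.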

### Bibliographies in the RG GitHub repo
- Niels: thought about adding bibliographies of papers looking at standards
  data, split up by discipline. Links to code, and tools, too.
- John Klensin [chat]: lots of work in other venues. A good bibliography might
  be a good idea.
- Bernhard Ganglmair [chat]: happy to put together a bibliography of papers
  linking patents and SDO documents.

### Ethics and privacy
- Sebastian: think my questions were addressed on the mailing list, will
  follow up if there are further questions.
- Niels: Jay’s answers were quite clear. For our work, we’ve done GDPR
  analysis. There are considerable carve-outs for research, so strong basis
  for our carrying out our analysis.
- Colin: main concern is what data we can release, as it’s processed (for
  example, sentiment analysis or other annotations). Can those be released,
  given that it’s all personally identifiable information?
- Niels: our GDPR work helps here. Big problem is when this data is used
  commercially. Research purposes are OK: if we distribute it to people that
  are using it for research, then the GDPR carve-out for research applies, but
  not when we distribute it for commercial purposes.
- Colin: so we ask people to agree to terms before they can access the data?
- Niels: yes. The GDPR isn’t just valid in the EU, but everywhere for all EU
  citizens.
- Colin: might be worth sharing the licences that people use, so that we
  aren’t all starting from scratch.
- Niels: can share that analysis. Also think that the data that Jay provided
  was helpful.
- Sebastian: My understanding is that there’s a particular purpose restriction
  for the use of data according to the GDPR, which is the purpose of public
  interest research. My understanding is that the IETF privacy statement talks
  about the data being public, and has some language in it about the
  particular purposes that the privacy policy enumerates; but the policy
  doesn’t enumerate them. Discrepancy between European privacy standards and
  American standards: some data that is public in Europe can still be purpose
  restricted. It would be clearer if the privacy statement said that research
  is one of the purposes for which the data is collected. Maybe a point of
  ambiguity that should be followed up on.
- Jay: privacy statement has been discussed many times with the lawyers. Not a
  discrepancy in the privacy statement: purposes are there, but written in a
  way that might be unfamiliar. The purposes are there clearly: about
  operating in an open and transparent fashion, going to make things public,
  etc. This is a very defensible privacy statement. No attempt in the privacy
  statement to talk about things that go beyond the public interest and
  research carve-out (commercialisation, for example), so no need to talk
  about it here.
- Jay: redoing the privacy statement is exceptionally complex, given the
  number of people involved. Would rather not do that, but happy to get
  further advice and clarification. But I don’t think there’s any discrepancy
  here that needs to be fixed.

### Possible joint work
- Niels: Stephen and Sebastian, I wanted to ask about joint work, and what the
  next steps are.
- Stephen: reviewer recommender tooling is going well. It’s something that
  we’re planning to contribute to the repository that was discussed earlier.
  In terms of other joint work, we’re thinking about the data that we can
  contribute to a dashboard-style project.
- Stephen: feedback on the reviewer recommender tooling has been useful, and
  has led to changes and improvements.
- Niels: Sebastian, how is your work going?
- Sebastian: getting close to a 0.5 release of BigBang, based on recent work
  (including at the Hackathon). Not much work on the dashboard since the
  Hackathon.

### Future meetings
- Neither chair will be in-person in San Francisco.
- No appetite for a physical meeting or hackathon in San Francisco: going to
  focus on interims, and potentially meet in Prague.
- Rich Salz: maybe schedule a side meeting during San Francisco, and have an
  interim a few weeks after.
- Niels: agreed.
- [San Francisco meeting slot cancelled]

### Datatracker changes
- Jay: Datatracker developers are planning to redo the API to use GraphQL.
  Don’t know the timetable, but worth keeping in mind.
- Colin: is that going to deprecate the existing API?
- Jay: don’t know; would have to ask to find out about that.