Minutes interim-2023-rasprg-01: Tue 11:00
minutes-interim-2023-rasprg-01-202305161100-00
| Meeting Minutes | Research and Analysis of Standard-Setting Processes Research Group (rasprg) RG | |
|---|---|---|
| Date and time | 2023-05-16 11:00 | |
| Title | Minutes interim-2023-rasprg-01: Tue 11:00 | |
| State | Active | |
| Last updated | 2023-06-01 | |
# rasprg interim (16/5/23)

- Introduction (Niels/Ignacio): plan is to have an interactive, hands-on session, given the presentation-based meeting in Yokohama.
- Preliminary agenda accepted; no changes.
- Chairs have developed a set of research questions, based on previous/existing work. Not meant to be exhaustive.
- Want to discuss and develop other questions that are of wider interest, so that the proposed RG meets the needs of the community. Both data that we can surface (i.e., in a dashboard), and papers/research projects that we can collaborate on.

### Patents/other outputs

- Stephen: want to look at how patents/intellectual property/other commercialisation activities impact participation dynamics. More broadly, IETF participants have other outputs (academics produce papers, industry participants produce patents, …): can we link these together with their IETF contributions to come up with a broader, more holistic view of activity?
- Niels: added to the list of questions. Have added another point that wasn’t mentioned: participants might also produce public policy documents. More broadly: “non-standards documents”. What do you think would be the best way to collect this: ML, or something else?
- Stephen: there are databases available for things like academic papers and patents. Final point on the patent/intellectual property side: do the different governance models, and how they relate to IPR, have an impact on participation dynamics? So, for example, the IETF has a stated preference for standards that are free from IPR claims: does this change who participates, and how they participate?
- Ignacio: There has been a little bit of work done on this on the economics side [linked to paper in the chat: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3073165]
- Niels: Are there patent databases, or other resources, and do they vary by jurisdiction?
- Stephen: PATSTAT for EU patent applications; other databases for different jurisdictions.
- Niels: are these machine readable?
- Stephen: yes, I think so, but Bernhard’s in the queue, and knows much more about it than I do.
- Bernhard: about the patent databases: PATSTAT has a lot of metadata, but not so much the texts. For the texts of US patents, there is PatentsView (1976 and later, all machine readable). The European Patent Office sells its database. The UK IPO has an academic/research licence programme for machine-readable access. The Canadian patent office also makes the full texts available via its website. Bottom line: great resources for any text analytics people want to do.
- Niels: are people already doing work on knowledge graphs?
- Bernhard: good amount of work exploiting the texts of patents. I’ve been doing work at a simpler level. People are doing work looking at when new terminology and technologies appear in patents. People at the Max Planck Institute in Munich are linking patents to standards texts: the question of how similar patent texts are to standards documents. A lot of work done, but a lot more still to do.
- Niels: added a question about how language/concepts travel between standards documents and patents. Can also look at research documents, and ultimately make arguments about where innovation happens. Are you already working on this, Bernhard?
- Bernhard: yes, a little bit. A paper a few years ago looked at how patents cite standards documents. Patents have two types of references: other patents, and non-patent literature. We were curious about how many patents cite standards documents, and how often standards documents are cited in patents, to explore their impact on patent development.
- Niels: will try to encourage you to present in a later meeting.

### Categorising the research questions

- Colin: list of questions could be categorised. Some can be derived straightforwardly from the data, but others build on that initial work. Would it make sense to think about the levels of analysis that are needed?
  Start with an initial characterisation of the data, and then build up the layers of abstraction.
- Niels: clarify - categorisation of the questions or the data?
- Colin: of the questions: some require little analysis, and other questions need data that earlier questions would produce.
- Niels: good suggestion; lots of categorisation that could be done.

### Definition and data about participation

- Mirja: many questions that we discuss from time to time where more data would be useful. But something that we’ve discussed lately has been to try to understand who our active participants are, what they are doing, how they engage.
- Niels: Interesting. We also have some preliminary analysis on how long they stick around.
- Mirja: one more thing: we have lots of data on people that engage in mailing lists, come to meetings, etc., but participation is broader than that. How do we meet the needs of other groups of participants? How is active participation even defined?
- Ignacio: lots of data gathered (like surveys, for example) that isn’t publicly available. How could that be released in some way (anonymised)?
- Mirja: to clarify, by “who participants are” I mean their characteristics, not just a list of names.
- Ignacio: yes. For very active participants, it’s straightforward to look at things like mailing list participation, for example.
- Mirja: even for active participants, it isn’t super easy. For example, I don’t think I send a lot of e-mails to public mailing lists, but I do a lot of work on GitHub and those kinds of things. One metric isn’t enough; question is about figuring out what active participants are doing.
- Ignacio: agreed - but there is other data that isn’t available yet that would be useful in answering these questions.
- Mirja: yes, good point. For example, we’ve a lot of data from MeetEcho that isn’t public, or at least isn’t public yet, because we don’t know how to publish it. No reason for not making it public, but need to discuss.
- Sebastian [chat]: is there a list of GitHub repositories associated with the IETF/working groups?
- Rich: the requirement is that the Secretariat has to be a co-owner of any repository. Datatracker has arbitrary key/value pairs; predefined keys include a GitHub repo link. Don’t think it’s available on the API. Could ask the Secretariat about the repositories that they own.
- Stephen [chat]: GitHub URLs are available from the Datatracker, but they aren’t always populated.
- Colin [chat]: Can we search GitHub for repos containing files based on I-D names?
- Sebastian [chat]: GitHub API is very solid.

### Bibliographies in the RG GitHub repo

- Niels: thought about adding bibliographies of papers looking at standards data, split up by discipline. Links to code, and tools, too.
- John Klensin [chat]: lots of work in other venues. A good bibliography might be a good idea.
- Bernhard Ganglmair [chat]: happy to put together a bibliography of papers linking patents and SDO documents.

### Ethics and privacy

- Sebastian: think my questions were addressed in the mailing list, will follow up if there are further questions.
- Niels: Jay’s answers were quite clear. For our work, we’ve done GDPR analysis. There are considerable carve-outs for research, so strong basis for our carrying out our analysis.
- Colin: main concern is what data we can release once it’s processed. For example, can sentiment analysis or other annotations be released, given that it’s all personally identifiable information?
- Niels: our GDPR work helps here. Big problem is when this data is used commercially. Research purposes are OK: if we distribute it to people who are using it for research, then the GDPR carve-out for research applies, but not when we distribute it for commercial purposes.
- Colin: so we ask people to agree to terms before they can access the data?
- Niels: yes. The GDPR isn’t just valid in the EU, but everywhere for all EU citizens.
- Colin: might be worth sharing the licences that people use, so that we aren’t all starting from scratch.
- Niels: can share that analysis. Also think that the data that Jay provided was helpful.
- Sebastian: My understanding is that there’s a particular purpose restriction for the use of data according to the GDPR, which is the purpose of public interest research. My understanding is that the IETF privacy statement talks about the data being public, and has some language in it about the particular purposes that the privacy policy enumerates; but the policy doesn’t enumerate them. Discrepancy between European and American privacy standards: some data that is public in Europe can still be purpose restricted. Would make it clear if the privacy statement said that the purpose of the data being collected was for research. Maybe a point of ambiguity that should be followed up on.
- Jay: privacy statement has been discussed many times with the lawyers. Not a discrepancy in the privacy statement: the purposes are there, but written in a way that might be unfamiliar. The purposes are there clearly: about operating in an open and transparent fashion, going to make things public, etc. This is a very defensible privacy statement. No attempt in the privacy statement to talk about things that go beyond the public interest and research carve-out (commercialisation, for example), so no need to talk about it here.
- Redoing the privacy statement is exceptionally complex, given the number of people involved. Would rather not do that, but happy to get further advice and clarification. But I don’t think there’s any discrepancy here that needs to be fixed.

### Possible joint work

- Niels: Stephen and Sebastian, I wanted to ask about joint work, and what the next steps are.
- Stephen: reviewer recommendations tooling is going well. It’s something that we’re planning to contribute to the repository that was discussed earlier.
  In terms of other joint work, we’re thinking about the data that we can contribute to a dashboard-style project.
- Stephen: feedback on the reviewer recommender tooling has been useful, and has led to changes and improvements.
- Niels: Sebastian, how is your work going?
- Sebastian: getting close to a 0.5 release of BigBang, based on recent work (including at the Hackathon). Not much work on the dashboard since the Hackathon.

### Future meetings

- Neither chair will be in-person in San Francisco.
- No appetite for a physical meeting or hackathon in San Francisco: going to focus on interims, and potentially meet in Prague.
- Rich Salz: maybe schedule a side meeting during San Francisco, have an interim a few weeks after.
- Niels: agreed.
- [San Francisco meeting slot cancelled]

### Datatracker changes

- Jay: datatracker developers are planning to redo the API to use GraphQL. Don’t know the timetable, but worth keeping in mind.
- Colin: is that going to deprecate the existing API?
- Jay: don’t know. Would have to ask to find out about that.