Networking Machine Learning Proposed Research Group, IETF-94 (Yokohama)
Chairs: Brian Carpenter & Sheng Jiang
(Since Brian was not in Yokohama, Suresh Krishnan helped co-chair this session)
Tuesday, Morning Session I 0900-1130, Room 304

1. WG Dash, by co-chairs

*******************************************************************************
2. Introduction to Machine Learning, its potential usage in network area, & the proposed NMLRG - Sheng Jiang
draft-jiang-nmlrg-network-machine-learning

[Pierre Peloso] Question related to autonomic networking and machine learning. Do you consider machine learning algorithms necessary for autonomic networking behavior, or are they just one option among others?
[Sheng Jiang] Autonomic networking is one of the reasons we're doing machine learning here, but machine learning still needs to interact with human operators. So autonomic could become a very important aspect of machine learning applied to networks.
[Suresh Krishnan] The question could be asked in another way: can you do autonomic without machine learning? (Pierre: yes.)
[Sheng] For now, not sure. The first target of NMLRG is higher-layer concepts and use case study. Given time, that may come into scope.
[Kevin Fall] Higher-layer concepts and use cases seem pretty broad. Can you give more detail on the initial areas you mean by higher-layer use cases?
[Sheng] For now, we set a very low bar: anything about networks using ML mechanisms can be discussed here. We're giving people a forum. But in the future, we might narrow our scope to focus on certain use cases/solutions.
[Kevin] Are there things you perceive as pressing problems for which you think ML might be a great solution?
[Sheng] For now, network control/management and supplying data to upper-layer applications. But that's not a limit or boundary; it could be more.
[???] Do we have a statement of what is not in scope?
[Suresh] It's a proposed RG with a really broad scope, unlike IETF WGs.
So I don't think there can be text for that yet. By the time Lars decides whether the RG goes ahead or not, we could have a better idea of what is in scope or not.
[???] An example of what I'm thinking about: for DPI, what kind of information are you going to collect on the network?
[Sheng] We're doing use case study. You need to come forward and introduce whatever use case you have in mind. Then we can study that use case together. After that, it will be clearer whether it's in or out of scope. So I do encourage you to request a time slot at the next meeting, or to discuss on the mailing list.
[Lars Eggert] This is an IRTF Research Group; there aren't going to be standards coming out of it. The point is to be a forum: a lot of researchers look at ML as a tool to apply in networking. We want to talk about that as a research effort and learn something from them. Maybe we will eventually realize that this fits something we are struggling with on the IETF side, or maybe not. For the next couple of meetings, it's mostly about judging whether this community wants to discuss this together here or not. That's why the scope is broad.
[Gihan Dias] I believe network security would be in scope?
[Sheng] Yes.
[Erik Nordmark] There seem to be two different pieces here: 1. using ML to classify the data from the network; 2. recovery/correction/routing etc., taking the results of that and feeding them back into the system. What are the local things that a device could deal with individually? At the network level, within the domain, what data do we gather and what decisions are made? We may need to split the learning aspect from the action/feedback aspect.
[Sheng] For many use cases, ML may need to be combined with traditional mechanisms or with an evaluation system that is NOT ML. For certain cases, we have to consider them as whole use cases, only part of which is ML.
[Brian Trammell] Suggest that the scope of the RG explicitly consider interaction "from" the IETF to help inject a different perspective into the discussion (e.g.
production may be interesting from a standardization standpoint).
[Sheng] Thank you. In the future, if the RG decides something is mature enough to need protocol/standards work, we can come together to form a proposal for the IETF.
[Jerome Francois] What datasets could be used? The RG could maybe provide an assessment of datasets, for verification of use cases.
[Sheng] For myself, I'm not sure of the value of fixed datasets. A fixed dataset might be useful to evaluate certain use cases and compare different algorithms, but the network situation is very complicated and changes dynamically. So if we can do some work against real-world data, for me, that would be better than fixed datasets.
[Steven Wright] Documenting requirements seems a little bit absurd for an RG. If the purpose is to deal with requirements, then those requirements should be on something specific or related to some overall objective, which is probably beyond the scope of a specific RG. So either redefine it or just leave them out.
[Sheng] Could do. That could be among the study results of this RG.
[Andrew Veitch] For the dataset, you need a lot of data and a lot of scenarios. I don't know whether there is open source or open access data from carriers (real world, not simulation data). How can data from real carriers be made available to researchers?
[Sheng] For me, for now, no answer. It depends on the group. If people volunteer to provide some data, and other people are interested in using that data for certain solutions, that's fine.
[???] How would you describe a use case? What would it consist of? (What would be a good template for a good use case?)
[Sheng] I try NOT to impose any limitation. You can come and describe how you think ML applies to whatever network aspect. Then you measure the results according to real-world criteria to see whether the use case is efficient.
[Lars] In the IRTF, the charter almost doesn't matter. The charter is more like an advertisement to researchers to come here to present work.
So it won't be exclusive (the words "use cases" or "requirements"). The first meeting is just an advertisement.

*******************************************************************************
3. Applying Machine Learning to Software-Defined Networks Use-cases and ongoing experimental results - Albert Cabellos

[Edward Henry] Question regarding your model used for routing: how do you handle coordination within the system (different routing models among boxes)?
[Albert] We don't have an answer for that. We assume there is a controller handling all the information.
[??? from Telefonica] Have you studied Virtual Network Function placement?
[Albert] We haven't studied Virtual Network Function placement, but what we're looking at here is the cost of a Virtual Network Function. If you have a chain of functions, which will have delays, you need to know where it is. That is what this kind of model can provide; then, using the estimate, you can allocate/place them more efficiently.
[Jeffrey ?? from SDNLabs] We should be careful about black-box model prediction. It is a fundamental concern because we don't understand how it works. It is not reliable.
[Albert] Fully agree. If we have a model, we should use it to solve the problem. But there are scenarios where we don't have all the information (e.g. underlying, topologies), so we don't have a model; ML is suitable for those scenarios. We're aware that ML is not a solution for everything.
[Fei Song] Regarding "from data to knowledge", how do you define "knowledge"?
[Albert] Network monitoring provides data: huge log files with a lot of information. A human cannot parse so much raw information. You give all those files to the ML system and tell it which events are relevant. The ML tries to find correlations between the events that a human considers relevant and the events in the network. This is the knowledge given to the human: something relevant to you is happening now.
[Fei Song] A comment to the RG. What's the boundary of this RG? Do we think that prediction is enough for the use cases (network management etc.)? If we want to use the achievements of this RG from other perspectives, we need to consider other things (e.g. game theory).

*******************************************************************************
4. Multidimensional Aggregation for DNS monitoring - Jérôme François

[???] This technique seems very dependent on domain name stability compared with IP address allocation. How do you deal with CDN domain names? How do you deal with domain name parking?
[Jerome] For CDN, you're right, we have the same kind of behavior. Domain name parking without stability might be challenging.

*******************************************************************************
5. Machine Learning in Spam Filtering - by John Levine
(No time for discussion)

*******************************************************************************
6. Autonomic Network Configuration Using Machine Learning - Shufan Ji
(No time for discussion)

*******************************************************************************
7. Research on Network Fault Analysis Based on Machine Learning - Haibing Song
(No time for discussion)

*******************************************************************************
8. RG Closing - co-chairs

[Lars, IRTF chair] There should be more time for discussion. That is the point of having RG meetings. I encourage you to find where the researchers are, on the academic side and at conferences; you shouldn't only meet them at the IETF. Future meetings could be organized together with a conference. The organizing team may consider giving the meeting a theme: pick a certain topic so that the talks don't take up the whole session, and leave enough time for discussion.