Benchmarking Methodology Working Group (BMWG)
IETF 96
Wednesday, July 20, 2016
1000-1230 (Local Time)  Morning Session I
Schoenberg      OPS     bmwg

Remote Participation:
http://www.ietf.org/meeting/96/index.html
http://www.ietf.org/meeting/96/remote-participation.html

Minute Taker: Marius Georgescu
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

0.  Agenda Bashing
    No agenda bashing.

1a. WG Status (Chairs)
    * VNF and Infrastructure Benchmarking Considerations
      WG Consensus

2. Data Center Benchmarking Proposal
   Presenter: Jacob Rapp and Lucien Avramov
   Presentation Link:
   https://www.ietf.org/proceedings/96/slides/slides-96-bmwg-1.pdf
   https://datatracker.ietf.org/doc/draft-ietf-bmwg-dcbench-terminology/
   https://datatracker.ietf.org/doc/draft-ietf-bmwg-dcbench-methodology/

   - Al Morton: A point that I don't think you mentioned in a
        while is that you guys have used these tests for many
        datacenter applications already. They have been basically
        found to be useful and this was what drove the draft to
        creation and near completion. It would be good if we could
        capture that somehow.
   - Jacob Rapp: I think there are a couple of vendors that are
        using these recommendations from this methodology.
   - Al: Let's try to get some evidence of that. It's of lasting
         value. Folks should take a look at this if you haven't
         yet. It's in working group last call (WGLC).
   - Al: By the way, who has read this draft in the room? I see
        4-5 hands. Good.
    - Ramki Krishnan: Carrying on Al's thoughts, perhaps we could
         add some application notes with regard to big data.
    - Jacob: That's a use case we were thinking of. If you look at
        some of the big data applications, there's a mixture of
        very high speed traffic and east-west traffic.
   - Ramki: Maybe it could even go to the Appendix. Thank you.
    - Marius: About repeatability, you say the test should be
         repeated a couple of times, but you do not specify a
         number. It would be useful to have at least a lower
         bound, say a minimum of 20 (like RFC2544). Also, in
         terms of variation, do you mean something like +/-5% of
         the average as relative error? I think this needs to be
         clarified.
   - Jacob: Are you talking about delay variation?
    - Marius: I think this applies to most of the metrics. In the
        reporting format you specify the variation. How should that
        be reported effectively? Is that a +/- percentage of the
        average? Maybe that should be clarified.
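
    [Sketch (Python, editor's illustration): one way to report the
     variation Marius asks about, as a +/- percentage of the average,
     with RFC2544's 20 runs as a lower bound. Names and sample values
     are illustrative, not from the draft.]

         from statistics import mean

         def report_variation(samples_gbps, min_runs=20):
             # Enforce a lower bound on repetitions, as suggested above.
             if len(samples_gbps) < min_runs:
                 raise ValueError("need at least %d runs" % min_runs)
             avg = mean(samples_gbps)
             # Relative error: worst deviation from the average,
             # expressed as a percentage of the average.
             rel = max(abs(s - avg) for s in samples_gbps) / avg * 100
             return avg, rel

         avg, rel = report_variation([9.62, 9.58, 9.71, 9.66] * 5)
         print("average %.2f Gbps, variation +/- %.1f%%" % (avg, rel))
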
   - Steven Wright: In Section 6.3, about the reporting format you
         are assuming reports are human readable. And it occurs to me
        that one of the major uses for this stuff is going to be
        looking forward to automation. You might want to have some
        text about machine readable formats of the tests results as
        a direction for future work.
   - Jacob: I think it's a good point although I don't know how
        we've done this in the past. How the formatting should be
        done.
   - Al: The past is the past. Forget the past, embrace the
        future. A machine-readable format is something we should
        consider in the draft.
   - Jacob: Maybe this should be a group discussion.
    - Sarah: I think it comes down to what you intended when you
         wrote the draft. I imagine this was done with human
         readable formats in mind. I would argue that a machine
         readable format would be out of scope in this context. I
         don't think you should feel beholden to it.
    - Steven: I think what I am asking about can be solved with a
        couple of sentences along the lines of: further
        consideration may be given to machine readable formats, for
        the reports to support automation. You don't have to solve
        the whole thing in this draft.
   - Al: In one case that I'm thinking about generally for the
        working group as a whole, in the case of SDN controller,
        these things are going to be packaged for download and if
        you package the results along, that's not a bad thing.
   - Sarah: We saw this in NFVRG as well. This is for future
         consideration, which may be tackled in a related draft,
        in which you cover the automation perspective.
   - Marius: The machine readable format is definitely an
        interesting proposal, but (at least for my draft) the
        report is supposed to be read by a human. I support the
        idea of noting this as a future consideration. In relation
        to Al's comment about the past, maybe we should not forget
        the past completely, as we may repeat the past errors.
   - Ramki: I think in the context of devops, discussing
        automation would be a great idea.  I think it would be
        valuable to continue the discussion on the list. Perhaps a
        joint one with NFVRG. Maybe this should be covered in Al's
        considerations draft as well.

3. IPv6 Transition Benchmarking
   Presenter: Marius Georgescu (remote)
   https://www.ietf.org/proceedings/96/slides/slides-96-bmwg-2.pdf
   http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-02

   - Many comments addressed

    - Jacob: I think it's good to have a concise piece, but the
         question still remains. What if I put the tag at the very
        end of the packet? That's one of the points for the latency
        measurement. If you put something at the very beginning,
        you don't know if something happened.
   - Marius: Do you mean have some padding in the payload before
        the tag?
   - Jacob: Yes. Could you put it in the payload? Probably not, I
        think.
   - Marius: We haven't tested this yet. We will come back with
        some recommendation for this based on the empirical
        results. However, I didn't get your point. Why wouldn't you
        be able to put the tag in the payload?
   - Jacob: You could.
    - Marius: I don't know if putting it at the beginning or at the
         end of the payload would matter much. We will test and
        let you know.
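
    [Sketch (Python, editor's illustration): the two tag placements
     discussed above, using a sequence number and timestamp as the
     tag; the tag layout and sizes are assumptions, not from the
     draft.]

         import struct, time

         TAG = struct.Struct("!IQ")   # sequence number + timestamp (ns)

         def make_payload(seq, size, tag_at_end=True):
             # Build a `size`-byte payload carrying one tag; the rest
             # is padding. A tag at the end is only received once the
             # whole frame has arrived, which matters for latency
             # measurement.
             tag = TAG.pack(seq, time.monotonic_ns())
             pad = b"\x00" * (size - TAG.size)
             return pad + tag if tag_at_end else tag + pad

         p = make_payload(seq=1, size=1472)  # fits 1500-byte IPv4/UDP MTU
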
    - Marius: We were thinking of having the first WGLC around IETF97.
        What do you think about that?
   - Al: Let's see how many people have read the draft. Some
        hands. That's good. Any other comments? I think we should
        have a WGLC between now and IETF97.  We'll introduce this
        WGLC in a phased approach with the other drafts.

4. Benchmarking SDN Controller
   Presenter: Sarah Banks
   Presentation Link:
   https://www.ietf.org/proceedings/96/slides/slides-96-bmwg-9.pdf
   http://tools.ietf.org/html/draft-ietf-bmwg-sdn-controller-benchmark-term-02.txt
   http://tools.ietf.org/html/draft-ietf-bmwg-sdn-controller-benchmark-meth-02.txt

   - Al: Comparable to maximum frame rate tests. I'm suggesting
        that we back up and look at lossless cases, where packets
        are not lost.  It's going to be more like the normal mode
        of operation we would want to use with these controllers.
        It requires different tooling, it's going to be more
         difficult to measure this and that's where the challenge is
        for vendors in this space.
   - Sarah: We've been having a lot of discussion internally about
        this too. One of the points that I've proposed is that we
        continue coming at it as if you're pumping in too many
        packets and observe what gets in and what gets dropped.
        This is supposed to give some sanity to the test and lead
        to understanding how much traffic I can push through
        without dropping. We still need to address this. We just
        haven't 100% agreed how to do that.
    - Al: It's definitely tricky to keep track of these packet-ins
         to responses, because they may not be unique, and the
         measurements may not be sufficient if they're not unique.
   - Sarah: Agreed. [for the next steps] Who's read the draft?
   - Al: We have 5-6 people.
   - Sarah: Any comments?
   - Marius: I think you mean WGLC, not IESG review?
   - Sarah: Yes. We will get comments before we get to that :)
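
    [Sketch (Python, editor's illustration): the uniqueness point Al
     raises above, tagging each probe with a unique id so a response
     can be matched to exactly one probe; the transmit hook is
     hypothetical.]

         import time

         sent = {}                       # probe id -> send timestamp

         def send_probe(probe_id, tx):
             # A unique id per probe (e.g. carried in the payload)
             # makes the packet-in/response mapping unambiguous.
             sent[probe_id] = time.monotonic()
             tx(probe_id)                # hypothetical transmit hook

         def on_response(probe_id):
             # Only meaningful because each id maps to a single probe.
             return time.monotonic() - sent.pop(probe_id)
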
   - Steven: I was a bit confused in the methodology document
        about the security and reliability tests because they're
        pointing back to the test setup in Section 3.2, which has
         data flowing through the data path. I would expect a
         north-south flow through the controller. I would expect a
         different test setup.
   - Sarah: I think that it's a mistake to point back to 3.2.  We
        will address this in the next version.
   - Marius: There was discussion about the summarization function
        at last meeting. Is that still being discussed among the
        co-authors?
    - Sarah: We talked about this about 6 months ago and thought we
         had clarified it. I'll take a look at this with fresh eyes. We
        might need to explicitly state that in the draft.
    - Al: This is a hot topic in the industry and there is even
         some new work hoping to change the south-bound
         interface. The de facto standard for the south-bound
         protocol is OpenFlow, but there's new research in this
         area, such as P4. So, one of the things that we tried to
         do with this draft was to make it more generic. It would
         be interesting to find out if some of our generic metrics
         are applicable to other south-bound interfaces beyond
         OpenFlow. If they're not, that is OK, but it would limit
         the scope. But let's try to answer that question before
         somebody else asks us.
    - Ramki: One of the things we've seen on the technology front
         is the same vSwitch function implemented in intelligent
         NICs. So, perhaps it's worth making some mention of it,
         because there are several ways of implementing it: it
         could be an ASIC-based implementation, or it could be an
         FPGA. What's also interesting is that the management
         interfaces are all preserved, but the only changing
         element is the actual implementation. I don't know how
         this would fall in the scope, but I'm thinking it's worth
         mentioning.
   - Sarah: Would it suffice to clearly mention what the physical
        and software topology is?
   - Ramki: That and I was also thinking it's worth making
        references to other types of implementations. I can
        actually send the model list. The question is whether we
        want to spend time on identifying the common areas versus
        what is different.
   - Sarah: Let me think about that and talk to Bhuvan and we'll
         come back to you. In the meantime, if we can take it to
         the list, and if you want to share your sources, that would
        certainly help.

5. Benchmarking Virtual Switches in OPNFV
   Presenter: Maryam Tahhan (remote)
   Presentation Link:
   https://www.ietf.org/proceedings/96/slides/slides-96-bmwg-3.pdf
   http://tools.ietf.org/html/draft-ietf-bmwg-vswitch-opnfv-00.txt

   - Newly adopted by the WG
   - Implements BMWG RFCs and Considerations draft for vSwitch
         benchmarks through the OPNFV Brahmaputra Release

    - Ramki: One of the interesting challenges I see here is the
         feature set. For example, a vSwitch like OVS is a bit of a
        moving target. Recently the rate limit per VLAN was
        introduced. I was wondering how best to capture all this.
        Are we going to say this particular version of the vSwitch
        is the scope of the draft? Are we thinking about a
        different approach? On a similar line, even for the vSwitch
        there are several implementations. One is in the kernel,
         there's also the DPDK. How do we capture that? Is it more
         of a freeze approach, where we say this is the scope of
         the draft?
   - Maryam: So far, the draft hasn't been focused on particular
         revisions or versions of virtual switches. We have tried to
        be virtual switch agnostic. We know that virtual switches
        can do X,Y,Z and we plan the test cases around that. Our
        vsperf framework is built as virtual switch agnostic.
   - Sarah: It's on us to take the same methodology and apply it
        to different implementations, rather than to have a
        methodology that is implementation specific. Would that not
        work?
   - Ramki: The features are moving into a certain direction. New
        features are being added. So, the question is: how are you
        going to capture that?  Also, there are some subtle
        differences between implementations. How do we capture
        those? Or maybe we just say this is outside the scope, and
        we only test the base feature set.
   - Sarah: I see your point, but I think it would be helpful to
        submit an example of what you're thinking about, Ramki. The
        authors could then maybe add that to the appendix as a
        discussion point.
   - Ramki: I am also OK with saying this is the base feature set
        in the scope. I'm just trying to make sure we are aware of
        other features.
   - Maryam: It sounds really good. If you could send us details,
        we will discuss it.
   - Al: Just for the clarification of the good discussion, let's
        get some comments on the list so we can have clear actions
        on the draft going forward.

   - Carsten Rossenhoevel: I have two questions. First one is
        about the duration of the pre-qualification. It says test
        at least for 72 hours. I know it's only for a very specific
        case, but I think it sets a dangerous precedent, because we
        see ever increasing requirements. I don't see a basis for
        this requirement, and it makes testing virtually
        impossible.
   - Sarah: I think Maryam covered this the second time this
        presentation came in. Since there are two questions, let's
        have Maryam answer the first one first.
   - Maryam: Regarding the 72 hours test, it was actually feedback
        from many of our customers. The actual test described is
        the soak test, a prolonged variation of one of the basic
        tests. This is what our customers actually do. This is
        where the recommendation came from. Of course, the test
        duration could be reduced to 24 hours for example, or the
         wording can be looser around it. But, I still think
        this is a useful test to have.
   - Carsten: I think customer feedback is irrelevant. What is
        relevant here is mathematical and statistical relevance and
        that would need to be calculated. In this industry we often
        are happy with a 95% confidence interval. We run
        reliability and resilience tests only 3 times, which is
        wildly off the mark of what is statistically relevant. At
        the same time, we send a trillion packets, for over 72
        hours and we get a confidence interval of 99.99% probably.
        So, I think this would need to be justified more clearly.
        What is the mathematical model? We could alternatively run
        the test 100 times to be on the safe side.
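
    [Sketch (Python, editor's illustration): the statistical point in
     numbers, a two-sided 95% confidence interval for the mean over n
     repetitions; sample values are illustrative.]

         from statistics import mean, stdev
         from scipy import stats

         def ci95(samples):
             # Student t interval, appropriate for small sample counts.
             n = len(samples)
             half = stats.t.ppf(0.975, n - 1) * stdev(samples) / n ** 0.5
             return mean(samples), half

         m, h = ci95([9.61, 9.72, 9.55])               # 3 repetitions
         print("%.2f +/- %.2f Gbps" % (m, h))          # wide interval
         m, h = ci95([9.61, 9.72, 9.55] * 33 + [9.6])  # 100 repetitions
         print("%.2f +/- %.2f Gbps" % (m, h))          # much tighter
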
   - Sarah: You mention here two points. One is the duration of
        the tests, and the other the number of repetitions.
        Regarding the duration, I understand where you're coming
        from, but when you're taking on something as new as a
        vSwitch, because we don't completely understand it, we want
        to understand how its performance degrades over time.
         Ignoring what customers want to do, I don't think it's
         going to take you very far in this particular topic. I
         think there's widespread agreement that they are going to
         do it anyway, and probably for very valid and prudent reasons.
   - Carsten: It's not about ignoring customers, in the beginning
        era of router testing, the early RFCs (e.g. RFC2544) were
        saying 2 min or 3 min, not 72 hours.  We can run these
        tests as long as we want. The question is, what does the
        standard require as a minimum. And this draft requires 72
        hours as a minimum.
    - Al: We do allow shorter durations. In fact, if I remember
         correctly the range is 6-72 hours. I resonate with this
        test because software errors and collisions in software
        systems are probabilistic. You may have to run all sorts of
        processes before you get the confluence of events that
        causes the performance problems that you would want to
         measure in the long term, because they are going to affect
        your operation in the long term. And that's the kind of
        thing this is after. It's not that you have to run every
        test here for 72 hours. But checking once, that's perfectly
        possible.
   - Carsten: My second question is: the core of the document
        doesn't really describe the actual tests. It just
         references vsperf. How is this going to work? How can I use
        this document in the end? It refers to a moving target.
    - Al: That's exactly right. It refers to a moving target
        because this is an open source project, and what we are
        planning to do is split it into two things: the LTP and the
        LTD. When we have those things stable, we are going to
        bring all of that material in and propose that as an RFC.
   - Carsten: So, the plan is to reference an ongoing work
        document?
   - Al: That's right. It's the summary of ongoing work. And it's
        necessary to bring that in, so we can have this connection
        between our traditional standards and the way open source
        works.
    - Steven: I think there may need to be some improvement in the
         wording of the section on the 72 hour soak test. The point
         of a soak test is not to measure the performance, not to
         measure the numerical value of throughput. The throughput
         is just a load to stress the system, to make sure it's
         stable over that period of time. And that has to do with
         the collisions and the memory leaks. And if you detect a
         variation in the throughput, you detect the existence of
         some other problem. Maybe some clarification in the
         wording will help.
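
    [Sketch (Python, editor's illustration): reading soak throughput
     as a stability signal, per the comment above; the tolerance and
     the sample readings are assumptions, not from the draft.]

         def soak_stable(samples, tolerance=0.05):
             # `samples`: throughput readings taken at intervals over
             # the soak period. Drift beyond `tolerance` of the first
             # reading signals leaks or instability, not a throughput
             # result in itself.
             baseline = samples[0]
             return all(abs(s - baseline) <= tolerance * baseline
                        for s in samples)

         print(soak_stable([9.6, 9.61, 9.58, 9.0]))  # False: degradation
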
   - Marius: I want to support the discussion. I think there
        should be some sort of justification for having 6-72 hours
         of soak test. Why are 6 hours needed? Why not 1-2 hours?
   - Sarah: Why don't we take this discussion to the list? We
        could spend the next hour talking about this.
   - Pierre Lynch: I want to address this as well. Is the goal of
        that 72 hours test really benchmarking? It's bug finding.
        Does it belong in a benchmarking discussion? The goal is
        not to find the performance, the goal is to find the bug.
   - Sarah: One hopes there is no bug, and there's proof therein.
        But, is it a valid benchmark? I think this should be taken
        to the list, because I suspect this is going to generate
        conversation for an hour.
   - Maryam: It would be great if we can start that discussion on
        the list. For me, it's also a performance stability test,
        which is important.
   - Al: As a final summary, a lot of our tests are measuring the
        performance of functions that we assume are operating
         correctly. As a consequence, when you get a performance
        result, you are also getting the result of the functional
        test. It's just that functional tests are beyond our scope.
        We don't do anything that's pass/fail here.

6. Benchmarking Performance Monitoring on a DUT
   Presenter: Sudhin Jacob
   Presentation: https://www.ietf.org/proceedings/96/slides/slides-96-bmwg-4.pdf
   Related Draft:
   https://tools.ietf.org/html/draft-jacpra-bmwg-pmtest-01

   - Al: Who's read this draft? I see two hands. Any comments?
   - Marius: I skimmed through the document. There's one thing I
         was wondering about: what do you mean by line rate?
   - Sudhin: If the interface capacity is 1Gbps, 80% of the line
        rate is 800 Mbps. We will update that wording.
    - Marius: I was trying to clarify that you weren't talking
         about throughput.
   - Sarah: I have many comments. I will send them on the mailing
        list.
   - Al: Do you have a big one you would like to discuss?
    - Sarah: There's some discussion to have about the soak
         testing. I don't know if it's only about core and memory
         leaks when we do a soak test, and I'm surprised,
         considering you are testing a core router, that the test
         is only 12 hours. I
        think you're also making an assumption on implementation
        for counter-wrapping as well. Under reliability, I also
        have some concerns here. I am not sure that this is as
        robust as I would have expected reliability to be. In most
        tier 1 service providers you're going to see a certain
        amount of SPFs happening, even per minute. I am not
        entirely sure that having a steady flow of traffic really
        buys you any value, because I don't know if it has that
        relation with the real world. Maybe it's as simple as
        modifying or fluctuating the traffic rate. I'll bring the
        rest of the comments to the list.
   - Sudhin: That's a good point. I'll update that. In the real
         world it wouldn't be a steady flow. It would be
         fluctuating. Is that what you mean?
   - Sarah: I would also modify the number of SPFs per second
         happening, because there is a difference in convergence as
        well, which often makes a difference in memory utilization,
        not necessarily the traffic throughput. And then bursting,
        when and how are you doing that.
   - Al: Who in the room is willing to review this draft and
        provide comments on the list?
   - Al: We have two reviewers: Marius Georgescu and Kostas
        Pentikousis. We would need more reviews before considering
        taking this on as a WG draft.

7. Benchmarking Methodology for EVPN
   Presenter: Sudhin Jacob
   Presentation: https://www.ietf.org/proceedings/96/slides/slides-96-bmwg-5.pdf
   https://tools.ietf.org/html/draft-kishjac-bmwg-evpntest-01

   - EVPN: BGP MPLS-based Ethernet VPNs (from RFC 7432)

   - Al: Where is the test equipment in these topologies?
   - Sudhin: Do you mean the tester? I missed that.
    - Carsten: Generally the test methodology says something like
         measure that these MACs are learned, but it doesn't say
         how. If you look at RFC2889, which admittedly is quite
         old, the requirement is to measure on the data plane. If a
         MAC is learned, RFC2889 says to verify on the data plane
         that there is no more flooding. Coming from a router
         vendor, I would guess that you would want to check if all
         the entries were learned. I would recommend either a
         reference to RFC2889 or describing exactly the measurement
         procedure on the data plane, not on the management plane.
   - Sudhin: Please reiterate these comments on the mailing list
        and we will think about that. There are two types of
        learning in EVPN: locally in the data plane and remote in
        the control plane.
   - Sarah: I resonate with his point. I would think it's valid to
        at least comment on.
   - Sudhin: Please mail this to the mailing list.
    - Boris Khasanov: Similar question: you mentioned 8k (8000
         entries). We need some explanation: why not 6k or 10k?
    - Sudhin: That's a good point. The 8k is based on the service
         requirements. Studying the services the vendor supports,
         we arrived at that limit. It's not a hard limit; it could
         also be 4k. The scaling of the service is the reason why
         we chose the 8k.
    - Sarah: You could either justify the 8k, which I think is
         fine, or specify an n, where the user fills in the n, and
         have some conversation around why we would pick the 8k.
   - Sudhin: That's a fantastic comment: let the user pick the n.
   - Pierre: For measuring reliability and scale, what are the
        performance metrics? Are we measuring performance?
   - Sudhin: I think we are. In reliability conditions the system
        should perform. The algorithm should not change because of
        these conditions.
   - Pierre: I get the test, but are we measuring performance? And
         in the scale as well. As you said, this should work, it
         should happen like this. So, is this the complete test
         working group or the benchmarking working group?
    - Al: I agree, this doesn't sound like a performance
        characteristic. It sounds more like a functional test.
        Let's look at this in detail.
   - Marius: To clarify if this is a pass/fail test or not, what
         is the output of your test? Is it a number?
    - Sudhin: It's not pass or fail, but the algorithm should work
        as expected. We should get the correct p.
   - Marius: I'm looking forward to the update. Currently, I don't
        understand it like that.
   - Sudhin: We will revisit that.

NEW/FIRST TIME:

8. Benchmarking Methodology for PBB-EVPN
   Presenter: Sudhin Jacob
   NEW draft
   Related Draft:
   https://tools.ietf.org/html/draft-kishjac-bmwg-pbbevpn-00

   - PBB-EVPN defined in RFC 7623
   - Al: The draft has a different focus, but it looks like the
        same tests.
   - Sudhin: It looks similar, but the parameters are different.
   - Marius: Since it's so similar to your previous document, why
        not have one document? You could have a section explaining
        the differences between the two. If we have a methodology
        for every protocol that is proposed in the IETF that's not
        very efficient.
   - Al: It would be a lot of overhead. I feel the same way. This
        could be easily combined with the previous document.
   - Sudhin: That's a good point. We could add a subsection in
         this document explaining the differences.
   - Marius: You could have two different sections in the same
        document for the two protocols.
   - Sarah: Take a look at the documents and try to see if they
        can be combined. If they cannot it is fine. But, I suspect
        you will find that they can.
   - Sudhin: We will try combining them in the next iteration.
    - Kostas: I would second Marius's proposal. I had a very quick
        look through the drafts, and I think it makes more sense.
        That may mean that you need to change the title. Since I
        volunteered for the review, should I wait until the merger
        happens?
   - Al: You volunteered for another one
        (https://tools.ietf.org/html/draft-jacpra-bmwg-pmtest-01).
        So, if you're volunteering for all three, you win the
        prize.
    - Kostas: I don't know, but it looks like there's a lot of
        overlap.
    - Sudhin: There is overlap since both protocols depend on the
        BGP payload and the scenarios are almost intersecting.
    - Kostas: I'll do the first review and maybe the merged one after.
   - Sarah: Who's read these drafts?
         No one at the meeting had read the drafts.
   - Sarah: Who's willing to read these drafts?
        Three BMWGers agreed to read the draft: Sarah, Marius and
        Giuseppe Fioccola.

9. Considerations for Benchmarking Network Virtualization
   Platforms
   Presenter: Jacob Rapp
   NEW Draft
   Related Draft: https://tools.ietf.org/html/draft-bmwg-nvp-00


   - Sarah: How many folks have read the draft?
         One person in the room had read the draft.

    - Marius: How do you think the repeatability problem can be
         solved?
   - Jacob: As long as you can run multiple iterations, it should
        be fairly consistent.
   - Marius: Since there's no specific traffic generator there,
        depending on the size of the data, you will have different
        results every time.
    - Jacob: If you clearly specify the variables that you're
        putting in, it should be fine.
   - Maciek Konstantynowicz: Do you treat your SUTs as white or
        black boxes?
   - Jacob: We can't treat it like a black box. We have to
        document what's inside the box. Configuration for example.
   - Maciek: Configuration but also operational data?
   - Al: It's ok to collect operational data, but it's not part of
        the benchmarks.
   - Maciek: We're almost in agreement, and I volunteer to review
        the draft.
   - Steven: I was confused by the text in Section 2, the platform
        description. Some wording improvement would help, I think.
   - Jacob: Please send that comment on the list.
   - Catalin Meirosu: Two quick comments. One is about the fact
        that this goes very close to benchmarking cloud
        environments, and for cloud environments there is a lot of
        open source out there that captures the conditions that
        you've mentioned. When it comes to repeatability, a lot of
        the testers like IXIA or Spirent, they would do Layer 7
        testing. In academia, I noticed that traffic traces are
        used for repeatability. I can comment more on the mailing
        list.
   - Jacob: Traffic traces would work as long as they are
        stateful.
    - Catalin: You can do TCP replay, for example.
    - Sarah: Taking this to the list is very important. Al started
         out today with single factor variation, which this clearly
         goes against, or certainly calls into question. And I
         don't think it's in itself bad. But we will have to take
         it to the list to address some of the discussion around
         that.
   - Al: If Scott Bradner were here, he would have jumped out of
        his chair 5 min ago. Let me take that point for him.
    - Ramki: You have all these capabilities, which are also
         applicable to NFV. I will send a link to the NFVRG as well.
   - Sarah: If it gets adopted, we will ask you to take it to
        NFVRG anyhow.
   - Al: My final comment, both as an author and co-chair, we want
        to be sure that this draft is clearly distinct from the
        other considerations draft, which takes into account not
        just VNFs, but the whole infrastructure. We need to figure
        out how to keep things distinct.

10. Revisiting Benchmarking Methodology for Interconnect Devices
     Presenter: Daniel Raumer <raumer@in.tum.de>
    This paper was also presented at ANRW on Saturday, July 16
    https://irtf.org/anrw/2016/anrw16-final12.pdf

    - Kostas: Thank you very much. This is great work. I would like
         to go back to that figure (histogram plot). How many
         people in the room have read Edward Tufte's "The Visual
         Display of Quantitative Information"? My favorite is box
         plots. They are easy to plot and pack a lot of
         information. I would recommend this sort of plot. Everyone
         that does a visual display of quantitative information
         should read that book.
   - Al: Quick commercial :)
   - Kostas: I never met the guy, but I think the book is highly
        recommended.
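
    [Sketch (Python/matplotlib, editor's illustration): the box plot
     Kostas recommends, for repeated latency samples; the data is
     randomly generated for the example.]

         import random
         import matplotlib.pyplot as plt

         # Illustrative latency samples (us) at three offered loads.
         runs = [[random.gauss(mu, 2) for _ in range(100)]
                 for mu in (20, 24, 35)]

         plt.boxplot(runs, labels=["10% load", "50% load", "90% load"])
         plt.ylabel("latency (us)")
         plt.show()
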
   - Al: The R programming language contains all this stuff and I
        haven't seen a box plot with whiskers in a long time.
   - Kostas: The second is R. If it's statistical analysis, don't
        use anything but R. I think the greater discussion here is
        do we actually as a group think that RFC2544 needs to be
        revisited?
   - Sarah: You are welcome to bring a draft.
   - Al: That's one of the reasons for inviting Daniel to give the
        talk here. This is an obvious next step. Thanks for
        speaking the unspoken.
   - Kostas: I would be interested in this kind of work.
    - Marius: I would support Kostas on the revisiting. I think
         some parts definitely need to be revisited. I tried to do
         that with the latency in my draft, following some
         discussion about summarization with Paul (Emmerich). His
         suggestion was multiple percentiles. Again, aren't those
         too many? We are trying to be concise in reporting. If
         you find some patterns and they are relevant, I think
         three numbers would be enough (one for central tendency,
         two for variation). I think that both box plots and
         histograms are useful for a fine-grained analysis. But, I
         think the main report should be more concise.
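
    [Sketch (Python, editor's illustration): the three-number summary
     Marius proposes, one central tendency and two variation numbers;
     the percentile choices (p5/p95) are an assumption.]

         import random
         from statistics import median, quantiles

         latency_us = [random.gauss(25, 3) for _ in range(1000)]

         p = quantiles(latency_us, n=100)   # percentiles 1..99
         summary = (median(latency_us), p[4], p[94])
         print("median %.1f us, p5 %.1f, p95 %.1f" % summary)
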
   - Al: I think most tools today have gone beyond the static,
        very minimal RFC2544 latency specification, which was what
        Scott could do in his lab 20 years ago.
   - Daniel: In the old RFC only 20 samples were mentioned. With
        that you couldn't even get a graph like this one.
   - Marius: The 20 iterations are a lower bound.
   - Ramki: Do the tests you did reflect any real world
        applications? What was really the objective of these tests?
   - Daniel: This was actually Open vSwitch.
   - Ramki: What kind of traffic patterns were used? Was it
        synthetic or more towards a real world application?
    - Daniel: These were really synthetic. We were not in the
         position to push a specific trace.
   - Ramki: The reason I'm bringing this up is, as you can see,
        the latency numbers are varying, but does it really matter
        to the end application? If you consider a network function,
         you know there is going to be variation. So the question
         probably doesn't matter at all. I would say bring an
         application that really matters into consideration.
   - Al: I think your key point is: Where does it really matter.
        But, you have to have a test to see where it really
        matters.
   - Ramki: I would say going back to question "did we run the
        right test?". Was the test crafted correctly?
   - Al: The problem with going to the real world is that the real
        world is different for everyone. So, we should have a set
        of tests that we all agree on and everybody can go on and
        expand that. I think what we're talking about here is a
        pattern sensitivity, CBR producing some unusual results,
         and what we need to do about our absolutely recommended
         set of tests, which do not necessarily look like a real
         world scenario.

11. Overview of work in Linux Foundation project FD.io CSIT
    Presenter: Maciek Konstantynowicz <mkonstan@cisco.com>
    More info on the project here: https://wiki.fd.io/view/CSIT

   - Ramki: Is the scope only Layer 2-4? Or is it beyond that?
   - Maciek: I apologize, I went quickly through that. It's L2, L2
        bridging and cross-connecting, it's got IPv4 and IPv6
        routing. RFC compliant bridging and routing stack. No
        control plane. There is also VXLAN GPE, LISP GPE and IPSec
        that we haven't tested yet on fd.io.
   - Ramki: So, basically switching and routing are covered.
   - Maciek: We also have stateless security and some folks
         developing a stateful firewall. So treat it like a packet
         processing system. It can be used as a VNF, it can be used
        as a virtual switch.
   - Ramki: To give a broader context, for example in NFV we have
        several types of network functions: stateful firewall, IDS.
        One of the strategies we have is bringing platform
        awareness: NUMA, CPU pinning, all of the L3 level cache
        partitioning. How do you see the integration with that?
    - Maciek: To be honest, the committee is going to decide how
         this is going to be used. It could be used as a virtual
         switch, virtual router or virtual appliance. The system is
         NUMA aware and it comes with the box. How it will be used,
         I think the users will really decide. Currently most of
         the code is stateless. There is some published material on
         the page and there will be more.
   - Al: I heard the word vector there. Vector means traffic
        patterns. Homogeneous vectors go through this very fast.
        Heterogeneous ones, not so much.
   - Maciek: Well, then you have an adaptive system, that can
         adapt to a variety of traffic patterns. And that's a good
         point for calling for extended benchmarking methodologies.
   - Al: Thank you everyone for staying over time. Please do some
        homework, read some drafts while at the beach :) Anything
        to add, Sarah?
   - Sarah: When you read drafts it's usually a mutual service.
        You read theirs, they will read yours. Please read.

LAST. AOB.