[{"author": "\u67f3\u7530 \u6dbc", "text": "<p>Hi, I'm attempting note-taking, any help would be welcome!</p>", "time": "2023-07-25T16:49:38Z"}, {"author": "Eve Schooler", "text": "<p>Thank you!</p>", "time": "2023-07-25T16:50:38Z"}, {"author": "David Oran", "text": "<p>Q: By \"supply chain\" do you mean where you get the various pieces of software/hardware from, or what logical topology your data flows traverse in achieving the computation you want to do?</p>", "time": "2023-07-25T16:54:22Z"}, {"author": "David Oran", "text": "<p>Maybe snarky, but I'll ask anyway: isn't <em>ALL</em> computing actually \"computing in the network\" by this taxonomy?</p>", "time": "2023-07-25T16:57:10Z"}, {"author": "David Oran", "text": "<p>wait to the end: I'm just writing things as i think of them</p>", "time": "2023-07-25T16:57:42Z"}, {"author": "David Oran", "text": "<p>What do you mean here by \"layer\" - not what conventionally people say? You seem to mean \"place in the distributed computational graph of your application\" not \"level of abstraction in composing networking stacks\".</p>", "time": "2023-07-25T17:02:24Z"}, {"author": "Eve Schooler", "text": "<p>I think of de-cloudization = edge-ification (the pendulum swinging back from centralization to decentralization/distributed)</p>", "time": "2023-07-25T17:02:26Z"}, {"author": "David Oran", "text": "<p>@eve: yes, except nearly all nontrivial applications running inside cloud data centers are massively distributed - for example, Google Adsense on-demand ad placement has 600+ independent distributed modules that can talk to each other for placing a single ad.</p>", "time": "2023-07-25T17:04:10Z"}, {"author": "Eve Schooler", "text": "<p>except they are centrally managed and highly homogenous</p>", "time": "2023-07-25T17:04:32Z"}, {"author": "David Oran", "text": "<p>centrally managed, yes, highly homogeneous, o.</p>", "time": "2023-07-25T17:04:52Z"}, {"author": "Eve Schooler", "text": "<p>the HW/SW on which they 
run</p>", "time": "2023-07-25T17:05:11Z"}, {"author": "David Oran", "text": "<p>0=no</p>", "time": "2023-07-25T17:05:15Z"}, {"author": "David Oran", "text": "<p>Not true anymore when you have purpose-built hardware (e.g. TPU complexes for AI, in-cage specialized secure storage for customer keys, etc.).</p>", "time": "2023-07-25T17:06:19Z"}, {"author": "David Oran", "text": "<p>There's tons of research on edge computing and how to make it work robustly and with good performance - if that's the main target for COINRG then ok, but we don't necessarily need an IRTF forum for that. I've advocated for more basic research focus that is independent of the deployment model - for example how do you compose the distributed computing graph when it has to be laid out over latency-sensitive, and possibly bandwidth-constrained computing resources.</p>", "time": "2023-07-25T17:12:06Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>@Dave Supply chain in how you supply network services :) Simple</p>", "time": "2023-07-25T17:18:52Z"}, {"author": "Eve Schooler", "text": "<p>Composition of the graphs is indeed a key challenge, given constraints/preferences/SLAs</p>", "time": "2023-07-25T17:18:56Z"}, {"author": "David Oran", "text": "<p>@MJM - ok, but this got confounded in my head with your somewhat strange use of the term \"layer\". I'm not sure what's changed with respect to this definition of \"supply chain\" since from the early days of networking you had your computer hardware vendor, your software vendor, your local LAN on campus, the Arpanet or NSFnet, the guys running a supercomputer in a national lab, etc. etc. 
It isn't clear that this kind of \"supply chain\" is any more complicated today, modulo the complexities of selecting, composing, and deploying open source software modules.</p>", "time": "2023-07-25T17:22:49Z"}, {"author": "Eve Schooler", "text": "<p>Q: So is it that we need an MQTT-type comms system for a more distributed setting?</p>", "time": "2023-07-25T17:27:13Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>This is a long discussion :) The \"supply chain\" people are into this - it is not a chain anymore, it is a \"network\" - and very much following network-like protocols. See \"supply chain networks\" - they also have intelligent nodes.</p>", "time": "2023-07-25T17:27:43Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>@Eve MQTT maybe not... more local?</p>", "time": "2023-07-25T17:28:09Z"}, {"author": "David Oran", "text": "<p>@MJM - let's take this offline - it seems you're talking about stuff running at layers 8 and 9 in your references to \"supply network\"?</p>", "time": "2023-07-25T17:31:51Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>Yes</p>", "time": "2023-07-25T17:32:24Z"}, {"author": "Eve Schooler", "text": "<p>(intra vs inter cluster comms differentiated from each other)</p>", "time": "2023-07-25T17:33:31Z"}, {"author": "David Oran", "text": "<p>@jorg my question is: what do the applications running under Oakestra trust Oakestra to do and not do, and conversely what does Oakestra trust the applications it's running to do or not do?</p>", "time": "2023-07-25T17:38:50Z"}, {"author": "Joerg Ott", "text": "<p><span class=\"user-mention\" data-user-id=\"44\">@David Oran</span> : the applications expect Oakestra to run them in a way that matches their demands (all resources available, possibly limitations of latency or so considered) and then to set up a networking environment in which the application components can talk to each other. The applications then need to do their stuff including all communication down to sending IP packets. 
But Oakestra offers a namespace for each application, so they can talk. Other than this, Oakestra does not interfere, but it monitors system load (and we are looking at allowing applications to report specific metrics that describe their degree of performance fulfillment)</p>", "time": "2023-07-25T17:45:15Z"}, {"author": "Eve Schooler", "text": "<p>Folks, please take the outstanding questions from the previous talk to the chat or the e-mail list</p>", "time": "2023-07-25T17:45:25Z"}, {"author": "David Oran", "text": "<p>@jorg - thanks. So applications are still responsible for their own failure detection, reconfiguration, recovery - Oakestra relieves them of the task/responsibility of figuring out where to instantiate themselves and auto-scale, arrange that fate-shared things aren't over-distributed, right?</p>", "time": "2023-07-25T17:48:42Z"}, {"author": "Joerg Ott", "text": "<p>Applications that fail get relaunched, node crashes lead to automated reconfiguration, so this is taken care of. Also, observing that performance goes down can lead to more instances being spawned. We do have load balancing functionality (such as simple RR but more sophisticated policies are possible)</p>", "time": "2023-07-25T17:51:28Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>This is very automotive-centric - does it apply elsewhere?</p>", "time": "2023-07-25T17:53:12Z"}, {"author": "Sharon Barkai", "text": "<p>@MJM the pipelines for sure, the foundation models are very generic and can apply to multiple physical world processing, DTs of observed entities, as well as training automation based on observed behavior per conditions.</p>", "time": "2023-07-25T18:02:34Z"}, {"author": "Sharon Barkai", "text": "<p>As for GPUs, vehicles look like a source of high-power and mostly idle compute which can be leveraged in the near future due to electrification. 
However, it seems likely, given the GenAI leapfrog, that GPUs will become much more pervasive in the processed environment (inference) even if training is in the cloud.</p>", "time": "2023-07-25T18:05:33Z"}, {"author": "Nitinder Mohan", "text": "<p><span class=\"user-mention\" data-user-id=\"44\">@David Oran</span> to add on to <span class=\"user-mention\" data-user-id=\"636\">@Joerg Ott</span> 's answer to failures, similar to other orchestration systems, Oakestra's view of the \"OK\" application performance is from the utilization/service description perspective. The orchestrator consistently monitors if the current application service utilization at the edge node is within the requirements defined by the developer in service description. If the application fails (say container stops running) or becomes over-utilized (consumes 100% of the specified CPU boundary), Oakestra will detect this and attempt to redeploy the affected service on another node asap. There can also be multiple replicas of the service and Oakestra has several load balancing policies for managing network traffic across them.</p>", "time": "2023-07-25T18:16:56Z"}, {"author": "Eve Schooler", "text": "<p>Q: I would be curious to see something like this implemented and compared with other approaches</p>", "time": "2023-07-25T18:17:34Z"}, {"author": "David Oran", "text": "<p>Seems to me telemetry is a function of any of the other pieces, not a plane.</p>", "time": "2023-07-25T18:19:27Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>I agree</p>", "time": "2023-07-25T18:20:07Z"}, {"author": "David Oran", "text": "<p>Why should I do telemetry for instrumenting compute separately from telemetry of connectivity/communication, from telemetry of services?</p>", "time": "2023-07-25T18:20:19Z"}, {"author": "David Oran", "text": "<p>And there's a recursion fee, since doing telemetry requires the use of compute, connectivity, and service functions.</p>", "time": "2023-07-25T18:20:47Z"}, {"author": "Nitinder Mohan", 
"text": "<p>Note that Oakestra cannot detect \"internal\" application operation -- say frames being dropped due to GPU contention or bad hardware optimization or large service queue buildups-- as it wont show up in hardware level metrics that it monitors. We are currently working on application-aware orchestration where applications may choose to report internal service metrics with the orchestration control plane which can take that into account for further decision making</p>", "time": "2023-07-25T18:21:03Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>Can you explain what is the link to quantum?</p>", "time": "2023-07-25T18:21:07Z"}, {"author": "Nitinder Mohan", "text": "<p>Sorry for being off-topic to current discussion :)</p>", "time": "2023-07-25T18:21:24Z"}, {"author": "Marie-Jose Montpetit", "text": "<p>np</p>", "time": "2023-07-25T18:22:58Z"}, {"author": "David Oran", "text": "<p>@Pascal: i think you mean \"accounting\" whenyoiu say \"billing\"</p>", "time": "2023-07-25T18:23:47Z"}, {"author": "David Oran", "text": "<p>they aren't the same</p>", "time": "2023-07-25T18:24:11Z"}, {"author": "David Oran", "text": "<p>you can bill without accounting, and vice versa</p>", "time": "2023-07-25T18:24:28Z"}, {"author": "Greg Schumacher", "text": "<p>It could also focus merely on charging events and leave it to other infrastructure for accounting and billing - this is the 3GPP model.</p>", "time": "2023-07-25T18:30:57Z"}]