Content provided by Asim Hussain and Green Software Foundation. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Asim Hussain and Green Software Foundation or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

The Week in Green Software: Green Kernels

In this episode of TWiGS we delve into the intricate world of measuring software energy consumption, a topic vital for reducing our carbon footprint. Despite the strides in greening software, knowing how much energy software consumes remains a challenging puzzle, especially in the cloud computing era. Joining host Chris Adams are guests, Aditya Manglik and Hongyu Hè, graduate students from ETH Zurich in Switzerland. With their expertise in improving energy efficiency in systems, particularly operating systems, microarchitecture, and machine learning, we embark on a captivating journey to understand why quantifying software energy usage is intricate and what innovative solutions are emerging. Stay tuned as we amplify the geek factor to 11 and uncover the complexities of this critical field.
TRANSCRIPT BELOW:
Aditya Manglik: At the end of the day, what we want to tell people is, okay, computing is great, but we have to be sustainable. And right now, data centers consume 3% of all global electricity. This number is only going to grow, right? Especially after COVID, we have had a massive increase in digitalization, and now with the large language models coming in, like ChatGPT, it's going to grow exponentially.
So we have to be sustainable, and the first step to being sustainable about energy use is to understand where is the energy going.
Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software. I'm your host, Chris Adams. Welcome back to The Week in Green Software on Environment Variables, where we bring you the latest news and updates from the world of sustainable software development. I'm your host, Chris Adams. When we talk about greening software, a lot of the time we talk about how much energy we use, because even in 2023, more than half of the electricity we use globally is still generated by burning fossil fuels. And we've spoken before in other episodes about how you can make the electricity you use greener, but sometimes you just need to be able to use less electricity in the first place. And to do that, it helps to know how much energy software is using in the first place. This sounds simple, right? In a world of cloud computing, this turns out to be surprisingly hard, and today we're turning up the geek factor all the way to 11 to figure out why this is hard and what the state of the art looks like. Helping with this journey today, we have two special guests from ETH Zurich in Switzerland, whose work we featured in earlier episodes, and we'll see how far we can get in the time we have today. So with us today, we have Aditya. Hey Aditya!
Aditya Manglik: Hi Chris, please feel free to call me Adi. It's a pleasure to be here on this podcast. And, yeah, I'm a longtime listener of this podcast, so it's very exciting to be here. I'm a graduate student at ETH, where I work on improving the energy efficiency of systems, especially operating systems and microarchitecture.
And I previously worked on building a very nice, very complex energy attribution system in Linux as a Google Summer of Code student with the GNOME Foundation.
Chris Adams: Cool. Thank you, Adi. And in addition, we have Hongyu also. Hongyu, I'll give the floor for you to introduce yourself as well.
Hongyu Hè : Yeah, thanks. Thanks very much for having me on, Chris. And thanks for inviting me, Adi. So yeah, I'm also a graduate student in computer science at ETH. My research includes both software and hardware. And I'm currently interning at Apple, working on machine learning research.
Chris Adams: Cool. Thank you, Hongyu. So for those who are interested, we've featured the work from both of these researchers. In the last episode, we spoke a little bit about one of the papers Hongyu was a contributor on at the Hot Carbon Conference. And we'll share a link to the talk presented there.
And we've also shared a link to Adi's talk at the Linux Foundation Energy Summit in Paris earlier on in June. If you're new to this episode, to this podcast, my name is Chris, I'm the Executive Director of the Green Web Foundation, and I'm the Chair of the Policy Working Group at the Green Software Foundation, and as a final reminder, we're going to cover a fair few papers and links and resources, and what we'll do is we'll add all of these to the show notes so that you can do your own research later on as you run through this. All right then, I think we're all sitting comfortably, so shall we begin, fellas?
Aditya Manglik: Yeah, I look forward to it.
Chris Adams: All right. Okay. Adi, I think I might start with you first. We've spoken a little bit about tracking energy consumption and why it's an important thing. Maybe you could just give a bit of a kind of overview about why this is important, what the state of play is in the different systems, because we know that computers run on, say, Linux. Lots and lots of machines run on Linux, but we also know computers use Windows and macOS. Maybe you could provide a bit of background, then we could talk about what the options are for people using these systems.
Aditya Manglik: Absolutely. I've been working on this problem for almost five to six years now, and it's an absolute pleasure to be talking about it. Well, I often like to say that you cannot improve what you cannot measure, and that is where the problem starts. We don't know how to measure the energy consumption of our systems.
For example, if I ask you, how much energy does WhatsApp use? Or when you send a WhatsApp text to your friend, how many CO2 emissions did that message take? Can you give me an answer? No, that's what makes me so excited to get out of the bed every morning and then try to figure out, okay, how much energy is WhatsApp using?
So it turns out that people at Microsoft, Apple, Google also care about this and they really tried to solve this problem and Microsoft has this very interesting kernel system called the Windows Energy Estimation Engine. It is running on all Windows devices. Android has a very interesting service called PowerMetrics.
You can think of it like a daemon. A daemon is a magical service that runs in the background of your system that does all the stuff for you and you don't know that it exists. PowerMetrics on macOS also collects all possible data about the energy consumption of your applications. Now, what about Linux, right, we are, we love open source, and Linux is a very important operating system, right, all servers in this world are, majority of them are running Linux, but we don't know how to measure the energy consumption of these servers, especially from the software, right, people often think to measure energy you need these hardware devices, or you need these electrical engineers to come in and plug monitors and then tell you, oh, this consumed 5,000 joules.
No, we want to solve it using the tools that we have, and I think we can solve it. I believe we can solve it. And that's what I'm working on.
Chris Adams: Okay, so you just mentioned two things, first of all. So one, first of all, you said that if you're using a Windows machine, there's existing tools that you can tap into and get readings from. And if you're using an Apple machine, you've got access to those kind of figures. But it's a somewhat murkier situation with Linux right now, there isn't a kind of common tool that is actually universally used.
That's one of the key takeaways I'm getting. And Hongyu, I believe this is what you've been finding as well, and you've been looking into some of this as well.
Hongyu Hè : Yeah, exactly. I think there are tools, but there is no common thing that everyone uses. And the standards of those tools vary quite a lot. That's also, as you said, one of the reasons why we conducted the research in the first place.
Chris Adams: Okay, alright, so if I understand this, given that the majority of servers are now running Linux, basically not having some tools for the most common kind of operating system is one of the things which makes it difficult to come up with some of these numbers. That's what I think I'm understanding from here. We're naming this episode Green Kernels, and I figure it might be worth actually just talking about this idea, because this sounds like a relatively low level thing that's built into systems themselves, actually. Adi, could you maybe talk a little bit about this part here? Because I think that you've spent a bunch of time looking at this low level part of an operating system like Linux, like this kernel part. And before we dive into that, maybe you could actually explain what a kernel is and why that might be somewhere that you actually track some of this. 'cause not everyone may know what a kernel is when you're thinking about computers in the first place.
Aditya Manglik: Chris, first of all, I love the name Green Kernels, like when you talked about, to me, talked about this podcast and when you named it Green Kernels, I was like, yes, I came to the right place. Okay. And yeah, what are the kernels? I think our audience is really smart and even smart people sometimes just need to quickly jog their memories.
So what we're going to do is quickly jog their memories. A kernel is the core of an operating system. What does that mean? Okay, so for us a computer is just a computer on which we log in and do something, but what we do is an application, we use Microsoft Word, PowerPoint, Excel, these are all applications, and Windows that is running these things is the operating system, and operating system comprises broadly two parts, a kernel, which is the core, that you don't see, which handles everything for you and the user space, which is what you interact with.
So you know that start button, that you click that is part of the user space and that start button goes behind the curtains and does some interesting stuff that comes back to you and yeah, you see the effect of your action. So the kernel is the primary response, primary entity in any operating system that is responsible for managing the hardware, the applications, the processes.
Chris Adams: Alright, so this kernel part is the thing that. So far, for Windows machines and for Apple machines, there's something in there, but for Linux machines, you don't have that same ability to read information yet. And this is some of the work that you've been doing to look into to basically make some of that readable.
Is that the case?
Aditya Manglik: That's a great question, Chris. So, energy is typically thought of as an electrical engineering topic. And it's difficult and it's fancy. No, people typically don't include energy monitors in systems. That is the fundamental reason why we are trying to do this. You can measure the performance of your programs.
You can measure how much time it took and this is possible because your system tells you how much time your program took. Your CPU tells you how many clock cycles your program executed for. But if you ask it, okay, how much power it consumed or how much energy it consumed, I think things fall apart. And that's why you need to do a lot of modeling and build entire systems to figure out this information.
Now, Linux also has this information, but the models, right, so you can have all the data in the world, but until you know how to make sense of that data, it is useless, right, and that is what the model does for you, and these models don't exist in Linux. They exist in Windows, they exist in macOS, Android, iOS, but I'm not so sure if they exist or if they're good enough for Linux.
That's what I think, but if you know about it, let me know. I would be very happy to know.
Chris Adams: All right, and when you're talking about a model here, maybe you could just elaborate on that, because I'm not sure I quite follow when you talk about something being a model like this. If I'm, let's say, you mentioned the example of WhatsApp, for example. How would I go about figuring out how much energy is actually attributable to, say, WhatsApp, for example, on a computer or something like that?
Maybe if we were to look at that example there, then we can say, okay, we could talk about some of that, then we could see how that becomes more difficult if you're thinking about things like cloud computing, because, as I understand it, the assumptions you might make when thinking about a desktop computer might not be the same as working with a cluster of computers, for example, and I believe this is some of the work that, Hongyu, you've done most recently and spoke about at HotCarbon.
Aditya Manglik: That's another great question, Chris. Okay, let me quickly dive into it. Imagine this as a car, okay? You are driving a car. Now, you decide where you're going to go, but it's the engine that burns the fuel, right? You're not burning the fuel; you're simply deciding, oh, I want to drive to London, but your engine is what's going to consume the fuel. Now, when you want to send a message on WhatsApp, what you do is you write out a message and you hit the send button, and behind the scenes what the kernel does, or what is actually happening, is the kernel converts your message to a bunch of packets. And it sends these packets over the network, and along this way of converting this message into packets and sending it, you are using your device's CPU, memory, storage, network, screen, maybe the Wi-Fi interface, right?
So, you see all of these hardware devices are immediately turned on as soon as you hit that send button, and that's where the energy consumption comes from. And what would a model look like to build such a model? What you would do is you would take in the amount of power of the CPU and multiply it by the time that the CPU was running.
Similarly, you take the amount of power for the, for the network interface, that would be the wifi card, and you multiply it by the time, by the amount of time that it was running, and then you accumulate all of these data points. And that gives you the energy of sending a WhatsApp message on your device.
Okay, we're not even talking about the energy that the servers consume, the energy that your friend's device consumes.
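The model Adi describes, power multiplied by active time for each component and then summed, can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the `ComponentUsage` type, the function name, and every number below are made-up assumptions, not measurements from the episode or from any real device.

```python
from dataclasses import dataclass


@dataclass
class ComponentUsage:
    """One hardware component's contribution (all values are illustrative)."""
    name: str
    avg_power_watts: float  # assumed average power draw while active
    active_seconds: float   # assumed time the component was active


def estimate_energy_joules(usages):
    """Sum power x time over all components (1 watt for 1 second = 1 joule)."""
    return sum(u.avg_power_watts * u.active_seconds for u in usages)


# Hypothetical figures for sending one message on one device.
send_message = [
    ComponentUsage("cpu", avg_power_watts=2.0, active_seconds=0.05),
    ComponentUsage("wifi", avg_power_watts=0.8, active_seconds=0.10),
    ComponentUsage("screen", avg_power_watts=1.5, active_seconds=1.0),
]

print(f"Estimated energy: {estimate_energy_joules(send_message):.2f} J")
```

As Adi notes, a sum like this covers only the sender's device; servers and the recipient's device would need their own terms.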
Chris Adams: Is that giving some pointers? And so the idea would be that if you can't get the figures from each of these pieces of equipment themselves, like this, like a CPU, like a screen or something like that, you might use a model to come up with some numbers for that. Okay.
Aditya Manglik: Yes.
Chris Adams: All right. And that is based on the assumption when you're looking at a single machine, using a single program.
Now, Hongyu, I think it's somewhat different when those assumptions might not always hold true. So maybe, Hongyu, you could explain how this gets a bit more complicated in the cloud, or some of the parts there, perhaps.
Hongyu Hè : Thanks for the question, Chris. Yeah, I think Adi brought up a really good point about hardware and the model. One thing I'd like to add on at this point is the key reason why we need a model. Adi has introduced the concept of kernel. So kernel is basically a cushion, if you like, between users and underlying hardware. And the hardware is ugly, because they have different interfaces, it's really hard to interact with directly, so there are multiple challenges, and one crucial challenge that we have been facing is the lack of support from hardware, so if the energy attribution is there, so if I'm using WhatsApp and the hardware is telling me, okay, WhatsApp is using this amount of energy.
Then why doesn't the kernel, the cushion, report this to me, right? That's the key point. So here we don't have the crucial hardware support that we need. That's why we need the model to collect, if you like, the proxy data, like utilization and the time you're using, to calculate the amount of energy from the user side instead of relying solely on the hardware. And speaking of cloud computing, I think Adi also mentioned a great point about multiple hardware, different devices, that really makes life more difficult. Because we need to take into account different kinds of devices, especially in the cloud, we have heterogeneous devices, CPU, DRAM, GPU, etc. Yeah, it makes things more challenging and much harder to calculate the energy consumption because of the distributed nature as well.
Chris Adams: So if I understand what you've been saying here, so there's one issue, which is a case of attributing the energy to a particular program, for example. And then one of the other issues is basically the fact that across all the different kinds of computers, not every single device that draws power will have a consistent way of reporting how much power it's drawing. So if I understand it, there are some tools that do this. So lots of Intel processors have a thing called Running Average Power Limit, for example. But it may be the case that, if we were to step away from our WhatsApp example and say, I'm doing a really big machine learning job using a bunch of very powerful graphics cards, they might not expose the same information about how much power they're using.
So you would need to either model that or you would need to have some other way of getting that information back. Is that the case?
Hongyu Hè : Yeah, that's a very accurate summary. Thanks, Chris.
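For the Running Average Power Limit (RAPL) counters Chris mentions, Linux exposes a cumulative energy counter through the powercap sysfs interface. The sketch below assumes an Intel machine where `/sys/class/powercap/intel-rapl:0` exists and is readable (on many systems this needs root); the function names are hypothetical, and other devices such as GPUs are not covered by this counter.

```python
import time
from pathlib import Path

# Assumption: package-0 RAPL domain as exposed by the Linux powercap interface.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(path):
    """Read an integer microjoule value from a sysfs file."""
    return int(Path(path).read_text().strip())


def energy_delta_uj(before, after, max_range):
    """Difference between two counter readings, allowing for one wraparound."""
    if after >= before:
        return after - before
    # The counter wrapped past max_range between the two readings.
    return (max_range - before) + after


def average_power_watts(domain=RAPL_DOMAIN, interval_s=1.0):
    """Sample the cumulative energy counter twice and derive average power."""
    domain = Path(domain)
    max_range = read_uj(domain / "max_energy_range_uj")
    before = read_uj(domain / "energy_uj")
    time.sleep(interval_s)
    after = read_uj(domain / "energy_uj")
    joules = energy_delta_uj(before, after, max_range) / 1e6
    return joules / interval_s


if __name__ == "__main__":
    try:
        print(f"Average package power: {average_power_watts():.2f} W")
    except OSError as exc:
        print(f"RAPL counter not readable on this machine: {exc}")
```

This reports energy for a whole CPU package, not for a single program, which is why the attribution models discussed in this episode are still needed.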
Chris Adams: Okay. All right. So that gives us some pointers here. And I'll just ask you one thing about this as well, because this is something you touched on in your paper. The example that Adi gave was being in a single computer, where you can be relatively confident that the hard drive is attached to the same computer and the screen is attached to the same computer, like a laptop, everything's in one place. This assumption might not be true when you're looking at cloud computing. Again, I understand it's a little bit more complicated. Is that the case?
Hongyu Hè : Yes, indeed. Yeah, that's a great question. So in a cloud, for example, computing resources like CPU and memory that we've been talking about are increasingly shared among many tenants or users, or, you know, for example, an organization like a university or company is using a cluster of servers. And this really makes the attributing of energy consumption really hard. And also this is quite a sensitive topic as well, because we don't want to point fingers arbitrarily without a very precise model there.
Chris Adams: All right, Adi, I assume this is similar to some of the work that you've been finding as well, because I understand your research has been focusing on the desktop part more than the cloud computing part, right? That's where some of your research has been, or have you been looking at the wider, somewhat more wider than that, for example?
Aditya Manglik: I've been focusing on the desktop for now, but I agree with Hongyu that it's tricky to correctly point fingers at people for consuming the energy that they're using. Yeah, so at the end of the day, what we want to tell people is, okay, computing is great, but we have to be sustainable. And right now, data centers consume 3% of all global electricity.
This number is only going to grow, right? Especially after COVID, we have had a massive increase in digitalization, and now with the large language models coming in, like ChatGPT, it's going to grow exponentially. So we have to be sustainable, and the first step to being sustainable about energy use is to understand where is the energy going.
And, yeah, this problem becomes more tricky because with the growth of cloud, you don't know who exactly is consuming how much. I'm very curious, and I keep talking to Hongyu about his work. He's doing fantastic work in this direction. Yeah, let's just say that we are both very curious about this.
Chris Adams: Okay, so if I understand it, it's a bit easier to get some of the numbers from a computer you have yourself. Right, when it's on your own computer, but, because increasingly we're moving computing workloads away from just the desktop into a kind of wider set, it ends up going from other places. So maybe you could actually talk a little bit, let's say that you do want to actually start measuring this, or you do want to start understanding what role you could play, or how you're able to at least measure this so you can start optimising it. Let's say you're working with servers right now and you're using a bunch of Linux computers. What are your options right now? So Adi, I'll start with you actually. I'm on a Linux machine, it's just one machine, and I want to understand the environmental impact of a particular service, or a machine learning job, or any kind of thing I'm about to do. Maybe you could just talk to me about what my options are right now, for the most part.
Aditya Manglik: Yeah, sure. There are a bunch of tools that I know about. The first tool that comes to my mind is something that I've looked at quite some time back, but it's a tool called PowerTOP. Just like you have the top utility in Linux based systems, this tool is called PowerTOP, and I think it used to be supported by Intel, and what this tool does is it tells you how much power each process is consuming on your system at any given point of time.
Now, sometimes those numbers are a little shaky. But it does a decent job. Post PowerTOP, I came to know about this interesting tool called Scaphandre. Scaphandre actually goes in and gets you the energy consumption. So Scaphandre monitors, I think Scaphandre has built some high level models for taking in all the information that we talked about in the earlier questions and actually calculating the energy consumption for a process.
But the problem is that we assume that all of this is okay, that is, if you talk about a desktop machine, it is not virtualized. We assume that all the numbers that you see can be accounted to a single entity. Whereas if we go to the cloud, you have multiple entities running on the same hardware.
That's the fundamental premise of cloud, right? You want to reduce hardware costs by sharing the workloads. And that's where things get murky because we simply don't know how to separate out the energy for each entity. I think Hongyu would be able to shed more light on this.
Hongyu Hè : Yeah. Thanks, Adi. I think those are really great points, especially as you mentioned a tool called Scaphandre. I'm not sure if I'm pronouncing it correctly, but yeah, that was one of the baselines of our paper. And indeed, as Chris has mentioned, there are tools available on Linux; it's not that there is nothing people can use. There are tools, for example, PowerTOP. But the models they have are coarse-grained, meaning that, yeah, they are not computing a fine-grained energy attribution per user, per thread even, and we can talk about why that's important later. But in the cloud, as Adi has said, virtualization is a crucial technique, if not the most important technique, that enables users to share resources, but for energy attribution, actually, it's a key enemy, I have to say, because in order to get accurate energy attribution, we need to get access to hardware counters that tell us the runtime statistics we need for our model to calculate the energy attribution. And that, yeah, as Adi said, makes things much trickier.
Chris Adams: So if I can just take a step back for a second. So we spoke about, you've got a machine running, and, a machine will be running a series of, we might call them programs, but you might refer to them as processes, and within a given process, there might be a series of threads that's running, that kind of granularity is quite difficult, so if it's just my own machine, and it's just me using a computer, then you can attribute all of the figures to me, essentially, but when there's multiple processes or multiple programs for multiple people, working out who to share the kind of responsibility for the emissions, that's the difficult, that's the part that gets more complicated.
Hongyu Hè : Exactly, yeah, that's a fair summary. Thank you, Chris.
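To make the sharing problem concrete, here is a deliberately coarse sketch that splits a machine-level energy reading across processes in proportion to their CPU time. It is an assumption-laden illustration, not the method of any tool or paper named in this episode: the function name and the numbers are hypothetical, and it ignores memory, I/O, idle power, and interference between co-located workloads.

```python
def attribute_energy(total_joules, cpu_seconds_by_process):
    """Split measured energy across processes by their share of CPU time.

    Coarse-grained on purpose: CPU time is the only signal used.
    """
    total_cpu = sum(cpu_seconds_by_process.values())
    if total_cpu == 0:
        # Nothing ran on the CPU, so there is nothing to attribute.
        return {name: 0.0 for name in cpu_seconds_by_process}
    return {
        name: total_joules * seconds / total_cpu
        for name, seconds in cpu_seconds_by_process.items()
    }


# Hypothetical interval: 100 J measured for the machine, two tenants sharing it.
shares = attribute_energy(100.0, {"tenant_a": 3.0, "tenant_b": 1.0})
print(shares)
```

A split like this always adds up to the measured total, but whether each tenant's share is fair depends on everything the model leaves out, which is the granularity question raised in the conversation.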
Chris Adams: Okay, we were talking about some of the tools available, so, and Adi, you were talking about Scaphandre as one of the tools which has become quite popular for this, for tracking some of this, but I understand, Hongyu, some of the research you did was, you've been using some other tools to help address some of the problems that you've come up against this when you're looking at basically working in a cloud like environment, for example, where you don't have absolute access to the computer yourself, for example. This was my understanding of some of the work that you're presenting at Hot Carbon, correct?
Hongyu Hè : Yes and no. Yes, because we are looking at how to accurately attribute energy in a multi-tenant environment. And no, because attributing energy consumption in a virtualized environment is still an open question and we haven't solved it yet. And it will be very interesting to see future solutions to that. But indeed, we've compared with multiple tools like Scaphandre, and also the famous Cobbler, for example, but we explicitly compared with those tools that target non-virtualized environments, because in a virtualized environment, I think it's a fair game and no one knows exactly the ground truth. What we found is that existing tools are too coarse-grained, meaning that when we use them to measure the energy of your application, for example, they will also mix in the consumption of other applications that run on the same server as your application does, which is a very common scenario. And we found that this could really lead to about 50% overestimation and over 90% underestimation.
Yes, in our paper, the main objective is really to measure the energy consumption of your application and only your application as precisely as possible and exclude the consumption of other applications.
Chris Adams: I can't help asking, when we start talking about these tools, is there an overhead from measuring your own footprint when you're trying to do this? Because as I understand it, this is one thing that has come up a few times, is that, for example, Scaphandre is written in Rust, so it's designed to be a very small, fast, lightweight program, which has some overhead, but I understand that there is going to be an overhead from tracking some of this in the first place, correct?
Hongyu Hè : Yes, that's a really great point. Thanks for bringing it up, Chris. Indeed, as you said, Scaphandre is written in Rust, and Rust is really an efficient language compared to, for example, Python, and the main goal of energy attribution is to really have precise knowledge of the energy efficiency of an application so that we can improve and optimize our code accordingly. But as you said, yes indeed, there's an inherent trade-off between the preciseness, or how fine-grained our model is, and the cost, right, in terms of both performance and energy. And so our model takes into account, for example, the underlying hardware, and collects more fine-grained runtime statistics.
But indeed, the overhead could be larger, and to mitigate the overhead we use conditional probability to do reasonable estimation whenever applicable, instead of trying to capture every single event per millisecond, so to speak, which would be really costly. This part is a bit, you know, intricate, and we have a more detailed mathematical formulation available in our paper.
But yeah, that's the high level idea.
Chris Adams: Okay, if I understand what you've been saying so far, so there's one option, which is to use like a fast programming language, which moves quite quickly, or there's another approach, which is to take a kind of sampling approach so that you are not having to, if you are using something which is maybe a little bit slower, like Python, you don't read quite so much.
And another option is to basically use something which is even closer to the metal, like in kind of the kernel space rather than in user space. And Adi, this is what I think you were talking about when you were talking about kernels. Is that, is that the case?
Aditya Manglik: Yes, that is the case. I think there's a very interesting data point that I read in some blog by Microsoft and what they told is basically, so there's three ways to measure energy. First is that your hardware directly tells you that, okay, I used X amount of joules and that would be a 98% accurate number.
It's not, it's still not 100%, right? Because of thermodynamics, but you still have 98% accuracy that, okay, this is the energy that this particular hardware device consumed. The next best step after that would be a kernel level measurement and a kernel level measurement would be, if I remember correctly, they pointed as 85% accurate.
And that is why it would be great to have something in the kernel, and that is why macOS and Windows put these systems in the kernel to monitor the energy consumption. And finally, if you have something from the user space, now, it's not that accurate simply because it has visibility into a very small subset of the information that you need to get high enough accuracy, and I completely agree with your point, that the more accurate you want to be, the more measurements you make, the more energy you are going to consume, right? So it's like a catch-22 situation. I want to calculate something, but in the process of calculating it, I'm increasing the load on the system, and by increasing the load on the system, I am increasing the energy consumption.
You need to find a good balance between hardware- and software-based measurement mechanisms.
Hongyu Hè : I think Adi mentioned a really great point, so I think the trade off is not necessarily in the programming language itself, but it lies in the model itself. So as Adi has said, the more fine grained your model is, the more costly it's likely to be. And I think we really need to strike a balance between how detailed you want your measurements to be and yeah, the cost it comes with it.
Chris Adams: I see, okay, and maybe this is a chance for us to zoom out a bit because, as I'm aware, one of the projects that the Green Software Foundation is currently involved with is this project called the Real Time Carbon Standard, the idea being to create a standard way to report these kinds of figures. This is a project led by Adrian Cockcroft, who is a former VP of Sustainability and Cloud at Amazon and has basically a 20-plus-year background working in this field. As I understand it, they're settling on one tool called Kepler, which ties into kernels to provide some of these numbers. But even then there is an ongoing discussion about how you report numbers that are actionable, that developers or designers can use, without disclosing so much information that it becomes a possible source of attack, like a side-channel attack, for example, and also about what kind of resolution is necessary. Now, as I understand it, one of the things that people are pushing for there is minute-level resolution rather than millisecond-level resolution.
So at cloud level, that would already be way further than what we have right now, and it might, in theory, give you enough to get an idea of what kind of impact choosing to run, say, a computing job in one place might have compared to another, or at least give you something to optimize for carbon at that point. This idea of actually exposing the energy being used at this kind of level, I think there's a term that was mentioned in one of your papers, Energy Aware Computing or Energy Aware Cloud Computing? I'll ask you a little bit about this because I know that this was something I had to take away from you, but Adi, I'll come to you on this afterwards, because I think this is something that you've been speaking about at the LF Energy Core Forum as well. So maybe you could explain what an energy-aware cloud environment might actually be, Hongyu.
Hongyu Hè: Yeah. As you mentioned, there's a great tool called Kepler, and I think these kinds of tools are very instrumental in terms of what kind of information they can give to both the users and the cloud operators. And by Energy Aware, or even Energy Intelligence, which is another level, we mean that we can base our decisions on the energy statistics we collect, for example using Kepler or other energy tools, and optimize not only for performance but also for energy efficiency. And the reason for that is that data centers themselves, or even networking, have huge potential: they have great energy flexibility, and we can use this kind of elasticity to do great things. For example, using data centers as energy storage, or as a power bank for the smart grid. I think that's one of the ideas, but there are definitely a huge number of challenges we are facing in order to achieve this kind of energy-aware or energy-intelligent cloud.
Chris Adams: All right, that feels like it's going in a somewhat different direction. But basically, in order for any of that to be possible, you can't be driving blind if you want to have this kind of awareness of what the grid's doing, right? That's one of the ideas behind that. Okay, so maybe we should touch on why we are doing this in the first place. Because we spoke a bit earlier about how, yes, energy is coming from burning fossil fuels. We're not going to entirely transition off fossil fuels tomorrow, so as long as we're burning fossil fuels to provide power, there are going to be carbon emissions associated with this.
Adi, I'd just like to speak to you about why you got involved in this, why you got excited about this in the first place. Because there must have been some process before you decided to present at, like, an energy conference and talk about personal computing in the first place.
I was quite surprised to see it, but I was very pleased to see someone actually talk about this and talk about making some of this measurable.
Aditya Manglik: Chris, it's a personal story. It goes a long way back to when I was an undergrad. In, I think, my junior year, I had a laptop, which was not the best, and my battery had started to die out. I had exams to prepare for and my battery was acting up, and I could not figure it out: I just charged it in the morning, why is it dead in 30 minutes? So as a very simple-minded engineering student, my mind immediately went to the problem: okay, the battery is working fine, so what is consuming energy? Let me kill off the applications that are consuming energy. And that's how I got into the question: okay, I need to figure out which applications are consuming energy in order to kill the right ones, right?
And that's where the entire journey started. I could not figure it out. And then it grew on me: oh, how do we figure this out? Because if we can figure this out, we can do a lot more very interesting things. So for instance, I think Hongyu mentioned really great points about energy-aware scheduling in data centers, and I see a lot of effort from the hyperscalers to schedule workloads when renewable energy is available.
So when you talk about solar or wind energy, one of the key characteristics of these sources is that they're not available 24/7. They're available in abundance at a fixed point in time, and then they fluctuate a bit. So what you would want to do is maximize the utilization of these green sources when they are available. And if you can schedule your workloads at the right point in time, you can really decrease your carbon emissions. You can really decrease your utilization of fossil fuels while also maintaining your service level agreements with your customers. And that's a win-win situation for everyone.
You see how this simple problem of not being able to find out the energy consuming applications on my system turned into trying to save the world by reducing the energy consumption of data centers? I don't know. Yeah, so it's been a fascinating journey and I would love to keep going on this. But yeah, thanks for the question.
Chris Adams: Alright. Okay. So I think what you might be referring to here is this notion of carbon intensity changing, depending on how abundant renewables are on the grid, for example. Is this something that you touched on as well when you were doing the work for your research on what you were presenting at Hot Carbon?
Hongyu Hè: Yeah, thanks for asking, Chris. It's a great point. As we've discussed, you can use data centers as a utility or power bank, but I think our work is mainly targeting user-level optimization. And as you previously said, I really echo the concerns from AWS about the amount of information you expose to the user and the security concerns that come with it. I think we really need to strike a balance between the two, because users really need the information to optimize for energy efficiency. But on the other hand, you can't really expose too much information to the users because of the potential security concerns. And that's really the basic goal of virtualization, for example.
So it's tricky. But yeah, I think we need to at least get something out for the user to optimize for energy efficiency.
Chris Adams: Okay, all right, so this is where some of this kind of cloud computing might actually be heading. Adi, you mentioned something about this idea of being able to control or adjust the carbon intensity of your electricity by choosing certain times of day when there's an abundance of power on the grid. As I understand it, one of the reasons why you might do that is the assumption that there's more power than can be used. Maybe you could talk a little bit about how timing your usage for when there are more renewables on the grid actually does help. I know there was some useful research, and I'm trying to find a link for it to bring into this, because there's a really nice model written in Python that demonstrates this, and I found some pretty eye-opening figures in it. But I figured you might expand on or touch on some of this yourself, because it seems to be something that you have an interest in as well.
Aditya Manglik: Absolutely. Let's take a look at two points, and I think that would really help make this clear. The majority of us work during the day, right? We go to our offices and we go back home. So the majority of us use our devices during the day. And that's when we introduce a lot of work for the servers, right? So the data center operators like to call these diurnal patterns, in which the usage spikes during the day and then dips during the night because people are sleeping.
And let's take a look at the second point. The second point would be, for example, the availability of solar energy. Solar energy, as you can guess, would be much more plentiful around noon and, let's say, less available during the evening. So what you want to do is maximize the use of solar energy when it hits the peak.
But it turns out that people often maximize the usage of these devices after lunch, right? So what you do is, if you have a surplus of energy available, you use it to schedule batch jobs. What do we mean by batch jobs? These are long-running jobs, for example, training neural networks. You run them during the time when solar is available, and you also keep serving your users and your customers using different sources of energy as and when they're available.
I really hope this example drives home the fact that carefully balancing our work with the availability of energy to do that work really makes things happen for everyone.
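The scheduling idea Adi describes, running batch jobs when the grid is greenest, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name best_start_hour and the hourly carbon-intensity forecast are hypothetical, the numbers are made up, and a real scheduler would also have to respect deadlines and service level agreements.

```python
def best_start_hour(intensity_by_hour, job_hours):
    """Return the start index whose window has the lowest total carbon intensity."""
    if not 0 < job_hours <= len(intensity_by_hour):
        raise ValueError("job must fit inside the forecast")
    # Every start position at which the whole job still fits in the forecast.
    windows = range(len(intensity_by_hour) - job_hours + 1)
    return min(windows, key=lambda s: sum(intensity_by_hour[s:s + job_hours]))


# Hypothetical forecast in gCO2/kWh for the hours 08:00 to 17:00 (made-up values).
forecast = [420, 380, 300, 220, 180, 170, 210, 290, 360, 430]
start = best_start_hour(forecast, job_hours=3)
print(f"Start the batch job at {8 + start}:00")
```

With these made-up numbers the three-hour window starting at 12:00 has the lowest total intensity, which matches the intuition that solar-heavy grids are cleanest around midday.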
Hongyu Hè: Yeah, so one quick point I have regarding what Adi has just mentioned is that I actually did my bachelor thesis on energy procurement and the modeling of energy in data centers. It's quite surprising that loads of green energy is being dumped. The smart grid is actually rejecting that green energy because, as you said, some parts of the world have a lot more excess green energy than other places, for example, Virginia. And I think it's a two-way bridge. By exposing more information to the users, cloud providers can, on the other hand, also get more information about their workloads. And this can also benefit their operations, in terms of how they operate their data centers more efficiently and participate more in the energy grid.
Chris Adams: I'm just gonna round up for the last few minutes. And I was just going to ask, if people are interested in this kind of work and these kinds of projects, how do they start, or what kind of tools would you suggest we look at? For example, if I start with you, Hongyu, then Adi, I'll come to you next. Hongyu, let's say someone has got some servers, or they're running some computing, and they want to start experimenting with these figures here. Where do they find out more about this? Is there a project that you would draw people's attention to, to look at on GitHub, or is there a thing you can pip install, for example, if you're running a computer, something like that?
Hongyu Hè: Actually, we implemented a prototype of our theoretical model called EnergAt, which is available on GitHub, because we wanted to evaluate our theoretical model experimentally. It provides users with both a command line interface and a Python API. So you can just download it with sudo pip install EnergAt, so E N E R G A T. And sudo is very important here because we need root permissions. It has been validated, and you can find the details of our experiments in our paper, but in a nutshell, it can really precisely measure the energy consumption of our applications, even in a multi-tenant environment. But it's not perfect.
As Chris has mentioned, if you want to contribute, there are plenty of opportunities. So, for example, we need a secure and efficient hardware-software interface for energy reporting. And also, attributing energy in a virtualized environment is still an open question. And we might want to support more devices and more fine-grained accounting as well. Yeah.
Chris Adams: Okay, cool. Thank you, Hongyu. And Adi, I think I'll leave the last word with you. If there's any projects or links you would direct people to, if they have an interest in any of this and would like to learn more.
Aditya Manglik: I think that's a very good question because people need to be aware of this. I think our audience would be using diverse devices. So please go to your device. If you have a Windows device, do sudo powermetrics and see what you get. If you have a Mac, go to Activity Monitor and see the Energy Impact, okay?
Just see how much each process is impacting your battery. If you have Linux, please download Scaphandre and see how much Chrome or Firefox is using. And if you're really technical, please come and talk to me and Hongyu, and we would love to dive deeper into more and more tools and help you solve your problems.
I really hope that gets people started. You can also look into Android and iOS because both of them report really good data about what these processes are using in terms of battery. And once we build up enough awareness, I think then we can go deeper into how to make these models better and how to reduce it, right?
Chris Adams: Okay, cool. Thank you for that, Adi. All right, so we've got options across all of the tools you might have there. And there's at least one thing people can start playing with. All right. Okay, gents, I think that takes us up to the time that we have available. And, yeah, thank you very much for coming on. And I quite enjoyed nerding out, plumbing the depths of finding out how to actually understand the energy used by various parts of our computing. Alright, cheers folks, thank you very much for your time, and yeah, I'll see you on one of the future episodes, alright? Take care folks, thanks.
Aditya Manglik: The pleasure was all ours, Chris. Thank you for having us on this call. I really enjoyed it.
Hongyu Hè: Thank you very much, Chris.
Chris Adams: Hey everyone, thanks for listening! Just a reminder to follow Environment Variables on Apple Podcasts, Spotify, Google Podcasts, or wherever you get your podcasts. And please, do leave a rating and review if you like what we're doing. It helps other people discover the show, and of course, we'd love to have more listeners. To find out more about the Green Software Foundation, please visit greensoftware.foundation. That's greensoftware.foundation in any browser. Thanks again, and see you in the next episode!


Aditya Manglik: At the end of the day, what we want to tell people is, okay, computing is great, but we have to be sustainable. And right now, data centers consume 3% of all global electricity. This number is only going to grow, right? Especially after COVID, we have had a massive increase in digitalization, and now with the large language models coming in, like ChatGPT, it's going to grow exponentially.
So we have to be sustainable, and the first step to being sustainable about energy use is to understand where the energy is going.
Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software. I'm your host, Chris Adams. Welcome back to The Week in Green Software on Environment Variables, where we bring you the latest news and updates from the world of sustainable software development. I'm your host, Chris Adams. When we talk about greening software, a lot of the time we talk about how much energy we use, because even in 2023, more than half of the electricity we use globally is still generated by burning fossil fuels. And we've spoken before in other episodes about how you can make the electricity you use greener, but sometimes you just need to be able to use less electricity in the first place. And to do that, it helps to know how much energy software is using in the first place. This sounds simple, right? In a world of cloud computing, this turns out to be surprisingly hard, and today we're turning up the geek factor all the way to 11 to figure out why this is hard and what the state of the art looks like. Helping with this journey today, we have two special guests from ETH Zurich in Switzerland, whose work we featured in earlier episodes, and we'll see how far we can get in the time we have today. So with us today, we have Aditya. Hey Aditya!
Aditya Manglik: Hi Chris, please feel free to call me Adi. It's a pleasure to be here on this podcast. And, yeah, I'm a longtime listener of this podcast, so it's very exciting to be here. I'm a graduate student at ETH, where I work on improving the energy efficiency of systems, especially operating systems and microarchitecture.
And I previously worked on building a very nice, very complex energy attribution system in Linux as a Google Summer of Code student with the GNOME Foundation.
Chris Adams: Cool. Thank you, Adi. And in addition, we have Hongyu also. Hongyu, I'll give the floor for you to introduce yourself as well.
Hongyu Hè: Yeah, thanks. Thanks very much for having me on, Chris. And thanks for inviting me, Adi. So yeah, I'm also a graduate student in computer science at ETH. My research includes both software and hardware. And I'm currently interning at Apple, working on machine learning research.
Chris Adams: Cool. Thank you, Hongyu. So for those who are interested, we featured the work of both of these researchers. In the last episode, we spoke a little bit about one of the papers Hongyu was a contributor on at the Hot Carbon conference. And we'll share a link to the talk presented there.
And we've also shared a link to Adi's talk at the Linux Foundation Energy Summit in Paris earlier on in June. If you're new to this podcast, my name is Chris, I'm the Executive Director of the Green Web Foundation, and I'm the Chair of the Policy Working Group at the Green Software Foundation, and as a final reminder, we're going to cover a fair few papers and links and resources, and what we'll do is we'll add all of these to the show notes so that you can do your own research later on as you run through this. All right then, I think we're all sitting comfortably, so shall we begin, fellas?
Aditya Manglik: Yeah, I look forward to it.
Chris Adams: All right. Okay. Adi, I think I might start with you first. We've spoken a little bit about tracking energy consumption and why it's an important thing. Maybe you could just give a bit of a kind of overview about why this is important, what the state of play is in the different systems, because we know that computers run on, say, Linux. Lots and lots of machines run on Linux, but we also know computers use Windows and macOS. Maybe you could provide a bit of background, then we could talk about what the options are for people using these systems.
Aditya Manglik: Absolutely. I've been working on this problem for almost five to six years now and it's an absolute pleasure to be talking about it. Well, I often like to say that you cannot improve what you cannot measure, and that is where the problem starts. We don't know how to measure the energy consumption of our systems.
For example, if I ask you, how much energy does WhatsApp use? Or when you send a WhatsApp text to your friend, how much CO2 did that message emit? Can you give me an answer? No. That's what makes me so excited to get out of bed every morning and try to figure out, okay, how much energy is WhatsApp using?
So it turns out that people at Microsoft, Apple, and Google also care about this, and they have really tried to solve this problem. Microsoft has this very interesting kernel system called the Windows Energy Estimation Engine. It is running on all Windows devices. Apple has a very interesting service called PowerMetrics.
You can think of it like a daemon. A daemon is a magical service that runs in the background of your system, that does all the stuff for you, and you don't know that it exists. PowerMetrics on macOS also collects all possible data about the energy consumption of your applications. Now, what about Linux, right? We love open source, and Linux is a very important operating system, right? The majority of servers in this world are running Linux, but we don't know how to measure the energy consumption of these servers, especially from software, right? People often think that to measure energy you need hardware devices, or you need electrical engineers to come in and plug in monitors and then tell you, oh, this consumed 5,000 joules.
No, we want to solve it using the tools that we have, and I think we can solve it. I believe we can solve it. And that's what I'm working on.
Chris Adams: Okay, so you just mentioned two things, first of all. So one, first of all, you said that if you're using a Windows machine, there's existing tools that you can tap into and get readings from. And if you're using an Apple machine, you've got access to those kind of figures. But it's a somewhat murkier situation with Linux right now, there isn't a kind of common tool that is actually universally used.
That's one of the key takeaways I'm getting. And Hongyu, I believe this is what you've been finding as well, and you've been looking into some of this as well.
Hongyu Hè: Yeah, exactly. I think there are tools, but there is no common thing that everyone uses. And the standards of those tools vary quite a lot. That's also, as you said, one of the reasons why we conducted the research in the first place.
Chris Adams: Okay, alright, so if I understand this, given that the majority of servers are now running Linux, basically not having some tools for the most common kind of operating system is one of the things which makes it difficult to come up with some of these numbers. That's what I think I'm understanding from here. We're naming this episode Green Kernels, and I figure it might be worth actually just talking about this idea, because this sounds like a relatively low level thing that's built into systems themselves, actually. Adi, could you maybe talk a little bit about this part here? Because I think that you've spent a bunch of time looking at this low level part of an operating system like Linux, like this kernel part. And before we dive into that, maybe you could actually explain what a kernel is and why that might be somewhere that you actually track some of this. 'cause not everyone may know what a kernel is when you're thinking about computers in the first place.
Aditya Manglik: Chris, first of all, I love the name Green Kernels. When you talked to me about this podcast and when you named it Green Kernels, I was like, yes, I came to the right place. Okay. And yeah, what are kernels? I think our audience is really smart, and even smart people sometimes just need to quickly jog their memories.
So what we're going to do is quickly jog their memories. A kernel is the core of an operating system. What does that mean? Okay, so for us a computer is just a computer on which we log in and do something, but what we use is an application: we use Microsoft Word, PowerPoint, Excel; these are all applications. And Windows, which is running these things, is the operating system. An operating system broadly comprises two parts: a kernel, which is the core that you don't see and which handles everything for you, and the user space, which is what you interact with.
So you know that Start button that you click? That is part of the user space, and that Start button goes behind the curtains and does some interesting stuff that comes back to you, and you see the effect of your action. So the kernel is the primary entity in any operating system that is responsible for managing the hardware, the applications, and the processes.
Chris Adams: Alright, so this kernel part is the thing. So far, for Windows machines and for Apple machines, there's something in there, but for Linux machines, you don't have that same ability to read information yet. And this is some of the work that you've been doing, to basically make some of that readable.
Is that the case?
Aditya Manglik: That's a great question, Chris. So, energy is typically thought of as an electrical engineering topic. And it's difficult and it's fancy. No, people typically don't include energy monitors in systems. That is the fundamental reason why we are trying to do this. You can measure the performance of your programs.
You can measure how much time it took, and this is possible because your system tells you how much time your program took. Your CPU tells you how many clock cycles your program executed for. But if you ask it, okay, how much power did it consume or how much energy did it consume, I think things fall apart. And that's why you need to do a lot of modeling and build entire systems to figure out this information.
Now, Linux also has this information, but you can have all the data in the world, and until you know how to make sense of that data, it is useless, right? That is what the model does for you. And these models don't exist in Linux. They exist in Windows, they exist in macOS, Android, iOS, but I'm not so sure if they exist or if they're good enough for Linux.
That's what I think, but if you know about it, let me know. I would be very happy to know.
Chris Adams: All right, and when you're talking about a model here, maybe you could just elaborate on that, because I'm not sure I quite follow when you talk about something being a model like this. Let's say, you mentioned the example of WhatsApp. How would I go about figuring out how much energy is actually attributable to, say, WhatsApp on a computer or something like that?
Maybe if we were to look at that example, then we could see how that becomes more difficult if you're thinking about things like cloud computing, because, as I understand it, the assumptions you might make when thinking about a desktop computer might not be the same as when working with a cluster of computers, for example, and I believe this is some of the work that Hongyu, you've done most recently and spoke about at HotCarbon.
Aditya Manglik: That's another great question, Chris. Okay, let me quickly dive into it. Imagine this as a car, okay? You are driving a car. Now, you decide where you're going to go, but it's the engine that burns the fuel, right? You're not burning the fuel. You're simply deciding, oh, I want to drive to London, but your engine is what's going to consume the fuel. Now, when you want to send a message on WhatsApp, what you do is write out a message and hit the send button. Behind the scenes, what is actually happening is that the kernel converts your message to a bunch of packets and sends these packets over the network. And along the way, in converting this message into packets and sending it, you are using your device's CPU, memory, storage, network, screen, maybe the Wi-Fi interface, right?
So you see, all of these hardware devices are immediately turned on as soon as you hit that send button, and that's where the energy consumption comes from. And what would a model look like? To build such a model, you would take the amount of power drawn by the CPU and multiply it by the time that the CPU was running.
Similarly, you take the amount of power for the network interface, that would be the Wi-Fi card, and you multiply it by the amount of time that it was running, and then you accumulate all of these data points. And that gives you the energy of sending a WhatsApp message on your device.
Okay, we're not even talking about the energy that the servers consume, the energy that your friend's device consumes.
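The model Adi outlines, power multiplied by active time for each component and then summed, can be written down as a short Python sketch. The component names, wattages, and durations below are hypothetical placeholders rather than measurements, and a real model would need per-device power figures that change with load.

```python
from dataclasses import dataclass


@dataclass
class ComponentUsage:
    name: str
    avg_power_watts: float  # assumed average draw while the component is active
    active_seconds: float   # how long the component was active for this action


def estimate_energy_joules(usages):
    """Sum power x time over every component (1 watt for 1 second is 1 joule)."""
    return sum(u.avg_power_watts * u.active_seconds for u in usages)


# Hypothetical numbers for sending one message on one device; not measurements.
send_message = [
    ComponentUsage("cpu", 2.0, 0.5),
    ComponentUsage("wifi", 1.0, 0.25),
    ComponentUsage("screen", 1.5, 2.0),
]
total = estimate_energy_joules(send_message)
print(f"Estimated energy: {total:.2f} J")
```

As Adi notes, this covers only the sender's device; the servers and the recipient's device would each need their own terms.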
Chris Adams: That's giving some pointers. And so the idea would be that if you can't get the figures from each of these pieces of equipment themselves, like a CPU, or a screen, or something like that, you might use a model to come up with some numbers for that. Okay.
Aditya Manglik: Yes.
Chris Adams: All right. And that is based on the assumption that you're looking at a single machine, using a single program.
Now, Hongyu, I think it's somewhat different in the cloud, where those assumptions might not always hold true. So maybe, Hongyu, you could explain how this gets a bit more complicated in the cloud, or some of the parts there, perhaps.
Hongyu Hè: Thanks for the question, Chris. Yeah, I think Adi brought up a really good point about hardware and the model. One thing I'd like to add at this point is the key reason why we need a model. Adi has introduced the concept of the kernel. So the kernel is basically a cushion, if you like, between users and the underlying hardware. And the hardware is ugly, because devices have different interfaces and it's really hard to interact with them directly. So there are multiple challenges, and one crucial challenge that we have been facing is the lack of support from hardware. If the energy attribution were there, so if I'm using WhatsApp and the hardware is telling me, okay, WhatsApp is using this amount of energy,
then why doesn't the kernel, the cushion, report this to me, right? That's the key point. So here we don't have the crucial hardware support that we need. That's why we need the model to collect, if you like, proxy data, like utilization and the time you're using, to calculate the amount of energy from the user side instead of relying solely on the hardware. And speaking of cloud computing, I think Adi also mentioned a great point about multiple hardware, different devices, which really makes life more difficult. Because we need to take into account different kinds of devices, and especially in the cloud, we have heterogeneous devices: CPU, DRAM, GPU, etc. Yeah, it makes things more challenging and makes it much harder to calculate the energy consumption, because of the distributed nature as well.
Chris Adams: So if I understand what you've been saying here, there's one issue, which is a case of attributing the energy to a particular program, for example. And then one of the other issues is basically the fact that across all the different kinds of computers, not every single device that draws power will have a consistent way of reporting how much power it's drawing. So if I understand it, there are some tools that do this. Lots of Intel processors have a thing called Running Average Power Limit (RAPL), for example. But it may be the case that, if we were to step away from our WhatsApp example and say I'm doing a really big machine learning job using a bunch of very powerful graphics cards, they might not expose the same information about how much power they're using.
So you would need to either model that or you would need to have some other way of getting that information back. Is that the case?
Hongyu Hè : Yeah, that's a very accurate summary. Thanks, Chris.
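For readers who want to see what the Running Average Power Limit (RAPL) counters look like in practice, the sketch below reads them through the Linux powercap interface. It assumes an Intel machine where the package-0 domain is exposed at /sys/class/powercap/intel-rapl:0; that path can differ or be absent on other hardware, and the helper names are made up for this example. The counter is cumulative, in microjoules, and wraps around at max_energy_range_uj, so power is derived from the difference between two readings.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain; may differ per machine.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name: str, base: Path = RAPL_DIR) -> int:
    """Read an integer sysfs file such as energy_uj (microjoules)."""
    return int((base / name).read_text().strip())


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy used between two readings, allowing for counter wraparound."""
    if after >= before:
        return after - before
    return after + max_range - before  # the counter wrapped past its maximum


def average_power_watts(before: int, after: int, max_range: int, seconds: float) -> float:
    """Convert a microjoule delta over an interval into average watts."""
    return energy_delta_uj(before, after, max_range) / 1e6 / seconds


def measure_package_power(seconds: float = 1.0) -> float:
    """Sample the package counter twice and return average power in watts."""
    max_range = read_counter("max_energy_range_uj")
    before = read_counter("energy_uj")
    time.sleep(seconds)
    after = read_counter("energy_uj")
    return average_power_watts(before, after, max_range, seconds)
```

Calling measure_package_power() gives a whole-package figure, not a per-program one, which is exactly the attribution gap discussed here. On many recent kernels, reading energy_uj requires root.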
Chris Adams: Okay. All right. So that gives us some pointers here. And I'll just ask you one thing about this as well, because this is something you touched on in your paper. The example that Adi gave was being in a single computer, where you can be relatively confident that the hard drive is attached to the same computer and the screen is attached to the same computer, like a laptop, everything's in one place. This assumption might not be true when you're looking at cloud computing. Again, I understand it's a little bit more complicated. Is that the case?
Hongyu Hè : Yes, indeed. Yeah, that's a great question. So in a cloud, for example, computing resources like the CPU and memory that we've been talking about are increasingly shared among many tenants or users, or, you know, for example, an organization like a university or a company is using a cluster of servers. And this really makes attributing energy consumption really hard. And also this is quite a sensitive topic as well, because we don't want to point fingers arbitrarily without a very precise model there.
Chris Adams: All right, Adi, I assume this is similar to some of the work that you've been doing as well, because I understand your research has been focusing on the desktop part more than the cloud computing part, right? That's where some of your research has been, or have you been looking somewhat wider than that, for example?
Aditya Manglik: I've been focusing on the desktop for now, but I agree with Hongyu that it's tricky to correctly point fingers at people for consuming the energy that they're using. Yeah, so at the end of the day, what we want to tell people is, okay, computing is great, but we have to be sustainable. And right now, data centers consume 3% of all global electricity.
This number is only going to grow, right? Especially after COVID, we have had a massive increase in digitalization, and now with the large language models coming in, like ChatGPT, it's going to grow exponentially. So we have to be sustainable, and the first step to being sustainable about energy use is to understand where is the energy going.
And, yeah, this problem becomes more tricky because with the growth of cloud, you don't know who exactly is consuming how much. I'm very curious, and I keep talking to Hongyu about his work. He's doing fantastic work in this direction. Yeah, let's just say that we are both very curious about this.
Chris Adams: Okay, so if I understand it, it's a bit easier to get some of the numbers from a computer you have yourself. Right, when it's on your own computer, but, because increasingly we're moving computing workloads away from just the desktop into a kind of wider set, it ends up going from other places. So maybe you could actually talk a little bit, let's say that you do want to actually start measuring this, or you do want to start understanding what role you could play, or how you're able to at least measure this so you can start optimising it. Let's say you're working with servers right now and you're using a bunch of Linux computers. What are your options right now? So Adi, I'll start with you actually. I'm on a Linux machine, it's just one machine, and I want to understand the environmental impact of a particular service, or a machine learning job, or any kind of thing I'm about to do. Maybe you could just talk to me about what my options are right now, for the most part.
Aditya Manglik: Yeah, sure. There are a bunch of tools that I know about. The first tool that comes to my mind is something that I looked at quite some time back, but it's a tool called PowerTOP. Just like you have the top utility in Linux based systems, this tool is called PowerTOP, and I think it used to be supported by Intel, and what this tool does is it tells you how much power each process is consuming on your system at any given point of time.
Now, sometimes those numbers are a little shaky. But it does a decent job. Post PowerTOP, I came to know about this interesting tool called Scaphandre. Scaphandre actually goes in and gets you the energy consumption. So Scaphandre monitors, I think Scaphandre has built some high level models for taking in all the information that we talked about in the earlier questions and actually calculating the energy consumption for a process.
But the problem is that we assume that all of this is okay, is that if you talk about a desktop machine, it is not virtualized. We assume that all the numbers that you see can be accounted to a single entity. Whereas if we go to the cloud, you have multiple entities running on the same hardware.
That's the fundamental premise of cloud, right? You want to reduce hardware costs by sharing the workloads. And that's where things get murky because we simply don't know how to separate out the energy for each entity. I think Hongyu would be able to shed more light on this.
Hongyu Hè : Yeah. Thanks, Adi. I think those are really great points, especially as you mentioned a tool called Scaphandre. I'm not sure if I'm pronouncing it correctly, but yeah, that was one of the baselines of our paper. And indeed, as Chris has mentioned, there are tools available on Linux, it's not a thing people can use. There are tools, for example, PowerTOP. But the models they have are coarse-grained, meaning that, yeah, they are not computing a fine-grained energy attribution per user, per thread even, and we can talk about why that's important later. But in the cloud, as Adi has said, virtualization is a crucial technique, if not the most important technique, that enables users to share resources, but for energy attribution, actually, it's a key enemy, I have to say, because in order to get accurate energy attribution, we need to get access to hardware counters that tell us the statistics, the runtime statistics we need for our model to calculate the energy attribution. And that, yeah, as Adi said, makes things much trickier.
Chris Adams: So if I can just take a step back for a second. So we spoke about, you've got a machine running, and, a machine will be running a series of, we might call them programs, but you might refer to them as processes, and within a given process, there might be a series of threads that's running, that kind of granularity is quite difficult, so if it's just my own machine, and it's just me using a computer, then you can attribute all of the figures to me, essentially, but when there's multiple processes or multiple programs for multiple people, working out who to share the kind of responsibility for the emissions, that's the difficult, that's the part that gets more complicated.
Hongyu Hè : Exactly, yeah, that's a fair summary. Thank you, Chris.
Chris Adams: Okay, we were talking about some of the tools available, so, and Adi, you were talking about Scaphandre as one of the tools which has become quite popular for this, for tracking some of this, but I understand, Hongyu, some of the research you did was, you've been using some other tools to help address some of the problems that you've come up against this when you're looking at basically working in a cloud like environment, for example, where you don't have absolute access to the computer yourself, for example. This was my understanding of some of the work that you're presenting at Hot Carbon, correct?
Hongyu Hè : Yes and no. Yes, because we are looking at how to accurately attribute energy in a multi-tenant environment. And no, because attributing energy consumption in a virtualized environment is still an open question and we haven't solved it yet. And it will be very interesting to see future solutions to that. But indeed, we've compared with multiple tools like Scaphandre, and also the famous Kepler, for example, but we explicitly compared with those tools that target non-virtualized environments, because in a virtualized environment, I think it's a fair game and no one knows exactly the ground truth. What we found is that existing tools are too coarse-grained, meaning that when we use them to measure the energy of your application, for example, they will also mix in the consumption of other applications that run on the same server as your application does, which is a very common scenario. And we found that this could really lead to about 50% overestimation and over 90% underestimation.
Yes, in our paper, the main objective is really to measure the energy consumption of your application and only your application as precisely as possible and exclude the consumption of other applications.
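The difference between a coarse-grained reading and a per-application attribution can be sketched in a few lines of Python. The example below splits a measured energy total across processes in proportion to their CPU time. This proportional rule is an assumption chosen for illustration, and the process names and numbers are hypothetical; it is not the model from the paper, which relies on finer-grained runtime statistics.

```python
def attribute_energy(
    total_joules: float, cpu_seconds: dict[str, float]
) -> dict[str, float]:
    """Split a measured energy total across processes by CPU-time share.

    Illustrative rule only: each process is charged in proportion to
    the CPU seconds it used during the measurement window.
    """
    total_cpu = sum(cpu_seconds.values())
    if total_cpu == 0:
        return {name: 0.0 for name in cpu_seconds}
    return {
        name: total_joules * used / total_cpu
        for name, used in cpu_seconds.items()
    }


# Hypothetical window: 100 J measured for the whole machine.
shares = attribute_energy(100.0, {"my_app": 2.0, "neighbour_app": 6.0})
```

Here my_app is charged 25 J and neighbour_app 75 J. A tool that reports only the machine-wide 100 J would charge my_app for its neighbour's consumption as well, which is the kind of mixing-in described above.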
Chris Adams: I can't help asking, when we start talking about these tools, is there an overhead from measuring your own footprint when you're trying to do this? Because as I understand it, this is one thing that has come up a few times: for example, Scaphandre is written in Rust, so it's designed to be a very small, fast, lightweight program, which has some overhead, but I understand that there is going to be an overhead from tracking some of this in the first place, correct?
Hongyu Hè : Yes, that's a really great point. Thanks for bringing it up, Chris. Indeed, as you said, Scaphandre is written in Rust, and Rust is really an efficient language compared to, for example, Python. And the main goal of energy attribution is really to have precise knowledge of the energy efficiency of applications so that we can improve and optimize our code accordingly. But as you said, yes indeed, there's an inherent trade-off between the preciseness, or how fine-grained our model is, and the cost, right, in terms of both performance and energy. And so our model takes into account, for example, the underlying hardware, and collects more fine-grained runtime statistics.
But indeed, the overhead could be larger, and to mitigate the overhead we use conditional probability to do reasonable estimation whenever applicable, instead of trying to capture every single event per millisecond, so to speak, which would be really costly. This part is a bit, you know, intricate, and we have a more detailed mathematical formulation available in our paper.
But yeah, that's the high-level idea.
Chris Adams: Okay, if I understand what you've been saying so far, so there's one option, which is to use like a fast programming language, which moves quite quickly, or there's another approach, which is to take a kind of sampling approach so that, if you are using something which is maybe a little bit slower, like Python, you don't read quite so much.
And another option is to basically use something which is even closer to the metal, like in kind of the kernel space rather than in user space. And Adi, this is what I think you were talking about when you were talking about kernels. Is that, is that the case?
Aditya Manglik: Yes, that is the case. I think there's a very interesting data point that I read in some blog by Microsoft and what they told is basically, so there's three ways to measure energy. First is that your hardware directly tells you that, okay, I used X amount of joules and that would be a 98% accurate number.
It's not, it's still not 100%, right? Because of thermodynamics, but you still have 98% accuracy that, okay, this is the energy that this particular hardware device consumed. The next best step after that would be a kernel level measurement and a kernel level measurement would be, if I remember correctly, they pointed as 85% accurate.
And that is why it would be great to have something in the kernel, and that is why macOS and Windows put these systems in the kernel to monitor the energy consumption. And finally, if you have something from the user space, now, it's not that accurate simply because it has visibility into a very small subset of the information that you need to get high enough accuracy. And I completely agree with your point that the more accurate you want to be, the more measurements you make, the more energy you are going to consume, right? So it's like a catch-22 situation. I want to calculate something, but in the process of calculating it, I'm increasing the load on the system, and by increasing the load on the system, I am increasing the energy consumption.
You need to find a good balance between hardware and software based measurement mechanisms.
Hongyu Hè : I think Adi mentioned a really great point, so I think the trade off is not necessarily in the programming language itself, but it lies in the model itself. So as Adi has said, the more fine grained your model is, the more costly it's likely to be. And I think we really need to strike a balance between how detailed you want your measurements to be and yeah, the cost it comes with it.
Chris Adams: I see, okay, and maybe this is a chance for us to zoom out a bit because, as I'm aware, one of the projects that the Green Software Foundation is currently involved with is this project called the Real Time Carbon Standard, the idea of creating some of this as something like a way to report these kinds of figures. As I understand it, one of the tools it seems to be standardizing on, and this is a project which is led by Adrian Cockcroft, who is a former VP of Sustainability and Cloud at Amazon and has basically a 20 plus year background working in this field, I believe they're settling on one tool called Kepler, specifically, which ties into kernels to provide some of these numbers. But even then there is an ongoing discussion about, okay, how do you report numbers that are actionable, that developers or designers can use, without actually disclosing too much information that might be a possible source of attack, like a kind of side channel attack, for example, and also what kind of resolution is necessary. Now, as I understand it, I think one of the things that people are pushing for there is the idea of going for minute level resolution rather than millisecond level resolution.
So at cloud level, that would already be way further than what we have right now, but that might, in theory, give you enough to then get an idea about what kind of impact your choosing to run, say, a computing job in one place might have compared to another, or at least give you something to optimize for carbon at that point. This idea of actually exposing the energy being used at this kind of level, I think there's a term that was mentioned in one of your papers about Energy Aware Computing or Energy Aware Cloud Computing? I'll ask you a little bit about this because I know that this was something I had to take away from you, but Adi, I'll come to you on this afterwards actually, because I think this is something that you've actually been speaking about at the LF Energy Core Forum as well. So maybe you could actually explain what this idea of an energy-aware cloud environment might actually be, Hongyu.
Hongyu Hè : Yeah. As you mentioned, there's a great tool called Kepler. And I think, yeah, this kind of tool is very instrumental as to what kind of information it can give to both the users and the cloud operators. And by energy aware, or even energy intelligent, which is another level, we mean that we can make our decisions based on, for example, the energy statistics we collected, for example, using Kepler or EnergAt, to make decisions that optimize not only for performance but also for energy efficiency. And the reason for that is because data centers themselves, or even networking, have huge potential and they have great, you know, energy flexibility, and we can use this kind of elasticity to do great things. For example, using data centers as energy storage or an energy power bank for the smart grid. Yeah, I think that's one of the ideas, but there are definitely a huge number of challenges we are facing in order to achieve this kind of energy-aware cloud or energy-intelligent cloud.
Chris Adams: All right, that feels like it's going in a somewhat different direction. So that's basically, but all that is necessary, in order for that to be possible, you can't be driving blind if you want to have this kind of awareness of what the grid's doing right? That's one of the ideas behind that. Okay, so maybe we should touch on why are we doing this in the first place? Because we spoke a bit more about yes, energy is coming from burning fossil fuels. We're not going to entirely transition our fossil fuels tomorrow so as long as we're burning fossil fuels to provide power there are going to be carbon emissions associated with this.
Adi I'd just like to speak to you about why you got involved in this, why you got excited about this in the first place or why you did do this? Because there must have been some process before you decided to try presenting a, like an energy conference and talk about personal computing in the first place.
I, I was quite surprised to see it, but I was very pleased to see someone actually talk about this and talk, talk about making some of this measurable.
Aditya Manglik: Chris, it's a personal story. It goes a long way back to when I was an undergrad. In my, I think, junior year, I had a laptop, which was not the best, and my battery had started to die out. I had exams to prepare for and my battery was acting up, and I could not figure out: I just charged it in the morning, why is it dead in 30 minutes? So as a very simple-minded engineering student, my mind immediately went to the problem: okay, the battery is working good, what is consuming energy? Let me kill off the applications that are consuming energy. And that's how I got into the question: okay, I need to figure out which applications are consuming energy in order to kill them correctly, right?
And that's where the entire journey started. I could not figure it out. And then it grew on me that, oh, how do we figure this out? Because if we can figure this out, we can do a lot more very interesting things. So for instance, I think Hongyu mentioned really great points about energy aware scheduling in data centers, and I see a lot of effort from these hyperscalers to schedule workloads when renewable energy is available.
So when you talk about solar or wind energy, one of the key characteristics of these sources is that they're not available 24/7. They're available in abundance at a fixed point in time, and then they fluctuate a bit. So what you would want to do is you would want to maximize the utilization of these green sources when they are available. And if you can schedule your workloads at the right point in time, you can really decrease your carbon emissions. You can really decrease your utilization of fossil fuels while also maintaining your service level agreements with your customers. And that's a win-win situation for everyone.
You see how this simple problem of not being able to find out the energy consuming applications on my system turned into trying to save the world by reducing the energy consumption of data centers? I don't know. Yeah, so it's been a fascinating journey and I would love to keep going on this. But yeah, thanks for the question.
Chris Adams: Alright. Okay. So there is, um, I think what you might be referring to here is this notion of carbon intensity changing, depending on how abundant renewables are on the grid, for example. Is this something that you touched on as well when you were doing the work for your research on what you were presenting at Hot Carbon?
Hongyu Hè : Yeah, thanks for asking, Chris. It's a great point, as we've discussed, how to use data centers as a utility or power bank, but I think our work is mainly targeting user-level optimization. And as you previously said, I really echo the concerns from AWS about, you know, the amount of information you expose to the user and the security concerns that come with it. And I think we really need to strike a balance between the two, because users really need the information to optimize for energy efficiency. But on the other hand, you can't really, you know, expose too much information to the users because of the potential security concerns. And that's really, you know, the goal of virtualization, for example.
So it's tricky. But yeah, I think we need, uh, at least get something out for the user to optimize for energy efficiency.
Chris Adams: Okay, all right, so this is where some of this kind of cloud computing might be actually heading towards. Adi, you mentioned something about this idea of being able to control or adjust the carbon intensity of electricity by choosing certain times of day when there's an abundance of power in the grid. As I understand it, one of the reasons why you might do that is because there is the assumption that there's more power than can be used. Maybe you could talk a little bit about how timing your usage for when there's more renewables on the grid actually does help. Maybe you could expand a little bit on that because I know there was some useful research and I'm trying to find a link for it to bring into this, because there's a really nice model that's actually written in Python that actually demonstrates this and I found some pretty eye opening figures for it, but I figured maybe you might expand on or touch on some of this yourself because it seems to be something that you have an interest in as well.
Aditya Manglik: Absolutely. Let's take a look at two points and I think that would really help make this clear. Majority of us are working in the day, right? We go to our offices and we go back to homes. So majority of us use our devices during the day. And that's when we introduce a lot of work for the servers, right? So the data center operators like to call these patterns as diurnal patterns in which the usage spikes during the day and then dips during the night because people are sleeping.
And let's take a look at the second point. The second point would be, for example, the availability of solar energy. So solar energy, as you can guess, would be much more plentiful around noon and, let's say, less available during the evening. So what you want to do is you want to maximize the use of solar energy when it hits the peak.
But it turns out that people often maximize the usage of these devices after lunch, right? So what you do is, if you have a surplus of energy available, you use it to schedule batch jobs. What do we mean by batch jobs? These are long running jobs, for example, training neural networks. You run them during the time when solar is available, and you also keep serving your users and your customers using different sources of energy as and when they're available.
I really hope this example drives home the fact that careful balancing of our work as well as the availability of energy to do that work well, it really makes things happen for everyone.
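The scheduling idea described here, running batch jobs when clean energy is plentiful, can be sketched as a small search over a carbon-intensity forecast. The Python below picks the start hour that minimizes total grid carbon intensity over the length of a job. The forecast values are invented for illustration (in gCO2/kWh, dipping around midday as solar peaks), and the function name best_start_hour is hypothetical; a real scheduler would pull forecasts from a grid data provider and respect deadlines and capacity.

```python
def best_start_hour(intensity: list[float], duration_hours: int) -> int:
    """Return the start index whose window has the lowest total intensity.

    intensity holds one forecast value per hour; the job is assumed to
    run for duration_hours consecutive hours without interruption.
    """
    if not 0 < duration_hours <= len(intensity):
        raise ValueError("duration must fit inside the forecast")
    starts = range(len(intensity) - duration_hours + 1)
    return min(starts, key=lambda s: sum(intensity[s:s + duration_hours]))


# Hypothetical hourly forecast starting at 08:00, lowest around midday.
forecast = [400, 380, 300, 200, 150, 180, 320, 420]
start = best_start_hour(forecast, duration_hours=3)
```

With this made-up forecast, start is 3, meaning the three-hour job would begin at 11:00 and run across the midday dip rather than starting first thing in the morning.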
Hongyu Hè : Yeah, so one quick point I have regarding what Adi has just mentioned is that actually I did my bachelor thesis on energy procurement and modeling of energy in data centers. Actually it's quite surprising that loads of green energy is being dumped. And actually, the smart grid is rejecting that green energy because, as you said, some parts of the world have a lot more excess green energy than other places, for example, Virginia. And I think it's a two-way bridge. By exposing more information to the users, cloud providers can also get more information about their workloads. And this can also benefit their operation, as to how they operate their data centers more efficiently and participate more in the energy grid.
Chris Adams: I'm just gonna round up for the last few minutes. And I was just going to ask, if the people are interested in this kind of work and this kind of projects, how do they start, or what kind of tools would you suggest we look at? For example, if I start with you, Hongyu, then Adi, I'll come to you next. Hongyu, let's say someone, they've got some servers, or they're running some computing, and they want to start experimenting with these figures here. Where do they find out more about this? Is there a project that you would draw people's attention to, to look at on GitHub, or is there a thing you can pip install, for example, if you're running a computer, something like that?
Hongyu Hè : Actually, we implemented a prototype for our theoretical model called EnergAt, which is available on GitHub, because we wanted to evaluate our theoretical model experimentally. And yeah, it provides users with both a command line interface and a Python API. So you can just download it with sudo pip install EnergAt, so E-N-E-R-G-A-T. And sudo is very important here because we need root permissions. It's been validated, and you can find the details of our experiments in our paper, but in a nutshell, it can really precisely measure the energy consumption of our applications, even in a multi-tenant environment. But it's not perfect.
As Chris has mentioned, if you want to contribute, there are plenty of opportunities. So, for example, we need a secure and efficient hardware-software interface for energy reporting. And also, attributing energy in a virtualized environment is still an open question. And we might want to support more devices and more fine-grained accounting as well. Yeah.
Chris Adams: Okay, cool. Thank you, Hongyu. And Adi, I think I'll leave the last word with you. If there's any projects or links you would direct people to, if they have an interest in any of this and would like to learn more.
Aditya Manglik: I think that's a very good question because people need to be aware of this. I think our audience would be using diverse devices. So please go to your device. If you have a Windows device, do sudo powermetrics and see what you get. If you have a Mac, go to the Activity Monitor and see the energy impact, okay?
Just, just see how much each process is impacting your battery. If you have Linux, please download Scaphandre and see how much Chrome or Firefox is using. And if you're really technical, please come and talk to me and Hongyu and we would love to dive deeper into more and more tools and help you solve your problems.
I really hope that gets people started. You can also look into Android and iOS because both of them report really good data about what these processes are using in terms of battery. And once we build up enough awareness, I think then we can go deeper into how to make these models better and how to reduce it, right?
Chris Adams: Okay, cool. Thank you for that, Adi. All right, so we've got options across all of the tools you might have there. And there's at least one thing people can start playing with. All right. Okay, gents, I think that takes us up to the time that we have available. And, yeah, thank you very much for coming on. And I quite enjoyed nerding out, plumbing the depths of finding out how to actually understand the energy used by various parts of our computing. Alright, cheers folks, thank you very much for your time, and yeah, I'll see you on one of the future episodes, alright? Take care folks, thanks.
Aditya Manglik: The pleasure was all ours, Chris. Thank you for having us on this call. I really enjoyed it.
Hongyu Hè : Thank you very much, Chris.
Chris Adams: Hey everyone, thanks for listening! Just a reminder to follow Environment Variables on Apple Podcasts, Spotify, Google Podcasts, or wherever you get your podcasts. And please, do leave a rating and review if you like what we're doing. It helps other people discover the show, and of course, we'd love to have more listeners. To find out more about the Green Software Foundation, please visit greensoftware.foundation. That's greensoftware.foundation in any browser. Thanks again, and see you in the next episode!

