Eco-Friendly Datacenter Management: Power Management Strategies

Fri, 08 Sep 2023 10:19:28 CEST · 11 min read · VMware Aria Operations Power Management Datacenter Sustainability Virtualization Energy Efficiency

Sustainability is key for EU organizations

GREENHOUSE GAS EMISSIONS (GHG):

We're cutting emissions and costs by optimizing workloads and promoting virtualization.

A Green Score in Aria Operations tracks progress with five parameters, which help us target improvements:

Workload Efficiency, Resource Utilization, Virtualization Rate, Power Source, and Hardware Efficiency.

The large hyperscalers are saving energy, and so should we.

vSphere Distributed Power Management

The vSphere Distributed Power Management (DPM) feature optimizes power usage in virtualized data centers. It dynamically adjusts the power state of hosts to match workload demand, which saves energy and costs while maintaining high availability. Hosts that are not needed are put into standby mode, where they are powered off until DPM needs their capacity again. This reduces energy costs and contributes to environmental sustainability. So, have you given DPM a second chance? If not, please consider it. Remember that every kilowatt consumed by servers also has to be cooled!
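Here is a way to put a number on the cooling remark. Power Usage Effectiveness (PUE) is the ratio of total facility energy to IT equipment energy. Every kilowatt-hour a server stops drawing therefore also removes the cooling and other overhead that came with it. The sketch below is minimal. It uses a hypothetical 200 W idle host and a hypothetical PUE of 1.5, so replace both with your own measurements:

```python
# Energy saved at facility level when a host sits in DPM standby.
# All numbers in the example call are hypothetical.

def facility_kwh_saved(idle_watts: float, standby_hours: float, pue: float) -> float:
    """Return kWh saved including facility overhead (cooling, power losses).

    PUE = total facility energy / IT equipment energy, so the IT energy
    saved is multiplied by PUE to get the facility-level saving.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be lower than 1.0")
    if idle_watts < 0 or standby_hours < 0:
        raise ValueError("watts and hours must be non-negative")
    it_kwh = idle_watts * standby_hours / 1000.0
    return it_kwh * pue


# Hypothetical host: 200 W idle, 12 hours in standby, PUE 1.5
print(f"{facility_kwh_saved(200, 12, 1.5):.1f} kWh per night")
```

With these example values the host itself saves 2.4 kWh. The overhead adds another 1.2 kWh, which gives 3.6 kWh per night.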

DPM uses Wake-on-LAN, IPMI, or iLO for powering on hosts.
Prior configuration of IPMI or iLO is needed for each host before enabling DPM.
Test exiting standby mode on each host before enabling DPM.
Individual host overrides are set from the Host Options page.

| DPM setting | Power management | Power-on conditions | Power-off conditions |
|---|---|---|---|
| Off | No power management recommendations from vCenter Server. Host overrides can be set but stay inactive until Manual or Automatic is selected. | - | - |
| 1 (Conservative) | Applies recommendations for vSphere HA and capacity requirements. | - | - |
| 2 | Applies recommendations for vSphere HA and capacity requirements. | Host utilization much higher than target | Host utilization extremely low |
| 3 | Applies recommendations for vSphere HA and capacity requirements. | Host utilization higher than target | Host utilization very low |
| 4 | Applies recommendations for vSphere HA and capacity requirements. | Host utilization higher than target | Host utilization moderately low |
| 5 (Aggressive) | Applies recommendations for vSphere HA and capacity requirements. | Host utilization higher than target | Host utilization lower than target |
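The five threshold levels can be read as a filter on recommendation priority. vCenter Server gives each DPM recommendation a priority from 1 (needed for vSphere HA or capacity) to 5 (only a slight improvement). Each step from Conservative (1) towards Aggressive (5) lets one more priority level through. The snippet below is a simplified model of the table for illustration only. It is not vCenter Server code:

```python
# Simplified model of the DPM threshold table (illustration, not vCenter code).
# Threshold 1 = Conservative, 5 = Aggressive. Priority 1 recommendations are
# the HA/capacity-driven ones; priority 5 brings only a slight improvement.

POWER_OFF_CONDITION = {
    1: None,  # only priority 1 (HA/capacity) recommendations are applied
    2: "host utilization extremely low",
    3: "host utilization very low",
    4: "host utilization moderately low",
    5: "host utilization lower than target",
}


def applied_priorities(threshold: int) -> list[int]:
    """Return the recommendation priorities acted on at this threshold."""
    if threshold not in range(1, 6):
        raise ValueError("DPM threshold must be between 1 and 5")
    return list(range(1, threshold + 1))


def is_applied(priority: int, threshold: int) -> bool:
    """True if a recommendation with this priority passes the threshold."""
    return priority in applied_priorities(threshold)


for level in range(1, 6):
    print(level, applied_priorities(level), POWER_OFF_CONDITION[level])
```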

For more Host power management options, have a look at this

Powering off VMs

I won't go much further into Aria Operations here. It lets you right-size VMs, reclaim wastage, reclaim hosts, remove idle VMs, optimize oversized clusters, and run SLA-based operations, and all of these contribute to the Green Score above.

Servers, even when idle, consume power, which is a significant cost factor in data centers. By powering off servers (VMs) outside of work hours, we can improve workload efficiency, resource utilization, virtualization rate, power source efficiency, and hardware efficiency, ultimately leading to cost savings and a more sustainable data center operation.

Shutting down servers during off-hours has several positive effects. We:

Reduce the workload in our data center, so the remaining active servers run more efficiently with less thermal load.
Free up resources that can be allocated to other tasks, improving utilization.
Consolidate active workloads onto fewer physical servers, which gives a higher virtualization rate.
Reduce power consumption, which reduces our carbon footprint.
Extend the lifespan of our hardware components through less runtime.

Have a look at what Google does

New and Improved Power Offs

Since my previous series of articles, we've improved the material behind the scenes. When workloads are deployed, we now have better scripting. We have also reduced the Orchestrator workflows to ONE single workflow, and we can choose the working hours for the deployed servers in the UI. As a result, shutdown and startup of servers align better with business needs and requirements.

Requesting a new deployment

In the Aria Automation Service Broker, i.e. the self-service portal, I now have a new catalog item, “OperatingHours”, to keep it separate from the previous “Save Power” catalog item. Note: in the search field, I have typed “power”.
When I click “Request”, a new form pops up that differs slightly from the previous one. We fill in the deployment name, the VM size, and the environment we would like this server to go into. The server type (Linux/Windows) has been predefined for us in the Cloud Template, just to keep the example simple and readable. Let's click the "Powersave Mode" checkbox.

NEW! Two new fields appear, where we can choose the power-on and power-off hours. I have selected 06:00 in the morning and 18:00 (6 pm) in the evening. Now we just submit the request.
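With power on at 06:00 and power off at 18:00, the server is nominally off for 12 of every 24 hours. The hypothetical helper below computes running hours and off hours for any pair of hour values in the 1 to 24 range that the form uses, including windows that cross midnight. It assumes that 24 means midnight, and it is not part of Aria Automation:

```python
def daily_hours(poweron: int, poweroff: int) -> tuple[int, int]:
    """Return (running_hours, off_hours) for one day.

    poweron/poweroff are whole hours from 1 to 24, as in the form;
    24 is treated as midnight. A window such as 22 -> 6 wraps past
    midnight. Equal values give 0 running hours.
    """
    for value in (poweron, poweroff):
        if not 1 <= value <= 24:
            raise ValueError("hours must be between 1 and 24")
    running = (poweroff - poweron) % 24
    return running, 24 - running


running, off = daily_hours(6, 18)
print(f"{running} h running, {off} h off ({off / 24:.0%} of the day)")
```

The modulo keeps the arithmetic correct for overnight windows. For example, `daily_hours(22, 6)` returns 8 running hours and 16 off hours.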

Deployment + Cloud Template

Tags: after the deployment, we'll see the vSphere tags created by Aria Automation: poweron: 6, poweroff: 18, os: windows, and adv-powersave: true.

Let’s have a look at what’s behind the scenes! Let’s head over to the Aria Automation Assembler and look at our Cloud Template:

The Inputs

The Cloud Template inputs are simple, with no magic attached to them:

```yaml
# Inputs to enable or disable power saving mode, and set times
adv-powersave:
  title: Powersave mode
  type: boolean
  description: |-
    <h3> Enable powersave mode </h3>
    Please consider contributing to our sustainability goals by automatically shutting down workloads as a green service.
poweron:
  type: integer
  minimum: 1
  maximum: 24
poweroff:
  type: integer
  minimum: 1
  maximum: 24
```

The Tagging

The same goes for the vSphere tags that come with them:

```yaml
# vSphere Tags for the new deployment
tags:
  - key: adv-powersave
    value: ${input.adv-powersave}
  - key: poweron
    value: ${input.poweron}
  - key: poweroff
    value: ${input.poweroff}
```

So you end up with three tags: adv-powersave=true, poweron=6, and poweroff=18. The hours 06:00 and 18:00 are stored as integers.
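vSphere tag values are strings, and a machine can carry other tags as well (such as `os: windows`). Below is a minimal sketch that turns the tag list into typed settings. It assumes the list-of-key/value shape used in the Cloud Template. `parse_powersave_tags` is a hypothetical helper name:

```python
def parse_powersave_tags(tags: list[dict]) -> dict:
    """Turn [{"key": ..., "value": ...}, ...] into typed powersave settings.

    Missing or non-numeric hour values come back as None instead of
    raising, so one badly tagged machine does not stop the whole run.
    """
    raw = {tag.get("key"): tag.get("value") for tag in tags}

    def to_hour(value):
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    return {
        # Tag values are strings, so compare the text "true"
        "adv_powersave": str(raw.get("adv-powersave", "")).lower() == "true",
        "poweron": to_hour(raw.get("poweron")),
        "poweroff": to_hour(raw.get("poweroff")),
    }


tags = [
    {"key": "adv-powersave", "value": "true"},
    {"key": "poweron", "value": "6"},
    {"key": "poweroff", "value": "18"},
    {"key": "os", "value": "windows"},
]
print(parse_powersave_tags(tags))
# {'adv_powersave': True, 'poweron': 6, 'poweroff': 18}
```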

Failsafing the user Input in Service Broker

More behind the scenes: how do we make sure the user enters valid times (displayed as hh:mm)? Let’s head over to the Aria Automation Service Broker again. Go to “Content and Policies” > “Content”, find our “OperatingHours” item, and choose “Customize Form” to create a custom request form.

The Poweron and Poweroff fields

The Appearance. The only magic here is that the poweron and poweroff fields are shown only if you select the boolean “Powersave mode”.
The Values. As you can see from the next picture, the values are predefined for each hour. A little cumbersome, but it works.
The Constraints. We add an extra check that the values are between 1 and 24.
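The same rules could also be enforced on the server side, for example when requests might arrive through the API instead of the custom form. The sketch below works under that assumption. `validate_operating_hours` is a hypothetical name. The rule that the two hours must differ is an extra assumption and is not part of the original form:

```python
def validate_operating_hours(powersave: bool, poweron=None, poweroff=None) -> list[str]:
    """Return a list of validation errors (empty if the request is OK).

    Mirrors the custom form: the hour fields only matter when powersave
    mode is selected, and each must be a whole hour from 1 to 24.
    """
    if not powersave:
        return []
    errors = []
    for name, value in (("poweron", poweron), ("poweroff", poweroff)):
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append(f"{name} must be a whole hour")
        elif not 1 <= value <= 24:
            errors.append(f"{name} must be between 1 and 24")
    # Extra assumption: 24 and midnight are the same hour, so compare mod 24
    if not errors and poweron % 24 == poweroff % 24:
        errors.append("poweron and poweroff must differ")
    return errors


print(validate_operating_hours(True, 6, 18))   # []
print(validate_operating_hours(True, 0, 18))   # ['poweron must be between 1 and 24']
```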

Orchestrator Workflows

After the deployment, the machine exists in vSphere with its tags, and the Aria Automation job is done for now. The rest is handled by the underlying, embedded workflow engine: Aria Automation Orchestrator.

We have replaced the previous two workflows with a single workflow that is scheduled to run every hour.

A simple workflow with a single Python script

Credit to Lars Olsson, VMware Sweden, who provided the Python code and the brains behind it.

Checking the poweron and poweroff values

```python
import datetime
import json

import requests


# Main function to handle power management
def handler(context, inputs):
    global bearer_token

    # Retrieve bearer_token from the vraauth function (defined elsewhere in the script, not shown here)
    bearer_token = vraauth(inputs)

    # Obtain resource data with the "adv-powersave" tag and poweron/poweroff values
    resource_data = get_resource_ids_with_powersave_tag(bearer_token, inputs)

    # Determine the current time on the server
    server_current_time = datetime.datetime.now()

    # Adjust the current time by adding the time difference (2 hours in this example)
    current_time = server_current_time + datetime.timedelta(hours=2)

    # Print the adjusted current time
    print(f"Current time is {current_time.time()}")

    # Check whether the current hour matches the poweron/poweroff value of any machine
    for resource in resource_data:
        poweron_value = resource["poweron_value"]
        poweroff_value = resource["poweroff_value"]
        resource_id = resource["resource_id"]
        resource_name = resource["resource_name"]
        resource_address = resource["resource_address"]

        # Power on one hour before the requested poweron hour
        if poweron_value and current_time.hour == int(poweron_value) - 1:
            # send_to_slack(f"POWERSAVE: Powering on resource ID: {resource_id}", inputs)
            power_on_resource(resource_id, resource_name, resource_address, inputs, bearer_token)

        # Power off at the requested poweroff hour
        if poweroff_value and current_time.hour == int(poweroff_value):
            # send_to_slack(f"POWERSAVE: Powering off resource ID: {resource_id}", inputs)
            power_off_resource(resource_id, resource_name, resource_address, inputs, bearer_token)

    # No action needed for other cases
```
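The fixed two-hour offset has two limitations. First, if the Orchestrator server clock runs in UTC (the `+2` suggests this, but the script does not say so), the offset fits CEST (UTC+2) and is one hour off during winter time (CET, UTC+1). Second, a poweroff value of 24, which the form allows, never matches, because `datetime.hour` only runs from 0 to 23. Below is a sketch of the same decision rules that uses the standard library's `zoneinfo` (Python 3.9+) and modulo-24 arithmetic. The zone name `Europe/Stockholm` and the function names are assumptions for illustration:

```python
import datetime
from zoneinfo import ZoneInfo

# Assumption: the hours in the tags are local business hours in this zone.
LOCAL_TZ_NAME = "Europe/Stockholm"


def local_hour(now_utc: datetime.datetime, tz: datetime.tzinfo) -> int:
    """Convert an aware UTC timestamp to the local hour (0-23) in tz."""
    return now_utc.astimezone(tz).hour


def current_local_hour(tz_name: str = LOCAL_TZ_NAME) -> int:
    """Local hour right now; zoneinfo applies daylight saving time."""
    now_utc = datetime.datetime.now(datetime.timezone.utc)
    return local_hour(now_utc, ZoneInfo(tz_name))


def decide_action(hour: int, poweron=None, poweroff=None):
    """Return "on", "off", or None for one machine at the given local hour.

    Same rules as the handler: power on one hour before the poweron
    value, power off at the poweroff value. Modulo 24 maps the tag
    value 24 to hour 0. "on" is checked first if both would match.
    """
    if poweron is not None and hour == (int(poweron) - 1) % 24:
        return "on"
    if poweroff is not None and hour == int(poweroff) % 24:
        return "off"
    return None
```

In the handler, the loop body would then call `decide_action(current_local_hour(), poweron_value, poweroff_value)` and dispatch to the existing power functions.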

Find vSphere resources with powersave tags and retrieve the values

```python
# Function to get resources with the "adv-powersave" tag and retrieve their poweron and poweroff values
def get_resource_ids_with_powersave_tag(bearer_token, inputs):
    # vRA (Aria Automation) API URL
    url = inputs["vra_url"]
    # vRA API headers with bearer token
    vraheaders = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": "Bearer " + bearer_token
    }
    # List to store the resource IDs and their poweron and poweroff values
    resource_data = []
    # Use a session for the requests to the vRA API (verify=False skips TLS certificate checks)
    with requests.Session() as session:
        # GET the /resources endpoint, filtered on resourceTypes=Cloud.vSphere.Machine and the tag adv-powersave:true
        resp = session.get(f"{url}/deployment/api/resources/?resourceTypes=Cloud.vSphere.Machine&tags=adv-powersave:true", headers=vraheaders, verify=False)
        # Raise an HTTPError for 4xx/5xx responses
        resp.raise_for_status()
        # Parse the JSON content of the response
        json_resp = resp.json()

    # Loop through each resource
    for item in json_resp["content"]:
        # Get the resource ID, name, and properties
        resource_id = item["id"]
        resource_name = item["name"]
        properties = item["properties"]
        # Address of the first network (assumes every machine has one)
        resource_address = properties["networks"][0]["address"]

        # Initialize poweron and poweroff values as None
        poweron_value = None
        poweroff_value = None

        # Loop through each tag in the "properties" field and check for poweron and poweroff values
        for tag in properties.get("tags", []):
            if tag["key"] == "poweron":
                poweron_value = tag["value"]
            elif tag["key"] == "poweroff":
                poweroff_value = tag["value"]

        # Append the resource ID, name, address, and its poweron and poweroff values
        resource_data.append({
            "resource_id": resource_id,
            "resource_name": resource_name,
            "resource_address": resource_address,
            "poweron_value": poweron_value,
            "poweroff_value": poweroff_value
        })

    # Return the list of resource data
    return resource_data
```
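The parsing half of this function can be moved into a pure function, which makes it testable without a live Aria Automation instance. The sketch below assumes the same response shape the original code relies on (`content`, `properties.networks`, `properties.tags`). Unlike the original, it does not fail on a machine that has no network or no address yet. The sample response is made up for illustration:

```python
def extract_resource_data(json_resp: dict) -> list[dict]:
    """Build the resource_data list from one API response page.

    A machine without a network, or without an address yet, gets
    resource_address=None instead of raising an exception.
    """
    resource_data = []
    for item in json_resp.get("content", []):
        properties = item.get("properties", {})
        # Fall back to an empty dict so .get("address") returns None
        networks = properties.get("networks") or [{}]
        tags = {t.get("key"): t.get("value") for t in properties.get("tags", [])}
        resource_data.append({
            "resource_id": item["id"],
            "resource_name": item.get("name"),
            "resource_address": networks[0].get("address"),
            "poweron_value": tags.get("poweron"),
            "poweroff_value": tags.get("poweroff"),
        })
    return resource_data


# Made-up sample in the shape the original code expects
sample = {"content": [
    {"id": "abc-123", "name": "win-01",
     "properties": {"networks": [{"address": "10.0.0.5"}],
                    "tags": [{"key": "poweron", "value": "6"},
                             {"key": "poweroff", "value": "18"}]}},
    {"id": "def-456", "name": "lin-02", "properties": {}},
]}
for row in extract_resource_data(sample):
    print(row)
```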

power off

```python
# Function to power off a specific resource
def power_off_resource(resource_id, resource_name, resource_address, inputs, bearer_token):
    # vRA API URL
    url = inputs["vra_url"]
    # vRA API headers with bearer token
    vraheaders = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": "Bearer " + bearer_token
    }
    # vRA API payload requesting the Shutdown day-2 action
    payload = {
        "actionId": "Cloud.vSphere.Machine.Shutdown",
        "inputs": {},
        "reason": "Power Off"
    }
    # Send the power off request to vRA using the requests library
    with requests.Session() as session:
        resp = session.post(f"{url}/deployment/api/resources/{resource_id}/requests", headers=vraheaders, json=payload, verify=False)
        try:
            # Raise an HTTPError for 4xx/5xx responses
            resp.raise_for_status()
            # Inform Slack that the power off request was accepted
            send_to_slack(f"SustainaBot: Power off successfully called for resource: {resource_name} - {resource_id} (Address: {resource_address})", inputs)
        except requests.exceptions.HTTPError as err:
            # On 400, log the error and continue with the next resource
            if err.response.status_code == 400:
                print(f"SustainaBot: Power off failed for resource: {resource_name} - {resource_id} (Address: {resource_address}): {err}. Is it already powered off?")
            else:
                # Any other error status is re-raised
                raise
```

power on

```python
# Function to power on a specific resource
def power_on_resource(resource_id, resource_name, resource_address, inputs, bearer_token):
    # vRA API URL
    url = inputs["vra_url"]
    # vRA API headers with bearer token
    vraheaders = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": "Bearer " + bearer_token
    }
    # vRA API payload requesting the PowerOn day-2 action
    payload = {
        "actionId": "Cloud.vSphere.Machine.PowerOn",
        "inputs": {},
        "reason": "Power On"
    }
    # Send the power on request to vRA using the requests library
    with requests.Session() as session:
        resp = session.post(f"{url}/deployment/api/resources/{resource_id}/requests", headers=vraheaders, json=payload, verify=False)
        try:
            # Raise an HTTPError for 4xx/5xx responses
            resp.raise_for_status()
            # Inform Slack that the power on request was accepted
            send_to_slack(f"SustainaBot: Power on successfully called for resource: {resource_name} - {resource_id} (Address: {resource_address})", inputs)
        except requests.exceptions.HTTPError as err:
            # On 400, log the error and continue with the next resource
            if err.response.status_code == 400:
                print(f"SustainaBot: Power on failed for resource {resource_name} - {resource_id} (Address: {resource_address}): {err}. Is it already powered on?")
            else:
                # Any other error status is re-raised
                raise
```
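`power_on_resource` and `power_off_resource` differ only in the action ID and the wording, so they could share one helper. The minimal sketch below injects the HTTP call as a callable, so the logic can be exercised without an Aria Automation instance. `request_day2_action` and its parameters are hypothetical names. The endpoint and the action IDs are the ones the script above already uses:

```python
from typing import Callable


def request_day2_action(post: Callable, url: str, bearer_token: str,
                        resource_id: str, action_id: str, reason: str) -> bool:
    """Submit a day-2 action; return True on success, False on HTTP 400.

    `post` should behave like requests.Session.post and return an object
    with a `status_code` attribute. Other error codes raise RuntimeError.
    """
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": "Bearer " + bearer_token,
    }
    payload = {"actionId": action_id, "inputs": {}, "reason": reason}
    resp = post(f"{url}/deployment/api/resources/{resource_id}/requests",
                headers=headers, json=payload)
    if resp.status_code == 400:
        # The original script treats 400 as "probably already in that state"
        return False
    if resp.status_code >= 400:
        raise RuntimeError(f"{action_id} failed with HTTP {resp.status_code}")
    return True
```

In the real script, the callable could be `session.post` from a `requests.Session`. To keep the original's disabled certificate check, wrap it as `functools.partial(session.post, verify=False)`. The Slack notification then stays in the caller, based on the returned value.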

Send to Slack

```python
# Function to send a message to a Slack channel
def send_to_slack(message, inputs):
    # Slack webhook URL
    webhook_url = inputs["slack_webhook_url"]

    # Slack message payload
    payload = {
        "text": message
    }

    # Send the message to Slack using the requests library
    response = requests.post(
        webhook_url, data=json.dumps(payload),
        headers={'Content-Type': 'application/json'}
    )

    # Raise an error if the response status code is not 200 OK
    if response.status_code != 200:
        raise ValueError(
            f'Request to Slack returned an error {response.status_code}, the response is:\n{response.text}'
        )
```
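The same webhook call can be made with the standard library alone, which avoids the `requests` dependency. The sketch below separates building the request from sending it, so you can check the payload without posting anything. The function names are hypothetical:

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Prepare the same JSON POST that send_to_slack() makes."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_request(req: urllib.request.Request, timeout: float = 10.0) -> None:
    """Send the request; urlopen raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()
```

One design difference from the original: the explicit `timeout` keeps a slow or unreachable webhook from stalling the hourly workflow. The `requests` version above has no timeout.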