Host Scavenging Feature

Host scavenging lets you use the compute power of hosts that would not ordinarily be added to the cluster, such as desktop computers. While these hosts are idle, they run work sent from an application manager such as Platform Symphony.

Contents

  • About host scavenging

  • Scope

  • Configuration to enable host scavenging

  • Host scavenging behavior

  • Configuration to modify host scavenging

  • Host scavenging commands

About host scavenging

The host scavenging feature adds hosts to the cluster that run work only when they are idle. Local users on these hosts are not interrupted: once a user stops using the host, the cluster can use it. When the user starts using the host again, the host is closed to the cluster and work runs on other hosts.

Figure 1. Host scavenging not enabled (default)
Figure 2. Host scavenging enabled

Summary of host scavenging process

  1. A scavenging agent, elim.sa, is included in the EGO package and deployed to hosts during installation. However, this scavenging agent is disabled by default.

  2. An administrator enables the host scavenging agent on selected hosts.

  3. An administrator creates a special scavenging resource group and adds the scavenge-ready hosts. This separates opportunistic (scavenge-ready) hosts from dedicated hosts used deterministically by the cluster.

  4. Once enabled, the scavenging agent monitors local load information and dynamically opens the local host for scavenging or closes it. When the agent closes the host, the host is reclaimed.

  5. After the scavenging agent closes and reclaims the host, the host does not qualify for allocation to any consumer until it is opened again. If the scavenging agent closed the host, it reopens the host automatically once the host is no longer busy, as determined by configurable threshold values.

Resource groups for scavenge-ready hosts

The host scavenging feature requires a resource group of scavenge-ready hosts. Configure this resource group to exclude management hosts and to include hosts with the static resource tag "scvg". Once it is set up, any new host added to the cluster with the "scvg" resource tag automatically joins this resource group.

The scavenge resource group must be the last resource group created.

Process priority

By default, when the scavenging agent opens the local host, it also sets the OS process priority of any subsequent grid workload to the lowest priority. This setting can be modified only with help from Platform Computing, and we do not recommend changing it.

  • Normal process priority: When set to normal, EGO allocates resources to run workload at normal process priority as controlled by the OS.

  • Lowest process priority: When set to this priority level, EGO allocates resources to run workload at the lowest process priority as controlled by the OS (on Windows, it is the setting for IDLE_PRIORITY_CLASS).
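To make "lowest process priority as controlled by the OS" concrete, the Python sketch below starts a child process at the lowest scheduling priority on each platform. It only illustrates the OS mechanism and is not how EGO launches workload. The Windows branch uses IDLE_PRIORITY_CLASS, as named above; the Linux/UNIX branch assumes that a nice value of 19 is the equivalent lowest priority. The helper name run_at_lowest_priority is hypothetical.

```python
import os
import subprocess
import sys


def run_at_lowest_priority(argv):
    """Run argv at the lowest OS scheduling priority and return the
    completed process with stdout captured as text.

    Hypothetical helper for illustration; not part of EGO.
    """
    if os.name == "nt":
        # Windows: start the child in the idle priority class.
        return subprocess.run(
            argv,
            capture_output=True,
            text=True,
            creationflags=subprocess.IDLE_PRIORITY_CLASS,
        )
    # Linux/UNIX (assumed equivalent): raise the child's nice value by 19
    # just before exec; on Linux the kernel caps niceness at 19.
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        preexec_fn=lambda: os.nice(19),
    )


if __name__ == "__main__":
    # The child reports its own nice value.
    probe = [sys.executable, "-c", "import os; print(os.nice(0))"]
    print(run_at_lowest_priority(probe).stdout.strip())
```

On Linux the probe reports a nice value of 19, the weakest claim on the CPU, so interactive work by the local user is scheduled first.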

Scope

Applicability      Details
Operating system   Linux/UNIX, Windows
Security           No security issues

Dependencies

For this feature to work properly:
  • Platform Symphony must be installed on all hosts participating in scavenging.

  • An administrator must define the static resource tag "scvg".

  • A resource group for scavenge-ready hosts must be defined with the resource requirement select (!mg && scvg).

  • The resource requirement of every other resource group that includes compute hosts must specify !scvg, so that no host belongs to both that group and the scavenge resource group.

  • The resource group for scavenge-ready hosts should be created last, because the order in which resource groups are created affects the order in which resources are reclaimed throughout the cluster.

  • Applications that run work on scavenged hosts should not rely on external dependencies such as a database or a shared file system.
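The two resource requirements above partition hosts so that none overlap. The Python sketch below models that membership rule with plain sets of resource tags. The group names ScavengeHosts and ComputeHosts, and the host names, are hypothetical, and giving the compute group the requirement !mg && !scvg is an assumption made for the example.

```python
from typing import Dict, List, Set


def select_scavenge(tags: Set[str]) -> bool:
    # Mirrors select (!mg && scvg): not a management host, tagged scvg.
    return "mg" not in tags and "scvg" in tags


def select_compute(tags: Set[str]) -> bool:
    # Assumed compute-group requirement: !mg && !scvg.
    return "mg" not in tags and "scvg" not in tags


def group_members(hosts: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each host to the hypothetical groups whose requirement it meets."""
    groups: Dict[str, List[str]] = {"ScavengeHosts": [], "ComputeHosts": []}
    for name, tags in sorted(hosts.items()):
        if select_scavenge(tags):
            groups["ScavengeHosts"].append(name)
        if select_compute(tags):
            groups["ComputeHosts"].append(name)
    return groups


if __name__ == "__main__":
    hosts = {
        "master1": {"mg"},
        "server1": set(),
        "desktop1": {"scvg"},
        "desktop2": {"scvg"},
    }
    print(group_members(hosts))
```

Because one requirement demands scvg and the other demands !scvg, a host can satisfy at most one of them; dropping !scvg from the compute group would put desktop1 and desktop2 in both groups.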

Configuration to enable host scavenging

This feature is enabled by running the following commands.

Scavenge-ready hosts need both the scavenge resource tag and the agent control flag set.

  • Scavenge resource tag (scvg): Marks a host as scavenge-ready and allows it to be identified with a scavenge resource group.

  • Agent control (agent_control): Enables or disables the local scavenging agent. The value can be on, fastrelease, or off. Enabling the scavenging agent lets it monitor whether the host is busy or idle.

  • On each host that you want scavenged, run:

    egoconfig addresourceattr "[resource scvg]"

    Adds a "scvg" tag to the host to indicate that it is scavenge-ready.

  • From any host, run:

    egosh ego elimrestart SA on host_name

    Sets the "agent_control" flag to "on" and enables the agent on the specified hosts with a grace period, using the default threshold values.

  • From any host, run:

    egosh ego elimrestart SA fastrelease host_name

    Sets the "agent_control" flag to "fastrelease" and enables the agent on the specified hosts without a grace period, using the default threshold values.

Follow the steps in Enable host scavenging to set up this feature.

Host scavenging behavior

Scavenge-ready host states and status

When the scavenging agent detects that the host is busy, it closes the host. The running workload is terminated after a grace period, and the host is excluded from further allocation.

The host status changes to closed and the reason indicates that the scavenging agent closed the host.

Note that the reclaim grace period set for a consumer does not apply when a scavenge-ready host is configured using the fastrelease command option.

When a scavenged host starts and stops running cluster work

The scavenging agent opens a host when all three of the following configurable thresholds indicate that a host is not busy.

  • User idle time (minutes), displayed as uit_t: the user idle time threshold of the host, in minutes. Precondition for triggering scavenging: the user idle time exceeds the setting.

  • CPU utilization (%), displayed as cu_t: the CPU utilization threshold of the host, as a percentage. Precondition for triggering scavenging: CPU utilization is lower than the setting.

  • CPU idle time (minutes), displayed as cit_t: the CPU idle time threshold of the host, in minutes. Precondition for triggering scavenging: the CPU idle time exceeds the setting.

When all three thresholds are reached, the host is opened and becomes available for opportunistic workload.

When the host starts being used locally, the thresholds are no longer met, so the scavenging agent closes the host and the host is reclaimed.

Once the thresholds are reached again (indicating that the host is not busy once more), the host is automatically opened again.

How to determine if a host is busy or not

Both server and desktop scavenging are supported. If uit_t is set to 0, server scavenging is assumed; otherwise, desktop scavenging is assumed.

  • Server scavenging: the CPU idle time is the only criterion used to determine whether the host is busy.

  • Desktop scavenging: the user idle time is the main criterion. The host is closed when the user idle time is below its threshold. Once the user idle time reaches its threshold, the CPU idle time is also considered, and when both thresholds are reached, the host is opened. At other times, the host does not change state.

The CPU idle time is reset when the CPU utilization threshold is reached.
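The open and close rules above can be modeled as a small state machine. The Python sketch below is a hypothetical model, not the scavenging agent's code: ScavengeMonitor and sample are invented names, samples are assumed to arrive at a fixed interval, CPU utilization is taken as a fraction (0.3 for 30%, matching the threshold example in this document), and "reaching" the CPU utilization threshold is assumed to mean rising above cu_t, which resets the CPU idle clock.

```python
from dataclasses import dataclass


@dataclass
class ScavengeMonitor:
    """Hypothetical model of the busy/idle decision for one host."""
    uit_t: float = 10.0   # user idle time threshold, minutes; 0 selects server scavenging
    cu_t: float = 0.0     # CPU utilization threshold, as a fraction (0.3 == 30%)
    cit_t: float = 10.0   # CPU idle time threshold, minutes
    state: str = "closed"
    cpu_idle_min: float = 0.0  # accumulated CPU idle time, minutes

    def sample(self, user_idle_min: float, cpu_util: float, interval_min: float) -> str:
        # Assumption: utilization above cu_t resets the CPU idle clock;
        # otherwise the clock keeps accumulating.
        if cpu_util > self.cu_t:
            self.cpu_idle_min = 0.0
        else:
            self.cpu_idle_min += interval_min
        cpu_idle = self.cpu_idle_min >= self.cit_t

        if self.uit_t == 0:
            # Server scavenging: CPU idle time alone decides.
            self.state = "open" if cpu_idle else "closed"
        elif user_idle_min < self.uit_t:
            # Desktop scavenging: a local user is active, so close the host.
            self.state = "closed"
        elif cpu_idle:
            # User idle and CPU idle thresholds are both reached: open the host.
            self.state = "open"
        # Any other combination leaves the state unchanged.
        return self.state


if __name__ == "__main__":
    desktop = ScavengeMonitor(uit_t=2, cu_t=0.3, cit_t=1.67)
    for user_idle, util in [(0, 0.1), (3, 0.1), (3, 0.5), (0, 0.1)]:
        print(user_idle, util, desktop.sample(user_idle, util, interval_min=1))
```

In the desktop run, the host opens once both idle clocks pass their thresholds, stays open through a brief CPU spike (the "does not change state" case), and closes as soon as the user returns.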

Defaults

When no thresholds are specified, the following default values are used.

  • User idle time: 10 minutes

  • CPU utilization: 0%

  • CPU idle time: 10 minutes

Configuration to modify host scavenging

Modify host scavenging in the following ways:
  • Configuration to define thresholds

  • Configuration to modify cluster reclaim behavior

  • Configuration to disable host scavenging

  • Configuration to disable the grace period

  • Configuration to change process priority

Configuration to define thresholds

You can modify the default threshold values that determine when the scavenging agent opens and closes the scavenged host.

Command:

    egosh ego elimrestart SA on,uit_t,cu_t,cit_t host_name ...

Example:

    egosh ego elimrestart SA on,2,0.3,1.67 host1

Behavior: Changes the threshold values for host1 to the following:

  • User idle time threshold of 2 minutes

  • CPU utilization threshold of 30% (specified as 0.3)

  • CPU idle time threshold of 1.67 minutes (about 100 seconds)
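The threshold argument packs the mode and three values into one comma-separated token. The Python sketch below shows one way a wrapper script might parse it and convert the values back to the units used above. parse_sa_arg and Thresholds are hypothetical names, and restricting thresholds to the on form is an assumption based on the only form this document shows.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Thresholds:
    uit_t: float = 10.0  # user idle time, minutes (documented default 10)
    cu_t: float = 0.0    # CPU utilization as a fraction: 0.3 means 30% (default 0)
    cit_t: float = 10.0  # CPU idle time, minutes (documented default 10)


def parse_sa_arg(arg: str) -> Tuple[str, Optional[Thresholds]]:
    """Split an SA argument such as 'on,2,0.3,1.67' into (mode, thresholds).

    thresholds is None when only the mode is given, meaning the agent
    keeps its default values. Hypothetical helper, not part of EGO.
    """
    mode, *values = [part.strip() for part in arg.split(",")]
    if mode not in ("on", "off", "fastrelease"):
        raise ValueError(f"unknown agent_control value: {mode!r}")
    if not values:
        return mode, None
    if mode != "on" or len(values) != 3:
        # Assumed: thresholds are given only as on,uit_t,cu_t,cit_t.
        raise ValueError("thresholds must be given as on,uit_t,cu_t,cit_t")
    uit_t, cu_t, cit_t = (float(v) for v in values)
    return mode, Thresholds(uit_t, cu_t, cit_t)


if __name__ == "__main__":
    mode, t = parse_sa_arg("on,2,0.3,1.67")
    # Show the values in the units used in the behavior list.
    print(mode, f"{t.uit_t:g} min", f"{t.cu_t:.0%}", f"{round(t.cit_t * 60)} s")
```

For the example above, the sketch reports 2 minutes, 30%, and 100 seconds, matching the behavior listed for host1.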

Configuration to modify cluster reclaim behavior

Setting the cluster to reclaim before borrowing makes sure that scavenged hosts are borrowed by other consumers only after all their own resources are reclaimed and used up.

It is a best practice to configure the cluster in this way when using the host scavenging feature.

Configuration: Select Reclaim lent resources before borrowing in Cluster Properties.

Behavior: Any resources that were lent out are reclaimed for use by their owner before borrowing begins to satisfy demand.
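The ordering that this setting enforces can be shown with simple slot arithmetic. The Python sketch below is a simplified, hypothetical model (plan_allocation and its parameters are invented, and the rest of EGO's allocation policy is ignored): a consumer first uses its own free slots, then reclaims slots it has lent out, and only then borrows, for example from scavenged hosts.

```python
from typing import Dict


def plan_allocation(demand: int, free_owned: int, lent_out: int, borrowable: int) -> Dict[str, int]:
    """Split a consumer's demand (in slots) under reclaim-before-borrow.

    Hypothetical model for illustration only.
    """
    use_free = min(demand, free_owned)   # 1. idle slots the consumer already owns
    remaining = demand - use_free
    reclaim = min(remaining, lent_out)   # 2. take back slots lent to others
    remaining -= reclaim
    borrow = min(remaining, borrowable)  # 3. only then borrow (e.g. scavenged hosts)
    remaining -= borrow
    return {"use_free": use_free, "reclaim": reclaim, "borrow": borrow, "unmet": remaining}


if __name__ == "__main__":
    print(plan_allocation(demand=10, free_owned=3, lent_out=4, borrowable=5))
```

With a demand of 10 slots, the consumer uses its 3 free slots, reclaims all 4 lent slots, and borrows just the 3 it still needs, leaving the other borrowable slots untouched.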

Configuration to disable host scavenging

Command:

    egosh ego elimrestart SA off host_name ...

Example:

    egosh ego elimrestart SA off host1

Behavior: The scavenging agent no longer monitors the scavenge-ready hosts and no longer opens or closes them according to the configured thresholds. Until you delete the scavenge resource group, work can continue to run on these hosts.

Note: Use the keyword all to disable the agent on all hosts that are running it at once. Otherwise, the command applies only to the specified hosts or, if no hosts are specified, only to the local host.

To stop allocating work to these hosts, delete the scavenge resource group and the scavenge consumer. As long as the hosts do not belong to any other resource group, work is then no longer allocated to them.

Configuration to disable the grace period

Command:

    egosh ego elimrestart SA fastrelease host_name ...

Example:

    egosh ego elimrestart SA fastrelease host1

Behavior: When a predefined threshold is reached, the scavenging agent closes the host and terminates running workload without a grace period.

Host scavenging commands

Commands for submission

Not applicable. There are no submission commands that affect host scavenging.

Commands to monitor

  • Configure Resource Groups for scavenge resource group: List of member hosts

    All hosts that are listed in the Member hosts section of the scavenge resource group and have the state ok are scavenge-ready. To see the state, add the status column by using the table preferences.

  • Hosts (List View): agent_control

    Hosts listed with agent_control set to on have the agent control flag turned on, meaning that the scavenging agent monitors and controls the host according to the configured threshold values.

  • Hosts (List View): scvg

    Hosts listed with scvg have the scavenge resource tag applied. These hosts are scavenge-ready and are dynamically added to any resource group whose resource requirement is select (!mg && scvg).

  • egosh resource list -o status,ut,it,agent_control,uit_t,cu_t,cit_t host_name

    Lists the scavenge-related information for a host.

For host scavenging to work, a host needs both agent control set to on and the scavenge resource tag (scvg) applied. If a host is missing either one, the feature does not work properly.
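The monitoring checks above can be combined into one readiness test. The Python sketch below applies them to a host record. The record shape (a mapping with status, agent_control, and resources keys) and the name scavenging_problems are hypothetical; this is not the output format of egosh resource list. Accepting fastrelease as well as on is an assumption, based on fastrelease also enabling the agent.

```python
from typing import Any, List, Mapping


def scavenging_problems(host: Mapping[str, Any]) -> List[str]:
    """Return the reasons a host record is not scavenge-ready; empty if ready.

    Hypothetical record format, for illustration only.
    """
    problems = []
    if host.get("status") != "ok":
        problems.append(f"status is {host.get('status')!r}, not 'ok'")
    # Assumption: on and fastrelease both mean the agent is enabled.
    if host.get("agent_control") not in ("on", "fastrelease"):
        problems.append(f"agent_control is {host.get('agent_control')!r}")
    if "scvg" not in host.get("resources", ()):
        problems.append("scvg resource tag is missing")
    return problems


if __name__ == "__main__":
    ready = {"status": "ok", "agent_control": "on", "resources": ["scvg"]}
    half = {"status": "ok", "agent_control": "off", "resources": ["scvg"]}
    print(scavenging_problems(ready), scavenging_problems(half))
```

Reporting every missing condition, rather than a single yes or no, points directly at the fix: restart the agent, add the scvg tag, or investigate the host state.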

Commands to control

  • egosh ego elimrestart SA on host_name ... | all

    Turns the scavenging agent on for the specified hosts or, with all, for all hosts that have the "scvg" resource tag. Uses the default threshold values.

  • egosh ego elimrestart SA off host_name ... | all

    Turns the scavenging agent off for the local host (if no host is specified), for the specified hosts, or for all hosts (with the keyword all).

  • egosh ego elimrestart SA on,2,0.3,1.67 host_name ... | all

    Turns the scavenging agent on and sets the threshold values at the same time.

  • egosh ego elimrestart SA fastrelease host_name ... | all

    Turns the scavenging agent on with the grace period disabled (fastrelease). By default, the grace period is enabled.
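When these commands are scripted, building the argument list programmatically avoids typos in the threshold token. The Python sketch below assembles argv lists using only the command forms listed above; sa_command is a hypothetical helper, and it does not run egosh itself. The result could be passed to subprocess.run on a host where egosh is on the PATH.

```python
import shlex
from typing import List, Optional, Sequence, Tuple


def sa_command(mode: str,
               hosts: Sequence[str] = (),
               thresholds: Optional[Tuple[float, float, float]] = None,
               all_hosts: bool = False) -> List[str]:
    """Build argv for 'egosh ego elimrestart SA ...' (hypothetical helper).

    mode is on, off, or fastrelease; thresholds is (uit_t, cu_t, cit_t)
    and is only combined with on, as in the on,2,0.3,1.67 form.
    """
    if mode not in ("on", "off", "fastrelease"):
        raise ValueError(f"unknown mode: {mode!r}")
    if all_hosts and hosts:
        raise ValueError("pass either host names or all_hosts, not both")
    arg = mode
    if thresholds is not None:
        if mode != "on":
            raise ValueError("thresholds are only set together with 'on'")
        # Format values compactly: 2.0 -> "2", 0.3 -> "0.3".
        arg = ",".join([mode] + [f"{v:g}" for v in thresholds])
    argv = ["egosh", "ego", "elimrestart", "SA", arg]
    if all_hosts:
        argv.append("all")
    else:
        # For SA off, omitting host names targets the local host.
        argv.extend(hosts)
    return argv


if __name__ == "__main__":
    print(shlex.join(sa_command("on", ["host1"], (2, 0.3, 1.67))))
    print(shlex.join(sa_command("off", all_hosts=True)))
```

Keeping the mode and thresholds in one token mirrors the command syntax above, where a space inside on,2,0.3,1.67 would split it into separate arguments.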

Commands to display configuration

  • Hosts (List View): uit_t, cu_t, and cit_t

    View the thresholds set for each scavenge-ready host by using the Platform Management Console. Scroll to the right to see the values set for the user idle time (uit_t), CPU utilization (cu_t), and CPU idle time (cit_t) thresholds.