Using standby services to reduce service startup times

Standby services minimize the need to restart services at the time resources are allocated to an application by allowing these services to run idle when there is no workload.

Scope


Applicability

Details

Operating system

  • Windows

  • Linux

  • Solaris

Limitations

The standby service feature is not supported by Symphony DE, which ignores any standby service configuration.


About standby services

To maximize the utilization of resources, Symphony, by default, releases resources as soon as there are no tasks pending. Each time the resources are released, service instances on these resources are terminated. When more tasks are received, new resources are allocated and the services start again. Sometimes starting up a service takes much longer than the actual run time of the workload; for time-critical workload, this may not be acceptable.

Standby services minimize the need to restart services when resources are allocated to an application by allowing these services to keep running. Standby services also allow other consumers to use these resources when there is no workload for the running service. This is possible because standby services do not occupy slots, so EGO can allocate the resources to other applications. Once a service instance is associated with a slot and is used to run tasks, it is no longer considered a standby service.

Standby services are started on a resource only when the application has workload to process. As a result, even if all the resources in a resource group are configured to run standby services, some of them may not have a standby service running. Once a standby service is started, it remains running until the application is unregistered or disabled.

System behavior when applications are configured with standby services

This section describes Symphony behavior during the lifecycle of a standby service.

  1. The lifecycle begins when an application is registered and enabled.

  2. The Session Director reads the application profile and starts the Session Manager (SSM) for the application.

  3. When the SSM receives workload for the application, the SSM requests resources from EGO.

  4. For each resource received, the SSM sends the service information to the SIM and the SIM starts the standby service instance.

  5. When the workload is finished, the SSM releases the slot to EGO but keeps the standby activity alive for the service. For applications with multiple services, the standby activity for the default service is kept alive.

  6. EGO unallocates the slot but keeps the standby service running.

  7. If the SSM receives new workload, EGO allocates the resources that have the standby services running.

  8. When the SSM receives the resource allocations from EGO, it associates the resource with the SIM already running on the resource, rather than starting a new SIM. The activities are reassociated with the allocations.

  9. The standby services are shut down when the application is disabled or unregistered.
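The lifecycle above can be sketched as a small state model. This is only an illustration, not Symphony code: the names Host, ServiceState, allocate, release, and disable are hypothetical, and the SSM, SIM, and EGO are collapsed into method calls on a single host.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceState(Enum):
    NOT_STARTED = "not_started"  # no workload has reached this host yet
    BUSY = "busy"                # bound to a slot and running tasks
    STANDBY = "standby"          # process alive, no slot occupied
    STOPPED = "stopped"          # application disabled or unregistered


@dataclass
class Host:
    """Hypothetical stand-in for one resource that can run the service."""
    name: str
    state: ServiceState = ServiceState.NOT_STARTED
    starts: int = 0  # how many times the service process was launched

    def allocate(self) -> None:
        """Steps 4, 7, 8: bind a slot; launch the service only if none is alive."""
        if self.state is ServiceState.STOPPED:
            raise RuntimeError("application is disabled or unregistered")
        if self.state is ServiceState.NOT_STARTED:
            self.starts += 1
        self.state = ServiceState.BUSY

    def release(self) -> None:
        """Steps 5 and 6: return the slot but keep the service alive."""
        if self.state is ServiceState.BUSY:
            self.state = ServiceState.STANDBY

    def disable(self) -> None:
        """Step 9: shut the service down."""
        self.state = ServiceState.STOPPED


def run_bursts(host: Host, bursts: int) -> Host:
    """Send `bursts` separate waves of workload through one host."""
    for _ in range(bursts):
        host.allocate()
        host.release()
    return host
```

In this model, run_bursts(Host("hostA"), 3) launches the service once and leaves it in the STANDBY state, which is the saving that standby services are meant to provide.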

System behavior when applications with standby services share resources

When EGO allocates resources to applications, it first searches through the resource groups that are dedicated to standby services. Therefore, it is recommended that the consumer own all the resources in a resource group dedicated to standby services, as this guarantees that the resources with standby services running will be available to the consumer when they are needed. If the consumer has unsatisfied demand and it previously lent out resources to another consumer, the lending consumer recalls the resources that it owns.

The resource plan for consumers with standby services should not allow borrowing, as this would not guarantee the availability of a resource with the standby service running when EGO allocates the resource to the application.

Failure recovery

This section describes system behavior pertaining to the recovery of standby services in the event of a Symphony component failure.

EGO failure

After recovery, all the information related to standby services is restored by EGO. All the allocations and activities are recovered, including the activities without slots allocated.

SIM failure

If the SSM detects a standby SIM failure, the number of slots with standby services in the system decreases by the number of slots affected by the failure. There is no request to restart the standby SIM immediately. When workload is submitted, the SSM requests the necessary resources and then consumes the ones with standby services first. If the resources demanded by the workload consume all the existing standby services within the system, the SSM requests EGO to start new SIMs, which start new services. After the workload completes, the SSM returns the resources and keeps the SIMs and their service instances in standby mode.
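A minimal sketch of this behavior, assuming hosts are tracked as plain lists of names. The functions drop_failed and plan_allocation are hypothetical helpers written for illustration; they do not correspond to any Symphony or EGO API.

```python
def drop_failed(standby_hosts, failed_hosts):
    """Forget standby SIMs that failed; nothing is restarted at this point."""
    failed = set(failed_hosts)
    return [h for h in standby_hosts if h not in failed]


def plan_allocation(demand, standby_hosts, other_hosts):
    """Split `demand` slots into reused standby hosts and hosts needing a new SIM.

    Standby hosts are consumed first; new SIMs are started only for the
    shortfall that the remaining standby hosts cannot cover.
    """
    if demand < 0:
        raise ValueError("demand must be non-negative")
    reused = standby_hosts[:demand]
    shortfall = demand - len(reused)
    new_sims = other_hosts[:shortfall]
    return reused, new_sims


# One of three standby SIMs fails, then a workload needing 3 slots arrives.
standby = drop_failed(["h1", "h2", "h3"], ["h2"])
reused, new_sims = plan_allocation(3, standby, ["h4", "h5"])
```

Here the two surviving standby hosts are reused and one new SIM is started on h4; the failed SIM is replaced only when demand requires it.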

Standby service failure

Because the SIM does not monitor the service instance while it is idle, the SIM does not detect a standby service that has stopped running. When the SIM is next assigned to a session that wants to use the standby service, the SIM must start a new service instance before it can submit tasks to it.

Standby services and preloaded services - what’s the difference?

Symphony offers two ways to handle services with long start-up times: standby services and preloaded services. Preloaded services are started before workload is submitted by the client and require the resource to remain allocated to the application. Consequently, the resource cannot be shared with other consumers. A resource with standby services running, on the other hand, is only allocated to an application when there is workload to be processed. Once workload is finished, the resource is released (with the service running in standby mode), and made available to other consumers.

Standby services can be combined with preloaded services. In this case, an application can be configured to have a number of slots for preloaded services in addition to standby services. When the application receives workload, if the number of preloaded services cannot satisfy the demand, the SSM requests new resources from EGO and uses the standby services to supplement the requested resources.
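The combined case can be illustrated with simple arithmetic. The function split_demand below is a hypothetical helper, and the "cold_start" bucket (slots that need a freshly started service once both preloaded and standby services are used up) is an assumption added for completeness.

```python
def split_demand(demand, preloaded, standby):
    """Show which kind of service instance would serve each requested slot.

    Preloaded services are used first, standby services supplement them,
    and any remainder is assumed to need a newly started service.
    """
    if min(demand, preloaded, standby) < 0:
        raise ValueError("all counts must be non-negative")
    from_preloaded = min(demand, preloaded)
    extra = demand - from_preloaded
    from_standby = min(extra, standby)
    return {
        "preloaded": from_preloaded,
        "standby": from_standby,
        "cold_start": extra - from_standby,
    }
```

For example, with 4 preloaded services and 3 standby services, a demand of 10 slots would be served by 4 preloaded services, 3 standby services, and 3 newly started services under these assumptions.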

When to use standby services

Standby services are recommended for environments where the tasks are sent intermittently and the service startup time is relatively long in comparison to the running time of the tasks. By using this feature, users can reduce the impact of starting services on the overall task turnaround time.
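As a rough worked example, the effect on turnaround time can be estimated as follows. The numbers are illustrative only, and the model is a simplification that ignores scheduling and network latency; turnaround and startup_share are hypothetical helper names.

```python
def turnaround(startup_s, run_s, service_already_running):
    """Task turnaround in seconds; startup is paid only on a cold start."""
    return run_s if service_already_running else startup_s + run_s


def startup_share(startup_s, run_s):
    """Fraction of a cold-start turnaround that is spent starting the service."""
    return startup_s / (startup_s + run_s)


# Illustrative figures: a service that takes 30 s to start, tasks that run 2 s.
cold = turnaround(30, 2, service_already_running=False)
warm = turnaround(30, 2, service_already_running=True)
share = startup_share(30, 2)
```

With these figures, a cold start takes 32 seconds against 2 seconds with a standby service, so almost all of the cold-start turnaround is startup overhead.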

Here are additional considerations when deciding if standby services are the best choice:

  • Because standby services keep running and occupy host resources such as memory, they are not recommended for services with nominal startup times or for memory-intensive services, which can make the host unusable by other applications. In this case, it is recommended to use preloaded services to retain the resources and keep the services running.

  • Does this application's resource plan entitle it to own (ownership) or deserve (share ratio) a number of slots? If the answer is no, then it is not suitable for standby services.

  • Do other applications need the selective reclaim feature? If the answer is yes, do not use standby services, because a standby service configuration prevents selective reclaim from taking effect.
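These considerations can be collected into a simple checklist. The sketch below is a hypothetical helper, not part of Symphony; each argument is a yes/no answer that you supply yourself, since the document does not define numeric thresholds for "long" startup or "memory-intensive".

```python
def reasons_against_standby(long_startup, memory_intensive,
                            owns_or_deserves_slots,
                            others_need_selective_reclaim):
    """Return the list of considerations that argue against standby services."""
    reasons = []
    if not long_startup:
        reasons.append("service startup time is nominal")
    if memory_intensive:
        reasons.append("service is memory-intensive; consider preloaded services")
    if not owns_or_deserves_slots:
        reasons.append("resource plan gives the application no owned or deserved slots")
    if others_need_selective_reclaim:
        reasons.append("other applications need selective reclaim")
    return reasons
```

An empty list means none of the considerations rules out standby services; any entry is a point to review before enabling the feature.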

Configuring standby services: best practices

Follow these best practices to ensure that resources with standby services running are available when the application requires them.

  1. Create a resource group exclusively for the application’s standby services.

  2. The application’s consumer should own all the resources in the resource group.

  3. Enable lending and disable borrowing in the consumer’s resource plan.
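The three practices can be expressed as a quick sanity check. The StandbyPlan structure below is a hypothetical summary of a consumer's settings made up for this sketch; it does not mirror the actual resource plan or application profile format.

```python
from dataclasses import dataclass


@dataclass
class StandbyPlan:
    """Hypothetical summary of the settings relevant to standby services."""
    group_is_dedicated: bool   # practice 1: group used only for standby services
    group_slots: int           # total slots in the resource group
    owned_slots: int           # practice 2: slots the consumer owns in the group
    lending_enabled: bool      # practice 3
    borrowing_enabled: bool    # practice 3


def check_plan(plan: StandbyPlan) -> list:
    """Return a list of deviations from the three best practices."""
    problems = []
    if not plan.group_is_dedicated:
        problems.append("resource group is not exclusive to standby services")
    if plan.owned_slots < plan.group_slots:
        problems.append("consumer does not own all resources in the group")
    if not plan.lending_enabled:
        problems.append("lending is disabled")
    if plan.borrowing_enabled:
        problems.append("borrowing is enabled")
    return problems
```

For instance, check_plan(StandbyPlan(True, 8, 8, True, False)) returns an empty list, while a plan that owns only 6 of the 8 slots and allows borrowing reports two deviations.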