Client development guidelines

Type of program

A Symphony client does not need to be an end user program. It can be a middle-tier proxy server that passes multiple end users’ compute requests to a grid and returns the compute results to those users. It can also be a master application program, or even a service that is itself a compute-intensive component working with many other services.

Uninitialization

Once you uninitialize, all objects become invalid. For example, you can no longer create a session or send an input message.

Serialization and deserialization

Verify that your Message object serializes and deserializes its fields in the same order. If the read order differs from the write order, the fields are reconstructed with the wrong values.
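The following is a minimal sketch of the ordering rule, not Symphony API code. The PriceRequest class, its fields, and the serialize/deserialize method names are assumptions made for illustration; a plain in-memory byte stream stands in for the API's streams.

```python
import io
import struct


class PriceRequest:
    """Hypothetical message with two fields: an integer ID and a price."""

    def __init__(self, trade_id=0, price=0.0):
        self.trade_id = trade_id
        self.price = price

    def serialize(self, stream):
        # Write order: trade_id first, then price.
        stream.write(struct.pack("<i", self.trade_id))
        stream.write(struct.pack("<d", self.price))

    def deserialize(self, stream):
        # Read order mirrors the write order exactly: trade_id, then price.
        (self.trade_id,) = struct.unpack("<i", stream.read(4))
        (self.price,) = struct.unpack("<d", stream.read(8))


def round_trip(message):
    """Serialize a message and read it back into a fresh instance."""
    buffer = io.BytesIO()
    message.serialize(buffer)
    buffer.seek(0)
    copy = PriceRequest()
    copy.deserialize(buffer)
    return copy
```

A round-trip check like this one is a cheap unit test to keep next to every message class, because a swapped read order often fails silently rather than raising an error.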

A client can call different service methods

The Symphony API provides only a basic service invocation mechanism, through opaque input/output messages and an onInvoke() service call. If needed, you can build your own dispatch on top of this mechanism to call different methods in a service.
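One way to layer method calls on a single entry point is to carry a method name inside the opaque message and look it up in a dispatch table. The sketch below assumes a JSON-encoded message and hypothetical handler names (add, multiply); the on_invoke function is a stand-in for the service call, not the Symphony signature.

```python
import json


def add(args):
    return args["a"] + args["b"]


def multiply(args):
    return args["a"] * args["b"]


# Dispatch table: method name carried in the message -> handler function.
HANDLERS = {"add": add, "multiply": multiply}


def encode_request(method, args):
    """Client side: pack the method name and arguments into one opaque payload."""
    return json.dumps({"method": method, "args": args}).encode("utf-8")


def on_invoke(input_bytes):
    """Service side: stand-in for the single entry point; routes by method name."""
    request = json.loads(input_bytes.decode("utf-8"))
    handler = HANDLERS.get(request["method"])
    if handler is None:
        raise ValueError("unknown method: " + request["method"])
    result = handler(request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def decode_result(output_bytes):
    """Client side: unpack the result from the opaque output payload."""
    return json.loads(output_bytes.decode("utf-8"))["result"]
```

For example, decode_result(on_invoke(encode_request("add", {"a": 2, "b": 3}))) routes to the add handler.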

Ensure permissions when writing logs

When you run a client, make sure the client application has write permission for its current working directory, because it needs to write logs there. Alternatively, change the directory in which the client application writes logs in the api.log4j.properties file.

Client applications log to the current directory by default. This is defined in the $SOAM_HOME/conf/api.log4j.properties file on Linux, and in the %SOAM_HOME%\conf\api.log4j.properties file on Windows.

The client log is updated once the client attempts to access the Symphony API. Ensure your client application has write privileges to the directory specified in the properties file.

Threads and multithreading

Synchronous clients are not required to do anything special or even be aware of the API's use of threads. Asynchronous clients need only follow the basic rules for working in a multithreaded environment, such as not blocking the callback thread. For the most part, the API implementation hides its threading model from the developer.
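A common way to avoid blocking the callback thread is to have the callback only enqueue the result and let a separate worker thread do the slow processing. The sketch below is generic Python, not Symphony API code: on_response is a hypothetical callback name, and doubling each value stands in for real post-processing.

```python
import queue
import threading


def run_demo(outputs):
    """Simulate callbacks arriving and being handed off to a worker thread."""
    pending = queue.Queue()
    processed = []

    def on_response(task_output):
        # Runs on the (simulated) callback thread: enqueue and return at once.
        pending.put(task_output)

    def worker():
        # Runs on the client's own thread: free to do slow work.
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more results
                return
            processed.append(item * 2)  # stand-in for slow post-processing

    thread = threading.Thread(target=worker)
    thread.start()
    for output in outputs:
        on_response(output)
    pending.put(None)
    thread.join()
    return processed
```

With a single worker and a FIFO queue, results are processed in arrival order; with several workers, ordering would need to be handled explicitly.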

Memory management in the client for Java and .NET

Client and service applications handle a lot of data, so out-of-memory errors may occur.

To help control memory, you can do the following in your code:

  • When retrieving task output:

    • Retrieve task output asynchronously, using the Session Callback object.

    OR

    • Retrieve task output synchronously, in groups. For example, if you send 1000 large input messages, retrieve output in groups of 50 rather than all at once.

  • On both the client and service, set unused references to null.

  • When you have large data or many tasks, try to trigger the garbage collector periodically to collect unused objects and avoid running into memory issues.

  • For applications that consume large amounts of memory, consider implementing Symphony serialization instead of native serialization. Use the following guidelines to determine memory requirements for processing byte arrays and strings.

    • Memory requirements for byte arrays:

      Symphony serialization—3x the size of the byte array

      Native serialization—4x the size of the byte array

    • Memory requirements for strings:

      Symphony serialization—2x the size of the string in bytes. For example, if a string has 10 characters = 20 bytes, it will require 40 bytes.

      Native serialization—2.5x the size of the string in bytes. For example, if a string has 10 characters = 20 bytes, it will require 50 bytes.
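The multipliers above can be applied as simple arithmetic. The helper below is a sketch: the function name and the dictionary keys are assumptions, and the factors are the guideline values listed above, not measured figures.

```python
# Guideline multipliers from the list above: (serialization, data kind) -> factor.
MULTIPLIERS = {
    ("symphony", "bytes"): 3.0,
    ("native", "bytes"): 4.0,
    ("symphony", "string"): 2.0,
    ("native", "string"): 2.5,
}


def estimated_memory_bytes(payload_bytes, serialization, kind):
    """Estimate the memory needed to process a payload of the given size.

    payload_bytes is the size of the byte array, or the size of the string
    in bytes (for example, 10 characters = 20 bytes).
    """
    return payload_bytes * MULTIPLIERS[(serialization, kind)]
```

For the 20-byte string in the examples above, this gives 40 bytes with Symphony serialization and 50 bytes with native serialization.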

Recoverable clients

A recoverable client is a client that can tolerate an abnormal termination of its execution and is able to recover and continue to process workload. Recovery usually involves restarting the client and giving it enough context to connect to and open the existing session that contained its workload. For this type of client, it is usually recommended to set the discardResultsOnDelivery attribute to “false” in the application profile, which simplifies the recovery procedure.

Large number of tasks

For large numbers of tasks, for example 100,000 tasks in one session, you can get better performance by retrieving output asynchronously in a callback function. If you prefer to retrieve output synchronously, retrieve it in smaller groups. For example, you get better performance if you retrieve output for 10,000 tasks at a time instead of retrieving output for all 100,000 tasks with one fetchTaskOutput() call.
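The grouping itself is simple arithmetic. The helper below is a sketch with a hypothetical name; it only computes how many outputs to request in each synchronous retrieval, and the retrieval call itself is not shown.

```python
def batch_counts(total_tasks, batch_size):
    """Yield the number of task outputs to request in each retrieval."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remaining = total_tasks
    while remaining > 0:
        count = min(batch_size, remaining)
        yield count
        remaining -= count
```

For 100,000 tasks and a group size of 10,000, this yields ten requests of 10,000 outputs each; a total that does not divide evenly ends with one smaller group.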

How many sessions to create

There are different ways to manage a Symphony session. You can:

  • Close a done session immediately

  • Keep the idle session open

Close a done session immediately

The simplest way is to create a session, tightly pack all the tasks in, get all the outputs out, then close the done session.

Keep an idle session open for quick responsiveness of loosely-packed tasks

For applications that have very short tasks and tasks that come in periodically, creating a session for every discrete task pack or every task is not efficient because a session is a heavier scheduling unit than a task. It takes longer to create and close a session than to send a task within an existing session.

For this type of application, create a session and keep it open even if you do not have tasks for a short time period. This way the system responds much faster.

When there is no task in a session, Symphony immediately moves the service instances from the idle session to other busy sessions.

It is worth noting that it is not a good idea to keep an idle session open for too long, because open sessions occupy system resources. As a best practice, close a session if it is idle for too long. (The appropriate length of idle time should be determined by the developer.)
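One way to apply this practice is to track the time of the last task activity and close the session once a chosen idle limit is exceeded. The class below is a sketch: its name, its methods, and the idle limit are assumptions, and the call that actually closes the session is left to the client.

```python
import time


class IdleSessionTracker:
    """Decide when an open session has been idle for too long."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self._max_idle_seconds = max_idle_seconds
        self._clock = clock  # injectable so the policy can be tested
        self._last_activity = clock()

    def record_activity(self):
        """Call whenever a task is sent or an output is received."""
        self._last_activity = self._clock()

    def should_close(self):
        """Return True once the idle time reaches the configured limit."""
        return self._clock() - self._last_activity >= self._max_idle_seconds
```

A client would call record_activity() around each send or receive, check should_close() periodically, and close the session when it returns True.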

Smart pointers

A smart pointer is an object that encapsulates a real reference. When an object is no longer required, a smart pointer frees it. As a developer, you need not be concerned about catching problems like memory leaks.

In Symphony, you use smart pointers for all objects that are not user-implemented. You never need to clean up an object that is created with the API. When objects are out of scope, they clean up themselves.

Remember:

Smart pointers do not exist for objects that are user-implemented such as service containers, messages, and common data. For these objects, you still need to free up memory and manage it.

Data

Limits on message size

Symphony has no hard limit for the message size other than the physical limits imposed by systems outside Symphony. These limits may be determined by the size of the physical and virtual memory, and the operating system. The maximum data size also depends on the type of serialization used. Here is a guideline:

Note:

Available memory refers to the usable physical memory at the moment, and not just the manufacturer's specification for the memory.

Windows limits

  • Symphony Serialization: 500 MB maximum data size, due to the application memory limit of 2 GB for a 32-bit host; requires at least 2 GB of available memory on the host to support this.

  • Native Serialization: 400 MB maximum data size, due to the application memory limit of 2 GB for a 32-bit host; requires at least 2 GB of available memory on the host to support this.

Linux limits

  • Symphony Serialization: 800 MB maximum data size, provided the host has at least 3 GB of available memory.

  • Native Serialization: 800 MB maximum data size, provided the host has at least 3.5 GB of available memory.

Optimum ratio of task message size to task compute time

The optimum ratio of task message size to task compute time depends on the network bandwidth and the performance target you want to achieve.

In a normal-size grid (100 Mbps or 1 Gbps network, 500 CPUs), the best practices for achieving more than 90% CPU efficiency are:

  • If the ratio of task data size to task compute time is less than 10 KB/second, send the task data by value; otherwise, send the task data by reference.

  • If the task data sending time is less than 10% of the task compute time, send the task data by value; otherwise, send the task data by reference.
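The two rules can be written as small checks. This is a sketch under stated assumptions: the function names are hypothetical, 1 KB is taken as 1024 bytes, and the sending time is approximated as data size divided by raw network bandwidth, ignoring protocol overhead and contention.

```python
def by_value_by_ratio(data_bytes, compute_seconds, limit_bytes_per_second=10 * 1024):
    """Rule 1: send by value if data size / compute time is below 10 KB/second."""
    return data_bytes / compute_seconds < limit_bytes_per_second


def by_value_by_send_time(data_bytes, compute_seconds, bandwidth_bits_per_second):
    """Rule 2: send by value if the sending time is under 10% of the compute time."""
    send_seconds = data_bytes * 8 / bandwidth_bits_per_second
    return send_seconds < 0.10 * compute_seconds
```

For example, a task with 50 KB of input and 10 seconds of compute time has a ratio of 5 KB/second, so the first rule says to send it by value. A task with 10 MB of input and 1 second of compute time on a 100 Mbps network needs about 0.8 seconds to send, so the second rule says to send it by reference.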

Distributing data among tasks: by value or by reference

A Symphony session manager is responsible for managing and scheduling sessions, services, and tasks. Overloading a session manager with data is not a good idea because it slows task distribution across compute hosts. As a best practice, decide for each kind of data whether to pass it by value or by reference.

Pass-by-value with sendTaskInput() to pass small, task-specific data

Use sendTaskInput() only to pass small amounts of task-specific data.

If the task-specific input and output data is small, a Symphony client or service can pass the input and output data by value. The client sends the data value in the Symphony task message through the session manager. The service gets the data value through the Symphony message from the session manager.

Pass-by-value with common data if the dataset resides in a client

If the shared market dataset resides in a client, the client can distribute the data with session common data. The data is distributed to the service instance when the service instance is assigned to the session. The service instance can access the common data from the onSessionEnter() method. The client can update the common data by using the update( ) method.

Service instances can cache the data in memory or the local disk for multiple tasks. You only need to use the sendTaskInput() call to pass small task-specific data.

Pass-by-reference with common data for large data or when dataset resides in a shared location

If the shared market dataset resides in a shared location such as a database, file system, or cache system, the client can distribute a reference to the shared data with session common data. The reference to the dataset is distributed to the service instance when the service instance is assigned to the session. The service instance can access the common data from the onSessionEnter() method. The client can update the common data by using the update( ) method.

Service instances can load the data from the shared location and cache it in memory or the local disk for multiple tasks.

You only need to use the sendTaskInput() call to pass small task-specific data.
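The sketch below illustrates the pass-by-reference pattern in plain Python, not Symphony API code. The PricingService class, the dataset_path key, and the JSON file are assumptions: the common data carries only a file path, the service loads the dataset once when it enters the session, and each task input stays small.

```python
import json
import os
import tempfile


class PricingService:
    """Hypothetical service that caches a shared dataset for the whole session."""

    def __init__(self):
        self._dataset = None
        self.load_count = 0  # how many times the shared data was loaded

    def on_session_enter(self, common_data):
        # The common data holds a reference (a path), not the dataset itself.
        with open(common_data["dataset_path"], "r", encoding="utf-8") as handle:
            self._dataset = json.load(handle)
        self.load_count += 1

    def on_invoke(self, task_input):
        # Each task carries only small, task-specific data.
        return self._dataset[task_input["instrument"]] * task_input["quantity"]


def demo():
    """Write a tiny dataset to a temporary shared location and run two tasks."""
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "market.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"IBM": 2.5}, handle)
    service = PricingService()
    service.on_session_enter({"dataset_path": path})
    results = [
        service.on_invoke({"instrument": "IBM", "quantity": 4}),
        service.on_invoke({"instrument": "IBM", "quantity": 2}),
    ]
    return results, service.load_count
```

Both tasks reuse the cached dataset, so the shared location is read once per session rather than once per task.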

Using external data sources

In addition to the session common data and task input/output data, service instances and tasks can also receive input data from other data sources and save output data to other data destinations. These data sources and destinations can be a database, a file server, a cache system, or even the client application itself.

Data loss prevention

In a grid environment, there may be hundreds or thousands of compute hosts distributed in a cluster. In a typical risk management application, there may be hundreds of thousands of perturbations of market data/conditions. Each one of these can be a workload unit.

When you submit this workload to a grid, you expect the grid system to distribute the workload across the grid and to guarantee processing without losing any workload, even if there are hardware or software failures in:

  • Grid management machines or software

  • Compute machines and service applications

A reliable grid system should guarantee a transactional handling of application execution on the grid. A failure or even an entire system reboot should not require rerunning the workload from the beginning.

One problem with a traditional MPI-based parallel application is that when there is a failure in a distributed environment, the application may fail and need to be rerun from the beginning. Rerunning a large part of the workload, or the entire workload, not only wastes time and resources, but may also miss the time window for business opportunities.

Add recovery with recoverable sessions

Platform Symphony supports reliable computing by persisting Symphony session and task inputs and outputs. However, sometimes you may not want to recover your workload when a failure or error happens, or you may want to trade persistency for performance: task persistency takes time and disk space, and may slow down the overall system response time.

You can define whether a session is recoverable or non-recoverable in the application profile through the session type. In the client application, you can then specify the appropriate session type in createSession().

Choose a recoverable session when

  • You have a long session that may last hours to compute many CPU-intensive tasks, and you do not want to waste CPU cycles to resubmit tasks in the session if a failure or error occurs.

  • It is difficult or impossible to resubmit tasks in the session when a failure or error occurs.

  • You have a mission-critical session that has to be finished before a deadline.

Choose a non-recoverable session when

  • You have a short session that may only last for minutes, and you can always create a new session to resubmit tasks if a failure or error occurs.

  • You want Symphony to immediately clean up the session and release the CPUs if a failure or error happens. Keeping this session running in the system is just a waste of CPU cycles.

  • You have an interactive online session that requires quick response time.

Implement application-level checkpointing for sessions

If you have long running tasks, you may not want to rerun a task from the beginning in case of failure.

A good practice is to have a long-running task persist its intermediate results periodically, for example every 10 minutes, so that when Symphony reruns the task, it can continue from the last persisted intermediate results.

You need a persistent shared location, such as a persistent shared data cache or a shared file system, because a task may be rerun on a different machine than before.

Once a task can persist its intermediate results, you can perform application-level checkpointing by suspending the session.

A service instance can get an interrupt event by calling serviceContext.getLastInterruptEvent(), and use the grace period to persist intermediate results in a persistent shared location. Later, either when the whole suspended session is resumed or when the unfinished task is redispatched, another service instance picks up the task and restores the intermediate results from the shared location.
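The checkpoint-and-resume idea can be sketched without any Symphony calls. In the example below, everything is an assumption made for illustration: the task sums the integers below total_steps, a local JSON file stands in for the persistent shared location, a step count stands in for the time interval, and SimulatedFailure stands in for a host failure or interrupt.

```python
import json
import os
import tempfile


class SimulatedFailure(Exception):
    """Stands in for a host failure that stops the task partway through."""


def save_checkpoint(state, path):
    # Write to a temporary file first, then rename, so a failure during the
    # write never leaves a half-written checkpoint behind.
    temp_path = path + ".tmp"
    with open(temp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(temp_path, path)


def load_checkpoint(path):
    # Start from scratch when no checkpoint has been persisted yet.
    if not os.path.exists(path):
        return {"next_step": 0, "partial_sum": 0}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def run_task(total_steps, checkpoint_path, checkpoint_every=3, fail_at_step=None):
    """Sum the integers below total_steps, resuming from the last checkpoint."""
    state = load_checkpoint(checkpoint_path)
    while state["next_step"] < total_steps:
        if fail_at_step is not None and state["next_step"] == fail_at_step:
            raise SimulatedFailure("failed before step %d" % fail_at_step)
        state["partial_sum"] += state["next_step"]
        state["next_step"] += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(state, checkpoint_path)
    return state["partial_sum"]
```

If a first run fails before step 7, the last checkpoint holds the state after step 5, and a rerun with the same checkpoint path repeats only the work done since then. In a real deployment the checkpoint path must point to storage that every compute host can reach.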