Transmission Control Protocol/Internet Protocol

New Socket Options Supported

TCP/IP native stack support adds socket options on the setsockopt API call that were not previously available. These new options give application programmers more capability and can improve application performance. The new options are:

SO_RCVBUF
Sets the receive buffer size for the socket.

SO_RCVLOWAT
Sets the receive low-water mark for the socket.

SO_RCVTIMEO
Sets the receive timeout value for the socket.

SO_SNDBUF
Sets the send buffer size for the socket.

SO_SNDLOWAT
Sets the send low-water mark for the socket.

SO_SNDTIMEO
Sets the send timeout value for the socket.
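
Each of these options is set with the setsockopt API call. The following minimal sketch, written against standard BSD-style C headers and assuming an existing socket descriptor (the variable and function names are illustrative only), shows the general pattern for one of the options, SO_RCVBUF; the other options are set the same way, substituting the option name and the appropriate value type.

   #include <sys/socket.h>
   #include <stdio.h>

   /* Set the receive buffer size for an existing socket descriptor.
    * SO_RCVBUF takes an integer value giving the size in bytes.     */
   int set_receive_buffer(int sock, int bytes)
   {
       if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                      (char *)&bytes, sizeof(bytes)) < 0) {
           perror("setsockopt(SO_RCVBUF) failed");
           return -1;
       }
       return 0;
   }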

Send Buffer and Receive Buffer Sizes

The SO_RCVBUF option sets the receive buffer size and the SO_SNDBUF option sets the send buffer size. These options allow you to limit the amount of IP message table (IPMT) storage used by a given socket and are a primary flow control mechanism.

For TCP sockets, each packet that is sent includes a window size indicating how much more data the remote end is allowed to send. The TPF system sets the window size to the amount of available space in the receive buffer of the socket. When the TPF application can process the data faster than it is sent, flow control is not really an issue. However, if the data arrives faster than the TPF application can process it, the rate at which the remote end sends data needs to be controlled.

Set the receive buffer size to a value just large enough that the TPF application always has data to process; that is, when the TPF application issues a read API call, data is available. The goal is for data to arrive just fast enough that the next piece of data is always ready when the TPF application requests (reads) the next message. Setting the receive buffer size too large can cause the IPMT to become full because many messages will be queued while waiting for the TPF application to process them.

For UDP sockets, controlling the send and receive buffer sizes is the only flow control mechanism available unless you include application-level acknowledgments into your application design.
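
As an illustration of this point, the following sketch caps both buffers for a UDP socket whose only flow control is the buffer sizes themselves. The 32 KB values are arbitrary and would be tuned to the message sizes and arrival rate of the application; the headers and names are the same illustrative assumptions as in the earlier sketch.

   #include <sys/socket.h>
   #include <stdio.h>

   /* Limit the IPMT storage that a UDP socket can consume by capping
    * both its send buffer and its receive buffer.                    */
   int tune_udp_buffers(int sock)
   {
       int rcv = 32 * 1024;   /* bounds queued inbound datagrams        */
       int snd = 32 * 1024;   /* bounds datagrams waiting to be sent    */

       if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                      (char *)&rcv, sizeof(rcv)) < 0 ||
           setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                      (char *)&snd, sizeof(snd)) < 0) {
           perror("setsockopt buffer size failed");
           return -1;
       }
       return 0;
   }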

Timeouts

Before TCP/IP native stack support, select was the only socket API call with a timeout capability. For all other socket API calls, if the socket was running in blocking mode, control was not returned to the application until the operation completed successfully, which could take minutes or even hours on a read API call (for example, if the remote end had no data to send).

The SO_RCVTIMEO option defines the receive timeout value, which is how long the TPF system waits for certain socket API calls to be completed before the operation times out. The SO_RCVTIMEO value is used for socket API calls that are waiting for data to arrive. These include read, recv, recvfrom, activate_on_receipt, activate_on_receipt_with_length, accept, activate_on_accept, and connect. For example, assume the SO_RCVTIMEO value for a socket is 5 seconds. If a read API call is issued for the socket and no data arrives in 5 seconds, control will be passed back to the application with a return code indicating that the operation timed out.

The SO_SNDTIMEO option defines the send timeout value, which is how long the TPF system waits for send-type API calls to be completed before the operation times out. These include send, sendto, write, and writev. A send-type operation is blocked when there is not enough room in the send buffer of the socket to build the packets for the new data passed on this send-type API call. For TCP sockets, this can happen when you send data faster than the remote end can process it.

The default for both the SO_RCVTIMEO and SO_SNDTIMEO values is 0, which means do not time out. If your application does not change these values, the code will operate as it did before TCP/IP native stack support. If your application does change the SO_RCVTIMEO or SO_SNDTIMEO option to a nonzero value, the application must be prepared to get back a timeout return code.
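
A sketch of the 5-second receive timeout described above follows. It assumes the option value is passed as a struct timeval and that a timed-out read fails with errno set to EWOULDBLOCK, as in typical sockets implementations; check the TPF sockets reference for the exact value type and return code.

   #include <sys/socket.h>
   #include <sys/time.h>
   #include <unistd.h>
   #include <errno.h>
   #include <stdio.h>

   /* Read with a 5-second receive timeout.  If the read times out,
    * the call returns -1 and errno identifies the timeout case.    */
   ssize_t read_with_timeout(int sock, char *buf, size_t len)
   {
       struct timeval tv;
       ssize_t rc;

       tv.tv_sec  = 5;    /* give up if no data arrives within 5 seconds */
       tv.tv_usec = 0;

       if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO,
                      (char *)&tv, sizeof(tv)) < 0) {
           perror("setsockopt(SO_RCVTIMEO) failed");
           return -1;
       }

       rc = read(sock, buf, len);
       if (rc < 0 && errno == EWOULDBLOCK) {
           /* No data arrived within 5 seconds; the application must
            * be prepared to handle this timeout case.               */
       }
       return rc;
   }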

Low-Water Marks

When a read API call is issued for a TCP socket, the application specifies the maximum amount of data to read. If, for example, the maximum amount of data to read is x, the application could receive x or fewer bytes of data on its read API call because TCP does not have any concept of a message. If the application wants exactly x bytes of data, it often has to issue multiple read calls because the data is received in multiple packets from the network. For example, the TPF application issues a read call specifying a maximum length of 10 000 bytes. The remote application sends 10 000 bytes of data; however, the data is sent as five 2000-byte packets. When the first packet arrives, the read call is completed and the application is passed 2000 bytes. The application must then issue another read call, specifying a maximum of 8000 bytes this time to read in the remaining data. Depending on the timing of when the packets arrive, the application might have to issue five read calls to read in all of the data.
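
For contrast, the multi-read pattern just described looks roughly like the following sketch, which keeps issuing read calls until the full 10 000 bytes have arrived (the length and names are illustrative).

   #include <sys/types.h>
   #include <unistd.h>

   #define MSG_LEN 10000   /* the application expects exactly 10 000 bytes */

   /* Issue as many read calls as the packet arrival pattern requires
    * (up to five in the example above) until the message is complete. */
   int read_full_message(int sock, char *buf)
   {
       size_t total = 0;

       while (total < MSG_LEN) {
           ssize_t n = read(sock, buf + total, MSG_LEN - total);
           if (n <= 0)
               return -1;     /* error, timeout, or connection closed */
           total += (size_t)n;
       }
       return 0;
   }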

The SO_RCVLOWAT option allows a TCP application to indicate the minimum amount of data to pass to the application. Using the same example, the application would set the SO_RCVLOWAT value to 10 000 and issue one read call. The TCP/IP native stack support code will wait for 10 000 bytes of data to arrive and then pass all 10 000 bytes to the application. This reduces the number of socket API calls issued.
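
With the low-water mark set, the same message can be read with a single call, as in the sketch below. It assumes SO_RCVLOWAT takes an integer byte count and that the receive buffer of the socket is large enough to hold the complete message.

   #include <sys/socket.h>
   #include <unistd.h>
   #include <stdio.h>

   #define MSG_LEN 10000

   /* Ask the stack not to complete a read until 10 000 bytes are
    * available, then read the whole message with one call.        */
   ssize_t read_whole_message(int sock, char *buf)
   {
       int lowat = MSG_LEN;

       if (setsockopt(sock, SOL_SOCKET, SO_RCVLOWAT,
                      (char *)&lowat, sizeof(lowat)) < 0) {
           perror("setsockopt(SO_RCVLOWAT) failed");
           return -1;
       }
       /* The read does not complete until at least MSG_LEN bytes arrive. */
       return read(sock, buf, MSG_LEN);
   }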

A socket application uses the select for write API to see if a socket is writable. If there is at least 1 byte of available space in the send buffer of the socket, the socket is considered writable. If an application has a 4000-byte message to send, it really wants to know if there are at least 4000 bytes available in the send buffer of the socket. The SO_SNDLOWAT option allows the application to set the minimum amount of space that must be available in the send buffer before a select for write operation will consider the socket to be writable.

The default values for the SO_RCVLOWAT and SO_SNDLOWAT options are both set to 1. This means that if your application does not change these values, the code will operate as it did before TCP/IP native stack support.
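
The 4000-byte example above might be coded roughly as follows; the sketch assumes SO_SNDLOWAT takes an integer byte count and uses the standard select interface, with illustrative names throughout.

   #include <sys/socket.h>
   #include <sys/select.h>
   #include <stdio.h>

   #define MSG_LEN 4000   /* the message to be sent in one piece */

   /* Consider the socket writable only when the send buffer has room
    * for the entire message, then wait for that condition.          */
   int wait_until_writable(int sock)
   {
       int lowat = MSG_LEN;
       fd_set writefds;

       if (setsockopt(sock, SOL_SOCKET, SO_SNDLOWAT,
                      (char *)&lowat, sizeof(lowat)) < 0) {
           perror("setsockopt(SO_SNDLOWAT) failed");
           return -1;
       }

       FD_ZERO(&writefds);
       FD_SET(sock, &writefds);

       /* Blocks until at least MSG_LEN bytes of send buffer space are free. */
       return select(sock + 1, NULL, &writefds, NULL, NULL);
   }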

activate_on_accept API

The new activate_on_accept API call is available for sockets that use TCP/IP native stack support. This API call performs the same function as the accept API call, but does so without tying up any ECBs while waiting for remote clients to connect.

When a TCP server application issues an accept API call, the ECB is suspended until a remote client connects or until the accept operation times out (if the SO_RCVTIMEO option is enabled on the listener socket). In addition, many server designs have an ECB in a loop: it issues an accept call, passes the connection to a new ECB, and then issues another accept call. This results in long-running ECBs. If you have many TCP server applications active, you can also end up with many suspended ECBs, all waiting for the accept call to be completed successfully.
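
The long-running design described above looks roughly like the following sketch. Only standard accept calls are shown; the step that hands the new connection to another ECB is left as a comment because that mechanism is specific to the server design.

   #include <sys/socket.h>
   #include <netinet/in.h>

   /* Traditional server loop: the ECB running this code is suspended in
    * accept between connections and never exits while the server runs. */
   void accept_loop(int listener)
   {
       for (;;) {
           struct sockaddr_in client;
           socklen_t len = sizeof(client);

           /* Suspends this ECB until a remote client connects (or until
            * the accept times out, if SO_RCVTIMEO is set on the listener). */
           int conn = accept(listener, (struct sockaddr *)&client, &len);
           if (conn < 0)
               continue;   /* timeout or error: issue accept again */

           /* Pass 'conn' to a new ECB to service the client, then loop
            * back and suspend in accept again.                         */
       }
   }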

The activate_on_accept API function allows a TCP server application to indicate which TPF program to activate when a remote client connects. Control is always returned immediately to the ECB that issued activate_on_accept, allowing it to exit. While the TPF system waits for a remote client to connect, no ECBs are tied up. When a remote client does connect, a new ECB is created and the specified TPF program is activated.

The activate_on_accept API allows you to specify the I-stream on which to activate the TPF program in the new ECB. The default is 0, which means select the least busy I-stream. This enables you to load balance instances of your server application by using the TPF I-stream scheduler logic.