Thermostat/IPCArchitecture

1 Motivation

Over the years of developing Thermostat, we discovered that there were problems that the Thermostat agent could not solve as a standalone process. Some of these problems arose from the desire to support system-wide, multi-user monitoring with the agent. One issue we discovered was that the Java Attach API would not allow the root user to attach to JVMs owned by non-root users. This prevented the agent from obtaining the JMX service URL from these JVMs, which is required to establish a JMX connection. To solve this, we created a helper process for the agent that we called the "agent proxy". This process would drop root privileges to run as the same user as the target JVM and obtain the JMX service URL.

Later, in order to improve the security of the Thermostat agent, we moved the command channel server to a separate process. We had determined that we should minimize the parts of the agent that may run as a privileged user. This would lower the attack surface for the agent: if an unprivileged component were compromised due to a security vulnerability, it could cause much less damage. A key candidate for this privilege separation was the command channel, which creates the only network-facing listening port in the agent. We moved this port to a subprocess that drops privileges if run as root, and relays information to the agent process over a simple stdin/stdout pipeline.

We found that this stdin/stdout pipeline had a few drawbacks:

  • stdout/stdin had to be used exclusively for inter-process communication (IPC). This was very prone to breaking due to log messages, including logging from externally-loaded Java agents.
  • There were no message boundaries, making it difficult to determine where one communication sequence ends and another begins. This also made error handling difficult when we did not receive the data we expected.
  • We had to reimplement this communication code each time we added a new helper process.

Due to these drawbacks, we decided to create a reusable IPC service that new helper processes could use with minimal additional effort. The default implementation of the IPC service uses Unix sockets, which frees up the helper process's stdin/stdout streams and allows for an arbitrary number of independent communication streams. Additionally, the IPC service uses a messaging protocol that enforces boundaries in the data stream, ensuring that the data sent by one write operation is exactly the data read by one read operation.

2 Client/Server Model

The IPC service uses a client/server model for communication. The server side always refers to the Thermostat agent. The agent itself and agent plugins can use the IPC service through the AgentIPCService OSGi service. This service operates on implementation-independent IPC constructs we refer to as "IPC servers". In the Unix sockets implementation, an IPC server is backed by a Unix socket. An IPC server is identified by its name alone; no other implementation details are exposed through the API. Using the AgentIPCService, callers can create new IPC servers, determine whether a server exists for a given name, and destroy servers.
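As a rough illustration, server-side usage from a plugin might look like the sketch below. The class is hypothetical, and the method names (createServer, serverExists, destroyServer) are paraphrased from the description above rather than quoted from the actual API; the callback type is described in the next paragraph.

  // Hypothetical sketch: method names such as createServer, serverExists and
  // destroyServer are paraphrased from the description above and may not match
  // the actual AgentIPCService API exactly.
  public class MyPluginIpcSetup {
      private static final String SERVER_NAME = "my-plugin-server"; // illustrative name

      // The AgentIPCService instance would come from the OSGi service registry.
      public void startIpcServer(AgentIPCService ipc, ThermostatIPCCallbacks callbacks)
              throws IOException {
          if (!ipc.serverExists(SERVER_NAME)) {
              ipc.createServer(SERVER_NAME, callbacks);
          }
      }

      public void stopIpcServer(AgentIPCService ipc) throws IOException {
          if (ipc.serverExists(SERVER_NAME)) {
              ipc.destroyServer(SERVER_NAME);
          }
      }
  }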

When creating a new IPC server, the caller must provide a callback instance of ThermostatIPCCallbacks. When data is received by this server, the IPC service asynchronously notifies the caller via this callback. In addition to the message payload, the callback also provides the caller with the ability to asynchronously reply to the sender of the message. This functionality is encapsulated within the IPCMessage interface. IPC servers can also be created with an optional owner argument. This argument is a UserPrincipal, which should correspond to a user on the local system. IPC implementations can use this owner principal for access control to IPC servers. For example, the Unix sockets implementation creates socket files that can only be read and written by the corresponding local user. This is useful in the case where the agent is running as root and the IPC client processes are running as other, unprivileged users. By creating IPC servers owned by the user running the IPC client, both the privileged server side and the unprivileged client side have full access to the communication channel.
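A minimal callback sketch might look like the following. The messageReceived method name and the IPCMessage accessor and reply methods are assumptions based on the description above, not the exact interface.

  // Hypothetical callback sketch; the method names on ThermostatIPCCallbacks and
  // IPCMessage are assumptions, not the exact Thermostat API.
  public class EchoCallbacks implements ThermostatIPCCallbacks {
      @Override
      public void messageReceived(IPCMessage message) {
          // Invoked only once a complete message has been received.
          ByteBuffer payload = message.get();
          // Asynchronously reply to the client that sent this message.
          message.reply(payload.duplicate());
      }
  }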

The client side in this model refers to processes other than the Thermostat agent. These are often small helper processes, such as the agent proxy or command channel server. Another example is the Byteman helper program that is injected into JVMs and gives Byteman rules the capability to send metrics to Thermostat. We do not want to assume that these processes run an OSGi framework, so we provide a thermostat-ipc-client-common helper script. Launcher scripts can source this file, which defines an IPC_CLASSPATH shell variable containing a classpath with all of the jars needed for the client-side IPC service API.

Client programs can access the IPC service using the ClientIPCService class. In order to obtain an instance of ClientIPCService, clients need to call ClientIPCServiceFactory.getIPCService(File ipcProperties). The parameter is a properties file used to communicate implementation-specific information between the server and client, such as the IPC implementation to use. In the Unix socket implementation, this properties file also contains the location of the socket files. The Thermostat agent and its plugins can easily determine the location of this file using AgentIPCService.getConfigurationFile(). It is the responsibility of the server to communicate this location to the client. For example, this can be done using a command line argument or system property passed when creating the client process.
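For example, a helper process might receive the file location through a system property and bootstrap its IPC client roughly as follows; the property name ipcConfigFile is purely illustrative.

  // Hypothetical bootstrap sketch. The "ipcConfigFile" system property name is
  // illustrative; the server side chooses how the location is actually passed.
  File ipcConfigFile = new File(System.getProperty("ipcConfigFile"));
  ClientIPCService ipcService = ClientIPCServiceFactory.getIPCService(ipcConfigFile);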

Once a client has an instance of ClientIPCService, it can connect to an IPC server created by the server side using ClientIPCService.connectToServer(String name). After connecting successfully, this method returns a channel of type IPCMessageChannel. Instead of the standard read/write methods, this channel has readMessage and writeMessage. The writeMessage method sends a ByteBuffer of data to the IPC server; when the method returns, the full ByteBuffer has been transmitted to the server. Clients can then follow up with a readMessage call, which waits for a reply from the server and returns a ByteBuffer containing the full message sent by the server. This works because the messaging layer used by the IPC service keeps track of message boundaries.
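Continuing the sketch above, a simple request/reply exchange might look like the following. The server name and payload are illustrative, and the close() call assumes the channel is closeable.

  // Hypothetical request/reply sketch; the server name and payload are illustrative,
  // and the close() call assumes IPCMessageChannel can be closed this way.
  IPCMessageChannel channel = ipcService.connectToServer("command-channel");
  try {
      // writeMessage returns only after the full buffer has been transmitted.
      ByteBuffer request = ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8));
      channel.writeMessage(request);

      // readMessage blocks until a complete reply message has been received.
      ByteBuffer reply = channel.readMessage();
      byte[] replyBytes = new byte[reply.remaining()];
      reply.get(replyBytes);
      System.out.println(new String(replyBytes, StandardCharsets.UTF_8));
  } finally {
      channel.close();
  }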

The diagram below illustrates the typical lifecycle and interaction between the server and client sides using the IPC service. The dotted line indicates that clients may send additional messages after receiving a reply.

Image:ipc-client-server.png

3 Message Protocol

The IPC service uses "messages" as a high-level unit of communication. At the API level, these messages take the form of variable-length ByteBuffers. As outlined in the previous section, the server-side API for sending and receiving messages is asynchronous; the callback is not invoked until a full message has been received. The client-side API is synchronous and blocking; its methods do not return until a full message has been written or read. In both cases, the API ensures that only complete transmissions are presented to the API consumer. This avoids the complications of read and write methods that transfer only part of what was sent by the other end. The details of how these messages are transmitted are left to the underlying implementation. The remainder of this section covers how the Unix socket implementation handles messages.

Note: As we create implementations for different IPC mechanisms, it might make sense to make some of the Unix socket code (such as message headers) available for all implementations to use.

The Unix socket implementation uses message headers to provide contextual information about the raw data it transmits. These headers allow the messaging layer to know properties such as the length of the data being transmitted and whether subsequent messages are part of the same transmission. The following illustrates the layout of the message header:

Image:message-header.png

We start with a 4-byte magic number sequence that provides basic sanity testing for the header. If we receive what should be a new message that does not start with this sequence, it is an error. We then have a protocol version and header size (both Integers). The headers are designed to have variable length, and the protocol version indicates what fields to expect for this header. The header size indicates how many bytes to allocate for the header and gives a starting point for the message payload. This is followed by the message size (an Integer), which gives the length in bytes of the message payload that immediately follows the header. Finally, there is a moreData flag (a Boolean represented as one byte) which says whether the following message is a continuation of this message. This denotes a multi-part message.
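To make the layout concrete, the sketch below frames a single-part message with java.nio. Only the field order and sizes follow the description above; the magic number bytes and protocol version value are placeholders, not the values Thermostat actually uses.

  // Sketch of framing one single-part message. The magic bytes and protocol version
  // are placeholders; only the field order and sizes follow the header layout above.
  byte[] magic = new byte[] { 'T', 'H', 'R', 'M' };      // 4-byte magic number (placeholder)
  int protocolVersion = 1;                               // header format version (placeholder)
  int headerSize = 4 + 4 + 4 + 4 + 1;                    // total bytes occupied by this header
  byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
  boolean moreData = false;                              // not part of a multi-part message

  ByteBuffer frame = ByteBuffer.allocate(headerSize + payload.length);
  frame.put(magic);                       // magic number
  frame.putInt(protocolVersion);          // protocol version
  frame.putInt(headerSize);               // header size: offset where the payload starts
  frame.putInt(payload.length);           // message size: length of the payload
  frame.put((byte) (moreData ? 1 : 0));   // moreData flag: is the next message a continuation?
  frame.put(payload);                     // payload immediately follows the header
  frame.flip();                           // ready to be written to the socket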

Multi-part messages are messages that are too large and have been broken up into smaller parts by the messaging layer for easier transmission. The parts are then transmitted in sequential order. So long as the parts are received in the same order as sent, and the moreData flag is set, the other side can easily reconstruct the original message. The original, full message is then delivered at the API level. The consumers of the API need only concern themselves with the full message on either end of the communication.
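The sending side's splitting step can be sketched as follows. The helper below is illustrative; each returned part would then be framed with its own header, with the moreData flag set on every part except the last.

  // Illustrative sketch: split one message into ordered parts no larger than
  // maxPartSize. Each part is then framed with its own header; the moreData flag
  // is set on every part except the last.
  static List<ByteBuffer> splitIntoParts(ByteBuffer message, int maxPartSize) {
      List<ByteBuffer> parts = new ArrayList<>();
      while (message.hasRemaining()) {
          int partLength = Math.min(maxPartSize, message.remaining());
          ByteBuffer part = message.slice();   // view starting at the current position
          part.limit(partLength);              // restrict the view to this part only
          parts.add(part);
          message.position(message.position() + partLength);
      }
      return parts;
  }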

In order to prevent unusual cases where reconstructing multi-part messages consumes large amounts of agent memory, we impose size limits and throw an exception if a channel receives a message that exceeds a limit. This exception should not affect the function of other channels using the IPC service. The IPC service uses a few different size limits, covering buffers that would otherwise be allocated with arbitrary sizes. Since message headers have a variable length given by the header size field, we set a limit on the maximum header size we will accept. As mentioned, there is a size limit on the largest message we will reconstruct out of its parts. There is also a size limit on the individual parts of a multi-part message: messages larger than this are broken into parts before transmission. Finally, we have a size limit for internal buffers, which determines how much we can read at once from the underlying Unix socket. For details on the current size limits we use, see the MessageLimits class.
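The reconstruction-side check can be sketched like this; the limit value and exception type are placeholders, and the real limits live in the MessageLimits class.

  // Illustrative guard used while reconstructing a multi-part message. The limit
  // and exception type are placeholders; see MessageLimits for the real values.
  static void checkReassembledSize(long bytesSoFar, int nextPartLength, long maxMessageSize)
          throws IOException {
      if (bytesSoFar + nextPartLength > maxMessageSize) {
          throw new IOException("multi-part message exceeds limit of " + maxMessageSize + " bytes");
      }
  }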

4 Extensibility

The IPC Service is designed to support multiple IPC mechanisms. Adding an implementation for a new mechanism should not require users of the IPC Service to make any modifications. The IPC implementation to use is determined at runtime from the IPC properties file. This file is automatically created by the Thermostat agent and is located at $USER_THERMOSTAT_HOME/etc/ipc.properties. It is a standard Java properties file and contains a "type" key. By default this key has the value "unixsocket", which tells the IPC Service to use the Unix socket implementation. Changing the type to another value causes the IPC Service to use a different implementation. In order for the IPC Service to recognize a new implementation, an entry must be added to the IPCType enum. The enum's constructor requires entries to also specify the string value to be used in the properties file (i.e. "unixsocket" for Unix sockets).
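For illustration, an IPCType entry might take roughly the following shape; the enum layout shown here is paraphrased, and the "mynewipc" entry and its accessor are hypothetical.

  // Illustrative sketch of the IPCType enum; the actual enum may differ, and the
  // MY_NEW_IPC entry with its "mynewipc" string value is hypothetical.
  public enum IPCType {
      UNIX_SOCKET("unixsocket"),
      MY_NEW_IPC("mynewipc");    // value used for the "type" key in ipc.properties

      private final String configValue;

      IPCType(String configValue) {
          this.configValue = configValue;
      }

      public String getConfigValue() {
          return configValue;
      }
  }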

After adding a new type in IPCType, the next step is to provide an implementation of IPCPropertiesProvider. This provider class allows the client-side code to iterate through registered providers and only instantiate the IPCProperties subclass that matches the requested IPCType. The provider's getType() method should return the new IPCType just added, and its create method should instantiate and return an IPCProperties object. The returned IPCProperties object must be constructed with the new IPCType as its type argument. It may be useful to create a subclass of IPCProperties that also carries data from properties specific to its IPC mechanism. For example, the Unix sockets implementation has a "unixsocket.dir" property where users can change the default location for the Unix socket files. By convention, all implementation-specific properties should be prefixed with the corresponding "type" value. Note: The IPC properties file is copied at runtime to a world-readable location in order to allow servers and clients belonging to different system users to communicate. This file should not contain any sensitive data.
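A provider for a hypothetical new type might be sketched as follows; the create signature, the MyNewIPCProperties subclass, and the MY_NEW_IPC type are all illustrative.

  // Hypothetical provider sketch; the create(...) signature and the
  // MyNewIPCProperties subclass are assumptions based on the text above.
  public class MyNewIPCPropertiesProvider implements IPCPropertiesProvider {

      @Override
      public IPCType getType() {
          return IPCType.MY_NEW_IPC;   // the new type added to the IPCType enum
      }

      @Override
      public IPCProperties create(Properties props, File propertiesFile) throws IOException {
          // A subclass of IPCProperties can also read implementation-specific
          // properties here, e.g. "mynewipc."-prefixed keys, mirroring "unixsocket.dir".
          return new MyNewIPCProperties(props, propertiesFile);
      }
  }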

To integrate the new implementation into the Thermostat agent, it must register an OSGi service implementing ServerTransport. ServerTransport extends AgentIPCService with additional start() and shutdown() lifecycle methods. For example, the Unix socket implementation uses these methods to set up, secure, and clean up the Unix sockets directory it uses. ServerTransport also has a getType() method, which should return the new IPCType created above. The AgentIPCService methods are where the server-side API for the IPC Service is implemented; these are the operations for creating and destroying IPC servers mentioned in the Client/Server Model section.
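A server-side skeleton might look roughly like the following; the class name and lifecycle bodies are placeholders, and the inherited AgentIPCService methods are omitted.

  // Hypothetical ServerTransport skeleton. The inherited AgentIPCService methods
  // (creating, querying and destroying IPC servers) are omitted for brevity.
  public class MyNewIPCServerTransport implements ServerTransport {

      @Override
      public IPCType getType() {
          return IPCType.MY_NEW_IPC;
      }

      @Override
      public void start() throws IOException {
          // Prepare whatever resources this transport needs (directories, ports, ...).
      }

      @Override
      public void shutdown() throws IOException {
          // Tear down the resources created in start().
      }

      // AgentIPCService methods implementing the server-side API would go here.
  }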

Similarly, on the client side, new implementations must provide a ClientTransport, which allows the client to connect to IPC servers. Since client programs are likely not running an OSGi framework, the IPC Service instead uses Java ServiceLoaders for extensibility on the client side. The new client-side implementation must provide a ClientTransportProvider, which is responsible for creating ClientTransport instances and returns the new IPCType from its getType() method. The new implementations of both ClientTransportProvider and IPCPropertiesProvider (mentioned above) must be registered using the ServiceLoader mechanism, by creating the appropriate provider configuration files under META-INF/services; the Unix sockets implementation registers its own providers this way.
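As a sketch of the general shape of these files for a hypothetical new implementation (all package and class names below are illustrative, including the package of the provider interfaces):

  File: META-INF/services/<fully-qualified name of IPCPropertiesProvider>
      com.example.ipc.mynewipc.common.MyNewIPCPropertiesProvider

  File: META-INF/services/<fully-qualified name of ClientTransportProvider>
      com.example.ipc.mynewipc.client.MyNewIPCClientTransportProvider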

5 Implementations

5.1 Unix Sockets

While many details are covered in the previous sections, there are a few additional points worth mentioning here. For the underlying native support for Unix sockets, we use the jnr-unixsocket Java library. This library, also used by JRuby, provides a complete SelectableChannel implementation using Unix sockets, along with a native Selector. This allowed us to create a scalable non-blocking selector in the agent that processes all server-side communication for the IPC Service.
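As a minimal illustration of the jnr-unixsocket primitives involved (this is not Thermostat's actual selector code, and the socket path is illustrative; UnixSocketAddress and UnixServerSocketChannel come from the jnr.unixsocket package, NativeSelectorProvider from jnr.enxio.channels):

  // Minimal jnr-unixsocket illustration, not Thermostat's actual selector code.
  // The socket path is illustrative.
  UnixSocketAddress address = new UnixSocketAddress(new File("/tmp/example.sock"));
  UnixServerSocketChannel server = UnixServerSocketChannel.open();
  server.configureBlocking(false);
  server.socket().bind(address);

  // A native Selector that can multiplex many Unix socket channels at once.
  Selector selector = NativeSelectorProvider.getInstance().openSelector();
  server.register(selector, SelectionKey.OP_ACCEPT);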

In terms of security, one advantage of Unix sockets is that the file system enforces access control for us. At startup, we prepare a directory to use that will contain any Unix sockets created by the IPC Service. By default this directory is under java.io.tmpdir, which is likely /tmp on Linux systems. This directory is accessible by all users, but only writable by the user running the Thermostat agent.

drwxr-xr-x. 3 userA userA 60 Nov 18 17:33 /tmp/thermostat-socks/

Within this thermostat-socks directory, there is a subdirectory for each user that owns an IPC server. These owner-specific subdirectories contain the Unix sockets for the IPC servers belonging to that user. These directories are readable and writable only by that user.

drwx------. 2 userA userA 60 Nov 18 17:33 /tmp/thermostat-socks/userA/
srwxrwxr-x. 1 userA userA 0 Nov 18 17:33 /tmp/thermostat-socks/userA/sock-command-channel

If the Thermostat agent is running as root in order to monitor JVMs belonging to all users on the system, there may be multiple owner-specific subdirectories under the top-level sockets directory.

drwxr-xr-x. 4 root root 80 Nov 18 17:44 /tmp/thermostat-socks/
drwx------. 2 root root 60 Nov 18 17:44 /tmp/thermostat-socks/root/
srwxr-xr-x. 1 root root 0 Nov 18 17:44 /tmp/thermostat-socks/root/sock-agent-proxy-0

drwx------. 2 userA root 80 Nov 18 17:44 /tmp/thermostat-socks/userA/
srwxr-xr-x. 1 userA root 0 Nov 18 17:44 /tmp/thermostat-socks/userA/sock-command-channel

Before the IPC service creates a Unix socket for an IPC server, it first verifies that the parent directory for the socket is owned by the expected user and has the expected permissions set.
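A sketch of this kind of verification using the standard java.nio.file APIs; the exact checks Thermostat performs may differ.

  // Sketch of verifying a socket's parent directory before creating the socket in it.
  // The exact checks Thermostat performs may differ.
  static void verifySocketDirectory(Path dir, UserPrincipal expectedOwner,
          Set<PosixFilePermission> expectedPerms) throws IOException {
      UserPrincipal owner = Files.getOwner(dir);
      if (!owner.equals(expectedOwner)) {
          throw new IOException("unexpected owner for " + dir + ": " + owner);
      }
      Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
      if (!perms.equals(expectedPerms)) {
          throw new IOException("unexpected permissions for " + dir + ": " + perms);
      }
  }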
