Thermostat/Architecture

From IcedTea

1 High-level overview

The following deployment/component diagram depicts the basic components of Thermostat and how they fit together:

Image:Thermostat-architecture-high-level-overview.png

The Thermostat agents monitor the target JVMs and send the resulting data to the Storage component. There can be many agents running on different hosts, each monitoring several JVMs. On the other side, the clients connect to the same storage to retrieve the relevant data for display, analysis, etc. Storage is provided by MongoDB, which can be a single database in the simplest case or a whole cluster of distributed databases.

2 Detailed description

Image:Thermostat-architecture-high-level-mongo.png

In this design, each agent pushes its data to a locally running mongos process. mongos is not itself a database; it is a front-end router to a cluster of database shards. The mongos instances, with the help of the config servers, determine which shard receives any particular piece of data. An agent has several backends, which correspond to distinct methods of collecting performance data.

The diagram glosses over a few things worth mentioning. First, all of the shard nodes (including those also running config servers) communicate with each other as needed, at the direction of the config servers, migrating chunks of data to keep storage balanced across the shard cluster. We will need to choose sharding keys carefully to minimize the overhead of such migrations. Second, the config servers themselves are a replica set of three; they are all interconnected, and any mongos connecting to the cluster is configured to use all three (which of the three it talks to at any given time is a MongoDB implementation detail we do not need to worry about). The MongoDB developers have optimized their replication protocol for the three-node case and recommend that clusters use exactly three config servers for this reason. Finally, although the diagram shows the mongos instances connected only to the config servers, they use that connection only to learn the cluster/shard configuration: when inserting, they send data directly to the appropriate shard, and when querying, they read directly from it.
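To make the routing step concrete, here is a minimal Java sketch of range-based chunk routing, assuming a numeric shard key. It is not MongoDB code and not part of Thermostat; `ChunkRouter`, `addChunk`, `route` and `migrateChunk` are hypothetical names. The sorted map stands in for the chunk metadata a mongos obtains from the config servers: each entry maps the inclusive lower bound of a shard-key range to the shard currently holding that range.

```java
import java.util.Map;
import java.util.TreeMap;

class ShardRoutingDemo {
    public static void main(String[] args) {
        ChunkRouter router = new ChunkRouter();
        router.addChunk(Long.MIN_VALUE, "shard0"); // keys below 1000
        router.addChunk(1_000L, "shard1");         // keys 1000..1999
        router.addChunk(2_000L, "shard2");         // keys 2000 and up

        System.out.println(router.route(42L));     // shard0
        System.out.println(router.route(1_500L));  // shard1

        // A monotonically increasing key, such as a raw sample timestamp,
        // always falls into the highest chunk, so all new inserts hit one shard.
        for (long ts = 5_000L; ts < 5_003L; ts++) {
            System.out.println(router.route(ts));  // shard2 each time
        }

        // A balancer migration moves the chunk; routing follows the new metadata.
        router.migrateChunk(2_000L, "shard0");
        System.out.println(router.route(5_000L));  // shard0
    }
}

/**
 * Hypothetical sketch of range-based routing, loosely modelled on the chunk
 * metadata a mongos router reads from the config servers. Not MongoDB code.
 */
class ChunkRouter {
    // Inclusive lower bound of each chunk's shard-key range -> shard holding that chunk.
    private final TreeMap<Long, String> chunks = new TreeMap<>();

    /** Declares a chunk covering keys from lowerBound up to the next chunk's lower bound. */
    void addChunk(long lowerBound, String shard) {
        chunks.put(lowerBound, shard);
    }

    /** Returns the shard whose chunk range contains the given shard key. */
    String route(long shardKey) {
        Map.Entry<Long, String> chunk = chunks.floorEntry(shardKey);
        if (chunk == null) {
            throw new IllegalArgumentException("no chunk covers key " + shardKey);
        }
        return chunk.getValue();
    }

    /** Simulates a balancer migration: the chunk starting at lowerBound moves to targetShard. */
    void migrateChunk(long lowerBound, String targetShard) {
        if (!chunks.containsKey(lowerBound)) {
            throw new IllegalArgumentException("no chunk starts at " + lowerBound);
        }
        chunks.put(lowerBound, targetShard);
    }
}
```

The timestamp loop illustrates why the choice of sharding key matters: with a monotonically increasing key, every new insert lands on whichever shard holds the highest chunk, concentrating write load on one shard and creating extra migration work for the balancer.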


3 Thermostat Agent

Image:Thermostat-architecture-agent.png

The agent uses pluggable backends to collect data. Each backend retrieves its data in a different way, e.g. from system information, JVMStat, SystemTap, etc. The agent then uses a Storage API (part of Thermostat) to send that data to the storage. A thin abstraction layer keeps the agent independent of the actual database; right now the focus is on MongoDB, which is highly scalable and performant.
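The following Java sketch shows the shape of this design under stated assumptions: `Backend`, `Storage`, `InMemoryStorage`, `HostInfoBackend` and `Agent` are hypothetical names chosen for illustration, not Thermostat's actual classes or signatures. Backends are pluggable because the agent depends only on the `Backend` interface, and the agent is database-independent because it writes only through the `Storage` interface; an in-memory map stands in for MongoDB.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AgentDemo {
    public static void main(String[] args) {
        Storage storage = new InMemoryStorage();
        Agent agent = new Agent(storage);
        agent.register(new HostInfoBackend());
        agent.collectOnce();
        agent.collectOnce();
        System.out.println(storage.getAll("host-info").size()); // 2
    }
}

/** Hypothetical storage abstraction: the agent sees only this, never the database driver. */
interface Storage {
    void put(String category, Map<String, Object> record);
    List<Map<String, Object>> getAll(String category);
}

/** Hypothetical pluggable backend: one way of collecting performance data. */
interface Backend {
    String category();
    Map<String, Object> sample();
}

/** Stand-in for a MongoDB-backed implementation; keeps records in memory. */
class InMemoryStorage implements Storage {
    private final Map<String, List<Map<String, Object>>> records = new HashMap<>();

    @Override
    public void put(String category, Map<String, Object> record) {
        records.computeIfAbsent(category, k -> new ArrayList<>()).add(record);
    }

    @Override
    public List<Map<String, Object>> getAll(String category) {
        // Return an unmodifiable copy so callers cannot mutate stored data.
        return List.copyOf(records.getOrDefault(category, List.of()));
    }
}

/** Example backend reading basic figures from the Java runtime it runs in. */
class HostInfoBackend implements Backend {
    @Override
    public String category() {
        return "host-info";
    }

    @Override
    public Map<String, Object> sample() {
        Runtime runtime = Runtime.getRuntime();
        Map<String, Object> record = new HashMap<>();
        record.put("timestamp", System.currentTimeMillis());
        record.put("cpus", runtime.availableProcessors());
        record.put("freeMemory", runtime.freeMemory());
        return record;
    }
}

/** Polls every registered backend and writes each sample through the Storage interface. */
class Agent {
    private final List<Backend> backends = new ArrayList<>();
    private final Storage storage;

    Agent(Storage storage) {
        this.storage = storage;
    }

    void register(Backend backend) {
        backends.add(backend);
    }

    /** One collection round; a real agent would run this on a schedule. */
    void collectOnce() {
        for (Backend backend : backends) {
            storage.put(backend.category(), backend.sample());
        }
    }
}
```

In this sketch, swapping in a MongoDB-backed `Storage` implementation would require no change to `Agent` or to any backend, which is the purpose of the thin abstraction.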

4 Thermostat Client

Image:Thermostat-architecture-clients.png

The client itself is built as a typical model-view-controller triad. The central part is the controller, which uses the same Storage API (with the same storage abstraction) as the agent to retrieve the relevant data; in MVC terms, the Storage API therefore acts as the model. On the other end, the controller uses the View component (an abstract interface) to receive user events and to display their results (e.g. performance data rendered as a diagram). This level of abstraction is what lets us support various types of client: a command-line interface, an Eclipse plugin, a standalone application (using Swing or the Eclipse platform), or even a web client.
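A minimal Java sketch of this MVC split follows, assuming hypothetical names throughout (`ReadableStorage`, `View`, `Controller`, `ConsoleView`); none of these are Thermostat's real interfaces. `ReadableStorage` plays the model (the read side of the storage abstraction), `View` is the abstract interface that a command-line, Swing, Eclipse or web client would implement, and the controller only wires user events from the view to queries on the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ClientDemo {
    public static void main(String[] args) {
        // Fake model data standing in for samples read from storage.
        Map<String, List<Long>> fakeSamples = Map.of("vm-1", List.of(100L, 120L, 90L));
        ReadableStorage storage = vmId -> fakeSamples.getOrDefault(vmId, List.of());

        ConsoleView view = new ConsoleView();
        new Controller(storage, view);
        view.simulateSelect("vm-1"); // prints "vm-1 heap: [100, 120, 90]"
    }
}

/** Hypothetical read side of the storage abstraction; acts as the MVC model. */
interface ReadableStorage {
    List<Long> heapUsageSamples(String vmId);
}

/** Hypothetical abstract view: each client type (CLI, Swing, Eclipse, web) implements it. */
interface View {
    interface Listener {
        void vmSelected(String vmId);
    }

    void addListener(Listener listener);

    void showHeapUsage(String vmId, List<Long> samples);
}

/** Connects user events from the view to queries on the model. */
class Controller {
    Controller(ReadableStorage model, View view) {
        // When the user picks a VM, query the model and push the result back to the view.
        view.addListener(vmId -> view.showHeapUsage(vmId, model.heapUsageSamples(vmId)));
    }
}

/** Minimal command-line view. */
class ConsoleView implements View {
    private final List<Listener> listeners = new ArrayList<>();
    private String lastLine = "";

    @Override
    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    @Override
    public void showHeapUsage(String vmId, List<Long> samples) {
        lastLine = vmId + " heap: " + samples;
        System.out.println(lastLine);
    }

    /** Simulates a user action, such as typing a VM id at a prompt. */
    void simulateSelect(String vmId) {
        for (Listener listener : listeners) {
            listener.vmSelected(vmId);
        }
    }

    /** Last line shown, kept so the output can be checked. */
    String lastLine() {
        return lastLine;
    }
}
```

A Swing or web front end would supply its own `View` implementation, while `Controller` and the storage side stay unchanged.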

5 Scalability

Thermostat needs to scale in several respects:

   It needs to support monitoring (diagnosing, etc.) many JVMs.
   The generated data needs to be transmitted and stored in a timely manner.
   On the client side, potentially large amounts of data need to be processed and analyzed efficiently.
   The client UI needs to present the data in a useful way.