Thermostat
From IcedTea
Thermostat is an instrumentation tool for the Hotspot JVM, with support for monitoring multiple JVM instances on multiple hosts, optionally in a cloud environment.
- We want a tool that allows users of IcedTea/OpenJDK to monitor running JVMs, especially remote JVMs.
- Both high level (uptime, cpu and memory usage) and low level (including hotspot's perf data) information should be available through the tool.
- The tool should be usable in a production environment, with minimal overhead.
The name Thermostat is intended as a play on words: Thermostat is to Hotspot much as IcedTea is to Java.
There are a number of ways of accessing information about a running Hotspot JVM. These include jstatd, JMX, JVMTI, and SystemTap. The host system also provides information that is relevant to developers and administrators. Thermostat should provide a pluggable framework for using one or more of these information sources to monitor local or remote JVMs, so that we and other members of the Open Source Java community can extend the collection capability of the system.
It is worth noting that there is some overlap in terms of the information that several of these monitoring tools/APIs can provide.
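As an illustration of one of these sources, the following is a minimal sketch of reading a few high-level values over JMX, using the platform MXBeans of the JVM the code runs in. The class name `LocalJvmSnapshot` and the map keys are hypothetical and are not part of Thermostat; a real collector would connect to a remote JVM instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.RuntimeMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Thermostat: reads a few high-level
// values from the current JVM through the standard platform MXBeans.
class LocalJvmSnapshot {

    static Map<String, Object> collect() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("uptime-millis", runtime.getUptime());
        snapshot.put("vm-name", runtime.getVmName());
        snapshot.put("heap-used-bytes", memory.getHeapMemoryUsage().getUsed());
        snapshot.put("available-processors", os.getAvailableProcessors());
        // Negative on platforms where the load average is not available.
        snapshot.put("system-load-average", os.getSystemLoadAverage());
        return snapshot;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Object> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Some of these values (for example the VM name) are also exposed through Hotspot's perf data, which is the kind of overlap noted above.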
OSGi Services
Thermostat exports a number of services:
Client
- com.redhat.thermostat.client.osgi.service.VMContextAction
- com.redhat.thermostat.client.osgi.service.ApplicationService
- com.redhat.thermostat.client.osgi.service.ContextAction
- com.redhat.thermostat.client.osgi.service.VmInformationService
- com.redhat.thermostat.service.process.UNIXProcessHandler
Database schema
database: thermostat
Collections named *-stats should contain data that changes often (or to which new data points are often added). Collections named *-info should be fairly constant (changing only once in a long while).
collection: agent-config
Keeps the current agent configuration
{
agent-id: [uuid]
start-time: [timestamp]
backends: {
backend-name1: {
name: [human readable name]
description: [description]
version: [version]
active: {
new: [true|false] # monitor new vms
pids: [
[pid1]
[pid2]
....
]
}
key1: val1
}
}
}
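To make the nesting of the agent-config document concrete, here is a sketch that builds one such document as nested Java maps. It assumes `active` is a sub-document holding `new` and `pids`; the class name `AgentConfigExample` and all values (backend name, description, version, pids) are placeholders chosen for illustration, not real Thermostat data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical example: models one agent-config document as nested maps.
class AgentConfigExample {

    static Map<String, Object> build(UUID agentId, long startTime) {
        // Assumption: "active" is a sub-document with "new" and "pids".
        Map<String, Object> active = new LinkedHashMap<>();
        active.put("new", true);                      // monitor new vms
        active.put("pids", Arrays.asList(1234, 5678)); // placeholder pids

        Map<String, Object> backend = new LinkedHashMap<>();
        backend.put("name", "Example backend");        // placeholder
        backend.put("description", "Illustrative only"); // placeholder
        backend.put("version", "0.0");                 // placeholder
        backend.put("active", active);

        Map<String, Object> backends = new LinkedHashMap<>();
        backends.put("backend-name1", backend);

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("agent-id", agentId.toString());
        document.put("start-time", startTime);
        document.put("backends", backends);
        return document;
    }

    public static void main(String[] args) {
        System.out.println(build(UUID.randomUUID(), System.currentTimeMillis()));
    }
}
```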
collection: host-info
Contains static machine information
{
agent-id: [uuid]
hostname: [human readable name]
os_name: [distro]
os_kernel: [linux with version]
cpu_num: [number of cpus]
memory_total: [total memory]
}
collection: network-info
{
agent-id: [uuid]
iface: [interface]
ip4addr: [ip4addr]
ip6addr: [ip6addr]
}
collection: cpu-stats
{
agent-id: [uuid]
timestamp: [timestamp]
load: [15-load, 10-load, 5-load]
}
collection: memory-stats
{
agent-id: [uuid]
timestamp: [timestamp]
total: [total memory; long]
free: [long]
buffers: [long]
cached: [long]
swap-total: [long]
swap-free: [long]
commit-limit: [long]
}
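The memory-stats field names line up with keys found in Linux's /proc/meminfo. The sketch below assumes that file is the data source and that values are stored in bytes; neither is stated by the schema. `MemInfoParser` is a hypothetical name, and the key mapping is a guess based on the field names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser: maps /proc/meminfo text to memory-stats fields.
class MemInfoParser {

    // Assumed mapping from /proc/meminfo keys to memory-stats field names.
    private static final String[][] FIELDS = {
        {"MemTotal", "total"}, {"MemFree", "free"},
        {"Buffers", "buffers"}, {"Cached", "cached"},
        {"SwapTotal", "swap-total"}, {"SwapFree", "swap-free"},
        {"CommitLimit", "commit-limit"},
    };

    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> raw = new LinkedHashMap<>();
        for (String line : meminfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            String key = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                long value = Long.parseLong(parts[0]);
                if (parts.length > 1 && parts[1].equals("kB")) {
                    value *= 1024; // convert kB to bytes
                }
                raw.put(key, value);
            } catch (NumberFormatException e) {
                // Skip lines whose value is not a plain number.
            }
        }
        Map<String, Long> stats = new LinkedHashMap<>();
        for (String[] field : FIELDS) {
            Long value = raw.get(field[0]);
            if (value != null) {
                stats.put(field[1], value);
            }
        }
        return stats;
    }
}
```

Keys are matched exactly, so an entry such as SwapCached is not mistaken for Cached. Fields missing from the input are simply left out of the result.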
collection: vm-info
Contains vm information
{
agent-id: [uuid]
vm-id: [uuid] # is this needed?
pid: [vm pid aka lvmid]
runtime-version: [jre version]
java-home: [path to java home]
main-class: [main class]
command-line: [command line]
vm-name: [hotspot server/client vm]
vm-arguments: [vm args]
vm-info: [mixed mode/interpreted... etc]
vm-version: [a version]
environment: {
env-key1: env-val1
}
libraries: [
[lib1.so],
[lib2.so]
]
start-time: [timestamp]
stop-time: [timestamp]
}
Thoughts: 1. We need a vm-id that is separate from the pid to keep things unique, because pids can be reused. Perhaps it's not needed? If we do discard it, all references to vm-id mean the vm pid (aka lvmid).
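One way to picture the pid-reuse concern: if the agent assigns a fresh UUID every time it sees a VM start, two VMs that happen to get the same pid still receive distinct vm-ids. The sketch below is only an illustration of that idea; `VmIdRegistry` and its methods are hypothetical and do not describe how Thermostat actually assigns ids.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical registry: hands out a new vm-id on every VM start, so a
// reused pid never maps to the vm-id of an earlier, already stopped VM.
class VmIdRegistry {

    private final Map<Integer, UUID> running = new HashMap<>();

    // Called when a VM with this pid is detected; returns its new vm-id.
    UUID vmStarted(int pid) {
        UUID vmId = UUID.randomUUID();
        running.put(pid, vmId);
        return vmId;
    }

    // Called when the VM exits; returns the vm-id it had, or null.
    UUID vmStopped(int pid) {
        return running.remove(pid);
    }

    // Returns the vm-id of the currently running VM with this pid, or null.
    UUID lookup(int pid) {
        return running.get(pid);
    }
}
```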
collection: vm-cpu-stats
{
agent-id: [uuid]
vm-id: [uuid]
processor-usage: [percentage]
}
collection: vm-memory-stats
{
agent-id: [uuid]
vm-id: [uuid]
timestamp: [timestamp]
allocated: [allocated]
free: [free]
eden: {
gen: new
collector: [collector]
capacity: [bytes]
max-capacity: [bytes]
used: [bytes]
}
s0: {
gen: new
collector: [collector]
capacity: [bytes]
max-capacity: [bytes]
used: [bytes]
}
s1: {
gen: new
collector: [collector]
capacity: [bytes]
max-capacity: [bytes]
used: [bytes]
}
old: {
gen: old
collector: [collector]
capacity: [bytes]
max-capacity: [bytes]
used: [bytes]
}
perm: {
gen: perm
collector: [None]
capacity: [bytes]
max-capacity: [bytes]
used: [bytes]
}
}
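For comparison, the standard `java.lang.management` API exposes similar per-pool figures for the JVM the code runs in. The sketch below lists each memory pool with values shaped like the per-space entries above. Mapping `capacity` to the committed size is an assumption, `MemoryPoolDump` is a hypothetical name, and the pool names reported depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: dumps the memory pools of the current JVM.
class MemoryPoolDump {

    static List<Map<String, Object>> collect() {
        List<Map<String, Object>> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("name", pool.getName());
            entry.put("capacity", usage.getCommitted()); // assumed mapping
            entry.put("max-capacity", usage.getMax());   // -1 if undefined
            entry.put("used", usage.getUsed());
            pools.add(entry);
        }
        return pools;
    }

    public static void main(String[] args) {
        for (Map<String, Object> pool : collect()) {
            System.out.println(pool);
        }
    }
}
```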
collection: vm-gc-stats
{
agent-id: [uuid]
vm-id: [uuid]
collector: [collector name]
run-count: [number of runs]
wall-time: [time in millis]
}
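The three data fields of vm-gc-stats correspond closely to what `GarbageCollectorMXBean` reports. As a rough sketch, assuming JMX as the source (the schema does not say which source is used), the values for the current JVM could be gathered as follows; `GcStatsDump` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one entry per garbage collector of the current JVM.
class GcStatsDump {

    static List<Map<String, Object>> collect() {
        List<Map<String, Object>> stats = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("collector", gc.getName());
            // Both values are -1 when the collector does not report them.
            entry.put("run-count", gc.getCollectionCount());
            entry.put("wall-time", gc.getCollectionTime()); // approximate millis
            stats.add(entry);
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map<String, Object> entry : collect()) {
            System.out.println(entry);
        }
    }
}
```

The collector names returned over JMX are not necessarily the same strings as the names found in Hotspot's perf data.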
collection: io-disk-stats
collection: io-network-stats
collection: vm-io-stats
collection: vm-sync-stats
collection: vm-class-stats
collection: vm-method-traces
collection: vm-thread-dumps
Hotspot Notes
Memory Management
A good introduction is the Hotspot Memory Management Whitepaper.
Generations
Hotspot currently has 3 generations: YoungGen, OldGen, and PermGen.
YoungGen and OldGen may have different garbage collectors operating on them. PermGen uses the same garbage collector as OldGen and is collected along with it. PermGen will be going away in an OpenJDK7 Update release.
Garbage Collectors
There is a lot of inconsistency and old documentation around this area. This is an attempt to summarize it.
An overview is available at JVM Internals and Question of the Month - Garbage Collection Algorithms. See Java Hotspot Garbage Collection and Garbage Collection FAQ for more details.
A minor collection (or young generation collection) is a collection operation performed on just the young generation. A major collection (or full collection) causes a collection to be performed on all generations. For a major collection, the specific YoungGen collector is run on YoungGen, and the OldGen collector is run on OldGen and PermGen.
Serial Collector (-XX:+UseSerialGC)
A serial stop-the-world collector. Collects YoungGen (using a copying collector called Copy) and OldGen (using a mark-sweep-compact collector called MSC).
Parallel Collector (-XX:+UseParallelGC)
A parallel stop-the-world collector. Collects YoungGen (using a parallel version of the SerialGC young generation collector, called PSScavenge). Has support for adapting heap sizes. If no OldGen collector is explicitly specified, uses PSMarkSweep.
Parallel Compacting Collector (-XX:+UseParallelOldGC)
Collects OldGen (using a collector named PSParallelCompact). Needs/Implies UseParallelGC.
Parallel Young Generation Collector (-XX:+UseParNewGC)
A parallel stop-the-world collector. Collects YoungGen (using a collector named PCopy). Similar to UseParallelGC, but does not support hotspot ergonomics. If no OldGen collector is specified, MSC is used.
Concurrent Mark Sweep Collector (-XX:+UseConcMarkSweepGC)
Collects OldGen (in multiple phases - some stop-the-world, others concurrent - using a collector named CMS). Needs/Implies UseParNewGC. The -Xincgc option turns this on.
Garbage-First Garbage Collector (-XX:+UseG1GC)
Collects YoungGen (using a collector named G1 incremental collections) and OldGen (using a collector named G1 stop-the-world full collections). See the description for more information.

