define command {
command_name check_nsvpn_status
command_line $USER1$/check_snmp -H $HOSTNAME$ -C $ARG1$ -o nsVpnMonP2State.$ARG2$ -l $ARG3$ -c 1:
}
export MIBS=ALL
snmpwalk -v 1 -Cc -Os -c myrocommunity 1.1.1.1 netscreenVpnMonVpnName
Which will return something like this:
nsVpnMonVpnName.0 = STRING: "IKE-CompanyA2CompanyB"
nsVpnMonVpnName.1 = STRING: "IKE-CompanyA2CompanyC"
From this we know that "IKE-CompanyA2CompanyB" is index 0 and "IKE-CompanyA2CompanyC" is index 1. If you want to walk the whole netscreenVpnMon just use that instead of netscreenVpnMonVpnName. There are several stats there that you might be interested in.
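If you have more than a couple of tunnels, picking the indexes out by eye gets tedious. Here is a minimal Python sketch that turns walk output like the sample above into a name-to-index map. The function name parse_vpn_names and the variable names are made up for illustration, and it assumes the -Os output format shown above.

```python
import re

# Matches lines such as: nsVpnMonVpnName.0 = STRING: "IKE-CompanyA2CompanyB"
# (assumes the short -Os output format shown in the walk above)
_LINE = re.compile(r'nsVpnMonVpnName\.(\d+)\s*=\s*STRING:\s*"([^"]*)"')


def parse_vpn_names(walk_output):
    """Return a {tunnel name: index} dict built from snmpwalk output."""
    return {name: int(index) for index, name in _LINE.findall(walk_output)}


# Sample output copied from the walk above.
sample = '''nsVpnMonVpnName.0 = STRING: "IKE-CompanyA2CompanyB"
nsVpnMonVpnName.1 = STRING: "IKE-CompanyA2CompanyC"'''

for name, index in parse_vpn_names(sample).items():
    print(index, name)
```

The index is the value you would pass as $ARG2$ in the check_nsvpn_status command.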
snmpget -v 1 -Os -c myrocommunity 1.1.1.1 nsVpnMonP2State.0
nsVpnMonP2State.0 = INTEGER: active(1)
Thus we use check_snmp to see our tunnel state. You might notice that this is a check against the Phase 2 state of the tunnel. It only validates that Phase 2 is established. Reachability THROUGH the tunnel is not validated.
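The "-c 1:" in the check_nsvpn_status command is what turns that value into an alert. Assuming check_snmp follows the standard Nagios plugin range syntax (older plugin versions may behave differently), "1:" means "critical if the value is below 1", so active(1) passes and anything lower goes critical. Here is a rough Python sketch of how such a range is evaluated. The function name violates is hypothetical and this is not the plugin's actual code.

```python
def violates(range_spec, value):
    """Return True if value should raise an alert for a Nagios-style range.

    Supported forms (per the standard plugin range syntax):
      "10"     alert if value < 0 or value > 10
      "10:"    alert if value < 10
      "~:10"   alert if value > 10
      "10:20"  alert if value < 10 or value > 20
      "@10:20" alert if 10 <= value <= 20
    """
    inside_alerts = range_spec.startswith("@")
    spec = range_spec[1:] if inside_alerts else range_spec

    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        # A bare number is the upper bound; the lower bound defaults to 0.
        start_text, end_text = "0", spec

    if start_text == "~":
        start = float("-inf")
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else float("inf")

    in_range = start <= value <= end
    return in_range if inside_alerts else not in_range


# The tunnel check uses "1:" as its critical range.
print(violates("1:", 1))  # state 1 (active): no alert
print(violates("1:", 0))  # state below 1: alert
```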
If you don't want to worry about the MIB files then you can use the following OIDs:
define command {
command_name check_j2ee_container
command_line $USER1$/check_http -H $HOSTNAME$ -p $ARG1$ -u /Ping.do -s pong -c 5 -w 3
}
If you get REALLY crazy, you could have the page be a bit more dynamic and try to grab a database connection from the connection pool or follow through any web services you interact with inside the app. This pushes some work over to your developers, but they aren't the ones getting called at 1 AM because "MyApp is down".
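To make the idea concrete, here is a sketch of such a dynamic ping page. It is written as a small Python WSGI app rather than a J2EE servlet, purely to illustrate the pattern; make_ping_app and the checks dict are hypothetical names. Each check is a function that raises on failure (for example, one that borrows a connection from the pool), and the page only says "pong" when all of them pass, so the check_http "-s pong" match fails as soon as a dependency does.

```python
def make_ping_app(checks):
    """Build a WSGI app that answers 'pong' only if every check passes.

    `checks` maps a label to a zero-argument callable that raises on failure.
    """
    def app(environ, start_response):
        failures = []
        for label, check in checks.items():
            try:
                check()
            except Exception as exc:  # any failure means "not healthy"
                failures.append(f"{label}: {exc}")

        if failures:
            status = "503 Service Unavailable"
            body = "fail\n" + "\n".join(failures)
        else:
            status = "200 OK"
            body = "pong"

        payload = body.encode("utf-8")
        start_response(status, [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]

    return app


# Hypothetical usage: a real check would borrow and release a DB connection.
app = make_ping_app({"database": lambda: None})
```

Keep the checks cheap. Nagios will hit this page on every check interval.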
We use times of 3 seconds for a warning and 5 for a critical. If the container is taking 3 seconds to serve STATIC content, you can guess there is a problem somewhere. This has helped us to catch many problems such as a container taking a heapdump or our load balancer overallocating to a single server.
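For anyone curious about what the check boils down to, here is a simplified Python sketch of the same logic: fetch the page, time it, look for "pong", and map the result to Nagios exit codes. It is not how check_http is implemented. The function names evaluate and fetch, and the 10 second timeout, are assumptions made for illustration.

```python
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(body, elapsed, expect="pong", warn=3.0, crit=5.0):
    """Map a response body and response time (seconds) to (code, message)."""
    if expect not in body:
        return CRITICAL, f"CRITICAL - string '{expect}' not found"
    if elapsed > crit:
        return CRITICAL, f"CRITICAL - {elapsed:.2f}s response time"
    if elapsed > warn:
        return WARNING, f"WARNING - {elapsed:.2f}s response time"
    return OK, f"OK - {elapsed:.2f}s response time"


def fetch(url, timeout=10.0):
    """Fetch url and return (body, elapsed seconds). Raises on network errors."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return body, time.monotonic() - start


# Hypothetical usage (host and port are placeholders):
#   body, elapsed = fetch("http://myapp.example.com:8080/Ping.do")
#   code, message = evaluate(body, elapsed)
#   print(message); raise SystemExit(code)
```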