|
Softpanorama
(slightly skeptical)
Open Source Software Educational Society |
May the
source be with you,
but remember the KISS principle ;-)
|
Nagios was formerly known as Netsaint. It was written
by Ethan Galstad. It is a daemon written in C that was initially designed
to monitor networked hosts and services but then evolved into a fairly
generic solution.
It has the ability to notify contacts
(via email, pager or other methods) when problems arise and are resolved.
Host and service checks are performed by external "plug-ins", making it
easy to write custom checks in your language of choice. Several CGIs are
included in order to allow you to view the current and historical status
via a Web browser, and a WAP interface is also provided to allow you to
acknowledge problems and disable notifications from an internet-ready cellphone.
Nagios's prepackaged modules can monitor a wide variety of system properties, including
standard system
performance metrics such as load average and free disk space; the presence
of important services like HTTP and SMTP; and per-host network availability
and reachability. It also allows the system administrator to define what
constitutes a significant event on each host--for example, how high a load
average is "too high"--and what to do when such conditions are detected.
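The plug-in convention described above is small: a check is any program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Here is a minimal sketch in Python of a load-average check. The function names and the thresholds are illustrative assumptions, not part of the Nagios distribution.

```python
import os

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_load(load, warn, crit):
    """Map a load average to (exit_code, status_line)."""
    if load >= crit:
        return CRITICAL, f"CRITICAL - load average {load:.2f}"
    if load >= warn:
        return WARNING, f"WARNING - load average {load:.2f}"
    return OK, f"OK - load average {load:.2f}"


def run_check(warn=4.0, crit=8.0):
    """Read the 1-minute load average, print one status line, return the exit code.

    The default thresholds are assumptions chosen for illustration.
    """
    try:
        load = os.getloadavg()[0]  # available on Unix-like systems only
    except (OSError, AttributeError):
        print("UNKNOWN - cannot read load average")
        return UNKNOWN
    code, line = classify_load(load, warn, crit)
    print(line)
    return code

# A real plug-in script would end with: sys.exit(run_check())
```

Keeping the threshold logic in a separate function makes "how high is too high" a pair of parameters, which is the kind of per-host tuning the text describes.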
By default, probes are run on the local host, although
ssh can be used to bypass this limitation. It is also possible to
simulate a telnet daemon
using Perl. To get data from remote hosts, you need either to
program this functionality into the modules or to use the so-called passive mode.
In addition to detecting problems with hosts and their important services,
Nagios also allows the system administrator to specify what should be done
as a result. A problem can trigger an alert to be sent to a designated recipient
via various communication mechanisms (such as email, Unix message, pager).
It is also possible to define an event handler: a program that is
run when a problem is detected. Such programs can attempt to solve the problem
encountered, and they can also proactively prevent some serious problems
when they get triggered by warning conditions.
The information that Nagios collects is displayed in a series of automatically
generated Web pages. This format is quite convenient in that it allows a
system administrator to view network status information from various points
throughout the network.
The narrow column on the left of the display lists links to all of the
possible Nagios displays (the one for the current display has been highlighted
in the illustration). The Tactical Overview shows very general statistics
about the overall network status. In this case, 20 hosts are being monitored,
and 16 are currently up. Three hosts are down, and one is unreachable from
the monitoring system, presumably because the gateway to it is down. Of
the problems on the three hosts that are down, one has been acknowledged
by a system administrator. The display also indicates that there are three
services that have "critical" status (probably indicating a failure), and
two others are in a "warning" state.
Each of the problem indicator displays also functions as a link to another
Web page giving details about that particular item.
Figure 3 illustrates the detailed display that can be obtained for an
individual host (or device). Here we see some detailed information about
a host named leah. Once again, there are several sections to
the display. The host name and IP address appear in the upper left of the
display, along with an icon that the system administrator has assigned to
this host. Here, the icon suggests that the system's operating system is
some version of Windows; conventionally, icons are keyed to the operating
system type. The table in the upper right gives some overall uptime and
reachability statistics about the host over the period that the current
monitoring session has been running.
The table below the operating system icon, titled "Host State Information"
provides information about the current status of the host, including whether
or not it is up, how long it has been that way, when it was last checked,
and the command used to perform the check, and the settings of various configuration
parameters (such as host notifications and event handler).
The box titled "Host Commands" contains a series of links, which allow
the system administrator to perform many different monitoring-related actions
on this host. The various items are described in Table 1. Examining the
list will give you further details about Nagios' capabilities.
Table 1. Available actions in the Nagios Host Information display
| Item | Meaning |
| Disable checks of this host | Stop monitoring this host for availability. |
| Acknowledge this host problem | Respond to a current problem (discussed below). |
| Disable notifications for this host | Don't send alerts if this host is unavailable. |
| Delay next host notification | Delay the next alert for host unavailability. |
| Schedule downtime for this host. Cancel scheduled downtime for this host | Define or cancel scheduled downtime. During downtime, host unavailability is not considered a problem. |
| Disable notifications for all services on this host. Enable notifications for all services on this host. | Don't/do send alerts if a service on this host fails. |
| Schedule an immediate check of all services on this host | Check all services as soon as possible (rather than waiting for their next scheduled time). |
| Disable checks of all services on this host. Enable checks of all services on this host | Disable or enable checking service health on this host. |
| Disable event handler for this host | Prevent the event handler from running when a problem is detected on this host. |
| Disable flap detection for this host | Don't try to detect flaps (rapid up-down or on-off oscillations) on this host or its services. |
The second menu item allows you to acknowledge any current problem. Acknowledging
simply means "I know about the problem, and it is being handled." Nagios
marks the corresponding event as such, and future alerts are suppressed
until the item returns to its normal state. This process also allows you
to enter a comment explaining the situation, an action that is helpful when
more than one administrator regularly examines the monitoring data.
If you don't like all of these table-oriented status displays, Nagios
also has the capability to use graphical ones. For example, Figure 4 illustrates
a map created for the small network being monitored here. The map is laid
out to indicate three separate groups of hosts, with host taurus
serving as a gateway between the group at the upper left and the ones at
the bottom of the window.
Much more complex network topologies can be represented in an analogous
way. See the Nagios Web
site for example screen shots.
Configuring Nagios
Initially, configuring Nagios can seem daunting, and there is a fair
amount of startup overhead to getting things going. But keep in mind that:
- It is not as hard or as time-consuming as it initially seems.
- It is well worth the effort.
Nagios uses the following configuration files:
- nagios.cfg: This is the main Nagios configuration
file, containing global settings for the package. It defines directory
locations for the package's various components, the user and group context
for the daemon, what items to log, log file rotation settings, various
time-outs and other performance-related settings, and additional items
related to some of the package's advanced features (such as enabling
event handling and defining global event handlers).
- Object configuration files: This class of files
specifies which hosts and services are monitored. In addition, they
can be used to define host and service test commands, host groups, alerts
and their recipients, event handlers, and other object-specific settings
used by Nagios.
- cgi.cfg: This file holds settings related to the
Nagios displays, including paths to Web page items and scripts, and
per-item icon and sound selections. The file also defines allowed access
to Nagios's data and commands.
- resource.cfg: This file defines macros that may
be used within other settings for clarity and security purposes, such
as to hide passwords from view in CGI programs.
The package provides sample starter versions of all of these files. We
will consider some aspects of these file types in the remainder of this
article.
Nagios configuration files are generally stored in /usr/local/nagios/etc.
The nagios.cfg File
This configuration file contains directives that apply to the entire
Nagios monitoring system. Here is an annotated sample version illustrating
some of its most important features:
# File locations
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
resource_file=/usr/local/nagios/etc/resource.cfg
lock_file=/usr/local/nagios/var/nagios.lock
...
The first part of the configuration file
specifies various file locations, including the general log file and the
files holding service check command and notification and event handler
command definitions (checkcommands.cfg and misccommands.cfg).
Other cfg_file directives are used by the administrator
to specify the object definition files in use at that site (here,
hosts.cfg). Locations for other types of files follow. The
lock file holds the PID of the current nagios process.
|
# Logging settings
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=1
log_host_retries=1
log_event_handlers=1
...
These directives specify logging settings,
including how often logs are rotated (here, daily), the archive
directory for old files, whether to log significant problems to
syslog as well, and whether to log individual event types.
|
# Global settings
nagios_user=nagios
nagios_group=nagios
date_format=us
admin_email=nagadmin
admin_pager=19995551212
These lines specify various global settings,
including the user/group as which the nagios daemon runs, the output
format for dates (here, US style), and the administrator's email
address. The final item sets the value of the $ADMINPAGER$
macro, which can be used in command definitions.
|
# Package-wide event handlers
enable_event_handlers=1
global_host_event_handler=global-event-command
global_service_event_handler=global-svc-command
Settings related to event handlers. You
can optionally define a single event handler for all host failures
and service failures in this file if appropriate. Commands are defined
in an object configuration file.
|
# Concurrent checks and time-outs
max_concurrent_checks=0
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
...
These directives control the maximum number of
checks that can be made at the same time (0 means an unlimited
number), as well as time-outs for various types of commands (values
in seconds).
|
# Retained status information
retain_state_information=1
retention_update_interval=60
use_retained_program_state=1
These lines tell Nagios to retain information
about host and service status between sessions, saving the values
every 60 seconds, and reloading them when the facility starts up.
|
# Passive service checks
accept_passive_service_checks=1
check_service_freshness=1
These directives enable "passive checks":
status data produced by external commands which Nagios imports periodically.
|
# Save Nagios data for later use
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
These directives allow you to save Nagios
data externally for long term analysis or other purposes. The commands
specified here must be defined in some object configuration file.
The simplest such command simply writes the command's output to
an external file: e.g., echo $OUTPUT$ >> file, but
you can perform whatever action is appropriate (e.g., send the data
to an RRDTool or other database).
|
Note that the directives appear in a slightly different order in the
sample nagios.cfg file provided with the package.
Object Configuration Files
The bulk of Nagios configuration occurs in the object configuration files.
These files define hosts and services to be monitored, how various status
conditions should be interpreted, and what actions should be taken when
they occur. These files are used to define the following items:
- Hosts: Computers and
other network devices
- Host Groups: Named
groups of hosts
- Services: Important
daemons providing specific network services
- Contacts: Users to
be contacted in the event of a problem
- Contact Groups: Named
groups of contacts
- Time Periods: Day
and/or time ranges within a week, used to specify when checks are to
be performed, notifications are to be sent, and the like
- Commands: Commands
to be run for all purposes (host/service checking, notifications, event
handling, and so on). Nagios provides two files containing many predefined
commands: checkcommands.cfg and misccommands.cfg.
- Host Dependencies: Specifications of host reachability
dependencies. When an intermediate host is down, checks are skipped
for all hosts that are dependent on that one.
- Service Dependencies: Specifications of service
dependency requirements. When a service is down, checks are skipped
for all other services that are dependent on it.
- Host Escalations: Definitions of optional escalation
levels for host problems
- Host Group Escalations: Definitions of optional
escalation levels for host groups
- Service Escalations: Definitions of optional escalation
levels for failed services
The first seven items (Hosts through Commands) will need to be defined for virtually every Nagios installation;
the remaining ones are optional. In the sample Nagios configuration provided
with the package, each type of object is defined in a separate configuration
file (named after the object type, excluding any spaces). However, you can
arrange your definitions in any form that makes sense to you.
Hosts and Host Groups
All of these items are defined via templates: named sets of attributes
and settings that can be easily applied to any number of actual objects.
For example, here is a template definition for hosts:
define host{
; Template name
name normal
; This is only a template (not a real host)
register 0
; Host notifications are enabled
notifications_enabled 1
; Command to check if host is available
check_command check-host-alive
; Recheck failures this many times
max_check_attempts
; Repeat failure notifications every 2 hours
notification_interval 120
; When to send notifications (time period name)
notification_period 24x7
; Notify when down, unreachable and on recovery
notification_options d,u,r
; Host event handler is enabled
event_handler_enabled 1
; Event handler command (defined elsewhere)
event_handler host-eh
; Flap detection is disabled
flap_detection_enabled 0
; Save performance data
process_perf_data 1
; Save status information across restarts
retain_status_information 1
}
This template defines a variety of host-monitoring settings (which are
explained in the comments following the semicolons). Here is a host definition
that uses this template:
define host{
; Template on which to base host
use normal
; Note the attribute is not "name" as above
host_name beulah
; Longer description
alias beulah: SuSE 8.1
; IP address
address 192.168.1.44
; Overrides template value
max_check_attempts 8
}
Other hosts may be defined in a similar way. Host definitions themselves
can also be used as templates, provided that a name attribute
is included.
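The effect of the use attribute can be pictured as a dictionary merge: start from the template's settings, then let the object's own settings override them. The sketch below is a simplified model of that idea in Python, not Nagios's actual parser. The resolve function and the dictionary layout are assumptions made for illustration, and the sketch treats name and register as belonging to each definition rather than being inherited.

```python
# Attributes that describe a definition itself and are not passed on.
NOT_INHERITED = {"name", "register"}


def resolve(definitions, key):
    """Return the effective attributes of the definition stored under `key`.

    `definitions` maps a template or host name to its attribute dict; an
    optional "use" entry names the template it is based on.
    """
    obj = definitions[key]
    merged = {}
    parent = obj.get("use")
    if parent:
        inherited = resolve(definitions, parent)
        merged.update({k: v for k, v in inherited.items()
                       if k not in NOT_INHERITED})
    # Local attributes win over inherited ones.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


definitions = {
    "normal": {"name": "normal", "register": "0",
               "check_command": "check-host-alive",
               "notification_interval": "120"},
    "beulah": {"use": "normal", "host_name": "beulah",
               "address": "192.168.1.44", "max_check_attempts": "8"},
}

host = resolve(definitions, "beulah")
print(host["check_command"], host["max_check_attempts"])
```

Because resolve calls itself on the parent, a chain of templates (one template that uses another) is handled the same way as a single level.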
Once hosts have been defined, they may be placed into host groups via
directives like this one:
define hostgroup{
hostgroup_name bldg2
alias Building 2
contact_groups admins1
members beulah,callisto,ariadne,leah,lovelace,valley
}
This definition creates the host group named bldg2, consisting
of six hosts (all previously defined via define host directives). The
contact_groups attribute specifies who to send notifications
to, and it is defined elsewhere (as we'll see).
You can use as many host groups as you want to. Hosts can be part of
multiple host groups, and host groups themselves may be nested.
Services
Here are two service templates and a service definition:
define service{ ; Define defaults for all services
name generic
register 0
; Check service every 30 minutes
normal_check_interval 30
; Retry failing checks every 3 minutes, up to 5 times
retry_check_interval 3
max_check_attempts 5
event_handler_enabled 1
check_period 24x7
; Repeat notifications for failures every 2 hours
notification_interval 120
notification_period 6to22
; Notify contacts about critical failures/recoveries
notification_options c,r
notifications_enabled 1
contact_groups admins
}
define service{ ; Define the SMTP service
use generic
name generic-SMTP
register 0
service_description Check SMTP
check_command check_smtp
event_handler eh_smtp
contact_groups mailadmins
}
define service{ ; Define services to be monitored
use generic-SMTP
; Monitor SMTP for all hosts in this host group
host_groups mailhosts
}
The first template (generic) defines some settings, which can
be applied to a variety of service types. The second template, generic-SMTP,
uses the first template as a starting point and adds to them in order to
create a generic SMTP monitoring service. Specifically, it defines a check
command, an event handler, and a contact group that are appropriate for
the SMTP service. The final define service stanza sets up SMTP monitoring
for all of the hosts in the mailhosts host group.
Contacts and Contact Groups
Here are two stanzas defining a contact and a contact group:
define contact{
contact_name nagadmin
alias Nagios Admin
; When to notify about service problems
service_notification_period 6to22
; When to notify about host problems
host_notification_period 24x7
; Notify on critical problems and recoveries
service_notification_options c,r
; Notify on host down and recoveries
host_notification_options d,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-epager
email nagios-admins@ahania.com
pager $ADMINPAGER$
}
define contactgroup{
contactgroup_name mailadmins
alias Mail Admins
members mailadm,chavez,catfemme
}
The first stanza defines a contact named nagadmin. It also
defines what events to notify this contact about and the time periods during
which notifications should be sent. The commands to use to generate the
alerts are also specified, along with arguments to them (see below).
Time Periods
Time period definitions are quite simple. Here are the definitions of
the two time periods we have used so far:
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
define timeperiod{
timeperiod_name 6to22
alias Weekdays, 6 AM to 10 PM
monday 06:00-22:00
tuesday 06:00-22:00
wednesday 06:00-22:00
thursday 06:00-22:00
friday 06:00-22:00
}
Note that only the applicable days need be included in the definition.
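To make the semantics concrete, here is a small Python sketch of how a time period such as 6to22 can be evaluated for a given moment. It is an illustration under stated assumptions, not Nagios code: the dictionary layout and the in_period function are hypothetical, and days that are absent from the definition are treated as never matching.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# The 6to22 period from the definition above, as a plain dictionary.
PERIOD_6TO22 = {day: "06:00-22:00" for day in DAYS[:5]}


def _minutes(hhmm):
    """Convert "HH:MM" to minutes since midnight ("24:00" gives 1440)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_period(period, when):
    """Return True if the datetime `when` falls inside `period`."""
    spec = period.get(DAYS[when.weekday()])  # weekday(): Monday is 0
    if spec is None:
        return False  # day not listed, so the period never applies
    now = when.hour * 60 + when.minute
    # Accept several comma-separated ranges for one day.
    for time_range in spec.split(","):
        start, end = time_range.strip().split("-")
        if _minutes(start) <= now < _minutes(end):
            return True
    return False


print(in_period(PERIOD_6TO22, datetime(2008, 9, 11, 7, 30)))
```

The same function works for the 24x7 period if every day is mapped to "00:00-24:00".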
Commands
The commands referred to in many of the preceding object definitions
also must be defined. For example, here is the SMTP service check command
definition:
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$
}
This command runs the check_smtp script stored in the directory
defined in the macro $USER1$ (defined in the resource.cfg
file--see below); this macro conventionally holds the path to the Nagios
plug-ins directory. The command is passed the option -H, followed
by the IP address of the host to be checked (the latter is expanded from
the built-in $HOSTADDRESS$ macro).
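Macro expansion of this kind amounts to a text substitution performed before the command is run. The following Python sketch shows the idea for the check_smtp command line. The expand_macros function is hypothetical, the macro values are taken from examples elsewhere in this article, and leaving unknown macros untouched is a choice made for this sketch rather than a statement about Nagios's behavior.

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value from `macros`.

    Tokens whose name is not in `macros` are left as they are.
    """
    def substitute(match):
        return macros.get(match.group(1), match.group(0))

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


macros = {
    "USER1": "/usr/lib/nagiosplugins",  # as set in the resource.cfg example
    "HOSTADDRESS": "192.168.1.44",      # address of the host being checked
}

expanded = expand_macros("$USER1$/check_smtp -H $HOSTADDRESS$", macros)
print(expanded)
```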
You can determine the syntax for any plug-in by running it with the
--help option. You can also extend Nagios by adding custom
plug-ins of your own. See the documentation for details on how to accomplish
this.
Event handlers are defined in the same way, as in this example:
define command{
command_name eh_smtp
command_line /usr/local/nagios/eh/fix_mail $HOSTADDRESS$ $STATETYPE$
}
Here, we define the command named eh_smtp. It specifies
the full path to a program to run, passing two arguments: the host's IP
address and the value of the $STATETYPE$ macro. This item is
SOFT while a failing check is still being retried and HARD once the
problem has persisted through max_check_attempts checks.
Here are the definitions of commands used for notifications (we've wrapped
the command_line setting for clarity):
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\n
Notification Type: $NOTIFICATIONTYPE$\n\n
Service: $SERVICEDESC$\n
Host: $HOSTALIAS$\n
Address: $HOSTADDRESS$\n
State: $SERVICESTATE$\n\n
Date/Time: $DATETIME$\n\n
Additional Info:\n\n$OUTPUT$" |
/usr/bin/mail -s "** $NOTIFICATIONTYPE$
alert - $HOSTALIAS$/$SERVICEDESC$
is $SERVICESTATE$ **" $CONTACTEMAIL$
}
This command constructs a simple email message using the printf
command and many built-in Nagios macros. It then sends the message using
the mail command, specifying the recipient as the $CONTACTEMAIL$
macro. The latter contains the value of the email
attribute of the contact who is being notified.
The cgi.cfg File
The cgi.cfg configuration file has several different functions
within the Nagios system. Among the most important is access control, allowing
Nagios and its data to be restricted to appropriate people. Here are some
sample directives related to authorization:
use_authentication=1
authorized_for_configuration_information=netsaintadmin,root,chavez
authorized_for_all_services=netsaintadmin,root,chavez,maresca
The first entry enables the access control mechanism. The next two entries
specify users who are allowed to view Nagios configuration information and
services status information (respectively). Note that all users also must
be authenticated to the Web server using the usual Apache htpasswd
mechanism.
This same configuration file is also used to store settings for icon-based
status displays, as in these examples:
hostextinfo[janine]=;redhat.gif;;redhat.gd2;;168,36;,,;
hostextinfo[ishtar]=;apple.gif;;apple.gd2;;125,36;,,;
These entries specify extended attributes for the hosts defined in the
entries labeled janine and ishtar. The filenames in this
example specify image files for the host in status tables (GIF format--see
Figure 3) and in the status map (GD2 format), and the two numeric values
specify the device's location (its x and y coordinates) within
the 2D status map. (Figure 4 provides an example status map display.)
The resource.cfg File
The final configuration file we will consider is the resource.cfg
file. It is used to define site-specific macros, conventionally named
$USER1$ through $USER32$:
# $USER1$ = path to plugins directory
$USER1$=/usr/lib/nagiosplugins
...
# Store a username and password (hidden)
$USER3$=administrator
$USER4$=somepassword
The first macro defines the path to the Nagios plug-ins directory; this
usage is assumed by the supplied sample configuration files.
The other two macros are used in this case to store a username and password.
These items can be used in command definitions for added security. The
resource.cfg file itself can be protected against all non-root
access without compromising the ability of CGI programs to run successfully.
Pre-Checking a Nagios Configuration
Since Nagios configuration is somewhat involved, the package provides
a command that can be used to verify it prior to running the program. Here
is an example of its use:
# cd /usr/local/nagios/etc
# /usr/local/nagios/bin/nagios -v nagios.cfg
This will check the Nagios configuration, which uses nagios.cfg
as its main configuration file.
Nagiosgraph is an add-on for Nagios. It collects
service perfdata in RRD format, and displays the
resulting graphs via CGI.
September 11th, 2008 by
Mike Diehl in
I've created a Nagios script that monitors the Help Desk voicemailbox
and raises a critical service alert in Nagios if any messages are waiting.
I've also written a script that can call me, perhaps on my cellphone,
in the event of a service outage. With these two scripts in place, I
can get a call on my cellphone any time someone calls my Help Desk and
leaves a message. I can also get a call if any of my monitored services
fail. Theoretically, I can be at a park playing with my boys and know
that my servers are happy... until the cellphone rings.
I understand that I have kind of a unique situation, but the same
concept is applicable in a business production environment, so let's
get down to looking at code.
First, let's talk about the Help Desk monitoring script. Essentially,
this script checks to see if there are any files in the INBOX in the
Help Desk mailbox. Here is the code:
#!/usr/bin/perl -w
local *DIR;
my ($file, $error);
$error = 0;
opendir DIR, "/var/spool/asterisk/voicemail/customers/611/INBOX/"
or die("Error: Permission denied\n");
while ($file = readdir(DIR)) {
if ($file eq ".") { next; }
if ($file eq "..") { next; }
$error++;
}
$error = $error/4;
if (!$error) {
print "OK\n";
exit 0;
} else {
print "CRITICAL: $error\n";
}
exit 2;
Of course, you need to make sure that the Nagios user has access
to the Asterisk voicemailbox, but that can be taken care of by setting
the script set-uid. The script, as you can see, is pretty simple. If
there are any other files in the directory, the script assumes that
there is a voicemail and sends a CRITICAL alert to Nagios. Otherwise
everything is OK.
To enable Nagios to use this check script, we need to define it in
checkcommands.cfg. Here is the definition I used:
define command{
command_name check_help
command_line /etc/nagios/local/check_611.pl
}
Now, I can refer to the check_help check script in the services.cfg
file. Here's how I did it:
define service {
use generic-service
name Help_Desk
host_name my_server
service_description Help Desk Voicemail
check_command check_help
register 1
}
With this configuration in place, Nagios can indicate an alarm any
time there is voicemail in the Help Desk mailbox. But that's only half
of what I promised to write about. The next script allows Nagios to
call me to let me know that I've got a fire to put out. Here is that
script:
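#!/usr/bin/perl
foreach $main::phone ("15055551234") {
# Build the call file as a here-document. Asterisk also needs a
# "Channel:" line (the one that dials $main::phone) at the top of the
# file; that line is missing from this copy of the script.
$main::call = <<EOF
MaxRetries: 0
RetryTime: 1
WaitTime: 120
Account: Enterprise
Context: apps
Extension: OUTAGE
Priority: 1
EOF
;
# Write the call file outside the spool directory first.
open FILE, ">/tmp/outage.call" or die("Error: cannot write call file\n");
print FILE $main::call;
close FILE;
# Then move the finished file into Asterisk's outgoing spool.
system("mv /tmp/outage.call /var/spool/asterisk/outgoing");
}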
As you can see, this script isn't complicated, either. It simply
creates an Asterisk “call file” and puts it in Asterisk's outgoing spool
directory. The script is capable of calling multiple numbers... just
in case. It's important that the call file be created in another
directory and moved into the spool directory. Otherwise bad things can
happen if Asterisk tries to read the file while the script is still
writing it.
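The write-elsewhere-then-move pattern described here can be sketched in Python as follows. This is an illustration, not the author's script: the function name and the staging directory are assumptions. The sketch also assumes that the staging directory sits on the same filesystem as the spool directory, because a rename is only atomic within one filesystem, and os.replace raises OSError when asked to cross filesystems.

```python
import os
import tempfile


def spool_call_file(contents, staging_dir, spool_dir, name):
    """Write `contents` to a temporary file, then rename it into `spool_dir`.

    The reader of `spool_dir` never sees a half-written file, because the
    file only appears there once it is complete.
    """
    with tempfile.NamedTemporaryFile("w", dir=staging_dir,
                                     delete=False) as tmp:
        tmp.write(contents)
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the data is on disk first
    target = os.path.join(spool_dir, name)
    os.replace(tmp.name, target)  # atomic rename on the same filesystem
    return target
```

With the directories used in this article, a call might look like spool_call_file(call_text, "/var/spool/asterisk/tmp", "/var/spool/asterisk/outgoing", "outage.call"), where the staging path is a hypothetical choice.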
Obviously this script relies on some configuration in the Asterisk
dial plan. Here is the relevant part of the dial plan:
exten => OUTAGE,1,answer
exten => OUTAGE,2,playback(/etc/asterisk/sounds/OUTAGE)
exten => OUTAGE,3,hangup
At this point, you're probably realizing that I'm not doing anything
complicated. All that is needed from Asterisk's point of view is an
audio message in /etc/asterisk/sounds/OUTAGE (.wav or .au) that indicates
that something is on fire. Asterisk will select the most reasonable
file extension and play the file when the call is answered.
So all that is left to do is configure Nagios to use this notification
method. This is configured in the misccommands.cfg file. Here is how
I did it:
# 'notify_by_phone' command definition
define command{
command_name notify_by_phone
command_line /etc/nagios/local/notify_by_phone.pl
}
Now that all of the configuration is done, we restart Nagios and
reload the Asterisk dial plan. To do this, we type “/etc/init.d/nagios
restart” at the command line and “extensions reload” at the Asterisk
console.
So now, anytime I have voicemail at the Help Desk, it's indicated
in the Nagios monitoring screen as a critical alert. Also, anytime any
of my servers or services are unavailable, I can get a phone call on
either my home phone or my cell phone. This means that my customers
don't HAVE to have those phone numbers and I can still provide quality
service to them.
Now I realize that I have a unique situation, but I hope that this
article serves as an example of how to create custom Nagios service
checks and notifications, as well as hinting at some of the integration
options available in Asterisk.
__________________________
Mike Diehl is a recently self-employed Computer Nerd and lives in
Albuquerque, NM. with his wife and 3 sons. He can be reached at mdiehl@diehlnet.com
By Wojciech Kocjan on November
20, 2008 (8:00:00 PM)
System monitoring tool
Nagios
offers a powerful mechanism for receiving events and commands from
external applications. External commands are usually sent from event
handlers or from the Nagios Web interface. You will find external
commands most useful when writing event handlers for your system,
or when writing an external application that interacts with Nagios.
This article is excerpted from the newly published book
Learning Nagios 3.0 from Packt Publishing.
The external commands pipe is a pipe
file created on a filesystem that Nagios uses to receive incoming
messages. The communication does not use any authentication or authorization
-- the only requirement is to have write access to the pipe file,
rw/nagios.cmd, which is located in the directory passed as the localstatedir
option during compilation.
An external command file is usually writable by the owner and
the group; the usual group used is nagioscmd. If you want a user
to be able to send commands to the Nagios daemon, simply add that
user to this group.
A small limitation of the command pipe is that there is no way
to get any results back, so it is not possible to send any query
commands to Nagios. Therefore, by just using the command pipe, you
have no verification that the command you have passed to Nagios
has been processed, or will be processed soon. It is, however, possible
to read the Nagios log file and check whether it indicates that
the command has been parsed correctly.
The Nagios Web interface uses an external command pipe to control
how Nagios works. The Web interface does not use any other means
to send commands or apply changes to Nagios.
From the Nagios daemon perspective, there is no clear distinction
as to who can perform what operations. Therefore, if you plan to
use the external command pipe to allow users to submit commands
remotely, you need to make sure that authorization is in place so
that unauthorized users cannot send potentially dangerous commands
to Nagios.
The syntax for formatting commands is easy. Each command must
be placed on a single line and end with a newline character. The
syntax is as follows:
[TIMESTAMP] COMMAND_NAME;argument1;argument2;...;argumentN
TIMESTAMP is written as Unix time -- that is, the number of seconds
since 1970-01-01 00:00:00. You can create this by using the date
command. Most programming languages also offer the means to get
the current Unix time.
Commands are written in upper case. The arguments depend on the
actual command. For example, to add a comment to a host stating
that it has passed a security audit, you can use the following shell
command:
echo "[`date +%s`] ADD_HOST_COMMENT;somehost;1;Security Audit;\
This host has passed security audit on `date +%Y-%m-%d`" >/var/nagios/rw/nagios.cmd
This will send an
ADD_HOST_COMMENT command to Nagios over the external command
pipe. Nagios will then add a comment to the host, somehost, stating
that the comment originated from Security Audit. The first argument
specifies the host name to add the comment to; the second tells
Nagios if this comment should be persistent. The next argument describes
the author of the comment, and the last argument specifies the actual
comment text.
Similarly, adding a comment to a service requires the use of
the
ADD_SVC_COMMENT command. The command's syntax is similar to
that of the ADD_HOST_COMMENT command except that the command requires
the specification of the host name and service name.
You can also delete a single comment or all comments using the
DEL_HOST_COMMENT, DEL_ALL_HOST_COMMENTS, DEL_SVC_COMMENT, and
DEL_ALL_SVC_COMMENTS commands.
Other commands worth mentioning are related to scheduling checks
on demand. Often, it is necessary to request that a check be carried
out as soon as possible; for example, when testing a solution.
You can create a script that schedules a check of a host, all
services on that host, and a service on a different host, as follows:
#!/bin/sh
NOW=$(date +%s)
echo "[$NOW] SCHEDULE_HOST_CHECK;somehost;$NOW" \
  >/var/nagios/rw/nagios.cmd
echo "[$NOW] SCHEDULE_HOST_SVC_CHECKS;somehost;$NOW" \
  >/var/nagios/rw/nagios.cmd
echo "[$NOW] SCHEDULE_SVC_CHECK;otherhost;Service Name;$NOW" \
  >/var/nagios/rw/nagios.cmd
exit 0
The commands
SCHEDULE_HOST_CHECK and
SCHEDULE_HOST_SVC_CHECKS accept a host name and the time at
which the check should be scheduled. The
SCHEDULE_SVC_CHECK command requires the specification of a service
description as well as the name of the host to schedule the check
on.
Normal scheduled checks, such as the ones scheduled above, might
not actually take place at the time that you scheduled them. Nagios
also takes allowed time periods into account and verifies whether
checks have been disabled for a particular object or globally for
the entire Nagios instance.
There are cases when you'll need to force Nagios to do a check
-- in such cases, you should use the
SCHEDULE_FORCED_HOST_CHECK,
SCHEDULE_FORCED_HOST_SVC_CHECKS, and
SCHEDULE_FORCED_SVC_CHECK commands. They work in exactly the
same way as described above, but make Nagios skip both the time
period check and the test of whether checks are disabled for this
particular object. This way, a check will always be performed,
regardless of other Nagios parameters.
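The relationship between the normal and forced variants can be summarized in a short Python sketch. The helper name and its parameters are assumptions made for illustration; it only builds the command line and leaves writing to the pipe to the caller.

```python
def schedule_check(host, check_time, service=None, all_services=False, forced=False):
    """Build a SCHEDULE_* or SCHEDULE_FORCED_* external command line.

    service      -- check a single service on the host
    all_services -- check every service on the host
    forced       -- use the variant that ignores time periods and disabled checks
    """
    prefix = "SCHEDULE_FORCED_" if forced else "SCHEDULE_"
    if service is not None:
        fields = [prefix + "SVC_CHECK", host, service, str(check_time)]
    elif all_services:
        fields = [prefix + "HOST_SVC_CHECKS", host, str(check_time)]
    else:
        fields = [prefix + "HOST_CHECK", host, str(check_time)]
    # As in the shell script, the entry timestamp equals the requested check time.
    return "[%d] %s\n" % (check_time, ";".join(fields))


print(schedule_check("somehost", 1206096000, forced=True), end="")
print(schedule_check("otherhost", 1206096000, service="Service Name"), end="")
```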
Other commands worth using are related to custom variables, introduced
in Nagios 3. When you define a custom variable for a host, service,
or contact, you can change its value on the fly with the external
command pipe.
As these variables can then be directly used by check or notification
commands and event handlers, it is possible to make other applications
or event handlers change these attributes directly without modifications
to the configuration files.
How might this work? Suppose that the IT staff registers its
presence via an application without any GUI. This application periodically
sends information about the latest known IP address, and that information
is then passed to Nagios assuming that the person is in the office.
This would later be sent to a notification command to use that specific
IP address while sending a message to the user.
Assuming that the user name is jdoe and the custom variable name
is DESKTOPIP, the message that would be sent to the Nagios external
command pipe would be as follows:
[1206096000] CHANGE_CUSTOM_CONTACT_VAR;jdoe;DESKTOPIP;12.34.56.78
This would cause a subsequent use of $_CONTACTDESKTOPIP$ to return
a value of 12.34.56.78.
Nagios offers the
CHANGE_CUSTOM_CONTACT_VAR,
CHANGE_CUSTOM_HOST_VAR, and
CHANGE_CUSTOM_SVC_VAR commands for modifying custom variables
in contacts, hosts, and services.
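To make the link between the command and the resulting macro explicit, here is a hedged Python sketch. The helper names are invented; the argument order for the service variant (host name, then service description) follows the External Command List as far as I know and should be verified there.

```python
COMMANDS = {
    "contact": "CHANGE_CUSTOM_CONTACT_VAR",
    "host": "CHANGE_CUSTOM_HOST_VAR",
    "service": "CHANGE_CUSTOM_SVC_VAR",
}
MACRO_PREFIXES = {"contact": "_CONTACT", "host": "_HOST", "service": "_SERVICE"}


def change_custom_var(kind, name, variable, value, timestamp, service=None):
    """Build the command line that changes one custom variable."""
    fields = [COMMANDS[kind], name]
    if kind == "service":
        if service is None:
            raise ValueError("a service description is required")
        fields.append(service)
    fields += [variable.upper(), str(value)]
    return "[%d] %s\n" % (timestamp, ";".join(fields))


def custom_macro(kind, variable):
    """Return the macro through which the variable is read back."""
    return "$%s%s$" % (MACRO_PREFIXES[kind], variable.upper())


print(change_custom_var("contact", "jdoe", "DESKTOPIP", "12.34.56.78", 1206096000), end="")
print(custom_macro("contact", "DESKTOPIP"))
```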
The commands explained above are just a small subset of the full
capabilities of the Nagios external command pipe. For a complete
list of commands, visit
the External Command List.
Posted by philcore on Mon 28 Nov 2005 at 12:23
Nagios is a powerful, modular network monitoring system that can
be used to monitor many network services like smtp, http and dns on
remote hosts. It also has support for snmp to allow you to check things
like processor loads on routers and servers. I couldn't begin to cover
all of the things that nagios can do in this article, so I'll just cover
the basics to get you up and running.
apt-get install nagios-text
First we need to define people that will be notified, and define how
they should be notified. In the example below, I define two users, joe
and paul. Joe is the network guru and cares about routers and switches.
Paul is the systems guy, and he cares about servers. Both will be notified
via email and by pager. Note that if you are going to monitor your email
server, you will want to use another notification method besides email.
If your email server is down, you can't send anybody an email to notify
them! :) In that case you will want to use a pager server to send a
text message to a phone or pager, or set up a second nagios monitor
that uses a different mail server to send email.
Edit /etc/nagios/contacts.cfg and add the following users:
define contact{
contact_name joe
alias Joe Blow
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email,host-notify-by-epager
email joe@yourdomain.com
pager 5555555@pager.yourdomain.com
}
define contact{
contact_name paul
alias Paul Shiznit
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email,host-notify-by-epager
email paul@yourdomain.com
pager 5556666@pager.yourdomain.com
}
Now add the users to groups.
In /etc/nagios/contactgroups.cfg add the following:
define contactgroup{
contactgroup_name router_admin
alias Network Administrators
members joe
}
define contactgroup{
contactgroup_name server_admin
alias Systems Administrators
members paul
}
You can add multiple members to a contact group by listing comma separated
users.
Now to define some hosts to monitor. For my example, I define two
machines, a mail server and a router.
Edit /etc/nagios/hosts.cfg and add:
define host{
use generic-host
host_name gw1.yourdomain.com
alias Gateway Router
address 10.0.0.1
check_command check-host-alive
max_check_attempts 20
notification_interval 240
notification_period 24x7
notification_options d,u,r
}
define host{
use generic-host
host_name mail.yourdomain.com
alias Mail Server
address 10.0.0.100
check_command check-host-alive
max_check_attempts 20
notification_interval 240
notification_period 24x7
notification_options d,u,r
}
Now we add the hosts to groups. I define groups called 'routers' and
'servers' and add the router and mail server respectively.
Edit /etc/nagios/hostgroups.cfg
define hostgroup{
hostgroup_name routers
alias Routers
contact_groups router_admin
members gw1.yourdomain.com
}
define hostgroup{
hostgroup_name servers
alias Servers
contact_groups server_admin
members mail.yourdomain.com
}
Again, for multiple members, just use a comma separated list of hosts.
Next define services to monitor on each of the hosts. Nagios has
many built-in plugins for monitoring. On a debian sarge system, they
are stored in /usr/lib/nagios/plugins. Here we want to monitor the smtp
service on the mail server, and do ping checks on the router.
Edit /etc/nagios/services.cfg
define service{
use generic-service
host_name mail.yourdomain.com
service_description SMTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups server_admin
notification_interval 240
notification_period 24x7
notification_options w,u,c,r
check_command check_smtp
}
define service{
use generic-service
host_name gw1.yourdomain.com
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups router_admin
notification_interval 240
notification_period 24x7
notification_options w,u,c,r
check_command check_ping!100.0,20%!500.0,60%
}
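The check_command value in the PING service packs the command name and its arguments into one string separated by exclamation marks; Nagios exposes the arguments to the command definition as $ARG1$, $ARG2$ and so on. The Python sketch below only illustrates that splitting. The function names are invented, and the reading of each threshold as "round-trip time in ms, packet loss in percent" reflects the usual check_ping warning/critical convention rather than anything stated in this article.

```python
def split_check_command(check_command):
    """Split "name!arg1!arg2" into the command name and its $ARGn$ macros."""
    name, *arguments = check_command.split("!")
    macros = {"$ARG%d$" % index: value
              for index, value in enumerate(arguments, start=1)}
    return name, macros


def parse_ping_threshold(threshold):
    """Turn "100.0,20%" into (round-trip time in ms, packet loss in percent)."""
    rta, loss = threshold.split(",")
    return float(rta), float(loss.rstrip("%"))


name, macros = split_check_command("check_ping!100.0,20%!500.0,60%")
print(name, macros)
print(parse_ping_threshold(macros["$ARG1$"]))  # warning threshold
print(parse_ping_threshold(macros["$ARG2$"]))  # critical threshold
```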
And that's it. To test your configurations, you can run
nagios -v /etc/nagios/nagios.cfg
If all is well we can restart nagios and move on to the apache side
to get a visual view of the monitor.
/etc/init.d/nagios restart
Assuming you have a working apache install, you can add the apache.conf
file included in the nagios package to set up the nagios cgi administration
interface. The web interface is not required to run nagios, but it is
definitely worth setting it up. The simplest way to get it up and running
is to copy the supplied conf file over to our apache installation. On
my system, I'm running apache2. Systems running apache 1.3.xx will have
slightly different setups.
cp /etc/nagios/apache.conf /etc/apache2/sites-enabled/nagios
Of course you may want to set it up as a virtual server, but I leave
that as an exercise for the reader. Now you will want to set up an allowed
user to view the cgi interface. By default, nagios issues full administrative
access to the nagiosadmin user. Nagios uses apache htpasswd style authentication.
Here we add the user nagiosadmin with password mypassword to the default
nagios htpasswd file.
htpasswd2 -nb nagiosadmin mypassword >> /etc/nagios/htpasswd.users
You should now be able to restart apache and logon to
http://your.nagios.server/nagios
Nagios is a very powerful tool for monitoring networks. I've only
touched on the basics here, but it should be enough to get you up and
running. Hopefully, once you do, you'll start experimenting with all
the cool features and plugins that are available. The documentation
included in the cgi interface is very detailed and helpful.
The author uses Perl for the plug-in
Linux Journal
A while back, I wrote an article for Linux Journal's
web edition entitled
“Howto be a good (and lazy) System Administrator.” A couple astute
readers, after reading the article, asked if I was familiar with the
Nagios monitoring system, and I am. I've been using Nagios for a few
years now.
I had intended to write this article as a How-to on getting Nagios
configured and running for the first time. However, it turns out that
the documentation that comes with Nagios is really pretty good. And
even if you do have problems, and I did, the user community is also
quite responsive. So, rather than beating a dead horse, (with sympathy
to horse lovers) I decided to continue the Good and Lazy Administrator
Theme and discuss extending Nagios with custom service checks and custom
notifications.
Nagios uses a plug-in mechanism to implement all of its server and
service checks as well as all of its notifications. This is good news
for hackers, as it allows us to build new functionality that either
no one else has thought of, or has need of. I wrote a couple scripts
for my Nagios system. One does a custom service check to see if I have
voicemail waiting for me at the Help Desk, and the other does a custom
notification by telephone. Before I go on, I should give a little bit
of background.
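For readers who have not written a plug-in before, the contract is small: print one line of status text and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The author's scripts are written in Perl and are not reproduced here; the following is an independent Python sketch of a voicemail-style check, in which the thresholds and the way the message count is obtained are assumptions.

```python
import sys

# Standard plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(waiting, warn_at=1, crit_at=5):
    """Map a count of waiting messages to a (status, message) pair."""
    if waiting is None:
        return UNKNOWN, "VOICEMAIL UNKNOWN - could not read the message count"
    if waiting >= crit_at:
        status = CRITICAL
    elif waiting >= warn_at:
        status = WARNING
    else:
        status = OK
    return status, "VOICEMAIL %s - %d message(s) waiting" % (LABELS[status], waiting)


def main(argv):
    # Hypothetical data source: the count is passed as the first argument.
    try:
        waiting = int(argv[1])
    except (IndexError, ValueError):
        waiting = None
    status, message = evaluate(waiting)
    print(message)
    return status

# A real plug-in would finish with: sys.exit(main(sys.argv))
```

Keeping the decision logic in `evaluate` separate from the exit call makes the thresholds easy to test without running Nagios.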
Perl plugin: check_logfiles is a plugin for Nagios which checks logfiles
for defined patterns
check_logfiles is a plugin for Nagios which checks logfiles for defined
patterns. It is capable of detecting logfile rotation. If you tell it
how the rotated archives look, it will also examine these files. Unlike
check_logfiles, traditional logfile plugins were not aware of the gap
which could occur, so under some circumstances they ignored what had
happened between their checks. A configuration file is used to specify
where to search, what to search, and what to do if a matching line is
found.
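The "gap" problem comes down to remembering where the previous run stopped. The Python sketch below shows that idea only; it is not how check_logfiles is implemented, it does not read rotated archives, and the function name and state layout are invented for the example.

```python
import os
import re


def scan_new_lines(path, pattern, state):
    """Return lines matching pattern that were appended since the last call.

    state is a dict that remembers the inode and byte offset between calls;
    the caller is responsible for persisting it from one run to the next.
    """
    regex = re.compile(pattern)
    info = os.stat(path)
    offset = state.get("offset", 0)
    # A different inode or a shrunken file means rotation or truncation,
    # so the new file is read from its beginning.
    if state.get("inode") != info.st_ino or info.st_size < offset:
        offset = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
        state["offset"] = handle.tell()
    state["inode"] = info.st_ino
    text = data.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if regex.search(line)]
```

A plug-in built on this would map an empty result to OK and a non-empty one to WARNING or CRITICAL, depending on which pattern matched.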
About: Nagstamon is a Nagios status monitor with a UI that
resides in the GNOME systray or on the Windows desktop. It informs you
in realtime about the status of your Nagios monitored network.
Changes: This release fixes a problem with passwords containing
special characters, and an issue where it omitted showing failed services
on hosts in scheduled downtime.
About: check_oracle_health is a plugin for the Nagios monitoring
software that allows you to monitor various metrics of an Oracle database.
It includes connection time, SGA data buffer hit ratio, SGA library
cache hit ratio, SGA dictionary cache hit ratio, SGA shared pool free,
PGA in memory sort ratio, tablespace usage, tablespace fragmentation,
tablespace I/O balance, invalid objects, and many more.
Release focus: Major feature enhancements
Changes: The tablespace-usage mode now takes into account
when tablespaces use autoextents. The data-buffer/library/dictionary-cache-hitratio
are now more accurate. Sqlplus can now be used instead of DBD::Oracle.
About: check_lm_sensors is a Nagios plugin to monitor the
values of on-board sensors and hard disk temperatures on Linux systems.
Changes: The plugin now uses the standard Nagios::Plugin CPAN
classes, fixing issues with embedded perl.
Perl plugin: check_logfiles is a plugin for Nagios which checks logfiles
for defined patterns
check_logfiles 2.3.3 (Default)
Added: Sun, Mar 12th 2006 15:09 PDT (2
years, 1 month ago)
Updated: Tue, May 6th 2008 10:37 PDT (today)
A short, superficial intro book (190 pages).
Killing phrase from the review below: "This is the book you should
pass to your manager so (s)he understands why and how an open solution like
Nagios is the better choice and can be used for achieving surpassing solutions."
Warning: Several reviews of
this book look like plants: they were written by readers who have a single
networking book review or just a single review.
Spot on for a well structured book with many WOW-factors,
May 17, 2007 By
Nils Valentin (Tokyo, Japan)
--- DISCLAIMER: This is a requested review by PTR, however any opinions
expressed within the review are my personal ones. ---
Introduction - 6p
CHAPTER 1 Best Practices - 12p
CHAPTER 2 Theory of Operations - 26p
CHAPTER 3 Installing Nagios - 11p
CHAPTER 4 Configuring Nagios - 23p
CHAPTER 5 Bootstrapping the Configs - 10p
CHAPTER 6 Watching - 46p
CHAPTER 7 Visualization - 42p
CHAPTER 8 Nagios Event Broker Interface - 19p
APPENDIX A Configure Options - 3p
APPENDIX B nagios.cfg and cgi.cfg - 9p
APPENDIX C Command-Line Options - 10p
Index - 14p
At 190 pages (230 when including the appendices and index), the book
is very compact. It teaches you Nagios in a way I have never
heard or read before. I must assume that the author's clearly structured
style, which runs through the book like a red thread, is responsible
for the excellent outcome.
The book starts in the introduction with the title "Do it right the
first time" and that hits it right on the spot. What distinguishes
this little portable knowledgebase is its exceptionally well thought
through content and the author's explanations. David is not filling
pages by explaining each and every parameter, but rather showing you
the big picture, and explaining how to approach new issues or how one
technical solution is better than another.
This is the book you should pass to your
manager so (s)he understands why and how an open solution like Nagios
is the better choice and can be used for achieving surpassing solutions.
The book itself basically is divided in two sections:
Background, setup and configuration - Chapters 1-5
Advanced Topics - Chapters 6-8
I found all of the chapters to have a nice balance in the amount
of information needed, but some EXCEPTIONALLY good parts of the book were:
Chapter 1 Best practices
Chapter 2 - the part about scheduling
Chapters 6-8 as a whole
Chapter 6 has thorough explanations on monitoring the different OS's
(especially the Windows part !!) or other applications.
Chapter 7 for its overall thoroughness of how to visualize your data
to reach the next level of a better understanding of the systems / network
you are monitoring.
Chapter 8 is describing a filesystem based status interface. The NEB
module will write a file with its current status code for each service.
I have to admit that some technical details went over my head, but I
thought that was pretty cool !!
The points featured above are what I found to be exceptionally good and
most likely the strongest selling points for this little portable knowledgebase.
That doesn't mean that the other parts of the book not mentioned here are
weak, mind you.
Funnily enough, the points mentioned above were EXACTLY the ones which
I haven't seen explained this thoroughly anywhere before.
So David's book was exactly spot on for me.
Summary:
To sum it all up in very simple words: This is a hell of a book !!
It's the most compact, well structured book on Nagios that I have seen
to date. It contains many WOW-factors. While reading each chapter you
can virtually "feel" how David's explanations, tips and tricks have already
helped you to avoid time consuming pitfalls.
So this book is not about "to buy or not to buy"; this is an investment
you don't want to miss !!
I was especially impressed by the thoroughness with which the book is written
from the first page. Although the contents of the first chapter weren't new
to me, the way they were explained already provided many of those A-ha
moments.
The main asset of the book is not the description of the tools themselves,
but rather the thought and consideration the author put into it and
the sharing of those thoughts in a way that lets the reader actually
visualize how and why one solution is better than another, without actually
having to go to the "luxury to experience the pitfalls" in a live disaster
scenario.
PS: AFTER I finished reading the book I re-read the "Editorial Review"
Amazon gave above and found it pretty well describing the actual book
and what you should expect.
>> You can find more reviews on Nagios related books including a comparison
by deploying my profile. <<
With the Nagios Looking Glass (NLG) tool, developer Andy Shellam
has tried to resolve a common problem for network administrators running
Nagios. What happens if you want to provide access to up-to-date information
from Nagios without giving users access to the full Nagios console?
Providing read-only access to the Nagios console can be complicated,
and can occasionally require network re-structuring or can even pose
a security risk.
NLG is designed to fix those issues by taking a feed from Nagios
status data via an HTTP connection and displaying it on a public Web
server. It works in a client-server model with a PHP-based polling server
installed on your Nagios server. A receiver client, also PHP-based,
is installed on your Web server. If you want to use NLG locally, you
can also run the client and the server together on your Nagios server.
The receiver client creates an AJAX-enabled page based on a template.
You can also customize this template to display whatever you require.
You can see a demo of NLG at http://looking-glass.andyshellam.eu/demo/.
02.05.2007
Nagios also comes with a Web-based console, an extensible Nagios Event
Broker (NEB) that allows you to integrate Nagios with other tools,
like database back-ends, and a large collection of monitoring commands
and capabilities. Its current release, version 2.0, is stable and production
ready. You can take a look at Nagios at http://www.nagios.com.
Development of Nagios has not stopped with version 2.0, though. Nagios'
principal developer, Ethan Galstad, has recently released some information
on the status and potential features of the next release, version 3.0.
Galstad's announcement also suggests an alpha release of version 3.0
could be scheduled as early as the end of February 2007.
Features: What's new in Nagios 3.0
So what's new with version 3.0? Well, a lot. Let's walk through the
major new features and look at how some of Nagios' old features have
been expanded or changed.
One of the interesting features introduced in Nagios 2.0 was adaptive
monitoring. Adaptive monitoring allowed a Nagios configuration to be
changed during runtime. For example, you can change the command being
used to check a host, based on changing conditions in your environment.
In the new version, this functionality is expanded to include the ability
to change the times during which checks are scheduled to occur. This
allows you to turn on/off checks at specific times according to conditions
in your environment.
Notifications have also been enhanced, now allowing a delay to be
added to first notifications. Notifications can be generated when flapping
is disabled and, most importantly, notifications can now be sent out
when a scheduled downtime starts, ends or is cancelled.
Objects and templates haven't been forgotten either. One particularly
useful change is the ability to use multiple templates for objects.
Another is the addition of custom variables in host, service and contact
objects. Version 2.0 only allows the application of one template to
an object. Multiple templates offer greater flexibility and power, which
will make a significant difference to the configuration of objects.
Custom variables allow you to define your own directives in object
definitions and, therefore, attach additional information about an object
to its definition. These variables can be retrieved and used elsewhere
in your Nagios environment. For instance, you could define the SNMP
community strings for a host in its definition and then use these later
in a check or external command.
Other object and template changes include: merging service and host-extended
information object data into service and host object definitions, and
adding group member directives to the host and service group objects.
Enhancements to external commands are also present, including the
ability to process commands found in an external file. The suggested
use of this functionality is for passive checks with long output or
complicated scripting. A further change in Nagios 3.0 is that external
command checking is now turned on by default. In previous versions,
such checking was turned off by default.
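The file-based route can be sketched as follows in Python: passive results are written to a file, and a single command then tells Nagios to read it. The command names PROCESS_SERVICE_CHECK_RESULT and PROCESS_FILE and their argument order are given as I understand them from the External Command List and should be checked against it; the helper names and the example path are hypothetical.

```python
def passive_service_result(timestamp, host, service, return_code, output):
    """Build one passive service check result line."""
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def write_batch(path, results):
    """Write many passive results to a file, one external command per line."""
    with open(path, "w") as handle:
        for result in results:
            handle.write(passive_service_result(*result))


def process_file_command(timestamp, path, delete_after=True):
    """Build the command asking Nagios to process (and optionally delete) the file."""
    return "[%d] PROCESS_FILE;%s;%d\n" % (timestamp, path, 1 if delete_after else 0)


print(passive_service_result(1206096000, "somehost", "Backup", 0, "OK - backup finished"), end="")
print(process_file_command(1206096000, "/tmp/batch.cmds"), end="")
```

Only the short PROCESS_FILE line has to pass through the command pipe, which is why this route suits long or numerous results.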
Host and service logic alterations have also been made. Most notably,
host checks now run asynchronously in parallel with each other. This
should help balance overall check performance. Another enhancement is
the ability to cache host and service check results and a function to
enable the predictive checking of dependent hosts and services.
The ability to output multiple lines of data from host and service
checks has also been added. Previously, Nagios 2.0 was limited to a
single line of output from checks, thus reducing the utility of some
checks. Now, multiple lines can be received
and processed by Nagios and the size of plug-in output has been correspondingly
increased to 2 KB.
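To show what multi-line output looks like to a consumer, here is a simplified Python parser. It assumes the layout I understand Nagios 3 plug-ins to use: a first line of short text, optionally followed by "|" and performance data, then further lines of long text, with everything after a second "|" treated as more performance data. The function name and return structure are invented.

```python
def parse_plugin_output(raw):
    """Split plug-in output into short text, long text lines and perfdata lines."""
    lines = raw.splitlines()
    if not lines:
        return {"short": "", "long": [], "perfdata": []}
    short, _, perf = lines[0].partition("|")
    perfdata = [perf.strip()] if perf.strip() else []
    long_lines = []
    in_perfdata = False
    for line in lines[1:]:
        if in_perfdata:
            perfdata.append(line.strip())
        elif "|" in line:
            text, _, perf = line.partition("|")
            if text.strip():
                long_lines.append(text.strip())
            if perf.strip():
                perfdata.append(perf.strip())
            in_perfdata = True  # everything after this separator is perfdata
        else:
            long_lines.append(line.strip())
    return {"short": short.strip(), "long": long_lines, "perfdata": perfdata}


sample = ("DISK OK - free space: / 3326 MB (56%); | /=2643MB;5948;5958;0;5968\n"
          "/ 15272 MB (77%);\n"
          "/boot 68 MB (69%); | /boot=68MB;88;93;0;98\n"
          "/home=69357MB;253404;253409;0;253414")
print(parse_plugin_output(sample))
```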
A number of performance optimizations have been included in Nagios
3.0, as well as enhancements to the Nagios Event Broker and the embedded
Perl interpreter. Also worth mentioning are updates to macros and to
status, comment and retention data.
"The most well-known NEB module is the NDO Utilities module.
The NDO Utilities module is written by Nagios' developer, Ethan Galstad,
and is designed to output events and data from Nagios to a standard file or
a Unix socket. "
01.29.2007 | searchenterpriselinux.techtarget.com
The Nagios enterprise monitoring tool generates a variety of events.
The principal events generated are the results of monitoring applications,
databases, devices, services and hosts. Also generated are performance
data and notification events such as outages and downtime. There are
a number of ways to integrate and utilize these events. The most advanced
and effective event integration mechanism is the Nagios Event Broker
(NEB).
NEB uses callback routines that are executed when events occur in
the Nagios server. Using NEB you can write broker modules that can process
these events. NEB allows you to output and integrate events into a variety
of tools including MySQL databases, SNMP traps, syslog messages or use
the event data in a variety of other applications and tools.
Nagios Event Broker functions and triggers
NEB uses shared code libraries called modules that are hooked into
the Nagios server when it is executed. Each module can register callback
procedures that are able to receive and process events. When an event
occurs, NEB checks for the presence of a registered callback and, if
detected, sends the event to the module. The module receives the event
and performs whatever actions are coded into it.
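The register-then-dispatch flow can be modelled in a few lines. Real NEB modules are shared libraries written in C or C++ against the Nagios headers; the Python below is only a conceptual model of the mechanism, and every name in it is invented rather than taken from the NEB API.

```python
from collections import defaultdict


class EventBroker:
    """Toy model of a broker that routes events to registered callbacks."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event_type, callback):
        """Called by a module that wants to receive events of this type."""
        self._callbacks[event_type].append(callback)

    def dispatch(self, event_type, data):
        """Send the event to every registered callback; return how many ran."""
        handlers = self._callbacks.get(event_type, [])
        for callback in handlers:
            callback(event_type, data)
        return len(handlers)


seen = []
broker = EventBroker()
broker.register("service_check", lambda kind, data: seen.append((kind, data["host"])))
broker.dispatch("service_check", {"host": "somehost", "state": 0})
broker.dispatch("host_check", {"host": "somehost"})  # no callback registered, nothing runs
print(seen)
```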
The broker can process a large number of events including, amongst
others:
- Nagios process startup and shutdown
- Host and Service checks
- Plug-in commands and notifications
- External commands and event handlers
- Flapping, comments and downtime
You can see a full list of the callbacks in the nebcallbacks.h
include file located in the include directory of the Nagios source
package.
Enabling Nagios Event Broker
NEB should be enabled by default when you compile Nagios (unless
you disable it). If you want to ensure that NEB gets compiled then specify
the --enable-event-broker configure option when configuring Nagios.
# ./configure --enable-event-broker
Registering modules with Nagios Event Broker
Modules are included into the Nagios configuration by using broker_module
configuration options in the nagios.cfg configuration file. For
example:
broker_module=/usr/local/nagios/bin/testmodule.o
This line would load a module called testmodule.o located
in the /usr/local/nagios/bin directory. You can also specify
a configuration file for a module like so:
broker_module=/usr/local/nagios/bin/testmodule.o config_file=/usr/local/nagios/etc/testmodule.cfg
You need to restart Nagios for any newly defined modules to take
effect.
Writing modules for Nagios Event Broker
NEB Modules can be written in C or C++. You can see an example of
a module in the Nagios package. Located in the module directory off
the root of the package is the helloworld module. You can create it
by compiling the helloworld.c file.
# gcc -shared -o helloworld.o helloworld.c
You can then add this module to Nagios using the broker_module
directive in the nagios.cfg configuration file. Restart Nagios
and the module is now loaded.
The Helloworld module is extremely simple. Helloworld logs a message
to the default Nagios log file when Nagios is started and stopped and
when aggregated status updates start and finish. The message looks like:
[1137151111] helloworld: An aggregated status update just started.
[1137151112] helloworld: An aggregated status update just finished.
You can review the contents of this module (which includes some basic
inline documentation)
Available modules for Nagios Event Broker
There are not a lot of NEB modules available, so far. The most well-known
NEB module is the NDO Utilities module. The NDO Utilities module is
written by Nagios' developer, Ethan Galstad, and is designed to output
events and data from Nagios to a standard file or a Unix socket. It also
comes with a module, NDO2DB, that can write Nagios data to a MySQL or
PostgreSQL database.
It should provide (together with the helloworld module) a good introduction
to NEB and help you get started on writing your own modules.
You can also find the following NEB modules:
- A NEB module that logs to a socket based on client requests
- A NEB module (as yet unreleased) that does event correlation with Nagios
and SEC
- A NEB module that helps integrate Cacti with Nagios
Further help with Nagios Event Broker
There is not a lot of documentation available for NEB thus far. The
only major piece of documentation available is about the
NEB API. You can also review the Nagios source code relevant to
NEB, particularly the include files.
As always the
Nagios development and user mailing lists are good starting places
for assistance.
Dec 05, 2005 (Sys Admin)
In the past few years, Nagios has become the industry standard open
source systems monitoring tool. If you're using an open source app to
monitor the availability, state, or utilization of your servers or network
gear, then chances are you are using Nagios to do it. To those who have
worked with it, this is no surprise. The lightweight design of Nagios
offloads the actual query logic into "plug-ins", which are easily created,
modified, and re-purposed by sys admins. The lack of complex query logic
leaves the Nagios daemon free to manage scheduling and notifications
and to handle UI.
Nagios's "keep it simple" approach makes it straightforward to administer,
network transparent, and amazingly flexible.
Two excellent articles by Syed Ali in previous editions of
Sys Admin covered the installation and configuration of Nagios.
In this article, I'll pick up where those articles left off and provide
some creative solutions to problems commonly faced by sys admins working
with Nagios to monitor the health and performance of systems.
It is still unclear why false alerts were generated. Is
this just a plug for Hyperic ?
There was nothing technically wrong with the HP ProLiant servers at
Mynewplace.com,
an online rental services agency based in San Francisco, but the IT
staff kept on getting beeped at 4 a.m. with alerts that eventually proved
to be false alarms.
So while the servers were fine, the IT staff wasn't. Entire days
were being wasted each month diagnosing their clutch of 50 HP ProLiant
DL145s and DL385s running Red Hat Enterprise Linux 4 AS and ES, said
John Shin, Mynewplace.com's director of systems. Shin decided he needed
to make some changes.
Struggling with network monitoring
"We were struggling with monitoring," Shin said, but that may have
been an understatement. Things were so bad, in fact, that at one point
last year he contemplated disabling the monitoring application altogether
because it was doing more harm than good.
The application was Nagios, a popular open source systems and network
monitoring application that provides alerts for user-defined hosts and
services. In Shin's network, however, it was triggering false
alarms because of simple network management protocol [SNMP] incompatibilities
with Mynewplace.com's open source application server, Resin 3.0.
Resin is based on a Java implementation of the PHP scripting language
and is maintained and supported by San Diego-based Caucho Technology
Inc.
Nagios, JVM and Resin 3.0 woes
Since Resin and Nagios were not directly compatible, Shin would expose
the application stack's Java virtual machines (JVMs) through SNMP and
monitor the environment that way. Unfortunately, response times under
those conditions were sluggish, he said.
"Nagios was not really the problem," Shin said. "It was the
JVM stack not being able to respond to it correctly. It was recording
events in SNMP that were then watched by Nagios and that made things
crawl. There were a lot of man hours wasted, and it would trigger the
4 a.m. pages."
In spite of its popularity on open source repositories like SourceForge.net,
Nagios has its detractors. In a recent interview about Nagios with SearchEnterpriseLinux.com,
Zenoss Inc. CEO Bill Karpovich criticized Nagios for its lack of
enterprise-level support. "The maintainers never thought of it
as a project that an IT manager would use to monitor an entire enterprise
environment," he said. Zenoss is an open source startup vendor in the
systems management space.
... ... ...
The feature-rich, expensive offerings from HP and the other members
of the "big four" – IBM, CA and BMC – have spawned
the "little four" (a phrase coined by analyst
firm RedMonk), comprised of Hyperic, Zenoss, Qlusters and GroundWork.
Executives from those companies have bet their chips
on the valuable midmarket for customer wins like Mynewplace.com.
Compared with OpenView, offerings from the "little four" were priced
approximately two-and-a-half times less on average, Shin found, although
he would not cite specific dollar amounts. OpenView had another strike
against it: "It did not have the framework in place to monitor some
of our key applications," namely Resin and Postgres, Shin said.
Nagios is a free, open source enterprise monitoring tool designed
to run on Linux. It has extensive monitoring and management capabilities
that allow you to check applications, databases and network devices,
as well as Windows and Unix/Linux hosts and services. It is easy to
install, fast to configure and highly customizable.
Nagios also comes with a Web-based console, the extensible Nagios Event
Broker (NEB), which allows you to integrate Nagios with other tools
such as database back ends, and a large collection of monitoring commands
and capabilities. Its current release, version 2.0, is stable and production
ready. You can take a look at Nagios at http://www.nagios.com.
Development of Nagios has not stopped with version 2.0, though. Nagios'
principal developer, Ethan Galstad, has recently released some information
on the status and potential features of the next release, version 3.0.
Galstad's announcement also suggests an alpha release of version 3.0
could be scheduled as early as the end of February 2007.
Features: What's new in Nagios 3.0
So what's new with version 3.0? Well, a lot. Let's walk through the
major new features and look at how some of Nagios' old features have
been expanded or changed.
One of the interesting features introduced in Nagios 2.0 was adaptive
monitoring. Adaptive monitoring allowed a Nagios configuration to be
changed during runtime. For example, you can change the command being
used to check a host, based on changing conditions in your environment.
In the new version, this functionality is expanded to include the ability
to change the times during which checks are scheduled to occur. This
allows you to turn on/off checks at specific times according to conditions
in your environment.
Notifications have also been enhanced, now allowing a delay to be
added to first notifications. Notifications can be generated when flapping
is disabled and, most importantly, notifications can now be sent out
when a scheduled downtime starts, ends or is cancelled.
Objects and templates haven't been forgotten either. One particularly
useful change is the ability to use multiple templates for objects.
Another is the addition of custom variables in host, service and contact
objects. Version 2.0 only allows the application of one template to
an object. Multiple templates offer greater flexibility and power, which
will make a significant difference to the configuration of objects.
Custom variables allow you to define your own directives in object
definitions and, therefore, attach additional information about an object
to its definition. These variables can be retrieved and used elsewhere
in your Nagios environment. For instance, you could define the SNMP
community strings for a host in its definition and then use these later
in a check or external command.
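As a rough illustration of how such a variable is later referenced, the sketch below assumes the convention documented for the released Nagios 3.x: a custom variable name starts with an underscore, and its macro is built from the object type (HOST, SERVICE or CONTACT) followed by the upper-cased variable name. The helper name custom_macro and the variable _snmp_community are hypothetical and used only for illustration.

```python
def custom_macro(object_type: str, variable: str) -> str:
    """Return the macro that refers to a custom object variable.

    Assumes the Nagios 3.x naming convention: the leading underscore of
    the variable is dropped and the rest is upper-cased after the
    object type, e.g. _snmp_community on a host -> $_HOSTSNMP_COMMUNITY$.
    """
    kind = object_type.lower()
    if kind not in {"host", "service", "contact"}:
        raise ValueError("custom variables exist on host, service and contact objects")
    if not variable.startswith("_") or len(variable) < 2:
        raise ValueError("custom variable names must start with an underscore")
    return "$_" + kind.upper() + variable[1:].upper() + "$"


# Hypothetical host variable holding an SNMP community string.
macro = custom_macro("host", "_snmp_community")
```

The resulting macro could then appear in a command definition wherever the community string is needed, so the value lives in the host definition rather than in each check.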
Other object and template changes include: merging service and host-extended
information object data into service and host object definitions, and
adding group member directives to the host and service group objects.
Enhancements to external commands are also present, including the
ability to process commands found in an external file. The suggested
use of this functionality is for passive checks with long output or
complicated scripting. A further change in Nagios 3.0 is that external
command checking is now turned on by default. In previous versions,
such checking was turned off by default.
Host and service logic alterations have also been made. Most notably,
host checks now run asynchronously in parallel with each other. This
should help balance overall check performance. Another enhancement is
the ability to cache host and service check results and a function to
enable the predictive checking of dependent hosts and services.
The ability to output multiple lines of data from host and service
checks has also been added. Previously, Nagios 2.0 was limited to a
single line of output from checks, thus reducing the utility of some
checks. Now, multiple lines can be received and processed by Nagios
and the size of plug-in output has been correspondingly increased to
2 KB.
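A minimal sketch of what a multi-line check result might look like is given below. It assumes the standard plug-in return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the common, but not mandatory, habit of starting the first line with the state name. The function name format_plugin_output and the sample disk figures are hypothetical, and the sketch does not enforce any output size limit.

```python
# Standard plug-in return codes and their state names.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def format_plugin_output(code, summary, details=()):
    """Build plug-in output: one summary line plus optional detail lines.

    The first line is what a single-line consumer would see; the
    remaining lines carry the additional detail.
    """
    if code not in STATES:
        raise ValueError("return code must be 0, 1, 2 or 3")
    lines = ["%s - %s" % (STATES[code], summary)]
    lines.extend(details)
    return "\n".join(lines)


# Hypothetical result of a disk check with two detail lines.
text = format_plugin_output(
    1,
    "2 of 3 filesystems above threshold",
    ["/var 91% used", "/home 88% used"],
)
```

A real plug-in would print this text to standard output and exit with the same numeric code.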
A number of performance optimizations have been included in Nagios
3.0, as well as enhancements to the Nagios Event Broker and the embedded
Perl interpreter. Also worth mentioning are updates to macros and to
status, comment and retention data.
To see a full list of the changes, or if you wish to try Nagios 3.0
before its alpha release, you can download a current CVS snapshot from
http://www.nagios.org/development/cvs.php . The Changelog file contained
in the snapshot provides a reasonably full list of the proposed changes.
HowToContactNagios - Munin - Trac
Munin integrates perfectly with Nagios. There are, however, a few things
of which to take notice. This article shows example configurations and
explains the communication between the systems.
Receiving messages in Nagios
First you need a way for Nagios to accept messages from Munin. Nagios has
exactly such a thing, namely NSCA, which is documented here:
http://nagios.sourceforge.net/docs/1_0/addons.html#nsca.
NSCA consists of a client (a binary usually named send_nsca) and a server
usually run from inetd. We recommend that you enable encryption on NSCA
communication.
You also need to configure Nagios to accept messages via NSCA. NSCA is,
unfortunately, not very well documented in Nagios' official documentation.
We'll cover writing the needed service check configuration further down in
this document.
Configuring Nagios
In the main config file, make sure that the command_file directive is set
and that it works. See
http://nagios.sourceforge.net/docs/2_0/configmain.html#command_file
for details.
Below is a sample extract from nagios.cfg:
command_file=/var/run/nagios/nagios.cmd
The /var/run/nagios directory is owned by the user that Nagios runs as.
nagios.cmd is a named pipe on which Nagios accepts external input.
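To make the role of the named pipe more concrete, here is a small sketch of how a passive service result could be written to it directly. It assumes the external command syntax documented by Nagios for PROCESS_SERVICE_CHECK_RESULT (a bracketed Unix timestamp followed by semicolon-separated fields). The function names are hypothetical, and the default pipe path is simply the one from the sample nagios.cfg extract above.

```python
import time


def passive_result_command(host, service, code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, code, output)


def submit(command, pipe_path="/var/run/nagios/nagios.cmd"):
    """Write one command to the Nagios command pipe.

    Opening a FIFO for writing blocks until a reader (Nagios) has it
    open, so this only returns promptly when Nagios is running.
    """
    with open(pipe_path, "w") as pipe:
        pipe.write(command)


# Build (but do not send) a sample command with a fixed timestamp.
sample = passive_result_command("foo.example.com", "test", 0, "all good",
                                now=1159868622)
```

NSCA, described next, does essentially this on behalf of remote senders: it receives results over the network and writes them to the same pipe.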
Configuring NSCA, server side
NSCA is run through (x)inetd. Using inetd, the line below enables NSCA
listening on port 5667:
5667 stream tcp nowait nagios /usr/sbin/tcpd /usr/sbin/nsca -c /etc/nsca.cfg --inetd
Using xinetd, the configuration below enables NSCA listening on port 5667,
allowing connections only from the local host:
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
    flags           = REUSE
    type            = UNLISTED
    port            = 5667
    socket_type     = stream
    wait            = no
    server          = /usr/sbin/nsca
    server_args     = -c /etc/nagios/nsca.cfg --inetd
    user            = nagios
    group           = nagios
    log_on_failure  += USERID
    only_from       = 127.0.0.1
}
The file /etc/nsca.cfg defines how NSCA behaves. Check in particular the
nsca_user and command_file directives; these should correspond to the file
permissions and the location of the named pipe described in nagios.cfg.
nsca_user=nagios
command_file=/var/run/nagios/nagios.cmd
Configuring NSCA, client side
The NSCA client is a binary that submits to an NSCA server whatever it
receives on standard input. Its behaviour is controlled by the file
/etc/send_nsca.cfg, which mainly controls encryption.
You should now be able to test the communication between the NSCA client
and the NSCA server, and consequently whether Nagios picks up the message.
NSCA requires a defined format for messages. For service checks, it's like this:
<host_name>[tab]<svc_description>[tab]<return_code>[tab]<plugin_output>[newline]
The following shows how to test NSCA:
$ /usr/sbin/send_nsca -H localhost -c /etc/send_nsca.cfg
foo.example.com test 0 0
1 data packet(s) sent to host successfully.
This caused the following to appear in /var/log/nagios/nagios.log:
[1159868622] Warning: Message queue contained results for service 'test' on host 'foo.example.com'. The service could not be found!
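The same test can be scripted. The sketch below builds the tab-separated line in the format given above and hands it to send_nsca on standard input. The function names are hypothetical; the binary path, config path and -H switch are the ones used in the test above and may differ on your system.

```python
import subprocess


def nsca_line(host, service, code, output):
    """Build one service check message in the NSCA input format:
    host, service description, return code and plug-in output,
    separated by tabs and terminated by a newline."""
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "%s\t%s\t%d\t%s\n" % (host, service, code, output)


def send(line, server="localhost", config="/etc/send_nsca.cfg",
         binary="/usr/sbin/send_nsca"):
    """Pipe one message to send_nsca and return what it printed.

    Raises CalledProcessError if send_nsca exits with a non-zero status.
    """
    result = subprocess.run([binary, "-H", server, "-c", config],
                            input=line, text=True,
                            capture_output=True, check=True)
    return result.stdout


# Same message as in the manual test above; not sent here.
message = nsca_line("foo.example.com", "test", 0, "0")
```

Calling send(message) on a host where send_nsca is installed would reproduce the manual test, including the warning in nagios.log for as long as no matching service is defined.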
Messages are sent by munin-limits based on the state of a monitored data
source: OK, Warning and Critical. Munin does not currently support an
Unknown state (this will be fixed in the future; see Ticket 29 for more
information).
Configuring munin.conf
Munin uses the above-mentioned send_nsca binary to send messages to
Nagios. In /etc/munin/munin.conf, enter this:
contacts nagios
contact.nagios.command /usr/bin/send_nsca -H your.nagios-host.here -c /etc/send_nsca.cfg
Note: be aware that the -H switch to send_nsca appeared sometime after
send_nsca version 2.1. Always check send_nsca --help!
Configuring Munin plugins
Lots of Munin plugins have (hopefully reasonable)
values for Warning and Critical levels. To set or
override these, you can change the values in
munin.conf.
Configuring Nagios services
Now Nagios needs to recognize the messages from Munin as messages about
services it monitors. To accomplish this, every message Munin sends to
Nagios requires a matching (passive) service definition, or Nagios will
ignore the message (but it will log that something tried).
A passive service is defined through these directives in the proper Nagios
configuration file:
active_checks_enabled 0
passive_checks_enabled 1
A working solution is to create a template for
passive services, like the one below:
define service {
    name                    passive-service
    active_checks_enabled   0
    passive_checks_enabled  1
    parallelize_check       1
    notifications_enabled   1
    event_handler_enabled   1
    register                0
    is_volatile             1
}
Once the template is defined, each Munin plugin should be registered as a
service, as shown below:
define service {
    use                     passive-service
    host_name               foo
    service_description     bar
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   3
    retry_check_interval    1
    contact_groups          linux-admins
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_dummy!0
}
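Since every Munin plugin needs its own definition, writing them by hand gets repetitive. The sketch below generates definitions from a list of host and service names. The directive values are copied from the example above; the function name passive_service and the sample plugin names are hypothetical.

```python
# Directives shared by every generated service, taken from the example above.
COMMON = [
    ("check_period", "24x7"),
    ("max_check_attempts", "3"),
    ("normal_check_interval", "3"),
    ("retry_check_interval", "1"),
    ("contact_groups", "linux-admins"),
    ("notification_interval", "120"),
    ("notification_period", "24x7"),
    ("notification_options", "w,u,c,r"),
    ("check_command", "check_dummy!0"),
]


def passive_service(host, description, template="passive-service"):
    """Render one service definition that inherits from the template."""
    rows = [("use", template),
            ("host_name", host),
            ("service_description", description)] + COMMON
    lines = ["define service {"]
    lines += ["    %-24s%s" % (key, value) for key, value in rows]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical list of (host, Munin plugin) pairs to register.
plugins = [("foo", "load"), ("foo", "df")]
config_text = "\n".join(passive_service(h, p) for h, p in plugins)
```

The generated text can be written to a file that nagios.cfg includes; the service_description of each entry has to match the name Munin sends.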
Notes
- host_name is either the FQDN of the
host_name registered to the
Nagios plugin, or
the host alias corresponding to Munin's
NSCA_Setup
nagios-yum-cfengine
- Paperback: 464 pages
- Publisher: No Starch Press; U.S. Ed edition (May 30, 2006)
- Language: English
- ISBN-10: 1593270704
- ISBN-13: 978-1593270704
- Product Dimensions: 9.2 x 7 x 1.1 inches
Best for Nagios admins who want specific details on plug-ins,
September 4, 2006
I recently received review copies of Pro Nagios 2.0 (PN2) by James Turnbull
and Nagios: System and Network Monitoring (NSANM) by Wolfgang Barth.
I read PN2 first, then NSANM. Both are excellent books, but I expect
potential readers want to know which is best for them. The following
is a radical simplification, and I could honestly recommend readers
buy either (or both) books. If you are completely new to Nagios and
want a very well-organized introduction, I recommend PN2. If you are
somewhat familiar with Nagios and want detailed descriptions of a wide
variety of Nagios plug-ins, I recommend NSANM.
NSANM's strengths lie in the depth of coverage of certain elements when
compared to PN2. PN2 devotes 7 pages to host checks, while NSANM's Ch
7 offers 21 pages. PN2 supplies 8 pages on service checks, but NSANM's
Ch 6 gives 46 pages. This level of detail can be very useful. For example,
NSANM's explanation of check_squid also shows how to configure Squid
to allow access to its cache manager.
NSANM shares more information on certain background protocols like SNMP.
PN2's SNMP section is about 7 pages, whereas NSANM's Ch 11 is 36 pages.
NSANM demonstrates more aspects of Nagios' Web interface and the CGI
programs generating pages. I thought author Wolfgang Barth made very
effective use of diagrams, like the network topology explanation in
Ch 4, the service checks in Ch 5, and notification in Ch 12.
NSANM includes some material not mentioned in PN2, like using Nagios
with Cygwin. Sometimes the books are very complementary, as shown by
PN2's discussion of NSClient++ and NSANM's overview of NSClient and
NC_Net.
NSANM is lacking coverage of security, redundancy, and failover, however.
PN2 does address these critical issues. Beware that some of the "chapters"
in NSANM are very short -- like Ch 8 (2 pages!) and Ch 19 (barely 6
pages). I think short sections like those should have been integrated
into longer chapters or moved into the appendices.
Overall, NSANM is a very good book. I believe new Nagios readers should
read PN2, and strongly consider NSANM as a complementary reference volume.
- Hardcover: 424 pages
- Publisher: Apress (April 17, 2006)
- Language: English
- ISBN-10: 1590596099
- ISBN-13: 978-1590596098
- Product Dimensions: 9.3 x 7.1 x 1.1 inches
A short, superficial intro book (190 pages).
Killer phrase from the review below: "This is the book you should
pass to your manager so (s)he understands why and how an open solution like
Nagios is the better choice and can be used for achieving surpassing solutions."
Warning: several reviews of this book look like plants, written by
reviewers who have only a single networking book review, or just a single
review in total.
Spot on for a well structured book with many WOW-factors,
May 17, 2007
By Nils Valentin (Tokyo, Japan)
--- DISCLAIMER: This is a requested review by PTR, however any opinions
expressed within the review are my personal ones. ---
Introduction - 6p
CHAPTER 1 Best Practices - 12p
CHAPTER 2 Theory of Operations - 26p
CHAPTER 3 Installing Nagios - 11p
CHAPTER 4 Configuring Nagios - 23p
CHAPTER 5 Bootstrapping the Configs - 10p
CHAPTER 6 Watching - 46p
CHAPTER 7 Visualization - 42p
CHAPTER 8 Nagios Event Broker Interface - 19p
APPENDIX A Configure Options - 3p
APPENDIX B nagios.cfg and cgi.cfg - 9p
APPENDIX C Command-Line Options - 10p
Index - 14p
At 190 pages (230 including the appendices and index), the book is
very compact. It teaches you Nagios in a way I have never heard or
read before. I must assume that the author's clearly structured
style, which runs through the book like a common thread, is responsible
for the excellent outcome.
The book starts in the introduction with the title "Do it right the
first time" and that hits it right on the spot. What distinguishes
this little portable knowledge base is its exceptionally well-thought-out
content and the author's explanations. David is not filling
pages by explaining each and every parameter, but rather showing you
the big picture, and explaining how to approach new issues or why one
technical solution is better than another.
This is the book you should pass to your
manager so (s)he understands why and how an open solution like Nagios
is the better choice and can be used for achieving surpassing solutions.
The book itself basically is divided in two sections:
Background, setup and configuration - Chapters 1-5
Advanced Topics - Chapters 6-8
I found all of the chapters to have a nice balance in the amount
of information needed, but some EXCEPTIONALLY good parts of the book were:
Chapter 1 Best practices
Chapter 2 - the part about scheduling
Chapters 6-8 as a whole
Chapter 6 has thorough explanations of monitoring the different OSes
(especially the Windows part !!) and other applications.
Chapter 7 for its overall thoroughness on how to visualize your data
to reach a better understanding of the systems / network
you are monitoring.
Chapter 8 describes a filesystem-based status interface. The NEB
module will write a file with its current status code for each service.
I have to admit that some technical details went over my head, but I
thought that was pretty cool !!
The points featured above are what I found to be exceptionally good and
most likely the strongest selling points of this little portable knowledge base.
That doesn't mean that the other parts of the book, not mentioned here, are
weak, mind you.
Funnily enough, the above-mentioned points were EXACTLY the points that
I haven't seen explained this thoroughly anywhere before.
So David's book was exactly spot on for me.
Summary:
To sum it all up in very simple words: This is a hell of a book !!
It's the most compact, well-structured book on Nagios that I have seen
to date. It contains many WOW-factors. While reading each chapter you
can virtually "feel" how David's explanations, tips and tricks have already
helped you to avoid time-consuming pitfalls.
So this book is not about "to buy or not to buy"; this is an investment
you don't want to miss !!
I was especially impressed by the thoroughness with which the book is written
from the first page. Although the content of the first chapter wasn't new
to me, the way it was explained already provided many of those A-ha
moments.
The main asset of the book is not the description of the tools themselves,
but rather the thought and consideration the author put into it and
the sharing of those thoughts in a way that lets the reader actually
visualize how and why one solution is better than another, without
having to go through the "luxury of experiencing the pitfalls" in a live disaster
scenario.
PS: AFTER I finished reading the book I re-read the "Editorial Review"
Amazon gave above and found it pretty well describing the actual book
and what you should expect.
>> You can find more reviews on Nagios related books including a comparison
by deploying my profile. <<
Copyright © 1996-2008 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Original materials copyright belongs to respective owners. Quotes are made
for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author's present and former employers, SDNP or any other
organization the author may be associated with. We do not warrant the correctness
of the information provided or its fitness for any purpose.
Last modified:
December 22, 2008