Splunk, Syslog-ng, and You

Splunk is quite useful for dealing with a large volume of log data from various devices and presenting it in an easy-to-use interface that makes searching and generating charts, graphs, and other trending data quite easy. Like all systems, however, sometimes configurations need to change, services need to be restarted, or other issues arise that require Splunk’s network listeners to go offline. If you use the Splunk Universal Forwarder, this is not really a big deal, as the data will queue on the Forwarder until it re-establishes a connection with the Indexer. But what about all of those network and appliance devices out there that only provide syslog as a means of collecting information? When the Splunk network listener goes offline, all those juicy syslog messages just bounce off of a deaf server until the service starts back up. While a few missing log messages may not sound like the end of the world, they could mean the difference between scratching your head for hours and solving a problem in minutes, so it is worth doing something about.

To solve this issue, we rely on our old friend syslog-ng. Syslog-ng is an extremely flexible syslog processing daemon, which allows us to organize logging from the network both for our own sanity and for Splunk’s. Because we likely have many devices sending us syslog, we will make syslog-ng sort the messages into one file per host. Splunk can then read each file individually, which makes it simple to assign the host value correctly in the indexer.

On an Ubuntu server, this is all handled in /etc/syslog-ng/syslog-ng.conf. First, we define a source that is only information coming in from the network:

# Network-sourced syslog messages
source s_net {
       udp();
       tcp();
};
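
The bare udp() and tcp() calls above listen on syslog-ng's default address and port. If you would rather spell these out, a variant along the following lines could replace the block above (do not define s_net twice). The 0.0.0.0 address and port 514 are assumptions for illustration; adjust them to match what your devices actually send to:

```
# Network-sourced syslog messages, with listeners made explicit
# (address and port values are illustrative assumptions)
source s_net {
       udp(ip(0.0.0.0) port(514));
       tcp(ip(0.0.0.0) port(514));
};
```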

Second, we define a destination that uses a variable called HOST to automatically create subdirectories for us. We will give it its own subdirectory to work under so as not to disturb any package-created logging directories. For this example, I am using /var/log/network/:

# Per-host logging of network traffic
destination df_hurdef_perhost {
       file("/var/log/network/$HOST/syslog-ng.log"
            owner(root) group(adm) perm(0640) dir_perm(0750) dir_group(adm)
            create_dirs(yes));
};

Lastly, we define a log statement to tie the first two parts together. In addition, we recommend adding the f_at_least_info filter to this statement to exclude debug-level logs. This filter is normally already defined in the syslog-ng.conf shipped with the Ubuntu package, so check that it exists in your copy. Filtering out debug messages helps keep your Splunk license usage down when someone turns on debugging on a device and you do not necessarily want that output indexed:

# Local logging of all network-sourced, non-debug logs
log {
       source(s_net);
       filter(f_at_least_info);
       destination(df_hurdef_perhost);
};
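
If your syslog-ng.conf does not already contain a filter named f_at_least_info, a definition along these lines should behave the same way. This is a sketch based on the usual Debian-style definition, not a copy from any particular package version, and it should only be added when the name is not already defined:

```
# Match everything from info up to emerg, which excludes debug
filter f_at_least_info { level(info..emerg); };
```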

After these have been added to the config, create /var/log/network/ and tell the syslog-ng daemon to reload its configuration. If your devices log fairly regularly, you should immediately start seeing IP addresses in your directory listings of /var/log/network/. Once this is working, go into your Splunk Manager interface and select Data Inputs. From here, select Add Data, then select From files and directories from the second list. Leave the first radio button selected, and enter the following path:

/var/log/network/*/syslog-ng.log

Enable the More settings checkbox, change Set host to “segment in path”, and set the segment number to “4” to correspond with the asterisk in the path provided (var is segment 1, log is 2, network is 3, and the per-host directory is 4). This will let Splunk use the directory name created by syslog-ng to populate the host value in the index directly.

Finally, if you are using different indexes, select the appropriate index for the data. Click Save, and you are ready to start indexing the data you have collected so far. The most convenient thing about this method is that as soon as syslog data starts arriving from a new IP address, Splunk will see the new file and start indexing it with no configuration changes or restarts. Best of all, if you need to restart Splunk for any reason, it will catch right back up when it restarts, because syslog-ng keeps on logging to the local files in the meantime.
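
If you prefer configuration files to the web interface, the same input can be sketched as a monitor stanza in inputs.conf (for example under $SPLUNK_HOME/etc/system/local/). The index name network and the syslog sourcetype below are assumptions for illustration only; substitute whatever your deployment uses, or drop the index line to use the default index:

```
# Monitor every per-host file written by syslog-ng
[monitor:///var/log/network/*/syslog-ng.log]
# Take the host value from the 4th path segment (the per-host directory)
host_segment = 4
# Assumed values; change these to suit your environment
sourcetype = syslog
index = network
disabled = false
```

Splunk typically needs a restart to pick up hand edits to inputs.conf, which is harmless here because syslog-ng keeps writing to disk while Splunk is down.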

As a final note, these syslog-ng.log files can grow to be rather large, so it is best to configure log rotation for them. Because Splunk will be the long-term storage for this log data, we only need to keep a few days’ worth, rotating off like the other system logs. Fortunately, the syslog-ng package already provides a log rotation configuration at /etc/logrotate.d/syslog-ng, so just add the following section to it right before the last entry, the one that contains the postrotate script:

/var/log/network/*/syslog-ng.log {
  rotate 5
  daily
  compress
}

Happy Splunking!
