Intro - validating data integrity and keeping performance costs down
Having the ability to log in to Splunk (sometimes seamlessly with SSO) and instantly see problem areas in your environment is extremely beneficial. Whether you’re a C-level executive, project manager, or one of the in-the-trenches tech nerds, having visibility that enables you to see issues, or gather other contextual value from your data, can be priceless. What’s even better is that the clear graphical format lets you get to the point efficiently and see how all the pieces come together.
Now, in my time working with Big Data, I’ve learned a lot about the technical aspects of file systems, resource usage, and what terabytes and terabytes of data processing really looks like. What I didn’t realize was just how many cans of worms could be found hidden within those technologies, such as storage speed, non-optimized Splunk indexing configurations, and even lack of CPU/Memory resources.
They say, “practice doesn’t make perfect, but perfect practice does.” That applies here, too: adding more and more aspects (or data sources) into one project increases the risk of misconfigurations or mistakes - which (I’m assuming) we all want to avoid as much as possible. For that reason, we’ll discuss some tips and tricks that will make your life much easier when it comes to validating the integrity of your data, along with keeping performance costs down while searching. There are two main portions of data integrity that are extremely important with Splunk: time configuration and line breaking. In Part 1 of this post, we’ll focus on time.
Note: In this blog post we’ll be discussing only a small subset of the many props.conf settings, but it’s definitely a good conversation to have, because these settings are the foundation for any data source in Splunk. For a full list of props configurations, please read the doc!
TIME - how to get it right and make sure you’re not skipping a beat
Splunk has a few configurations that deal with time in props.conf. We search data on a linear, time-based scale, and an event’s timestamp can’t easily be changed after it is indexed into Splunk. For that reason, we need to make sure we get it right from the start.
TIME_PREFIX = <regular_expression>
TIME_PREFIX is exactly what it sounds like - the text that appears right before an event’s timestamp. TIME_PREFIX is written as a regular expression. Splunk will scan the data for a match, and if it cannot find one, timestamp extraction will not occur. That means this is a powerful tool that can make or break your data, and it should always be tested before moving to production. The default for this configuration is “empty”. Setting it tells Splunk exactly where to start looking for the timestamp, which cuts down on processing.
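To make the idea concrete, here is a minimal Python sketch of what “find the prefix, then start reading the timestamp” means. It is only an illustration: the event, the `ts="` prefix, and the `timestamp_start` helper are all made up for this example, and Python’s `re` module is not identical to the regex engine Splunk uses.

```python
import re
from typing import Optional


def timestamp_start(event: str, time_prefix: str) -> Optional[int]:
    """Return the index just past the prefix match, or None if the prefix is absent."""
    match = re.search(time_prefix, event)
    return match.end() if match else None


# Hypothetical event, not taken from a real log source.
event = 'src=10.0.0.5 action=allow ts="2024-03-21 10:04:54" bytes=2032'

# The timestamp begins right after the matched prefix.
start = timestamp_start(event, r'ts="')
timestamp_text = event[start:start + 19]

# A prefix that never matches means there is nowhere to start reading from.
missing = timestamp_start(event, r'time=')
```

In this sketch `timestamp_text` holds the 19 characters after `ts="`, while `missing` is `None`, which mirrors the “no prefix, no timestamp extraction” behavior described above.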
TIME_FORMAT = <strptime_style format>
Splunk’s TIME_FORMAT attribute allows the admin to tell Splunk what (strptime) format the timestamp is in - whether it be “month/day/year”, a 24-hour clock, UTC or epoch time, etc. The default for this configuration is “empty.” Splunk will automatically try to find and parse a timestamp for you, but it is not accurate 100% of the time, as different log sources can use different time formats - especially when dealing with international data inputs. The time format is applied to the text directly following the TIME_PREFIX match, and it can greatly reduce processing overhead because Splunk knows where and how to look for something right off the bat - like knowing your sunglasses are already on your head instead of spending 15 minutes looking all over for them before you leave the house.
For a list of all strptime format variables, check out this link.
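The “international data inputs” problem is easy to see with a quick Python sketch. Python’s `datetime.strptime` understands the same basic strptime-style variables, although Splunk supports some variables of its own, so treat this as an approximation. The sample date below is made up for illustration.

```python
from datetime import datetime

# Hypothetical date string: is it March 4th or April 3rd?
ambiguous = "03/04/2024"

# Read as month/day/year.
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y")

# Read as day/month/year.
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y")

# The same text yields two different dates depending on the format string.
days_apart = abs((as_day_first - as_month_first).days)
```

Both calls succeed, but they land on different days. Nothing in the text itself says which reading is right, which is why stating the format explicitly is safer than letting it be guessed.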
MAX_TIMESTAMP_LOOKAHEAD = <integer>
This is the last critical time component I’ll discuss here. MAX_TIMESTAMP_LOOKAHEAD tells Splunk how many characters past the TIME_PREFIX match to scan for a timestamp. As timestamps come in many shapes and sizes, telling Splunk how long the timestamp string is can be very useful.
Think about it: if you know your timestamp is only 15 characters long and you limit the lookahead to 15, Splunk can never misread digits that happen to follow the timestamp. That matters because the default lookahead is 128 characters. If your log looks like:
Mar 15 07:45:00 2032 bytes…
With the default lookahead, Splunk may try to read “2032” as the year, when it is really part of “2032 bytes”. With the lookahead set to 15 characters, Splunk never sees those digits, so you avoid events mistakenly stamped years into the future. It also cuts down on the overhead of Splunk working this out for itself. While that may seem trivial, keep in mind that Splunk sometimes ingests hundreds of thousands of events per second, and the math adds up. No one needs an angry CISO yelling at them for Splunk “being too slow.”
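Here is a toy Python sketch of why the window size matters. To be clear, this is not how Splunk’s timestamp processor works internally; `year_candidates` and its regex are assumptions made up for this example, and they only show which digits are visible inside a given lookahead window.

```python
import re

# Hypothetical pattern for "something that looks like a four-digit year".
YEAR_LIKE = re.compile(r"\b(?:19|20)\d{2}\b")


def year_candidates(event: str, lookahead: int) -> list:
    """Return year-looking strings visible within the first `lookahead` characters."""
    window = event[:lookahead]
    return YEAR_LIKE.findall(window)


event = "Mar 15 07:45:00 2032 bytes"

# With a wide window, the byte count is visible and looks like a year.
wide = year_candidates(event, 128)

# With a 15-character window, only the real timestamp is visible.
narrow = year_candidates(event, 15)
```

With the wide window, `2032` is a candidate; with the 15-character window, there is nothing to misread.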
Example - know what to do with your timestamps in real-time
Now that you have a brief overview of what each time attribute is responsible for, let’s take a look at a real-life example. Keep in mind that props.conf settings are set on a per source/sourcetype/host basis. I’ve found the easiest (but not always possible) way to standardize these is by sourcetype. So we’ll base these settings on a sourcetype for example purposes (proxy_console_syslog in this case).
Here we have a standard proxy’s console log that houses some critical information about the behind-the-scenes task of a proxy server:
Mar 21 10:04:54 ProxySG: 500000 Dynamic categorization error: unexpected response code 500 from service(0) SEVERE_ERROR myapi_api.cpp 100
The first thing that we notice is that the timestamp is at the very beginning of the line - making our TIME_PREFIX nice and simple - removing the struggle of having to regex something special.
We’ll use the “^” character, which translates to “beginning of a string”, as the prefix here:
TIME_PREFIX = ^
Our TIME_FORMAT looks like it is in a standard format. Note that it does not include the “year”. Looking at the timestamp, the format appears to be written as “abbreviated-month, day-of-month, hour-of-day, minute, second”.
Per our link with strptime variables listed above, we can safely say the format should look something like:
TIME_FORMAT = %b %d %H:%M:%S
The MAX_TIMESTAMP_LOOKAHEAD here is literally just the length of the timestamp in characters, counted from its first character. For our timestamp “Mar 21 10:04:54”, we count “M” as #1 and keep going through the final “4”. After you count that out, you should have a lookahead value of 15 - that’s right, spaces count, too.
MAX_TIMESTAMP_LOOKAHEAD = 15
You can now officially say you know what to do with your timestamps. Here’s how we bring the configuration together in your props.conf:
[proxy_console_syslog]
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 15
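If you want to sanity-check those three values before touching a Splunk server, a rough Python sketch like the one below can help. It is an approximation under stated assumptions: `extract_timestamp` is a made-up helper, Python’s `strptime` stands in for Splunk’s parser, and because the log has no year, the sketch takes an `assumed_year` argument (2024 here is arbitrary) rather than imitating how Splunk fills in the year.

```python
import re
from datetime import datetime
from typing import Optional

# The three values from the stanza above.
TIME_PREFIX = r"^"
TIME_FORMAT = "%b %d %H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 15


def extract_timestamp(event: str, assumed_year: int) -> Optional[datetime]:
    """Find the prefix, take the lookahead window after it, and parse that window."""
    prefix = re.search(TIME_PREFIX, event)
    if prefix is None:
        return None  # no prefix match, so no timestamp extraction
    start = prefix.end()
    window = event[start:start + MAX_TIMESTAMP_LOOKAHEAD]
    try:
        # The year is supplied by the caller because the log line does not carry one.
        return datetime.strptime(f"{assumed_year} {window}", f"%Y {TIME_FORMAT}")
    except ValueError:
        return None  # the window does not match the expected format


event = (
    "Mar 21 10:04:54 ProxySG: 500000 Dynamic categorization error: "
    "unexpected response code 500 from service(0) SEVERE_ERROR myapi_api.cpp 100"
)

good = extract_timestamp(event, assumed_year=2024)
bad = extract_timestamp("ProxySG: no timestamp at the start", assumed_year=2024)
```

The sample event parses cleanly, and a line that does not start with a timestamp comes back as `None`. That is a cheap way to catch a wrong format string or an off-by-one lookahead before it reaches production.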
Conclusion - at this point you should no longer have timestamping issues
At this point, your proxy console logs should no longer have timestamping issues. Keep in mind that for these changes to take effect, they need to be installed on the first Splunk Enterprise server your data hits (either your indexers or your Heavy Forwarders, if you use them) and Splunk will need to be restarted. Stay tuned for Part 2, where we’ll talk about the other archenemy of Splunk events: line breaking.