Splunk 6.3 was released this year at .conf, Splunk’s annual conference. Some of the new features, such as faster performance and scheduler improvements, were expected; others were interesting and impressive even though they may not have made it into the keynote. In this blog post, we’ll explore the new features in Splunk 6.3 and touch upon some upgrade concerns and recommendations.
Performance Enhancement: Multiple Pipeline Sets and Faster Searches
The biggest performance enhancement announced at .conf was ‘twice as fast’ searching and indexing. The two benefits of this advancement are:
- Improved resource utilization of the host machine
- Increased speed of batch searches
Any time companies announce speed improvements, we remain skeptical until we see the improvement in real deployments. Fortunately, we were able to gather a little more information about how Splunk achieved this speed improvement.
Instead of a single search or indexing pipeline, there can now be multiple parallel pipelines. Each pipeline is a thread that can only run on a single CPU. With multiple pipelines running in parallel, Splunk can use several CPUs more efficiently for a single task. In practice, this means Splunk can make better use of servers that have unused CPUs.
This will NOT increase performance if your servers already have high CPU utilization. This also does NOT change the amount of resources required for Splunk, nor does it change the reference hardware specifications. The purpose of this change is to allow Splunk to scale better on servers with a very high number of CPUs.
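To make the idea of parallel pipeline sets concrete, here is a minimal Python sketch. Everything in it is hypothetical: the round-robin assignment of inputs, the function names, and the event counting that stands in for parsing and indexing. It only models the structure (each input handled by exactly one pipeline, pipelines running concurrently). Because of CPython's global interpreter lock, these Python threads will not actually spread CPU work across cores the way Splunk's native pipeline threads can. On the indexing side, our understanding is that the number of pipeline sets is set with “parallelIngestionPipelines” in the [general] stanza of server.conf; check the 6.3 documentation before changing it.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_inputs(inputs, pipeline_sets):
    """Hypothetical policy: round-robin each input to exactly one pipeline set."""
    assignment = {p: [] for p in range(pipeline_sets)}
    for i, name in enumerate(sorted(inputs)):
        assignment[i % pipeline_sets].append(name)
    return assignment


def run_pipeline(names, events_by_input):
    """Stand-in for parsing and indexing: count the events this pipeline handled."""
    return sum(len(events_by_input[name]) for name in names)


def index_in_parallel(events_by_input, pipeline_sets=2):
    """Run one worker thread per pipeline set and collect per-pipeline event counts."""
    assignment = assign_inputs(events_by_input, pipeline_sets)
    with ThreadPoolExecutor(max_workers=pipeline_sets) as pool:
        futures = {
            p: pool.submit(run_pipeline, names, events_by_input)
            for p, names in assignment.items()
        }
        return {p: f.result() for p, f in futures.items()}


events = {"a.log": [1, 2, 3], "b.log": [1], "c.log": [1, 2], "d.log": []}
print(assign_inputs(events, 2))   # {0: ['a.log', 'c.log'], 1: ['b.log', 'd.log']}
print(index_in_parallel(events))  # {0: 5, 1: 1}
```

In this model, a single busy input still lands on only one pipeline, so adding pipeline sets helps most when there are several inputs and idle CPUs to absorb them.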
For more details, you can see the slides of this .conf2015 session here: “Harnessing 6.3 Performance and Scalability”.
Problem Solution: Scheduler Improvements and Priority Scoring
Scheduler improvements are another enhancement in Splunk 6.3. The goal is to reduce the number of scheduled searches that are skipped, and to prevent the same searches from being skipped every time.
Splunk’s old scheduler took every search that was supposed to run at that time and queued them in alphabetical order. If a search didn’t get any cycles before the scheduler ran again, it went back into the queue. If that same search kept getting deferred until its next scheduled run, that run of the search was skipped. Because the order of the searches was the same every time (alphabetical), the same searches typically got skipped.
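A toy model makes the problem easy to see. This Python sketch collapses the defer-then-skip sequence into a single pass per cycle and uses hypothetical search names and slot counts; it is not Splunk's scheduler code, only an illustration of why a fixed alphabetical order starves the same searches.

```python
def old_scheduler_cycle(due, free_slots):
    """Simplified pre-6.3 behaviour: alphabetical queue, overflow is skipped."""
    queue = sorted(due)  # alphabetical order, identical on every cycle
    return queue[:free_slots], queue[free_slots:]


due = ["zeta_report", "alpha_alert", "mid_summary", "beta_alert"]
for cycle in range(3):
    ran, skipped = old_scheduler_cycle(due, free_slots=2)
    print(cycle, "ran:", ran, "skipped:", skipped)
# Every cycle skips ['mid_summary', 'zeta_report'].
```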
The new scheduler uses a priority score that takes into account previous skips, how long the search takes to run, and an allowance for a scheduling “window”. The scheduler starts by taking all of the searches scheduled to run at ≤ now() and randomly shuffling them. This is the first step in ensuring that the same searches don’t get skipped every time. Then it calculates a priority score for each search.
Formula Breakdown: Search Scheduler Priority Score Calculator
Here is a formula for how the search scheduler calculates the priority score:
score(j) = next_runtime(j) + avg_runtime(j) x priority_runtime_factor - skipped_count(j) x period(j) x priority_skipped_factor + schedule_window_adjustment(j)
Here is a breakdown of what each part of that equation means:
- next_runtime(j) = The time that the search is scheduled to run – this is the base of the score – if everything else is equal, then searches that are scheduled to run first get a better priority score
- avg_runtime(j) = The average time that the search takes to run – this allows shorter searches to run and get out of the way to free up resources for longer searches to run
- priority_runtime_factor = This is a variable that can be adjusted to modify how much the avg_runtime affects the score – we do not recommend changing this
- skipped_count(j) = The number of times that the search has been skipped – this allows searches that have been skipped to get a better priority
- period(j) = This is the interval between consecutive runs of the search – if a search that runs every minute is skipped, it’s probably less of an issue than a search that only runs once per day – longer periods make skipped_count affect the score more than shorter periods
- priority_skipped_factor = This is a variable that can be adjusted to modify how much the skipped_count affects the score – we do not recommend changing this
- schedule_window_adjustment(j) = This is the length of window remaining on a search with a window configured – this allows non-windowed searches to get a better priority, but if there are available cycles, then the windowed search can run during the window
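To see how the terms interact, the score(j) formula can be transcribed directly into Python. This is a minimal sketch: the ScheduledSearch fields and sample searches are hypothetical, and the two factors are left at a placeholder of 1.0, which is not necessarily Splunk's default.

```python
from dataclasses import dataclass


@dataclass
class ScheduledSearch:
    name: str
    next_runtime: float            # epoch seconds when the search is due
    avg_runtime: float             # average run time, in seconds
    skipped_count: int             # times this search has been skipped
    period: float                  # seconds between scheduled runs
    window_remaining: float = 0.0  # seconds left in its schedule window (0 = no window)


def priority_score(s, priority_runtime_factor=1.0, priority_skipped_factor=1.0):
    """score(j) as described in the text; a lower score means the search runs sooner."""
    return (
        s.next_runtime
        + s.avg_runtime * priority_runtime_factor
        - s.skipped_count * s.period * priority_skipped_factor
        + s.window_remaining
    )


now = 1_000_000
searches = [
    ScheduledSearch("hourly_summary", now, avg_runtime=120, skipped_count=0, period=3600),
    ScheduledSearch("daily_report", now, avg_runtime=30, skipped_count=2, period=86400),
    ScheduledSearch("windowed_rollup", now, avg_runtime=30, skipped_count=0,
                    period=3600, window_remaining=900),
]
for s in sorted(searches, key=priority_score):
    print(s.name, priority_score(s))
# daily_report 827230.0, hourly_summary 1000120.0, windowed_rollup 1000930.0
```

Two skips of a once-a-day search outweigh everything else, while the 900 seconds of remaining window push windowed_rollup behind hourly_summary even though both are due at the same time.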
Finally, the scheduler reorders the searches by score, with the lowest-scoring (highest-priority) searches running first. Adding a window to a search does not GUARANTEE that it will run in that window; it only gives the scheduler a period, after the scheduled time, during which the search may still run if resources are available.
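Putting the steps together, a single scheduling pass can be sketched as “shuffle, score, sort”. This minimal Python version takes precomputed scores and a hypothetical number of free scheduler slots; it is our illustration of the described behaviour, not Splunk's code.

```python
import random


def scheduling_pass(due_scores, free_slots, rng=random):
    """One pass: shuffle due searches, sort by score (lowest first), fill free slots.

    due_scores maps search name -> priority score for every search due now.
    Returns (searches that run, searches that must wait or be skipped).
    """
    queue = list(due_scores)
    rng.shuffle(queue)                             # step 1: random order
    queue.sort(key=lambda name: due_scores[name])  # step 2: stable sort by score
    return queue[:free_slots], queue[free_slots:]


tied = {"alpha_alert": 0.0, "beta_alert": 0.0, "mid_summary": 0.0, "zeta_report": 0.0}
for seed in range(3):
    print(scheduling_pass(tied, 2, random.Random(seed))[1])
```

Because Python's sort is stable, shuffling first means searches with equal scores are tie-broken at random instead of alphabetically, so no single search is always the one left waiting.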
Advantageous Search Schedule Variable: max_searches_perc
Another advantageous change to the search scheduler is a time-variable “max_searches_perc” setting. This setting determines what percentage of the max concurrent search limit the search scheduler is allowed to use.
By default, only 50% of the max concurrent search limit can be used for scheduled searches. This is intended to keep search slots available for users’ ad-hoc searches. The change allows multiple “max_searches_perc” settings for different times (using cron syntax), so more scheduled searches can run overnight and/or on weekends, when users are unlikely to be running ad-hoc searches.
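The arithmetic behind these limits is easy to sketch in Python. Assumptions, stated plainly: the overall concurrent search limit is computed as max_searches_per_cpu x CPUs + base_max_searches with defaults of 1 and 6, as we understand limits.conf; the time rules below are hypothetical stand-ins for cron strings; and the 75% and 90% values are examples. Our reading of limits.conf.spec is that the overrides are written as numbered “max_searches_perc.<n>” values with matching “max_searches_perc.<n>.when” cron strings, checked from the highest <n> down; verify that against the spec for your version.

```python
from datetime import datetime


def max_concurrent_searches(num_cpus, max_searches_per_cpu=1, base_max_searches=6):
    """Overall concurrent search limit, using the limits.conf defaults as we understand them."""
    return max_searches_per_cpu * num_cpus + base_max_searches


# Hypothetical stand-ins for cron-scheduled overrides, least to most specific.
# Each entry is (percent, predicate on a datetime).
OFF_HOURS_RULES = [
    (75, lambda t: t.hour < 6),        # midnight to 6 a.m., every day
    (90, lambda t: t.weekday() >= 5),  # all day Saturday and Sunday
]


def scheduler_slots(num_cpus, when, rules=OFF_HOURS_RULES, default_perc=50):
    """Slots the scheduler may use at time `when`, rounded down."""
    perc = default_perc
    for rule_perc, matches in reversed(rules):  # most specific rule is checked first
        if matches(when):
            perc = rule_perc
            break
    return max_concurrent_searches(num_cpus) * perc // 100


print(scheduler_slots(16, datetime(2015, 10, 7, 12)))  # Wednesday noon: 11
print(scheduler_slots(16, datetime(2015, 10, 7, 3)))   # Wednesday 3 a.m.: 16
print(scheduler_slots(16, datetime(2015, 10, 10, 3)))  # Saturday 3 a.m.: 19
```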
For more details, you can see the slides of this .conf2015 session here: “Making the Most of the New Splunk Scheduler”.
New Custom Alert Actions: “sendalert” command
The most useful new feature is custom alert actions. Previously, scripts for alert actions had to be in $SPLUNK_HOME/bin/scripts/. With custom alert actions, the scripts can be packaged with Splunk Apps, and app developers can also create custom configuration interfaces for users. Verifying alert actions is much easier with the new “sendalert” command.
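Below is a minimal sketch of what a packaged alert action script might look like in Python. The action name “my_notify”, its “message” parameter, and the message format are hypothetical. It reflects our understanding of the 6.3 convention: Splunk invokes the script with “--execute” and passes a JSON payload (including “configuration”, “result”, and “search_name”) on stdin when “payload_format = json” is set for the action in the app’s alert_actions.conf. Check the custom alert actions documentation for the exact payload fields.

```python
import json
import sys


def build_message(payload):
    """Format a one-line notification from the alert payload (hypothetical action)."""
    config = payload.get("configuration", {})  # the action's param.* values
    result = payload.get("result", {})         # first result row, as we understand it
    search = payload.get("search_name", "unknown search")
    text = config.get("message", "Alert fired")
    return f"[{search}] {text} (host={result.get('host', 'n/a')})"


def run(argv, stdin_text):
    """Return (exit_code, message) for a given command line and stdin payload."""
    if len(argv) > 1 and argv[1] == "--execute":
        return 0, build_message(json.loads(stdin_text))
    return 1, "FATAL Unsupported execution mode (expected --execute flag)"


# Splunk is expected to run the script with --execute and the JSON payload on stdin.
if __name__ == "__main__" and sys.argv[1:2] == ["--execute"]:
    code, message = run(sys.argv, sys.stdin.read())
    sys.stderr.write(message + "\n")  # stderr should end up in splunkd.log (our understanding)
    sys.exit(code)
```

Once such an app is installed, a search like | sendalert my_notify param.message="Check the VPN" should exercise the action without waiting for a real alert to fire (again assuming the hypothetical action name).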
For more details, you can find the Splunk documentation here: “Developing Views and Apps for Splunk Web: Custom alert actions overview”.
Compliance Requirement: Data Integrity Checking Has Returned
Many of our clients have been asking about data integrity. Data integrity was previously provided by a feature called “data block signing”, which was removed in Splunk 6.2. Data integrity checking is now a requirement for PCI DSS compliance, and it has returned in Splunk 6.3.
To provide data integrity, Splunk writes a hash of each slice of newly indexed data to the “l1Hashes” file. When the bucket rolls to warm, Splunk hashes the contents of the “l1Hashes” file and stores the result in the “l2Hash” file. There are Splunk CLI commands to check the integrity of an entire index or of a single bucket.
This feature is disabled by default and can be enabled on a per-index basis via a setting in indexes.conf. Hashes are only computed automatically for NEW DATA, although there does seem to be a Splunk CLI command to manually generate hashes for older data.
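The two-level scheme can be illustrated in a few lines of Python. This is a conceptual sketch: the slicing, the newline-joined l1Hashes layout, and the function names are our assumptions, and we use SHA-256 for both levels, which we believe matches Splunk’s choice but have not verified against the on-disk files. To our knowledge, the real feature is enabled with “enableDataIntegrityControl = true” in indexes.conf, checked with “splunk check-integrity -index <name>” (or “-bucketPath <path>”), and backfilled with “splunk generate-hash-files”; confirm against the 6.3 documentation.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def build_l1_hashes(slices):
    """One hash per slice of newly indexed data (the l1Hashes contents)."""
    return [sha256_hex(s) for s in slices]


def build_l2_hash(l1_hashes):
    """Hash of the l1Hashes contents, computed when the bucket rolls to warm."""
    return sha256_hex("\n".join(l1_hashes).encode("ascii"))


def check_integrity(slices, l1_hashes, l2_hash):
    """Verify the l1 list against l2 first, then each slice against its l1 hash."""
    if build_l2_hash(l1_hashes) != l2_hash:
        return "l1Hashes does not match l2Hash"
    if len(slices) != len(l1_hashes):
        return "slice count mismatch"
    for i, (data, expected) in enumerate(zip(slices, l1_hashes)):
        if sha256_hex(data) != expected:
            return f"slice {i} modified"
    return "ok"


slices = [b"event one\n", b"event two\n"]
l1 = build_l1_hashes(slices)
l2 = build_l2_hash(l1)
print(check_integrity(slices, l1, l2))                      # ok
print(check_integrity([slices[0], b"tampered\n"], l1, l2))  # slice 1 modified
```

The second level matters because an attacker who edits a slice could also rewrite its l1 hash; the l2 hash makes that second edit detectable as well.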
Upgrading: Supported Features, Functionality and Recommendations
Splunk supports upgrading Splunk Enterprise to 6.3 from 6.0 or later. If you are still on Splunk 5, you will need to upgrade to Splunk 6 first. Universal Forwarders can be upgraded from Splunk 5 directly to 6.3.
Some issues you will want to be aware of before upgrading include:
- Indexer clusters cannot do an online/rolling upgrade. You will have to bring down the entire indexer cluster.
- The Deployment Monitor app is not supported in Splunk 6.3. This app’s functionality has been replaced by the Distributed Management Console. We will be covering the DMC in a separate blog post.
- DB Connect (dbx) 2.0.4 or lower has some UI issues when upgrading. Make sure to upgrade this app to 2.0.5 or higher before upgrading Splunk to 6.3.
- All accelerated Data Models will rebuild after the upgrade to Splunk 6.3.
Hurricane Labs always recommends waiting until the first patch release (6.3.1) before upgrading your production environment.
Notes: Additional Improvements, Along with a Few that are Not As Impressive
Another big improvement to take note of is “Indexer Discovery” for forwarders. Forwarders can ask the cluster master which indexers are available to take data. The cluster master can optionally weight the indexer list it sends to the forwarders by disk capacity. This weighting is based on total disk size, not available space.
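Weighting by disk capacity amounts to a weighted random choice. This Python sketch picks an indexer with probability proportional to its total disk size; the peer names, capacities, and function are hypothetical, and real forwarder load balancing involves more than a single draw. On the cluster master, we believe the relevant option is “indexerWeightByDiskCapacity” in the [indexer_discovery] stanza of server.conf.

```python
import random


def pick_indexer(capacity_gb, rng=random):
    """Pick an indexer with probability proportional to its total disk capacity."""
    names = list(capacity_gb)
    weights = [capacity_gb[name] for name in names]  # total size, not free space
    return rng.choices(names, weights=weights, k=1)[0]


peers = {"idx01": 2000, "idx02": 1000, "idx03": 1000}
rng = random.Random(7)
picks = [pick_indexer(peers, rng) for _ in range(20000)]
print(picks.count("idx01") / len(picks))  # roughly 0.5, since idx01 holds half the total disk
```

Because the weight is total size, a nearly full but large indexer keeps attracting data; that is the caveat the paragraph above points out.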
There are improved map visualizations in Splunk 6.3. The old maps could only show points or pie charts at specific locations on a map. New maps include choropleth maps with colored regions. For example, US states can be shaded in proportion to the value of some field.
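Shading “in proportion to the value of some field” is essentially linear color interpolation. The Python sketch below maps each region’s value onto a white-to-red scale; the color endpoints, the sample values, and the function are illustrative assumptions, not how Splunk’s map visualization computes its colors.

```python
def shade(values, low=(255, 255, 255), high=(200, 0, 0)):
    """Map each region's value to a hex color between `low` and `high`, proportionally."""
    vmin, vmax = min(values.values()), max(values.values())
    span = (vmax - vmin) or 1  # avoid dividing by zero when all values are equal
    colors = {}
    for region, value in values.items():
        t = (value - vmin) / span  # 0.0 for the smallest value, 1.0 for the largest
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(low, high))
        colors[region] = "#{:02x}{:02x}{:02x}".format(*rgb)
    return colors


print(shade({"Ohio": 10, "Texas": 30, "Utah": 20}))
# {'Ohio': '#ffffff', 'Texas': '#c80000', 'Utah': '#e48080'}
```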
SEARCH COMMAND HISTORY VIEWING AND INTERACTION
Splunk 6.3 allows viewing and interaction with search command history. It will show the search history as a table and allows adding search history items to the search bar. This enables you to combine previous searches into one search.
NEW ANOMALYDETECTION COMMAND
A new search command, “anomalydetection”, provides additional statistical anomaly detection using a histogram-based approach. It can also work similarly to the existing “anomalousvalue” or “outlier” commands by providing a zscore or iqr approach (respectively) to anomaly detection.
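For readers unfamiliar with the three approaches, here are textbook versions in Python. They are generic illustrations of histogram, z-score, and IQR outlier detection, not Splunk’s implementation of “anomalydetection”; the thresholds (3 standard deviations, 1.5 x IQR, bins holding under 5% of values) are common choices we picked, not the command’s defaults. statistics.quantiles requires Python 3.8 or later.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def histogram_outliers(values, bins=10, min_fraction=0.05):
    """Flag values that fall in equal-width bins holding under `min_fraction` of the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1

    def bin_index(v):
        return min(int((v - lo) / width), bins - 1)

    counts = [0] * bins
    for v in values:
        counts[bin_index(v)] += 1
    return [v for v in values if counts[bin_index(v)] / len(values) < min_fraction]


data = [10] * 20 + [100]
print(zscore_outliers(data), iqr_outliers(data), histogram_outliers(data))  # [100] [100] [100]
```

On this deliberately simple data all three agree; on skewed or multi-modal data they can disagree, which is why having a choice of method in one command is useful.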
DATA MODEL ACCELERATION
Data Model Acceleration has been optimized to provide for more efficient searching. It is important to note that these optimizations are incompatible with the previously generated summaries. So, all accelerated Data Models will rebuild after the upgrade to Splunk 6.3. There are ways to prevent this, but you won’t get the benefits from the increased performance until the data models rebuild.
Data Model Acceleration will now run two searches at the same time instead of one, so summaries should complete faster, but you might see increased CPU usage after the upgrade.
DISTRIBUTED MANAGEMENT CONSOLE
There are many improvements to the Distributed Management Console (DMC) including new views and alerting. A separate blog post will cover DMC in more detail.
HTTP EVENT COLLECTOR
The HTTP Event Collector is a new feature that allows an app to send data in a special JSON format directly into Splunk via HTTP/HTTPS. This feature was built for app developers. A separate blog post will cover it in more detail.
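Here is a minimal Python sketch of sending one event to the collector using only the standard library. The host name, token, and sourcetype are placeholders; the default port 8088, the /services/collector/event endpoint, and the “Authorization: Splunk <token>” header reflect our understanding of the HTTP Event Collector API. The function only builds the request, so it can be inspected without a running Splunk instance.

```python
import json
import urllib.request


def build_hec_request(host, token, event, sourcetype=None, port=8088):
    """Build (but do not send) a POST that delivers one event to the HTTP Event Collector."""
    body = {"event": event}
    if sourcetype is not None:
        body["sourcetype"] = sourcetype
    return urllib.request.Request(
        url=f"https://{host}:{port}/services/collector/event",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_hec_request(
    "splunk.example.com",
    "00000000-0000-0000-0000-000000000000",  # placeholder token
    {"action": "login", "user": "alice"},
    sourcetype="myapp:events",
)
# To actually send it: urllib.request.urlopen(req)
```

A successful send should come back with a small JSON acknowledgement. If the collector uses a self-signed certificate, urlopen would need an appropriate ssl context.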
SAML SINGLE SIGN-ON: NOT AS IMPRESSIVE AS IT SEEMS
SAML Single Sign-On was a potentially exciting feature. Unfortunately, it only supports PingFederate PingID, which makes it a little disappointing.
Splunk 6.3 has quite a few enhanced performance areas that we are very excited to take advantage of, particularly when it comes to the saved search schedule improvements, custom alert actions, and many others! Hurricane Labs already believes in Splunk’s capabilities when it comes to strengthening enterprises’ security infrastructures, along with producing major ROI value. But, we’re certainly not disappointed when we see Splunk’s successes in becoming even more robust, agile and powerful than it was before.