Under the Hood: Examining Your Splunk Deployment Server

This tutorial will help you understand how Splunk’s configuration enforcement system (the deployment server) works, and how to avoid some common pitfalls when updating your apps.


  • Tom Kopchak
  • Nov 26, 2018
  • Tested on Splunk Version: 7.2

Introduction

Have you ever thought you knew how something worked, only to have your mind blown when you learned more? We recently ran into a scenario that forced us to re-examine our understanding of how the Splunk deployment server behaved with respect to local changes on a deployment client. This was a learning experience for me, so I wanted to share this knowledge in the hope that it can help someone else too.

Deployment Server Functionality

The Splunk deployment server is considered a configuration enforcement mechanism: that is, it exists to keep the configuration consistent between the deployment_apps directory on the deployment server and the target app directory on the deployment clients of a given server class.  

It was our original belief that this was a bidirectional relationship, whereby the client would compare the hash of its app to what was on the deployment server and, if there was a difference, automatically update the app to match the deployment server. In fact, there were some apps - such as the OPSEC LEA app - that (at least in older versions) would write data locally and were considered incompatible with a deployment server due to this mechanism.

This understanding, however, was not quite correct.  

What actually happens is that the checksum for an app is calculated once (at app installation) and maintained on the deployment client, in the file serverclass.xml ($SPLUNK_HOME/var/run/serverclass.xml). We’ll cover this file in more detail below, if you’re interested.   

Each app in serverclass.xml looks something like this:

<app name="all_deploymentclient" checksum="9052369574012309262" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_splunk/all_deploymentclient-1491319340.bundle" installed="true"/>

Note that the checksum is stored in this file. This value is not recalculated after app installation, and it will not come into play until a new version of the app in question is added to the deployment server (and the deployment server is reloaded).

Any local changes made to the app after it is deployed from the deployment server will not result in this stored checksum changing or the app being re-downloaded from the deployment server. The only exception is when the app is removed from the client: in that case, it will be re-downloaded on the next check-in.
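To make the behavior above concrete, here is a toy model of the check-in logic in Python. It is only a sketch of what we observed, not Splunk's actual implementation: the DeploymentClient class, its fields, and the shape of the server_apps mapping are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentClient:
    """Toy model of a deployment client (hypothetical, not Splunk code)."""

    # app name -> checksum recorded at install time (the role serverclass.xml plays)
    recorded: dict = field(default_factory=dict)
    # app name -> {file name: contents} currently on the client's disk
    on_disk: dict = field(default_factory=dict)

    def check_in(self, server_apps):
        """Return the names of the apps that are (re-)downloaded.

        server_apps maps app name -> (checksum, files) as advertised by the
        (hypothetical) deployment server.
        """
        downloaded = []
        for name, (checksum, files) in server_apps.items():
            missing = name not in self.on_disk
            changed = self.recorded.get(name) != checksum
            if missing or changed:
                # The whole app is replaced, so any local edits are lost.
                self.on_disk[name] = dict(files)
                self.recorded[name] = checksum
                downloaded.append(name)
        return downloaded


if __name__ == "__main__":
    server = {"infra_outputs": ("111", {"outputs.conf": "v1"})}
    client = DeploymentClient()
    print(client.check_in(server))  # initial install
    client.on_disk["infra_outputs"]["inputs.conf"] = "local edit"
    print(client.check_in(server))  # local edit is not detected
    server["infra_outputs"] = ("222", {"outputs.conf": "v2"})
    print(client.check_in(server))  # new version replaces the whole app
```

The key point in this model is that the comparison uses the recorded checksum, never a fresh hash of the files on disk, which is why the local edit goes unnoticed until the server side changes.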

Why Does This Matter?

Because checksums are only compared when the app is modified on the deployment server, local changes can go unnoticed and lead to configuration issues down the road. Consider this scenario:

  • You deploy an app from the deployment server to a forwarder.
  • Another administrator fails to recognize this app is managed by the deployment server, and makes a local configuration change to an inputs.conf file.
  • This configuration will work for a very long time without any issues, until months down the road…
  • The app in question is updated (which changes several configuration files) on the deployment server, then the admin reloads the serverclass.
  • The next time the deployment client checks in, everything that was working stops working. This is because the entire app on the deployment client is replaced and the local configuration is lost.

This example illustrates the importance of tracking changes appropriately, and ensuring that any apps managed by the deployment server are not modified locally.

What If I Don’t Want It to Work This Way?

There’s a config file for that! Splunk allows for specific configurations to be excluded from deployment server management. This is covered in Splunk Docs here: https://docs.splunk.com/Documentation/Splunk/7.2.1/Updating/Excludecontent

How It Works Under the Hood

For those of you who like technical details and explanations, let’s dig into the config files a little bit deeper:

A Look at serverclass.xml

As mentioned, the file that controls all of this behavior is $SPLUNK_HOME/var/run/serverclass.xml. This file is included in a Splunk diag, and also can be accessed locally on any system running Splunk or a Splunk Universal Forwarder.  

Digging into this file can provide a ton of information about what our deployment client is (supposed to be) doing. Let’s explore a somewhat simplified example:

~# cat $SPLUNK_HOME/var/run/serverclass.xml
<?xml version="1.0" encoding="UTF-8"?>
<deployResponse restartSplunkd="false" restartSplunkWeb="false" stateOnClient="enabled" issueReload="false" repositoryLocation="$SPLUNK_HOME/etc/apps" endpoint="$deploymentServerUri$/services/streams/deployment?name=$tenantName$:$serverClassName$:$appName$">
  <serverClass name="all_HeavyForwarders">
    <app name="if_syslog_inputs" checksum="5279169331485055964" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_HeavyForwarders/if_syslog_inputs-1491319339.bundle" installed="true"/>
    <app name="infra_outputs" checksum="12306752662921759682" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_HeavyForwarders/infra_outputs-1491319339.bundle" installed="true"/>
  </serverClass>
  <serverClass name="all_SplunkInfrastructure">
    <app name="infra_license" checksum="12359899191485497807" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_SplunkInfrastructure/infra_license-1491319339.bundle" installed="true"/>
    <app name="infra_authentication" checksum="14278720479217993410" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_SplunkInfrastructure/infra_authentication-1491319339.bundle" installed="true"/>
  </serverClass>
  <serverClass name="all_linux_servers">
    <app name="baseline_linux_inputs" checksum="227089178054871350" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_linux_servers/baseline_linux_inputs-1533928043.bundle" installed="true"/>
  </serverClass>
  <serverClass name="all_splunk">
    <app name="all_deploymentclient" checksum="9052369574012309262" restartSplunkd="true" restartSplunkWeb="false" stateOnClient="enabled" localArchive="/opt/splunk/var/run/all_splunk/all_deploymentclient-1491319340.bundle" installed="true"/>
  </serverClass>
</deployResponse>

Each serverClass stanza of this XML file represents a server class in the serverclass.conf configuration on the deployment server. We can see that this example system is a member of the following server classes:

  • all_HeavyForwarders
  • all_SplunkInfrastructure
  • all_linux_servers
  • all_splunk

Within each serverClass stanza, we see the associated apps. In this example, this works out to the following structure:

  • ServerClass: all_HeavyForwarders
    • Includes app: if_syslog_inputs
    • Includes app: infra_outputs
  • ServerClass: all_SplunkInfrastructure
    • Includes app: infra_license
    • Includes app: infra_authentication
  • ServerClass: all_linux_servers
    • Includes app: baseline_linux_inputs
  • ServerClass: all_splunk
    • Includes app: all_deploymentclient

As you can see, this is a really straightforward way to see what apps on a given deployment client are managed by the deployment server and what configuration parameters are applied.
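If you would rather not read the XML by eye, the same summary can be pulled out with a few lines of Python. This is a minimal sketch that relies only on the element and attribute names visible in the example above (serverClass, app, name, checksum, installed); the managed_apps function name is our own.

```python
import xml.etree.ElementTree as ET


def managed_apps(xml_text):
    """Map each server class name to a list of (app, checksum, installed)."""
    root = ET.fromstring(xml_text)
    result = {}
    for server_class in root.iter("serverClass"):
        apps = []
        for app in server_class.findall("app"):
            apps.append(
                (
                    app.get("name"),
                    app.get("checksum"),
                    app.get("installed") == "true",
                )
            )
        result[server_class.get("name")] = apps
    return result
```

To run it against a real client, read the file in binary mode and pass the contents in, for example managed_apps(open(path, "rb").read()), where path points at the serverclass.xml location described above.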

A Word on Checksums

Based on our testing, the checksum calculated on the deployment server is highly dependent on the timestamps within a given app. This means that something as innocuous as touching a file (that is, updating the timestamp) will lead to this checksum changing. Be aware of this when working with deployment apps, as it could lead to unintended Splunk restarts depending on how your server classes and apps are configured. Note that the new checksum only takes effect once you reload the server class or restart the deployment server.

For those of you who like proof, here you go (thanks to Brian Glenn for doing the testing). Note that simply changing the timestamp on app.conf is enough to cause a new bundle to be generated:

splunk@splunk:/opt/splunk/etc/system/local$ /opt/splunk/bin/splunk reload deploy-server
Splunk username: admin
Password:
Reloading serverclass(es).
splunk@splunk:/opt/splunk/etc/system/local$ cd /opt/splunk/var/run/tmp/test/
splunk@splunk:/opt/splunk/var/run/tmp/test$ ls -la
total 24
drwx------ 2 splunk splunk  4096 Nov 20 20:21 .
drwx------ 3 splunk splunk  4096 Nov 20 20:21 ..
-rw------- 1 splunk splunk 10240 Nov 20 20:21 test_app-1542745315.bundle
-rw------- 1 splunk splunk   412 Nov 20 20:21 test_app-1542745315.bundle.gz
splunk@splunk:/opt/splunk/var/run/tmp/test$ md5sum ./test_app-1542745315.bundle
f8cb981541713b71d8fbd3084e2b4a3f  ./test_app-1542745315.bundle
splunk@splunk:/opt/splunk/var/run/tmp/test$ touch /opt/splunk/etc/deployment-apps/test_app/local/app.conf
splunk@splunk:/opt/splunk/var/run/tmp/test$ /opt/splunk/bin/splunk reload deploy-server
Reloading serverclass(es).
splunk@splunk:/opt/splunk/var/run/tmp/test$ ls -la
total 24
drwx------ 2 splunk splunk  4096 Nov 20 20:22 .
drwx------ 3 splunk splunk  4096 Nov 20 20:21 ..
-rw------- 1 splunk splunk 10240 Nov 20 20:22 test_app-1542745373.bundle
-rw------- 1 splunk splunk   414 Nov 20 20:22 test_app-1542745373.bundle.gz
splunk@splunk:/opt/splunk/var/run/tmp/test$ md5sum ./test_app-1542745373.bundle
3899c2f7bc42d9df20fef457d27bf88a  ./test_app-1542745373.bundle
splunk@splunk:/opt/splunk/var/run/tmp/test$

If this functionality is not desired, it can be manipulated with the crossServerChecksum option in serverclass.conf.  Setting this to true will result in the md5sum not changing if only the timestamp on a file in a deployment app is modified, as demonstrated below:

root@hdf-template:/opt/splunk/var/run/tmp/test# head -2 /opt/splunk/etc/system/local/serverclass.conf
[global]
crossServerChecksum = true
root@hdf-template:/opt/splunk/var/run/tmp/test# /opt/splunk/bin/splunk reload deploy-server
Reloading serverclass(es).
root@hdf-template:/opt/splunk/var/run/tmp/test# ls -la
total 16
drwx------ 2 root root    77 Dec  3 10:54 .
drwx------ 3 root root    18 Dec  3 10:47 ..
-rw------- 1 root root 10240 Dec  3 10:54 test_app-1543852487.bundle
-rw------- 1 root root   403 Dec  3 10:54 test_app-1543852487.bundle.gz
root@hdf-template:/opt/splunk/var/run/tmp/test# md5sum test_app-1543852487.bundle
22bd50b83c9892371675c494b356e6ae  test_app-1543852487.bundle
root@hdf-template:/opt/splunk/var/run/tmp/test# touch /opt/splunk/etc/deployment-apps/test_app/local/app.conf
root@hdf-template:/opt/splunk/var/run/tmp/test# /opt/splunk/bin/splunk reload deploy-server
Reloading serverclass(es).
root@hdf-template:/opt/splunk/var/run/tmp/test# ls -la
total 16
drwx------ 2 root root    77 Dec  3 10:55 .
drwx------ 3 root root    18 Dec  3 10:47 ..
-rw------- 1 root root 10240 Dec  3 10:55 test_app-1543852517.bundle
-rw------- 1 root root   403 Dec  3 10:55 test_app-1543852517.bundle.gz
root@hdf-template:/opt/splunk/var/run/tmp/test# md5sum test_app-1543852517.bundle
22bd50b83c9892371675c494b356e6ae  test_app-1543852517.bundle

While the bundle ID is regenerated (the .bundle file has a new name), the checksum remains the same. Because the checksum has not changed, the deployment clients see no difference and the app is not redeployed.
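For intuition about why a timestamp alone can change a checksum, the sketch below builds a small tar archive in memory and hashes it. It assumes only that an archive format such as tar records each file's modification time in its headers; it is not a reproduction of how Splunk builds bundles or of what crossServerChecksum does internally. The bundle_digest function and its normalize_mtime flag are hypothetical.

```python
import hashlib
import io
import tarfile


def bundle_digest(files, mtime, normalize_mtime=False):
    """Build an in-memory tar of `files` and return its MD5 hex digest.

    files: dict mapping archive path -> bytes content.
    mtime: modification time (epoch seconds) stamped on every member.
    normalize_mtime: if True, stamp 0 instead of `mtime` on every member.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for path in sorted(files):
            info = tarfile.TarInfo(name=path)
            info.size = len(files[path])
            info.mtime = 0 if normalize_mtime else mtime
            archive.addfile(info, io.BytesIO(files[path]))
    return hashlib.md5(buffer.getvalue()).hexdigest()


if __name__ == "__main__":
    app = {"test_app/local/app.conf": b"[install]\nstate = enabled\n"}
    # Same content, different timestamps: the digests differ.
    print(bundle_digest(app, 1542745315) == bundle_digest(app, 1542745373))
    # Same content, timestamps normalized: the digests match.
    print(bundle_digest(app, 1542745315, True) == bundle_digest(app, 1542745373, True))
```

In this toy version, identical file contents hash differently as soon as the stamped timestamp differs, and hash identically once the timestamp is taken out of the equation, which mirrors the two shell sessions above.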

In Conclusion

Hopefully this write-up will save you some pain and suffering when making changes to your deployment apps (or at least help you understand the cause of your pain and suffering if you find out about this too late). If you found this tutorial helpful and there’s another aspect of Splunk configuration you would like to learn more about, let me know!



