Splunk Searching with REST API

There are multiple ways to interact with Splunk in addition to the standard web interface. This tutorial will show you a simple use case for searching and returning results with Splunk's REST API and cURL.



As a way to justify essentially useless equipment around my house, I wanted to make a Raspberry Pi driven display board.

The board only needed to present data from a handful of Splunk dashboards, and I wanted to avoid running a window environment, a web browser, and all of the associated overhead on my relatively weak Pi Zero W. I therefore wanted a way to display all of the data in the console.

I was able to complete this task using the documentation that Splunk provides for searching via the REST API. I don't think they had a good proof of concept that showed a fully working use case; however, their documentation on all the available features is quite in-depth.

One of the things I wanted to display was the count of accepted and blocked connections through my firewall. This data is already indexed on my local Splunk instance so all I have to do is search for it. The local Splunk instance is running on IP address 192.168.0.70 with the default REST interface running HTTPS on TCP 8089.

There are two ways to accomplish this. The script can run an ad-hoc search each time and pull the results right away, or Splunk can run a scheduled saved search and the script can pull the results of its most recent run.

I wanted to gather the results with a cron-scheduled bash script, so I decided to write the script around the scheduled saved search. I also tested running ad-hoc searches via the REST API, so I have documented that method as well.

Run an ad-hoc search with REST API

$ curl -s -k -u robert https://192.168.0.70:8089/services/search/jobs --data-urlencode search='search index="network" sourcetype="pfsense:filterlog" | timechart span=1h count by action' -d earliest_time="-4h@h" -d latest_time="-0h@h"
Enter host password for user 'robert':
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <sid>1546656881.136</sid>
</response>

The response contains a “SID” (search ID), which we need in the call that retrieves the search results.
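
When this step is scripted, the SID can be pulled out of the XML instead of being copied by hand. The following is a minimal sketch, assuming the response has the same shape as the one shown above with the `<sid>` element on a single line. The function name `extract_sid` is made up for illustration, and the sample XML is copied from the response above.

```shell
#!/bin/bash
# Print the text of the first <sid> element found on stdin.
# Assumes the element sits on one line, as in the response shown above.
extract_sid() {
  sed -n 's:.*<sid>\([^<]*\)</sid>.*:\1:p' | head -n 1
}

# Sample response copied from the output above. In practice this would be
# piped straight from the curl command that creates the job.
response='<?xml version="1.0" encoding="UTF-8"?>
<response>
  <sid>1546656881.136</sid>
</response>'

sid=$(printf '%s\n' "$response" | extract_sid)
echo "$sid"
```

A real XML parser would be more robust than `sed`, but for a response this small a one-line substitution keeps the dependencies on the Pi to a minimum.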

Retrieve ad-hoc search status with REST API

We want to know if our job is done so we need to check on its “dispatchState”.

$ curl -s -k -u robert https://192.168.0.70:8089/services/search/jobs/1546656881.136 | grep dispatchState
Enter host password for user 'robert':
      <s:key name="dispatchState">DONE</s:key>
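
A script usually has to repeat this check until the job finishes. Below is a hedged sketch of such a polling loop. `parse_dispatch_state`, `wait_for_job`, and `fetch_status` are hypothetical names, and the loop assumes that DONE means the results are ready and FAILED means they never will be. The demo uses a canned status line copied from the output above so that it runs without a Splunk instance.

```shell
#!/bin/bash
# Print the dispatchState value from a job status document read on stdin.
parse_dispatch_state() {
  sed -n 's|.*<s:key name="dispatchState">\([^<]*\)</s:key>.*|\1|p' | head -n 1
}

# Poll until the job is DONE (return 0), or FAILED / out of tries (return 1).
# fetch_status is a hypothetical caller-supplied function that prints the
# job status XML, for example by wrapping the curl command shown above.
wait_for_job() {
  local max_tries="${1:-30}"
  local delay="${2:-2}"
  local state
  local i=0
  while [ "$i" -lt "$max_tries" ]; do
    state=$(fetch_status | parse_dispatch_state)
    case "$state" in
      DONE)   return 0 ;;
      FAILED) return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Demo with a canned status line; replace the body with a real curl call.
fetch_status() { echo '      <s:key name="dispatchState">DONE</s:key>'; }
wait_for_job 3 0 && echo "job finished"
```

Capping the number of tries keeps a cron-driven script from hanging forever if the job never completes.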

Retrieve ad-hoc search results with REST API

We can now grab the results from the job. By default, the response is XML.

$ curl -s -k -u robert https://192.168.0.70:8089/services/search/jobs/1546656881.136/results
Enter host password for user 'robert':
<?xml version='1.0' encoding='UTF-8'?>
<results preview='0'>
<meta>
<fieldOrder>
<field groupby_rank="0">_time</field>
<field data_source="count" splitby_field="action" splitby_value="allowed">allowed</field>
<field data_source="count" splitby_field="action" splitby_value="blocked">blocked</field>
<field>_span</field>
</fieldOrder>
</meta>
	<result offset='0'>
		<field k='_time'>
			<value><text>2019-01-04T17:00:00.000-05:00</text></value>
		</field>
		<field k='allowed'>
			<value><text>12029</text></value>
		</field>
		<field k='blocked'>
			<value><text>2181</text></value>
		</field>
		<field k='_span'>
			<value><text>3600</text></value>
		</field>
	</result>
	<result offset='1'>
		<field k='_time'>
			<value><text>2019-01-04T18:00:00.000-05:00</text></value>
		</field>
		<field k='allowed'>
			<value><text>9327</text></value>
		</field>
		<field k='blocked'>
			<value><text>1812</text></value>
		</field>
		<field k='_span'>
			<value><text>3600</text></value>
		</field>
	</result>
	<result offset='2'>
		<field k='_time'>
			<value><text>2019-01-04T19:00:00.000-05:00</text></value>
		</field>
		<field k='allowed'>
			<value><text>9217</text></value>
		</field>
		<field k='blocked'>
			<value><text>1992</text></value>
		</field>
		<field k='_span'>
			<value><text>3600</text></value>
		</field>
	</result>
	<result offset='3'>
		<field k='_time'>
			<value><text>2019-01-04T20:00:00.000-05:00</text></value>
		</field>
		<field k='allowed'>
			<value><text>7845</text></value>
		</field>
		<field k='blocked'>
			<value><text>1881</text></value>
		</field>
		<field k='_span'>
			<value><text>3600</text></value>
		</field>
	</result>
</results>

For my use case, CSV is more convenient, so we specify the output mode with -d output_mode=csv. Because -d would normally make curl send a POST, we also add --get so that the parameter is sent in the query string of a GET request.

$ curl -s -k -u robert https://192.168.0.70:8089/services/search/jobs/1546656881.136/results --get -d output_mode=csv
Enter host password for user 'robert':
"_time",allowed,blocked,"_span"
"2019-01-04T17:00:00.000-05:00",12029,2181,3600
"2019-01-04T18:00:00.000-05:00",9327,1812,3600
"2019-01-04T19:00:00.000-05:00",9217,1992,3600
"2019-01-04T20:00:00.000-05:00",7845,1881,3600

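Since the goal is a console display, the CSV can be reduced to one short line per hour. This is only a sketch of one possible layout: the function name `format_traffic` and the output format are assumptions, and the sample rows are the ones returned above.

```shell
#!/bin/bash
# Turn the timechart CSV into one console line per hour:
# HH:MM, allowed count, blocked count. The header row is skipped.
format_traffic() {
  awk -F',' 'NR > 1 {
    gsub(/"/, "", $1)   # drop the quotes around the _time value
    printf "%s allowed=%s blocked=%s\n", substr($1, 12, 5), $2, $3
  }'
}

# Sample data copied from the CSV output above.
format_traffic <<'EOF'
"_time",allowed,blocked,"_span"
"2019-01-04T17:00:00.000-05:00",12029,2181,3600
"2019-01-04T18:00:00.000-05:00",9327,1812,3600
"2019-01-04T19:00:00.000-05:00",9217,1992,3600
"2019-01-04T20:00:00.000-05:00",7845,1881,3600
EOF
```

Splitting on commas is safe here only because none of the values contain a comma. A search that returns free-text fields would need a proper CSV parser.
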
Creating a Saved Search via the REST API

I do not want to have to run the search every single time I need to pull the results, so I will schedule a saved search that runs automatically.

$ curl -s -k -u robert https://192.168.0.70:8089/servicesNS/robert/search/saved/searches -d name=Recent_Network_Traffic --data-urlencode search='index="network" sourcetype="pfsense:filterlog" | timechart span=1h count by action | fields - _span' -d dispatch.earliest_time="-4h@h" -d dispatch.latest_time="-0h@h" -d cron_schedule="0 */1 * * *" -d is_scheduled=1

A successful entry will return the parameters of the saved search in the response.

<?xml version="1.0" encoding="UTF-8"?>
...Abbreviated output...
  <entry>
    <title>Recent_Network_Traffic</title>
    <id>https://192.168.0.70:8089/servicesNS/robert/search/saved/searches/Recent_Network_Traffic</id>
    <updated>2019-01-04T23:37:14-05:00</updated>
    <link href="/servicesNS/robert/search/saved/searches/Recent_Network_Traffic" rel="alternate"/>
...Abbreviated output...

Retrieve Saved Search SID via the REST API

Again, we need an SID to retrieve the results. This time, since we did not run the job ad hoc, we have to query the history of the saved search to find it. The output is quite verbose and all we need is the newest SID, so I use a regex to pull out the last job URL that appears between <id> and </id> tags.

$ curl -s -k -u robert https://192.168.0.70:8089/services/saved/searches/Recent_Network_Traffic/history | grep -Po '(?<=<id>)(https:\/\/.*\/servicesNS\/.*\/jobs\/scheduler.*\d)(?<!<\/id>$)'| tail -1
Enter host password for user 'robert':
https://192.168.0.70:8089/servicesNS/nobody/search/search/jobs/scheduler__robert__search__RMD56fd3b97c2d5a938e_at_1546743600_46
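
On systems where `grep -P` is not available, the same extraction can be sketched with `sed`. This is an alternative under stated assumptions, not the command used above: it assumes each `<id>` element sits on its own line and, as above, that the newest run is listed last. The function name `latest_job_url` is hypothetical, and in the sample feed only the second job URL comes from the output above; the first is a made-up earlier run included to show that the last match wins.

```shell
#!/bin/bash
# Print the last scheduler job URL found inside <id>...</id> on stdin.
latest_job_url() {
  sed -n 's|.*<id>\(https://[^<]*/jobs/scheduler[^<]*\)</id>.*|\1|p' | tail -n 1
}

# Trimmed, illustrative history feed (not real Splunk output in full).
history='<feed>
  <id>https://192.168.0.70:8089/services/saved/searches/Recent_Network_Traffic/history</id>
  <entry>
    <id>https://192.168.0.70:8089/servicesNS/nobody/search/search/jobs/scheduler__robert__search__RMD56fd3b97c2d5a938e_at_1546740000_45</id>
  </entry>
  <entry>
    <id>https://192.168.0.70:8089/servicesNS/nobody/search/search/jobs/scheduler__robert__search__RMD56fd3b97c2d5a938e_at_1546743600_46</id>
  </entry>
</feed>'

job_url=$(printf '%s\n' "$history" | latest_job_url)
sid=${job_url##*/}   # the SID is the final path segment of the job URL
echo "$job_url"
echo "$sid"
```

The feed's own `<id>` is skipped because it does not contain `/jobs/scheduler`, which is the same filter the regex above relies on.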

Retrieve Saved Search Results via the REST API

Now I want the results in a flat csv file so I will request the results of the previously retrieved link with output_mode set to csv.

$ curl -s -k -u robert https://192.168.0.70:8089/servicesNS/nobody/search/search/jobs/scheduler__robert__search__RMD56fd3b97c2d5a938e_at_1546743600_46/results --get -d output_mode=csv
Enter host password for user 'robert':
"_time",allowed,blocked
"2019-01-05T18:00:00.000-0500",8239,2099
"2019-01-05T19:00:00.000-0500",14562,2646
"2019-01-05T20:00:00.000-0500",11820,1969
"2019-01-05T21:00:00.000-0500",17764,3382

Bash Script to Update a Local CSV with the Results

Finally, I combined the manual steps above into a simple script that can be executed by a scheduled cron job. I am using the lastpass-cli to load the credentials into the script so they are not hardcoded.

In this particular case, I have the credentials saved in a site called “testsplunk.” The script writes the CSV results to a file called “output” in the current directory. I would not recommend using this script in a production environment, as there is no error checking or input parsing beyond what the Splunk REST API does automatically. The script also has no mechanism to maintain an active lastpass-cli login; that needs to be handled outside of the script.

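#!/bin/bash
# Load the Splunk credentials from lastpass-cli (requires an active lpass login).
user=$(lpass show testsplunk --username)
password=$(lpass show testsplunk --password)
# Find the job URL of the most recent scheduled run of the saved search.
sid=$(curl -s -k -u "$user":"$password" https://192.168.0.70:8089/services/saved/searches/Recent_Network_Traffic/history | grep -Po '(?<=<id>)(https:\/\/.*\/servicesNS\/.*\/jobs\/scheduler.*\d)(?<!<\/id>$)'| tail -1)
# Fetch that run's results as CSV and write them to the file "output".
output=$(curl -s -k -u "$user":"$password" "$sid"/results --get -d output_mode=csv)
echo "$output" > output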
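
One cheap piece of error checking is to avoid overwriting the previous results when a run returns something unexpected. The sketch below is an assumption about how that could look, not part of the script above. `save_results` is a hypothetical helper that accepts the response only if it starts with the `"_time"` header this particular search returns, and it writes to a temporary file first so that a failed run leaves the old file intact.

```shell
#!/bin/bash
# Write CSV read from stdin to the target file, but only if it looks like
# the expected result set (first line starts with the "_time" header).
# The data goes to a temporary file first, then replaces the target.
save_results() {
  local target="$1"
  local tmp
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  cat > "$tmp"
  if head -n 1 "$tmp" | grep -q '^"_time"'; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    echo "unexpected response, keeping existing $target" >&2
    return 1
  fi
}

# Demo with two rows from the CSV shown earlier, written to a scratch
# directory. In the script, the curl output would be piped in instead.
demo_dir=$(mktemp -d)
printf '%s\n' '"_time",allowed,blocked' \
  '"2019-01-05T18:00:00.000-0500",8239,2099' | save_results "$demo_dir/output"
cat "$demo_dir/output"
```

The header check is specific to a timechart search. A different search would need a different test, or a check of the HTTP status code instead.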

Conclusion

While nothing in this exercise was particularly challenging, it was fun to interact with Splunk in a way I had not previously needed to. I found the REST API easy to work with and quite extensible. It is an excellent way to get results out of Splunk without relying on the web GUI.



