How to Build an SSH Honeypot with Splunk

Put a system on the Internet and the Internet will try to log into it for you. This tutorial shows you how to build an SSH honeypot and how to capture and analyze its data (including usernames, passwords, and IP addresses) in Splunk.


  • Tom Kopchak
  • Feb 28, 2019
  • Tested on Splunk Version: 7.2

Introduction

When I teach, I prefer real examples to contrived ones whenever possible. That's why my online training class, Getting to Know Splunk: The Hands-On Administration Guide, doesn't hand you any sample log files to work with. Instead, you capture the actual log files from your own training system and work with those. I believe this is a much more valuable teaching tool, since it drives home the point that attackers are constantly trying to break into systems on the Internet.

I was recently teaching some of our new SOC analysts the basics of Splunk (every new Hurricane Labs employee goes through the same training course; they just get the in-person version), and I was really impressed with their analysis of the events logged by their AWS instances.

This got me thinking: we’re already basically creating an SSH honeypot in the lab, so why not add a few more features to make it even more interesting? That led me to write this tutorial.

Let’s start by getting some more data, and then we’ll do something with it.

Capturing SSH passwords

It’s one thing to know which usernames attackers are trying and that their login attempts are failing, but what if you could also see the passwords they’re trying?

This was accomplished using the process outlined in this Hacker Noon article: "How I’ve captured all passwords trying to ssh into my server!"

(It should go without saying, but please don’t run this on a system that you care about.)

I needed a few extra steps on my lab instance to install tools and libraries that weren’t already present, such as:

apt-get install gcc
apt-get install make
apt-get install libz-dev
apt-get install libssl-dev

I also ended up moving the system’s existing sshd binary aside and symlinking the new one into its original location (this is a lab system, after all):

mv /usr/sbin/sshd /usr/sbin/sshd.old
ln -s /opt/openssh2/dist/sbin/sshd /usr/sbin/sshd

If you’re not using Ubuntu, these steps will likely be a bit different, so your mileage may vary.

Using the data in Splunk

At this point, if you’ve done everything correctly, you’ll see events in /var/log/auth.log that look like this:

Feb  8 15:35:30 ip-10-15-0-42 sshd[23416]: Honey: Username: root Password: jby
Feb  8 15:35:30 ip-10-15-0-42 sshd[23416]: Failed password for root from 218.92.1.163 port 8664 ssh2
Feb  8 15:35:31 ip-10-15-0-42 sshd[23416]: Honey: Username: root Password: jbz
Feb  8 15:35:31 ip-10-15-0-42 sshd[23416]: Failed password for root from 218.92.1.163 port 8664 ssh2
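To make the log format concrete, here is a minimal Python sketch (not part of the original setup) that splits one auth.log line into its syslog pieces. It assumes the classic "Mon DD HH:MM:SS host process[pid]: message" layout shown above; the regex and the parse_header name are illustrative only.

```python
import re

# Classic syslog prefix as seen in /var/log/auth.log above:
#   "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: <message>"
# Assumption: traditional (non-ISO 8601) timestamps, as in the sample lines.
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<message>.*)$"
)


def parse_header(line):
    """Split one auth.log line into syslog fields, or return None if it doesn't match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


sample = [
    "Feb  8 15:35:30 ip-10-15-0-42 sshd[23416]: Honey: Username: root Password: jby",
    "Feb  8 15:35:30 ip-10-15-0-42 sshd[23416]: Failed password for root from 218.92.1.163 port 8664 ssh2",
]
for line in sample:
    fields = parse_header(line)
    print(fields["pid"], fields["message"])
```

Notice that both lines carry the same sshd pid (23416), which becomes important later when correlating the password with the source IP.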

These events will also show up in Splunk when you search the linux_secure sourcetype.

Looking at these events, you’ll notice that Splunk has no field extractions for the Honey lines. That makes sense: this is a non-standard linux_secure message, and Splunk doesn’t know how to parse it.

Fortunately, Splunk has a simple tool for this scenario called the Interactive Field Extractor (IFX). Here’s a quick rundown of how to use it:

Begin by selecting Event Actions -> Extract Fields.

Choose “Regular Expression” as the method for this type of data.

Select fields and assign them names. In this example, we’re also setting the text “Honey” as a required word to make the extraction specific to this event.

Validate that the extractions are working as expected. NOTE: The regex generated by IFX is generally far from optimal. This method is great for prototyping fields, but you can often create a much more efficient regex manually.

Save the extraction. Bonus: if you’re a Splunk admin, make this extraction available to all apps for everyone to use, so that anyone can benefit from your work.

Now run another search, and note that you now have values for both the username and password for these Honey events.
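Since IFX regexes are often far from optimal, here is a hedged sketch of what hand-written extractions for these two message types might look like, prototyped in Python's re module. It assumes the patched sshd logs exactly "Honey: Username: <user> Password: <password>" and treats everything to the end of the line as the password. The names HONEY_RE, FAILED_RE, and extract are hypothetical; when porting to a Splunk rex or EXTRACT, note that Splunk examples typically use the (?<name>...) named-group form rather than Python's (?P<name>...).

```python
import re

# Honey line from the patched sshd. Assumption: the password runs to end of line,
# so passwords containing spaces are kept intact.
HONEY_RE = re.compile(r"Honey: Username: (?P<user>.*?) Password: (?P<password>.*)$")

# Standard OpenSSH failure line; "invalid user" appears when the account doesn't exist.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\S+) port (?P<src_port>\d+)"
)


def extract(line):
    """Return fields from a Honey or Failed password line, or {} for anything else."""
    for pattern in (HONEY_RE, FAILED_RE):
        match = pattern.search(line)
        if match:
            return match.groupdict()
    return {}


print(extract("Feb  8 15:35:30 ip-10-15-0-42 sshd[23416]: Honey: Username: root Password: jby"))
```

Anchoring each pattern on a literal prefix ("Honey: Username:" or "Failed password for") keeps the regex engine from scanning unrelated events, which is the main efficiency win over a generic generated pattern.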

Analyzing the data

Now that we’ve captured this data, there’s a lot of interesting analysis we can do with it. While much of this will be review (for example, that 123456 is a terrible password), it does a nice job of reinforcing these concepts.

First and foremost, you’ll notice that root is by far the most popular username:

Search: sourcetype=linux_secure | top user limit=10

On its own, this isn’t very interesting, since it’s essentially noise. It’s more useful to look for usernames that match accounts in your organization or follow your account naming convention. The top and rare commands are both useful for this analysis.

sourcetype=linux_secure NOT user=root | top user limit=100
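For readers who want to try the same idea on exported data, here is a rough Python analogue of top, rare, and a naming-convention filter. It is a sketch, not a reimplementation of SPL: tie-breaking in rare is alphabetical by my choice, and NAMING_CONVENTION ("firstname.lastname") is a hypothetical pattern you would replace with your organization's real one.

```python
import re
from collections import Counter

# Hypothetical naming convention for illustration only: "firstname.lastname".
NAMING_CONVENTION = re.compile(r"^[a-z]+\.[a-z]+$")


def _with_percent(pairs, total):
    # Attach a percentage column, similar to the percent field top/rare report.
    return [(value, count, round(100 * count / total, 2)) for value, count in pairs]


def top(values, limit=10):
    """Most common values first (a rough analogue of SPL top)."""
    counts = Counter(values)
    return _with_percent(counts.most_common(limit), sum(counts.values()))


def rare(values, limit=10):
    """Least common values first (a rough analogue of SPL rare); ties sort alphabetically."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return _with_percent(ordered[:limit], sum(counts.values()))


def matching_accounts(users):
    """Non-root usernames that fit NAMING_CONVENTION and deserve a closer look."""
    return sorted({u for u in users if u != "root" and NAMING_CONVENTION.match(u)})


users = ["root", "root", "root", "admin", "admin", "jane.doe", "oracle"]
print(top([u for u in users if u != "root"], limit=2))  # like NOT user=root | top
print(rare(users, limit=2))
print(matching_accounts(users))
```

Rare values are often where the interesting usernames hide, since a targeted attempt against a real account name is much less common than the root spraying that dominates top.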

We can do something similar for passwords:

sourcetype=linux_secure | top password limit=100

This is an excellent way to see which passwords are being tried in the wild. While you’d expect most of these to appear on common password lists, I was a bit surprised to see !@ included as well. It’s pretty safe to say that if your password is based on any pattern of keys on the QWERTY layout, it’s not a safe one to use, no matter how secure you think it might look.
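To illustrate the QWERTY point, here is a small Python sketch that flags passwords made of horizontally adjacent keys on a single keyboard row, typed in either direction. It assumes a US QWERTY layout and only catches single-row runs; vertical or zig-zag walks such as 1qaz2wsx are not detected, so treat it as a starting point rather than a complete check.

```python
# Rows of a US QWERTY keyboard, unshifted then shifted (assumption: US layout only).
QWERTY_ROWS = [
    "`1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./",
    "~!@#$%^&*()_+", "QWERTYUIOP{}|", "ASDFGHJKL:\"", "ZXCVBNM<>?",
]


def is_keyboard_walk(password, min_length=2):
    """True if password is a run of adjacent keys on one row, left-to-right or right-to-left."""
    if len(password) < min_length:
        return False
    for row in QWERTY_ROWS:
        if password in row or password in row[::-1]:
            return True
    return False


for candidate in ["123456", "!@", "qwerty", "jby"]:
    print(candidate, is_keyboard_walk(candidate))
```

Running it over the output of the password search is a quick way to see how many of the top guesses are simple keyboard runs.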

With a little Splunk magic, we can also determine which IP addresses are attacking our system and which username/password combinations they are trying.

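Recall that these events are logged separately: the Honey line records the username and password but not the source IP address, while the "Failed password" line records the username and source IP but not the password. What the two lines do share is the sshd process ID (23416 in the example above).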

This Splunk search glues these events together by grouping them on the sshd process ID (pid), without using a join or transaction command. The end result is a table with the source IP, username, and password attempted (plus some geolocation for fun).

sourcetype=linux_secure user=* | stats values(src_ip) as src_ip, values(user) as user, values(password) as password by host pid | iplocation src_ip | table src_ip user password Country
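The stats-by-pid trick can be sketched in Python to show why it works: events that share a key are merged, and each field collects its distinct values in sorted order, which is how SPL's values() behaves. This sketch keys on (host, pid) and assumes events have already been parsed into dicts; stitch_by_pid and the FIELDS tuple are hypothetical names. It also assumes a pid isn't reused on the same host within the search window; over long time ranges, PID reuse could merge unrelated sessions.

```python
from collections import defaultdict

FIELDS = ("src_ip", "user", "password")


def stitch_by_pid(events):
    """Merge parsed auth.log events sharing a (host, pid) key into one row each.

    Each field becomes a sorted list of distinct values, mirroring SPL values().
    Assumption: pids are not reused on a host during the window being analyzed.
    """
    grouped = defaultdict(lambda: {field: set() for field in FIELDS})
    for event in events:
        if not event.get("user"):
            continue  # mirrors the user=* filter in the search
        key = (event.get("host"), event["pid"])
        for field in FIELDS:
            if event.get(field):
                grouped[key][field].add(event[field])
    return {
        key: {field: sorted(values) for field, values in fields.items()}
        for key, fields in grouped.items()
    }


events = [
    {"host": "ip-10-15-0-42", "pid": "23416", "user": "root", "password": "jby"},
    {"host": "ip-10-15-0-42", "pid": "23416", "user": "root", "src_ip": "218.92.1.163"},
    {"host": "ip-10-15-0-42", "pid": "23416", "user": "root", "password": "jbz"},
]
for key, row in stitch_by_pid(events).items():
    print(key, row)
```

Grouping by a shared key like this avoids the cost of join or transaction while still reuniting the password with the IP address that sent it.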

Finally, let’s say we wanted a table where each row corresponds to a single source IP address, username, and password, for use as threat intelligence data. Adding mvexpand does the trick:

sourcetype=linux_secure user=* | stats values(src_ip) as src_ip, values(user) as user, values(password) as password by host pid | mvexpand password | iplocation src_ip | table src_ip user password Country
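Here is a minimal Python sketch of the mvexpand step: each row with a list of passwords becomes one row per password, with the other fields copied unchanged. One caveat applies to the search as well: because values() de-duplicates and sorts each field independently, a connection that tried several usernames loses the pairing between each username and its password, so this works cleanly only when each pid tries a single username. The expand_passwords name is hypothetical, and dropping rows with no passwords is a choice made for this sketch, not a claim about mvexpand.

```python
def expand_passwords(rows):
    """Emit one row per password from rows whose "password" field is a list.

    Other fields are copied as-is, so a multivalue user or src_ip stays a list.
    Rows with no passwords produce no output in this sketch.
    """
    expanded = []
    for row in rows:
        for password in row.get("password", []):
            expanded.append({**row, "password": password})
    return expanded


rows = [{"src_ip": ["218.92.1.163"], "user": ["root"], "password": ["jby", "jbz"]}]
for flat in expand_passwords(rows):
    print(flat)
```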

You can also export these results to a CSV file, or write them to a lookup table with the outputlookup command, and end up with a dataset you can use later.
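If you post-process the results outside Splunk, remember that captured passwords can contain commas, quotes, and other awkward characters, so hand-rolled string joining can silently corrupt the file. This hedged Python sketch uses the standard csv module, which quotes such values correctly; the to_csv name, the column list, and joining multivalue fields with ";" are assumptions for illustration.

```python
import csv
import io


def to_csv(rows, fieldnames=("src_ip", "user", "password", "Country")):
    """Serialize rows to CSV text; list values are joined with ';', missing fields are blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow({k: ";".join(v) if isinstance(v, list) else v for k, v in row.items()})
    return buffer.getvalue()


print(to_csv([{"src_ip": ["218.92.1.163"], "user": ["root"], "password": "p@ss,word"}]))
```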

Conclusion

I think this is a pretty neat example of using Splunk to make sense of real-world attack data. I didn’t have to do anything special to create these examples other than leave a system on the Internet with SSH exposed and wait a bit. I encourage you to try the same thing at home, just not on a production system. 

Also, if you found this example intriguing and want to learn more about Splunk (including setting up the Splunk environment I used for this demo), check out my training course, where I’ll walk you through all the details. Happy Splunking!
