Sentiment Analysis with AWS & Splunk: Because all the cool kids are doing it

If you're like me and you enjoy AWS things, natural language processing, and Splunk, then this blog post is for you. This post was inspired by a real-life situation that made me want to explore the sentiment of a customer's messages in a ticket, as well as learn more about AWS Lambda and Splunk's HTTP Event Collector.


I have a few things in life I really enjoy, and AWS keeps taking the tedious parts of them off my plate. I enjoy writing code to do sentiment analysis, but I HATE training the models. So, enter Amazon Comprehend, which is (one of) AWS' many machine learning voodoo things. You toss some text at it, it groks the text, and it spits out an overall sentiment plus scores broken down by positive, negative, neutral, and mixed.

Now, we wanted to do some basic sentiment analysis on the last message in a ticket. Why? Great question. We wanted to see if a customer was overly negative about the ticket and, frankly, Tom Kopchak told me to do it, so I did it. I also wanted to learn more about AWS Lambda and Splunk's HTTP Event Collector. This project combined all of these things, so I was completely on board. This isn't meant to be a tutorial on any one of them; each vendor has its own (better) tutorials available, and you're welcome to check those out.

Here we go.

Our tickets exist in Zendesk, which has a pretty handy API, so it was easy to get what we needed and land it in AWS S3 for processing later. Now, I could have just streamed one service straight to the other, but I will likely do some other things with this data that will require S3, so that's why I chose that route.
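As a rough sketch of that pull-and-store step (the subdomain, bucket name, and credentials here are placeholders, and your Zendesk setup may differ), it could look something like this:

import boto3
import requests

s3 = boto3.client('s3')

def ship_ticket_to_s3(ticket_id):
    # Zendesk's ticket comments endpoint; the subdomain is a placeholder
    url = f"https://example.zendesk.com/api/v2/tickets/{ticket_id}/comments.json"

    # Zendesk API token auth uses "email/token" as the username
    resp = requests.get(url, auth=("agent@example.com/token", "ZENDESK_API_TOKEN"))
    resp.raise_for_status()

    # keep only the body of the most recent comment on the ticket
    last_comment = resp.json()["comments"][-1]["body"]

    # store it with the ticket number as the key, which is exactly
    # what the Lambda function below expects to receive
    s3.put_object(
        Bucket="my-ticket-bucket",
        Key=str(ticket_id),
        Body=last_comment.encode("utf-8"),
    )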

S3 lets you define a lifecycle policy, so this data is pretty transient - it expires shortly after it's created. Once an object hits S3, we get an "object created" event that calls our AWS Lambda function; this is called, cleverly, a "trigger". The trigger tells our Lambda function which S3 bucket and "key" (really, the filename) to use for this invocation. The Lambda function then calls AWS Comprehend, pulls back the results of the analysis (it takes seconds), and flings those results over to our Splunk HTTP Event Collector, where I built a really fancy dashboard to show the overall sentiment.
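If you want the same expire-quickly behavior, a minimal lifecycle rule does the trick. Here's a sketch using Boto 3; the bucket name and the one-day window are my assumptions, not requirements:

import boto3

s3 = boto3.client('s3')

# expire every object in the bucket one day after creation;
# adjust the prefix and Days value to taste
s3.put_bucket_lifecycle_configuration(
    Bucket='my-ticket-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-ticket-bodies',
            'Filter': {'Prefix': ''},
            'Status': 'Enabled',
            'Expiration': {'Days': 1},
        }]
    },
)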

The code is available here, but let’s take a closer look at what’s going on.

After we do our imports, we load up our necessary Boto 3 clients. Boto 3 comes baked into Lambda's Python runtime, so you don't have to do anything special you wouldn't do in Python anyway. (The requests library is a different story; it isn't bundled with the runtime, so it has to ship inside your deployment package.)

s3 = boto3.client('s3')
comprehend = boto3.client('comprehend')

The next section parses out the data from our S3 trigger and populates our bucket and key variables.

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    # S3 URL-encodes object keys in the event, so decode before using
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
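If you want to poke at the handler locally, a stripped-down version of the event S3 sends is enough. The bucket and key here are made up, and you'd need the environment variables from the next section set first:

# a minimal, hand-built S3 "object created" event for local testing;
# the real event carries many more fields, but these are the only ones we read
test_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "my-ticket-bucket"},
            "object": {"key": "12345"},
        }
    }]
}

# assumes splServer/splToken are set and an object named "12345" exists in the bucket
lambda_handler(test_event, None)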

Next we grab our AWS Lambda environment variables, which let us keep our sensitive configuration information encrypted. Here that's just our Splunk HEC endpoint and our HEC token.

    splServer = os.environ['splServer']
    splToken = os.environ['splToken']
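For reference, you can set those variables in the console, or script it with Boto 3. A sketch with placeholder values; HEC typically listens on port 8088 at /services/collector:

import boto3

lambda_client = boto3.client('lambda')

# attach the Splunk settings to the function as environment variables
lambda_client.update_function_configuration(
    FunctionName='runSentimentAnalysis',
    Environment={'Variables': {
        'splServer': 'https://splunk.example.com:8088/services/collector',
        'splToken': '00000000-0000-0000-0000-000000000000',
    }},
)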

This line grabs the S3 object for us and loads it into the response dictionary so we can operate on it.

    response = s3.get_object(Bucket=bucket, Key=key)

Once we have that object, we only need its body, since we want to analyze the whole thing rather than parse it in any way.

    #grab the ticket post from the S3 response (decoded, since Comprehend wants a string)
    ticketBody = response['Body'].read().decode('utf-8')

This line simply sends the body of the ticket we grabbed above to Comprehend and loads the response into the sentiments dictionary. It's a dictionary because Python says so, don't argue.

    #run the ticket post through AWS Comprehend to get sentiment analysis
    sentiments = comprehend.detect_sentiment(Text=ticketBody, LanguageCode='en')
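One caveat worth knowing: detect_sentiment caps its input at 5,000 bytes of UTF-8 text (at the time of writing), so a particularly chatty ticket needs trimming first. A minimal guard might look like this; MAX_BYTES is just my name for the limit:

    MAX_BYTES = 5000  # Comprehend's detect_sentiment input limit

    # truncate on a byte boundary, then drop any partially-cut character
    body_bytes = ticketBody.encode('utf-8')[:MAX_BYTES]
    safe_body = body_bytes.decode('utf-8', errors='ignore')

    sentiments = comprehend.detect_sentiment(Text=safe_body, LanguageCode='en')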

Now, because I didn't do any parsing and my S3 "key" is just the ticket number, it was trivial to add that key (the ticket number) to the sentiments dictionary, giving me something unique to identify each analysis with.

    #add the ticket number to the dict so we have a unique key in splunk
    sentiments['ticketNum'] = key

At this point, all we need to do is hurl the complete analysis results over to our Splunk HTTP Event Collector, which lets us do some analysis and build fancy dashboards. Please note: I set "host" to "lambda-" plus the name of the function. This isn't a requirement by any stretch, but it makes it easy to identify in Splunk where the data came from. I could have done the same thing with the source field instead, so whatever you're more comfortable with is fine with me - I don't judge.

    #fling the data to Splunk HTTP Event Collector for further analysis
    r = requests.post(
        splServer,
        headers={"Authorization": "Splunk " + splToken},
        json={"host": "lambda-runSentimentAnalysis", "event": sentiments},
    )
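If you want to confirm HEC actually accepted the event, checking the response right after the post is cheap. Splunk answers a good event with a small JSON acknowledgment:

    # HEC answers a good event with {"text": "Success", "code": 0};
    # raise_for_status() surfaces any HTTP-level failure loudly
    r.raise_for_status()
    print(r.json())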

The raw data in Splunk looks something like this, with ticket numbers changed to protect the innocent:
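Shape-wise, each event is the Comprehend response with our ticket number bolted on; the values below are illustrative, not real customer data (the real thing also carries Boto 3's ResponseMetadata, which I've left out here):

{
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Positive": 0.0123,
        "Negative": 0.9311,
        "Neutral": 0.0489,
        "Mixed": 0.0077
    },
    "ticketNum": "12345"
}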

After I built an ugly dashboard, Ian (our Dashboard Overlord) made it better and less "Bill-like", so it looks pretty good now.

Now keep in mind, this isn't data we have to keep "fresh"; that is all automated, and the function is fairly generic, so it could be used with any type of data, not just tickets. The hardest part for me was figuring out all the Lambda nomenclature; the code itself is very straightforward. This project let me combine three of my favorite things: AWS anything, natural language processing (even though I didn't have to actually write any of that), and, last but never least, Splunk. It also let me learn some new things about my favorite tools that I'll be able to apply to other, broader projects. Now that's what I call a worthwhile couple of hours of work.

Enjoy!