The Coffee Report Part 2: Machine Learning, Alexa, and... More Coffee!

In Ryan’s previous blog post about Splunk and coffee, he discussed data visualization and showed how a dashboard can help tell a story. In Part 2 of this blog series, Ryan shows how this type of data makes for a great Machine Learning Toolkit example.

A few really neat things have happened in the world of Splunk, IoT, and coffee since I last blogged on this topic:

  1. I built a predictive model for coffee using Splunk’s all-new Machine Learning Toolkit (MLTK) 3.0.0
  2. I now have a larger coffee sample size
  3. I integrated Splunk with Amazon Alexa

Integration with Splunk’s Machine Learning Toolkit

One thing I wanted to do with this coffee data was come up with an algorithm to classify whether coffee is being brewed. Though this is a very simple use case for MLTK, I think it makes a good first example for someone who has never done Machine Learning before. I’ll walk through the analysis using the SEMMA method (Sample, Explore, Modify, Model, Assess).

Sample

In this blog, I won’t discuss mining or sampling the data on my coffee intake; I covered that in my previous blog. I will say, though, that having this data is obviously fundamental to everything that follows. Without a data sample, there is no data analysis.

Explore

Exploring the data was fairly straightforward. Once I had data coming in at a steady interval, I used the stats command to look at all of my data points as a time series using the Trellis layout in Splunk. Of course, I made a cup of coffee during that timeframe and then examined what all of my data looked like.

I then used slightly more advanced techniques to analyze my data. The assistants in MLTK are really nice. I mainly wanted to see how my data was shaped: was there a pattern to these cups of coffee being made? I answered that by looking at how my data clustered. I ran a very basic search of my data over a period when I knew I had samples of coffee both being made and not being made. As you can see from the results below, the data clustered quite nicely: Amps and Watts were noticeably higher for one subset of my data. As someone with a very brief electrical engineering background, this made sense to me. But even if you don’t know much about electricity, you could still infer what is probably going on: the numbers are higher when coffee is being produced.
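To make the clustering idea concrete, here is a minimal k-means sketch in Python. It is a from-scratch stand-in, not MLTK's implementation, and the readings are hypothetical (Amps, Watts) pairs. It does no feature scaling, so Watts dominates the distance; that is acceptable only because the idle and brewing groups are so far apart. Cluster numbers are arbitrary and depend on initialization.

```python
import math

def kmeans(points, k, iterations=100):
    """Minimal Lloyd's k-means with deterministic farthest-first initialization."""
    # Start from the first point, then repeatedly add the point farthest from
    # every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical (Amps, Watts) readings: four idle, three brewing.
readings = [(0.0, 0.5), (0.02, 1.1), (0.01, 0.8), (0.0, 0.6),
            (10.2, 1200.0), (10.4, 1230.0), (10.3, 1215.0)]
labels, centroids = kmeans(readings, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1]
```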

To see raw values for those clusters, I could use the KMeans algorithm together with the Splunk stats command to look at the average values for each cluster. This is where I found what is essentially a cup of coffee being brewed (Cluster 1) and a Keurig sitting idle (Cluster 0). It also told me that these two values, Amps and Watts, may be excellent predictors of a cup of coffee being brewed.
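The per-cluster averages are the same kind of summary the stats command produces when you average each field by cluster. A small Python equivalent, assuming each reading is a dict of numeric fields and the cluster labels come from a clustering step such as KMeans; the field names and values here are illustrative only.

```python
from collections import defaultdict

def cluster_averages(rows, labels):
    """Average every numeric field per cluster label."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, cluster in zip(rows, labels):
        counts[cluster] += 1
        for field, value in row.items():
            sums[cluster][field] += value
    return {
        cluster: {field: total / counts[cluster] for field, total in fields.items()}
        for cluster, fields in sorted(sums.items())
    }

rows = [{"Amps": 0.0, "Watts": 0.5}, {"Amps": 0.0, "Watts": 1.5},
        {"Amps": 10.2, "Watts": 1200.0}, {"Amps": 10.4, "Watts": 1230.0}]
averages = cluster_averages(rows, [0, 0, 1, 1])
for cluster, means in averages.items():
    print(cluster, {field: round(v, 1) for field, v in means.items()})
# 0 {'Amps': 0.0, 'Watts': 1.0}
# 1 {'Amps': 10.3, 'Watts': 1215.0}
```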

Modify

Modifying the data is something I covered in the previous blog as well. Generally, you may need to modify your data to get it into a format you can analyze. This is one of the harder steps you’ll encounter in Machine Learning, though Splunk makes it much easier, especially with things like the CIM (Common Information Model). In my case, I did only two things. First, I made sure my fields were extracting properly, which they were, so I could easily place my data into a table. Second, I rounded my values to one decimal place. This was just to give the model less work to do; based on my cluster analysis, I didn’t need all of that precision.

Model

Last but not least, I needed to model my data. For this model I chose Logistic Regression, a classification algorithm that should be able to handle this data. The only trick left was that I had to “classify” my training data set myself. Looking at my clusters above (rounded to one decimal place), I saw that average Amps for Cluster 0 was 0.0 and average Amps for Cluster 1 was 10.3. As a result, I added one more modification to my training search: I used the eval command to say that if Amps > 0.025, call that coffee being brewed. I realize this may seem like a gross oversimplification, but for this specific data set, it works. This new field, “brewing”, was the field I wanted to predict: a “yes” meant coffee was brewing, and a “no” meant it was not.
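Here is a Python sketch of the whole training path described above: round readings to one decimal place, label them with the same Amps > 0.025 rule, and fit a logistic regression with plain batch gradient descent. It is a from-scratch illustration, not how MLTK fits its LogisticRegression; the feature standardization, learning rate, epoch count, and sample readings are all assumptions made for this sketch.

```python
import math

def round_readings(readings, places=1):
    """Round each (Amps, Watts) reading, similar to eval round(X, 1) in SPL."""
    return [tuple(round(v, places) for v in r) for r in readings]

def label_brewing(amps, threshold=0.025):
    """The labeling rule: Amps above the threshold counts as coffee brewing."""
    return "yes" if amps > threshold else "no"

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_scaler(X):
    """Per-feature (mean, std) so Amps and Watts end up on comparable scales."""
    stats = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return stats

def scale(x, stats):
    return [(v - m) / s for v, (m, s) in zip(x, stats)]

def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on log loss; y holds 1 for brewing, 0 for idle."""
    stats = fit_scaler(X)
    Xs = [scale(x, stats) for x in X]
    w, b, n = [0.0] * len(Xs[0]), 0.0, len(Xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(Xs, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, stats

def predict(model, x):
    """Return "yes" if the predicted brewing probability is at least 0.5."""
    w, b, stats = model
    z = sum(wj * xj for wj, xj in zip(w, scale(x, stats))) + b
    return "yes" if sigmoid(z) >= 0.5 else "no"

# Hypothetical raw (Amps, Watts) readings: four idle, three brewing.
raw = [(0.01, 0.52), (0.0, 1.14), (0.02, 0.81), (0.0, 0.63),
       (10.24, 1201.7), (10.38, 1229.9), (10.31, 1214.6)]
readings = round_readings(raw)
labels = [label_brewing(amps) for amps, _ in readings]
y = [1 if lab == "yes" else 0 for lab in labels]
model = train_logistic(readings, y)
print(predict(model, (10.1, 1190.0)), predict(model, (0.0, 0.7)))
```

Standardizing first keeps Watts (in the thousands) from swamping Amps (around 10) during gradient descent; with data this cleanly separated, almost any reasonable classifier would draw the same line.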

So I ran my search, selected my algorithm, chose my field to predict, and selected my predictors. Now it was time to assess. Had I built a simple model for coffee?

Assess

According to my results, yes. This model has 100% precision on my sample from the last 7 days.
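Precision only looks at the events the model called brewing: 100% means none of those were false alarms. It does not count brews the model missed, which is what recall measures, so checking both gives a fuller picture. A short Python sketch with made-up labels; returning 0.0 when a ratio is undefined is a convention chosen here, not a standard.

```python
def precision_recall(actual, predicted, positive="yes"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

actual    = ["no", "yes", "yes", "no", "yes"]
predicted = ["no", "yes", "no",  "no", "yes"]
print(precision_recall(actual, predicted))  # (1.0, 0.6666666666666666)
```

Here the model never raises a false alarm, so precision is perfect, yet it misses one of the three brews.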

The nice part about the Machine Learning Toolkit is that I can schedule the training of this model, so it is retrained weekly and I can make sure it stays precise. It shows 100% precision now, but data changes. As always, nothing in IT should follow a “set it and forget it” approach.

I realize this doesn’t answer the question people are probably most excited about: “When is coffee going to be made?” No, I can’t predict that yet. I say “yet” because it isn’t necessarily out of the picture; I just realized something during this analysis. My sample size is far too small.

Larger Sample Size

I make a good amount of coffee when I am home, but for a remote worker, I spend a surprising amount of time on the road. For example, I spend at least 2 days a week at the University of Connecticut, working from my office there. I was also in Madagascar for a big part of October and November. If I wanted to start doing even more awesome Machine Learning stuff, I would need more data. That’s when I started to think, “Gee, why don’t I try to measure UConn coffee too?”

I was certainly met with some challenges in capturing UConn coffee data. As everywhere else, IoT isn’t exactly welcomed with open arms. When I first set this up, some days I would get to the office and find my glorious TP-Link HS110 unplugged and sitting next to the Keurig. I later found out that someone feared it would “fail closed” at some point and prevent coffee from being made. Eventually, I was successful. I used the same technique as in my previous blog to collect data, and as of this week, we’ve measured coffee being made by the shared Keurig in the OPIM department for 6 days straight.

We’ve already found some interesting insights. For starters, this department drinks a lot of coffee: on average, that Keurig is pumping out 20 cups a day, and, as with other machine data, there are lulls on the weekend.
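One way to turn power readings into a daily cup count is to count each transition from idle to brewing. The sketch below assumes time-ordered (timestamp, Amps) readings, reuses the Amps > 0.025 rule from the model section, and treats one transition as one cup; any other high-draw event would be miscounted as a cup. The function name and the timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def cups_per_day(readings, threshold=0.025):
    """Count brew cycles per calendar day from time-ordered (timestamp, amps) pairs.

    Assumes one idle-to-brewing transition equals one cup.
    """
    counts = Counter()
    brewing = False
    for ts, amps in readings:
        now_brewing = amps > threshold
        if now_brewing and not brewing:
            counts[ts.date()] += 1  # a new brew cycle started
        brewing = now_brewing
    return counts

# Made-up readings: two brews on the first day, one early-morning brew the next.
readings = [
    (datetime(2017, 12, 4, 8, 0), 0.0),
    (datetime(2017, 12, 4, 8, 1), 10.3),
    (datetime(2017, 12, 4, 8, 2), 10.2),
    (datetime(2017, 12, 4, 8, 3), 0.0),
    (datetime(2017, 12, 4, 13, 0), 10.4),
    (datetime(2017, 12, 4, 13, 1), 0.0),
    (datetime(2017, 12, 5, 4, 0), 10.3),
]
for day, cups in sorted(cups_per_day(readings).items()):
    print(day, cups)
# 2017-12-04 2
# 2017-12-05 1
```

A brew that runs past midnight is credited to the day it started, which is one of several choices a real search would need to make explicitly.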

What surprised us were the times when coffee was being made unexpectedly. You’ll see some peaks and valleys in the line chart below. Specifically, I’ve highlighted a cup of coffee being made around 4:00 AM, with others made around 12:00 and 1:00 AM. I call that area of the chart “The Valley of Grad School”. The reason: I came in early on Thursday to get some work done and found a PhD student passed out in our innovation lab on a stack of homework. I can only assume these peaks are our grad students’ last-ditch efforts to stay awake and finish important assignments.

Integration with Amazon Alexa

Lastly, wouldn’t it be cool if you could ask Splunk how many cups of coffee were made today? We thought so. It has already given us some insights into the OPIM department at UConn. When I came in at 7:00 AM, asked “Alexa, ask Splunk how many cups of coffee were made today?”, and heard “There were 3 cups of coffee made today”, I was shocked. I had to make sure Alexa was working properly, so I ran a test similar to the one in the video below.

If you haven’t heard yet, Amazon has a new offering called Alexa for Business. Damien Dallimore has been gracious enough to update his documentation for the Alexa App to discuss how Splunk can integrate with this offering. That’s extremely exciting for a lot of reasons, primarily because it brings IoT into a practical scenario. Much of the backlash IoT receives is about how practical or impractical it is. With Alexa being integrated into businesses, it will not only make doing business easier but also highlight an excellent use case for IoT.

Now that I’ve plugged that recent announcement, I should point out that it is not the integration used here, though it’s probably very similar. The integration for this blog was done with the “Talk to Splunk with Amazon Alexa” App.

I’ll skip the details of setting up the Talk to Splunk with Amazon Alexa app, but I will provide the documentation for it here. In all honesty, it was extremely easy to follow. This app was implemented by one of my former students, Tyler Lauretti. Tyler is just starting out with Splunk, but the instructions were so clear that he was able to fly through the setup with only a couple of minor questions for me. Kudos to Tyler! I also have to give a shout-out to the leader of the Hartford CT Splunk User Group, Ant Lefebvre, for an awesome presentation on Alexa at the November User Group meeting.

So that is the latest news on Splunk, IoT, and coffee. Hopefully, through caffeine, everyone can learn something about Machine Learning and IoT. As always, if you have any questions, comments, or concerns, please feel free to reach out!