Logging Your Python Apps: Make better use of your logs with JSON

Ever been stuck spending too much time sifting through your logs to find the specific messages you care about? (I'm confident we all have at some point.) This post goes over a simple strategy that ensures you won't have to deal with that again, at least in the applications you write: machine-parseable logging. Logging in a format that's easy for programs to read, loads into a database simply, and imports smoothly into a log management tool opens the door to a whole new world of ways to query and filter your log data.

Our favorite way to log in a machine-parseable format is to log in JSON. JSON allows a lot of flexibility and fits the structure of a Python log record well. The library we wrote, and use to handle the conversion of log records to JSON, is called Driftwood. We decided, after looking into another alternative, that we wanted something extensible, with clean source code and thorough tests. Driftwood is a collection of Python logging extensions, meaning its utilities can be subclassed and customized, and it has plenty of goodies beyond just logging in JSON. Driftwood is available on GitHub, PyPI, and ReadTheDocs.

Here’s a piece of example code, using the JSONFormatter from Driftwood to output log records in JSON:

import logging

from driftwood.formatters import JSONFormatter

log = logging.getLogger("test")
log.setLevel(logging.INFO)
stream_handler = logging.StreamHandler()

# Tell the formatter to include our custom "loop_num" attribute
# alongside the standard log record attributes.
formatter = JSONFormatter(extra_attrs=["loop_num"])

stream_handler.setFormatter(formatter)
log.addHandler(stream_handler)

# Log five messages, attaching the iteration number to each record.
for loop_num in range(5):
    log.info("doing work", extra={"loop_num": loop_num})

You can see that the only requirements to utilize the JSONFormatter are importing it and assigning it to your handler. You may also have noticed the extra_attrs keyword we pass to the JSONFormatter constructor. By default, the JSONFormatter looks for all of the built-in attributes of each log record and makes them keys in the JSON objects it outputs. extra_attrs tells it to also look for extra attributes on the log record (ones created by passing the extra keyword to your logger calls) and include those in the JSON objects as well.
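
Since the JSONFormatter is just a standard logging formatter, it works with any handler. If you'd rather write the JSON lines straight to a file than redirect shell output (as we do later in this post), a minimal sketch with a FileHandler might look like this; the file name is illustrative:

import logging

from driftwood.formatters import JSONFormatter

log = logging.getLogger("test")
log.setLevel(logging.INFO)

# Send JSON records to a file instead of the console.
# The file name here is just an illustration.
file_handler = logging.FileHandler("app_log.json")
file_handler.setFormatter(JSONFormatter(extra_attrs=["loop_num"]))
log.addHandler(file_handler)

log.info("doing work", extra={"loop_num": 0})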

To test the logging functionality, we loop from 0 through 4, logging a message each time and including the iteration number in the extra dict passed to the logger. If we run this code, we can see that it outputs the expected JSON: one object for each call to the logger, including the extra loop_num key.

$ python log_in_json.py
{"relativeCreated": 54.77547645568848, "levelno": 20, "levelname": "INFO", 
"threadName": "MainThread", "funcName": "<module>", "message": "doing work", 
"process": 13503, "msecs": 698.8754272460938, "created": 1426698137.6988754, 
"filename": "log_in_json.py", "pathname": "log_in_json.py", "name": "test", "lineno": 16,
"thread": 140258547439424, "module": "log_in_json", "loop_num": 0}
{"relativeCreated": 55.3743839263916, "levelno": 20, "levelname": "INFO", 
"threadName": "MainThread", "funcName": "<module>", "message": "doing work", 
"process": 13503, "msecs": 699.4743347167969, "created": 1426698137.6994743, 
"filename": "log_in_json.py", "pathname": "log_in_json.py", "name": "test", "lineno": 16, 
"thread": 140258547439424, "module": "log_in_json", "loop_num": 1}
{"relativeCreated": 55.626630783081055, "levelno": 20, "levelname": "INFO", 
"threadName": "MainThread", "funcName": "<module>", "message": "doing work", 
"process": 13503, "msecs": 699.7265815734863, "created": 1426698137.6997266, 
"filename": "log_in_json.py", "pathname": "log_in_json.py", "name": "test", "lineno": 16, 
"thread": 140258547439424, "module": "log_in_json", "loop_num": 2}
{"relativeCreated": 55.856943130493164, "levelno": 20, "levelname": "INFO", 
"threadName": "MainThread", "funcName": "<module>", "message": "doing work", 
"process": 13503, "msecs": 699.9568939208984, "created": 1426698137.699957, 
"filename": "log_in_json.py", "pathname": "log_in_json.py", "name": "test", "lineno": 16, 
"thread": 140258547439424, "module": "log_in_json", "loop_num": 3}
{"relativeCreated": 56.085824966430664, "levelno": 20, "levelname": "INFO", 
"threadName": "MainThread", "funcName": "<module>", "message": "doing work", 
"process": 13503, "msecs": 700.1857757568359, "created": 1426698137.7001858, 
"filename": "log_in_json.py", "pathname": "log_in_json.py", "name": "test", "lineno": 16, 
"thread": 140258547439424, "module": "log_in_json", "loop_num": 4}

It's ugly, but that's not the point. The important thing is how easily we were able to log all of this information in JSON, which can now be queried and filtered using analytics tools.
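
If you ever do want a human-friendly view of a record, a quick pipe through the standard library's json.tool module will pretty-print one line of that output (the StreamHandler writes to stderr, hence the redirect):

$ python log_in_json.py 2>&1 | head -1 | python -m json.tool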

So, how do we go about querying and filtering this data? This post won't go very far down that road, but I do want to give you a brief example. At Hurricane, our favorite method is to log directly into MongoDB (rather than sending the JSON to a file), which is another feature Driftwood supports. For this example, however, I'm just going to show how you could write a quick script that loads some JSON log records and filters them programmatically. Take a look at the following code sample:

import json

# Read each line of the log file and decode it into a dictionary.
records = []
with open("test.json", "r") as log_file:
    for raw_log_record in log_file:
        records.append(json.loads(raw_log_record))

# Show which keys are available on a record.
print(records[0].keys())
print("\n\n")

# Print the message and iteration number from each record.
for record in records:
    print("{0}, {1}".format(record["message"], record["loop_num"]))

Here we open a file called test.json, which presumably contains some JSON log records. We read each line from the file, decode it into a Python dictionary, and append it to a list. We print the keys from one of the records, just so we can see which keys are available (that wasn't very clear in the output from the first example). Then we loop over each record, printing a couple of pieces of information: the logged message and the iteration the message was logged during (our loop_num from the first example).

$ python log_in_json.py > test.json 2>&1
$ python read_json.py
dict_keys(['module', 'funcName', 'lineno', 'pathname', 'name', 'loop_num', 'levelno', 'created', 'levelname', 'relativeCreated', 'message', 'process', 'filename', 'thread', 'msecs', 'threadName'])




doing work, 0
doing work, 1
doing work, 2
doing work, 3
doing work, 4
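
As an aside, this one-object-per-line format also loads cleanly into a database. Driftwood's direct MongoDB logging isn't covered here, but the sketch below shows one way to import the file yourself with pymongo; the host, database, and collection names are illustrative assumptions:

import json

from pymongo import MongoClient

# Connect to a local MongoDB instance. The host, port, database,
# and collection names here are assumptions for illustration.
client = MongoClient("localhost", 27017)
collection = client.logging_demo.records

# Each JSON log line becomes its own document.
with open("test.json", "r") as log_file:
    for raw_log_record in log_file:
        collection.insert_one(json.loads(raw_log_record))

print(collection.count_documents({}))  # 5 records imported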

Either way, it's easy to load those records and do something useful with them. However, we're still missing a critical point: we now have the ability to output only the log records we care about. Let's modify the file-reading example a little so it only outputs records that occurred after loop number 2 (the third loop, counting from 0):

import json

records = []
with open("test.json", "r") as log_file:
    for raw_log_record in log_file:
        records.append(json.loads(raw_log_record))

# Only print records logged after loop number 2.
for record in records:
    if int(record["loop_num"]) > 2:
        print("{0}, {1}".format(record["message"], record["loop_num"]))

Notice the change on the second-to-last line. Here is the output now:

$ python read_json.py
doing work, 3
doing work, 4

As you can see, we were able to easily and programmatically filter our log messages. This isn't a very realistic example, so let me explain where something like this is more useful. At Hurricane, we have an application we wrote that performs backups. Each backup gets its own unique ID, and we include that ID in the extra keyword when making calls to our logger. This means that if a backup fails, we can easily pull the logs that relate just to that backup, without having to sift through everything else that was going on at that time. THAT is the power of logging in a machine-parseable format.
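
To make that concrete, here's a rough sketch of the pattern. The logger name, file name, and backup_id scheme are hypothetical, not our actual backup code:

import json
import logging
import uuid

from driftwood.formatters import JSONFormatter

log = logging.getLogger("backups")
log.setLevel(logging.INFO)
handler = logging.FileHandler("backups.json")
handler.setFormatter(JSONFormatter(extra_attrs=["backup_id"]))
log.addHandler(handler)

# Tag every message for a given backup run with its unique ID.
backup_id = str(uuid.uuid4())
log.info("backup started", extra={"backup_id": backup_id})
log.info("backup finished", extra={"backup_id": backup_id})

# Later: pull only the records that relate to this backup.
with open("backups.json", "r") as log_file:
    for raw_log_record in log_file:
        record = json.loads(raw_log_record)
        if record.get("backup_id") == backup_id:
            print(record["message"])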


