The Myth of the Three Member Search Head Cluster

Have you ever wondered what would happen if the captain of your three-member search head cluster failed? Would the cluster come crumbling down, or would it recover? Find out in this blog about the Myth of the Three Member Search Head Cluster. 


Introduction

A three-member search head cluster can tolerate the failure of one member. There, I said it. Maybe this behavior will change in some future version of Splunk, but in Splunk 6.6.3 it is a fact. For some reason, the myth that it can’t seems to persist throughout the Splunk community, so I wanted to clear it up once and for all. Not without backing up my claim with supporting evidence and Internet research, of course.

For those of you who aren’t familiar with search head clustering, the idea is that instead of having a single search head handle scheduling and executing jobs, you use multiple search heads that stay in sync with each other. One member of the cluster takes on the role of search head cluster captain and coordinates job scheduling. The captain is chosen dynamically through an election process. This design gives you three key benefits: horizontal scaling, high availability, and no single point of failure. This is where we first encounter evidence that a three-member search head cluster can survive one member going down, and it’s found at the very start of Splunk’s documentation on search head clustering.

If a three-member cluster couldn’t tolerate a single member failure, it wouldn’t deliver one of those key benefits: high availability. So rather than trying to disprove the claim that it can’t, I’d like to prove that it can, in fact, tolerate a single member failure. Let’s look at the evidence.

The explanation I usually hear for the myth that a three-member search head cluster can’t tolerate a single member failure goes something like this:

“If the captain goes down, only 2 members remain. Both members will be greedy and constantly vote for themselves since there is no majority. As a result, a new captain can not be elected.”

There are two things that need to be broken down to understand why the above is a myth:

  1. The understanding of how many members are needed to elect a captain.

  2. The captain election process.

How many members are needed for election

If one member of a three-member search head cluster were to go down, two members would remain. That is two of three members (about 67%), and according to the docs, that is enough for a new and successful election. If two members went down, only one of three (about 33%) would remain, and yes, that failure would not be tolerated. My source for this statement is this excerpt from Splunk’s own documentation on captain election:

"To become captain, a member needs to win a majority vote of all members. For example, in a seven-member cluster, election requires four votes. Similarly, a six-member cluster also requires four votes. The majority must be a majority of all members, not just of the members currently running. So, if four members of a seven-member cluster fail, the cluster cannot elect a new captain, because the remaining three members are fewer than the required majority of four."

Since captain election requires a majority (more than half) of all cluster members, a three-member cluster needs two votes. With one member down, two members remain, so we can rule out point number 1: one failure is tolerated on this criterion.
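The majority rule from the documentation boils down to a bit of integer arithmetic. Here is a minimal Python sketch of it; the function names are my own and not anything Splunk exposes.

```python
def votes_needed(cluster_size: int) -> int:
    """Votes required to elect a captain: a strict majority of ALL members,
    counting members that are down, not just the ones still running."""
    return cluster_size // 2 + 1


def tolerable_failures(cluster_size: int) -> int:
    """How many members can fail while the rest can still elect a captain."""
    return cluster_size - votes_needed(cluster_size)


for size in (3, 6, 7):
    print(f"{size} members: {votes_needed(size)} votes needed, "
          f"tolerates {tolerable_failures(size)} failure(s)")
```

The six- and seven-member cases line up with the documentation excerpt (four votes each), and the three-member case needs two votes, which is exactly what survives a single failure.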

The captain election process

To elect a new captain, Splunk has to go through an election process. In our three-member example, we first want to make sure we have enough members for a successful election. With two of three members still running, we have the two votes a majority requires.

When electing a new captain, Splunk considers a couple of factors for all running members. For starters, the cluster wants to elect a member that is both preferred and in sync. We’re going to assume that all of our cluster members are synchronizing and preferred. If your members aren’t synchronizing, please see a different blog, because that likely points to a whole host of other issues. If you don’t know what I mean by “preferred captain”, then this setting is very likely at its default in your environment, and all members are preferred. There is typically no need to change this setting.

preferred_captain = <bool>
* The cluster tries to assign captaincy to a member with preferred_captain=true.
* Note that it is not always possible to assign captaincy to a member with
  preferred_captain=true - for example, if none of the preferred members is
  reachable over the network. In that case, captaincy might remain on a
  member with preferred_captain=false.
* Defaults to true

(https://docs.splunk.com/Docume...)
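To picture the first criterion, here is a hypothetical Python sketch of the preference rule as the spec text above describes it: favor reachable members with preferred_captain=true, and fall back to non-preferred members only when no preferred one is available. The Member class, the in_sync flag, and captain_candidates are illustrative names I made up; treating “in sync” as a hard filter is a simplification, and Splunk’s real logic may weigh these factors differently.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    preferred_captain: bool = True  # mirrors the documented default (true)
    in_sync: bool = True            # hypothetical flag: configuration is current
    reachable: bool = True


def captain_candidates(members):
    """Hypothetical illustration: reachable, in-sync members, preferring
    those with preferred_captain=True when any are available."""
    eligible = [m for m in members if m.reachable and m.in_sync]
    preferred = [m for m in eligible if m.preferred_captain]
    return preferred if preferred else eligible


cluster = [
    Member("ccnprodshc01"),
    Member("ccnprodshc02"),
    Member("ccnprodshc03", reachable=False),  # the failed captain
]
print([m.name for m in captain_candidates(cluster)])
```

With every member preferred and in sync, both survivors remain candidates, which is why a second factor is needed to break the tie.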

In our case, by the first criterion for captain election, each remaining member could become captain after one member fails, because both are preferred and in sync. But there is a second factor, one that differentiates the two remaining members and allows one of them to become the almighty new captain after a failure. That configuration setting is election_timeout_ms.

election_timeout_ms = <positive_integer>
* The amount of time that a member will wait before trying to become the
  captain.
* Note that modifying this value can alter the heartbeat period (See
  election_timeout_2_hb_ratio for further details)
* A very low value of election_timeout_ms can lead to unnecessary captain
  elections.
* The default is 60000ms, or 1 minute.

(https://docs.splunk.com/Docume...)

I’ve never had to alter this setting in my Splunk career, but it’s the crux of how a three-member search head cluster tolerates one member going down. Search head clustering uses the Raft consensus algorithm. If you’re unfamiliar with Raft and would like to know more, I’d suggest checking out the really handy Raft visualization on The Secret Lives of Data. In short, election_timeout_ms is what gives preference to one of your two remaining members. To demonstrate this, I’ve captured what a cluster looks like in three different states.
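To make the timer idea concrete, here is a minimal Python sketch of Raft-style election timers. It is an illustration under assumptions, not Splunk’s implementation: I’m assuming each member draws its timeout uniformly between the configured election_timeout_ms and twice that value (a common Raft convention), and draw_timeouts and first_candidate are hypothetical names.

```python
import random

# The documented default for election_timeout_ms.
BASE_TIMEOUT_MS = 60000


def draw_timeouts(members, base_ms=BASE_TIMEOUT_MS, rng=random):
    """Give each surviving member its own randomized election timeout.
    ASSUMPTION: uniform in [base_ms, 2 * base_ms); Splunk's actual
    randomization (if any) is not documented in the excerpt above."""
    return {m: rng.randrange(base_ms, 2 * base_ms) for m in members}


def first_candidate(timeouts):
    """The member whose timer expires first stops waiting for the old
    captain's heartbeat and starts an election."""
    return min(timeouts, key=timeouts.get)


rng = random.Random(7)  # fixed seed so the run is repeatable
timeouts = draw_timeouts(["ccnprodshc01", "ccnprodshc02"], rng=rng)
print(timeouts, "->", first_candidate(timeouts))
```

Whichever member ends up with the smaller timeout becomes the first candidate; in the walkthrough that follows, that member is ccnprodshc02.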

Breakdown of a Captain Failure

Before Failure

This is our perfect working cluster. Everything is synchronized and happy. You’ll notice that ccnprodshc03 is the captain. So to simulate an outage, I will shut off Splunk on ccnprodshc03.

During Failure with No Captain Elected

When ccnprodshc03 goes down, we are left with two remaining members. Each is preferred and synchronized so we need to hold an election to choose one of them as a new captain. What’s interesting about this scenario is that the next member up for election will be the member with the lowest election_timeout_ms value. In the screenshot below, you’ll notice that for our cluster, ccnprodshc02 has the lowest election_timeout_ms value when members all check back in. This means that it will be up for election. I was able to turn up DEBUG logging in Splunk to watch this taking place.

We can see the election take place below. ccnprodshc02 starts requesting votes. It receives a 200 response from ccnprodshc01 but a 502 response from ccnprodshc03, since that member is down. Because ccnprodshc02 was the first to stand for election, ccnprodshc01 responds very politely with something along the lines of “that’s alright, you be captain this time, I’ll get it next time”.
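The exchange in that log maps onto Raft’s RequestVote step. Below is a heavily simplified Python sketch of that step, not Splunk’s code. Voter and run_election are hypothetical names, the up-to-date-log check that real Raft also performs is left out, and a down member simply counts as a missing vote. It shows why the “both members vote for themselves forever” story doesn’t hold: a member that hasn’t started its own election in the current term grants its vote to the first candidate that asks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voter:
    name: str
    current_term: int = 0
    voted_for: Optional[str] = None
    up: bool = True

    def request_vote(self, candidate: str, term: int) -> bool:
        """Simplified Raft RequestVote handler (skips the log up-to-date check)."""
        if not self.up:
            return False  # a down member never grants a vote, like the 502 in the log
        if term > self.current_term:
            # A newer term resets whatever this member voted for previously.
            self.current_term = term
            self.voted_for = None
        if term == self.current_term and self.voted_for in (None, candidate):
            self.voted_for = candidate  # at most one vote per term
            return True
        return False


def run_election(candidate: Voter, peers: List[Voter], cluster_size: int) -> bool:
    """Candidate starts a new term, votes for itself, and asks every peer."""
    candidate.current_term += 1
    candidate.voted_for = candidate.name
    votes = 1
    for peer in peers:
        if peer.request_vote(candidate.name, candidate.current_term):
            votes += 1
    # Majority of ALL members, including the ones that are down.
    return votes >= cluster_size // 2 + 1


shc01 = Voter("ccnprodshc01")
shc02 = Voter("ccnprodshc02")
shc03 = Voter("ccnprodshc03", up=False)  # the failed captain
print(run_election(shc02, [shc01, shc03], cluster_size=3))  # prints True
```

If both survivors did time out at the same instant, each would vote for itself and that term would end without a captain. In Raft the members then wait out a fresh randomized timeout and try again, rather than deadlocking.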

During Failure with Captain Elected

Now that the election has taken place, running splunk show shcluster-status shows the result: ccnprodshc02 won the election, proving that a three-member search head cluster can handle a one-node failure.

Remaining Questions on Splunk’s Raft Algorithm Implementation

The Raft consensus algorithm is an awesome piece of technology. One detail that may still be up for debate in some people’s minds is how each member’s election timer is initialized. In standard Raft, this timer is randomized so that two nodes are unlikely to start an election at the same time. I would expect Splunk to implement this as well, especially since in all my time installing search head clusters I’ve never seen two members try to stand for election at the same time. If anyone has additional information on how the election_timeout_ms value is randomized, please reach out. Suffice it to say, the odds of two members standing for election at exactly the same time are extremely low, especially if that randomization is in place.
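To put a rough number on “extremely low”, here is a back-of-the-envelope Python sketch. It assumes, hypothetically, that each member independently draws an integer-millisecond timeout uniformly from a 60,000 ms range (for example [60000, 120000)); I don’t know Splunk’s actual randomization. near_tie_probability is an illustrative name, and the window parameter models timers firing “close enough” together that both members become candidates before either hears from the other.

```python
from fractions import Fraction


def near_tie_probability(range_size: int, window_ms: int) -> Fraction:
    """Chance that two independent uniform integer draws from range_size
    values land less than window_ms apart, i.e. |x - y| < window_ms.
    window_ms=1 gives the chance of an exact tie."""
    if window_ms <= 0:
        return Fraction(0)
    n, w = range_size, min(window_ms, range_size)
    # Difference 0 occurs n times; each difference d in 1..w-1 occurs 2 * (n - d) times.
    close_pairs = n + 2 * sum(n - d for d in range(1, w))
    return Fraction(close_pairs, n * n)


RANGE_MS = 60000  # ASSUMPTION: width of the randomization range
print("exact tie:", float(near_tie_probability(RANGE_MS, 1)))
print("within 100 ms:", float(near_tie_probability(RANGE_MS, 100)))
```

An exact tie comes out to 1 in 60,000 under these assumptions. Even a near-tie isn’t fatal in Raft: it produces a split vote, and the members simply redraw their timeouts and hold another election.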

Conclusion

I’m hopeful that this clears up some confusion regarding three member search head clusters.

I’d encourage anyone who wants a deeper understanding of search head clustering to dive into the Raft algorithm, especially since it’s not a Splunk-specific piece of technology. If this sort of blog is too technical for you, then simply find solace in the fact that your three-member search head cluster is very likely tolerant of one member failing. If anyone has any questions, comments, or concerns, please feel free to reach out.