Exploiting Insufficient Entropy in ExpressionEngine CVE-2017-0897

 FoxGlove Security released a detailed blog post about an ExpressionEngine vulnerability. This inspired me to write my own exploit, which resulted in the discovery of another vulnerability.

Introduction

Back in February, FoxGlove Security released an amazing blog post detailing the exploitation of an ExpressionEngine vulnerability. The blog was a wild ride from start to finish - type juggling, arbitrary object serialization, autoloading and SQL injection... It sounds like a course syllabus on PHP exploitation. So, I decided to attend the class and write my own exploit. As a result, I found another vulnerability: CVE-2017-0897.

Note: Despite being a subtly difficult fix, ExpressionEngine reacted quickly to fix this problem.

The Vulnerability

FoxGlove Security was able to get remote code execution by bypassing a signature check. You can read their blog for the details. What we are concerned with is the "signing $secret" (as I will refer to it). FoxGlove tricked the signing check with type juggling, but what if we could guess the $secret? If we could, then we can sign the same serialized data FoxGlove used and get remote code execution. This is how the $secret is created at install time.

	# notice: almost all the code displayed will be simplified for clarity
	$secret = sha1(uniqid(mt_rand(), TRUE)); 

PHP docs says uniqid and mt_rand are not to be used for cryptography. So, we have a vulnerability, but how bad is it? Does it take a crypto/math genius and the computing power a government to get remote code execution? Or, can a college kid with a gaming rig break it over the weekend?

I think the best way to answer this is to build a Proof of Concept (POC) attack. Sadly, the POC I created does not result in a clear answer. However, it does imply that, under some circumstances, this vulnerability is exploitable by a determined adversary with reasonable computational power.

Attack Overview and Some Background

Upon a failed login we are given a cookie created by:

	$sig    = md5($serialized_data . $secret);
	$cookie = $serialized_data . $sig;

We are tasked with discovering the value of $secret. We know $secret was created at install time by:

	$secret = sha1(uniqid(mt_rand(), TRUE)); 

We thus have an offline brute force attack as follows. We make a $guess as to what $secret is, then check it via:

	$sig_guess = md5($serialized_data . $guess);
	if ( $sig === $sig_guess ){
		echo "We found the secret " . $guess;
	}

$secret is a 160 bit SHA1 hash, attempting all possible SHA1 hashes is infeasible. Instead we will try to enumerate likely output of:

	uniqid(mt_rand(), TRUE); 

We can reduce this further to attacking the possible state of each pseudorandom number generator (PRNG) used here. If we can guess the state of the PRNGs at install time, then $secret will be implied. For my POC, I reproduced the uniqid, mt_rand and php_combined_lcg functions in C.

Note: Strictly speaking, uniqid is not a PRNG. However, I may lop it in there with the other PRNGs to avoid cumbersome speech.

Breaking Down Each PRNG

For the sake of this attack we need only know that PRNGs are black boxes that produce seemingly random numbers. Those numbers are not actually picked out of the computers dreams, but rather are directly determined according to the PRNGs current state. The state is implied by the number of times the PRNG was called and the seed it was initialized with. If we know the state we know the output. The seed is what we will be brute forcing. We will see the number of possible seeds is small and correctly guessing the seed for mt_rand almost directly implies all the other information we need.

To reduce the problem of knowing how many times the PRNG was called we must make an assumption here. We assume ExpressionEngine was installed in less then 5 requests to admin.php. On a normal install, if a sysadmin spun up a LAMP server, downloads the ExpressionEngine source, and followed the install instructions. Then that Admin can make 3 mistakes and still satisfy this assumption. Apache seems to set the default number of forks to 5. This means there are 5 independent instances of PHP running with independent PRNG states. When an admin visits the install page, $secret is automatically made but not saved. This expends one of the 5 default forks because it's state was initialized now. Is this a safe assumption? Not really… It took me more than 5 attempts to install because I kept messing up little silly things. If someone has practice making installation or if the Apache server is configured to run with more than 5 forks, then chances are improved that this assumption will be satisfied.

Without this assumption things get more complicated. We would not only have to guess the number of calls to the PRNG, but also a delta in the time between calls due to uniqid. My implementation of mt_rand will have to change to return multiple calls, which increases its memory use. These obstacles could potentially be overcome with more computational power (GPU?) or finding a nice information leak. However, I make no attempt to do so in my POC.

Now we should have a quick walk through of each PRNG at play so we can duplicate them, and the seeding of them, in our brute force.

uniqid

string uniqid ([ string $prefix = "" [, bool $more_entropy = false ] ] )

Uniqid is the simplest. The following source explains it best:

https://github.com/php/php-src...

Uniqid is just the $prefix appended with time in hex and micro time in hex. If the $more_entropy value is set to true then "extra entropy" is added. This scared me at first, "extra entropy" sounds like it might ruin our fun. What does it do? The $more_entropy flag just appends php_combined_lcg()*10 to the end of the value. We will get to php_combined_lcg soon.

Uniqid is not really a PRNG, it has no state, so it has no seeding algorithm.

Uniqid directly implies the $secret. So far if we can guess php_combined_lcg, utime and mt_rand (the value of $prefix) then we can get RCE.

mt_rand

int mt_rand ( void  )

The mt_rand function is a Mersenne Twister Random Number Generator. The internal state of which is a large array of integers. The mt_rand function is only used once though (due to our assumption), so we don't need to keep the state around. This lets us reduce the memory use of our attack without increasing CPU load. Here is my simplified implementation of it:

https://github.com/swoops/php_...

Seeding mt_rand is easy enough, time in seconds multiplied by pid and then xored with multiple of php_combined_lcg.

https://github.com/php/php-src/blob/c8aa6f3a9a3d2c114d0c5e0c9fdd0a465dbb54a5/ext/standard/mt_rand.c#L163-L172

https://github.com/php/php-src/blob/c8aa6f3a9a3d2c114d0c5e0c9fdd0a465dbb54a5/ext/standard/php_rand.h#L69-L73

So, to guess it’s state we need time in seconds, pid and php_combined_lcg. Unless we get really unlucky it is doubtful a second will pass between mt_rand and uniqid, so we don’t have to worry about a delta time value.

php_combined_lcg

php_combined_lcg

The php_combined_lcg function is not in the PHP docs because you can't call it from PHP. It is an internal PRNG that is mostly used to add "entropy" to other PRNGs. It is a linear congruential generator, the simplest of PRNGs.

It is seeded with time in seconds, microseconds, and pid. If you peek at the source you will see gettimeofday is called twice (here and here) and microtime is used in both cases. This means microseconds could of passed between those calls. The delta time between calls must be accounted for.

So to break php_combined_lcg we need time in microseconds, pid and delta time between gettimeofday.

Putting it Together

Each of the PRNGs used are merely based off of some assortment of microtime, time, and pid. There is a delta time in microseconds between calls to gettime in the php_combined_lcg ($delta_lcg) and another delta time in microseconds elapsed between the first php_combined_lcg call in mt_rand and microtime used by uniqid ($delta_mt). Can we narrow any of these values down?

I attempted to find some information leak that is saved at install time to narrow down this information. Sadly (for our attack), nothing spectacular appeared. The only decent thing I found was an upper bound on time via a blog post. By default after you install ExpressionEngine, you are instructed to remove some files in the web root, then instructed to log in for the first time. As soon as you log in a new blog is created. The blog is public and displays the time it was created. If an attacker can find that blog, either on the site or in waybackmachine, then they can narrow down the install time to a few minutes.

Let’s say the admin knows what he is doing - he installed in less than 5 requests and it took him less than 5 minutes to remove a few files and log in. Five minutes comes out to 3*10^8 µs. Let's test 30 possible pids. The $delta_mt value should be be something like 20µs ± 5. The $delta_lcg should be about 15µs ± 5. So we have to test:

3*10^8 * 30 * 10 * 10 = 9*10^11

A big number, but how fast can we do the mt_rand, php_combined_lcg, uniqid, sha1 to make the secret and another md5 and compare to test the $sig against the $cookie? Time to fire up the POC. In order to demonstrate the attack, from start to finish, without it taking all day, we will cheat a bit. Then use math to predict the real thing.

We use the instructions here to install ExpressionEngine. The installation request is generated from this page:

The request looks something like:

POST /admin.php?C=wizard&M=do_install&language=english HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://127.0.0.1:8000/admin.php?C=wizard&M=do_install&language=english
Cookie: exp_last_visit=1505495046; exp_last_activity=1506973380; csrftoken=K19MFBni88pzBvPQ20dKOoTX4xxi1JhI; sessionid=u1bh6ldfhsde3rd55n5ujn0k1hx3j4nk; exp_tracker=%7B%220%22%3A%22index%22%2C%22token%22%3A%227767d698d30d32f7c6f4f649ead145e9%22%7D; exp_csrf_token=eddb7ef89620068f6c30e9a83732c4f3d8756fcf
Connection: close
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Content-Length: 205

db_hostname=localhost&db_name=t&db_username=root&db_password=root&db_prefix=exp&install_default_theme=y&email_address=test%40test.com&username=root&password=alkdjf092ul3kjaf09sjdlfj0293&license_agreement=y

Response looks like this… kinda…

HTTP/1.1 200 OK
Date: Mon, 02 Oct 2017 19:55:49 GMT
Server: Apache/2.4.10 (Debian)
Vary: Accept-Encoding
Content-Length: 2298
Connection: close
Content-Type: text/html; charset=UTF-8

<!--
== CHEATING ==
pid: 2736
microtime before: 0.56968900 1506974149
microtime after: 0.56972700 1506974149
-->
<!doctype html>
<html>
	<head>
		<title>ExpressionEngine Core Core 3.5.4 Installed</title>

Yeah, okay, so I cheated! I can’t speed up the brute force through the “magic of blog posts” without cheating some. I added a getmypid() and a couple microtime() calls so that I could tune my brute force some. Check out the “Improving $delta_lcg and $delta_mt” section for a theory on how to tune your brute force to a specific server in the wild. We really just want to see how fast the POC runs so we can use some math and see how long it would take us in the real world.

Sending a login request without a password causes this response:

HTTP/1.1 302 Found
Date: Tue, 03 Oct 2017 17:01:24 GMT
Server: Apache/2.4.10 (Debian)
Set-Cookie: exp_last_activity=1507050084; expires=Wed, 03-Oct-2018 17:01:24 GMT; Max-Age=31536000; path=/; httponly
Set-Cookie: exp_csrf_token=8c32fe920d884d95c882d97d4e766968a62c49b1; expires=Tue, 03-Oct-2017 19:01:24 GMT; Max-Age=7200; path=/; httponly
Set-Cookie: exp_flash=a%3A1%3A%7Bs%3A12%3A%22%3Anew%3Amessage%22%3Bs%3A30%3A%22The+username+field+is+required%22%3B%7D896e42e7b65ed35f8d4032c183ef90f8; expires=Wed, 04-Oct-2017 17:03:04 GMT; Max-Age=86499; path=/; httponly
Location: admin.php?/cp/login
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8

The exp_flash cookie is the signed serialized data. URL decoded it looks like:

a:1:{s:12:":new:message";s:30:"The username field is required";}896e42e7b65ed35f8d4032c183ef90f8

Let's brute force it.

> time ./exploit -o 'a:1:{s:12:":new:message";s:30:"The username field is required";}896e42e7b65ed35f8d4032c183ef90f8' -p 2736 -P 2736 -t 1506974149 -T 1506974149
[**] Starting brute force with: 
OBJ: a:1:{s:12:":new:message";s:30:"The username field is required";}896e42e7b65ed35f8d4032c183ef90f8
Hash: 896e42e7b65ed35f8d4032c183ef90f8
START_PID:        2736	END_PID:        2736
START_TIME: 1506974149	END_TIME: 1506974149
attacking pid: 2736	sec: 1506974149
 count:   30121779 usec: 569697 udelt: 21 delta: 14 


WE GOT IT!!!
$secret = d5f3040748d4baff1605cf85d0566cc651c93cd7

see for yourself

echo -n 'a:1:{s:12:":new:message";s:30:"The username field is required";}d5f3040748d4baff1605cf85d0566cc651c93cd7' |md5sum 

real	1m16.359s
user	1m16.327s
sys	0m0.001s

Indeed running the above command does produce the MD5 checksum:

> echo -n 'a:1:{s:12:":new:message";s:30:"The username field is required";}d5f3040748d4baff1605cf85d0566cc651c93cd7' |md5sum
896e42e7b65ed35f8d4032c183ef90f8  -

We have the $secret. Before we do the math stuff let's have a bit of fun with it. We can use the discovered $secret value to sign any message we like:

GET /admin.php?/cp/login= HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://127.0.0.1:8000/admin.php?/cp/login&return=
Cookie: exp_last_visit=1506974463; exp_last_activity=1507046288; csrftoken=K19MFBni88pzBvPQ20dKOoTX4xxi1JhI; sessionid=u1bh6ldfhsde3rd55n5ujn0k1hx3j4nk; exp_csrf_token=b6ecda81cd38bb078dfd37ff4c6239468a6b1ebd; exp_flash=a:2:{s:13:":new:username"%3bs:4:"asdf"%3bs:12:":new:message"%3bs:25:"<marquee>YES!!!</marquee>"%3b}740499079c73abd743f0b6e82839af65
Connection: close
Upgrade-Insecure-Requests: 1

This looks like this:

So, our brute force cracked 30121779 possible seeds in 1 minute and 16.359 seconds. That comes out to 394475.818175 guesses per second. This was single threaded, if I used all 8 cores of my i7 I should get about 3155806.5454 guesses per second. How many days would it take to get the 9*10^11 guesses we think we need?

9*10^11 / (3155806.5454 * 60 * 60 * 24) = 3.3007

A little more than a weekend sadly. Keep in mind this has a lot of hand waving. Is 30 pids enough? Are the windows on the delta time values big enough? Is it likely our “assumption” is true? I don’t know really… However, it does give a better perspective that this may in fact be exploitable. If we just a bit more information out of the server it might very well be feasable without the assumption.

Improvements

GPU

I think rewriting my POC in OpenCL could speed things up a lot. All the math for the PRNGs are well suited for GPUs. The php_combined_lcg function does use floating point operations, but minimally.

The mt_rand function would normally suck up quite a bit of memory, which is bad for GPU cracking, but we reduced that. The function could further be altered to return two or three integers from the PRNG state without increasing the memory consumption much and almost no speed penalty (seeding is the slow part).

There is a string which must be put into the MD5 algorithm. Loading strings in and out of memory is slow, but this is the same string for every operation so it could be compiled into the kernel.

So I think this problem is well suited to GPU cracking. However, I know almost nothing about OpenCL programing, so take the above with a grain of salt.

Improving $delta_lcg and $delta_mt

The PHP library I created includes a test to get the average $delta_lcg and $delta_mt:

https://github.com/swoops/php_prngs/blob/master/uniqid/test.c#L84-L88

This seems like a good idea, but fails dramatically! I spent hours trying to find problems in my code of those tests. I seems that  Apache running PHP increased the $delta_lcg and $delta_mt dramatically. For example, over thousands of tests the above test produces a  $delta_lcg of 0.18µs on my computer. In those tests $delta_lcg never reached 2µs. If you check that actual exploit walkthrough above you will see a $delta_lcg of 14µs.

I intended to dismiss this problem with a lot of hand waving, but while writing this blog I found a solution. I came up with a way to statistically tune the unknown time delta values to the specific server you are attacking. This is done by breaking the CSRF tokens used by the site, they are created like so:

	$csrf = sha1(uniqid(mt_rand(), TRUE));

Note: The latest version of ExpressionEngine uses random_int to create it’s CSRF tokens. Otherwise, the attack below could be adapted to discover CSRF tokens in a very convoluted way.

Look familiar? $csrf is created the same way as $secret but we are in a much better place to break $csrf. I don’t want to write a whole other blog post about this, so let’s just walk through the attack.

  1. Find out what time the victim server thinks it is. Part of HTTP requires the server to send a date header. This publishes the server time to the second. We can do WAY better then seconds though, see this excellent talk by Chris Anley for getting server time in microseconds. His data implies, at worst, you can get the server time to within 40,000µs.
  2. Once your script has time narrowed down to an acceptable network jitter, send 128 HTTP requests all at once. Record all the timing information you can about these requests, and the $csrf token that server responded with in the HTTP header. Each of those requests will get an Apache fork. Once the default number of forks on Apache are consumed new forks appear. The new  forks have not seeded the their PRNGs, so they fit in with the “assumption” we made for the attack on $secret.
  3. Perform an offline brute force the $csrf tokens you receive. Load all 128 $csrf in memory in a sorted array so you can do a binary search of them. Find the earliest time a request was sent and the latest time a request responded and use those for your time max and min settings. Since we have 128 CSRF tokens, some of those, not all, will be created with fresh forks. Let's say only 30 are. That means our chances of guessing the pid correctly increases by a factor of 30 at the expense of 7 hash comparisons. Use this added speed to cast a broad net on possible time delta values. You don’t want to miss outliers.

Repeat the above, logging all the discovered $delta_mt and $delta_lcg until you can create a nice normal distribution. Then use that normal distribution to hone in the $delta_lcg and $delta_mt values for your brute force attack on $secret.

I don’t have a POC for the above to test its effectiveness. It is purely theory. I also left out some details. I hate leaving things half done, but this post is already long. So, if people express interest in the attack on $csrf I will do another blog post to demonstrate it in greater detail.