Book Excerpt: The Tangled Web: A Guide to Securing Modern Web Applications

Published: January 17th, 2012

Part III: A Glimpse of Things to Come

Following nearly a decade of stagnation, the world of browsers is once more a raging battlefield. In a manner all too reminiscent of the First Browser Wars in the late 1990s, vendors compete by bringing new features to market monthly. The main difference is that security is now seen as a clear selling point.

Of course, objectively measuring the robustness of any sufficiently complex piece of software is an unsolved problem in computing, doubly so if your codebase happens to carry almost two decades' worth of bloat. Therefore, much of the competitive effort goes into inventing and then rapidly deploying new security-themed additions, often with little consideration for how well they actually solve the problem they were supposed to address.

In the meantime, standards bodies, mindful of their earlier misadventures, have ditched much of their academic rigor in favor of just letting a dedicated group of contributors tweak the specifications as they see fit. There is talk of making HTML5 the last numbered version of the standard and transitioning to a living document that changes every day, often radically. The relaxation of the requirements has helped keep much of the work around W3C and WHATWG going, but it has also undermined some of the benefits of having a central organization to begin with. Many recent proposals gravitate toward quick, narrowly scoped hacks that do not even try to form a consistent and well-integrated framework. When this happens, no robust feedback mechanism is in place to allow external experts to review reasonably stable specifications and voice concerns before any implementation work takes place. The only way to stay on top of the changes is to immerse oneself in the day-to-day dynamics of the working group.

It is difficult to say if this new approach to standardization is a bad thing. In fact, its benefits may easily outweigh any of the speculative risks; for one, we now have a chance at a standard that is reasonably close to what browsers actually do. Nevertheless, the results of this frantic and largely unsupervised process can be unpredictable, and they require the security community to be very alert.

In this spirit, the last part of the book will explore some of the more plausible and advanced proposals that may shape the future of the Web . . . or that may just as likely end up in the dustbin of history a few years from now.

16

New and Upcoming Security Features

You will soon find out that there is little rhyme or reason to how all the new browser features mesh, but we still need to organize the discussion in some way. Perhaps the best approach is to look at their intended purposes and begin with all the mechanisms created specifically to tweak the Web’s security model for a well-defined gain.

The dream of inventing a brand-new browser security model is strong within the community, but it is always followed by the realization that it would require rebuilding the entire Web. Therefore, much of the practical work focuses on more humble extensions to the existing approach, necessarily increasing the complexity of the security-critical sections of the browser codebase. This complexity is unwelcome, but its proponents invariably see it as justified, whether because they aim to mitigate a class of vulnerabilities, build a stopgap for some other hard problem that nobody wants to tackle right now, or simply enable new types of applications to be built in the future.

All these benefits usually trump the vague risk.

Security Model Extension Frameworks
Some of the most successful security enhancements proposed in the past few years boil down to adding flexibility to the original constraints imposed by the same-origin policy and its friends. For example, one formerly experimental proposal that has now crossed into the mainstream is the postMessage(…) API for communicating across origins, discussed in Chapter 9. Surprisingly, the act of relaxing SOP checks in certain carefully chosen scenarios is more intuitive and less likely to cause problems than locking the policy down. So, to begin on a lighter note, we’ll focus on this class of frameworks first.

Cross-Domain Requests
Under the original constraints of the same-origin policy, scripts associated with one origin have no clean and secure way to communicate with client-side scripts executing in any other origin and no safe way to retrieve potentially useful data from a willing third-party server.

Web developers have long complained about these constraints, and in recent years, browser vendors have begun to listen to their demands. As you recall, the more pressing task of arranging client-side communications between scripts was solved with postMessage(…). The client-to-server scenario was found to be less urgent and still awaits a canonical solution, but there has been some progress to report.

The most successful attempt to create a method for retrieving documents from non-same-origin servers began in 2005. Under the auspices of W3C, several developers working on VoiceXML, an obscure document format for building Interactive Voice Response (IVR) systems, drafted a proposal for Cross-Origin Resource Sharing (CORS). Between 2007 and 2009, their awkward, XML-based design gradually morphed into a much simpler and more widely useful scheme, which relied on HTTP header–level signaling to communicate consent to cross-origin content retrieval using a natural extension of the XMLHttpRequest API.

CORS Request Types
As specified today, CORS relies on differentiating between two types of calls to the XMLHttpRequest API. When the site attempts to load a cross-origin document through the API, the browser first needs to distinguish between simple requests, where the resulting HTTP traffic is deemed close enough to what can be generated through other, existing methods of navigation, and non-simple requests, which encompass everything else. The operation of these two classes of requests varies significantly, as we’ll see.

The current specification says that simple requests must have a method of GET, POST, or HEAD. Additionally, if any custom headers are specified by the caller, they must belong to the following set:

• Cache-Control
• Content-Language
• Content-Type
• Expires
• Last-Modified
• Pragma

Today, browsers that support CORS simply do not allow methods other than GET, POST, and HEAD. At the same time, they ignore the recommended whitelist of headers, unconditionally demoting any requests with custom header values to non-simple status. The implementation in WebKit also considers any payload-bearing requests to be non-simple. (It is not clear whether this is an intentional design decision or a bug.)
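
The classification rule described in this section can be sketched in a few lines of JavaScript. This is only an illustration of the logic as the text states it, not the algorithm of any particular browser; the function names are hypothetical, and the method and header lists are copied from the text above.

```javascript
// Method and header lists as given in this section.
const SIMPLE_METHODS = ["GET", "POST", "HEAD"];
const SIMPLE_HEADERS = [
  "cache-control", "content-language", "content-type",
  "expires", "last-modified", "pragma",
];

// Hypothetical helper: the rule as the specification describes it.
function isSimpleRequest(method, customHeaderNames) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return false;
  // Header names are case-insensitive, so normalize before comparing.
  return customHeaderNames.every(
    (name) => SIMPLE_HEADERS.includes(name.toLowerCase())
  );
}

// Hypothetical helper: the stricter behavior attributed to browsers above,
// where any custom header at all demotes the request to non-simple status.
function isSimpleAsImplemented(method, customHeaderNames) {
  return SIMPLE_METHODS.includes(method.toUpperCase()) &&
         customHeaderNames.length === 0;
}
```

Under these assumptions, a GET request with a Content-Type header counts as simple by the first rule but not by the second, which is the gap between specification and implementation that this section points out.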
Security Checks for Simple Requests
The CORS specification allows simple requests to be submitted to the destination server immediately, without attempting to confirm whether the destination is willing to engage in cross-domain communications to begin with. This decision is based on the fact that the attacker may initiate fairly similar cookie-authenticated traffic by other means (for example, by automatically submitting a form) and, therefore, that there is no point in introducing an additional handshake specifically for CORS.

The crucial security check is carried out only after the response is retrieved from the server: The data is revealed to the caller through the XMLHttpRequest API only if the response includes a suitable, well-formed Access-Control-Allow-Origin header. To assist the server, the original request will include a mandatory Origin header, specifying the origin associated with the calling script.

To illustrate this behavior, consider the following cross-domain XMLHttpRequest call performed from http://www.bunnyoutlet.com/:

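// "new" is required to construct the request object.
var x = new XMLHttpRequest();
// The third argument (false) makes the call synchronous.
x.open('GET', 'http://fuzzybunnies.com/get_message.php?id=42', false);
x.send(null);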

The result will be an HTTP request that looks roughly like this:

GET /get_message.php?id=42 HTTP/1.0
Host: fuzzybunnies.com
Cookie: FUZZYBUNNIES_SESSION_ID=EA7E8167CE8B6AD93D43AC5AA869A920
Origin: http://www.bunnyoutlet.com

To indicate that the response should be readable across domains, the server needs to respond with

HTTP/1.0 200 OK
Access-Control-Allow-Origin: http://www.bunnyoutlet.com

The secret message is: “It’s a cold day for pontooning.”

NOTE: It is possible to use a wildcard (“*”) in Access-Control-Allow-Origin, but do so with care. It is certainly unwise to indiscriminately set Access-Control-Allow-Origin: * on all HTTP responses, because this step largely eliminates any assurances of the same-origin policy in CORS-compliant browsers.
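
The two sides of this check can be sketched as follows. The first function shows one way a server might decide what to echo back instead of resorting to a blanket wildcard; the second approximates the comparison the browser performs before revealing the response. Both function names and the idea of a fixed list of trusted origins are assumptions made for illustration, and the sketch ignores any credential-related rules.

```javascript
// Server side (hypothetical helper): echo the caller's origin back only if
// it appears in an explicit list of trusted origins. Returning null means
// "send no Access-Control-Allow-Origin header at all".
function chooseAllowOrigin(requestOrigin, trustedOrigins) {
  return trustedOrigins.includes(requestOrigin) ? requestOrigin : null;
}

// Browser side (hypothetical helper): the response is revealed to the
// calling script only if the header is present and is either a wildcard
// or an exact match for the caller's origin.
function mayRevealResponse(callerOrigin, allowOriginHeader) {
  if (typeof allowOriginHeader !== "string") return false;
  const value = allowOriginHeader.trim();
  return value === "*" || value === callerOrigin;
}
```

With a trusted list containing only http://www.bunnyoutlet.com, a request from that origin gets its own origin echoed back and the response becomes readable; a request from any other origin gets no header, and the browser withholds the data.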
Non-simple Requests and Preflight
In the early drafts of the CORS protocol, almost all requests were meant to be submitted without first checking to see if the server was actually willing to accept them. Unfortunately, this design undermined an interesting property leveraged by some web applications to prevent cross-site request forgery: Prior to CORS, attackers could not inject arbitrary HTTP headers into cross-domain requests, so the presence of a custom header often served as a proof that the request came from the same origin as the destination and was issued through XMLHttpRequest.

Later CORS revisions corrected this problem by requiring a more complicated two-step handshake for requests that did not meet the strict “simple request” criteria outlined in “CORS Request Types” above. The handshake for non-simple requests aims to confirm that the destination server is CORS compliant and that it wants to receive nonstandard traffic from that particular caller. The handshake is implemented by sending a vanilla OPTIONS request (“preflight”) to the target URL containing an outline of the parameters of the underlying XMLHttpRequest call. The most important information is conveyed to the server in three self-explanatory headers: Origin, Access-Control-Request-Method, and Access-Control-Request-Headers.

This handshake is considered successful only if these parameters are properly acknowledged in the response through the use of Access-Control-Allow-Origin, Access-Control-Allow-Methods, and Access-Control-Allow-Headers. Following a correct handshake, the actual request is made. For performance reasons, the result of the preflight check for a particular URL may be cached by the client for a set period of time.
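
A rough model of this handshake is shown below. It is a sketch of the exchange as described here rather than a faithful rendition of any browser's logic: the helper names are hypothetical, requests and responses are plain objects instead of real network traffic, and preflight caching is left out. The header carrying the permitted methods is spelled Access-Control-Allow-Methods, as in the CORS specification.

```javascript
// Hypothetical helper: describe the OPTIONS request that precedes a
// non-simple call, using the three headers named in the text.
function buildPreflight(url, origin, method, headerNames) {
  return {
    method: "OPTIONS",
    url: url,
    headers: {
      "Origin": origin,
      "Access-Control-Request-Method": method,
      "Access-Control-Request-Headers": headerNames.join(", "),
    },
  };
}

// Split a comma-separated header value into lowercase tokens.
function splitList(value) {
  return (value || "")
    .split(",")
    .map((item) => item.trim().toLowerCase())
    .filter((item) => item !== "");
}

// Hypothetical helper: the handshake succeeds only if the origin, the
// method, and every requested header are acknowledged by the server.
function preflightApproves(preflight, responseHeaders) {
  const wanted = preflight.headers;
  if (responseHeaders["Access-Control-Allow-Origin"] !== wanted["Origin"]) {
    return false;
  }
  const allowedMethods = splitList(responseHeaders["Access-Control-Allow-Methods"]);
  if (!allowedMethods.includes(wanted["Access-Control-Request-Method"].toLowerCase())) {
    return false;
  }
  const allowedHeaders = splitList(responseHeaders["Access-Control-Allow-Headers"]);
  return splitList(wanted["Access-Control-Request-Headers"])
    .every((name) => allowedHeaders.includes(name));
}
```

Only when preflightApproves returns true would the real request be sent; a response that omits any of the three acknowledgments fails the handshake.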

Current Status of CORS
As of this writing, CORS is available only in Firefox and WebKit-based browsers and is notably absent from Opera and Internet Explorer. The most important factor hindering its adoption may be simply that the API is not as critical as postMessage(…), its client-side counterpart, because it can often be replaced by a content-fetching proxy on the server side. But the scheme is also facing three principal, if weak, criticisms, some of which come directly from one of the vendors. Obviously, these criticisms don’t help matters.

The first complaint, voiced chiefly by Microsoft developers and echoed by some academics, is that the scheme needlessly abuses ambient authority. They argue that there are very few cases where data shared across domains would need to be tailored based on the credentials available for the destination site. The critics believe that the risks of accidentally leaking sensitive information far outweigh any benefits and that a scheme permitting only nonauthenticated requests to be made would be preferable. In their view, any sites that need a form of authentication should instead rely on explicitly exchanged authentication tokens.

The other, more pragmatic criticism of CORS is that the scheme is needlessly complicated: It extends an already problematic and error-prone API without clearly explaining the benefits of some of the tweaks. In particular, it is not clear if the added complexity of preflight requests is worth the peripheral benefit of being able to issue cross-domain requests with unorthodox methods or random headers.

The last of the weak complaints hinges on the fact that CORS is susceptible to header injection. Unlike some other recently proposed browser features, such as WebSockets (Chapter 17), CORS does not require the server to echo back an unpredictable challenge string to complete the handshake. Particularly in conjunction with preflight caching, this may worsen the impact of certain header-splitting vulnerabilities in the server-side code.
XDomainRequest
Microsoft’s objection to CORS appears to stem from the aforementioned concerns over the use of ambient authority, but it also bears subtle overtones of their dissatisfaction with interactions with W3C. In 2008, Sunava Dutta, a program manager at Microsoft, offered this somewhat cryptic insight:

During the [Internet Explorer 8] Beta 1 timeframe there were many security based concerns raised for cross domain access of third party data using cross site XMLHttpRequest and the Access Control framework. Since Beta 1, we had the chance to work with other browsers and attendees at a W3C face-to-face meeting to improve the server-side experience and security of the W3C’s Access Control framework.

Instead of embracing the CORS extensions to XMLHttpRequest, Microsoft decided to implement a counterproposal, dubbed XDomainRequest. This remarkably simple, new API differs from the variant available in other browsers in that the resulting requests are always anonymous (that is, devoid of any browser-managed credentials) and that it does not allow for any custom HTTP headers or methods to be used.

The use of Microsoft’s API is otherwise very similar to XMLHttpRequest:

var x = new XDomainRequest();
x.open("GET", "http://www.fuzzybunnies.com/get_data.php?id=1234");
x.send();

Borrowing from W3C’s proposal, the resulting request will bear an Origin header, and the response data will be revealed to the caller only if a matching Access-Control-Allow-Origin header is present in the response. Preflight requests and permission caching are not a part of the design.
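
With two competing APIs in circulation, a page that wants to make cross-origin requests has to work out which one the browser offers. The sketch below shows a commonly used detection idiom. It relies on the withCredentials property, which is not discussed in this chapter but is exposed by CORS-capable XMLHttpRequest objects; the function name and the env parameter (a stand-in for the browser's global object) are assumptions made for illustration.

```javascript
// Hypothetical helper: report which cross-origin transport is available.
// "env" stands in for the browser's global object (window); passing it in
// explicitly is an assumption that keeps the sketch easy to exercise.
function pickCrossOriginTransport(env) {
  // CORS-capable XMLHttpRequest objects expose a withCredentials property.
  if (typeof env.XMLHttpRequest !== "undefined" &&
      "withCredentials" in new env.XMLHttpRequest()) {
    return "XMLHttpRequest";
  }
  // Otherwise fall back to Microsoft's API where it exists.
  if (typeof env.XDomainRequest !== "undefined") {
    return "XDomainRequest";
  }
  // No cross-origin request API is available.
  return null;
}
```

In a browser, the function would be called as pickCrossOriginTransport(window). A caller that receives "XDomainRequest" has to live with the restrictions described above: anonymous requests only, with no custom headers or methods.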

For all intents and purposes, Microsoft’s solution is far more reasonable than CORS: It is simpler, safer, and probably just as functional in all the plausible uses. That said, it is also unpopular. It is supported only in Internet Explorer 8 and up, and owing to W3C backing CORS, others have no reason to embrace XDomainRequest anytime soon.

In the meantime, a separate group of researchers has proposed a third solution, again acting under the auspices of W3C. Their design, known as Uniform Messaging Policy (complete with a corresponding UniformRequest API), embraces an approach nearly identical to Microsoft’s. It is not supported in any existing browser, but there is some talk of unifying it with CORS.
Other Uses of the Origin Header
The Origin header is an essential part of CORS, XDomainRequest, and UMP, but it actually evolved somewhat independently with other uses in mind. In their 2008 paper, Adam Barth, Collin Jackson, and John C. Mitchell advocated the introduction of a new HTTP header that would offer a more reliable and privacy-conscious alternative to Referer. It would also serve as a way to prevent cross-site request forgery vulnerabilities by providing the server with the information needed to identify the SOP-level origin of a request, without disclosing the potentially more sensitive path or query data.

Of course, it was unclear whether the subtle improvement between Referer and its proposed successor would actually make a difference for the small but nonnegligible population of users who block that first header on privacy grounds. The proposal consequently ended up in virtual limbo, not being deployed in any existing browser but also discouraging others from pursuing other solutions to problems such as XSRF or XSSI. (To be fair, the concept was very recently revived under the new name of From-Origin and may not be completely dead yet.)

The fate of the original idea aside, the utility of the Origin header in specialized cases such as CORS was pretty clear. Around 2009, this led to Barth submitting an IETF draft specifying the syntax of the header, while shying away from making any statements about when the header should be sent, or what specific security problems it might solve:

The user agent MAY include an Origin header in any HTTP request.
[…]
Whenever a user agent issues an HTTP request from a “privacy-sensitive” context, the user agent MUST send the value “null” in the Origin header.
NOTE: This document does not define the notion of a privacy-sensitive context. Applications that generate HTTP requests can designate contexts as privacy-sensitive to impose restrictions on how user agents generate Origin headers.

The bottom line of this specification is that whatever the decision process is, once the client chooses to provide the header, the value is required to accurately represent the SOP origin from which the request is being made. For example, when a particular operation takes place from http://www.bunnyoutlet.com:1234/bunny_reports.php, the transmitted value should be:

Origin: http://www.bunnyoutlet.com:1234

For origins that do not meaningfully map to a protocol-host-port tuple, the browser must send the value of null instead.
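
The mapping from a page URL to an Origin header value can be sketched as follows. This is an approximation under stated assumptions, not the algorithm from the draft: the function name and the privacySensitive flag are hypothetical, and the sketch treats only http: and https: URLs as having a meaningful protocol-host-port tuple.

```javascript
// Hypothetical helper: compute the value of the Origin header for a request
// issued from the page at pageUrl.
function originHeaderValue(pageUrl, privacySensitive) {
  // Privacy-sensitive contexts must send the literal string "null".
  if (privacySensitive) return "null";
  let parsed;
  try {
    parsed = new URL(pageUrl);
  } catch (err) {
    return "null"; // Unparseable URLs have no usable origin.
  }
  // Assumption: only http: and https: map to a protocol-host-port tuple.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return "null";
  }
  // host includes the port only when it is not the scheme's default,
  // and the path and query string are deliberately left out.
  return parsed.protocol + "//" + parsed.host;
}
```

Called with the URL from the example above, the function drops the path and returns http://www.bunnyoutlet.com:1234, while a data: URL or a privacy-sensitive context yields the string null.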

Despite all of these plans, as of this writing only one browser includes the Origin header on non-CORS navigation: WebKit-based implementations send it when submitting HTML forms. Firefox seems to be considering a different approach, but nothing specific seems to have been implemented yet.

About Michal Zalewski
