If you run an open API program, the current controversy surrounding Cambridge Analytica’s use of Facebook data to create psychographic profiles of millions of Facebook users should concern you, and not just because of how your profile data may have been used.
I recall being very surprised at how much data I could access through Facebook’s application programming interface (API) back when they first released it. I could easily navigate through a specific user’s news feed and friends list and all but replicate that user’s web of social interactivity with only a handful of calls. Facebook opened this data to allow developers to create games and applications that enhanced the core purpose of Facebook at the time — connecting people and allowing them to share their lives with their friends online. While the terms of service made it clear that data was not intended to be captured and stored, there was also nothing stopping a developer from breaking those rules — and nothing Facebook could do to easily tell if the rules had been violated.
Subsequent updates to the Facebook API limited access to much of that data, but the genie was already out of the bottle. It appears the data Cambridge Analytica used may have been gathered some time prior to 2015, before those limits were put in place.
It isn’t just Facebook
Facebook is taking a big hit over this controversy, but there’s a part of me that feels it’s somewhat undeserved. The same data that may have been used to target specific audiences with messages of questionable veracity also allowed companies like Zynga to flourish, and helped Facebook evolve from a simple social bulletin board to a genuine social platform. I don’t believe any of this was malicious on Facebook’s part. I think it’s the unintended consequence of a drive toward radical openness marred by a culture of “move fast and break things.”
Now it’s time to move fast and fix things. If you run an API program that is open to the public, you should take this as a warning to audit your APIs now to understand exactly what data you’re exposing, who has access to it, and how that data is connected to other API endpoints in your system.
Why an API audit is important
As an example, the early Facebook API allowed a fair amount of a user’s friend data to be exposed as part of the user’s profile data. This meant that if your friend granted a third-party app access to their data, that app would also gain some limited access to your data, even if you never granted that app access yourself. I’m aware of at least one other social network API that not only returned a user’s profile data in a single call, but also returned every one of their followers. Aside from the bloated payload that created, it meant giving the application more data than it actually required or requested.
Proper normalization of RESTful endpoints combined with endpoint-level access restrictions is one of the best ways to avoid this type of situation. For example, a user’s profile may be accessible from the endpoint ‘/users/rzazueta’. Rather than list all connected friends as part of that response, the data should contain a link to the friends list, e.g. ‘/users/rzazueta/friends’. When endpoint-level access controls are applied in the code, only those with permission to read the friends endpoint would gain access to that information.
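To make the profile and friends example concrete, here is a minimal sketch in Python. It is not tied to any particular framework or API management product: the in-memory `USERS` data, the permission names, and the `get_profile` and `get_friends` functions are all assumptions made up for illustration.

```python
# Hypothetical in-memory data; a real API would read from a datastore.
USERS = {"rzazueta": {"name": "Example User", "friends": ["alice", "bob"]}}

# Hypothetical permission names, one per endpoint.
READ_PROFILE = "users:read"
READ_FRIENDS = "users.friends:read"


class Forbidden(Exception):
    """Raised when the caller lacks the permission an endpoint requires."""


def require(granted: set[str], needed: str) -> None:
    """Endpoint-level access check: fail unless the permission was granted."""
    if needed not in granted:
        raise Forbidden(needed)


def get_profile(username: str, granted: set[str]) -> dict:
    """Sketch of GET /users/<username>: profile fields plus a link to friends."""
    require(granted, READ_PROFILE)
    user = USERS[username]
    return {
        "username": username,
        "name": user["name"],
        # A link to the related resource, not the friends themselves.
        "links": {"friends": f"/users/{username}/friends"},
    }


def get_friends(username: str, granted: set[str]) -> list[str]:
    """Sketch of GET /users/<username>/friends: guarded by its own permission."""
    require(granted, READ_FRIENDS)
    return list(USERS[username]["friends"])
```

A caller holding only the profile permission can read the profile and see where the friends list lives, but a call to `get_friends` raises `Forbidden` until the second permission is granted.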
This, of course, means you need to set up your API packages, user roles, and endpoints in a way that allows for that level of control. Most API management systems make this relatively easy, but can only help if you’ve designed your API correctly.
If you have never done so, now is the time to perform an audit of your API to map what data is accessible to which users and ensure you’re not exposing more than you intend to, even if your API is internal only.
API Audit 101
Start by creating a map connecting which users and user roles have access to which endpoints. Ideally, all of your users will have consistent access through a set of roles rather than individual custom access. If that’s not the case, consider creating new roles that will suit those customers’ needs.
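One way to start that map is with a short script rather than a spreadsheet. The sketch below assumes you can export role grants and client assignments from your API management system into plain dictionaries. The role names, client names, and the `custom` field are hypothetical.

```python
from collections import defaultdict

# Hypothetical export: which endpoints each role may call.
ROLE_ENDPOINTS = {
    "public": {"/users/{id}"},
    "partner": {"/users/{id}", "/users/{id}/friends"},
}

# Hypothetical export: each client's roles plus any one-off custom grants.
CLIENTS = {
    "app-1": {"roles": ["public"], "custom": set()},
    "app-2": {"roles": ["partner"], "custom": set()},
    "app-3": {"roles": ["public"], "custom": {"/users/{id}/friends"}},
}


def endpoint_access_map(role_endpoints: dict, clients: dict) -> dict:
    """Invert the grants into endpoint -> sorted list of clients that can reach it."""
    access = defaultdict(set)
    for client, grant in clients.items():
        endpoints = set(grant["custom"])
        for role in grant["roles"]:
            endpoints |= role_endpoints[role]
        for endpoint in endpoints:
            access[endpoint].add(client)
    return {endpoint: sorted(names) for endpoint, names in access.items()}


def clients_with_custom_access(clients: dict) -> list[str]:
    """Clients whose access is not fully explained by their roles."""
    return sorted(name for name, grant in clients.items() if grant["custom"])
```

Here `clients_with_custom_access(CLIENTS)` flags `app-3`, which is the kind of client that might be better served by a new role than by an individual exception.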
Next, look at the data in each of those endpoints. If you’re applying content filtering to limit what data is returned to a specific user or role, make sure you mark that down. In a well-designed RESTful API, your endpoints would return only the data they are responsible for. Any data related to other endpoints should only be accessible through those endpoints, referenced through a hyperlink, as in my user profile and friend list example above. It’s tempting to provide all of that data in a single response to cut down on the number of API requests, but it also opens the door to exposing more data than intended.
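A rough way to check this during an audit is to declare which fields each endpoint is responsible for and compare that against sample responses. The sketch below assumes JSON responses parsed into dictionaries. The `OWNED_FIELDS` table and the field names are hypothetical.

```python
# Hypothetical declaration of the top-level fields each endpoint owns.
OWNED_FIELDS = {
    "/users/{id}": {"username", "name", "links"},
    "/users/{id}/friends": {"friends", "links"},
}


def unexpected_fields(endpoint: str, sample_response: dict) -> list[str]:
    """Return top-level fields in a response that the endpoint does not own."""
    return sorted(set(sample_response) - OWNED_FIELDS[endpoint])


# A profile response that embeds the friends list instead of linking to it.
sample = {
    "username": "rzazueta",
    "name": "Example User",
    "friends": ["alice"],
    "links": {},
}
leaked = unexpected_fields("/users/{id}", sample)
```

Any field this reports is a candidate for moving behind its own endpoint, or at least for noting on the audit map along with whatever content filtering applies to it.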
If your API is designed to return more data in fewer calls, you should consider moving that logic from the core API code to a layer that calls on the core APIs to consolidate and respond to those requests as a separate function. This pattern, called “Backend for Frontend” or “BFF,” has been adopted by companies such as Netflix to make it easier to create APIs that serve specific client needs. BFFs allow for an extra level of access control, as they should be limited by the same access levels as the customers using them.
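As a simplified illustration of that layering (not how Netflix or anyone else actually implements it), the sketch below puts a BFF function on top of two stand-in core calls. The function names and permission strings are assumptions. The point is that the BFF passes the caller’s own permissions through to the core API instead of using elevated credentials.

```python
class Forbidden(Exception):
    """Raised by a core endpoint when the caller lacks its permission."""


# Hypothetical stand-ins for calls to the core API.
def core_get_profile(username: str, granted: set[str]) -> dict:
    if "users:read" not in granted:
        raise Forbidden("users:read")
    return {
        "username": username,
        "links": {"friends": f"/users/{username}/friends"},
    }


def core_get_friends(username: str, granted: set[str]) -> list[str]:
    if "users.friends:read" not in granted:
        raise Forbidden("users.friends:read")
    return ["alice", "bob"]


def bff_profile_screen(username: str, granted: set[str]) -> dict:
    """Consolidate two core calls for one client screen, under the caller's permissions."""
    screen = {"profile": core_get_profile(username, granted)}
    try:
        screen["friends"] = core_get_friends(username, granted)
    except Forbidden:
        # The caller is not entitled to the friends list, so leave it out.
        pass
    return screen
```

The client still gets its single consolidated response, but what that response contains is decided by the same endpoint-level checks as a direct call to the core API.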
Over the years, I’ve spoken with a number of companies that have hesitated to move forward with an API program for fear it could be a vector of attack for hackers. The Cambridge Analytica case would seem to confirm those fears, though perhaps not in the ways once imagined. The situation serves as a clear warning to API providers that data security must go beyond basic access controls and firewalls. Good API management systems can significantly improve the security of your APIs. Those designing the APIs, however, must keep in mind the potential unintended consequences of their design decisions.