It is an exciting time to be in the world of software development, yet by the same token, it is a scary time. The pace of change seems to be ever-increasing, and things that were stable for a long, long time are experiencing revolutions. User interfaces and user interaction modalities are no exception and may be the areas that end up causing the most turmoil for development teams in the years ahead.
The question used to be whether an application was best suited to Windows, which required an installer, or whether it would benefit from an installer-free Web application format. We still get to make that decision, further complicated now by client technology choices, but there are also new dimensions that change the calculus far more than a few extra options would. A batch of technologies is pushing the possibilities of interface design to the next level.
If HTML5 jumped to mind, then you are on the tame side. While HTML5 does promise to bring media and game-like interfaces to the browser, it is not going to be nearly as disruptive as products like the Microsoft Surface and Kinect. They represent the wild side of things because they move away from keyboard and mouse as the primary means of input. These are not just for games, as we will see from the examples of where these technologies are already being leveraged. (If you have not heard of the Surface before, in its first version it is a coffee-table-sized and -shaped computer screen that lets users interact with it via as many as 20 discrete, simultaneous touch points.)
There is a strong temptation to dismiss these as amusing niche technologies with no bearing on the future of business applications, or even consumer applications beyond games. For another year or two, that is probably a low-cost strategy, but it assumes this is not the direction of things to come, and that is a mistake. The danger for developers and project managers is that there are risks in adopting these new technologies immediately and risks in not embracing them early enough. If you ignore them, the world can—and likely will—pass you by. If you embrace them but try to just wedge them into your old designs, things are not going to work. The trick is to understand where they fit and how they can be used effectively.
Reach out and touch
The first of the next-generation interfaces that everyone would agree has fully infiltrated society—and done so with seemingly blinding speed—is the touch interface on smartphones. It is easy to overlook this development because it can look like a niche, purpose-built feature rather than the start of a bigger change. All of the smartphones vying for dominance on the market use the touch interface made popular by Apple with the iPhone. Apple did not invent touch as an interface, of course, but it certainly brought it into the mainstream.
Now, flick and pinch are part of the general computing vocabulary. If that last sentence contains two words that have no meaning for you, and English is your primary language, then you must get up to speed with touch interfaces immediately. Step one is to get a touch-interface phone, most likely running an OS from Microsoft, Apple or Google, and then spend time exploring the touch interface and its jargon. Even if you already know what flick and pinch mean, the exploration is worthwhile.
The mistake is to assume that this revolution ends with phones. We are already seeing that tablets are participating in this touch revolution, and the coming Windows 8 has already sworn allegiance to supporting touch in a big way at its core. It is not just for games and it is not just for phones. If you have trouble imagining how this will spread to business apps or productivity applications in general, then join the club, but that is all the more reason to get ahead in understanding the potential. As you will see, these are some of the places where next-generation interfaces are already making inroads. Computers beyond tablets are the next step in the progression, and Microsoft’s Surface, which will soon be released as version 2, embodies where this interface can go.
If Microsoft appears to be playing catch-up in the next-generation interface space because of the early success of others in the phone market, then you might be surprised to learn that it is actually a pioneer. While Apple unveiled the iPhone in January 2007, Microsoft started work on the Surface in 2001.
I got the chance to talk to Bryan Coon, lead software engineer at InterKnowlogy, about the Surface and Kinect development. InterKnowlogy is one of the companies leading the charge to bring these technologies to business applications, and Coon has worked on a number of very interesting projects.
He said the Surface works well with “attract attention-type apps” because it grabs attention and supports multiple users all poking and prodding from every angle. In the computer world, the Surface produces something akin to the street-performer effect: people want to walk up and play with it. The Surface makes you want to touch it and interact with it. I have been around the device for a while now, and I admit that is still my first inclination when I see one.
The primary interactions with the Surface are based on touch, including flicks and drags, though keyboards are typically on-screen and appear as needed. Coon said the InterKnowlogy approach is to “envision groups working with it together.” Touch is demanding for a developer, but very satisfying to the user when done well.
The Windows 8 Metro interface is instructive in this regard in ways I had not considered prior to the Build 2011 conference last September. Microsoft has put a lot of thought into how touch changes the game beyond phones, and into the kinds of applications that work—and the kinds that do not—with touch. A key takeaway is that performance is critical in any touch-enabled system or application, because the interface must react fluidly and instantaneously. This requirement drove many of the decisions that shaped the iPhone and the subsequent iPad experience. The problem is that Apple was never transparent about why it disallowed multitasking and made other tradeoffs meant to ensure responsiveness.
With a touch interface, any freeze in the UI—even for a second—confounds the user and is a deal-breaker. This is why Windows 8 development makes such a priority of asynchronous tasks: they keep the user interface from blocking.
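To make that concrete, here is a minimal sketch of the async/await pattern that the Windows 8 tooling pushes so hard, shown here in a C#/XAML context. The control and method names (SearchBox, ResultsList, StatusText, QueryWarehouse) are my own illustrative inventions, not part of any shipped API.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Windows;

public partial class SearchPage : Window
{
    // Hypothetical touch handler; the named controls are assumed to be
    // defined in this page's XAML.
    private async void OnSearchTapped(object sender, RoutedEventArgs e)
    {
        StatusText.Text = "Searching...";

        // Run the slow query on a worker thread; 'await' hands control
        // back to the UI thread immediately, so touch input stays fluid.
        List<string> results = await Task.Run(() => QueryWarehouse(SearchBox.Text));

        // Execution resumes here on the UI thread, ready to bind the data.
        ResultsList.ItemsSource = results;
        StatusText.Text = results.Count + " matches";
    }

    private List<string> QueryWarehouse(string term)
    {
        Thread.Sleep(2000); // stand-in for a database or service call
        return new List<string> { "Bin A-12", "Bin C-03" };
    }
}
```

Had the query run synchronously in the handler, the screen would ignore fingers for two full seconds—precisely the freeze that kills a touch application.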
Building from the ground up
The development tools and infrastructure are catching up quickly to help us build touch-enabled applications. For example, software development tool vendor DevExpress has committed to touch support across its products, even including a touch-capable grid control.
One of the key challenges is not to force touch support into a project by shoehorning the functionality into existing interfaces, but to take advantage of the new capability to enrich the experience and make the application easier to use. InterKnowlogy has bundled some of its experience into a scatter control that enables multi-touch (provided, of course, the hardware supports it). Knowing what tools are available is an important part of being in a position to use these new interfaces when the opportunity arises.

A really great example of leveraging the strengths of the Surface is InterKnowlogy’s Warehouse Commander application, which takes data from Microsoft Dynamics and lets a shipping warehouse optimize bin placement for greater efficiency, all through a very visual, touch-enabled application. InterKnowlogy has posted a full demo of the program that offers a glimpse of a line-of-business application leveraging the strengths of the Surface quite well.
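For a sense of the plumbing a control like that sits on, here is a minimal sketch using WPF 4’s built-in manipulation events, which deliver the drag, pinch and rotate deltas that a scatter-style control builds upon. This is my own illustrative sketch, not InterKnowlogy’s code; PhotoWall and EnableTouch are assumed names.

```csharp
using System.Windows;
using System.Windows.Input;
using System.Windows.Media;

public partial class PhotoWall : Window
{
    // Call once per element (a photo, a card) that should respond to touch.
    private void EnableTouch(UIElement item)
    {
        item.IsManipulationEnabled = true;
        item.RenderTransform = new MatrixTransform();
        item.ManipulationStarting += (s, e) => e.ManipulationContainer = this;
        item.ManipulationDelta += OnManipulationDelta;
    }

    // WPF raises this continuously as fingers drag, pinch or rotate.
    private void OnManipulationDelta(object sender, ManipulationDeltaEventArgs e)
    {
        var element = (UIElement)e.Source;
        var transform = (MatrixTransform)element.RenderTransform;
        Matrix matrix = transform.Matrix;

        // Rotate and pinch-zoom around the centroid of the touch points,
        // then apply the drag translation.
        matrix.RotateAt(e.DeltaManipulation.Rotation,
                        e.ManipulationOrigin.X, e.ManipulationOrigin.Y);
        matrix.ScaleAt(e.DeltaManipulation.Scale.X, e.DeltaManipulation.Scale.Y,
                       e.ManipulationOrigin.X, e.ManipulationOrigin.Y);
        matrix.Translate(e.DeltaManipulation.Translation.X,
                         e.DeltaManipulation.Translation.Y);

        transform.Matrix = matrix;
        e.Handled = true;
    }
}
```

The point is that the framework now handles the multi-finger bookkeeping; the developer’s job shifts to deciding what those gestures should mean in the application.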
Touch interfaces are cool, but they have their limits. When you look beyond touch interfaces, things get a bit less defined. For example, swiping a finger across a display or interface-enabled surface is clearly touch, but what about putting an object on a tabletop device? This is a common element of the demos we see with future interfaces, and the Surface even has a tag system to support this kind of interaction. Then there are waves, pointing, and any number of other motions that are referred to as spatial gestures. In techie circles, we tend to call these “Minority Report”-style interfaces after the Steven Spielberg movie starring Tom Cruise that depicts characters using interfaces that are just one step short of “Star Trek’s” Holodeck.
The more correct term is NUI (natural user interface), and the Holodeck is the best example of where things could be heading, though probably not in my lifetime. The common theme whenever NUI is discussed is that the interface should be invisible, or at least non-intrusive. By that definition, the Kinect gets us most of the way there. The goggles-and-gloves rigs of years past technically fit the NUI category too, but they fail on the non-intrusive point. Microsoft gets it, but as always, it needed competitors to force it down the road.
In fact, right around the time Microsoft was prototyping the first Surface device, Spielberg worked with Microsoft to help understand futuristic interfaces while making “Minority Report.” As previously mentioned, this film is widely referenced as the poster child for NUI interfaces, with depictions of the protagonist plucking virtual objects in a virtual reality interface (i.e. performing spatial gestures to control the system). Spatial gestures in the Microsoft world are the province of the Kinect, which will be discussed in detail later in the article.
Neil Roodyn, director of nsquared, talks about a system he has worked on in which the user is immersed in a building and can manipulate fixtures, swapping out doorknobs and moving lamps and other furniture with gestures. The system he demonstrates supports touch through tablet integration, but it also goes beyond that to what he calls “vision systems.” By this, he means a system that understands the context of the environment as a person seeing it might—not only knowing that something is placed on a touch surface, but what was put there and by whom.
During a talk that Roodyn delivered, he used the Kinect as the provider of a rich vision system for accomplishing that environmental understanding, thanks to its cameras, infrared array and microphone array. During that same presentation, he showed a pretty amazing video that Corning produced over a year ago titled “A Day Made of Glass,” which depicts where the company imagines it could take its products in a way that leverages the touch interface. It is worth seeking out.
The Surface addresses a different market than the other technologies discussed here. At thousands of dollars, it is very expensive compared to the Kinect or even a touch-enabled tablet, but it was never meant to be a mass-market device. Conversely, the Kinect was never really meant to be a general-purpose input device.
Kinect, a game-changer
The Kinect is a whole different animal, and likely a major game-changer in much the same way the iPhone was for bringing touch mainstream. The barrier to entry for users is much lower, with a price point under US$200.
It was originally called Project Natal, and I first heard about it when someone pointed me to a video presenting it as the answer to the Nintendo Wii game system, one that raised the bar by taking the controller out of the picture entirely. It one-upped the Wii’s motion-based interface while removing the need for a controller of any kind, and it is turning out to surprise everyone, including Microsoft.
Nintendo certainly deserves some of the credit for putting us on this road, since Microsoft does its best innovation in response to a competitive threat, and the Wii controller and balance board set the stage. The Kinect was dubbed the “fastest-selling consumer electronics device” after selling 8 million units in its first 60 days.
It was not long before developers started to hack at the device to figure out how it worked and how it could be leveraged outside the Xbox 360. In fact, there are all kinds of videos and guides from about a year ago showing how to rig up the Kinect’s USB-like plug so it could be connected to a PC USB port and supplied with external power. This seemed to catch Microsoft by surprise, and since the hacking violated the terms of use, there was an expectation of a backlash from Microsoft.
Rather than clamp down on these early innovators, Microsoft adapted, developing a plan to create and release a Windows SDK to let developers put the Kinect to work in Windows applications. The Kinect for Windows SDK, along with resources such as tutorials, is available at www.kinectforwindows.org. The current bits support Windows 7 and the Windows 8 Developer Preview and are licensed for non-commercial purposes, with a commercial version promised in “early 2012.”
We will discuss what exactly the SDK provides shortly. Beyond the Kinect sensor bar itself, there is one small piece of hardware you will need if you want to start playing with the Kinect SDK, and it has to do with the sensor’s USB-like connector. You do not have to break out your soldering iron, because Microsoft and third-party provider Nyko have made adapters available that let you plug the Kinect into older Xbox 360s lacking the proper connector.

I suspect that the vast majority of people buying these today are using them to connect the Kinect to a PC for programming purposes rather than to older Xboxes. The adapters also supply power through a wall plug, since the Kinect’s connector carries power as well as data. At $25 to $35, that is a great deal, as it spares you from trying to build your own.
The Kinect for Windows SDK still has Microsoft Research’s fingerprints all over it. That is by no means a bad thing, since the most jaw-dropping, cool things from Microsoft typically start at Microsoft Research. The first thing you have to do when starting a project that leverages the Kinect is to add a reference to the Microsoft.Research.Kinect DLL. The SDK comes with samples that are a great help in getting started, including the Shape Game and the Skeletal Viewer.
The source code for these two samples will be invaluable for jump-starting your understanding of how to use the capabilities the SDK provides. The latest version of the Kinect for Windows SDK provides access to raw data streams from the depth sensor, the color camera and the four-element microphone array. These streams are the source of the ocean of data that InterKnowlogy’s Coon alluded to in our conversation.
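As a rough sketch of what getting at those streams looks like, the following follows the beta-era managed API (the Microsoft.Research.Kinect.Nui namespace); exact class and enum names shifted between SDK drops, so treat this as illustrative rather than definitive.

```csharp
using System;
using Microsoft.Research.Kinect.Nui; // beta-era namespace

class KinectStreams
{
    static void Main()
    {
        // Assumes one Kinect attached to the PC; later SDK drops reach
        // the device through a Runtime.Kinects collection instead.
        Runtime nui = new Runtime();
        nui.Initialize(RuntimeOptions.UseColor |
                       RuntimeOptions.UseDepthAndPlayerIndex |
                       RuntimeOptions.UseSkeletalTracking);

        // Open the raw streams described above: color and depth.
        nui.VideoStream.Open(ImageStreamType.Video, 2,
                             ImageResolution.Resolution640x480, ImageType.Color);
        nui.DepthStream.Open(ImageStreamType.Depth, 2,
                             ImageResolution.Resolution320x240,
                             ImageType.DepthAndPlayerIndex);

        // Frames arrive as events rather than by polling.
        nui.VideoFrameReady += (s, e) =>
            Console.WriteLine("Color frame: " + e.ImageFrame.Image.Width +
                              "x" + e.ImageFrame.Image.Height);
        nui.DepthFrameReady += (s, e) =>
            Console.WriteLine("Depth frame received");

        Console.WriteLine("Streaming; press Enter to quit.");
        Console.ReadLine();
        nui.Uninitialize();
    }
}
```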
One of the key functions is skeletal tracking, since whole-body tracking is a great way to provide control, and both of the sample programs make good use of it. Many of the improvements in the latest version are to the skeletal tracking system, including a boost in speed and accuracy. There is also now support for losing connectivity with the Kinect without losing everything (a problem in the past version).
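Skeletal data arrives through another event on the same runtime object. A minimal handler, again using the beta-era names, might look like this; wiring it up is a single line (nui.SkeletonFrameReady += OnSkeletonFrameReady) added to the previous sketch.

```csharp
using System;
using Microsoft.Research.Kinect.Nui;

class SkeletonDemo
{
    static void OnSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
    {
        foreach (SkeletonData skeleton in e.SkeletonFrame.Skeletons)
        {
            // The sensor reports slots for several bodies; only act on
            // the ones it is actively tracking.
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                continue;

            // Each tracked skeleton exposes a set of named joints with
            // X/Y/Z positions in meters relative to the sensor.
            Joint head = skeleton.Joints[JointID.Head];
            Joint hand = skeleton.Joints[JointID.HandRight];

            Console.WriteLine("Head: ({0:F2}, {1:F2}, {2:F2})",
                              head.Position.X, head.Position.Y, head.Position.Z);
            Console.WriteLine("Hand: ({0:F2}, {1:F2}, {2:F2})",
                              hand.Position.X, hand.Position.Y, hand.Position.Z);
        }
    }
}
```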
Design moves to the start
The ultimate question, as nsquared’s Roodyn put it, is “whether this is evolutionary or revolutionary.” I agree with his position that it is in fact revolutionary, which means you cannot just forklift these interfaces into your old applications. That is the true hazard: classic interfaces will remain viable for the foreseeable future, but a botched NUI will cause far more frustration for everyone involved.
When developing a system, the time to include these interface mechanisms is at the very start of the design phase. Every week a new video appears showing how people are leveraging the Kinect, and the results seem to fall into two categories: those that merely use voice or gesture to do what a mouse and keyboard could already do, and those that genuinely improve the user experience by exploiting the extra dimensions.
As a user, I know that I want the system to intuit as much as possible, but no more. I want it to respond to natural gestures rather than make me learn an abstract motion vocabulary. For example, if I am telling the system to keep feeding me data or to stop, the gestures used in the card game blackjack for “hit” and “stay” would be easy to grok.
A less thoughtful implementation might instead make the user raise a hand for more data and lower it to stop. With that scheme there is no natural basis for remembering the commands, and if I need a lot of data, I am likely to get tired of holding my hand up. Either way, this is the kind of thoughtfulness developers and designers need: its absence is annoying in a GUI application, but disastrous in these next-generation interfaces.
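To see the tradeoff in code, here is a toy version of that held-pose design, built on the skeletal data from the earlier sketch. The 0.10-meter margin and the method name are my own assumptions; note that the pose is trivial to detect but tiring to hold, which is exactly why motion-based gestures tend to make a better vocabulary.

```csharp
using Microsoft.Research.Kinect.Nui;

static class GestureGuesses
{
    // Naive check: "hand raised above the head" means keep the data coming.
    // Easy to implement, hard on the user's arm.
    public static bool WantsMoreData(SkeletonData skeleton)
    {
        Joint head = skeleton.Joints[JointID.Head];
        Joint hand = skeleton.Joints[JointID.HandRight];

        // Positions are in meters; require the hand clearly above the head.
        return hand.Position.Y > head.Position.Y + 0.10f;
    }
}
```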
Looking at how the Kinect is being used, we find some very innovative applications. For the medical field in general, the Kinect is huge. Some implementations are based on the Xbox 360, such as at the Royal Berkshire Hospital in England, which is using off-the-shelf, Kinect-enabled games to work on patients’ balance, coordination and movement in general.
For a more custom implementation aimed at rehabilitation, InterKnowlogy pairs a PC with the Kinect and bases its code on the Kinect for Windows SDK, which is the most interesting way to go, in my opinion. Coon mentioned an application he affectionately called the “rehab-o-matic,” which helps people recover from surgery.
After surgery, there are a number of checkups to see how the patient is progressing. The idea is to use the Kinect to let the patient rehab at home while measuring the progress, which is easier on the patient and saves money. The system can even report back to the doctor on how the patient is doing. InterKnowlogy has posted a video, worth checking out, that shows how it works and discusses some of the development challenges.
One of the coolest implementations is by Tedesys in Spain: it lets doctors in surgery manipulate scans without touching anything that is not sterile (or anything at all, for that matter). I think it is safe to say that the medical field will be a leader in leveraging the Kinect over the next couple of years.
Think before you move
There are so many examples of applying the Kinect. I asked Coon if he found that the Kinect is only for projects that cannot be done some other way. He said, “Certain things it does very well, and some others not so well. A common mistake is to try to drive an old-style app with Kinect without rethinking it. The Kinect is not just a new mouse.”
Xbox 360 games have taken many different approaches to adapting to the Kinect. Dance Central did it the right way, according to Coon, by paring back the on-screen controls to something closer to a wizard style, which reduces distractions and lets users navigate obvious paths. Immediate and obvious feedback is critical in a movement-based interface. One way to think about it is as a return to the old text-based games, where the user makes one of a few choices and, based on that choice, is offered two to four new ones.
There is no expectation that you will drop what you are doing and start your current project over again with these new technologies. However, it is a good time to pay attention and look at where they might fit in your arsenal of tools. Keeping up with new development tools is par for the course, and you should treat these next-generation interfaces in the same way.
Opportunity favors the prepared mind, and if you are anything like me, you will need a good amount of time playing with these things, especially the Kinect, before you can leverage them in the right way for a real project.