Like rabbits left to their own devices, processing cores are breeding. But for all the proliferation of cores on desktops, in data centers and inside handheld devices, the fundamental problems for developers who have to deal with all those cores haven’t changed much.
For years, tools and frameworks have offered solutions to multi-threading, concurrency and parallelism. But unlike some other programming problems, where clear winners have already emerged, the multicore problem remains one better solved through knowledge, skill and experience than through dollars and prepackaged bits. That’s not to say there haven’t been some new developments in development, however.
Due to the complexity of the problems associated with multicore development, many vendors have had to be quite clever in how they offer their solutions. While there is no substitute for knowledge and experience, some companies, such as Microsoft, have found ways to build multicore support into their tools for developers who might not be so skilled in multicore development.
Brandon Bray, principal group program manager for .NET at Microsoft, said there are three types of programmers for whom Microsoft has addressed multicore solutions. The first type, which Bray said is quite small, knows how to develop multi-threaded applications and doesn’t need much help. The second is similar, but wants concurrency to be easier. The third type is the largest, made up of developers who want nothing to do with multicore, multi-threading or concurrency.
For each of these developer types, Microsoft has managed to craft a way to make multicore development a little bit easier. For the first two, .NET 4.5’s release in September included a new background garbage collector, which improves application performance by reducing garbage-collection pauses in the runtime.
“Background garbage collection does all the looking through memory while the program is running,” said Bray. “It’s making use of extra cores, and it’s not something we could have done 10 years ago because most computers only had one processor in them. It’s a default feature when people upgrade to .NET 4.5.”
.NET also includes multicore JIT, which uses data from previous runs of an application to optimize JIT compilation, improving startup time for long-running applications. It draws on extra cores automatically to do the compilation, giving developers faster application startup without any extra work.
While these two solutions are targeted at more experienced programmers, Bray said that developers uninterested in multicore can rely on .NET to offer them a faster way to do I/O that takes advantage of multicore on their behalf: Async and Await.
“Async and Await are used with C#. The idea is that most developers really think about the code running on one processor,” said Bray. “If I write a loop, I understand how that works. They think of it synchronously, but most of the time when you’re doing network calls or file I/O, that I/O time is wasted time where another processor could do the work. Most of us experience that as the user interface being paused. That UI pausing is an opportunity for concurrency.
“The more I can use multicore in these cases, the better. It means I can keep the UI responsive. How do we get developers to fall into this pattern? The libraries you call that involve I/O, like calling a network or a REST API, you’d like that to be put into the background on another processor, and when the task is finished, it executes on the thread pool and the result is just passed back. I never have to pause the UI. This pattern we’ve built into the compiler, and it tells you where you need to write these keywords. If you call this API, it will force you to use this feature. Developers that don’t care are the ones that cause the pausing all the time. They still don’t have to care anymore; often, they don’t know they’re doing some of this concurrency.”
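Async and Await are C# keywords, but the pattern Bray describes is language-agnostic: hand blocking I/O to another thread, and run a continuation when the result arrives so the calling thread never pauses. A rough Java sketch of that shape, using `CompletableFuture` (the `fetchQuote` method and its simulated latency are invented for illustration, not part of any API discussed here):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncSketch {
    // Simulated blocking I/O call -- stands in for a network or REST request.
    static String fetchQuote() {
        try {
            Thread.sleep(100); // pretend latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "quote";
    }

    public static void main(String[] args) {
        // The "UI thread" hands the I/O off to a background pool and
        // registers a continuation, instead of blocking until it completes.
        CompletableFuture<String> pending =
                CompletableFuture.supplyAsync(AsyncSketch::fetchQuote)
                                 .thenApply(q -> "got: " + q);

        System.out.println("UI stays responsive while I/O runs");
        System.out.println(pending.join()); // block only for this demo
    }
}
```

The difference from C# is that the compiler does not force this pattern on the caller; the continuation must be wired up by hand.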
The trouble with Java
Jason van Zyl is in a unique position to observe how Java developers are approaching multicore development in Java. As CTO and founder of Sonatype, and creator of Maven, he has a front-row seat at the world’s largest Java repository, Maven Central.
And yet, van Zyl remains convinced that multicore, concurrent and parallel programming aren’t being made any easier by frameworks and tools in the Java space.
“I don’t think it’s ever going to change, in that you need to find pretty smart people who can do the concurrency code,” he said. “Any syntax or changes in the language, like Scala, you need to be extremely smart to use it, and it’s just too hard for some people. In the core parts of your organization, you’re never going to get around the need for people who have a lot of experience with it and love it.”
But van Zyl points to one area where a new approach to concurrent programming has been brewing for the past few years: languages as the solution. Specifically, developers have been considering moves to functional languages like Clojure, Erlang, Haskell and Scala.
“There are some advantages to different languages like Erlang,” said van Zyl. “But you always need people who know how to use them. I am not sure you could make a language that’s expressible to every developer. I think Scala needs to get their binary compatibility story a little better, but I hear lots of people like it.”
And that is the central problem with choosing a functional language as your solution to the concurrency problem: It’s just as hard to train or hire functional programmers as it is to train or hire multi-threading and concurrency experts.
Of course, Intel has some incredibly talented concurrency and multicore people, and the company has long made its work available to the public through its Threading Building Blocks products. This year, however, Java has joined the party.
For many developers, the Threading Building Blocks and Parallel Studio XE tools have been lifesavers for concurrency and multicore development. In September, Intel updated its multicore tools with new capabilities.
James Reinders, director of marketing and chief evangelist for Intel Software, said that, traditionally, Parallel Studio tools have focused on Fortran and C++, but this year one new language has made it into the tool chain.
“Java, to us, is something that shows up in applications. It’s mixed in,” he said. “Users want us to be able to tell them a little about that, our tools that do some of the debugging and find memory leak errors, and our performance tools have been extended to use Java. The Java runtimes today have hooks in them for performance tools to get info back. We can tell the computer is running something in the Java runtime. The users want to see which Java application was doing what. We’re able to do that now. We’ve had some Java support in the past, but it was always limited to one JVM.”
Now, Reinders added, Intel’s Parallel Studio 2013 can attach to multiple JVMs running on the same system. That gives developers more flexibility to find problems that may exist across a Java application ecosystem.
But Parallel Studio XE 2013, and the new HPC-focused Intel Cluster Studio 2013, both offer new capabilities for standard C++ and Fortran applications as well. To begin with, both software suites support C++11 and Fortran 2008, though there are still areas where said support is being filled in.
“No one has implemented C++11 and Fortran 2008 completely,” said Reinders. “We’re working very hard to implement features. We’re implementing them in order of customer feedback. We’ve made great strides, and we have most of C++11 and most of Fortran 2008 done, but we aren’t ready to say everything is done.”
While traditional languages get attention from Intel, some developers have already taken the drastic step of moving to a functional language in order to gain concurrency. Typesafe, the company behind Scala and the Akka concurrency framework, understands that functional programming is a big mental shift, and it has been working to find ways to ease the transition.
Mark Brewer, president and CEO of Typesafe, said that Scala is a big jump from Java, but that the benefits are worth the move. “If you look at the masses of Java developers, they’re looking for something cool, something new, to write an application that can’t get done today. We’ll continue to see people make the adoption of Scala because they need the functional performance capabilities, but that’s not the only way they’ll adopt Scala,” he said.
They could also come to Scala because of Akka, a concurrency framework similar in scope to Erlang’s OTP. While concurrency and parallelism are the end goals for many development teams, the Akka framework also adds fault tolerance to the mix.
“Where we’re seeing it being picked up is in embedded: tablets, cars, phones,” said Brewer of recent uptake of Akka. “One of the case studies we want to get out there is a vendor who’s taken Akka and uses it to run streaming video on your DVR. Akka is sitting there as the service to monitor what you’re watching. If you’re not connected directly to the server, it’ll go to your neighbor’s box and figure out if they have it downloaded already, and your DVR will bring it in from there. It’s also a way to collect local metrics on people who are watching local TV shows.”
And this is the promise of ubiquitous concurrency: When every machine on your network can take a self-contained workload off the central stack, perform the work, then pop the information back into the stack, a host of new possibilities arise. As Brewer said, with a concurrency framework running on the end devices, processing can take place on those multicore processors that now live in mobile devices, sensors and embedded machines.
“The Dutch Border Patrol, as you’re driving up, takes your picture with sensors,” said Brewer, describing another use case for Scala and Akka. “All those sensors are sitting on a small piece of code in Scala on Akka, quickly processing your car tag number. By the time you hit the border, there is a guard there, and all of that processing happens in real time. They chose it because it was very lightweight. Akka is running on each of those sensors.”
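Akka’s own APIs aside, the actor model it implements can be sketched in plain Java: an actor owns a private mailbox drained by exactly one thread, so its internal state needs no locks. This toy counter (all names invented; real Akka adds supervision, fault tolerance and shared dispatchers on top of this idea) shows the shape:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy actor: one thread drains the mailbox, so the counter state
// is touched by a single thread and requires no synchronization.
public class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0;            // owned exclusively by the actor thread
    private volatile int result = -1; // published when the actor stops

    public void tell(String msg) { mailbox.add(msg); }

    @Override public void run() {
        try {
            String msg;
            while (!(msg = mailbox.take()).equals("stop")) {
                if (msg.equals("inc")) count++; // no lock needed
            }
            result = count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public int result() { return result; }

    public static void main(String[] args) throws Exception {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor);
        t.start();
        // Many senders, one receiver: senders only enqueue messages and
        // never touch the actor's state directly.
        for (int i = 0; i < 1000; i++) actor.tell("inc");
        actor.tell("stop");
        t.join();
        System.out.println(actor.result()); // 1000
    }
}
```

Messages in, replies out, no shared mutable state: that is the property that lets an actor run anywhere, including on the embedded devices Brewer describes.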
Same old story
Tobias Lindaaker is the concurrency expert at graph database company Neo Technology. He said that he’s been speaking on concurrency for years, but has seen little success in imparting the wisdom he’s gathered through conference talks alone. The problem, he said, is that concurrency is simply too complicated to abstract or simplify.
“The problem is getting much harder,” he said. “Mainly, people aren’t using the tools. I’ve heard the Intel tools are really good, but they’re expensive. I try to do a few talks every year where I get people to understand how concurrency works, but so far I don’t think I’ve managed to be successful in that. Part of it is that it’s easy to ignore. Developers write single-threaded programs, and they kind of work. There’s a huge number of developers who don’t have a need to care about concurrency.
“For those kinds of people, they’re working on frameworks for making it easier to make sure you can still write your single-threaded application code, and our freeware code will take care of it for you. You have many of those single-threaded worlds that only go so far. At some point you’re going to have shared state. You create lots of data from reading out of a database, and you need multiple copies of the same data, one for each thread. They’re pretty much immutable, which is a waste of space, but, even worse, you go to the database to do the same query over and over and over again, when you could have consolidated that.
“Lots of things you’re dealing with in concurrency are about state. If state doesn’t change, then it’s safe to do multi-threads. The problem is when state changes. What I’ve been trying to do recently is to work around blocking calls. It’s not so much about state, but more around flow: saying that you don’t get to decide when this code executes, I am going to decide, and when it executes, I am going to make sure what you can do is safe. It’s very close to the functional way of thinking, just slightly broader.”
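Lindaaker’s point about unchanging state can be made concrete in Java: if an object’s fields are final and its collections unmodifiable, any number of threads can read one shared instance with no locks and no per-thread copies. A small sketch under those assumptions (`PriceTable` and its data are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Immutable shared state: the field is final and the list unmodifiable,
// so concurrent readers need no locks and no duplicate copies.
public class PriceTable {
    private final List<Integer> prices;

    public PriceTable(List<Integer> prices) {
        this.prices = List.copyOf(prices); // defensive, unmodifiable copy
    }

    public int total() {
        int sum = 0;
        for (int p : prices) sum += p;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        PriceTable shared = new PriceTable(List.of(1, 2, 3, 4));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Callable<Integer> read = shared::total; // same instance for all threads
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 4; i++) results.add(pool.submit(read));
        for (Future<Integer> f : results) System.out.println(f.get()); // 10 each
        pool.shutdown();
    }
}
```

The hard part Lindaaker describes begins only once such state has to change.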
For Java developers, that functional way of thinking may soon be much more relevant. With Lambdas coming to OpenJDK 8, Java developers will have at least some functional language capabilities at their disposal.
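A minimal sketch of what that style looks like with the Stream API that accompanied lambdas in Java 8 (the sum-of-squares task is arbitrary):

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        // A lambda passed to a parallel stream: the runtime splits the
        // range across cores via the common fork/join pool.
        long sumOfSquares = LongStream.rangeClosed(1, 1_000)
                                      .parallel()
                                      .map(n -> n * n)
                                      .sum();
        System.out.println(sumOfSquares); // 333833500
    }
}
```

The developer declares what to compute; deciding how to spread the work across cores is left to the library.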
“The things going on with parallel collections and Lambdas, those will make it easier for the people who already know what they’re doing,” said Lindaaker. “They won’t be as appalled by Java as they are today. I’m not so sure that will reach the broader masses.”
And that’s because at the core of the multicore problem lies the difficulty of extracting state, coordinating threads and consolidating memory, all without threads crashing into each other over shared information. It’s a high-stakes, extremely complicated juggling act that can’t be shoehorned into a development process.
But there is at least some advice Lindaaker can offer. His secret to writing better concurrent code in Java? “Mainly, read source. Since I develop a lot of Java, I read the implementations of the JDK. I’ve read the OpenJDK source a number of times, so I know what’s going on and how my use of the code will relate to what happens on the CPU. Read the implementations of the concurrency library in Java. Also read the compiler source and look at the method assembly code.”
What’s new in CUDA 5
Nvidia updated its GPU compute SDK in October, and with that update came a number of new capabilities and tools. You’ve already read about the new, simpler methods for invoking dynamic parallelism, so here’s a look at the other big changes in this release.
Perhaps the change most related to dynamic parallelism is the new GPU-callable libraries. Until CUDA 5, libraries of GPU code had to be wrapped in CPU-understandable code. This also meant all library calls required the CPU and GPU to talk to one another. With GPU-callable libraries, the CPU isn’t needed to pull in routines and primitives at runtime, meaning less cross-chatter, thus better performance overall.
Another related change is the newly added GPU support for Remote Direct Memory Access (RDMA). This means GPUs can now communicate directly with the onboard network card, and use it to check out information held in RAM on another computer in the cluster, thus removing the CPU and its bus, cache, and memory from the RDMA process entirely.
Additionally, the 5.0 release of the CUDA SDK now includes an Eclipse-based development platform. This is the first time Nvidia has offered an Eclipse-based tool for CUDA developers, and the company said this version is based on existing Visual Studio plug-ins.