Heterogeneous Processing Requires Data Parallelization: SYCL and DPC++ Are a Good Start


James Reinders

James is an engineer at Intel with more than three decades of experience in parallel computing, and he is an author, co-author, or editor of ten technical books related to parallel programming. At Intel, he promotes parallel programming in a heterogeneous (XPU) world.

Here is his article published on The New Stack.

I like to say that “It’s all about XPUs.”

We are in a wonderful time when hardware innovation is leading to an explosion in CPUs, GPUs, FPGAs, DSPs, ASICs and more, which I simply abbreviate as XPUs. XPU is shorthand for any type of “processing unit”: any hardware that can help my application compute.

For developers, the onslaught of XPUs means that we are increasingly challenged to write code for a larger collection of diverse processing units. We must budget extra time and money to rewrite and test code so that applications perform well on each new architecture.

More than ever, to preserve our sanity and the maintainability of our code, it is paramount that the code we write applies to as many XPUs as possible. Organizations that have moved to cross-architecture models for application development have found that doing so can save significant time and money, and that concern grows more pressing as heterogeneous computing rises in popularity.

A rethinking is underway today, because our world is rapidly becoming a world of XPUs, a shift that will eventually transform all of computing.

XPUs: Reinventing Software for Accelerated Compute

  • CUDA, a widely used proprietary software programming system, was designed for, and is effective on, NVIDIA GPUs. OpenCL took an open approach and achieved a certain level of multivendor support.
  • OpenCL had its own shortcomings, most notably being C-centric and failing to address C++ needs well.

CUDA and OpenCL have served their purposes well. Going forward, however, developers need a truly open, multivendor approach to deliver on the promise of XPUs.

Why SYCL and Data Parallel C++ (DPC++) Offer the Best Path Forward

The learnings from both CUDA and OpenCL set the stage for the emergence of a truly popular and open solution for data parallelism based on C++ for heterogeneous systems.

That solution is SYCL, which is a higher-level programming model to improve programming productivity on multiple hardware accelerators. It has quickly gained broad multivendor support, widespread interest and the support of multiple serious compiler projects.

SYCL is important because effective programming in our increasingly heterogeneous world requires performant access to all XPUs, and only a truly open approach can provide that.

SYCL is an open standard for single-source C++ data-parallel programming of heterogeneous hardware, or XPUs. SYCL allows single-source compilation in C++ to target multiple devices on a system, rather than using C++ for the host and domain-specific kernel language(s) for the device(s).

SYCL brings to C++ both kernel-style programming and a mechanism to locate, query, and use accelerators in a system.

Kernel-based programming is an important programming style for harnessing data parallelism that was also supported in OpenCL and CUDA. An ability to enumerate and access accelerators, in a standard way, was previously introduced by OpenCL.

Also take a look at DPC++ (Data Parallel C++), which provides an open implementation to the LLVM community, with ambitions to upstream everything into LLVM C++ compilers. DPC++ aims to implement SYCL with some extensions.

DPC++ pioneered many features that are now in SYCL 2020, and therefore had a head start in implementing much of SYCL 2020 even before the ink was dry on the standard.

Work remains to complete alignment with the entire SYCL 2020 specification; all of that work is easy to follow in the very active open-source repository.

DPC++ is used by Intel to target Intel® CPUs, GPUs, and FPGAs.

DPC++ is also used by Codeplay to target NVIDIA GPUs.

Another SYCL compiler, hipSYCL, supports AMD CPUs and GPUs by connecting with AMD’s HIP/ROCm.

Having multiple open-source compilers for SYCL is fantastic for the community, and it demonstrates that SYCL has broad, diverse, and open support.

Over the course of 2019 and 2020, I worked with a small, dedicated team to create the first book about SYCL and DPC++. You can download a free copy from the Apress website.

Shortly after its publication, the Khronos Group announced the finalized specification for SYCL 2020.

The recent ratification of the SYCL 2020 specification is a significant milestone. It is a truly open specification with a bright future ahead, the product of years of development by many dedicated individuals from across the industry. Based on C++17, SYCL 2020 makes it easier to accelerate standard C++ applications and drives closer alignment with the ISO C++ roadmap.

In its SYCL 2020 announcement, the Khronos Group highlighted a number of features, including support for Unified Shared Memory (USM), built-in reductions, extensive use of class template argument deduction (CTAD), and atomic operations that align with standard C++ atomics.

XPUs Are the Future, Let’s Keep It Open for the Benefits of XPU Diversity and Programming Sanity

SYCL and DPC++ will help us make effective use of XPUs. They are part of a broader push for support of XPUs that extends into libraries and all software development tools, building on the ambitions of SYCL and its compilers.

That is the origin of the oneAPI industry initiative, which I’m really passionate about and was excited to be a part of when I rejoined Intel.

Support for this broader goal, easing the challenges of using all XPUs openly, is driving interest in SYCL and oneAPI.

A solid example is the oneAPI Deep Neural Network Library (oneDNN). Initially highly optimized for Intel processors, it is now used to accelerate the world’s fastest computer, which is built on ARM processors.

As a result, oneDNN now has strong ARM support, too. The openness of SYCL and of the oneAPI libraries and tools is helping usher in a new era of openness and performance, giving us useful programming access to all XPUs.

Together, the software developer community has an opportunity to create standards, including SYCL, that serve the whole industry and strongly support the adoption of heterogeneous programming (XPUs) and of modern C++ as it embraces parallelism.

SYCL offers an open standard with broad support, ample opportunity to participate, multiple open-source implementations, and seemingly infinite possibilities.

DPC++ provides an open LLVM-based compiler to reduce the effort to support SYCL and encourage strong compatibility across XPUs.

oneAPI offers a forum to discuss and drive open and performant access for XPUs into all aspects of software development.

I hope you’ll take the opportunity to get educated about SYCL, DPC++ and oneAPI because XPUs are the future of compute.

We should shape support for XPUs together, in the open, and enjoy the benefits of the enormous diversity in XPUs available for us to program effectively.