Back-end Support
Author: Yi-Hsiang Lai (seanlatias@github)
HeteroCL supports multiple back ends. Currently, we support both CPU and
FPGA flows, and we will be extending support to other back ends, including
ASICs and PIMs (processing in memory). To target a different back end,
simply set the target when building the program. In this tutorial, we
demonstrate how to target different back ends in HeteroCL. The same program
and schedule are used throughout the entire tutorial.
import heterocl as hcl
import numpy as np

# 10x10 input; each output B[y][x] sums A[y][x] and A[y+2][x+2]
A = hcl.placeholder((10, 10), "A")

def kernel(A):
    return hcl.compute((8, 8), lambda y, x: A[y][x] + A[y+2][x+2], "B")

s = hcl.create_scheme(A, kernel)
s.downsize(kernel.B, hcl.UInt(4))  # store B as 4-bit unsigned integers
s = hcl.create_schedule_from_scheme(s)
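When switching back ends, it helps to have a golden model to compare results against. The sketch below is a plain-Python reference for the kernel above. It assumes that downsizing B to hcl.UInt(4) keeps only the low 4 bits of each sum, which is what the (ap_uint<4>) cast in the Vivado HLS listing later in this tutorial does. The helper name reference_kernel is hypothetical and not part of HeteroCL.

```python
def reference_kernel(A):
    """Hypothetical golden model: B[y][x] = A[y][x] + A[y+2][x+2], kept to 4 bits.

    A is a 10x10 nested list of non-negative ints; the result is an 8x8 nested list.
    Masking with 0xF keeps only the low 4 bits, mimicking an unsigned 4-bit B.
    """
    rows, cols = len(A) - 2, len(A[0]) - 2  # 10x10 input -> 8x8 output
    return [[(A[y][x] + A[y + 2][x + 2]) & 0xF for x in range(cols)]
            for y in range(rows)]


# Every row of A is 0..9, so B[y][x] = x + (x + 2) = 2x + 2 before truncation.
A = [list(range(10)) for _ in range(10)]
B = reference_kernel(A)
print(B[0])  # [2, 4, 6, 8, 10, 12, 14, 0]  (16 wraps to 0 in 4 bits)
```

To check an executable back end, one option is to convert the same input with hcl_A.asnumpy().tolist(), run the built function, and compare hcl_B.asnumpy().tolist() with the output of this model.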
CPU is the default back end of a HeteroCL program. If you want to be more
specific, set the target to "llvm". Note that some customization
primitives are ignored by the CPU back end; for instance, partition
has no effect. Instead, we can use parallel.
f = hcl.build(s)  # equivalent to hcl.build(s, target="llvm")
We can execute the returned function as we demonstrated in other tutorials.
hcl_A = hcl.asarray(np.random.randint(0, 10, A.shape))
hcl_B = hcl.asarray(np.zeros((8, 8)), dtype=hcl.UInt(4))
f(hcl_A, hcl_B)
For FPGA, we provide several back ends.
Vivado HLS C++ Code Generation
To generate Vivado HLS code, simply set the target to "vhls". Note that
the returned value is the generated code rather than an executable.
f = hcl.build(s, target="vhls")
print(f)
#include <ap_int.h>
#include <ap_fixed.h>
#include <hls_stream.h>
#include <math.h>
#include <stdint.h>
void default_function(ap_int<32> A[10*10], ap_uint<4> B[8*8]) {
  #pragma HLS array_partition variable=A complete dim=0
  ap_int<32> _top;
  for (ap_int<32> y = 0; y < 8; ++y) {
    for (ap_int<32> x = 0; x < 8; ++x) {
    #pragma HLS pipeline
      B[(x + (y * 8))] = ((ap_uint<4>)(((ap_int<33>)A[(x + (y * 10))]) + ((ap_int<33>)A[((x + (y * 10)) + 22)])));
    }
  }
}
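The generated code addresses A and B as flat, row-major buffers. The plain-Python sketch below, using a hypothetical flat_index helper, checks why A[y+2][x+2] appears as A[((x + (y * 10)) + 22)]: moving two rows down in a 10-column array and two columns right adds 2 * 10 + 2 = 22 to the flat index.

```python
def flat_index(y, x, width):
    """Row-major offset of element [y][x] in an array with `width` columns."""
    return x + y * width


WIDTH_A, WIDTH_B = 10, 8  # A is 10x10, B is 8x8

# Check every (y, x) that the generated loop nest visits.
for y in range(8):
    for x in range(8):
        # A[y+2][x+2] is 2 rows and 2 columns further along: +2*10 + 2 = +22
        assert flat_index(y + 2, x + 2, WIDTH_A) == flat_index(y, x, WIDTH_A) + 22
        # B[y][x] appears in the generated code as B[x + y*8]
        assert flat_index(y, x, WIDTH_B) == x + y * 8

# The last element read from A is A[9][9], the final slot of the 100-element buffer.
print(flat_index(7, 7, WIDTH_A) + 22)  # 99
```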
Vivado HLS C++ Code Simulation
HeteroCL lets users simulate the generated HLS code directly from the
Python interface. To use this feature, you need to have the Vivado HLS
header files in your g++ include path. If this is the case, we can set the
target to "vhls_csim", which returns an executable. We can then run it the
same way as we do for the CPU back end.
Note: The Vivado HLS tool itself is not invoked during the simulation; only
its header files need to be in the include path.
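import subprocess

# g++ prints its include search path to stderr; look for the Vivado HLS headers there
proc = subprocess.Popen(
    "g++ -E -Wp,-v -xc++ /dev/null",
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()
if "Vivado_HLS" in str(stderr):
    f = hcl.build(s, target="vhls_csim")
    f(hcl_A, hcl_B)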
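The check above searches all of g++'s stderr for the string Vivado_HLS. A slightly more explicit variant, sketched below, parses only the directory list that g++ -E -Wp,-v prints between "#include <...> search starts here:" and "End of search list.", and returns False when g++ is not installed. The helper names are hypothetical, and matching on "Vivado_HLS" in the path is the same assumption the original check makes.

```python
import shutil
import subprocess


def parse_include_dirs(gcc_stderr):
    """Extract the directories listed in g++'s '#include <...>' search list."""
    dirs, in_list = [], False
    for line in gcc_stderr.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list:
            dirs.append(line.strip())
    return dirs


def has_vivado_hls_headers():
    """Return True if some g++ system include directory looks like a Vivado HLS one."""
    if shutil.which("g++") is None:  # no compiler available, so no csim either
        return False
    result = subprocess.run(
        ["g++", "-E", "-Wp,-v", "-xc++", "/dev/null"],
        capture_output=True, text=True)
    return any("Vivado_HLS" in d for d in parse_include_dirs(result.stderr))
```

With this helper, the condition of the if statement above could become has_vivado_hls_headers(), with the same build and call inside.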
Intel HLS C++ Code Generation
HeteroCL can also generate Intel HLS code. However, due to certain
limitations, some directives cannot be generated. To generate the code, set
the target to "ihls".
f = hcl.build(s, target="ihls")
print(f)
#include <HLS/hls.h>
#include <HLS/ac_int.h>
#include <HLS/ac_fixed.h>
#include <HLS/ac_fixed_math.h>
#include <math.h>
component void default_function(ac_int<32, true>* A, ac_int<4, false>* B) {
  #pragma HLS array_partition variable=A complete dim=0
  ac_int<32, true> _top;
  for (ac_int<32, true> y = 0; y < 8; ++y) {
    #pragma ii 1
    for (ac_int<32, true> x = 0; x < 8; ++x) {
      B[(x + (y * 8))] = ((ac_int<4, false>)(((ac_int<33, true>)A[(x + (y * 10))]) + ((ac_int<33, true>)A[((x + (y * 10)) + 22)])));
    }
  }
}
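Both HLS listings widen the two 32-bit operands to 33 bits before adding (ap_int<33> and ac_int<33, true>), since the sum of two signed 32-bit values can need one extra bit, and then cast the sum down to an unsigned 4-bit B element. The plain-Python sketch below emulates that sequence with hypothetical helpers wrap_signed and wrap_unsigned, assuming that narrowing conversions wrap around in two's complement (the default truncating behavior of these arbitrary-precision types).

```python
def wrap_unsigned(value, bits):
    """Keep the low `bits` bits of value, read as an unsigned number."""
    return value & ((1 << bits) - 1)


def wrap_signed(value, bits):
    """Two's-complement reading of the low `bits` bits of value."""
    u = wrap_unsigned(value, bits)
    return u - (1 << bits) if u >= (1 << (bits - 1)) else u


INT32_MAX = 2**31 - 1

# Adding in 32 bits would overflow; 33 bits holds the exact sum.
assert wrap_signed(INT32_MAX + INT32_MAX, 32) == -2
assert wrap_signed(INT32_MAX + INT32_MAX, 33) == 2 * INT32_MAX

# The final cast to an unsigned 4-bit element keeps only the low 4 bits.
print(wrap_unsigned(wrap_signed(9 + 9, 33), 4))  # 2
```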
Merlin C Code Generation
HeteroCL can generate C code that can be used with the Merlin C compiler.
The generated Merlin C code has special support for several customization
primitives. For example, the unroll primitive implies fine-grained
parallelism, which unrolls all sub-loops. The parallel primitive implies
coarse-grained parallelism that generates a PE array. Finally, the
primitive implies a coarse-grained pipeline operation.
f = hcl.build(s, target="merlinc")
print(f)
#include <string.h>
#include <math.h>
#include <assert.h>
#pragma ACCEL kernel
void default_function(int* A, unsigned char* B) {
  int _top;
  for (int y = 0; y < 8; ++y) {
    #pragma ACCEL pipeline
    for (int x = 0; x < 8; ++x) {
      B[(x + (y * 8))] = ((unsigned char)(((long)A[(x + (y * 10))]) + ((long)A[((x + (y * 10)) + 22)])));
    }
  }
}
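Since the code-generation targets return source text rather than an executable, a natural next step is to write that text to a file that the downstream tool (here, the Merlin C compiler) can consume. The sketch below does only that, assuming the value returned for the merlinc target is a Python string as printed above; the helper name and the file path build/kernel.c are hypothetical, and invoking the Merlin compiler itself is not shown.

```python
from pathlib import Path


def save_generated_code(code, path):
    """Write generated source text to `path` and return the resolved Path."""
    if not isinstance(code, str):
        raise TypeError("expected generated code as a string, got %s" % type(code).__name__)
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    out.write_text(code)
    return out.resolve()


# Hypothetical usage with the generated Merlin C code:
#   code = hcl.build(s, target="merlinc")
#   save_generated_code(code, "build/kernel.c")
```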
SODA Stencil Code Generation
HeteroCL incorporates the SODA framework for efficient stencil architecture generation. For more details, please refer to Use the Stencil Backend.