Getting Started
Author: Yi-Hsiang Lai (seanlatias@github)
In this tutorial, we demonstrate the basic usage of HeteroCL.
Initialize the Environment
We need to initialize the environment for each HeteroCL application. We can
do this by calling the API hcl.init(). This API also lets us set the default
data type for every computation. The default data type is a 32-bit integer.
Note
For more information on the data types, please see Data Type Customization.
import heterocl as hcl

hcl.init()
Algorithm Definition
After we initialize, we define the algorithm by using a Python function definition, where the arguments are the input tensors. The function can optionally return tensors as outputs. In this example, the two inputs are a scalar a and a tensor A, and the output is also a tensor B. The main difference between a scalar and a tensor is that a scalar cannot be updated.
Within the algorithm definition, we use HeteroCL APIs to describe the
operations. In this example, we use a tensor-based declarative-style
operation, hcl.compute. We also show the equivalent Python code.
Note
For more information on the APIs, please see HeteroCL Compute APIs.
def simple_compute(a, A):
    B = hcl.compute(A.shape, lambda x, y: A[x, y] + a, "B")
    """
    The above API is equivalent to the following Python code.

    for x in range(0, 10):
        for y in range(0, 10):
            B[x, y] = A[x, y] + a
    """
    return B
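To make the element-wise semantics of hcl.compute concrete, the following plain-Python sketch evaluates a lambda at every index of a small 2-D shape. The helper reference_compute is hypothetical and is not part of HeteroCL. It only imitates the result of the loop nest shown in the docstring above, and it evaluates eagerly, whereas HeteroCL lowers the program to an IR.

```python
from itertools import product


def reference_compute(shape, fn):
    """Evaluate fn(x, y) at every index of a 2-D shape.

    Hypothetical helper for illustration only; it returns nested lists
    and is not how HeteroCL implements hcl.compute.
    """
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for x, y in product(range(rows), range(cols)):
        out[x][y] = fn(x, y)
    return out


# A small 2x3 stand-in for the tensor A and the scalar a.
a = 10
A = [[x * 3 + y for y in range(3)] for x in range(2)]  # [[0, 1, 2], [3, 4, 5]]

B = reference_compute((2, 3), lambda x, y: A[x][y] + a)
print(B)  # [[10, 11, 12], [13, 14, 15]]
```

The lambda plays the same role as in simple_compute: it receives one index per dimension and returns the value for that output element.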
Inputs/Outputs Definition
One of the advantages of such a modularized algorithm definition is that we
can reuse the defined function with different input settings. We use
hcl.placeholder to set the inputs, where we specify the shape, name,
and data type. The shape must be specified and should be in the form of a
tuple. If it is empty (i.e., ()), the returned object is a scalar.
Otherwise, the returned object is a tensor. The remaining two fields are
optional. In this example, we define a scalar input a and a
two-dimensional tensor input A.
Note
For more information on the interfaces, please see heterocl.placeholder.
a = hcl.placeholder((), "a")
A = hcl.placeholder((10, 10), "A")
Apply Hardware Customization
Usually, our next step is to apply various hardware customization techniques
to the application. In this tutorial, we skip this step, which is discussed
in later tutorials. However, we still need to build a default schedule
by using hcl.create_schedule, whose inputs are a list of inputs and
the Python function that defines the algorithm.
s = hcl.create_schedule([a, A], simple_compute)
Inspect the Intermediate Representation (IR)
A HeteroCL program will be lowered to an IR before backend code generation. HeteroCL provides an API for users to inspect the lowered IR. This could be helpful for debugging.
print(hcl.lower(s))
Out:
// attr [_top] storage_scope = "global"
allocate _top[int32 * 1]
produce _top {
  // attr [0] extern_scope = 0
  produce B {
    // attr [0] extern_scope = 0
    for (x, 0, 10) {
      for (y, 0, 10) {
        B[(y + (x*10))] = int32((int33(A[(y + (x*10))]) + int33(a)))
      }
    }
  }
}
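Two details of the lowered IR may be worth unpacking. The two-dimensional access A[x, y] becomes the row-major offset y + x*10, and the addition is carried out on int33 operands before the result is cast back to int32. The plain-Python sketch below imitates both. The helpers flat_index and to_int32 are hypothetical, and the wraparound behaviour of the cast (two's-complement truncation) is an assumption that this tutorial does not state.

```python
def flat_index(x, y, ncols=10):
    """Row-major offset of element (x, y) in a tensor with ncols columns."""
    return y + x * ncols


def to_int32(value):
    """Truncate an integer to 32 bits, two's complement (assumed cast behaviour)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# The 10x10 tensor maps onto offsets 0..99.
print(flat_index(0, 0))  # 0
print(flat_index(3, 4))  # 34
print(flat_index(9, 9))  # 99

# The extra bit holds the exact sum of two 32-bit values before the cast.
print(to_int32(91 + 10))               # 101
print(to_int32((2**31 - 1) + 10))      # -2147483639 under the wraparound assumption
```

For the small values used in this tutorial the cast changes nothing. It only matters when the exact sum falls outside the 32-bit range.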
Create the Executable
The next step is to build the executable by using hcl.build. You can
define the target of the executable; the default target is llvm, which
means the executable runs on the CPU. The input to this API is the
schedule we just created.
f = hcl.build(s)
Prepare the Inputs/Outputs for the Executable
To run the generated executable, we can feed it NumPy arrays by using
hcl.asarray. This API transforms a NumPy array into a HeteroCL container
that is used as an input or output of the executable. In this tutorial, we
randomly generate the values for our input tensor A. Note that since we
return a new tensor at the end of our algorithm, we also need to prepare
an array for the output tensor B and pass it to the executable.
import numpy as np
hcl_a = 10
np_A = np.random.randint(100, size = A.shape)
hcl_A = hcl.asarray(np_A)
hcl_B = hcl.asarray(np.zeros(A.shape))
Run the Executable
With the prepared inputs/outputs, we can finally feed them to our executable.
f(hcl_a, hcl_A, hcl_B)
View the Results
To view the results, we can transform the HeteroCL tensors back to NumPy
arrays by using asnumpy().
np_A = hcl_A.asnumpy()
np_B = hcl_B.asnumpy()
print(hcl_a)
print(np_A)
print(np_B)
Out:
10
[[18 9 4 4 91 23 15 50 6 25]
[72 3 67 20 35 45 27 60 52 44]
[ 3 99 37 87 24 52 16 92 9 10]
[84 67 36 68 49 54 29 28 22 1]
[22 52 45 0 25 23 68 3 28 17]
[14 69 71 64 94 70 98 55 33 30]
[15 97 99 41 47 95 18 45 8 64]
[65 74 99 6 31 4 64 21 1 22]
[13 19 9 23 93 74 59 34 18 83]
[45 78 89 48 98 31 70 12 69 14]]
[[ 28 19 14 14 101 33 25 60 16 35]
[ 82 13 77 30 45 55 37 70 62 54]
[ 13 109 47 97 34 62 26 102 19 20]
[ 94 77 46 78 59 64 39 38 32 11]
[ 32 62 55 10 35 33 78 13 38 27]
[ 24 79 81 74 104 80 108 65 43 40]
[ 25 107 109 51 57 105 28 55 18 74]
[ 75 84 109 16 41 14 74 31 11 32]
[ 23 29 19 33 103 84 69 44 28 93]
[ 55 88 99 58 108 41 80 22 79 24]]
Finally, we run a test to check that every element of B equals the corresponding element of A plus 10.
assert np.array_equal(np_B, np_A + 10)
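When such an assertion fails, it only reports that the arrays differ. As a debugging aid, the sketch below locates the first differing element. The helper first_mismatch is hypothetical and works on plain nested lists, so with the arrays from this tutorial one could pass np_B.tolist() and (np_A + 10).tolist().

```python
def first_mismatch(actual, expected):
    """Return (row, col, actual_value, expected_value) for the first
    differing element of two equally shaped 2-D nested lists, or None
    if every element matches. Hypothetical helper for illustration."""
    for i, (row_a, row_e) in enumerate(zip(actual, expected)):
        for j, (va, ve) in enumerate(zip(row_a, row_e)):
            if va != ve:
                return (i, j, va, ve)
    return None


# Small stand-in data taken from the top-left corner of the output above.
A_rows = [[18, 9], [72, 3]]
B_rows = [[28, 19], [82, 13]]
expected = [[v + 10 for v in row] for row in A_rows]

print(first_mismatch(B_rows, expected))            # None
print(first_mismatch([[28, 19], [82, 0]], expected))  # (1, 1, 0, 13)
```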