.. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_tutorials_tutorial_02_imperative.py: Imperative Programming ====================== **Author**: Yi-Hsiang Lai (seanlatias@github) There exist many applications that cannot be described using only vectorized code such as `hcl.compute`. Thus, we introduce imperative programming in HeteroCL, which makes HeteroCL applications more expressive. In this tutorial, we will implement *insertion sort* in HeteroCL. .. code-block:: default import heterocl as hcl hcl.init() A = hcl.placeholder((10,), "A") Stages in HeteroCL ------------------ In HeteroCL, when users write an application, they are actually building a compute graph. Each node in a graph is a *stage*. Each edge is directed, which represents the data flow between two stages. Some HeteroCL APIs naturally form a stage, such as ``hcl.compute``. Since the imperative code we are going to write cannot be described using a HeteroCL API, we need to wrap it as a stage explicitly via ``hcl.Stage``. Users can specify the name of a stage, which is optional. Note that **a HeteroCL application must have at least one stage**. .. code-block:: default def insertion_sort(A): # Introduce a stage. with hcl.Stage("S"): # for i in range(1, A.shape[0]) # We can name the axis with hcl.for_(1, A.shape[0], name="i") as i: key = hcl.scalar(A[i], "key") j = hcl.scalar(i-1, "j") # while(j >= 0 && key < A[j]) with hcl.while_(hcl.and_(j >= 0, key < A[j])): A[j+1] = A[j] j.v -= 1 A[j+1] = key.v Imperative DSL -------------- To write imperative code in HeteroCL, we need to use a subset of HeteroCL DSL, which is *imperative DSL*. HeteroCL's imperative DSL supports a subset of Python's control flow statements, including conditional statements and control flows. In the above code, we show how we can use ``hcl.for_`` to write a `for` loop and ``hcl.while_`` to write a `while` loop. Moreover, we use ``hcl.and_`` for logical expressions. Here we also introduce a new API, which is ``hcl.scalar``. It is equivalent to ``hcl.compute((1,))`` Namely, it declares a tensor with exactly one element, which can be treated as a **stateful scalar**. Following we show the execution results of the implemented sorting algorithm. .. note:: Currently we support the following imperative DSLs. Logic operations: :obj:`heterocl.and_`, :obj:`heterocl.or_`. Control flow statements: :obj:`heterocl.if_`, :obj:`heterocl.else_`, :obj:`heterocl.elif_`, :obj:`heterocl.for_`, :obj:`heterocl.while_`, :obj:`heterocl.break_`. .. code-block:: default s = hcl.create_schedule([A], insertion_sort) We can inspect the generated IR. .. code-block:: default print(hcl.lower(s)) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none // attr [_top] storage_scope = "global" allocate _top[int32 * 1] produce _top { // attr [0] extern_scope = 0 // attr [S] storage_scope = "global" allocate S[int32 * 1] produce S { // attr [0] extern_scope = 0 for (i, 0, 9) { // attr [key] storage_scope = "global" allocate key[int32 * 1] produce key { // attr [0] extern_scope = 0 key[0] = A[(i + 1)] } // attr [j] storage_scope = "global" allocate j[int32 * 1] produce j { // attr [0] extern_scope = 0 j[0] = i } while (((0 <= j[0]) && (key[0] < A[j[0]]))) { A[(j[0] + 1)] = A[j[0]] j[0] = (j[0] + -1) } A[(j[0] + 1)] = key[0] } } } Finally, we build the executable and feed it with Numpy arrays. .. code-block:: default f = hcl.build(s) import numpy as np hcl_A = hcl.asarray(np.random.randint(50, size=(10,))) print('Before sorting:') print(hcl_A) f(hcl_A) print('After sorting:') np_A = hcl_A.asnumpy() print(np_A) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Before sorting: [49 13 12 1 19 18 0 20 7 16] After sorting: [ 0 1 7 12 13 16 18 19 20 49] Let's run some tests for verification. .. code-block:: default for i in range(1, 10): assert np_A[i] >= np_A[i-1] Bit Operations -------------- HeteroCL also support bit operations including setting/getting a bit/slice from a number. This is useful for integer and fixed-point operations. Following we show some basic examples. .. code-block:: default hcl.init() A = hcl.placeholder((10,), "A") def kernel(A): # get the LSB of A B = hcl.compute(A.shape, lambda x: A[x][0], "B") # get the lower 4-bit of A C = hcl.compute(A.shape, lambda x: A[x][4:0], "C") return B, C Note that for the slicing operations, we follow the convention of Python, which is **left exclusive and right inclusive**. Now we can test the results. .. code-block:: default s = hcl.create_schedule(A, kernel) f = hcl.build(s) np_A = np.random.randint(0, 100, A.shape) hcl_A = hcl.asarray(np_A) hcl_B = hcl.asarray(np.zeros(A.shape)) hcl_C = hcl.asarray(np.zeros(A.shape)) f(hcl_A, hcl_B, hcl_C) print("Input array:") print(hcl_A) print("Least-significant bit:") print(hcl_B) print("Lower four bits:") print(hcl_C) # a simple test np_B = hcl_B.asnumpy() np_C = hcl_C.asnumpy() for i in range(0, 10): assert np_B[i] == np_A[i] % 2 assert np_C[i] == np_A[i] % 16 .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Input array: [74 27 36 28 15 68 77 69 61 64] Least-significant bit: [0 1 0 0 1 0 1 1 1 0] Lower four bits: [10 11 4 12 15 4 13 5 13 0] The operations for bit/slice setting is similar. The only difference is that we need to use imperative DSL. Following is an example. .. code-block:: default hcl.init() A = hcl.placeholder((10,), "A") B = hcl.placeholder((10,), "B") C = hcl.placeholder((10,), "C") def kernel(A, B, C): with hcl.Stage("S"): with hcl.for_(0, 10) as i: # set the LSB of B to be the same as A B[i][0] = A[i][0] # set the lower 4-bit of C C[i][4:0] = A[i] s = hcl.create_schedule([A, B, C], kernel) f = hcl.build(s) # note that we intentionally limit the range of A np_A = np.random.randint(0, 16, A.shape) np_B = np.random.randint(0, 100, A.shape) np_C = np.random.randint(0, 100, A.shape) hcl_A = hcl.asarray(np_A) hcl_B = hcl.asarray(np_B) hcl_C = hcl.asarray(np_C) f(hcl_A, hcl_B, hcl_C) print("Input array:") print(hcl_A) print("Before setting the least-significant bit:") print(np_B) print("After:") print(hcl_B) print("Before setting the lower four bits:") print(np_C) print("After:") print(hcl_C) # let's do some checks np_B2 = hcl_B.asnumpy() np_C2 = hcl_C.asnumpy() assert np.array_equal(np_B2 % 2, np_A % 2) assert np.array_equal(np_B2 // 2, np_B // 2) assert np.array_equal(np_C2 % 16, np_A) assert np.array_equal(np_C2 // 16, np_C // 16) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Input array: [ 6 9 2 7 14 9 11 9 2 12] Before setting the least-significant bit: [18 38 8 60 70 17 36 16 64 98] After: [18 39 8 61 70 17 37 17 64 98] Before setting the lower four bits: [28 25 91 60 87 74 20 49 64 68] After: [22 25 82 55 94 73 27 57 66 76] .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 0.124 seconds) .. _sphx_glr_download_tutorials_tutorial_02_imperative.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download :download:`Download Python source code: tutorial_02_imperative.py ` .. container:: sphx-glr-download :download:`Download Jupyter notebook: tutorial_02_imperative.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_