.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_tutorials_tutorial_02_imperative.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_tutorial_02_imperative.py:


Imperative Programming
======================

**Author**: Yi-Hsiang Lai (seanlatias@github)

There exist many applications that cannot be described using only vectorized
code such as `hcl.compute`. Thus, we introduce imperative programming in
HeteroCL, which makes HeteroCL applications more expressive. In this tutorial,
we will implement *insertion sort* in HeteroCL.

.. code-block:: default


    import heterocl as hcl

    hcl.init()

    A = hcl.placeholder((10,), "A")


Stages in HeteroCL
------------------
In HeteroCL, when users write an application, they are actually building a
compute graph. Each node in a graph is a *stage*. Each edge is directed,
which represents the data flow between two stages. Some HeteroCL APIs
naturally form a stage, such as ``hcl.compute``. Since the imperative code
we are going to write cannot be described using a HeteroCL API, we need to
wrap it as a stage explicitly via ``hcl.Stage``. Users can specify the name
of a stage, which is optional. Note that **a HeteroCL application must have
at least one stage**.


.. code-block:: default


    def insertion_sort(A):

        # Introduce a stage.
        with hcl.Stage("S"):
            # for i in range(1, A.shape[0])
            # We can name the axis
            with hcl.for_(1, A.shape[0], name="i") as i:
                key = hcl.scalar(A[i], "key")
                j = hcl.scalar(i-1, "j")
                # while(j >= 0 && key < A[j])
                with hcl.while_(hcl.and_(j >= 0, key < A[j])):
                    A[j+1] = A[j]
                    j.v -= 1
                A[j+1] = key.v


Imperative DSL
--------------
To write imperative code in HeteroCL, we need to use a subset of HeteroCL
DSL, which is *imperative DSL*. HeteroCL's imperative DSL supports a subset
of Python's control flow statements, including conditional statements and
control flows. In the above code, we show how we can use ``hcl.for_`` to
write a `for` loop and ``hcl.while_`` to write a `while` loop. Moreover, we
use ``hcl.and_`` for logical expressions. Here we also introduce a new API,
which is ``hcl.scalar``. It is equivalent to

``hcl.compute((1,))``

Namely, it declares a tensor with exactly one element, which can be treated
as a **stateful scalar**. Following we show the execution results of the
implemented sorting algorithm.

.. note::

   Currently we support the following imperative DSLs. Logic operations:
   :obj:`heterocl.and_`, :obj:`heterocl.or_`. Control flow statements:
   :obj:`heterocl.if_`, :obj:`heterocl.else_`, :obj:`heterocl.elif_`,
   :obj:`heterocl.for_`, :obj:`heterocl.while_`, :obj:`heterocl.break_`.


.. code-block:: default


    s = hcl.create_schedule([A], insertion_sort)


We can inspect the generated IR.


.. code-block:: default

    print(hcl.lower(s))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    // attr [_top] storage_scope = "global"
    allocate _top[int32 * 1]
    produce _top {
      // attr [0] extern_scope = 0
      // attr [S] storage_scope = "global"
      allocate S[int32 * 1]
      produce S {
        // attr [0] extern_scope = 0
        for (i, 0, 9) {
          // attr [key] storage_scope = "global"
          allocate key[int32 * 1]
          produce key {
            // attr [0] extern_scope = 0
            key[0] = A[(i + 1)]
          }
          // attr [j] storage_scope = "global"
          allocate j[int32 * 1]
          produce j {
            // attr [0] extern_scope = 0
            j[0] = i
          }
          while (((0 <= j[0]) && (key[0] < A[j[0]]))) {
            A[(j[0] + 1)] = A[j[0]]
            j[0] = (j[0] + -1)
          }
          A[(j[0] + 1)] = key[0]
        }
      }
    }


Finally, we build the executable and feed it with Numpy arrays.


.. code-block:: default

    f = hcl.build(s)

    import numpy as np

    hcl_A = hcl.asarray(np.random.randint(50, size=(10,)))

    print('Before sorting:')
    print(hcl_A)

    f(hcl_A)

    print('After sorting:')
    np_A = hcl_A.asnumpy()
    print(np_A)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Before sorting:
    [49 13 12  1 19 18  0 20  7 16]
    After sorting:
    [ 0  1  7 12 13 16 18 19 20 49]


Let's run some tests for verification.


.. code-block:: default

    for i in range(1, 10):
        assert np_A[i] >= np_A[i-1]


Bit Operations
--------------
HeteroCL also support bit operations including setting/getting a bit/slice
from a number. This is useful for integer and fixed-point operations.
Following we show some basic examples.


.. code-block:: default

    hcl.init()
    A = hcl.placeholder((10,), "A")
    def kernel(A):
        # get the LSB of A
        B = hcl.compute(A.shape, lambda x: A[x][0], "B")
        # get the lower 4-bit of A
        C = hcl.compute(A.shape, lambda x: A[x][4:0], "C")
        return B, C


Note that for the slicing operations, we follow the convention of Python,
which is **left exclusive and right inclusive**. Now we can test the results.


.. code-block:: default

    s = hcl.create_schedule(A, kernel)
    f = hcl.build(s)

    np_A = np.random.randint(0, 100, A.shape)
    hcl_A = hcl.asarray(np_A)
    hcl_B = hcl.asarray(np.zeros(A.shape))
    hcl_C = hcl.asarray(np.zeros(A.shape))

    f(hcl_A, hcl_B, hcl_C)

    print("Input array:")
    print(hcl_A)
    print("Least-significant bit:")
    print(hcl_B)
    print("Lower four bits:")
    print(hcl_C)

    # a simple test
    np_B = hcl_B.asnumpy()
    np_C = hcl_C.asnumpy()
    for i in range(0, 10):
        assert np_B[i] == np_A[i] % 2
        assert np_C[i] == np_A[i] % 16


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Input array:
    [74 27 36 28 15 68 77 69 61 64]
    Least-significant bit:
    [0 1 0 0 1 0 1 1 1 0]
    Lower four bits:
    [10 11  4 12 15  4 13  5 13  0]


The operations for bit/slice setting is similar. The only difference is that
we need to use imperative DSL. Following is an example.


.. code-block:: default

    hcl.init()
    A = hcl.placeholder((10,), "A")
    B = hcl.placeholder((10,), "B")
    C = hcl.placeholder((10,), "C")
    def kernel(A, B, C):
        with hcl.Stage("S"):
            with hcl.for_(0, 10) as i:
                # set the LSB of B to be the same as A
                B[i][0] = A[i][0]
                # set the lower 4-bit of C
                C[i][4:0] = A[i]

    s = hcl.create_schedule([A, B, C], kernel)
    f = hcl.build(s)
    # note that we intentionally limit the range of A
    np_A = np.random.randint(0, 16, A.shape)
    np_B = np.random.randint(0, 100, A.shape)
    np_C = np.random.randint(0, 100, A.shape)
    hcl_A = hcl.asarray(np_A)
    hcl_B = hcl.asarray(np_B)
    hcl_C = hcl.asarray(np_C)

    f(hcl_A, hcl_B, hcl_C)

    print("Input array:")
    print(hcl_A)
    print("Before setting the least-significant bit:")
    print(np_B)
    print("After:")
    print(hcl_B)
    print("Before setting the lower four bits:")
    print(np_C)
    print("After:")
    print(hcl_C)

    # let's do some checks
    np_B2 = hcl_B.asnumpy()
    np_C2 = hcl_C.asnumpy()

    assert np.array_equal(np_B2 % 2, np_A % 2)
    assert np.array_equal(np_B2 // 2, np_B // 2)
    assert np.array_equal(np_C2 % 16, np_A)
    assert np.array_equal(np_C2 // 16, np_C // 16)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Input array:
    [ 6  9  2  7 14  9 11  9  2 12]
    Before setting the least-significant bit:
    [18 38  8 60 70 17 36 16 64 98]
    After:
    [18 39  8 61 70 17 37 17 64 98]
    Before setting the lower four bits:
    [28 25 91 60 87 74 20 49 64 68]
    After:
    [22 25 82 55 94 73 27 57 66 76]


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.124 seconds)


.. _sphx_glr_download_tutorials_tutorial_02_imperative.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download

     :download:`Download Python source code: tutorial_02_imperative.py <tutorial_02_imperative.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: tutorial_02_imperative.ipynb <tutorial_02_imperative.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_