.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_tutorials_tutorial_07_module.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_tutorial_07_module.py:


Custom Module Definition
========================

**Author**: Yi-Hsiang Lai (seanlatias@github.com)

In this tutorial, we will introduce a new API called ``hcl.def_``, which
allows users to define a reusable hardware module.


.. code-block:: default


    import heterocl as hcl
    import numpy as np


Defining a Hardware Module
--------------------------
Defining a hardware module matters because by reusing a single defined
module, we can reduce the resource usage of the design. To define a module,
we write a normal Python function and apply the ``hcl.def_`` decorator to
it. Within the decorator, we specify the shapes of the arguments. Below we
show an example of a hardware module that returns the larger of two tensor
elements at a given index; the definition appears inside the code example in
the next section.

Note that in this example, we have three input arguments, namely `A`, `B`,
and `x`. The first two arguments are tensors with shape `(10,)` while the
last argument is a scalar variable. To represent the shape of a scalar, we
use an empty tuple `()`. Also note that we use ``hcl.return_`` for the
return value, and a module may contain multiple ``hcl.return_`` statements,
one per branch.

Use the Defined Module
----------------------
Using the module is just like a normal Python call. There is nothing special
here. Below we show an example of finding the element-wise maximum of four
tensors.


.. code-block:: default


    hcl.init()

    def maximum(A, B, C, D):

        @hcl.def_([A.shape, B.shape, ()])
        def find_max(A, B, x):
            with hcl.if_(A[x] > B[x]):
                hcl.return_(A[x])
            with hcl.else_():
                hcl.return_(B[x])

        max_1 = hcl.compute(A.shape, lambda x: find_max(A, B, x), "max_1")
        max_2 = hcl.compute(A.shape, lambda x: find_max(C, D, x), "max_2")
        return hcl.compute(A.shape, lambda x: find_max(max_1, max_2, x), "max_o")


We can first inspect the generated IR. You can see that each computation
reuses the same module to find the maximum.


.. code-block:: default


    A = hcl.placeholder((10,), "A")
    B = hcl.placeholder((10,), "B")
    C = hcl.placeholder((10,), "C")
    D = hcl.placeholder((10,), "D")
    s = hcl.create_schedule([A, B, C, D], maximum)
    print(hcl.lower(s))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    // attr [_top] storage_scope = "global"
    allocate _top[int32 * 1]
    produce _top {
      // attr [0] extern_scope = 0
      // attr [find_max] storage_scope = "global"
      allocate find_max[int32 * 1]
      produce find_max {
        // attr [0] extern_scope = 0
        def find_max(handle64(_top.find_max.A), handle64(_top.find_max.B), int32(_top.find_max.x)) {
          if ((_top.find_max.B[_top.find_max.x] < _top.find_max.A[_top.find_max.x])) {
            return _top.find_max.A[_top.find_max.x]
          } else {
            return _top.find_max.B[_top.find_max.x]
          }
        }
      }
      // attr [max_1] storage_scope = "global"
      allocate max_1[int32 * 10]
      produce max_1 {
        // attr [0] extern_scope = 0
        for (x, 0, 10) {
          max_1[x] = find_max(A, B, x)
        }
      }
      // attr [max_2] storage_scope = "global"
      allocate max_2[int32 * 10]
      produce max_2 {
        // attr [0] extern_scope = 0
        for (x, 0, 10) {
          max_2[x] = find_max(C, D, x)
        }
      }
      produce max_o {
        // attr [0] extern_scope = 0
        for (x, 0, 10) {
          max_o[x] = find_max(max_1, max_2, x)
        }
      }
    }
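Besides the lowered IR, it can also be instructive to look at generated
backend code, where the reused module shows up as a single function
definition. The snippet below is a minimal sketch, assuming your HeteroCL
installation provides the ``vhls`` (Vivado HLS) code-generation target and
that ``hcl.build`` returns the generated code as a string for this target;
both are assumptions about your particular version.


.. code-block:: default


    # A sketch, not part of the original tutorial: with the assumed "vhls"
    # target, hcl.build returns the generated HLS C++ code as a string
    # instead of an executable function.
    hls_code = hcl.build(s, target="vhls")
    # find_max should be defined once and invoked three times.
    print(hls_code)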
Finally, let's run the algorithm and check the results.


.. code-block:: default


    f = hcl.build(s)

    a = np.random.randint(100, size=(10,))
    b = np.random.randint(100, size=(10,))
    c = np.random.randint(100, size=(10,))
    d = np.random.randint(100, size=(10,))
    o = np.zeros(10)
    hcl_A = hcl.asarray(a)
    hcl_B = hcl.asarray(b)
    hcl_C = hcl.asarray(c)
    hcl_D = hcl.asarray(d)
    hcl_O = hcl.asarray(o, dtype=hcl.Int())

    f(hcl_A, hcl_B, hcl_C, hcl_D, hcl_O)

    print("Input tensors:")
    print(hcl_A)
    print(hcl_B)
    print(hcl_C)
    print(hcl_D)
    print("Output tensor:")
    print(hcl_O)

    # Test the correctness
    m1 = np.maximum(a, b)
    m2 = np.maximum(c, d)
    m = np.maximum(m1, m2)
    assert np.array_equal(hcl_O.asnumpy(), m)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Input tensors:
    [69 32 99 77 89 32 48 29  0 77]
    [86 81 11 56 19 29 80 84 65 86]
    [52 11 78 78 20 91 11 52 48 85]
    [98 74 72 59  7 75 94 32 66 14]
    Output tensor:
    [98 81 99 78 89 91 94 84 66 86]


Modules Without a Return Statement
----------------------------------
HeteroCL also allows users to define modules without a return statement. The
usage is exactly the same as what we just introduced. The only difference is
that such a module can be called in a stand-alone fashion; namely, it does
not need to be wrapped inside other HeteroCL APIs such as ``hcl.compute``.
Let's use the same example of finding the maximum, but this time we update
the output tensor directly.


.. code-block:: default


    hcl.init()

    def maximum2(A, B, C, D):

        # the second argument holds the element-wise maximum after the call
        @hcl.def_([A.shape, B.shape])
        def find_max(A, B):
            with hcl.for_(0, A.shape[0]) as i:
                with hcl.if_(A[i] > B[i]):
                    B[i] = A[i]

        find_max(A, B)
        find_max(C, D)
        find_max(B, D)

    s = hcl.create_schedule([A, B, C, D], maximum2)
    f = hcl.build(s)


In the above example, we can see that without a return value, we can call
the defined module directly. Let's check the results. They should be the
same as in our first example.


.. code-block:: default


    f(hcl_A, hcl_B, hcl_C, hcl_D)

    print("Output tensor:")
    print(hcl_D)

    # Test the correctness
    m1 = np.maximum(a, b)
    m2 = np.maximum(c, d)
    m = np.maximum(m1, m2)
    assert np.array_equal(hcl_D.asnumpy(), m)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Output tensor:
    [98 81 99 78 89 91 94 84 66 86]


Data Type Customization for Modules
-----------------------------------
We can also apply data type customization to our defined modules. There are
two ways to do that: we can either specify the data types directly in the
module decorator, or use the ``quantize`` and ``downsize`` APIs. Let's show
how we can downsize the first example.


.. code-block:: default


    A = hcl.placeholder((10,), dtype=hcl.UInt(4))
    B = hcl.placeholder((10,), dtype=hcl.UInt(4))
    C = hcl.placeholder((10,), dtype=hcl.UInt(4))
    D = hcl.placeholder((10,), dtype=hcl.UInt(4))
    s = hcl.create_scheme([A, B, C, D], maximum)
    # Downsize the input arguments and also the return value
    s.downsize([maximum.find_max.A, maximum.find_max.B, maximum.find_max], hcl.UInt(4))
    # We also need to downsize the intermediate results
    s.downsize([maximum.max_1, maximum.max_2], hcl.UInt(4))
    s = hcl.create_schedule_from_scheme(s)
    f = hcl.build(s)


Let's run it.


.. code-block:: default


    hcl_A = hcl.asarray(a, hcl.UInt(4))
    hcl_B = hcl.asarray(b, hcl.UInt(4))
    hcl_C = hcl.asarray(c, hcl.UInt(4))
    hcl_D = hcl.asarray(d, hcl.UInt(4))
    hcl_O = hcl.asarray(o)

    f(hcl_A, hcl_B, hcl_C, hcl_D, hcl_O)

    print("Downsized output tensor:")
    print(hcl_O)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Downsized output tensor:
    [ 6 11 14 14  9 13 14 13  2 14]


We can see that the results are now 4-bit numbers: storing the inputs in
``UInt(4)`` tensors keeps only their lowest 4 bits, i.e., each value is
taken modulo 16. We can double-check this against NumPy by reducing the
inputs modulo 16 before taking the maximum.


.. code-block:: default


    # Test the correctness
    m1 = np.maximum(a % 16, b % 16)
    m2 = np.maximum(c % 16, d % 16)
    m = np.maximum(m1, m2)
    assert np.array_equal(hcl_O.asnumpy(), m)
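Before closing, note the other customization route mentioned above:
specifying data types directly in the module decorator. It is not
demonstrated in this tutorial, so the following is only a minimal sketch.
The ``dtypes`` and ``ret_dtype`` keyword arguments are assumptions about the
``hcl.def_`` signature and may differ across HeteroCL versions; as with
``find_max`` above, the decorated function would live inside the algorithm
passed to ``hcl.create_schedule``.


.. code-block:: default


    # A sketch only: the dtypes/ret_dtype keywords are assumed and may not
    # match the hcl.def_ signature of your HeteroCL version. The name
    # find_max_u4 is hypothetical.
    @hcl.def_([A.shape, B.shape, ()],
              dtypes=[hcl.UInt(4), hcl.UInt(4), hcl.Int()],
              ret_dtype=hcl.UInt(4))
    def find_max_u4(A, B, x):
        with hcl.if_(A[x] > B[x]):
            hcl.return_(A[x])
        with hcl.else_():
            hcl.return_(B[x])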
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.174 seconds)


.. _sphx_glr_download_tutorials_tutorial_07_module.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download

     :download:`Download Python source code: tutorial_07_module.py <tutorial_07_module.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: tutorial_07_module.ipynb <tutorial_07_module.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_