44Writing Data Parallel Kernels
55=============================
66
7- Kernel declaration
7+ Kernel Declaration
88------------------
99A kernel function is a device function that is meant to be called from host
1010code, where a device can be any SYCL supported device such as a GPU, CPU, or an
@@ -19,55 +19,93 @@ FPGA. The main characteristics of a kernel function are:
1919- **Kernels cannot explicitly return a value**. All result data must be written to a
2020 ``dpnp`` array passed as a function's argument.
2121
22+ Here is an example of a kernel that computes the sum of two vectors ``a`` and ``b``.
23+ Its arguments are two input vectors ``a`` and ``b`` and one output vector ``c`` for
24+ storing the result of the vector summation:
25+
2226.. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
2327    :language: python
24-    :lines: 14-18
28+    :lines: 8-9, 11-15
2529    :caption: **EXAMPLE:** Data parallel kernel implementing the vector sum a+b
2630    :name: ex_kernel_declaration_vector_sum
2731
2832
29- Kernel invocation
33+ Kernel Invocation
3034------------------
3135
29- When a kernel is launched you must specify the *global size* and the *local size*, which determine
30- the hierarchy of threads, that is the order in which kernels will be invoked.
36+ When a kernel is launched, you must specify the *global size* and the *local size*,
37+ which determine the hierarchy of threads: the global size is the total number of
38+ work items, and the local size is the number of work items in each work group.
3439
35- The following syntax is used in ``numba-dpex`` for kernel invocation with specified global and local sizes:
40+ The following syntax is used in ``numba-dpex`` for kernel invocation with
41+ specified global and local sizes:
3642
3743``kernel_function_name[global_size, local_size](kernel arguments)``
3844
39- In the following example we invoke kernel ``kernel_vector_sum`` with global size specified via variable
40- ``global_size``, and use ``numba_dpex.DEFAULT_LOCAL_SIZE`` constant for setting local size to some
41- default value. Arguments are two input vectors ``a`` and ``b`` and one output vector ``c`` for storing the
42- result of vector summation:
45+ In the following example we invoke the kernel ``kernel_vector_sum`` with the global size
46+ specified via the variable ``global_size``, and use the ``numba_dpex.DEFAULT_LOCAL_SIZE``
47+ constant to set the local size to a default value:
4348
44- .. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
45-    :language: python
46-    :lines: 11-15
47-    :caption: **EXAMPLE:** Invocation of the vector sum kernel
48-    :name: ex_kernel_invocation_vector_sum
49+ .. code-block:: python
50+
51+     import numba_dpex as ndpx
52+
53+     global_size = 10
54+     kernel_vector_sum[global_size, ndpx.DEFAULT_LOCAL_SIZE](a, b, c)
4955
5056 .. note::
5157    Each kernel is compiled once, but it can be called multiple times with different global and local size settings.
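
The relationship between the global size and the local size used in the launch
syntax above can be modeled on the host in plain Python. This is only a sketch:
``work_group_count`` is a hypothetical helper, not part of ``numba_dpex``, and it
assumes the SYCL ``nd_range`` rule that the global size must be a multiple of
the local size.

```python
def work_group_count(global_size: int, local_size: int) -> int:
    """Return how many work groups a 1-D launch is split into.

    Hypothetical helper for illustration only. It assumes the SYCL
    nd_range rule that global_size is a multiple of local_size.
    """
    if global_size <= 0 or local_size <= 0:
        raise ValueError("sizes must be positive")
    if global_size % local_size != 0:
        raise ValueError("global_size must be divisible by local_size")
    return global_size // local_size


# The same kernel could be launched with any of these size settings.
for global_size, local_size in [(10, 1), (10, 5), (1024, 64)]:
    groups = work_group_count(global_size, local_size)
    print(f"global={global_size} local={local_size} -> {groups} work groups")
```

Checking the sizes on the host before a launch like this can make a mismatched
pair easier to diagnose than an error raised by the device runtime.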
5258
5359
54- Kernel invocation (New Syntax)
60+ Kernel Invocation (New Syntax)
5561------------------------------
5662
57- Since the release 0.20.0 (Phoenix), we have introduced new kernel launch parameter
58- syntax for specifying ``global_size`` and ``local_size`` that similar to ``SYCL``'s
59- ``range`` and ``ndrange`` classes. The ``global_size`` and ``local_size`` can now
60- be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
63+ Since release 0.20.0 (Phoenix), a new kernel launch parameter syntax is
64+ available for specifying global and local sizes; it is similar to
65+ ``SYCL``'s ``range`` and ``nd_range`` classes. The global and local sizes can
66+ now be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
67+
68+ For example, consider the following kernel that computes the sum of two vectors:
69+
70+ .. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
71+    :language: python
72+    :lines: 8-9, 11-15
73+    :caption: **EXAMPLE:** A vector sum kernel
74+    :name: vector_sum_kernel
75+
76+ To run this kernel when only a global size needs to be specified, we can
77+ launch it like this (where ``global_size`` is an ``int``):
6178
6279.. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
6380    :language: python
64-    :lines: 11-15
65-    :caption: **EXAMPLE:** Invocation of the vector sum kernel
66-    :name: ex_kernel_invocation_vector_sum
81+    :lines: 8-9, 18-24
82+    :emphasize-lines: 3
83+    :caption: **EXAMPLE:** A vector sum kernel with a global size/range
84+    :name: vector_sum_kernel_with_launch_param
85+
86+ If we need both local and global ranges, we can specify them using two instances
87+ of ``Range`` inside an ``NdRange`` object. For example, let's consider a kernel
88+ to compute pair-wise Euclidean distances of n-dimensional data points:
89+
90+ .. literalinclude:: ./../../../../numba_dpex/examples/kernel/pairwise_distance.py
91+    :language: python
92+    :lines: 14-15, 36-51
93+    :caption: **EXAMPLE:** A kernel to compute pair-wise Euclidean distances
94+    :name: pairwise_distance_kernel
95+
96+ Now we can specify the global and local sizes as shown below (here both ``args.n``
97+ and ``args.l`` are ``int``):
6798
99+ .. literalinclude:: ./../../../../numba_dpex/examples/kernel/pairwise_distance.py
100+    :language: python
101+    :lines: 14-15, 27-31, 54-67
102+    :emphasize-lines: 4,6,13
103+    :caption: **EXAMPLE:** A kernel to compute pair-wise Euclidean distances with
104+       a global and a local size/range
105+    :name: pairwise_distance_kernel_with_launch_param
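
As a host-side reference for what a pair-wise Euclidean distance computation
produces, the following plain-Python sketch may help when checking kernel
output. It is not the code from ``pairwise_distance.py``: the function name,
the list-of-tuples input layout, and the loop structure are assumptions made
for illustration.

```python
import math


def pairwise_distance(points):
    """Return the matrix of Euclidean distances between all pairs of points.

    `points` is a list of equal-length coordinate sequences (an assumed
    layout for this sketch, not necessarily the one the kernel uses).
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Sum of squared coordinate differences between point i and point j.
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            dist[i][j] = math.sqrt(sq)
    return dist


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
result = pairwise_distance(points)
for row in result:
    print(row)
```

The result is a symmetric matrix with zeros on the diagonal, which gives two
quick sanity checks for any device implementation.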
68106
69107
70- Kernel indexing functions
108+ Kernel Indexing Functions
71109-------------------------
72110
73111In *data parallel kernel programming*, all work items are enumerated and accessed by their index.
@@ -79,3 +117,5 @@ in the current work group are accessed by calling ``numba_dpex.get_local_id()``.
79117
80118The total number of work groups is determined by calling the ``numba_dpex.get_num_groups()`` function.
81119The current work group index is obtained by calling the ``numba_dpex.get_group_id()`` function.
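
The way these indices relate to each other can be modeled on the host in plain
Python. The sketch below does not call any ``numba_dpex`` function;
``enumerate_work_items`` is a hypothetical helper that assumes a
one-dimensional launch with zero offset, where the global size is a multiple
of the local size.

```python
def enumerate_work_items(global_size: int, local_size: int):
    """Yield (global_id, group_id, local_id) for every work item.

    Hypothetical host-side model of a 1-D launch with zero offset.
    """
    num_groups = global_size // local_size  # total number of work groups
    for group_id in range(num_groups):
        for local_id in range(local_size):
            # The global index follows from the group index and local index.
            global_id = group_id * local_size + local_id
            yield global_id, group_id, local_id


items = list(enumerate_work_items(global_size=8, local_size=4))
for global_id, group_id, local_id in items:
    print(f"global={global_id} group={group_id} local={local_id}")
```

Each global index appears exactly once, so a kernel that writes to
``c[global_id]`` touches every output element once under these assumptions.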
120+
121+ .. _Black Scholes: https://en.wikipedia.org/wiki/Black%E2%80%93Scholes_model