Commit 0863598

author
Diptorup Deb
authored
Merge pull request #1090 from chudur-budur/docs/release-0.20-more
WIP on revised and improved documentation
2 parents f8a8d24 + b23c203 commit 0863598

1 file changed

+64
-24
lines changed

docs/sources/user_manual/kernel_programming/writing_kernels.rst

Lines changed: 64 additions & 24 deletions
@@ -4,7 +4,7 @@
44
Writing Data Parallel Kernels
55
=============================
66

7-
Kernel declaration
7+
Kernel Declaration
88
------------------
99
A kernel function is a device function that is meant to be called from host
1010
code, where a device can be any SYCL supported device such as a GPU, CPU, or an
@@ -19,55 +19,93 @@ FPGA. The main characteristics of a kernel function are:
1919
- **Kernels cannot explicitly return a value**. All result data must be written to
2020
``dpnp`` array passed as a function's argument.
2121

22+
Here is an example of a kernel that computes the sum of two vectors ``a`` and ``b``.
23+
Its arguments are two input vectors ``a`` and ``b`` and one output vector ``c`` for
24+
storing the result of vector summation:
25+
2226
.. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
2327
:language: python
24-
:lines: 14-18
28+
:lines: 8-9, 11-15
2529
:caption: **EXAMPLE:** Data parallel kernel implementing the vector sum a+b
2630
:name: ex_kernel_declaration_vector_sum
2731

2832

29-
Kernel invocation
33+
Kernel Invocation
3034
------------------
3135

32-
When a kernel is launched you must specify the *global size* and the *local size*, which determine
33-
the hierarchy of threads, that is the order in which kernels will be invoked.
36+
When a kernel is launched, you must specify the *global size* and the *local size*,
37+
which determine the hierarchy of work items: how many work items run the kernel
38+
and how they are grouped into work groups.
3439

35-
The following syntax is used in ``numba-dpex`` for kernel invocation with specified global and local sizes:
40+
The following syntax is used in ``numba-dpex`` for kernel invocation with
41+
specified global and local sizes:
3642

3743
``kernel_function_name[global_size, local_size](kernel arguments)``
3844

39-
In the following example we invoke kernel ``kernel_vector_sum`` with global size specified via variable
40-
``global_size``, and use ``numba_dpex.DEFAULT_LOCAL_SIZE`` constant for setting local size to some
41-
default value. Arguments are two input vectors ``a`` and ``b`` and one output vector ``c`` for storing the
42-
result of vector summation:
45+
In the following example we invoke kernel ``kernel_vector_sum`` with global size
46+
specified via variable ``global_size``, and use ``numba_dpex.DEFAULT_LOCAL_SIZE``
47+
constant for setting local size to some default value:
4348

44-
.. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
45-
:language: python
46-
:lines: 11-15
47-
:caption: **EXAMPLE:** Invocation of the vector sum kernel
48-
:name: ex_kernel_invocation_vector_sum
49+
.. code-block:: python
50+
51+
import numba_dpex as ndpx
52+
53+
global_size = 10
54+
kernel_vector_sum[global_size, ndpx.DEFAULT_LOCAL_SIZE](a, b, c)
4955
5056
.. note::
5157
Each kernel is compiled once, but it can be called multiple times with different global and local size settings.
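
The subscript-then-call launch syntax can be pictured with a small pure-Python emulation. This is only an illustrative sketch and not how ``numba-dpex`` is implemented: the ``FakeKernel`` class is a hypothetical stand-in, the work-item index is passed in as an argument instead of being queried, and the work items run one after another rather than in parallel on a SYCL device.

```python
class FakeKernel:
    """Hypothetical stand-in mimicking kernel[global_size, local_size](args)."""

    def __init__(self, func, launch_params=None):
        self.func = func  # the wrapped function is stored once and reused
        self.launch_params = launch_params

    def __getitem__(self, params):
        # kernel[global_size, local_size] only records the launch parameters.
        return FakeKernel(self.func, params)

    def __call__(self, *args):
        global_size, _local_size = self.launch_params
        # Emulation only: run one "work item" per index, sequentially.
        for i in range(global_size):
            self.func(i, *args)


def vector_sum(i, a, b, c):
    # Body executed by the work item with index i.
    c[i] = a[i] + b[i]


kernel_vector_sum = FakeKernel(vector_sum)

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [0, 0, 0, 0]
kernel_vector_sum[4, 2](a, b, c)   # four work items in groups of two

c2 = [0, 0, 0, 0]
kernel_vector_sum[2, 1](a, b, c2)  # same function, different launch sizes
```

The second launch reuses the same wrapped function with a smaller global size, so only the first two elements of ``c2`` are written.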
5258

5359

54-
Kernel invocation (New Syntax)
60+
Kernel Invocation (New Syntax)
5561
------------------------------
5662

57-
Since the release 0.20.0 (Phoenix), we have introduced new kernel launch parameter
58-
syntax for specifying ``global_size`` and ``local_size`` that similar to ``SYCL``'s
59-
``range`` and ``ndrange`` classes. The ``global_size`` and ``local_size`` can now
60-
be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
63+
Release 0.20.0 (Phoenix) introduced a new kernel launch
64+
parameter syntax for specifying global and local sizes that is similar to
65+
``SYCL``'s ``range`` and ``nd_range`` classes. The global and local sizes can
66+
now be specified with ``numba_dpex``'s ``Range`` and ``NdRange`` classes.
67+
68+
For example, consider the following kernel that computes the sum of two vectors:
69+
70+
.. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
71+
:language: python
72+
:lines: 8-9, 11-15
73+
:caption: **EXAMPLE:** A vector sum kernel
74+
:name: vector_sum_kernel
75+
76+
To run this kernel with a specified global size, we can launch it
77+
like this (where ``global_size`` is an ``int``):
6178

6279
.. literalinclude:: ./../../../../numba_dpex/examples/kernel/vector_sum.py
6380
:language: python
64-
:lines: 11-15
65-
:caption: **EXAMPLE:** Invocation of the vector sum kernel
66-
:name: ex_kernel_invocation_vector_sum
81+
:lines: 8-9, 18-24
82+
:emphasize-lines: 3
83+
:caption: **EXAMPLE:** A vector sum kernel with a global size/range
84+
:name: vector_sum_kernel_with_launch_param
85+
86+
If we need both local and global ranges, we can specify them using two instances
87+
of ``Range`` inside an ``NdRange`` object. For example, let's consider a kernel
88+
to compute pair-wise Euclidean distances of n-dimensional data points:
89+
90+
.. literalinclude:: ./../../../../numba_dpex/examples/kernel/pairwise_distance.py
91+
:language: python
92+
:lines: 14-15, 36-51
93+
:caption: **EXAMPLE:** A kernel to compute pair-wise Euclidean distances
94+
:name: pairwise_distance_kernel
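
Since the included file is not reproduced here, the computation can be sketched in plain Python. This is a reference emulation under stated assumptions, not the kernel from ``pairwise_distance.py``: it assumes that one work item handles one data point and fills one row of the distance matrix, and the function names are hypothetical.

```python
import math


def pairwise_distance_work_item(i, X, D):
    """Emulate one work item: fill row i of the distance matrix D."""
    for j in range(len(X)):
        acc = 0.0
        for k in range(len(X[i])):
            diff = X[i][k] - X[j][k]
            acc += diff * diff
        D[i][j] = math.sqrt(acc)


def pairwise_distance(X):
    """Compute all pair-wise Euclidean distances between the rows of X."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    # On a device these iterations would be independent work items.
    for i in range(n):
        pairwise_distance_work_item(i, X, D)
    return D


X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = pairwise_distance(X)
```

Each row of ``D`` depends only on ``X``, which is why the outer loop can be mapped onto a global range with one work item per data point.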
95+
96+
Now we can specify the local and global sizes as shown below (here both ``args.n``
97+
and ``args.l`` are ``int``):
6798

99+
.. literalinclude:: ./../../../../numba_dpex/examples/kernel/pairwise_distance.py
100+
:language: python
101+
:lines: 14-15, 27-31, 54-67
102+
:emphasize-lines: 4,6,13
103+
:caption: **EXAMPLE:** A kernel to compute pair-wise Euclidean distances with
104+
a global and a local size/range
105+
:name: pairwise_distance_kernel_with_launch_param
68106

69107

70-
Kernel indexing functions
108+
Kernel Indexing Functions
71109
-------------------------
72110

73111
In *data parallel kernel programming* all work items are enumerated and accessed by their index.
@@ -79,3 +117,5 @@ in the current work group are accessed by calling ``numba_dpex.get_local_id()``.
79117

80118
The total number of work groups is determined by calling the ``numba_dpex.get_num_groups()`` function.
81119
The current work group index is obtained by calling the ``numba_dpex.get_group_id()`` function.
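
How the values returned by these indexing functions relate to each other can be sketched in plain Python for a one-dimensional range. The helper names below are hypothetical, and the sketch assumes the SYCL convention that the global size is a multiple of the local size and that no offset is applied, so that ``global_id = group_id * local_size + local_id``.

```python
def enumerate_work_items(global_size, local_size):
    """Return (global_id, group_id, local_id) for each work item of a 1-D range."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    items = []
    for global_id in range(global_size):
        group_id = global_id // local_size  # which work group the item is in
        local_id = global_id % local_size   # position inside that work group
        items.append((global_id, group_id, local_id))
    return items


def count_groups(global_size, local_size):
    """Number of work groups for a 1-D range."""
    return global_size // local_size


items = enumerate_work_items(8, 4)
groups = count_groups(8, 4)
```

With a global size of 8 and a local size of 4 there are two work groups, and the work item with global index 5 is the second item of the second group.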
120+
121+
.. _Black Scholes: https://en.wikipedia.org/wiki/Black%E2%80%93Scholes_model
