gsoc-2020/gpu.md (+17 -15)
@@ -29,7 +29,9 @@ The primary search methods provided by the GPU octree module are listed below
 4. Synchronous (CPU based) Radius Search
 - C. Asynchronous (GPU based) K Nearest Search

-While the initial plan was not to spend an extensive amount of time on the GPU octree module, upon closer inspection it was discovered that there were many irregularities and errors within the GPU octree module. Specifically, two of the three primary methods offered by the GPU octree module, namely K Nearest Neighbours search (C) and Asynchronous Approximate Nearest Neighbors search(A-1) were both returning incorrect results while one of the implementations of the Radius Search (B-2) was also returning incorrect results. The two 'synchronous' versions of the radius search and approximate nearest search methods listed above (A-2 & B-4) provide CPU based implementations (i.e. non parallelized versions that do not use CUDA kernels) of their GPU based counterparts.
+The two 'synchronous' versions of the radius search and approximate nearest search methods listed above (A-2 & B-4) provide CPU based implementations (i.e. non parallelized versions that do not use CUDA kernels) of their GPU based counterparts.
+
+While the initial plan was not to spend an extensive amount of time on the GPU octree module, upon closer inspection it was discovered that there were many irregularities and errors within the GPU octree module. Specifically, two of the three primary methods offered by the GPU octree module, namely K Nearest Neighbours search (C) and Asynchronous Approximate Nearest Neighbors search (A-1) were both returning incorrect results. Furthermore, one of Radius Search's variants (B-2) was also returning incorrect results.

 All of these functions were utilizing outdated CUDA primitives and idioms, risking deprecation in the near future. When diving into the code, it was also discovered that the GPU approximate nearest neighbours algorithm used a completely different traversal methodology from its CPU counterpart.

@@ -40,12 +42,12 @@ Due to these discoveries, the scope of the GPU modernization effort was expande
 Related PRs: [[4146]](https://github.com/PointCloudLibrary/pcl/pull/4146)[[4306]](https://github.com/PointCloudLibrary/pcl/pull/4306)[[4313]](https://github.com/PointCloudLibrary/pcl/pull/4313)

 After comprehensively going through the GPU search methods to investigate their functionality and the causes of the above issues, we identified two separate bugs as the underlying cause:
-1. In approximate nearest search and K nearest search, an outdated method was being used to synchronize data between threads in order to sort distances across warp threads. This was fixed by replacing the functionality with warp level primitives introduced in CUDA 9.0 detailed [here.](https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/)
+1. In approximate nearest search and K nearest search, an outdated method was being used to synchronize data between threads in order to sort distances across warp threads. This was fixed by replacing the functionality with [warp level primitives introduced in CUDA 9.0](https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/).
 2. In radius search, the correct radius was not shared between warp threads. Thus the search was being conducted for incorrect radius values. Synchronizing the radius values across the threads fixed this issue.

-Since much of the code inside the above functions utilized an outdated concept of using volatile memory for sharing data between threads, they were also replaced by utilizing warp primitives to synchronize thread data.
+Since much of the code inside the above functions utilized an outdated concept of using volatile memory for sharing data between threads, they were also replaced by newer warp primitives to synchronize thread data.

-### Implementation of new traversal mechanism of approximate nearest search
+### Implementation of a new traversal mechanism for approximate nearest search

 Related PRs: [[4294]](https://github.com/PointCloudLibrary/pcl/pull/4294)

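The two warp-level fixes described in this part of the report (sharing one radius across a warp, and combining distances across warp threads) can be illustrated with a small host-side emulation. This is only a sketch and not PCL code. It assumes that broadcast and XOR shuffles, in the spirit of CUDA's `__shfl_sync` and `__shfl_xor_sync`, are representative of the primitives involved; the report does not say which primitives the PRs use. Every name below (`Warp`, `shuffle_broadcast`, `shuffle_xor`, `warp_min`) is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout: a CPU emulation of one 32-lane warp,
// where array element i stands for the register value held by lane i.
constexpr std::size_t kWarpSize = 32;
using Warp = std::array<float, kWarpSize>;

// Emulates a broadcast shuffle (in the spirit of __shfl_sync):
// every lane reads the value held by src_lane.
Warp shuffle_broadcast(const Warp& lanes, std::size_t src_lane) {
  assert(src_lane < kWarpSize);
  Warp out{};
  out.fill(lanes[src_lane]);
  return out;
}

// Emulates an XOR shuffle (in the spirit of __shfl_xor_sync):
// lane i reads the value held by lane (i XOR lane_mask).
Warp shuffle_xor(const Warp& lanes, std::size_t lane_mask) {
  assert(lane_mask < kWarpSize);
  Warp out{};
  for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
    out[lane] = lanes[lane ^ lane_mask];
  }
  return out;
}

// Butterfly reduction: after log2(32) = 5 exchange rounds every lane
// holds the warp-wide minimum, with no shared or volatile memory involved.
Warp warp_min(Warp lanes) {
  for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
    const Warp partner = shuffle_xor(lanes, offset);
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
      lanes[lane] = std::min(lanes[lane], partner[lane]);
    }
  }
  return lanes;
}
```

On a GPU each lane would hold a single value and the exchanges would happen in registers; the arrays here only make the data movement visible. Calling `shuffle_broadcast(radii, 0)` mirrors the idea of giving every thread in the warp the same search radius, and `warp_min(distances)` mirrors combining per-thread distances without a shared-memory round trip.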
@@ -106,8 +108,8 @@ This flexibility can be offered to the user by transitioning the PCL library’s
 Related PRs: [[4166]](https://github.com/PointCloudLibrary/pcl/pull/4166)

 CMake options were added to allow users to select:
-- Type of index (signed / unsigned – signed by default);
-- Sign of index (8 / 16 / 32 / 64 – 32 by default);
+- Signedness of index (signed / unsigned – signed by default);
+- Size of index (8 / 16 / 32 / 64 – 32 by default);
 at compile-time, from PCL 1.12 onwards.

 ### Adding a CI job for testing 64bit unsigned index type
@@ -125,17 +127,17 @@ A set of fundamental classes such as `pcl::PointCloud` lie at the core of PCL. T
 For situations where unsigned indices were required, a new type called `uindex_t` was also introduced, which acts as an unsigned version of the `index_t`.

 This transition was carried out for the following classes:
-- PointCloud
-- PCLPointCloud2
-- PCLBase
-- PCLPointField
-- Correspondences
-- Vertices
-- PCLImage
+- `PointCloud`
+- `PCLPointCloud2`
+- `PCLBase`
+- `PCLPointField`
+- `Correspondences`
+- `Vertices`
+- `PCLImage`

 During the above transition process, it was discovered that significant additional work was required to address the numerous sign comparison warnings and other errors that arose from the transition in some of the above classes, which took up considerable time.

-Furthermore, any changes beyond transitioning the above fundamental classes would have required additional workarounds to carry on, if they were to be carried out before the changes to the fundamental classes have been merged. (These features were planned to be merged in in PCL 1.12). Thus, work was shifted to the GPU module at this point.
+Furthermore, any changes beyond transitioning the above fundamental classes would have required additional workarounds to carry on, if they were to be carried out before the changes to the fundamental classes have been merged. (These features were planned to be merged in PCL 1.12). Thus, work was shifted to the GPU module at this point.

 In addition, while the common module had already been modified to make it compatible with `index_t`, the tests for this module had not been modified. This was achieved with a very straightforward replacement of integer vectors with `index_t` vectors.

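To make the index-type discussion in this part of the report more concrete, here is a minimal sketch of how a compile-time size and signedness setting could be turned into `index_t` and `uindex_t` aliases, and why a signed index leads to sign comparison warnings. It is an illustration under stated assumptions and not PCL's actual header: the macros `SKETCH_INDEX_SIZE` and `SKETCH_INDEX_SIGNED`, the `sketch` namespace, the `Indices` alias and `count_in_range` are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for values a build system could inject as
// compile definitions; the defaults mirror "32 bit, signed".
#ifndef SKETCH_INDEX_SIZE
#define SKETCH_INDEX_SIZE 32
#endif
#ifndef SKETCH_INDEX_SIGNED
#define SKETCH_INDEX_SIGNED true
#endif

namespace sketch {

// Maps (bit width, signedness) to a fixed-width integer type.
// Unsupported combinations fail to compile because no definition exists.
template <std::size_t Bits, bool Signed> struct int_type;
template <> struct int_type<8, true> { using type = std::int8_t; };
template <> struct int_type<8, false> { using type = std::uint8_t; };
template <> struct int_type<16, true> { using type = std::int16_t; };
template <> struct int_type<16, false> { using type = std::uint16_t; };
template <> struct int_type<32, true> { using type = std::int32_t; };
template <> struct int_type<32, false> { using type = std::uint32_t; };
template <> struct int_type<64, true> { using type = std::int64_t; };
template <> struct int_type<64, false> { using type = std::uint64_t; };

using index_t = int_type<SKETCH_INDEX_SIZE, SKETCH_INDEX_SIGNED>::type;
// Same width as index_t, but always unsigned.
using uindex_t = int_type<SKETCH_INDEX_SIZE, false>::type;

// A vector of indices, replacing a plain std::vector<int>.
using Indices = std::vector<index_t>;

// Counts indices that are non-negative and below cloud_size. Comparing a
// signed index_t directly with a std::size_t would trigger a sign
// comparison warning, so the value is checked and then cast explicitly.
std::size_t count_in_range(const Indices& indices, std::size_t cloud_size) {
  std::size_t valid = 0;
  for (const index_t idx : indices) {
    if (idx >= 0 && static_cast<std::size_t>(idx) < cloud_size) {
      ++valid;
    }
  }
  return valid;
}

}  // namespace sketch
```

With the default definitions, `sketch::index_t` is a 32-bit signed integer and `sketch::uindex_t` its unsigned counterpart. Building the same file with different definitions (for example 64 and `false`) changes both aliases without touching the code that uses them, which is the kind of flexibility the options above are meant to provide.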
@@ -159,4 +161,4 @@ The work carried out during the period ensure that the GPU octree search functio
gsoc-2020/index.md (+2 -2)
@@ -22,7 +22,7 @@ After a long hiatus, PCL is once more participating in the Google Summer of Code

 Extending PCL's use case by generating bindings for its use with interface languages like Python, for rapid development and maximal speed. The approach makes use of Pybind11 to expose PCL's C++ code and generate bindings in the form of python modules by using necessary type information. It supports automatic regeneration of the bindings when the underlying C++ code changes, to work with PCL's active development cycle.

-### [Refactoring, Modernisation & Feature Addition with Emphasis on GU Module](/gsoc-2020/gpu)
+### [Refactoring, Modernisation & Feature Addition with Emphasis on GPU Module](/gsoc-2020/gpu)

 **Student:**[Haritha Jayasinghe][haritha]

@@ -71,4 +71,4 @@ This project aims to transition the existing API to forward-compatible unified A