Parallel Rapidly Exploring Random Trees
Team Members
Zixin Shen, Haoran Zhang
Checkpoint
Click for our checkpoint report: checkpoint
Final
Click for our final report: final
Summary
We propose to implement a parallel version of the RRT algorithm, compare its performance against the serial version, and benchmark it on different numbers of cores.
Background
RRT (Rapidly-Exploring Random Trees) is a sampling-based motion-planning algorithm commonly used in autonomous vehicles, so speeding up its computation directly reduces the vehicle's decision time. RRT-based algorithms are relatively easy to implement, do not require building a detailed map, and remain efficient in high-dimensional state spaces.
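For reference, below is a minimal sketch of one serial RRT growth step in C++. The names (`Node`, `rrtStep`, `collisionFree`) and the trivial collision test are illustrative placeholders, not our actual implementation.

```cpp
#include <cmath>
#include <limits>
#include <random>
#include <vector>

// Illustrative 2D tree node; the real implementation may differ.
struct Node {
    double x, y;
    int parent;  // index of the parent node in the tree, -1 for the root
};

// Placeholder obstacle test; a real planner checks the environment map.
static bool collisionFree(const Node& /*from*/, const Node& /*to*/) {
    return true;
}

// One serial RRT growth step: sample, find nearest, steer, check, insert.
void rrtStep(std::vector<Node>& tree, std::mt19937& rng,
             double xMax, double yMax, double stepSize) {
    std::uniform_real_distribution<double> ux(0.0, xMax), uy(0.0, yMax);
    const double sx = ux(rng), sy = uy(rng);

    // Brute-force nearest-neighbor search; this is what a k-d tree replaces.
    int nearest = 0;
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < static_cast<int>(tree.size()); ++i) {
        const double d = std::hypot(tree[i].x - sx, tree[i].y - sy);
        if (d < best) { best = d; nearest = i; }
    }

    // Steer from the nearest node toward the sample by at most stepSize.
    const double theta = std::atan2(sy - tree[nearest].y, sx - tree[nearest].x);
    Node cand{tree[nearest].x + stepSize * std::cos(theta),
              tree[nearest].y + stepSize * std::sin(theta), nearest};

    // Insert only if the new edge is collision-free.
    if (collisionFree(tree[nearest], cand)) tree.push_back(cand);
}
```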
Challenge
1. The outer iterative loop is difficult to parallelize because each iteration has a strong data dependency on the previous one: every iteration queries the current tree for the nearest node and then inserts a new node (updating the k-d tree), so later iterations must see earlier insertions (see the sketch after this list).
2. It is hard to parallelize both the nearest-neighbor search and the collision detection at the same time due to the sequential structure of the algorithm.
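To make the first point concrete, here is an illustrative sketch (reusing the `Node` type and `rrtStep` from the Background sketch) of what a naive `#pragma omp parallel for` over the outer loop would look like: because each iteration both reads the whole tree and mutates it, the shared tree must be guarded, the critical section ends up covering the entire body, and the iterations still execute one at a time, now with added synchronization overhead.

```cpp
#include <omp.h>
#include <random>
#include <vector>
// Reuses Node and rrtStep from the serial sketch above.

// Naive parallelization of the outer loop (for illustration only):
// the nearest-neighbor query and the insertion both touch the shared
// tree, so the whole step must sit inside the critical section and the
// iterations are effectively serialized.
void rrtNaiveParallel(std::vector<Node>& tree, int iterations,
                      double xMax, double yMax, double stepSize) {
    #pragma omp parallel
    {
        std::mt19937 rng(12345 + omp_get_thread_num());  // per-thread RNG
        #pragma omp for
        for (int it = 0; it < iterations; ++it) {
            #pragma omp critical
            rrtStep(tree, rng, xMax, yMax, stepSize);  // effectively serial
        }
    }
}
```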
Resources
We will start from a serial implementation of RRT, use OpenMP to accelerate it, and analyze the resulting speedup. We have also found existing CUDA and OpenMP implementations that do not use optimized data structures; we would like to evaluate their performance and compare it with ours.
Goals and Deliverables
Basic Goal
Our basic goal is a working serial version of the RRT algorithm, along with a benchmark script to measure performance; we will then implement a basic parallel version of RRT and compare the two.
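One plausible shape for the basic parallel version is to keep the outer loop serial and parallelize the per-iteration nearest-neighbor scan with OpenMP. The sketch below assumes that approach (names are illustrative, and it reuses the `Node` type from the Background sketch).

```cpp
#include <omp.h>
#include <cmath>
#include <limits>
#include <vector>
// Reuses the Node type from the serial sketch above.

// Parallel nearest-neighbor query over the current tree: each thread
// scans a chunk of the nodes, keeps a thread-local best, and the local
// results are combined in a short critical section. The outer RRT loop
// stays serial, so the tree is never mutated during the scan.
int nearestParallel(const std::vector<Node>& tree, double sx, double sy) {
    int bestIdx = 0;
    double bestDist = std::numeric_limits<double>::max();
    #pragma omp parallel
    {
        int localIdx = 0;
        double localDist = std::numeric_limits<double>::max();
        #pragma omp for nowait
        for (int i = 0; i < static_cast<int>(tree.size()); ++i) {
            const double d = std::hypot(tree[i].x - sx, tree[i].y - sy);
            if (d < localDist) { localDist = d; localIdx = i; }
        }
        #pragma omp critical
        if (localDist < bestDist) { bestDist = localDist; bestIdx = localIdx; }
    }
    return bestIdx;
}
```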
Ideal Goal
Our ideal goal is to improve the parallel RRT with a better parallelization strategy and data structures such as a k-d tree, then compare its performance against the first version and measure the speedup with different numbers of cores.
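A minimal sketch of the kind of k-d tree we have in mind for the nearest-neighbor query (2D, insertion without rebalancing; `KdNode`, `kdInsert`, and `kdNearest` are placeholder names, and our real tree would also carry RRT parent links):

```cpp
#include <array>
#include <cmath>
#include <memory>

// Minimal 2D k-d tree node, used here only for nearest-neighbor queries.
struct KdNode {
    std::array<double, 2> p;              // point coordinates
    std::unique_ptr<KdNode> left, right;  // children across the split plane
    KdNode(double x, double y) : p{x, y} {}
};

static double sqDist(const std::array<double, 2>& a,
                     const std::array<double, 2>& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Insert a point, alternating the split axis with depth (no rebalancing).
void kdInsert(std::unique_ptr<KdNode>& node, double x, double y, int depth = 0) {
    if (!node) { node = std::make_unique<KdNode>(x, y); return; }
    const int axis = depth % 2;
    const double key = (axis == 0) ? x : y;
    if (key < node->p[axis]) kdInsert(node->left, x, y, depth + 1);
    else                     kdInsert(node->right, x, y, depth + 1);
}

// Nearest-neighbor search: descend toward the query, then unwind and
// visit the far subtree only if it could still contain a closer point.
void kdNearest(const KdNode* node, const std::array<double, 2>& q,
               int depth, const KdNode*& best, double& bestSq) {
    if (!node) return;
    const double d = sqDist(node->p, q);
    if (d < bestSq) { bestSq = d; best = node; }
    const int axis = depth % 2;
    const double diff = q[axis] - node->p[axis];
    const KdNode* near = (diff < 0) ? node->left.get() : node->right.get();
    const KdNode* far  = (diff < 0) ? node->right.get() : node->left.get();
    kdNearest(near, q, depth + 1, best, bestSq);
    if (diff * diff < bestSq)  // splitting plane is closer than current best
        kdNearest(far, q, depth + 1, best, bestSq);
}
```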
Extra Goal
If things go well and we still have time, we will try to migrate our algorithm to the GPU using CUDA and measure the performance difference. We will also try to make our k-d tree lock-free to achieve even better parallelism.
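For the lock-free direction, one possible design (an assumption, not a finished implementation) is an insert-only tree whose child links are atomic pointers, with new leaves attached by compare-and-swap so concurrent inserts never block each other:

```cpp
#include <array>
#include <atomic>

// Sketch of a lock-free, insert-only 2D k-d tree node. Nodes are never
// removed, so a failed CAS simply means another thread attached a node
// first and we keep descending from it. Default (seq_cst) atomics are
// used to keep the sketch simple.
struct LfNode {
    std::array<double, 2> p;
    std::atomic<LfNode*> child[2];  // child[0] = "less", child[1] = "greater-or-equal"
    LfNode(double x, double y) : p{x, y} { child[0] = nullptr; child[1] = nullptr; }
};

void lfInsert(LfNode* root, LfNode* fresh) {
    LfNode* cur = root;
    int depth = 0;
    while (true) {
        const int axis = depth % 2;
        const int dir = (fresh->p[axis] < cur->p[axis]) ? 0 : 1;
        LfNode* next = cur->child[dir].load();
        if (next == nullptr) {
            LfNode* expected = nullptr;
            // Try to attach the new leaf; if another thread won the race,
            // `expected` now holds the node it installed and we descend.
            if (cur->child[dir].compare_exchange_strong(expected, fresh))
                return;
            next = expected;
        }
        cur = next;
        ++depth;
    }
}
```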
Platform Choice
We will use multi-core CPUs (the GHC machines) for our basic implementation and will also benchmark it on the PSC machines. If we have time, we will try to migrate it to the GPU.
Timeline
W1: 11/09-11/15 | Implement serial RRT and reproduce the parallelized RRT described in prior work
W2: 11/16-11/22 | Design our parallelized RRT, analyze its performance, and identify the bottlenecks
W3: 11/23-11/29 | Improve our parallelized RRT with a k-d tree; compare against the CUDA version and our initial parallelized RRT
11/30 | Milestone Report
W4: 11/30-12/08 | Visualize the generated paths and results; finish the final report
11/09 | Poster Session |