
Performance Analysis of Parallel Programs with RAPIDS as a Framework of Execution

25 pages. Published: December 11, 2024

Abstract

In this age of astronomical data growth and unfettered access to digital information, scientific computation, analysis, and inference have become increasingly complex, because such data cannot be easily processed with traditional approaches. In recent years, however, NVIDIA and other market players have produced state-of-the-art GPUs such as the NVIDIA A100 Tensor Core GPU, the Tesla V100, and the NVIDIA H100, which seamlessly handle complex mathematical simulations and computations, artificial intelligence, machine learning, and high-performance computing with greatly improved speed, efficiency, and scalability. These innovations make it possible to efficiently deploy parallel programming models such as shared memory, distributed memory, data parallelism, and Partitioned Global Address Space (PGAS) with high performance. In this work, we analyzed the parquet-formatted New York City yellow taxi dataset on a RAPIDS- and Dask-supported distributed data-parallel training platform using a high-performance cluster of 7 NVIDIA TITAN RTX GPUs (24 GB GDDR6 each) running CUDA 12.4. The dataset was used to train Extreme Gradient Boosting (XGBoost), RandomForest Regressor, and Elastic Net models for trip fare prediction. Our models achieved notable performance: XGBoost reached a mean squared error (MSE) of 10.87, an R² of 96.9%, and a training time of 21.1 seconds despite the large training dataset, demonstrating the computational efficiency of the system, while RandomForest achieved an MSE of 27.46, an R² of 92.2%, and a training time of 25.9 seconds. To demonstrate the scalability and versatility of our experimental design across machine learning domains, we extended the multi-GPU accelerated training to image classification by fine-tuning the pre-trained MobileNet-V3-Large architecture on the CIFAR-100 dataset. The parallelization results were a low Karp-Flatt metric of 0.013, indicating minimal serialization; a 98.7% parallel fraction, demonstrating excellent parallelization; and a communication overhead of only 7.1% relative to computation time. For model performance, this implementation achieved a ROC AUC of over 95%. This work advances the state of the art in parallel computing through the implementation of the RAPIDS and Dask frameworks on a distributed data-parallel training platform using NVIDIA multi-GPUs, and is built on a well-established theoretical framework based on Amdahl's and Gustafson's laws of parallel computation. By integrating RAPIDS and Dask, we contribute to advancing parallel computing capabilities, with potential applications in smart city development and in logistics and transportation management services where rapid fare prediction is important. The contribution could also extend to image classification, vision systems, object detection, and embedded systems for mobile applications.
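To make the kind of pipeline the abstract describes concrete, the following is a minimal sketch of distributed XGBoost training over RAPIDS and Dask. It is illustrative only: the parquet path, the fare_amount target column, the feature selection, and the boosting hyperparameters are assumptions, not details reported in the paper; only the one-worker-per-GPU layout on a 7-GPU cluster follows the abstract.

    import xgboost as xgb
    import dask_cudf
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client

    # One Dask worker per GPU (the paper used a 7-GPU TITAN RTX cluster).
    cluster = LocalCUDACluster(n_workers=7)
    client = Client(cluster)

    # Load the parquet-formatted taxi data directly into GPU memory,
    # partitioned across workers. Path and column names are hypothetical.
    df = dask_cudf.read_parquet("yellow_tripdata/*.parquet")
    X = df.drop(columns=["fare_amount"])
    y = df["fare_amount"]

    # DaskDMatrix keeps each partition on the GPU where it already resides.
    dtrain = xgb.dask.DaskDMatrix(client, X, y)

    # GPU-accelerated histogram training; on XGBoost < 2.0 use
    # {"tree_method": "gpu_hist"} instead of "hist" + "device": "cuda".
    output = xgb.dask.train(
        client,
        {"tree_method": "hist", "device": "cuda",
         "objective": "reg:squarederror"},
        dtrain,
        num_boost_round=100,
    )
    booster = output["booster"]
    predictions = xgb.dask.predict(client, booster, X)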
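For the image-classification extension, a comparable multi-GPU setup can be expressed with PyTorch DistributedDataParallel. The sketch below fine-tunes a pre-trained MobileNet-V3-Large on CIFAR-100; the batch size, learning rate, epoch count, input resizing, and launch method (torchrun) are assumptions, since the abstract does not report the training configuration.

    # Launch with: torchrun --nproc_per_node=7 train_cifar100.py
    import os
    import torch
    import torchvision
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Pre-trained backbone; swap the 1000-class head for a 100-class head.
    model = torchvision.models.mobilenet_v3_large(
        weights=torchvision.models.MobileNet_V3_Large_Weights.DEFAULT)
    model.classifier[3] = torch.nn.Linear(model.classifier[3].in_features, 100)
    model = DDP(model.cuda(), device_ids=[local_rank])

    # Upscale 32x32 CIFAR images to the ImageNet input size
    # (normalization and augmentation omitted for brevity).
    transform = torchvision.transforms.Compose([
        torchvision.transforms.Resize(224),
        torchvision.transforms.ToTensor(),
    ])
    train_set = torchvision.datasets.CIFAR100(
        "./data", train=True, download=True, transform=transform)
    sampler = DistributedSampler(train_set)  # shards the data across GPUs
    loader = DataLoader(train_set, batch_size=128, sampler=sampler,
                        num_workers=4)

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for images, labels in loader:
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()   # gradients are all-reduced across GPUs here
            optimizer.step()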
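The parallel-efficiency figures quoted above come from standard metrics. The Karp-Flatt metric estimates the experimentally determined serial fraction e from the measured speedup S on p processors, e = (1/S - 1/p) / (1 - 1/p); the parallel fraction is 1 - e, and Amdahl's law bounds the speedup by 1 / (e + (1 - e)/p). A small consistency check is shown below; the raw speedup is not stated in the abstract, so it is back-computed here from the reported e = 0.013.

    def karp_flatt(speedup: float, p: int) -> float:
        """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
        return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

    def amdahl_speedup(serial_fraction: float, p: int) -> float:
        """Amdahl's-law speedup for serial fraction e on p processors."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

    p = 7                           # GPUs in the paper's cluster
    s = amdahl_speedup(0.013, p)    # speedup implied by the reported e = 0.013
    print(f"implied speedup:   {s:.2f}")                      # ~6.49 on 7 GPUs
    print(f"Karp-Flatt e:      {karp_flatt(s, p):.3f}")       # recovers 0.013
    print(f"parallel fraction: {1 - karp_flatt(s, p):.1%}")   # ~98.7%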

Keyphrases: cifar 100, cuda programming, cudf, cuml, cupy, dask framework, data science infrastructure, distributed data processing, elastic net, emerging technology, gpu (graphics processing unit), high performance computing, image classification, karp flatt metric, machine learning optimization, mathematical simulations, mobilenet v3 large, parallel computing, parallel efficiency, parallelism, pytorch integration, randomforest, scalability, smart city development, speedup analysis, taxi fare prediction, transportation analytics, xgboost

In: Varvara L Turova, Andrey E Kovtanyuk and Johannes Zimmer (editors). Proceedings of 3rd International Workshop on Mathematical Modeling and Scientific Computing, vol 104, pages 243-267.

BibTeX entry
@inproceedings{MMSC2024:Performance_Analysis_Parallel_Programs,
  author    = {Seyi Ogunji and Moises Sanchez Adame and Oscar Montiel Ross and Juan Tapia},
  title     = {Performance Analysis of Parallel Programs with RAPIDS as a Framework of Execution},
  booktitle = {Proceedings of 3rd International Workshop on Mathematical Modeling and Scientific Computing},
  editor    = {Varvara L Turova and Andrey E Kovtanyuk and Johannes Zimmer},
  series    = {EPiC Series in Computing},
  volume    = {104},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {/publications/paper/qsdP},
  doi       = {10.29007/zpqx},
  pages     = {243-267},
  year      = {2024}}