UVA-LavaLab/TriPIM

TriPIM (PIM)

TriPIM extends the TriCORE approach using UPMEM processing-in-memory (PIM) technology. TriCORE introduced a binary-search-based technique for triangle counting in graph analytics that improves thread parallelism and memory efficiency. TriPIM builds on the foundational principles of TriCORE and integrates them with UPMEM PIM hardware, aiming to further optimize triangle counting by combining the advantages of the TriCORE approach with the capabilities offered by UPMEM.

Project Build Instructions

This document explains how to build and run the components of the project using the provided Makefile. The Makefile simplifies compilation and execution for CPU, GPU, and DPU (DRAM Processing Unit) targets.

Prerequisites

Before building and running the benchmarks, ensure you have the following installed:

  • GNU Compiler Collection (GCC) for C++ compilation
  • NVIDIA CUDA Toolkit for GPU code compilation
  • Python3 for running GPU benchmarks
  • UPMEM DPU Toolchain for compiling and executing DPU benchmarks

Make sure that the g++ and nvcc compilers are accessible in your system's PATH. Additionally, the DPU toolchain must be properly configured if you intend to run the DPU benchmarks.

Available Commands

The Makefile includes several targets to facilitate building, running, and managing the project components:

General Targets

  • all: Compiles all benchmarks, including GAP benchmark suite components and the TriPIM benchmark for CPU, GPU, and DPU platforms.

    make all
  • clean: Removes all build artifacts, including binaries and intermediate files, from the bin and lib directories, along with Python cache files.

    make clean
  • help: Lists all available Makefile commands along with a brief description of each.

    make help

GAP Benchmark Suite

  • tc_cpu: Compiles the Triangle Counting (TC) CPU version of the GAP benchmark suite.

    make tc_cpu
  • converter: Builds the graph format converter utility, part of the GAP benchmark suite.

    make converter

TriPIM Benchmark

  • tc_upmem: Builds the Host side and DPU task of the TriPIM benchmark.

    make tc_upmem

Running Benchmarks

  • run-tc_cpu: Executes the CPU version of the TriPIM benchmark with predefined input parameters.

    make run-tc_cpu
  • run-tc_upmem: Simulates the TriPIM benchmark on the host system, ideal for DPU functional simulation.

    make run-tc_upmem

The run targets compile the benchmark before running it:

  • make run-tc_cpu: Compiles and runs the tc_cpu benchmark (CPU-based)
  • make run-tc_upmem: Compiles and runs the tc_upmem benchmark (DPU-based)
  • make run-%: Runs the specified benchmark (replace % with tc_cpu or tc_upmem)

Each benchmark offers various flags for customization. Refer to the specific benchmark's help message (the help-% target) for details.

Graph Loading

All of the binaries use the same command-line options for loading graphs:

  • -g 20 generates a Kronecker graph with 2^20 vertices (Graph500 specifications)
  • -u 20 generates a uniform random graph with 2^20 vertices (degree 16)
  • -f graph.el loads graph from file graph.el
  • -sf graph.el symmetrizes graph loaded from file graph.el
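To illustrate what symmetrizing an edge list means conceptually, here is a minimal Python sketch. It assumes a plain-text layout with one `u v` pair per line and treats lines starting with `#` as comments; both are assumptions about the file layout. The function names are hypothetical and this is not the GAP loader itself.

```python
def parse_edge_list(text):
    """Parse lines of the form 'u v' into a list of (u, v) integer pairs."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments (an assumed convention).
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        edges.append((int(u), int(v)))
    return edges


def symmetrize(edges):
    """Return a sorted, duplicate-free list holding both directions of every edge."""
    both = set()
    for u, v in edges:
        if u == v:
            continue  # this sketch drops self-loops; they cannot form a triangle
        both.add((u, v))
        both.add((v, u))
    return sorted(both)


sample = "0 1\n1 2\n2 0\n"
directed = parse_edge_list(sample)
undirected = symmetrize(directed)
print(undirected)
```

After symmetrization every edge is stored in both directions, which is the undirected view that triangle counting operates on.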

The graph loading infrastructure understands the following formats:

Makefile Targets

The Makefile defines various targets for managing the build process, cleaning, and running benchmarks. Here's a summary of some key targets:

  • all: Builds all targets (including GAP benchmarks)
  • clean: Removes all build artifacts
  • clean-all: Removes build artifacts and results directories
  • scrub-all: Performs a more extensive cleanup (including backups)
  • run-%: Runs a specified GAP benchmark
  • help: Displays a list of available make commands and their descriptions
  • help-%: Provides help for a specific benchmark

Configuration

Several environment variables and Makefile settings control the build process:

  • CXX: C++ compiler (default: g++)
  • UPMEM_NR_TASKLETS: Number of UPMEM tasklets (default: 16)
  • UPMEM_NR_DPUS: Number of DPUs (default: 1)
  • UPMEM_PROBLEM_SIZE: Problem size (default: 2)
  • CXXFLAGS_GAP: Compiler flags for GAP benchmarks
  • UPMEM_HOST_FLAGS: Compiler flags for UPMEM host code
  • UPMEM_DPU_FLAGS: Compiler flags for UPMEM DPU code
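Variables passed on the make command line generally take precedence over assignments inside a Makefile, so these settings can usually be changed per invocation. The Python sketch below only assembles such a command. The helper name `make_command` and the values 4 and 8 are illustrative, and it assumes the project's Makefile does not protect these variables with the `override` directive.

```python
import shlex


def make_command(target, **overrides):
    """Build an argv list such as ['make', 'tc_upmem', 'UPMEM_NR_DPUS=4']."""
    argv = ["make", target]
    # Sort for a deterministic command line.
    for name, value in sorted(overrides.items()):
        argv.append(f"{name}={value}")
    return argv


cmd = make_command("tc_upmem", UPMEM_NR_DPUS=4, UPMEM_NR_TASKLETS=8)
print(shlex.join(cmd))
# To actually run the build: subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log or inspect the exact configuration used for a benchmark run.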

Additional Information

For more detailed information about each command and how to use the benchmarks, refer to the help command (make help) or the individual benchmark documentation provided within the project.

Directory Structure

  • bin/: Contains compiled executables for the GAP benchmark suite and the TriPIM CPU benchmark.
  • lib/: Contains the shared library for the TriPIM GPU benchmark.
  • src/: Contains source code for the project, including the GAP benchmark suite and the TriPIM benchmark.

GAP Benchmark Suite (CPU)

GAP Benchmark Suite is designed to be a portable high-performance baseline that only requires a compiler with support for C++11. It uses OpenMP for parallelism, but it can be compiled without OpenMP to run serially. The details of the benchmark can be found in the specification.

The GAP Benchmark Suite is intended to help graph processing research by standardizing evaluations. Fewer differences between graph processing evaluations will make it easier to compare different research efforts and quantify improvements. The benchmark not only specifies graph kernels, input graphs, and evaluation methodologies, but it also provides an optimized baseline implementation (the GAP reference code). These baseline implementations are representative of state-of-the-art performance, and thus new contributions should outperform them to demonstrate an improvement.

TRICORE (GPU)

  • TRICORE is a GPU-optimized triangle counting system distinguished by three core techniques:

    • Binary Search Algorithm: Designed to bolster thread parallelism and memory efficiency on GPUs, filling gaps from earlier models.
    • Graph Representation Streamlining: Unlike previous methods that demanded various graph representations (like CSR, edge list, and bitmap) in the GPU memory, TRICORE uniquely distributes partitioned CSR data among GPUs. Additionally, it employs a streaming buffer, allowing edge lists to be fetched directly from CPU memory. This strategy empowers TRICORE to handle graphs substantially larger than typical GPU memory capacities.
    • Dynamic Workload Management: Crafted to ensure a balanced GPU workload distribution.
  • Performance Insights:

    • TRICORE processed the billion-edge Twitter graph in just 24 seconds on a single GPU, 22 times faster than leading CPU-based methods, even when those CPUs cost 8 times more.
    • For expansive graphs (up to 33.4 billion edges) that dwarf a single GPU's memory by about 22 times, TRICORE achieves a 24-fold performance increase as the system scales from 1 to 32 GPUs.
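The binary-search idea can be sketched sequentially in Python. This is a simplified illustration, not the TRICORE GPU kernel or the TriPIM DPU code: it assumes an oriented CSR in which each vertex keeps only its higher-numbered neighbors, and for every edge it searches each element of the shorter neighbor list inside the longer one. All function names are hypothetical.

```python
from bisect import bisect_left


def build_oriented_csr(num_vertices, edges):
    """Build CSR arrays keeping only edges lo -> hi with lo < hi, neighbors sorted."""
    adj = [set() for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:
            lo, hi = (u, v) if u < v else (v, u)
            adj[lo].add(hi)
    row_ptr, col_idx = [0], []
    for nbrs in adj:
        col_idx.extend(sorted(nbrs))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def contains(col_idx, lo, hi, key):
    """Binary search for key in the sorted slice col_idx[lo:hi]."""
    pos = bisect_left(col_idx, key, lo, hi)
    return pos < hi and col_idx[pos] == key


def count_triangles(row_ptr, col_idx):
    """Count triangles by intersecting neighbor lists of both endpoints of each edge."""
    total = 0
    for u in range(len(row_ptr) - 1):
        u_lo, u_hi = row_ptr[u], row_ptr[u + 1]
        for e in range(u_lo, u_hi):
            v = col_idx[e]
            v_lo, v_hi = row_ptr[v], row_ptr[v + 1]
            # Look up each element of the shorter list in the longer list.
            if (u_hi - u_lo) <= (v_hi - v_lo):
                keys, s_lo, s_hi = range(u_lo, u_hi), v_lo, v_hi
            else:
                keys, s_lo, s_hi = range(v_lo, v_hi), u_lo, u_hi
            for k in keys:
                if contains(col_idx, s_lo, s_hi, col_idx[k]):
                    total += 1
    return total


# Complete graph on 4 vertices, which contains 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
row_ptr, col_idx = build_oriented_csr(4, k4)
print(count_triangles(row_ptr, col_idx))  # prints 4
```

Because every lookup is an independent search over a read-only array, the lookups can in principle be distributed across many threads or tasklets, which is the property a parallel implementation would exploit.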

TriPIM (UPMEM)

TriPIM is based on PrIM, the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM was developed to evaluate, analyze, and characterize the first publicly available real-world PIM architecture, the UPMEM PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip.

PrIM provides a common set of workloads for evaluating the UPMEM PIM architecture and can be useful for programming, architecture, and system researchers alike to improve multiple aspects of future PIM hardware and software. The workloads have different characteristics, exhibiting heterogeneity in their memory access patterns, operations and data types, and communication patterns. The PrIM repository also contains baseline CPU and GPU implementations of PrIM benchmarks for comparison purposes.

PrIM also includes a set of microbenchmarks that can be used to assess various architecture limits such as compute throughput and memory bandwidth.

Kernels Included

  • Triangle Counting (TC) - Order invariant with possible relabelling
    • CPU - GAP
    • GPU - TRICORE
    • PIM - UPMEM
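"Order invariant" means the triangle count does not depend on how vertices are numbered, so an implementation is free to relabel them. The Python sketch below checks this on a tiny graph using a brute-force counter and a degree-based relabeling. The descending-degree ordering is one common heuristic chosen for illustration, not necessarily the one used by the kernels in this repository, and the function names are hypothetical.

```python
from itertools import combinations


def count_triangles_naive(edges):
    """Count triangles by testing every vertex triple (only suitable for tiny graphs)."""
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    vertices = sorted({v for e in edge_set for v in e})
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if frozenset((a, b)) in edge_set
        and frozenset((b, c)) in edge_set
        and frozenset((a, c)) in edge_set
    )


def relabel_by_degree(edges):
    """Renumber vertices so that higher-degree vertices receive smaller IDs."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Ties are broken by the original ID to keep the result deterministic.
    order = sorted(degree, key=lambda x: (-degree[x], x))
    new_id = {old: new for new, old in enumerate(order)}
    return [(new_id[u], new_id[v]) for u, v in edges]


# Two triangles sharing vertex 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
relabeled = relabel_by_degree(edges)
print(count_triangles_naive(edges), count_triangles_naive(relabeled))
```

Both counts agree, which is why relabeling can be applied purely as a performance optimization.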

How to Cite

Please cite the following papers if you find this repository useful.
