TriPIM is an extension of the TriCORE approach using UPMEM PIM concepts. TriCORE introduced an innovative technique for triangle counting in graph analytics, using a binary-search-driven mechanism to improve thread parallelism and memory efficiency. In this project, we present TriPIM, which builds on the foundational principles of TriCORE and integrates them with the UPMEM PIM technology. This integration aims to further optimize triangle counting by combining the strengths of the TriCORE approach with the capabilities offered by UPMEM.
This document provides instructions on how to build and run the various components of the project using the provided Makefile. The Makefile simplifies the compilation and execution process for CPU and GPU targets, as well as for DPU (DRAM Processing Unit) targets.
Before building and running the benchmarks, ensure you have the following installed:
- GNU Compiler Collection (GCC) for C++ compilation
- NVIDIA CUDA Toolkit for GPU code compilation
- Python3 for running GPU benchmarks
- UPMEM DPU Toolchain for compiling and executing DPU benchmarks
Make sure that the g++ and nvcc compilers are accessible in your system's PATH. Additionally, the DPU toolchain must be properly configured if you intend to run the DPU benchmarks.
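As a quick sanity check of the PATH requirement above, the tools can be probed with a short Python script. The DPU compiler name used here (`dpu-upmem-dpurte-clang`) is an assumption and may differ between UPMEM SDK releases.

```python
#!/usr/bin/env python3
# Report whether each required tool is resolvable on PATH.
# The DPU compiler name below is an assumption; check your UPMEM SDK docs.
import shutil

TOOLS = ["g++", "nvcc", "python3", "dpu-upmem-dpurte-clang"]

def check_tools(tools=TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}

if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool}: {path if path else 'NOT FOUND'}")
```

Running it before `make all` makes missing-toolchain build failures easier to diagnose.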
The Makefile includes several targets to facilitate building, running, and managing the project components:
- `all`: Compiles all benchmarks, including GAP benchmark suite components and the TriPIM benchmark for CPU, GPU, and DPU platforms. Run with `make all`.
- `clean`: Removes all build artifacts, including binaries and intermediate files, from the `bin` and `lib` directories, along with Python cache files. Run with `make clean`.
- `help`: Lists all available Makefile commands along with a brief description of each. Run with `make help`.
- `tc_cpu`: Compiles the Triangle Counting (TC) CPU version of the GAP benchmark suite. Run with `make tc_cpu`.
- `converter`: Builds the graph format converter utility, part of the GAP benchmark suite. Run with `make converter`.
- `tc_upmem`: Builds the host side and DPU task of the TriPIM benchmark. Run with `make tc_upmem`.
- `run-tc_cpu`: Executes the CPU version of the TriPIM benchmark with predefined input parameters. Run with `make run-tc_cpu`.
- `run-tc_upmem`: Simulates the TriPIM benchmark on the host system, ideal for DPU functional simulation. Run with `make run-tc_upmem`.
The Makefile provides several targets for building and running specific benchmarks:
- `make run-tc_cpu`: Compiles and runs the `tc_cpu` benchmark (CPU-based)
- `make run-tc_upmem`: Compiles and runs the `tc_upmem` benchmark (DPU-based)
- `make run-%`: Runs a specified GAP benchmark (replace `%` with `tc_cpu` or `tc_upmem`)
Each benchmark offers various flags for customization. Refer to the specific benchmark's help message for details:
All of the binaries use the same command-line options for loading graphs:
- `-g 20` generates a Kronecker graph with 2^20 vertices (Graph500 specifications)
- `-u 20` generates a uniform random graph with 2^20 vertices (degree 16)
- `-f graph.el` loads graph from file graph.el
- `-sf graph.el` symmetrizes graph loaded from file graph.el
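To illustrate what the `-sf` option does conceptually, here is a minimal Python sketch of edge-list symmetrization (a hypothetical helper, not the GAP builder's actual code):

```python
# Minimal sketch of edge-list symmetrization (what "-sf" does conceptually):
# for every directed edge (u, v), ensure the reverse edge (v, u) is present.
# Illustration only; the real logic lives in the GAP graph builder.

def symmetrize(edges):
    """Return a deduplicated, sorted edge list with both directions of each edge."""
    sym = set()
    for u, v in edges:
        if u != v:          # drop self-loops in this sketch
            sym.add((u, v))
            sym.add((v, u))
    return sorted(sym)

print(symmetrize([(0, 1), (1, 2), (1, 0)]))
```

The duplicate pair (0, 1)/(1, 0) collapses to one undirected edge stored in both directions.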
The graph loading infrastructure understands the following formats:
- `.el` plain-text edge list with an edge per line as "node1 node2"
- `.wel` plain-text weighted edge list with an edge per line as "node1 node2 weight"
- `.gr` 9th DIMACS Implementation Challenge format
- `.graph` Metis format (used in the 10th DIMACS Implementation Challenge)
- `.mtx` Matrix Market format
- `.sg` serialized pre-built graph (use `converter` to make)
- `.wsg` weighted serialized pre-built graph (use `converter` to make)
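The `.el` and `.wel` layouts above are simple enough to parse in a few lines. The following Python sketch is illustrative only and is not the project's actual loader:

```python
# Minimal reader for the ".el" / ".wel" plain-text formats:
# one edge per line, "node1 node2" (.el) or "node1 node2 weight" (.wel).
# Illustrative sketch; the real loader is part of the GAP builder.

def read_edge_list(lines):
    """Parse edge-list lines into (u, v) or (u, v, weight) tuples."""
    edges = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue                      # skip blank lines and comments
        u, v = int(parts[0]), int(parts[1])
        if len(parts) >= 3:               # ".wel": third column is the weight
            edges.append((u, v, float(parts[2])))
        else:                             # ".el": unweighted
            edges.append((u, v))
    return edges

print(read_edge_list(["0 1", "1 2 0.5"]))
```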
The Makefile defines various targets for managing the build process, cleaning, and running benchmarks. Here's a summary of some key targets:
- `all`: Builds all targets (including GAP benchmarks)
- `clean`: Removes all build artifacts
- `clean-all`: Removes build artifacts and results directories
- `scrub-all`: Performs a more extensive cleanup (including backups)
- `run-%`: Runs a specified GAP benchmark
- `help`: Displays a list of available make commands and their descriptions
- `help-%`: Provides help for a specific benchmark
Several environment variables and Makefile settings control the build process:
- `CXX`: C++ compiler (default: g++)
- `UPMEM_NR_TASKLETS`: Number of UPMEM tasklets (default: 16)
- `UPMEM_NR_DPUS`: Number of DPUs (default: 1)
- `UPMEM_PROBLEM_SIZE`: Problem size (default: 2)
- `CXXFLAGS_GAP`: Compiler flags for GAP benchmarks
- `UPMEM_HOST_FLAGS`: Compiler flags for UPMEM host code
- `UPMEM_DPU_FLAGS`: Compiler flags for UPMEM DPU code
For more detailed information about each command and how to use the benchmarks, refer to the help command (make help) or the individual benchmark documentation provided within the project.
- `bin/`: Contains compiled executables for the GAP benchmark suite and the TriPIM CPU benchmark.
- `lib/`: Contains the shared library for the TriPIM GPU benchmark.
- `src/`: Contains source code for the project, including the GAP benchmark suite and the TriPIM benchmark.
GAP Benchmark Suite is designed to be a portable high-performance baseline that only requires a compiler with support for C++11. It uses OpenMP for parallelism, but it can be compiled without OpenMP to run serially. The details of the benchmark can be found in the specification.
The GAP Benchmark Suite is intended to help graph processing research by standardizing evaluations. Fewer differences between graph processing evaluations will make it easier to compare different research efforts and quantify improvements. The benchmark not only specifies graph kernels, input graphs, and evaluation methodologies, but it also provides an optimized baseline implementation (this repo). These baseline implementations are representative of state-of-the-art performance, and thus new contributions should outperform them to demonstrate an improvement.
TRICORE is a GPU-optimized triangle counting system distinguished by three core techniques:
- Binary Search Algorithm: Designed to bolster thread parallelism and memory efficiency on GPUs, addressing shortcomings of earlier GPU triangle counting approaches.
- Graph Representation Streamlining: Unlike previous methods that demanded various graph representations (like CSR, edge list, and bitmap) in the GPU memory, TRICORE uniquely distributes partitioned CSR data among GPUs. Additionally, it employs a streaming buffer, allowing edge lists to be fetched directly from CPU memory. This strategy empowers TRICORE to handle graphs substantially larger than typical GPU memory capacities.
- Dynamic Workload Management: Crafted to ensure a balanced GPU workload distribution.
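The binary-search technique can be sketched as follows: orient each undirected edge once to avoid double counting, then test candidate third vertices by binary-searching a sorted adjacency list. This plain-Python stand-in only illustrates the idea behind the GPU kernel; it is not TRICORE's implementation.

```python
# Sketch of binary-search-based triangle counting in the spirit of TRICORE.
from bisect import bisect_left
from collections import defaultdict

def count_triangles(edges):
    """Count triangles in an undirected graph given as an edge list."""
    # Orient each edge from lower to higher vertex id so every
    # triangle is counted exactly once.
    adj = defaultdict(list)
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        if a != b and b not in adj[a]:
            adj[a].append(b)
    for nbrs in adj.values():
        nbrs.sort()                       # binary search needs sorted lists

    def contains(sorted_list, x):
        i = bisect_left(sorted_list, x)   # the binary-search step
        return i < len(sorted_list) and sorted_list[i] == x

    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:                    # oriented edge (u, v)
            for w in nbrs:
                if w > v and contains(adj.get(v, []), w):
                    count += 1            # triangle u-v-w found
    return count

# A 4-clique contains exactly 4 triangles.
print(count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))
```

On a GPU, the inner `contains` lookups are what TRICORE parallelizes across threads; the sketch above keeps only the algorithmic skeleton.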
Performance Insights:
- TRICORE processed the billion-edge Twitter graph in just 24 seconds on a single GPU, 22 times faster than leading CPU-based methods even when those CPUs cost 8 times more.
- For expansive graphs (up to 33.4 billion edges) that dwarf a single GPU's memory by about 22 times, TRICORE achieves a 24-fold performance increase as the system scales from 1 to 32 GPUs.
TriPIM is based on PrIM, the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM was developed to evaluate, analyze, and characterize the first publicly available real-world PIM architecture, the UPMEM PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip.
PrIM provides a common set of workloads for evaluating the UPMEM PIM architecture and can be useful to programming, architecture, and systems researchers alike for improving multiple aspects of future PIM hardware and software. The workloads have different characteristics, exhibiting heterogeneity in their memory access patterns, operations and data types, and communication patterns. This repository also contains baseline CPU and GPU implementations of the PrIM benchmarks for comparison purposes.
PrIM also includes a set of microbenchmarks that can be used to assess various architectural limits, such as compute throughput and memory bandwidth.
- Triangle Counting (TC) - Order invariant with possible relabelling
- CPU - GAP
- GPU - TRICORE
- PIM - UPMEM
Please cite the following papers if you find this repository useful.
- Scott Beamer, Krste Asanović, David Patterson. "The GAP Benchmark Suite". arXiv:1508.03619 [cs.DC], 2015.
- Hu, Yang, Hang Liu, and H. Howie Huang. "TriCore: Parallel Triangle Counting on GPUs". SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2018.
- Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu, "Benchmarking Memory-centric Computing Systems: Analysis of Real Processing-in-Memory Hardware". 2021 12th International Green and Sustainable Computing Conference (IGSC). IEEE, 2021.