Commit 4e48dcc

Merge pull request #940 from CannonLock/jstathas-patch-5
Jstathas patch 5
2 parents 8e95b1a + c1a9b78 commit 4e48dcc

14 files changed: +53 −229 lines changed

Gemfile.lock

Lines changed: 1 addition & 1 deletion

@@ -159,4 +159,4 @@ DEPENDENCIES
   webrick

 BUNDLED WITH
-   2.6.5
+   2.7.2

_data/fellowships/classifying-user-contributed-images.yaml

Lines changed: 0 additions & 17 deletions
This file was deleted.

_data/fellowships/expanding-pelican-with-globus.yaml

Lines changed: 0 additions & 27 deletions
This file was deleted.

_data/fellowships/high-throughput-inference.yaml

Lines changed: 0 additions & 21 deletions
This file was deleted.

_data/fellowships/measuring-throughput-in-chtc.yaml

Lines changed: 0 additions & 15 deletions
This file was deleted.

_data/fellowships/measuring-throughput-in-CHTC.yml renamed to _data/fellowships/measuring-throughput-in-chtc.yml

Lines changed: 2 additions & 4 deletions

@@ -1,5 +1,5 @@
 title: Classifying User Contributed Images
-type: Facilitation
+type: Research Facilitation
 summary: |
   CHTC’s High Throughput Computing (HTC) system supports hundreds of users and thousands of jobs each day. It is optimized for workloads or sets of jobs, where many jobs can run in parallel as computational capacity becomes available. This project aims to better understand the impact of workload size and requirements on overall throughput through empirical measurement of workloads in CHTC. A key component of the project will be developing tools to a) submit sample workloads and b) gather metrics about their performance. Once these tools are developed, they can be used to run experiments with different workload types.

@@ -14,6 +14,4 @@ summary: |
 
 - Familiarity with unix and Python
 - Familiarity with git
--Familiarity with HTCondor]
-
-sort: 0
+- Familiarity with HTCondor

_data/fellowships/monitoring-chtc.yaml

Lines changed: 0 additions & 15 deletions
This file was deleted.

_data/fellowships/pelican-cache-monitoring.yaml

Lines changed: 0 additions & 24 deletions
This file was deleted.

_data/fellowships/pelican-client-request-tracking.yaml

Lines changed: 0 additions & 21 deletions
This file was deleted.
Lines changed: 29 additions & 0 deletions

@@ -0,0 +1,29 @@
+title: Distributed Tracing and Log Aggregation for Pelican Request Lifecycle
+type: Software Development
+sort: 0
+summary: |
+  In a distributed system with multiple services communicating with one another, a key challenge is correlating logging information from different services that handle a single job or client request. This project aims to design and implement a method for aggregating all logs generated during a client request by introducing a unique identifier that acts as a foreign key to link every log entry together. This focused approach will ensure administrators can precisely trace the path of a request through the system, identifying the services involved and pinpointing the exact location of errors or performance-related events recorded in the logs.
+
+  The primary objective of this project is to implement a system for auto-aggregation and tracing of request data across [Pelican’s](https://pelicanplatform.org/) distributed architecture. The goal is to move beyond siloed log files to ensure a complete picture of job execution is available for administrators. The core solution involves determining how to aggregate the logging data as well as creating a unique identifier that is generated and propagated throughout the system, acting as a foreign key to link every log entry together. The fellow will be responsible for defining the tracing methodology, propagating the request ID throughout the application layers, and making critical adjustments in the Pelican code. The fellow will develop client tooling to utilize this trace ID for diagnostics and will learn to inject diagnostic information back into the result ad for retrospective analysis via HTCondor.
+
+  Questions the fellow will have to answer in the course of the project: How do we define the foreign key when one Pelican command could translate to multiple transfers or jobs? How best can we aggregate the logs into a searchable system? How can the system handle the continuously growing size of the logs?
+
+  By the end of the fellowship, the fellow will acquire a comprehensive understanding of distributed data systems and gain hands-on experience designing and implementing a tracing system for log correlation. They will be responsible for defining the auto-aggregation and tracing methodology using this unique identifier, and for propagating the request ID through all layers of the Pelican code. This work will include adjusting selective places in the Pelican code and developing client tooling to utilize the trace ID. Additionally, the fellow will solidify their practical skills in Python and Go programming.
+
+  #### Project Objectives:
+
+  The project's specific objectives are broken down to reflect both the high-level design and the necessary low-level implementation:
+
+  - Implement UUID-based Tracing: Establish the methodology for UUID generation/propagation and use it as a foreign key for log correlation across all services.
+  - Augment Service Logs: Adjust selective places in the Pelican code to ensure the UUID is consistently captured.
+  - Develop Client Tooling: Create tools that run on the client or service hosts to leverage the UUID for direct log retrieval and diagnostics.
+  - System Integration: Create a system for client-side request tracking that leverages the aggregated data.
+
+  #### Prerequisite skills or education that would be good for the Fellow to have to work on the project:
+
+  - Python and Golang (required)
+  - Linux/CLI (required)
+  - HTTP development (preferred)
+  - Distributed Computing (preferred)
+  - Git/GitHub/GitHub Actions (preferred)
+  - Docker/Kubernetes (preferred)
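The UUID-as-foreign-key scheme described in the project summary can be sketched in a few lines of Python. This is a minimal illustration, not Pelican's actual code: the header name `X-Pelican-Trace-Id` and every function name here are hypothetical. A trace ID is generated at the start of a request, stored in a `contextvars.ContextVar`, stamped onto every log record by a `logging.Filter`, and forwarded to downstream services so their log entries carry the same key.

```python
import logging
import uuid
from contextvars import ContextVar

# Hypothetical trace-ID plumbing; a real implementation would differ.
_trace_id: ContextVar[str] = ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the current request's trace ID."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = _trace_id.get()
        return True


def start_request() -> str:
    """Generate the unique ID that acts as a foreign key across services."""
    tid = uuid.uuid4().hex
    _trace_id.set(tid)
    return tid


def outgoing_headers() -> dict:
    """Propagate the ID to downstream services (header name is illustrative)."""
    return {"X-Pelican-Trace-Id": _trace_id.get()}


# Wire the filter into a handler so every entry is greppable by trace ID.
handler = logging.StreamHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(trace_id)s %(levelname)s %(message)s"))
log = logging.getLogger("pelican-sketch")
log.addHandler(handler)
log.setLevel(logging.INFO)

tid = start_request()
log.info("transfer started")  # this entry now carries the request's trace ID
```

With every service logging the same ID and sending it onward in a header, "aggregating all logs for one request" reduces to a search for that ID in whatever log store is chosen, which is the correlation property the project asks for.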
