Athlete Injury Risk Analyzer

This project analyzes athlete performance data using left/right symmetry metrics and builds predictive models that classify athletes into three injury-risk categories (Low Risk, Medium Risk, High Risk).

Roadmap

1. Data Preparation

Data Collection:
- Source athlete performance data with metrics such as leftAvgForce, rightAvgForce, ImpulseSymmetry, etc.
- Ensure data contains necessary identifiers (sbuid, testDateUtc).
Pivoting the Data:
- Reshape the dataset using a pivot table to organize metrics as columns for each athlete and test date.
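
A minimal sketch of the pivot step, assuming the raw export is in long format with one row per athlete, test date, and metric. The column names `metric` and `value` and the sample rows are hypothetical; only `sbuid`, `testDateUtc`, and the metric names come from this README.

```python
import pandas as pd

# Hypothetical long-format export: one row per athlete, test date, and metric.
# The column names `metric` and `value` are assumptions for illustration.
long_df = pd.DataFrame({
    "sbuid": ["a1", "a1", "a1", "a1", "a2", "a2"],
    "testDateUtc": ["2024-01-05"] * 2 + ["2024-02-05"] * 2 + ["2024-01-06"] * 2,
    "metric": ["leftAvgForce", "rightAvgForce"] * 3,
    "value": [310.0, 290.0, 305.0, 300.0, 280.0, 330.0],
})


def pivot_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (sbuid, testDateUtc) with one column per metric."""
    wide = df.pivot_table(
        index=["sbuid", "testDateUtc"],
        columns="metric",
        values="value",
        aggfunc="mean",  # averages repeated measurements of the same metric
    )
    wide.columns.name = None  # drop the leftover "metric" axis label
    return wide.reset_index()


wide_df = pivot_metrics(long_df)
print(wide_df)
```

Using `pivot_table` rather than `pivot` tolerates repeated measurements for the same athlete and date, at the cost of silently averaging them.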
Preprocessing:
- Calculate symmetry metrics (ForceSymmetry, ImpulseSymmetry, MaxForceSymmetry, TorqueSymmetry).
- Handle missing values, remove duplicates, and cap outliers using the IQR method.
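
The preprocessing steps above could look roughly like the sketch below. The symmetry formula (signed percentage difference relative to the combined force) is an assumption, since this README does not state which formula the project uses, and dropping rows with missing force values is only one of several ways to handle them. The sample values are made up.

```python
import pandas as pd


def add_force_symmetry(df: pd.DataFrame) -> pd.DataFrame:
    """Add ForceSymmetry as a signed percentage (assumed formula, see lead-in)."""
    out = df.copy()
    total = out["leftAvgForce"] + out["rightAvgForce"]
    out["ForceSymmetry"] = (out["leftAvgForce"] - out["rightAvgForce"]) / total * 100
    return out


def cap_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and incomplete rows, then compute and cap the metric."""
    out = df.drop_duplicates().dropna(subset=["leftAvgForce", "rightAvgForce"])
    out = add_force_symmetry(out)
    out["ForceSymmetry"] = cap_outliers_iqr(out["ForceSymmetry"])
    return out.reset_index(drop=True)


# Made-up sample: one duplicate row, one missing value, one extreme asymmetry.
raw = pd.DataFrame({
    "leftAvgForce": [300.0, 310.0, 290.0, 305.0, 295.0, 600.0, 300.0, None],
    "rightAvgForce": [300.0, 300.0, 300.0, 300.0, 300.0, 100.0, 300.0, 300.0],
})
clean = preprocess(raw)
print(clean)
```

Capping keeps the extreme row in the dataset but limits its influence, which matters when high-asymmetry athletes are exactly the ones of interest.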
Threshold Definition:
- Define thresholds for symmetry metrics based on domain knowledge or data distribution.
- Apply dynamic buffer logic for flexible risk categorization.
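
This README does not spell out the thresholds or the buffer logic, so the following is only one possible reading: fixed cut-offs on the absolute asymmetry, with a data-dependent buffer that widens the Medium Risk band. The cut-off values (10 and 15), the `buffer_scale` parameter, and the function name are all hypothetical.

```python
import numpy as np
import pandas as pd


def categorize_risk(
    symmetry: pd.Series,
    low_cut: float = 10.0,      # hypothetical cut-off, not from the project
    high_cut: float = 15.0,     # hypothetical cut-off, not from the project
    buffer_scale: float = 0.0,  # fraction of the spread used as the buffer
) -> pd.Series:
    """Label each value as Low, Medium, or High Risk from its magnitude."""
    magnitude = symmetry.abs()
    # The buffer scales with the spread of the data, so it adapts per dataset.
    buffer = buffer_scale * magnitude.std() if buffer_scale else 0.0
    labels = np.select(
        [magnitude < low_cut - buffer, magnitude < high_cut + buffer],
        ["Low Risk", "Medium Risk"],
        default="High Risk",
    )
    return pd.Series(labels, index=symmetry.index, name="RiskCategory")


sample = pd.Series([2.0, -12.0, 20.0, 9.5])
print(categorize_risk(sample).tolist())
print(categorize_risk(sample, buffer_scale=0.1).tolist())
```

With a non-zero buffer, borderline values such as 9.5 move from Low Risk to Medium Risk, which is the kind of flexibility a buffer would be expected to provide.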

2. Hypothesis Testing

Group symmetry metrics by risk categories (Low Risk, Medium Risk, High Risk).
Perform statistical tests (e.g., one-way ANOVA, or Kruskal-Wallis when the metrics are not normally distributed) to assess whether the metrics differ significantly across categories.
Document results and determine which metrics are most impactful for risk categorization.
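
As an illustration of the testing step, the sketch below runs both tests on one metric with `scipy.stats`. The helper name and the sample values are made up; a real run would use the preprocessed project data.

```python
import pandas as pd
from scipy import stats


def compare_metric_across_groups(df, metric, group_col="RiskCategory"):
    """Run Kruskal-Wallis and one-way ANOVA for one metric across risk groups."""
    samples = [g[metric].dropna().to_numpy() for _, g in df.groupby(group_col)]
    h_stat, p_kruskal = stats.kruskal(*samples)
    f_stat, p_anova = stats.f_oneway(*samples)
    return {
        "metric": metric,
        "kruskal_H": float(h_stat),
        "kruskal_p": float(p_kruskal),
        "anova_F": float(f_stat),
        "anova_p": float(p_anova),
    }


# Made-up values, five athletes per category, for illustration only.
demo = pd.DataFrame({
    "RiskCategory": ["Low Risk"] * 5 + ["Medium Risk"] * 5 + ["High Risk"] * 5,
    "ForceSymmetry": [1.0, 2.0, 3.0, 2.5, 1.5,
                      10.0, 11.0, 12.0, 10.5, 11.5,
                      18.0, 20.0, 22.0, 19.0, 21.0],
})
result = compare_metric_across_groups(demo, "ForceSymmetry")
print(result)
```

Note that if the risk categories were themselves derived from thresholds on the same metric, a significant result is expected by construction, so the comparison is most informative for metrics that were not used to define the categories.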

3. Model Development

Define Features and Target:
- Features: Symmetry metrics (ForceSymmetry, MaxForceSymmetry, TorqueSymmetry).
- Target: RiskCategory (encoded as Low = 0, Medium = 1, High = 2).
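
A small sketch of the feature and target definition, assuming the category labels are stored as the strings "Low Risk", "Medium Risk", and "High Risk". The helper name and sample rows are hypothetical.

```python
import pandas as pd

FEATURES = ["ForceSymmetry", "MaxForceSymmetry", "TorqueSymmetry"]
RISK_ENCODING = {"Low Risk": 0, "Medium Risk": 1, "High Risk": 2}


def split_features_target(df: pd.DataFrame):
    """Return the feature matrix X and the integer-encoded target y."""
    X = df[FEATURES]
    y = df["RiskCategory"].map(RISK_ENCODING)
    if y.isna().any():
        # An unmapped label would otherwise become NaN without any warning.
        raise ValueError("RiskCategory contains labels outside RISK_ENCODING")
    return X, y.astype(int)


# Made-up rows for illustration.
demo = pd.DataFrame({
    "ForceSymmetry": [1.2, 18.0, 11.5],
    "MaxForceSymmetry": [0.8, 21.0, 12.0],
    "TorqueSymmetry": [2.0, 16.5, 10.1],
    "RiskCategory": ["Low Risk", "High Risk", "Medium Risk"],
})
X, y = split_features_target(demo)
print(y.tolist())
```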
Handle Class Imbalance:
- Experiment with different techniques:
  - SMOTE Oversampling
  - SMOTEENN (Combined Oversampling and Undersampling)
  - Class Weights in Random Forest
Train Models:
- Build Random Forest models for each technique.
- Compare results of the following:
  - Model 1: SMOTE Oversampling
  - Model 2: No Balancing
  - Model 3: SMOTEENN
  - Model 4: Class Weights
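
One way to organize the comparison is a single training helper that takes an optional resampler and an optional class-weight setting. The sketch below trains only the "No Balancing" and "Class Weights" variants with scikit-learn on synthetic data. The helper name and data are assumptions, and the SMOTE and SMOTEENN variants are left as a comment because they need the separate imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_random_forest(X, y, resampler=None, class_weight=None, seed=42):
    """Fit a Random Forest, optionally resampling the training data first."""
    if resampler is not None:
        # Any object exposing fit_resample(X, y) works here, for example
        # imblearn.over_sampling.SMOTE or imblearn.combine.SMOTEENN.
        X, y = resampler.fit_resample(X, y)
    model = RandomForestClassifier(
        n_estimators=100, class_weight=class_weight, random_state=seed
    )
    return model.fit(X, y)


# Synthetic, imbalanced three-class data standing in for the symmetry features.
X, y = make_classification(
    n_samples=600, n_features=3, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, weights=[0.7, 0.2, 0.1],
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "No Balancing": train_random_forest(X_train, y_train),
    "Class Weights": train_random_forest(X_train, y_train, class_weight="balanced"),
}
predictions = {name: model.predict(X_test) for name, model in models.items()}
```

Resampling is applied to the training split only, so the test split keeps the original class proportions and the comparison between models stays fair.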
Evaluate Models:
- Metrics: Accuracy, F1-Score, Precision, Recall, Confusion Matrix.
- Visualize confusion matrices and accuracy comparisons for each model.
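
The listed metrics can be computed with `sklearn.metrics`, as in this sketch. Macro averaging is a choice made here so that the minority classes count as much as the majority class; the project may average differently. The labels and predictions are made up.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)


def evaluate(y_true, y_pred, labels=(0, 1, 2)):
    """Return accuracy, macro precision/recall/F1, and the confusion matrix."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=list(labels), average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision_macro": precision,
        "recall_macro": recall,
        "f1_macro": f1,
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=list(labels)),
    }


# Made-up labels: 0 = Low, 1 = Medium, 2 = High.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
report = evaluate(y_true, y_pred)
print(report["accuracy"])
print(report["confusion_matrix"])
```

For injury risk, the High Risk row of the confusion matrix deserves particular attention, since a High Risk athlete predicted as Low Risk is usually the costliest error.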

4. Prediction on Unseen Data

Preprocess new datasets to calculate symmetry metrics.
Use trained models to predict risk categories for new athletes.
Compare predictions across all models for consistency and reliability.
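
A simple consistency check is to put every model's predictions side by side and flag the athletes on which the models disagree. The helper name and the prediction values below are hypothetical.

```python
import pandas as pd


def compare_predictions(predictions: dict) -> pd.DataFrame:
    """Tabulate predictions per model and flag rows where all models agree."""
    table = pd.DataFrame(predictions)
    # nunique is evaluated on the model columns before the flag column is added.
    table["all_agree"] = table.nunique(axis=1) == 1
    return table


# Made-up predictions for four new athletes (0 = Low, 1 = Medium, 2 = High).
preds = {
    "SMOTE": [0, 1, 2, 2],
    "No Balancing": [0, 1, 1, 2],
    "SMOTEENN": [0, 1, 2, 2],
    "Class Weights": [0, 1, 2, 2],
}
comparison = compare_predictions(preds)
agreement_rate = comparison["all_agree"].mean()
print(comparison)
print(agreement_rate)
```

Rows where `all_agree` is False are natural candidates for manual review.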

5. Visualizations and Insights

Plot risk distribution across metrics.
Visualize confusion matrices for all models.
Display quarterly trends for selected athletes (if temporal data is available).
Summarize hypothesis testing results and model comparisons in charts.
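
For the quarterly trends, the data behind the plot can be prepared by averaging a metric per calendar quarter for one athlete. This sketch assumes `testDateUtc` holds parseable UTC timestamps; the helper name and sample values are hypothetical.

```python
import pandas as pd


def quarterly_trend(df: pd.DataFrame, athlete_id: str, metric: str) -> pd.Series:
    """Return the mean of `metric` per calendar quarter for one athlete."""
    rows = df[df["sbuid"] == athlete_id].copy()
    dates = pd.to_datetime(rows["testDateUtc"], utc=True)
    # Drop the timezone before converting, since periods are timezone-naive.
    rows["quarter"] = dates.dt.tz_localize(None).dt.to_period("Q")
    return rows.groupby("quarter")[metric].mean()


# Made-up test history for illustration.
history = pd.DataFrame({
    "sbuid": ["a1", "a1", "a1", "a2"],
    "testDateUtc": ["2024-01-05", "2024-02-10", "2024-05-01", "2024-01-06"],
    "ForceSymmetry": [4.0, 6.0, 3.0, 9.0],
})
trend = quarterly_trend(history, "a1", "ForceSymmetry")
print(trend)
```

The resulting series can be passed to any plotting library, for example as a line chart with one point per quarter.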

6. Deployment and Documentation

Deploy the final models and preprocessing pipeline using a tool like Streamlit.
Document the project with:
- A clear README file summarizing the project.
- Model insights and key findings.
Provide options for future enhancements:
- Expand to include additional symmetry metrics.
- Test on a broader dataset with different sports.

