deepfield-ml/FusionForm

FusionForm: Tactical Formation Analytics

Developed by DeepField ML - Gordon.H

A sophisticated football analytics platform leveraging historical match data and machine learning to provide tactical formation recommendations.

Go License: GPL v2

Table of Contents

  1. Introduction
  2. Features
  3. Architecture Overview
  4. Data Architecture
  5. Installation
  6. Data Sources
  7. Data Processing Pipeline
  8. Formation Conversion Logic
  9. Machine Learning System
  10. Formation Recommendation System
  11. Integration Guide
  12. Performance Metrics
  13. Contributing
  14. License

Introduction

FusionForm is a sophisticated football analytics platform designed to provide data-driven tactical insights. It analyzes historical match outcomes and the formations used by teams to build a predictive model. The system aims to recommend the most effective tactical formation for a team to adopt when facing a specific opponent formation, thereby supporting tactical decision-making.

Features

  • Historical Data Processing: Ingests and prepares game results and formation data from diverse sources.
  • Dual Perspective Analysis: Transforms data to capture tactical interactions from the viewpoint of both competing teams.
  • Formation Conversion: Intelligent conversion mechanisms to adapt formations between different team sizes while preserving tactical intent.
  • Machine Learning Prediction: A neural network model predicts the likely outcome score of a matchup between two formations.
  • Formation Recommendation Engine: Identifies the statistically optimal formation against a given opponent formation based on model predictions.
  • Efficient Inference: Optimized model deployment for fast prediction generation.
  • Integration Ready: Designed for seamless integration into other applications and workflows.

Architecture Overview

The system operates as a pipeline, transforming raw data through several stages into actionable tactical recommendations.

flowchart TD
    A[Game Results Data] --> B[Data Preprocessing]
    C[Formation Data] --> B
    B --> D[Dual Perspective Transformation]
    D --> E[Formation Encoding]
    E --> F[Neural Network Training]
    F --> G[Formation Recommendation Engine]
    G --> H[Tactical Decision Support]

Data Architecture

The foundation of FusionForm's analysis is built upon game results and detailed formation data. Processed data, including encoded formations and the trained predictive model, fuels the recommendation engine.

erDiagram
    GAME_RESULTS {
        string date
        string match_date
        string home_team
        string away_team
        int goals_home
        int goals_away
        string result
    }

    FORMATION_DATA {
        int gameid
        string team
        string formation
        string players
    }

    ENCODED_FORMATIONS {
        int formation_id
        string formation_pattern
        vector embedding
    }

    MODEL {
        string name
        binary weights
        float accuracy
    }

    FORMATION_RECOMMENDATION {
        string opponent_formation
        string recommended_formation
        float predicted_outcome_score
    }

    GAME_RESULTS }|--|| FORMATION_DATA : contextualizes
    FORMATION_DATA }|--|| ENCODED_FORMATIONS : transforms_to
    ENCODED_FORMATIONS }|--|| MODEL : trains
    MODEL }|--|| FORMATION_RECOMMENDATION : generates

The system utilizes game results data (match outcomes, team details) stored in a parquet file, complemented by formation data from CSV files that provide specific tactical layouts used in matches. These sources are integrated and processed to derive insights.

Installation

Download the Unix executable for your platform (macOS or Linux) and make it executable with chmod +x "NAME OF FILE" before running it. Alternatively, clone the repository and run the code and models directly.

Data Sources

  • Game Results: Contains fundamental match data including dates, teams, and scores. This data provides the outcome information used to train the predictive model.
  • Formation Data: Provides the tactical formations employed by each team in recorded matches, linked to the game results data.
  • Source: Many thanks to Mr. Schoch for providing the dataset: https://github.com/schochastics/football-data

Data Processing Pipeline

The data processing pipeline transforms raw historical data into a structured and clean format suitable for machine learning.

Data Loading

This initial stage involves reading raw data from its stored locations – game results from a parquet file and formation data from various CSV files. The data is loaded into a usable structure, typically in-memory dataframes.

Data Cleaning

Ensuring data consistency is crucial. This step involves standardizing entries such as team names, removing inconsistencies or variations (like parenthetical team origin notes), and cleaning formation strings to a uniform format. Handling missing or malformed data points also occurs here.
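
A minimal sketch of what this cleaning step could look like, assuming team names carry parenthetical notes such as "(ENG)" and formations arrive as digit groups with inconsistent separators. The function names and the exact rules are illustrative and are not taken from the FusionForm source.

```python
import re
from typing import Optional


def clean_team_name(name: str) -> str:
    """Drop parenthetical notes and collapse repeated whitespace."""
    without_notes = re.sub(r"\s*\([^)]*\)", "", name)
    return " ".join(without_notes.split())


def clean_formation(raw: Optional[str]) -> Optional[str]:
    """Normalize a formation to dash-separated digit groups.

    Returns None when fewer than two digit groups are found, so that
    malformed or missing entries can be filtered out by the caller.
    """
    groups = re.findall(r"\d+", raw or "")
    if len(groups) < 2:
        return None
    return "-".join(groups)
```

Returning None for unusable values, rather than raising, keeps the decision about dropping or repairing a row with the caller.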

Perspective Transformation

A key analytical step where each match record is processed to generate two distinct data points: one from the perspective of the home team and one from the perspective of the away team. Every row is framed as "Our Formation" vs. "Opponent Formation", with the "Outcome" seen from "Our" side. This doubles the dataset and lets the model learn formation matchups independently of which side was home or away.

graph LR
    A("Home Team: 4-3-3<br>Away Team: 4-4-2<br>Score: 2-1") --> B{Home Perspective}
    A --> C{Away Perspective}
    B --> D("Our Formation: 4-3-3")
    B --> E("Opponent Formation: 4-4-2")
    B --> F("Outcome: Win")
    C --> G("Our Formation: 4-4-2")
    C --> H("Opponent Formation: 4-3-3")
    C --> I("Outcome: Loss")
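
The transformation in the diagram can be sketched as follows. The goals_home and goals_away keys follow the GAME_RESULTS entity; home_formation and away_formation are assumed names for the columns produced by joining formation data onto a match, and the Win/Draw/Loss labels stand in for whatever outcome score the project actually uses.

```python
def to_perspectives(match: dict) -> list:
    """Return two rows for one match: the home view and the away view."""

    def outcome(ours: int, theirs: int) -> str:
        if ours > theirs:
            return "Win"
        return "Loss" if ours < theirs else "Draw"

    goals_home, goals_away = match["goals_home"], match["goals_away"]
    return [
        {
            "our_formation": match["home_formation"],
            "opponent_formation": match["away_formation"],
            "outcome": outcome(goals_home, goals_away),
        },
        {
            "our_formation": match["away_formation"],
            "opponent_formation": match["home_formation"],
            "outcome": outcome(goals_away, goals_home),
        },
    ]
```

Applied to the 2-1 example in the diagram, the home row pairs 4-3-3 against 4-4-2 with a win, and the away row mirrors it with a loss.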

Formation Conversion Logic

A unique capability allowing the system to analyze tactical setups across different team sizes (e.g., 11-a-side, 8-a-side, 7-a-side). This is achieved through rule-based conversion processes designed to maintain the tactical essence of the original formation.

11-to-7 Conversion

This logic condenses an 11-player formation into a 7-player structure. It employs a prioritized rule set focusing on retaining the central defensive core, midfield control, and attacking presence while adapting width and balance for the smaller team size.

8-to-11 Conversion

The reverse process expands an 8-player formation to an 11-player one. This involves strategically adding players, such as full-backs and wide midfielders, to enhance defensive solidity, midfield presence, and attacking options while preserving the strategic intent of the original 8-a-side formation.

graph TD
    A[8-a-side Formation] --> B{Categorize Roles}
    B --> C[Expand Defense]
    C --> D[4 Defenders]
    B --> E[Expand Midfield]
    E --> F[4 Midfielders]
    B --> G[Adjust Attack]
    G --> H[2-3 Attackers]
    D & F & H --> I[11-a-side Formation]
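
As a point of comparison for the rule-based conversions, here is a purely numerical baseline: it rescales each line of a formation in proportion to the new number of outfield players and distributes leftovers by largest remainder. This is a swapped-in technique for illustration, not FusionForm's prioritized rule set, and scale_formation is a hypothetical name. It assumes formation strings list outfield lines only (goalkeeper excluded).

```python
def scale_formation(formation: str, target_outfield: int) -> str:
    """Rescale a formation's line sizes to a new outfield player count."""
    lines = [int(part) for part in formation.split("-")]
    total = sum(lines)
    # Ideal (fractional) size of each line after scaling.
    exact = [n * target_outfield / total for n in lines]
    scaled = [int(x) for x in exact]  # round each line down first
    # Give the remaining players to the lines with the largest fractions.
    leftover = target_outfield - sum(scaled)
    by_remainder = sorted(
        range(len(lines)), key=lambda i: exact[i] - scaled[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        scaled[i] += 1
    return "-".join(str(n) for n in scaled)
```

With these assumptions, an 8-a-side 3-3-1 expanded to 10 outfield players becomes 4-4-2, and an 11-a-side 4-4-2 condensed to 6 outfield players becomes 3-2-1. A baseline like this ignores roles such as full-backs or wide midfielders, which is the gap the position-aware rules are meant to close.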

Position-Aware Transformations

Integral to the conversion logic is the consideration of specific player positions and roles (e.g., central vs. wide, defensive vs. attacking responsibilities). This ensures that conversions are not merely numerical reductions or expansions but maintain the tactical balance and functional layout of the formation.

Machine Learning System

The core of FusionForm's predictive power lies in its machine learning system, which learns from historical data to predict formation matchup outcomes.

Data Encoding

Before being fed into the neural network, formation strings are converted into a numerical format. Each unique formation string is first mapped to an integer identifier, and each identifier is then represented as a vector (e.g., a one-hot vector with one position per known formation).
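
A small sketch of this encoding, assuming one-hot vectors and an alphabetically sorted id mapping (the same ordering scikit-learn's LabelEncoder produces). The helper names are hypothetical.

```python
def build_index(formations) -> dict:
    """Map each unique formation string to a stable integer id."""
    return {f: i for i, f in enumerate(sorted(set(formations)))}


def one_hot(formation: str, index: dict) -> list:
    """One-hot vector with a 1.0 at the formation's id."""
    vec = [0.0] * len(index)
    vec[index[formation]] = 1.0
    return vec


def encode_matchup(ours: str, opponent: str, index: dict) -> list:
    """Concatenate both one-hot vectors: length is 2 * num_classes."""
    return one_hot(ours, index) + one_hot(opponent, index)
```

With three known formations, each matchup becomes a six-element vector: the first half marks our formation and the second half marks the opponent's.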

Model Architecture

The predictive model is a neural network designed to take the encoded representations of two formations (our team's and the opponent's) as input. It processes this information through layers of interconnected nodes to produce a single output value, representing the predicted outcome score for that formation matchup.

graph LR
    A[Input Layer: 2*num_classes] --> B[Hidden Layer 1: 64 neurons]
    B --> C[Hidden Layer 2: 32 neurons]
    C --> D[Output Layer: 1 neuron]
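
To make the layer sizes concrete, here is a dependency-free forward pass through a network of the same shape. The ReLU activations, the weight initialization range, and all function names are assumptions; the source only specifies the layer widths.

```python
import random


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu: bool = True):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def init_model(num_classes: int, seed: int = 0):
    """Build layers sized 2*num_classes -> 64 -> 32 -> 1."""
    rng = random.Random(seed)
    sizes = [2 * num_classes, 64, 32, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(model, x) -> float:
    """Run the hidden layers with ReLU, then a linear output neuron."""
    for layer in model[:-1]:
        x = dense(x, layer, relu=True)
    return dense(x, model[-1], relu=False)[0]
```

The output layer is left linear because the model is described as producing a single score that is compared against historical outcomes, which fits a regression setup.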

Training Process

The model is trained using the prepared and encoded historical data. The training process involves iterating through the data multiple times (epochs), feeding batches of formation matchups to the model, calculating the error between the model's prediction and the actual historical outcome, and adjusting the model's internal parameters (weights) to minimize this error using an optimization algorithm. A portion of the data is held out for validation during training.

Inference Engine

Once trained, the neural network model is exported to an efficient format like ONNX. This allows the model to be loaded and run quickly to generate predictions (inference) in deployment environments, separate from the training setup.

Formation Recommendation System

This system utilizes the trained machine learning model to provide tactical recommendations.

Algorithm

Given a specific opponent formation, the system evaluates all possible formations that the 'recommending' team could use. For each potential 'our' formation, it queries the machine learning model to predict the outcome score of that particular matchup (our formation vs. opponent formation).

Optimization Strategy

The system's optimization strategy is to identify and recommend the formation that yields the highest predicted outcome score against the specified opponent formation. This exhaustive evaluation ensures that the system considers all known tactical options to find the statistically most favorable setup. The system includes checks to ensure the opponent formation provided is valid and recognized.

graph TD
    A[Opponent Formation Input] --> B{Validate Format & Recognition}
    B -- Invalid --> C[Report Error]
    B -- Valid --> D[Encode Opponent Formation]
    D --> E[Generate All Possible Formations]
    E --> F{For Each Possible Formation}
    F --> G[Combine with Opponent Encoded]
    G --> H[Predict Outcome Score ONNX]
    H --> I{Compare Score}
    I -- New Best? --> J[Update Best Formation]
    I -- Not Best --> F
    J --> F
    F -- All Evaluated --> K[Return Best Formation]
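
The loop in the flowchart can be sketched as an exhaustive search over known formations. Here predict_score is a stand-in for the ONNX model call, and the function name and signature are hypothetical rather than FusionForm's actual API.

```python
from typing import Callable, Iterable, Tuple


def recommend_formation(
    opponent: str,
    known_formations: Iterable[str],
    predict_score: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the known formation with the highest predicted score."""
    known = list(known_formations)
    # Validation step: reject formations the encoder has never seen.
    if opponent not in known:
        raise ValueError(f"Unrecognized opponent formation: {opponent!r}")

    best, best_score = None, float("-inf")
    for candidate in known:
        score = predict_score(candidate, opponent)
        if score > best_score:  # keep the first formation on ties
            best, best_score = candidate, score
    return best, best_score
```

Because the set of known formations is small, evaluating every candidate is cheap, and it avoids the risk of a heuristic search missing the best option.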

Integration Guide

Integrating FusionForm's core recommendation capability involves loading the trained ONNX model and the formation encoding mapping (LabelEncoder). Once loaded, these components can be used to process an opponent formation input and retrieve the recommended formation output via a dedicated function.

Consult the relevant source files and documentation for specific API details on loading components and calling the recommendation function.

Performance Metrics

This section details the metrics used to evaluate the performance of the FormationPredictor model and the configuration used during its training.

Model Performance Metrics

The primary metric used to evaluate the effectiveness of the neural network model is Mean Squared Error (MSE). This metric measures the average squared difference between the predicted outcome score (representing the predicted result of a formation matchup) and the actual outcome value derived from the historical data.
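
For reference, the metric itself is short enough to write out. This is a generic MSE helper, not code from the repository.

```python
def mean_squared_error(predicted, actual) -> float:
    """Average of squared differences between predictions and targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```

Squaring the errors penalizes a few large misses more heavily than many small ones, so a lower MSE indicates predictions that are rarely far from the historical outcome.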

Training Configuration

The model training process is configured with the following settings:

  • Data Split: The processed dataset is split into training and testing sets, with 80% used for training the model and the remaining 20% held out for evaluating its performance after each training epoch.
  • Batch Size: Training updates are performed using batches of 64 data samples.
  • Number of Epochs: The model is trained for 5 full passes over the entire training dataset.
  • Optimizer: The Adam optimization algorithm is used to adjust the model's parameters, with a learning rate set to 0.001.
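
The settings above can be captured in a small configuration object, with helpers for the 80/20 split and the 64-sample batches. The class and function names are hypothetical, and shuffling before the split is an assumption the source does not confirm.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    train_fraction: float = 0.8   # 80% train, 20% test
    batch_size: int = 64
    epochs: int = 5
    learning_rate: float = 0.001  # used by the Adam optimizer


def split_dataset(rows, config: TrainConfig, seed: int = 0):
    """Shuffle (assumed), then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * config.train_fraction)
    return shuffled[:cut], shuffled[cut:]


def batches(rows, batch_size: int):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

For 1,000 rows this gives 800 training samples, which at a batch size of 64 means twelve full batches and one final batch of 32 per epoch.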

Evaluation During Training

Model performance on the test dataset is evaluated after the completion of each training epoch. This involves using the model to predict outcomes for the held-out test data. The Mean Squared Error is calculated on these predictions compared to the actual test outcomes. This test loss provides an indication of the model's generalization capability and helps monitor for potential overfitting as training progresses.

Model Architecture

The performance metrics are calculated for the FormationPredictor neural network, which has a layered structure designed to process encoded formation inputs. It includes an input layer sized to accept the combined representation of two formations, followed by two hidden layers with 64 and 32 neurons respectively, and concludes with a single output neuron that produces the predicted outcome score.

Results Format

During the training process, the calculated test loss value for each epoch is printed to the console. The output format displays the current epoch number and the corresponding Mean Squared Error on the test dataset, typically formatted to four decimal places (e.g., Epoch X/5, Test Loss: Y.YYYY).
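
A one-line formatter that reproduces the pattern described above; the function name is hypothetical.

```python
def format_epoch_log(epoch: int, total_epochs: int, test_loss: float) -> str:
    """Format one per-epoch line with the loss to four decimal places."""
    return f"Epoch {epoch}/{total_epochs}, Test Loss: {test_loss:.4f}"
```

For example, format_epoch_log(1, 5, 0.123456) yields "Epoch 1/5, Test Loss: 0.1235".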

Contributing

We welcome contributions to FusionForm! Please submit pull requests or issues.

License

This project is licensed under GPL v2. See the LICENSE file for the full terms.
