This project contains a series of open-ended requirements which describe the project we’ll be building. There are many possible ways to correctly fulfill all of these requirements.
We will create a linear regression model that predicts the outcome for a tennis player based on their playing habits. By analyzing and modeling the Association of Tennis Professionals (ATP) data, we will determine what it takes to be one of the best tennis players in the world.
The ATP men’s tennis dataset includes a wide array of tennis statistics, which are described below:
Identifying Data
- Player: name of the tennis player
- Year: year data was recorded
Service Game Columns (Offensive)
- Aces: number of serves by the player where the receiver does not touch the ball
- DoubleFaults: number of times the player missed both first and second serve attempts
- FirstServe: % of first-serve attempts made
- FirstServePointsWon: % of first-serve attempt points won by the player
- SecondServePointsWon: % of second-serve attempt points won by the player
- BreakPointsFaced: number of times the receiver could have won the player’s service game
- BreakPointsSaved: % of the time the player stopped the receiver from winning the service game when the receiver had the chance
- ServiceGamesPlayed: total number of games where the player served
- ServiceGamesWon: total number of games where the player served and won
- TotalServicePointsWon: % of points won by the player in games where they served
Return Game Columns (Defensive)
- FirstServeReturnPointsWon: % of the opponent’s first-serve points the player was able to win
- SecondServeReturnPointsWon: % of the opponent’s second-serve points the player was able to win
- BreakPointsOpportunities: number of times the player could have won the opponent’s service game
- BreakPointsConverted: % of the time the player won the opponent’s service game when they had the chance
- ReturnGamesPlayed: total number of games where the player’s opponent served
- ReturnGamesWon: total number of games where the player’s opponent served and the player won
- ReturnPointsWon: total number of points where the player’s opponent served and the player won
- TotalPointsWon: % of points won by the player
Outcomes
- Wins: number of matches won in a year
- Losses: number of matches lost in a year
- Winnings: total winnings in USD ($) in a year
- Ranking: ranking at the end of the year
- Perform exploratory analysis on the data by plotting different features against the different outcomes.
- Use one feature from the dataset to build a single feature linear regression model on the data.
- Create a few more linear regression models that use one feature to predict one of the outcomes.
- Create a few linear regression models that use two features to predict yearly earnings.
- Create a few linear regression models that use multiple features to predict yearly earnings.
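The modeling steps above can be sketched without any third-party library. This is a minimal illustration under stated assumptions, not the project's prescribed solution: `fit_linear`, `predict`, `r_squared`, and `train_test_split` are hypothetical helper names, and the solver uses the normal equations directly, a job a library such as scikit-learn would normally do for you. Features are passed as a list of rows, so the same function covers one, two, or many features.

```python
import random


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature lists (one inner list per player-year).
    targets: list of outcome values, same length as rows.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [rv - factor * cv for rv, cv in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(coefs, rows):
    """Apply fitted coefficients to new feature rows."""
    return [
        coefs[0] + sum(c * float(v) for c, v in zip(coefs[1:], row))
        for row in rows
    ]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def train_test_split(rows, targets, test_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) - round(len(indices) * test_fraction)
    train, test = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train],
        [rows[i] for i in test],
        [targets[i] for i in train],
        [targets[i] for i in test],
    )
```

A single-feature model passes rows with one value each, while a two-feature or multiple-feature model simply passes longer rows. Fitting on the training split and comparing `r_squared` on the held-out split across different feature choices is one way to judge which models are worth keeping.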
The dataset provided in tennis_stats.csv comes from the men’s professional tennis league, the ATP (Association of Tennis Professionals). The file contains data for the top 1500 ranked ATP players from 2009 to 2017. The statistics recorded for each player in each year include service game (offensive) statistics, return game (defensive) statistics, and outcomes.
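As a rough sketch of how the file might be read, the helper below pulls selected numeric columns out of a CSV using only the standard library. The function name `load_features` is hypothetical, and the code assumes the CSV header row uses the column names listed above (for example `BreakPointsOpportunities` and `Winnings`) and that the chosen columns hold plain numbers; neither assumption is confirmed by this description.

```python
import csv


def load_features(csv_lines, feature_names, outcome_name):
    """Read selected numeric columns from CSV text.

    csv_lines: an open text file, or any iterable of CSV lines with a header.
    feature_names: header names to use as model inputs (assumed numeric).
    outcome_name: header name of the value to predict (assumed numeric).
    Returns (rows, targets): a list of feature lists and a list of outcomes.
    """
    rows, targets = [], []
    for record in csv.DictReader(csv_lines):
        rows.append([float(record[name]) for name in feature_names])
        targets.append(float(record[outcome_name]))
    return rows, targets
```

With the real file, a call could look like `load_features(open("tennis_stats.csv", newline=""), ["BreakPointsOpportunities"], "Winnings")`, ideally inside a `with` block so the file is closed afterwards.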