Carnegie Mellon University
Centre College
Charlotte FC, External Advisor
Pressing: a defensive tactic where players apply coordinated pressure on the opponent with the ball to force mistakes, win back possession, and quickly transition to attack
Forced Turnover: when a player loses possession due to opponent pressure, resulting in the opposing team gaining control. This includes misplaced passes, interceptions, successful tackles, or losing control under pressure - all direct results of effective defensive pressure
Manchester City wins back possession seconds after losing it through pressing.
Source: SkillCorner
Dataset:
Three data types:
1. Identify when a press occurs using player tracking data
2. Define the criteria for an effective press
3. Modeling & Results
Problem: Doesn’t account for direction and oversimplifies pressing
Adopted from Andrienko et al. (2017).
\[ L = D_{back} + (D_{front} - D_{back})(z^3 + 0.3z) / 1.3 \] where:
\(L\) = the maximum distance limit for effective pressure at angle \(\theta\)
\(D_{back}\) = the max. distance limit when the presser is positioned behind the ball carrier
\(D_{front}\) = the max. distance limit when the presser is positioned in front of the ball carrier
\(z = (1 - \cos\theta) / 2\)
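\(\theta\) = the angle between the threat direction (the vector from the ball carrier to the center of the attacking goal) and the vector from the presser to the ball carrier, so that \(\theta = 180^\circ\) and \(L = D_{front}\) when the presser stands directly between the ball carrier and the goal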
Andrienko et al. (2017) set the distance thresholds \(D_{back}\) and \(D_{front}\) to 3 m and 9 m, respectively, based on consultation with football (soccer) experts, and later ran an experiment to verify these parameters.
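The pressure-zone formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function names and the coordinate tuples are hypothetical, and \(\theta\) is measured with the presser-to-carrier vector so that a presser on the goal side gets \(L = D_{front}\), as the definitions of \(D_{back}\) and \(D_{front}\) require.

```python
import math

D_BACK = 3.0   # metres, threshold reported by Andrienko et al. (2017)
D_FRONT = 9.0  # metres, threshold reported by Andrienko et al. (2017)


def pressure_distance_limit(theta, d_back=D_BACK, d_front=D_FRONT):
    """Maximum pressing distance L for an angle theta given in radians."""
    z = (1.0 - math.cos(theta)) / 2.0
    return d_back + (d_front - d_back) * (z ** 3 + 0.3 * z) / 1.3


def is_within_pressure_zone(carrier, presser, goal):
    """Return True if the presser is inside the carrier's pressure zone.

    carrier, presser and goal are hypothetical (x, y) tuples in metres.
    """
    threat = (goal[0] - carrier[0], goal[1] - carrier[1])
    to_carrier = (carrier[0] - presser[0], carrier[1] - presser[1])
    dist = math.hypot(*to_carrier)
    threat_len = math.hypot(*threat)
    if dist == 0.0 or threat_len == 0.0:
        # Degenerate geometry: fall back to the smaller threshold (assumption).
        return dist <= D_BACK
    cos_theta = (threat[0] * to_carrier[0] + threat[1] * to_carrier[1]) / (dist * threat_len)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return dist <= pressure_distance_limit(math.acos(cos_theta))
```

With the carrier at (0, 0) and the goal at (50, 0), a presser 5 m away on the goal side counts as pressing, while a presser 5 m directly behind the carrier does not.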
A defending player was classified as “pressing” if they were simultaneously
Pressing actions were grouped into sequences if at least one defender continued pressing within 1.5 seconds.
252,464 pressing sequences were identified across the 502 MLS matches.
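The 1.5-second grouping rule can be illustrated with a small sketch. It assumes the input is a list of timestamps (in seconds) at which at least one defender was classified as pressing; the function name and the input format are hypothetical, and any extra conditions used in the actual pipeline (for example, requiring the same possession) are not modeled here.

```python
def group_into_sequences(press_times, max_gap=1.5):
    """Group pressing timestamps into sequences.

    A new sequence starts whenever the gap to the previous pressing
    timestamp exceeds max_gap seconds.
    """
    sequences = []
    for t in sorted(press_times):
        if sequences and t - sequences[-1][-1] <= max_gap:
            sequences[-1].append(t)   # pressing continued in time
        else:
            sequences.append([t])     # gap too long: start a new sequence
    return sequences


example = group_into_sequences([0.0, 0.5, 1.0, 3.0, 3.2, 6.0])
```

In this example the timestamps split into three sequences, because the gaps between 1.0 s and 3.0 s and between 3.2 s and 6.0 s both exceed 1.5 seconds.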
A press is effective when there’s a forced turnover within 5s of pressing initiation.
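The effectiveness label above amounts to a simple time-window check. The sketch below assumes each pressing sequence has a start time and that forced-turnover times are available as a list of seconds; both the function name and these inputs are hypothetical.

```python
def is_effective_press(press_start, turnover_times, window=5.0):
    """Return True if a forced turnover occurs within `window` seconds
    after the press starts (turnovers before the press do not count)."""
    return any(0.0 <= t - press_start <= window for t in turnover_times)
```

Keeping the window as a parameter makes it straightforward to rerun the labeling with other thresholds.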
Features: 31 features were extracted and used for training our model:
Spatial Context: Ball carrier position, distance to boundaries, field third, etc.
Pressing Dynamics: Number of defenders, approach velocity, passing options, etc.
Game Context: Score, game state (winning/losing/drawing), time remaining, etc.
Situational Factors: How the ball carrier gained possession (pass reception, interception, etc.), incoming pass characteristics (distance, height, range), etc.
We compared five models, all evaluated using 10-fold cross-validation with match-based splits to prevent data leakage.
Hyperparameter tuning was done on a 50% stratified sample of the data to find the best XGBoost parameters.
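Match-based splitting means every pressing sequence from a given match lands in the same fold, so no match contributes to both training and validation. A minimal standard-library sketch of that idea follows; the function name, the seed, and the round-robin assignment are assumptions for illustration, not the procedure actually used. scikit-learn's `GroupKFold` provides the same guarantee.

```python
import random
from collections import defaultdict


def match_based_folds(match_ids, n_folds=10, seed=0):
    """Assign row indices to folds so that each match sits in exactly one fold.

    match_ids holds one match identifier per pressing sequence (row).
    """
    unique_matches = sorted(set(match_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_matches)
    # Deal the shuffled matches out to folds in round-robin order.
    fold_of_match = {m: i % n_folds for i, m in enumerate(unique_matches)}
    folds = defaultdict(list)
    for row, match in enumerate(match_ids):
        folds[fold_of_match[match]].append(row)
    return [folds[k] for k in range(n_folds)]


# Toy data: 20 matches with 5 pressing sequences each.
toy_match_ids = [i // 5 for i in range(100)]
toy_folds = match_based_folds(toy_match_ids)
```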
*Calibration plots are available in the appendix.
The performance difference between XGBoost and logistic regression appears negligible.
\[ |z| = \frac{|\bar{x}_{xgboost} - \bar{x}_{logit}|}{\sqrt{SE^2_{xgboost} + SE^2_{logit}}} \]
\[ = \frac{|0.434 - 0.445|}{\sqrt{0.00263^2 + 0.00255^2}} \]
\[ = \frac{0.011}{0.00366} \]
\[ = 3.02 \text{ standard errors apart} \]
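The same calculation can be reproduced in code. The helper name below is hypothetical. With the rounded inputs printed above, it returns a value close to 3.0; the reported 3.02 presumably reflects unrounded means and standard errors, which is an assumption and not something the source states.

```python
import math


def standard_errors_apart(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means in units of their combined standard error."""
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


z = standard_errors_apart(0.434, 0.00263, 0.445, 0.00255)
```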
What is Passes Per Defensive Action (PPDA)? The number of opposition passes allowed outside the pressing team’s own defensive third, divided by the number of defensive actions the pressing team makes outside its own defensive third. (Source: Opta Analyst)
A lower figure indicates a higher level of pressing, while a higher figure indicates a lower level of pressing.
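As a quick illustration of the definition, PPDA is a single ratio. The function name and the example counts below are made up for illustration and are not taken from the dataset.

```python
def ppda(opposition_passes, defensive_actions):
    """Passes Per Defensive Action, with both counts taken outside the
    pressing team's own defensive third."""
    if defensive_actions == 0:
        return float("inf")  # no defensive actions: no measurable pressing
    return opposition_passes / defensive_actions


aggressive = ppda(opposition_passes=240, defensive_actions=30)
passive = ppda(opposition_passes=450, defensive_actions=30)
```

Here the first team allows 8 passes per defensive action and the second allows 15, so the first team presses more aggressively under this metric.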
In short:
PPDA measures pressing aggressiveness
Our model measures pressing effectiveness
So why even compare these?
PPDA values from Opta Analyst
The feature start_type contributed approximately 70% of total model importance.
This feature describes how the ball carrier gained possession of the ball.
Reception ← pass_reception, goal_kick_reception, throw_in_reception, corner_reception, free_kick_reception
Interception ← pass_interception, goal_kick_interception, throw_in_interception, corner_interception, free_kick_interception
Recovery ← recovery
Keep Possession ← keep_possession
Unknown ← unknown, missing values
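The grouping above can be written as a lookup table. The raw labels are copied from the list; the variable and function names are hypothetical, and sending any unlisted label to "Unknown" is an assumption.

```python
START_TYPE_GROUPS = {
    "Reception": [
        "pass_reception", "goal_kick_reception", "throw_in_reception",
        "corner_reception", "free_kick_reception",
    ],
    "Interception": [
        "pass_interception", "goal_kick_interception", "throw_in_interception",
        "corner_interception", "free_kick_interception",
    ],
    "Recovery": ["recovery"],
    "Keep Possession": ["keep_possession"],
    "Unknown": ["unknown"],
}

# Invert the table: raw label -> grouped category.
RAW_TO_GROUP = {
    raw: group for group, raws in START_TYPE_GROUPS.items() for raw in raws
}


def collapse_start_type(raw):
    """Map a raw start_type label to its grouped category.

    Missing values (None) and unlisted labels fall back to "Unknown".
    """
    if raw is None:
        return "Unknown"
    return RAW_TO_GROUP.get(raw, "Unknown")
```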
Looking at actual turnovers, pressing the ball carrier when they got the ball from an interception led to a turnover approximately 74% of the time.
Limitations
Class imbalance in the dataset (23% turnovers vs. 77% no turnovers) led to models with high accuracy but lower recall for the minority class.
MLS-only data limits generalizability to leagues with different physical demands and player quality.
Grouping pressing actions into sequences means this approach cannot evaluate individual player pressing effectiveness.
Missing values were flagged or labeled as ‘unknown’ rather than estimated, which may limit the model’s ability to capture underlying patterns.
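To see why the class imbalance noted above makes accuracy misleading, consider a trivial baseline that always predicts "no turnover". The sketch below uses a toy sample of 100 presses with the reported 23% / 77% split; the helper name and the toy labels are illustrative only.

```python
def accuracy_and_recall(y_true, y_pred):
    """Accuracy, and recall for the positive (turnover = 1) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Toy sample mirroring the reported class balance.
y_true = [1] * 23 + [0] * 77
always_no_turnover = [0] * 100
baseline_accuracy, baseline_recall = accuracy_and_recall(y_true, always_no_turnover)
```

The baseline reaches 77% accuracy while finding none of the turnovers, which is why recall on the minority class deserves separate attention.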
Future Work
Add pressing intensity calculations and pitch control models.
Sensitivity analysis on the 5-second window for pressing effectiveness by testing alternative time thresholds (e.g., 3 seconds, 4 seconds, 6 seconds).
More research on the shape and boundaries of the pressure zone.
Acknowledgement: SkillCorner, Daniel Wicker (Charlotte FC), Dr. Ron Yurko, Quang Nguyen, the CMSACamp TAs, and Carnegie Mellon University - Statistics & Data Science
Contact Information:
David Almona: almonadavid.github.io
Natalie Rayce: linkedin.com/in/natalie-rayce-318a70283
| Feature | Description | Type |
|---|---|---|
| ball_carrier_x | x-coordinate of ball carrier at press start | Numeric |
| ball_carrier_y | y-coordinate of ball carrier at press start | Numeric |
| n_pressing_defenders | Number of unique defenders who were actively pressing | Numeric |
| max_passing_options | Number of available passing options for ball carrier | Numeric |
| avg_approach_velocity | Average speed of pressing defenders (m/s) | Numeric |
| poss_third_start | Pitch third where press begins | Categorical |
| game_state | Current match status (winning/drawing/losing) | Categorical |
| start_type | How player gained possession | Categorical |
| incoming_high_pass | Pass received above 1.8m height | Boolean |
| incoming_pass_distance_received | Distance of received pass (m) | Numeric |
| incoming_pass_range_received | Range category of received pass | Categorical |
| organised_defense | Defense organized at pass moment | Boolean |
| dist_to_nearest_sideline | Distance to nearest sideline (m) | Numeric |
| dist_to_nearest_endline | Distance to nearest endline (m) | Numeric |
| dist_to_attacking_endline | Distance to attacking endline (m) | Numeric |
| dist_to_defensive_endline | Distance to defensive endline (m) | Numeric |
| dist_to_attacking_goal | Distance to attacking goal center (m) | Numeric |
| minutes_remaining_half | Minutes left in current half | Numeric |
| minutes_remaining_game | Minutes left in match | Numeric |
| ball_carrier_direction | Ball carrier direction (degrees) | Numeric |
| ball_carrier_speed | Ball carrier speed (m/s) | Numeric |
| penalty_area | Press starts in penalty area | Boolean |
| n_defenders_within_10m | Defenders within 10m radius | Numeric |
| n_defenders_within_15m | Defenders within 15m radius | Numeric |
| n_defenders_within_20m | Defenders within 20m radius | Numeric |
| n_defenders_within_25m | Defenders within 25m radius | Numeric |