Main Question
Can the effectiveness of a press in soccer be predicted using factors such as spatial context, pressing dynamics, game context and situational factors?
Carnegie Mellon University
Centre College
Manchester City wins back possession seconds after losing it through pressing.
Pressing: a defensive tactic where players apply coordinated pressure on the opponent with the ball to force mistakes, win back possession, and quickly transition to attack
Forced Turnover: when a player loses possession due to opponent pressure, resulting in the opposing team gaining control. This includes misplaced passes, interceptions, successful tackles, or losing control under pressure - all direct results of effective defensive pressure
Dataset: 520 matches in the MLS 2023 season
Three data types:
Source:
Problem: Doesn’t account for direction and oversimplifies pressing
Adopted from Andrienko et al. (2017).
A defending player was classified as “pressing” if they were simultaneously
Pressing actions were grouped into sequences if at least one defender continued pressing within 1.5 seconds.
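The grouping rule can be sketched as follows. This is a minimal illustration rather than the code used for the poster: it assumes each pressing action is a hypothetical `(start, end)` pair of timestamps in seconds, and it reads "continued pressing within 1.5 seconds" as a gap of at most 1.5 s between the end of the running sequence and the start of the next action. A real pipeline would presumably also group per possession and per defending team.

```python
def group_pressing_actions(actions, max_gap=1.5):
    """Merge (start, end) pressing actions into sequences.

    An action joins the current sequence when it starts no later than
    `max_gap` seconds after the sequence's latest end time.
    """
    sequences = []
    for start, end in sorted(actions):
        if sequences and start - sequences[-1][1] <= max_gap:
            # Extend the running sequence to cover this action.
            sequences[-1][1] = max(sequences[-1][1], end)
        else:
            # Gap too large: start a new sequence.
            sequences.append([start, end])
    return [tuple(seq) for seq in sequences]


# Hypothetical timestamps (seconds) for four pressing actions.
actions = [(10.0, 11.2), (12.0, 13.0), (16.0, 16.5), (11.0, 11.8)]
sequences = group_pressing_actions(actions)
print(sequences)
```

The first three actions overlap or follow one another within 1.5 s and collapse into one sequence; the fourth starts 3 s later and opens a new one.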
Since the goal of pressing is to regain ball possession from the attacking team, the impact of pressing should extend beyond immediate ball re-possession.
Pressing can force the attacking team into tight positions, which may increase the likelihood of an eventual turnover in the next few seconds or actions.
Response Variable: A forced turnover within 5s of pressing initiation.
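As a rough sketch of how such a label could be derived, the function below marks a pressing sequence as positive when any forced turnover occurs within 5 s of its start. The function name and the representation of turnovers as a list of timestamps in seconds are assumptions made for illustration only.

```python
def label_forced_turnover(press_start, turnover_times, window=5.0):
    """Return 1 if a forced turnover occurs within `window` seconds of press start."""
    return int(any(0.0 <= t - press_start <= window for t in turnover_times))


# Hypothetical forced-turnover timestamps and three press start times.
turnover_times = [14.0, 40.5]
labels = [label_forced_turnover(start, turnover_times) for start in (10.0, 20.0, 36.0)]
print(labels)
```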
Features: 31 features were extracted and used for training our model:
Spatial Context: Ball carrier position, distance to boundaries, field third, etc.
Pressing Dynamics: Number of defenders, approach velocity, passing options, etc.
Game Context: Score, game state (winning/losing/drawing), time remaining, etc.
Situational Factors: How the ball carrier gained possession (pass reception, interception, etc.), incoming pass characteristics (distance, height, range), etc.
252,464 pressing sequences were identified across the 502 MLS matches.
Two Models:
Logistic Regression
XGBoost
10-fold cross-validation with match-based splits to prevent data leakage.
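Match-based splitting means every pressing sequence from a given match lands in the same fold, so the model is never evaluated on a match it saw during training. The dependency-free sketch below illustrates the idea; the `match_ids` list and the round-robin fold assignment are assumptions, not the authors' implementation. scikit-learn's `GroupKFold` offers the same guarantee.

```python
from collections import defaultdict


def match_based_folds(match_ids, n_folds=10):
    """Yield (train_idx, test_idx) pairs in which no match spans both sets."""
    unique_matches = sorted(set(match_ids))
    # Assign whole matches to folds in round-robin order.
    fold_of_match = {m: i % n_folds for i, m in enumerate(unique_matches)}
    rows_by_fold = defaultdict(list)
    for row, match in enumerate(match_ids):
        rows_by_fold[fold_of_match[match]].append(row)
    for fold in range(n_folds):
        test_idx = list(rows_by_fold[fold])
        train_idx = [r for f in range(n_folds) if f != fold for r in rows_by_fold[f]]
        yield train_idx, test_idx


# Hypothetical data: 20 matches with 5 pressing sequences each.
match_ids = [f"match_{i // 5}" for i in range(100)]
folds = list(match_based_folds(match_ids))
for train_idx, test_idx in folds:
    train_matches = {match_ids[i] for i in train_idx}
    test_matches = {match_ids[i] for i in test_idx}
    assert train_matches.isdisjoint(test_matches)
```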
Calibration plot: The XGBoost model (blue line) drifts away from perfect calibration at higher predicted turnover probabilities.
Teams in the upper-right quadrant combine high pressing frequency with high effectiveness.
The feature start_type contributed approximately 70% of total model importance.
This feature describes how the ball carrier gained possession of the ball: through an interception, reception, recovery, etc.
In the observed data, pressing a ball carrier who had just gained the ball through an interception led to a turnover approximately 74% of the time.
Limitations:
23% class imbalance in forced turnovers potentially biases models toward predicting “no turnover”.
MLS-only data limits generalizability to leagues with different physical demands and player quality.
Tracking data inaccuracies may affect player position and movement precision.
Individual player attributes such as pace and pressing ability are not included.
The absence of pitch control modeling limits understanding of spatial dominance during pressing.
Future Work:
Apply class weights to handle class imbalance.
Extend analysis to multiple leagues.
Add pressing intensity calculation.
Add pitch control metrics to account for spatial dominance.
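As one possible starting point for the class-weight item above, the sketch below computes "balanced" class weights (the same `n_samples / (n_classes * class_count)` rule scikit-learn uses for `class_weight="balanced"`) and the negative-to-positive ratio commonly passed to XGBoost as `scale_pos_weight`. The 23/77 split is an illustrative assumption loosely based on the 23% figure in the limitations, not a value taken from the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels: 23 forced turnovers (1) and 77 non-turnovers (0).
labels = [1] * 23 + [0] * 77
weights = balanced_class_weights(labels)

# Ratio of negatives to positives, a common choice for scale_pos_weight.
scale_pos_weight = labels.count(0) / labels.count(1)
print(weights, scale_pos_weight)
```

With these weights the minority class counts roughly three times as much per example as the majority class, which offsets the pull toward predicting "no turnover".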
Feature | Description | Type |
---|---|---|
ball_carrier_x | x-coordinate of ball carrier at press start | Numeric |
ball_carrier_y | y-coordinate of ball carrier at press start | Numeric |
n_pressing_defenders | Number of unique defenders who were actively pressing | Numeric |
max_passing_options | Number of available passing options for ball carrier | Numeric |
avg_approach_velocity | Average speed of pressing defenders (m/s) | Numeric |
poss_third_start | Pitch third where press begins | Categorical |
game_state | Current match status (winning/drawing/losing) | Categorical |
start_type | How player gained possession | Categorical |
incoming_high_pass | Pass received above 1.8m height | Boolean |
incoming_pass_distance_received | Distance of received pass (m) | Numeric |
incoming_pass_range_received | Range category of received pass | Categorical |
organised_defense | Defense organized at pass moment | Boolean |
dist_to_nearest_sideline | Distance to nearest sideline (m) | Numeric |
dist_to_nearest_endline | Distance to nearest endline (m) | Numeric |
dist_to_attacking_endline | Distance to attacking endline (m) | Numeric |
dist_to_defensive_endline | Distance to defensive endline (m) | Numeric |
dist_to_attacking_goal | Distance to attacking goal center (m) | Numeric |
minutes_remaining_half | Minutes left in current half | Numeric |
minutes_remaining_game | Minutes left in match | Numeric |
ball_carrier_direction | Ball carrier direction (degrees) | Numeric |
ball_carrier_speed | Ball carrier speed (m/s) | Numeric |
penalty_area | Press starts in penalty area | Boolean |
n_defenders_within_10m | Defenders within 10m radius | Numeric |
n_defenders_within_15m | Defenders within 15m radius | Numeric |
n_defenders_within_20m | Defenders within 20m radius | Numeric |
n_defenders_within_25m | Defenders within 25m radius | Numeric |
\[ L = D_{back} + (D_{front} - D_{back})(z^3 + 0.3z) / 1.3 \] where:
\(L\) = the maximum distance limit for effective pressure at angle \(\theta\) (the radius of the oval-shaped pressure zone at any given angle)
\(D_{back}\) = the maximum distance limit when the presser is positioned behind the ball carrier
\(D_{front}\) = the maximum distance limit when the presser is positioned in front of the ball carrier
\(z\) = \((1 - \cos\theta) / 2\)
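\(\theta\) = the angle between the vector from the ball carrier to the center of the attacking goal (which we determined as the threat direction) and the vector from the presser to the ball carrier, so that a presser directly in front of the ball carrier gives \(\theta = 180^\circ\), \(z = 1\), and \(L = D_{front}\)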
Andrienko et al. (2017) determined the distance thresholds \(D_{back}\) and \(D_{front}\) to be 3 m and 9 m, respectively, based on consultation with football (soccer) experts. The authors later performed an experiment to verify these parameters.
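The pressure-zone formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the poster's implementation: \(\theta\) is measured so that a presser directly in front of the ball carrier (between the carrier and the attacking goal) gives \(\theta = 180^\circ\) and hence \(L = D_{front}\); positions are hypothetical `(x, y)` tuples in meters; and `within_pressure_zone` checks only the distance condition, not any other pressing criteria.

```python
import math


def pressure_limit(theta_deg, d_back=3.0, d_front=9.0):
    """Distance limit L of the oval pressure zone at angle theta (degrees)."""
    z = (1 - math.cos(math.radians(theta_deg))) / 2
    return d_back + (d_front - d_back) * (z**3 + 0.3 * z) / 1.3


def within_pressure_zone(carrier, presser, goal, d_back=3.0, d_front=9.0):
    """True if the presser is inside the carrier's pressure zone."""
    threat = (goal[0] - carrier[0], goal[1] - carrier[1])
    to_carrier = (carrier[0] - presser[0], carrier[1] - presser[1])
    dist = math.hypot(*to_carrier)
    dot = threat[0] * to_carrier[0] + threat[1] * to_carrier[1]
    cross = threat[0] * to_carrier[1] - threat[1] * to_carrier[0]
    # Unsigned angle in [0, 180] between the two vectors.
    theta_deg = math.degrees(math.atan2(abs(cross), dot))
    return dist <= pressure_limit(theta_deg, d_back, d_front)


carrier, goal = (50.0, 34.0), (105.0, 34.0)     # hypothetical pitch coordinates
in_front = within_pressure_zone(carrier, (56.0, 34.0), goal)  # 6 m goal-side
behind = within_pressure_zone(carrier, (44.0, 34.0), goal)    # 6 m behind
print(in_front, behind)
```

A defender 6 m goal-side of the carrier falls inside the 9 m front limit, while a defender 6 m behind the carrier is outside the 3 m back limit, which is exactly the directional asymmetry the oval zone is meant to capture.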
Heatmap showing the pressing patterns across the MLS 2023 season. The highest concentration of pressing occurs in the middle third of the field on both sides, where teams look to win the ball back in midfield areas to create quick attacking opportunities. Note: the pitch is oriented so that the home team always attacks left to right and the away team right to left.
Positive values (blue) indicate teams forcing more turnovers than predicted, while negative values (red) show underperformance. New York Red Bulls led MLS in pressing effectiveness, while Nashville SC struggled most relative to expectations.
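One simple way to obtain a team-level "turnovers over expected" value of this kind is to sum, per pressing team, the difference between each observed outcome and the model's predicted turnover probability. The sketch below assumes that definition; the poster does not state how the values were aggregated or normalized, and the record layout and team names are hypothetical.

```python
from collections import defaultdict


def turnovers_over_expected(records):
    """Sum (observed outcome - predicted probability) per pressing team.

    `records` is an iterable of (team, forced_turnover, predicted_prob),
    where forced_turnover is 0 or 1.
    """
    totals = defaultdict(float)
    for team, outcome, prob in records:
        totals[team] += outcome - prob
    return dict(totals)


# Hypothetical pressing sequences for two teams.
records = [
    ("Team A", 1, 0.25),
    ("Team A", 0, 0.25),
    ("Team B", 0, 0.5),
    ("Team B", 0, 0.25),
]
toe = turnovers_over_expected(records)
print(toe)
```

Dividing each total by the team's number of pressing sequences would give a per-press rate, which may be preferable when teams press at very different frequencies.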