Forced Turnover: Evaluating Pressing Effectiveness in Soccer
Introduction
Soccer is a highly tactical sport in which defensive tactics not only prevent the opposing team from scoring but can also create immediate attacking opportunities. Pressing, a defensive tactic in which defending players apply coordinated pressure on the ball carrier to force turnovers, has been central to elite teams such as Liverpool under Jürgen Klopp and Manchester City under Pep Guardiola, whose style is commonly known as “gegenpressing.” These teams have shown that effective pressing can turn defense into goal-scoring opportunities within seconds.
This study centers on the question: Can the effectiveness of a press in soccer be predicted using spatial context, pressing dynamics, game context, and situational factors? We define a press as effective if it forces a turnover within 5 seconds of pressing initiation.
Data
Match information, dynamic events, and XY tracking data are provided by SkillCorner, which uses artificial intelligence and deep learning to detect moving objects in broadcast video and extract data. Match information includes the date and time of the games, home and away team names, pitch dimensions, referee information, and other game-level details. Dynamic events include on-ball and off-ball activities such as passes, shots, tackles, recoveries, off-ball runs, goals, and many others. Each event also has a timestamp, location coordinates, and player identification. XY tracking data includes the identities, locations, and movements of all 22 players and the ball throughout the full 90 minutes at a rate of 10 frames per second (10 Hz). Since this data is extracted from broadcast video, SkillCorner uses its technology to extrapolate the coordinates of players outside of the camera’s field of view.
The dataset contains 520 matches played in the 2023 season of Major League Soccer (MLS), the professional soccer league in North America. SkillCorner did not provide the dynamic events for 18 of the 520 matches because they did not pass its quality check. These matches were therefore unusable for this analysis, leaving 502 matches.
Methods
Data processing
Using the XY tracking data, we calculated the frame-by-frame distance, velocity, acceleration, and direction of both the players and the ball. The velocities were smoothed using a rolling function with a window size of 3 frames, and the accelerations with a window size of 5 frames. Afterwards, we removed physically impossible values, such as velocities larger than 11.9 m/s and accelerations larger than 10 m/s^2. We also standardized the data such that the home team always attacks from left to right (Anzer et al., 2025).
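A minimal Python sketch of these steps for a single player or ball track is shown below. It is illustrative, not the project code: the centered rolling mean, the choice to differentiate the smoothed speed, the use of absolute acceleration for the cutoff, and the function names are all assumptions.

```python
import math

FPS = 10          # tracking rate stated in the text (10 Hz)
MAX_SPEED = 11.9  # m/s, cutoff stated in the text
MAX_ACCEL = 10.0  # m/s^2, cutoff stated in the text


def rolling_mean(values, window):
    """Centered rolling mean that skips None; edges use the values available."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out


def kinematics(xs, ys):
    """Return (speed, accel) lists for one track; filtered values become None."""
    n = len(xs)
    # Speed = frame-to-frame displacement (m) times frames per second.
    raw_speed = [None] + [
        math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) * FPS for i in range(1, n)
    ]
    speed = rolling_mean(raw_speed, 3)

    # Acceleration = change in smoothed speed per second (assumed ordering).
    raw_accel = [None] + [
        (speed[i] - speed[i - 1]) * FPS
        if speed[i] is not None and speed[i - 1] is not None else None
        for i in range(1, n)
    ]
    accel = rolling_mean(raw_accel, 5)

    # Remove physically impossible values after smoothing.
    speed = [s if s is not None and s <= MAX_SPEED else None for s in speed]
    accel = [a if a is not None and abs(a) <= MAX_ACCEL else None for a in accel]
    return speed, accel
```

For example, a track that advances 0.5 m per frame yields a constant speed of 5 m/s and zero acceleration, while a single 10 m jump between frames is flagged as missing.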
Detecting pressing sequences with XY tracking data
While StatsBomb-360 and possibly other soccer data providers tag pressure or pressing events, SkillCorner does not. To identify pressing events, we initially defined the “pressure zone” as any area within 6 meters of the ball carrier. However, according to Andrienko et al. (2017), this approach is too simplistic because it does not account for the direction in which players are facing or moving. We therefore adopted the approach they proposed, in which the “pressure zone” is elliptical (or oval) rather than circular. The distance limits are determined by the following formula:
L = D_{back} + (D_{front} - D_{back})(z^3 + 0.3z) / 1.3 where:
L = the maximum distance limit for effective pressure at angle \theta (the radius of the oval-shaped pressure zone at any given angle)
D_{back} = the maximum distance limit when the presser is positioned behind the ball carrier
D_{front} = the maximum distance limit when the presser is positioned in front of the ball carrier
z = (1 - cos \theta) / 2
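\theta = the angle between the vector from the ball carrier to the center of the attacking goal (which we determined as the threat direction) and the vector from the presser to the ball carrier, so that \theta = 180° (z = 1, L = D_{front}) when the presser stands directly between the ball carrier and the goal, and \theta = 0° (z = 0, L = D_{back}) when the presser is directly behind the ball carrier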
Andrienko et al. (2017) determined the distance thresholds D_{back} and D_{front} to be 3 m and 9 m, respectively, based on consultation with football (soccer) experts. They later performed an experiment to verify these parameters.
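The formula can be sketched in Python as follows. This is an illustration under stated assumptions: the angle is measured between the threat direction and the presser-to-carrier vector, a convention chosen here so that L equals D_{front} when the presser stands between the ball carrier and the goal. The function names and the coordinate tuples are hypothetical.

```python
import math

D_BACK = 3.0   # metres, limit when the presser is behind the ball carrier
D_FRONT = 9.0  # metres, limit when the presser is in front of the ball carrier


def pressure_limit(theta):
    """Distance limit L (metres) of the oval pressure zone at angle theta (radians)."""
    z = (1 - math.cos(theta)) / 2
    return D_BACK + (D_FRONT - D_BACK) * (z ** 3 + 0.3 * z) / 1.3


def in_pressure_zone(carrier, presser, goal):
    """True if the presser is inside the oval zone around the ball carrier.

    All arguments are (x, y) tuples in metres. Assumed convention: theta is
    the angle between carrier->goal and presser->carrier, so theta = pi when
    the presser is directly in front of the carrier.
    """
    tx, ty = goal[0] - carrier[0], goal[1] - carrier[1]        # threat direction
    px, py = carrier[0] - presser[0], carrier[1] - presser[1]  # presser -> carrier
    dist = math.hypot(px, py)
    threat_len = math.hypot(tx, ty)
    if dist == 0:
        return True  # presser is on top of the carrier
    if threat_len == 0:
        return dist <= D_BACK  # degenerate case: fall back to the smaller limit
    cos_theta = (tx * px + ty * py) / (threat_len * dist)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for float safety
    return dist <= pressure_limit(theta)
```

With these thresholds, a presser 8 m goal-side of the ball carrier is inside the zone, whereas a presser 8 m directly behind is not.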
Beyond the pressure zone, the only other criterion we specified was that the approach velocity of the defender toward the ball carrier must be greater than 1 m/s, as proposed by Merckx et al. (n.d.). This threshold filters out “static” defending: the defender must actively engage or move toward the ball carrier, even if already within the pressure zone. To reiterate, a defending player was classified as “pressing” if they were simultaneously within the oval pressure zone and approaching the ball carrier above the velocity threshold.
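One way to express this rule in Python is sketched below. The text does not specify how approach velocity is computed, so the sketch assumes it is the rate at which the defender-to-carrier distance shrinks between consecutive frames. The function names are hypothetical.

```python
import math

FPS = 10                  # tracking rate stated in the text
MIN_APPROACH_SPEED = 1.0  # m/s, threshold stated in the text


def approach_velocity(def_prev, def_now, car_prev, car_now, fps=FPS):
    """Closing speed (m/s) between defender and ball carrier over one frame.

    Positive values mean the gap is shrinking. This definition is an
    assumption of the sketch, not a detail taken from the text.
    """
    gap_prev = math.dist(def_prev, car_prev)
    gap_now = math.dist(def_now, car_now)
    return (gap_prev - gap_now) * fps


def is_pressing(in_zone, approach):
    """A defender presses if inside the pressure zone and closing fast enough."""
    return in_zone and approach > MIN_APPROACH_SPEED
```

A defender who closes from 5.0 m to 4.8 m in one frame approaches at 2 m/s and counts as pressing if inside the zone. A defender inside the zone who closes at only 0.5 m/s does not.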
Grouping pressing sequences
Individual pressing actions were grouped into pressing sequences based on how close they happened in time. We defined a pressing sequence as a continuous period where at least one defender from the same team maintained pressing behavior, allowing for brief interruptions of up to 1.5 seconds (15 frames). In other words, if a pressing defender leaves the pressure zone or is no longer actively pressing, but another defender exhibits pressing behavior within 1.5 seconds, the sequence remains active. However, if the next press begins more than 1.5 seconds after the previous press, a new pressing sequence begins.
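The grouping rule can be sketched in Python as follows. The input is assumed to be the frame indices in which at least one defender of a given team is pressing. Treating a frame difference of exactly 15 as still within the allowed gap is an assumption of this sketch, and the function name is hypothetical.

```python
MAX_GAP_FRAMES = 15  # 1.5 seconds at 10 Hz, as stated in the text


def group_pressing_frames(frames, max_gap=MAX_GAP_FRAMES):
    """Merge pressing frames into (start_frame, end_frame) sequences.

    A new sequence starts whenever the next pressing frame is more than
    `max_gap` frames after the last pressing frame of the current sequence.
    """
    sequences = []
    for frame in sorted(frames):
        if sequences and frame - sequences[-1][1] <= max_gap:
            sequences[-1][1] = frame          # extend the current sequence
        else:
            sequences.append([frame, frame])  # open a new sequence
    return [tuple(seq) for seq in sequences]
```

For instance, pressing frames 10, 11, 12, 20, 40, and 41 form two sequences, (10, 20) and (40, 41): the 8-frame pause is bridged, while the 20-frame pause is not.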
For each identified sequence, we extracted the sequence duration (in frames and seconds), the number of defending players involved, and the average approach velocity of pressing defenders at the sequence start. 252,646 pressing sequences were identified.
Measuring an effective press
The goal of pressing is to regain ball possession or force a turnover by the attacking team, but Lee et al. (2025) highlighted that the impact of pressing should be assessed beyond immediate changes of possession. Pressing can force the attacking team into tight positions, which may increase the likelihood of a turnover over the next few seconds or actions. They tested different success criteria for pressing, such as regaining possession within 7 seconds or within 4 actions, among others, but their analysis focused on regaining possession within 5 seconds of the pressing initiation. We therefore used the same criterion and assessed the effectiveness of a pressing sequence based on whether the pressing team forced a turnover within 5 seconds of the start of the sequence.
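The labeling rule can be sketched in Python as follows. The list of turnover frames (frames at which the pressed team lost the ball to the pressing team) is a hypothetical input, and counting a turnover at exactly 5 seconds as effective is an assumption.

```python
FPS = 10            # tracking rate stated in the text
WINDOW_SECONDS = 5  # effectiveness window stated in the text


def is_effective(sequence_start, turnover_frames, fps=FPS, window=WINDOW_SECONDS):
    """True if any turnover falls within `window` seconds of the sequence start."""
    limit = sequence_start + window * fps
    return any(sequence_start <= frame <= limit for frame in turnover_frames)
```

A sequence starting at frame 100 is labeled effective if a turnover occurs at frame 130 (3 seconds later), but not if the first turnover occurs at frame 151.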
Feature engineering
After data cleaning and processing, we had 31 features that were used for training our model. These included:
- Spatial Context: Ball carrier position, distance to boundaries, field third, etc.
- Pressing Dynamics: Number of defenders, approach velocity, passing options, etc.
- Game Context: Score, game state (winning/losing/drawing), time remaining, etc.
- Situational Factors: How the ball carrier gained possession (pass reception, interception, etc.), incoming pass characteristics (distance, height, range), etc.
These features were extracted from already tagged events provided by SkillCorner and from processing the XY tracking data. Further details on the features are provided in the appendix.
Analysis
To reiterate, 252,464 pressing sequences were identified across the 502 MLS matches. After explicitly handling missing values within some of the features, we built and compared the performances of two models to predict forced turnovers within 5 seconds of pressing initiation: a logistic regression as a baseline and an XGBoost model. We used 10-fold cross-validation with match-based splits to prevent data leakage, making sure that no observations from the same match appeared in both training and test sets. To reduce computational load, hyperparameter tuning was done on a 10% stratified sample of the data to find the best XGBoost parameters.
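The idea behind match-based splitting can be sketched in plain Python as follows. This is an illustration, not the project code: the function name, the random assignment of matches to folds, and the seed are assumptions. Libraries such as scikit-learn provide a comparable utility (GroupKFold).

```python
import random


def match_based_folds(match_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which every match stays in one fold.

    `match_ids` holds the match identifier of each pressing sequence (row).
    """
    unique_matches = sorted(set(match_ids))
    random.Random(seed).shuffle(unique_matches)
    # Assign whole matches, not individual rows, to folds.
    fold_of = {match: i % n_folds for i, match in enumerate(unique_matches)}
    for k in range(n_folds):
        test_idx = [i for i, m in enumerate(match_ids) if fold_of[m] == k]
        train_idx = [i for i, m in enumerate(match_ids) if fold_of[m] != k]
        yield train_idx, test_idx
```

Because folds are assigned per match, no match contributes rows to both the training and the test set of any fold, which is the property that prevents leakage between sequences from the same game.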
Model Performance
The XGBoost model marginally outperformed the logistic regression model across all evaluation metrics, as seen in Table 1 below.
Model Calibration
Figure 3 below shows the calibration plot of our XGBoost model, plotting predicted turnover probabilities against actual turnover rates. In a perfectly calibrated model, the blue line, which is our model, should align exactly with the red dotted diagonal line (perfect calibration).
Our model shows reasonably good calibration overall, with some deviation from the diagonal. The pattern of deviation suggests that our XGBoost model is better at predicting when turnovers are unlikely to occur than when they are likely to occur.
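The points on a calibration plot of this kind can be computed as sketched below. The number of bins and the equal-width binning are assumptions, as is the function name; the text does not state how the curve was binned.

```python
def calibration_curve(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed turnover rate, count) per bin.

    Predictions are grouped into equal-width probability bins; empty bins
    are skipped. A well-calibrated model has the first two values close
    to each other in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        bins[index].append((prob, label))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            curve.append((mean_pred, observed, len(members)))
    return curve
```

Plotting the observed rate against the mean predicted probability for each bin gives the model line, which is then compared with the diagonal.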
Results
Team Pressing Performance
Significant variation existed in pressing effectiveness across MLS teams during the 2023 season (Figure 4). Based on our model, the New York Red Bulls were the most effective pressing team, forcing approximately 7 more turnovers per game than our model predicted.
Pressing Volume vs. Pressing Effectiveness
Figure 5 below shows pressing volume against pressing effectiveness by teams with four quadrants: high-volume/high-effectiveness (top right), low-volume/high-effectiveness (top left), high-volume/low-effectiveness (bottom right), and low-volume/low-effectiveness (bottom left). The New York Red Bulls represented the ideal combination, attempting the most presses per game while maintaining the highest effectiveness.
When pressing vs. when being pressed
Figure 6 below shows the relationship between pressing effectiveness and press resistance across all MLS teams during the 2023 season, measured as the difference between actual and expected turnovers per game (xP_diff). When teams are pressing, positive xP_diff values indicate forcing more turnovers than expected. When teams are being pressed, negative xP_diff values indicate better press resistance (fewer turnovers than expected). For visualization purposes, the y-axis values were inverted, transforming negative xP_diff values into positive “turnovers avoided,” so that both axes follow the natural idea that higher values represent better performance.
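A per-team difference of this kind can be computed as sketched below. The sketch assumes that expected turnovers are the sum of the model's predicted probabilities over a team's pressing sequences. The row layout and function name are hypothetical.

```python
from collections import defaultdict


def xp_diff_per_game(rows):
    """Actual minus expected turnovers per game for each team.

    `rows` is an iterable of (team, match_id, turnover, predicted_prob),
    one row per pressing sequence, where `turnover` is 0 or 1. Use the
    pressing team as `team` for pressing effectiveness, or the pressed
    team for press resistance.
    """
    actual = defaultdict(float)
    expected = defaultdict(float)
    games = defaultdict(set)
    for team, match_id, turnover, prob in rows:
        actual[team] += turnover
        expected[team] += prob  # assumed: expected turnovers = sum of probabilities
        games[team].add(match_id)
    return {team: (actual[team] - expected[team]) / len(games[team]) for team in actual}
```

For example, a team that forces 2 turnovers across 2 matches when the model expected 1.0 in total has a difference of +0.5 turnovers per game.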
Teams in the upper-right quadrant are good at both: they force more turnovers than expected when pressing while avoiding more turnovers than expected when in possession. Based on our model, Seattle Sounders, Los Angeles FC, and Sporting Kansas City are the select few that do well at both. St. Louis City Soccer Club and New York Red Bulls show strong pressing abilities but are vulnerable when being pressed (lower-right). Portland Timbers seem to struggle the most at both.
Discussion
Feature Importance
Our analysis shows that pressing effectiveness in MLS is predictable to a meaningful degree, with our XGBoost model achieving an AUC of 0.771. Variable importance analysis showed that start_type contributed approximately 70% of total model importance. This variable describes how the ball carrier gained possession of the ball, for example through an interception, a reception, or a recovery. When we looked at the actual turnover probabilities, pressing a ball carrier who had gained the ball through an interception forced a turnover approximately 74% of the time (Table 2). This aligns with tactical principles employed at the highest levels of professional football. As Domenec Torrent, Pep Guardiola’s former assistant coach at Manchester City, explained: “When we lose the ball it’s very important for Pep to press high in five seconds. If you don’t win it back within five seconds then make a foul and go back” (Torrent, cited in Manchester Evening News). This immediate pressing approach, particularly effective when opponents have just gained possession through interceptions, reflects the vulnerability window that our data quantifies, demonstrating why start_type is an important predictor of successful pressure outcomes.
Limitations
- There was a 23% class imbalance in the response variable, forced turnover. This can bias models so that they perform poorly on the minority class (in our case, turnover = “yes”), because they are optimized for overall accuracy.
- Our analysis uses only MLS data, limiting generalizability to other leagues with different tactical styles, player quality, or physical demands.
- Inaccuracies in the tracking data may affect the precision of player positions and movements.
- Our model does not account for individual player attributes, such as pace and pressing ability.
- The absence of pitch control modeling limits our understanding of spatial dominance during pressing.
Future
- Apply class weights to handle class imbalance.
- Extend analysis to multiple leagues.
- Add pressing intensity calculation.
- Add pitch control metrics to account for spatial dominance.
Acknowledgement
Special thanks to Daniel Wicker (Charlotte FC), Dr. Ron Yurko, Quang Nguyen, the CMSACamp TAs, and Carnegie Mellon University.
Citations
Andrienko, G., Andrienko, N., Budziak, G., Dykes, J., Fuchs, G., von Landesberger, T., & Weber, H. (2017). Visual analysis of pressure in football. Data Mining and Knowledge Discovery, 31(6), 1793–1839. https://doi.org/10.1007/s10618-017-0513-2
Anzer, G., Arnsmeyer, K., Bauer, P., Bekkers, J., Brefeld, U., Davis, J., Evans, N., Kempe, M., Robertson, S. J., Smith, J. W., & Van Haaren, J. (2025). Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer). http://arxiv.org/abs/2505.15820
Bauer, P., & Anzer, G. (2021). Data-driven detection of counterpressing in professional football: A supervised machine learning task based on synchronized positional and event data with expert-based feature extraction. Data Mining and Knowledge Discovery, 35. https://doi.org/10.1007/s10618-021-00763-7
Lee, M., Jo, G., Hong, M., Bauer, P., & Ko, S.-K. (2025). exPress: Contextual Valuation of Individual Players Within Pressing Situations in Soccer.
Merckx, S., Robberechts, P., Euvrard, Y., & Davis, J. (n.d.). Measuring the Effectiveness of Pressing in Soccer.
Robberechts, P. (n.d.). Valuing the Art of Pressing.
Contact Information
David Almona, Centre College, almonadavid@gmail.com
Natalie Rayce, Carnegie Mellon University, nrayce@andrew.cmu.edu
Code Availability
Code available on GitHub