MKOfosu/Football-Players-Analysis-in-R


1. Introduction and Objectives

This document breaks down my journey and the insights I gleaned from analysing the football dataset using RStudio.

📊 EDA of Global Football Players
Cleaned, transformed, and visualized using R (tidyverse, plotly, janitor, sf).
Includes demographic analysis, skill profiling, and statistical testing.


2. Dataset and Context

The football dataset contains details about football players worldwide. Although the data appears outdated (players are currently 8 years older than in the dataset), it still provides a valuable opportunity to showcase my R skills in unlocking insights. I plan to update the analysis when newer data becomes available.

📥 Get the dataset from here


3. Packages Used

tidyverse, janitor, ggridges, plotly, GGally, rnaturalearth,
rnaturalearthdata, rnaturalearthhires, sf


4. Data Cleaning and Transformation

🧹 Cleaning Steps

  1. Standardized column names using janitor.

  2. Investigated and corrected data types. Many numeric columns were incorrectly stored as characters.

  3. Identified messy values like "72+5" and outliers >100, cleaned them, and replaced extreme values with NA.

    # Data cleaning
    # cols_to_convert: character vector of the attribute column names,
    # defined earlier in the script
    cleaned_df <- cleaned_df %>%
      # 1. Trim leading and trailing whitespace from all string columns
      mutate(across(where(is.character), str_trim)) %>%

      # 2. Detect values in the form "72+5" and keep only the digits before the "+"
      mutate(across(all_of(cols_to_convert),
                    ~ ifelse(str_detect(.x, "\\+"), str_extract(.x, "^\\d+"), .x))
            ) %>%

      # 3. Convert the columns to integers and replace values greater than 100 with NA
      mutate(
        across(all_of(cols_to_convert),
              ~ {
                val <- as.integer(.x)
                if_else(val > 100, NA_integer_, val)
              }
        )
      )

  4. Checked for duplicates and removed them.

  5. Handled missing values via median imputation.

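    # Check for missing values in the cleaned data
    colSums(is.na(cleaned_df))

    # Fill the missing values with their respective column medians.
    # median() can return a non-integer (e.g. 72.5), so round it to keep
    # the replacement compatible with the integer columns.
    cleaned_df <- cleaned_df %>%
      mutate(across(all_of(cols_to_convert),
                    ~ replace_na(.x, as.integer(round(median(.x, na.rm = TRUE))))))
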
  6. Created a new feature position by mapping the first listed preferred position to one of: Goalkeeper, Defender, Midfielder, Forward.
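
Steps 4 and 6 are described above without code. The following is a minimal sketch of how they could look, not the exact code from the script. The column name `preferred_positions`, the toy rows, and the position-code lookup (for example "CB" to Defender) are all assumptions made for illustration.

```r
library(dplyr)
library(stringr)

# Toy data for illustration only; `preferred_positions` is an assumed column name
players <- tibble(
  name = c("A", "B", "B", "C", "D"),
  preferred_positions = c("ST LW", "CB", "CB", "GK", "CDM CM")
)

# Step 4: drop exact duplicate rows (the first occurrence is kept)
players <- distinct(players)

# Step 6: take the first listed position and map it to a broad group.
# The code-to-group lookup below is an assumption, not the script's mapping.
players <- players |>
  mutate(
    first_pos = str_extract(preferred_positions, "^\\S+"),
    position = case_when(
      first_pos == "GK" ~ "Goalkeeper",
      first_pos %in% c("CB", "LB", "RB", "LWB", "RWB") ~ "Defender",
      first_pos %in% c("CDM", "CM", "CAM", "LM", "RM") ~ "Midfielder",
      first_pos %in% c("ST", "CF", "LW", "RW") ~ "Forward",
      TRUE ~ NA_character_
    )
  )
```

Codes missing from the lookup fall through to NA, which makes unmapped positions easy to spot with `count(players, position)`.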


5. Exploratory Data Analysis (EDA) and Visualizations

📊 5.1 Demographic Exploration

Age Distribution
Most players are young, averaging 25 years. The distribution is similar across positions.

Nationality Representation
England, Germany, and Spain lead in player counts. Top football nations are concentrated in Europe and South America.

Position Breakdown
Midfielders and defenders dominate the dataset.


⚽ 5.2 Skill Profiles by Position

Players have similar average overall ratings across positions, with goalkeepers having a slightly lower average rating.


Which attributes are associated with a high overall rating in each position?

To answer this, I explored correlations between overall rating and specialist attributes, including goalkeeper-specific metrics. GGally::ggpairs was used to visualise the correlations between the various attributes and the overall ratings.

Goalkeepers
With the exception of kicking, all the goalkeeper attributes had a strong positive correlation with overall rating.

Defenders
Attributes such as standing tackle, sliding tackle, marking and interception, among others, were strongly correlated with the overall rating of a defender.

Midfielders
Attributes such as vision, ball control, composure, reactions and short passing correlated well with the overall rating of a midfielder.

Forwards
Several attributes had strong positive correlations with the overall rating of a forward, including reactions, positioning, finishing, composure and ball control. Interestingly, attributes like aggression, penalties, acceleration and heading accuracy had rather weak correlations with the overall rating of forwards.


Reactions was the only attribute that had a strong positive correlation with overall rating for all the positions.
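
The per-position correlations can also be computed numerically, alongside the ggpairs plots. The following is a minimal sketch on synthetic data, not the script's code: the data frame `toy`, its simulated values, and the choice of Pearson correlation (the default of `cor()`) are assumptions.

```r
library(dplyr)

set.seed(1)
n <- 200

# Synthetic data for illustration only: overall depends more strongly on
# reactions than on finishing by construction
toy <- tibble(
  position  = rep(c("Defender", "Forward"), each = n / 2),
  reactions = rnorm(n, 65, 8),
  finishing = rnorm(n, 55, 12)
) |>
  mutate(overall = 0.6 * reactions + 0.2 * finishing + rnorm(n, 0, 3))

# Pearson correlation of each attribute with overall, within each position
cor_by_position <- toy |>
  summarise(
    across(c(reactions, finishing), ~ cor(.x, overall)),
    .by = position
  )

cor_by_position
```

On the real data, the same pattern would apply with `cleaned_df` in place of `toy` and the relevant attribute columns listed inside `across()`.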

🌟 5.3 Top Performers Analysis

Identifying players with exceptional skills in specific attributes:

  • Best dribblers

    cleaned_df |> 
      select(name, nationality, position, dribbling) |> 
      filter(dribbling >= 90) |>
      slice_max(dribbling, n = 5, with_ties = TRUE)

Lionel Messi of Argentina is the best dribbler, followed closely by Neymar.

  • Best free kick takers

    cleaned_df |> 
      select(name, nationality, position, free_kick_accuracy) |> 
      slice_max(free_kick_accuracy, n = 5)

Çalhanoğlu and A. Pirlo are the best free kick takers.

  • Best finishers - Forwards

    cleaned_df |> 
      select(name, nationality, position, finishing) |> 
      filter(position == "Forward") |> 
      slice_max(finishing, n = 10)

The best finishers among forwards are L. Messi, C. Ronaldo and L. Suarez.

  • Most powerful shots

    cleaned_df |> 
      select(name, nationality, position, shot_power) |> 
      # filter(position == "Forward") |> 
      slice_max(shot_power, n = 5)

Cristiano Ronaldo has the most powerful shot.


📈 5.4 Age vs Performance Curve

Players improve from under 20, peak in late 20s to early 30s, then gradually decline. Goalkeepers follow a similar trend but with slightly lower ratings.
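
A curve like this can be derived by averaging overall rating at each age. The sketch below uses a small made-up data frame (`toy`); the values are invented purely to show the mechanics and are not taken from the dataset.

```r
library(dplyr)

# Made-up values for illustration only
toy <- tibble(
  age     = c(18, 18, 24, 24, 29, 29, 35, 35),
  overall = c(58, 62, 68, 70, 74, 76, 69, 71)
)

# Mean overall rating per age, ordered by age
age_curve <- toy |>
  summarise(mean_overall = mean(overall), n = n(), .by = age) |>
  arrange(age)

# Age with the highest mean rating in this toy data
peak_age <- age_curve$age[which.max(age_curve$mean_overall)]
```

Adding `position` to `.by` would give one curve per position, which is how the goalkeeper trend could be compared with the others.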


🌍 5.5 Nationality & Playing Style

  • Compared average attribute profiles across nationalities.

    # This gives the top 6 football countries in terms of number of players produced
    top_countries <- cleaned_df |> 
      summarise(
        num_of_players = n(),
        average_overall = mean(overall, na.rm = TRUE),
        .by = nationality
      ) |> 
      slice_max(
        num_of_players, 
        n = 6
      )

    Country      Number of players   Average overall
    England      1629                63.1
    Germany      1135                65.8
    Spain        1009                69.9
    France        973                67.2
    Argentina     961                67.7
    Brazil        809                70.9

When these six countries were analysed, it was revealed that Brazil led in average ratings across all positions.


🧤 5.6 Goalkeeper vs Outfield Comparison

  • Compared goalkeeper-specific attributes vs outfielders.
  • Goalkeepers excel in diving, reflexes, positioning.
  • Outfielders slightly outperform in jumping.

Goalkeepers had lower ratings in attributes outside their specialist attributes.

Hypothesis Testing

  1. Strength: Goalkeepers vs Outfielders

There was a statistically significant difference in strength between goalkeepers and outfield players.

  2. Overall Rating: Goalkeepers vs Outfielders

There was a statistically significant difference in overall rating between goalkeepers and outfield players.
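
The README does not state which test was used. As one plausible approach, a two-sample comparison like this can be run with Welch's t-test, which is the default of `t.test()` in base R. The sketch below uses simulated data; the group sizes, means and standard deviations are assumptions and do not come from the dataset.

```r
set.seed(42)

# Simulated strength values for illustration only
toy <- data.frame(
  group    = rep(c("Goalkeeper", "Outfield"), times = c(50, 450)),
  strength = c(rnorm(50, mean = 60, sd = 10), rnorm(450, mean = 66, sd = 12))
)

# Welch two-sample t-test (unequal variances is the default)
test_result <- t.test(strength ~ group, data = toy)

test_result$p.value   # compare against the chosen significance level, e.g. 0.05
test_result$estimate  # the two group means
```

If the ratings are clearly non-normal, a rank-based alternative such as `wilcox.test()` takes the same formula interface.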


6. Conclusion & Insights

  • Players are generally young; midfielders and defenders are most common.
  • Europe and South America dominate football talent.
  • Goalkeepers have lower average ratings but excel in specialized attributes.
  • Messi and Ronaldo top various skill categories.
  • Finishing correlates strongly with overall rating for forwards.
  • Brazil leads in average ratings across all positions.
  • Statistical tests confirm meaningful differences between goalkeepers and outfielders.

📄 Refer to the attached R script file for full reproducibility.


7. Suggestions for Further Analysis

  • Cluster positions using PCA + k-means.
  • Explore attribute correlations via heatmaps.
  • Perform dimensionality reduction.
  • Cluster players to uncover latent role groupings.
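
As a starting point for the PCA + k-means idea, the sketch below runs both steps in base R on synthetic data. The attribute names, the two simulated groups, and the choice of two clusters are assumptions made for illustration; on real data the number of clusters would need to be chosen, for example with an elbow plot.

```r
set.seed(123)

# Two synthetic groups of 50 players each, for illustration only
attrs <- rbind(
  matrix(rnorm(50 * 4, mean = 40, sd = 5), ncol = 4),
  matrix(rnorm(50 * 4, mean = 70, sd = 5), ncol = 4)
)
colnames(attrs) <- c("finishing", "dribbling", "marking", "standing_tackle")

# PCA on centred, scaled attributes; keep the first two components
pca    <- prcomp(attrs, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# k-means on the PCA scores; nstart reruns from several random starts
clusters <- kmeans(scores, centers = 2, nstart = 10)

table(clusters$cluster)
```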

📬 Contact

If you'd like to connect, collaborate, or discuss this project further:

📧 Email: mathiasofosu2@gmail.com

💼 LinkedIn: Mathias Ofosu

🧠 GitHub Profile: Mathias Ofosu

Twitter/X: Mathias Ofosu

Feel free to reach out — I’m always open to data-driven conversations.
