Olympic-Medal-Data-Analysis

Olympic Medal Data Analysis Project

A comprehensive R script that pulls the TidyTuesday Olympic medals dataset from GitHub and enriches it with 2016 population and GDP per capita data from the World Bank API. It then produces a range of exploratory and analytical outputs, including maps, distributions, regressions, and clustering.


Table of Contents

  1. Project Overview
  2. Features & Outputs
  3. Prerequisites
  4. Installation & Setup
  5. Usage
  6. Script Breakdown
  7. Interpreting the Results
  8. Extending & Customizing
  9. Data Sources & Citations
  10. License

Project Overview

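This project provides a single R script, Olympic_Medal_Analysis.R, that fetches the medal and World Bank data, aggregates medal counts by NOC (National Olympic Committee), computes efficiency metrics, and produces the plots and models listed under Features & Outputs.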

All plots render in sequence when you source the script; regression summaries and model diagnostics print to the console.
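The aggregation and efficiency steps can be illustrated with a small, offline sketch. It uses base R and made-up toy data rather than the real downloads; the column names (noc, medal, population, gdp_per_capita) and the helper objects are assumptions for illustration, and the actual script may be organized differently.

```r
# Toy athlete-level rows; column names are assumed, values are made up.
medals_raw <- data.frame(
  noc   = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  medal = c("Gold", "Silver", NA, "Gold", "Bronze", NA, "Bronze"),
  stringsAsFactors = FALSE
)

# Toy country-level data standing in for the World Bank figures.
country_stats <- data.frame(
  noc            = c("AAA", "BBB", "CCC"),
  population     = c(50e6, 5e6, 1e6),
  gdp_per_capita = c(40000, 12000, 3000)
)

# Keep medal-winning rows only, then count each medal type per NOC.
won <- medals_raw[!is.na(medals_raw$medal), ]
won$medal <- factor(won$medal, levels = c("Gold", "Silver", "Bronze"))
counts <- as.data.frame.matrix(table(won$noc, won$medal))
counts$noc <- rownames(counts)
counts$total <- counts$Gold + counts$Silver + counts$Bronze

# Attach population and GDP, then derive medals per million inhabitants.
noc_summary <- merge(counts, country_stats, by = "noc")
noc_summary$medals_per_million <-
  noc_summary$total / (noc_summary$population / 1e6)

print(noc_summary)
```

In this toy example the smallest country ends up with the highest medals-per-million value even though it has the fewest medals, which is the kind of contrast the efficiency ranking is meant to surface.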


Features & Outputs

  1. Top NOCs by Medals

    • Bar chart of the top 10 NOCs by raw medal counts
  2. Efficiency Rankings

    • Bar chart of the top 10 NOCs by medals per million inhabitants
  3. Global Choropleth

    • World map shaded by total medals
  4. Medal‐Type Distribution

    • Pie chart of Gold, Silver, and Bronze proportions
  5. Medal Distributions

    • Histogram of total medals across all NOCs
  6. Pairwise Relationships

    • Scatter plots:

      • Gold vs. Silver
      • Total vs. Population
      • Medals per Million vs. GDP per Capita
  7. Correlation Heatmap

    • Heatmap of correlations among raw and efficiency variables
  8. Regression Analysis

    • Linear model predicting total medals from population, GDP per capita, and efficiency
    • Summary statistics and residuals‐versus‐fitted diagnostic plot
  9. K-Means Clustering

    • Four‐cluster segmentation of NOCs by medal profiles and efficiency
    • PCA-based cluster visualization
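For the regression item above, the following is a minimal sketch of a linear model with the same three predictors and a residuals-versus-fitted plot. The data are simulated and the variable names (total, population, gdp_per_capita, medals_per_million) are assumptions, so the printed coefficients say nothing about the real dataset.

```r
set.seed(1)
n <- 30

# Simulated NOC-level data; every value here is made up.
toy <- data.frame(
  population     = runif(n, 1e6, 2e8),
  gdp_per_capita = runif(n, 1e3, 6e4)
)
toy$total <- round(5 + 2e-7 * toy$population +
                     5e-4 * toy$gdp_per_capita + rnorm(n, sd = 3))
toy$total <- pmax(toy$total, 0)
toy$medals_per_million <- toy$total / (toy$population / 1e6)

# Linear model with the three predictors named in the feature list.
fit <- lm(total ~ population + gdp_per_capita + medals_per_million,
          data = toy)
print(summary(fit))

# Residuals-versus-fitted diagnostic plot.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)
```

A roughly even scatter around the dashed zero line suggests the linear form is adequate; a funnel or curve in the residuals suggests a transformation (for example, logging population) may be worth trying.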

Prerequisites

R Packages

The script will auto-install any missing packages. It relies on:


Installation & Setup

  1. Clone or download this repository.
  2. Ensure R ≥ 4.0 is installed.
  3. From an R console or RStudio, set your working directory to the project folder.

No additional build steps are required.
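The auto-install behavior mentioned under Prerequisites typically follows a pattern like the sketch below. The package list here is a placeholder (two packages that ship with R), and the helper names missing_packages and ensure_packages are hypothetical; the script's own list and function names may differ.

```r
# Placeholder list; substitute the packages the script actually needs.
required_pkgs <- c("stats", "utils")

# Return the subset of pkgs that is not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing, then attach every package in the list.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

ensure_packages(required_pkgs)
```

Because installation only runs for packages that are absent, sourcing the script a second time skips straight to loading.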


Usage

In R or RStudio:

# Source the analysis script
source("Olympic_Medal_Analysis.R")

Plots will appear one after another, and model summaries will print to the console. To save a plot to disk, pass the plot object to ggsave() after it is created, or modify the script accordingly.
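As a rough illustration of saving a plot, the sketch below builds a small ggplot2 bar chart from made-up data and writes it to a temporary PNG file. The object names (toy_medals, p, out_file) and the output size are assumptions; in practice you would pass one of the script's own plot objects instead.

```r
library(ggplot2)

# Toy data standing in for per-NOC medal totals (values are made up).
toy_medals <- data.frame(
  noc   = c("AAA", "BBB", "CCC"),
  total = c(12, 7, 3)
)

# Horizontal bar chart ordered by medal total.
p <- ggplot(toy_medals, aes(x = reorder(noc, total), y = total)) +
  geom_col() +
  coord_flip() +
  labs(x = "NOC", y = "Total medals")

# Write the plot to disk; the file extension selects the output device.
out_file <- file.path(tempdir(), "top_nocs.png")
ggsave(out_file, plot = p, width = 7, height = 5, dpi = 300)
```

Passing the plot explicitly via the plot argument avoids relying on ggsave()'s default of saving whichever plot was displayed last.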


Script Breakdown

  1. Setup

    • Defines the package list, installs any packages that are missing, and loads them all.
  2. Data Fetch

    • Reads Olympic medals CSV from TidyTuesday GitHub.
    • Queries World Bank API for 2016 population & GDP per capita.
  3. Data Preparation

    • Aggregates medal counts by NOC.
    • Merges with population/GDP; computes efficiency metrics.
  4. Visualizations

    • Bar charts (raw counts & efficiency)
    • Choropleth (world map)
    • Pie chart & histogram (distributional)
    • Scatter plots (pairwise relationships)
    • Correlation heatmap
  5. Statistical Modeling

    • Linear regression predicting total medals.
    • Diagnostic plots.
  6. Clustering

    • k-means on scaled medal & efficiency measures.
    • PCA cluster plot.
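The clustering step can be sketched with base R alone. This example runs k-means with four centers on scaled toy data and plots the clusters on the first two principal components; the data are simulated, the column names are assumptions, and the script itself may use different features or plotting functions.

```r
set.seed(42)
n <- 40

# Simulated medal profiles and efficiency values; all numbers are made up.
toy <- data.frame(
  gold               = rpois(n, 5),
  silver             = rpois(n, 5),
  bronze             = rpois(n, 6),
  medals_per_million = rexp(n, rate = 1)
)

# Scale so that no single variable dominates the distance calculation.
scaled <- scale(toy)

# Four-cluster k-means; nstart reruns from several random starts.
km <- kmeans(scaled, centers = 4, nstart = 25)

# Project onto principal components for a 2-D view of the clusters.
pca <- prcomp(scaled)
plot(pca$x[, 1], pca$x[, 2],
     col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "k-means clusters on the first two PCs")
```

Scaling before clustering matters here because medal counts and medals per million are on very different scales, and k-means relies on Euclidean distance.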

Interpreting the Results


Extending & Customizing


Data Sources & Citations


License

This project is released under the MIT License. See LICENSE for details.