Wine-Quality-Data-Analysis

Wine Quality Analysis

A comprehensive R script that downloads, cleans, and analyzes the UCI Wine Quality datasets (red & white) and produces a suite of exploratory visualizations, principal component analysis, and a predictive regression model.


Table of Contents

  1. Project Overview
  2. Features
  3. Prerequisites
  4. Installation
  5. Usage
  6. Script Breakdown
  7. Extending & Customizing
  8. Data Sources & Citations
  9. License

Project Overview

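This project provides a single, self-contained R script (Wine_Quality_Analysis.R) that downloads and cleans the red and white wine datasets, produces exploratory plots, runs a principal component analysis, and fits a linear regression model for quality.

It is intended for data scientists, statisticians, and wine enthusiasts who want a reproducible pipeline for exploring the chemical correlates of perceived wine quality.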


Features

  1. Data Ingestion & Cleaning

    • Downloads CSVs for red & white wines
    • Enforces consistent numeric types via read_delim() with an explicit locale
  2. Exploratory Visualizations

    • Bar chart of quality counts by type
    • Jittered scatter of alcohol content vs. quality
    • Faceted boxplots for key chemical attributes by quality
    • Correlation heatmap of all numeric variables
  3. Dimensionality Reduction (PCA)

    • Scree plot of variance explained
    • Biplot illustrating attribute loadings
  4. Predictive Modeling

    • Linear regression of numeric quality on alcohol, sulphates, pH, and type
    • Model summary and a residuals vs. fitted diagnostic plot
    • Scatter of fitted vs. actual quality
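
The correlation heatmap and PCA features rest on a few base R calls. The sketch below is an illustration under assumptions, not the script's own code: it takes a combined data frame (called wine here, a hypothetical name) and a helper pca_summary(), also hypothetical. It excludes quality from the PCA inputs and standardizes the attributes, because they are measured on very different scales.

```r
# Sketch: correlation matrix and PCA on the numeric wine attributes.
# `wine` and pca_summary() are hypothetical names used for illustration.
pca_summary <- function(wine) {
  # Keep only numeric columns (drops factors such as a red/white type column)
  num <- wine[, vapply(wine, is.numeric, logical(1)), drop = FALSE]

  # Pearson correlations among all numeric variables (input for a heatmap)
  cor_mat <- cor(num, use = "pairwise.complete.obs")

  # PCA on the standardized attributes, excluding the quality score itself
  features <- num[, setdiff(names(num), "quality"), drop = FALSE]
  pca <- prcomp(features, center = TRUE, scale. = TRUE)

  # Proportion of total variance explained by each component (scree values)
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)

  list(cor = cor_mat, pca = pca, var_explained = var_explained)
}
```

The returned pieces map onto the plots: res$cor feeds a heatmap, plot(res$var_explained, type = "b") gives a basic scree plot, and biplot(res$pca) draws a base R biplot of the first two components.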


Prerequisites

R Packages

The script checks for each package it needs and installs any that are missing, so no manual installation is required.
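
Automatic installation of this kind usually follows an "install if missing, then load" pattern. A minimal sketch, assuming a helper named ensure_packages() (hypothetical) and the CRAN cloud mirror as the default repository; it does not reproduce the script's actual package list.

```r
# Sketch of the "install if missing, then load" pattern.
# ensure_packages() is a hypothetical helper, not part of the script.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() quietly returns FALSE for packages that are not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]

  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }

  # Attach every package so its functions can be called without a prefix
  for (p in pkgs) library(p, character.only = TRUE)

  # Return the names of packages that had to be installed
  invisible(to_install)
}
```

For this project a call might look like ensure_packages(c("readr", "ggplot2")), where readr is implied by read_delim() and ggplot2 is only an assumption about the plotting package.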


Installation

  1. Clone or download this repository:

    git clone https://github.com/yourusername/wine-quality-analysis.git
    cd wine-quality-analysis
    
  2. Ensure R (≥4.0) is on your PATH.

No additional build steps are required.


Usage

Open an R console or RStudio in the project directory and run:

source("Wine_Quality_Analysis.R")

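The script runs each stage listed under Script Breakdown in order, printing the regression summary to the console and generating the plots.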


Script Breakdown

  1. Setup

    • Defines package list, installs missing ones
    • Loads libraries
  2. Data Loading

    • Uses read_delim() with locale(decimal_mark=".")
    • Applies a col_types specification so that all measurement columns are parsed as doubles
  3. Data Wrangling

    • Adds a type factor (red/white)
    • Converts quality to an ordered factor
  4. Exploratory Plots

    • p1: Bar chart of quality counts
    • p2: Jitter scatter of alcohol vs. quality
    • p3: Faceted boxplots for pH, residual sugar, citric acid, and sulphates
  5. Correlation Analysis

    • Correlation matrix heatmap of all numeric variables
  6. PCA

    • Scree plot of variance explained
    • Biplot of the first two principal components
  7. Regression

    • Linear model: quality ~ alcohol + sulphates + pH + type
    • Summary printed to console
    • Residuals vs. fitted plot
    • Fitted vs. actual scatter
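
Steps 2, 3 and 7 can be sketched as three small functions. This is an illustrative reconstruction under assumptions, not the script's own code: the names read_wine(), prepare_wine(), fit_quality_model() and the quality_ord column are hypothetical, the semicolon delimiter reflects the format of the UCI files, and quality is kept numeric for lm() while a separate ordered-factor copy serves the plots.

```r
library(readr)

# Read one semicolon-delimited Wine Quality file, parsing every column as double,
# and tag each row with its wine type ("red" or "white").
read_wine <- function(path, type) {
  df <- read_delim(
    path,
    delim = ";",
    locale = locale(decimal_mark = "."),
    col_types = cols(.default = col_double())
  )
  df <- as.data.frame(df)
  df$type <- type
  df
}

# Combine both wine types, add the type factor, and add an ordered-factor copy
# of quality for plotting (numeric quality is kept for the regression).
prepare_wine <- function(red, white) {
  wine <- rbind(red, white)
  wine$type <- factor(wine$type, levels = c("red", "white"))
  wine$quality_ord <- factor(wine$quality, ordered = TRUE)
  wine
}

# Linear model of numeric quality on the predictors listed in step 7.
fit_quality_model <- function(wine) {
  lm(quality ~ alcohol + sulphates + pH + type, data = wine)
}

# Example usage (needs network access; the URL is the long-standing UCI path
# for these files and may change):
# base <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"
# wine <- prepare_wine(read_wine(paste0(base, "winequality-red.csv"), "red"),
#                      read_wine(paste0(base, "winequality-white.csv"), "white"))
# fit <- fit_quality_model(wine)
# summary(fit)
# plot(fitted(fit), residuals(fit)); abline(h = 0, lty = 2)  # residuals vs. fitted
# plot(wine$quality, fitted(fit))                             # fitted vs. actual
```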

Extending & Customizing


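Data Sources & Citations

The red and white wine data are the Wine Quality datasets from the UCI Machine Learning Repository.
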
License

Distributed under the MIT License. See LICENSE for details.