This task involves designing and populating a relational database on the global COVID-19 vaccination campaign to enable efficient insights and analysis. The database includes tables for countries, states, vaccination data, and vaccine manufacturers, focusing on metrics such as the number of people vaccinated, vaccine types, and age groups.
Designed the database schema through E-R modelling, schema decomposition, and normalization (3NF) to optimize structure for query efficiency.
Cleaned and processed heterogeneous CSV datasets to populate the database according to the defined schema.
Developed advanced SQL queries (CTEs, window functions, date transformations) to extract insights on vaccination trends, coverage, and manufacturer distribution, and presented results through data visualizations for comparative analysis.
ERD
Data for this task was provided in multiple CSV files from Our World in Data.
ERD (Entity-Relationship Diagram) of the original data is provided below:
Database Schema
The database schema is fully normalized and defined in SQLite syntax (adaptable to other SQL databases). Data from CSV files was cleaned and transformed using R tidyverse, and the final Vaccinations.db file is available for download.
CREATETABLE country ( iso_code TEXT NOTNULL, country TEXT, population INTEGER, source_name TEXT,PRIMARYKEY ( iso_code ));CREATETABLE state ( state TEXT NOTNULL, iso_code TEXT,PRIMARYKEY ( state ),FOREIGNKEY ( iso_code )REFERENCES country (iso_code) );CREATETABLE country_lastest_source ( iso_code TEXT NOTNULL, last_observation_date TEXT, source_website,PRIMARYKEY ( iso_code, last_observation_date ),FOREIGNKEY ( iso_code )REFERENCES country (iso_code) );CREATETABLE vaccination_country ( iso_code TEXT NOTNULL,date TEXT NOTNULL, source_url TEXT, total_vaccinations INTEGER, people_vaccinated INTEGER, people_fully_vaccinated INTEGER, total_boosters INTEGER, daily_vaccinations_raw INTEGER, daily_vaccinations INTEGER, daily_people_vaccinated INTEGER,PRIMARYKEY ( iso_code,date ),FOREIGNKEY ( iso_code )REFERENCES country (iso_code) );CREATETABLE vaccination_manufacturer ( manufacturer TEXT NOTNULL, iso_code TEXT NOTNULL,date TEXT NOTNULL, total_vaccinations INTEGER,PRIMARYKEY ( manufacturer, iso_code,date ),FOREIGNKEY ( iso_code,date )REFERENCES vaccination_country (iso_code,date) );CREATETABLE vaccination_age ( age_group TEXT NOTNULL, iso_code TEXT NOTNULL,date TEXT NOTNULL, people_vaccinated_per_hundred FLOAT, people_fully_vaccinated_per_hundred FLOAT, people_with_booster_per_hundred FLOAT,PRIMARYKEY ( age_group, iso_code,date ),FOREIGNKEY ( iso_code,date )REFERENCES vaccination_country (iso_code,date) );CREATETABLE vaccination_state ( state TEXT NOTNULL,date TEXT NOTNULL, total_vaccinations INTEGER, people_vaccinated INTEGER, people_fully_vaccinated INTEGER, total_boosters INTEGER, daily_vaccinations_raw INTEGER, daily_vaccinations INTEGER, total_vaccinations_per_hundred FLOAT, people_vaccinated_per_hundred FLOAT, people_fully_vaccinated_per_hundred FLOAT, total_boosters_per_hundred FLOAT, daily_vaccinations_per_million FLOAT, share_doses_used FLOAT, total_distributed FLOAT,PRIMARYKEY ( state,date ) FOREIGNKEY (state) REFERENCES state (state));
Insights
Query 1: Vaccination Progress among Countries
The following query provides the running total of the number of people who have taken the vaccine so far for each observation day recorded in each countries around the world.
Show code
WITH quest1_view AS (SELECT vc.iso_code, c.country, vc.date,ROW_NUMBER() OVER (PARTITIONBY vc.iso_code ORDERBY vc.date) -1AS dateNo, vc.people_vaccinatedFROM vaccination_country vcJOIN country c ON vc.iso_code = c.iso_code)SELECT country, dateNo AS DayNO, people_vaccinated AS [Total Injected People]FROM quest1_viewWHERE people_vaccinated !="NA"AND people_vaccinated !=0;
Query 2: Covid-19 Vaccines Popularity
The Covid-19 vaccination campaign has seen the use of various vaccines across different countries. The following query provides insights into the adoption of these vaccines globally, showing the number of countries that have administered each type of vaccine.
Show code
SELECT c.country AS [Country/Region],MAX(vc.total_vaccinations) AS [Cummulative Doses]FROM vaccination_country vcJOIN country c ON vc.iso_code = c.iso_codeWHERE vc.total_vaccinations !="NA"GROUPBY vc.iso_codeORDERBYMAX(vc.total_vaccinations) DESC;