In this project, the goal is to analyze data on COVID-19 deaths and vaccinations downloaded from https://ourworldindata.org.
The repository contains three files: two with the data used and one with all the queries created. The queries cover:
- Total Cases vs Total Deaths in a certain country
- Total Cases vs Population of a certain country
- Countries with Highest Infection Rate compared to Population
- Countries with Highest Death Count per Population
- Number of people vaccinated over time in a given location
- Percentage of Population that has received at least one Covid Vaccine
The data comes from Our World in Data, which publishes confirmed COVID-19 deaths and vaccinations.
The data is stored in Microsoft SQL Server 2022 and was queried with SQL Server Management Studio (SSMS).
First, a quick look at both tables (ORDER BY 3,4 sorts by the third and fourth columns):
SELECT *
FROM CovidDeaths
ORDER BY 3,4
SELECT *
FROM CovidVaccinations
ORDER BY 3,4
Next, the columns used in the rest of the analysis:
SELECT location, date, total_cases, new_cases, total_deaths, population
FROM CovidDeaths
ORDER BY date
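The same columns for Portugal only, first skipping days without new cases and then days without recorded deaths:
SELECT location, date, total_cases, new_cases, total_deaths, population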
FROM CovidDeaths
WHERE location = 'Portugal' AND new_cases IS NOT NULL
ORDER BY date
SELECT location, date, total_cases, new_cases, total_deaths, population
FROM CovidDeaths
WHERE location = 'Portugal' AND total_deaths IS NOT NULL
ORDER BY date
1 - Total Cases vs Total Deaths in a certain country
First, how large are total deaths relative to total cases in Portugal? DeathPercentage expresses deaths as a percentage of confirmed cases.
SELECT location, date, total_cases, total_deaths, (total_deaths/total_cases)*100 AS DeathPercentage
FROM CovidDeaths
WHERE location LIKE '%Portugal%' AND total_deaths IS NOT NULL
ORDER BY DeathPercentage DESC
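In T-SQL, dividing an integer by an integer performs integer division. If total_deaths and total_cases were imported as integer columns, the DeathPercentage above would come out as 0 whenever deaths are fewer than cases. The project does not state the column types, so this is an assumption worth checking. The minimal sketch below uses a hypothetical DemoDeaths table with made-up numbers. It casts to FLOAT before dividing, and it wraps the divisor in NULLIF so that a day with zero cases returns NULL instead of raising a divide-by-zero error in SQL Server.

```sql
-- Hypothetical demo table with made-up values; on the real data, query CovidDeaths instead.
DROP TABLE IF EXISTS DemoDeaths;
CREATE TABLE DemoDeaths (
    location     VARCHAR(50),
    date         DATE,
    total_cases  INT,
    total_deaths INT
);
INSERT INTO DemoDeaths (location, date, total_cases, total_deaths) VALUES
    ('CountryA', '2020-01-01', 0, 0),
    ('CountryA', '2021-01-01', 1000, 20),
    ('CountryA', '2021-01-02', 1200, 30);

SELECT location,
       date,
       total_cases,
       total_deaths,
       -- Integer division: 20 / 1000 truncates to 0 before the * 100
       total_deaths / NULLIF(total_cases, 0) * 100 AS TruncatedPercentage,
       -- Cast first, then divide; NULLIF turns a zero divisor into NULL
       CAST(total_deaths AS FLOAT) * 100 / NULLIF(total_cases, 0) AS DeathPercentage
FROM DemoDeaths
ORDER BY DeathPercentage DESC;
```

On the real data, the same expression could replace (total_deaths/total_cases)*100 in query 1. If the columns are already float, the extra CAST is harmless.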
2 - Total Cases vs Population of Portugal
PercentPopInfected expresses confirmed cases as a percentage of Portugal's population.
SELECT location, date, total_cases, population, (total_cases/population)*100 AS PercentPopInfected
FROM CovidDeaths
WHERE location LIKE '%Portugal%' AND total_cases IS NOT NULL
ORDER BY PercentPopInfected DESC
3 - Countries with Highest Infection Rate compared to Population
SELECT location, population, MAX(total_cases) AS HighestInfectionCount, MAX((total_cases/population)*100) AS PercentPopInfected
FROM CovidDeaths
GROUP BY location, population
ORDER BY PercentPopInfected DESC
The results show that the country with the highest percentage of population infected is Cyprus, with 77%.
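4 - Countries with Highest Death Count per Population
This query ranks countries by their highest recorded total_deaths. That is an absolute count, not a rate per population. The WHERE continent IS NOT NULL filter drops aggregate rows such as World and the continents, which the Our World in Data files leave without a continent value.
SELECT location, MAX(total_deaths) AS TotalDeathCount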
FROM CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY TotalDeathCount DESC
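Ranking by absolute deaths naturally puts populous countries first. A per-population ranking instead divides each country's highest total_deaths by its population. The sketch below is a hypothetical variant, not a query from the project. It runs on a made-up DemoCountries table with two fictional countries plus a World aggregate row whose continent is NULL, and it expresses the rate as deaths per 100,000 people.

```sql
-- Hypothetical demo data; on the real data, replace DemoCountries with CovidDeaths.
DROP TABLE IF EXISTS DemoCountries;
CREATE TABLE DemoCountries (
    continent    VARCHAR(50),
    location     VARCHAR(50),
    date         DATE,
    population   BIGINT,
    total_deaths INT
);
INSERT INTO DemoCountries (continent, location, date, population, total_deaths) VALUES
    ('Europe', 'SmallCountry', '2021-01-01', 1000000, 400),
    ('Europe', 'SmallCountry', '2021-01-02', 1000000, 500),
    ('Asia',   'LargeCountry', '2021-01-01', 10000000, 1500),
    ('Asia',   'LargeCountry', '2021-01-02', 10000000, 2000),
    (NULL,     'World',        '2021-01-02', 11000000, 2500);

-- Deaths per 100,000 people; 100000.0 forces a non-integer division,
-- and continent IS NOT NULL drops the aggregate World row.
SELECT location,
       population,
       MAX(total_deaths) AS TotalDeathCount,
       MAX(total_deaths) * 100000.0 / population AS DeathsPer100k
FROM DemoCountries
WHERE continent IS NOT NULL
GROUP BY location, population
ORDER BY DeathsPer100k DESC;
```

Sorted this way, SmallCountry (50 deaths per 100,000) ranks above LargeCountry (20 per 100,000). This is the reverse of the ranking by TotalDeathCount.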
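5 - Number of people vaccinated over time in a given location
RollingPeopleVaccinated is a running total of new_vaccinations for each location. The window function SUM(...) OVER (PARTITION BY ... ORDER BY ...) restarts for every location and accumulates the values in date order.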
SELECT dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(vac.new_vaccinations) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM CovidDeaths dea JOIN CovidVaccinations vac
ON dea.location = vac.location AND dea.date = vac.date
WHERE dea.continent IS NOT NULL
ORDER BY location, date
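Two details of the running total are easy to miss. First, SUM ignores NULLs. A day with no reported new_vaccinations therefore repeats the previous total, and a location's total stays NULL until its first non-NULL value.

Second, in SQL Server, SUM over an int column returns int. If new_vaccinations was imported as int (an assumption, since the import types are not stated), a total above 2,147,483,647 raises an arithmetic overflow error. Casting to BIGINT avoids that.

The sketch below shows both points on a hypothetical DemoVaccinations table with made-up figures. Because each partition holds a single location, ordering by date alone is enough.

```sql
-- Hypothetical demo data; the project runs this over CovidDeaths JOIN CovidVaccinations.
DROP TABLE IF EXISTS DemoVaccinations;
CREATE TABLE DemoVaccinations (
    location         VARCHAR(50),
    date             DATE,
    new_vaccinations INT
);
INSERT INTO DemoVaccinations (location, date, new_vaccinations) VALUES
    ('CountryA', '2021-01-01', 100),
    ('CountryA', '2021-01-02', NULL),
    ('CountryA', '2021-01-03', 50),
    ('CountryB', '2021-01-01', NULL),
    ('CountryB', '2021-01-02', 20),
    ('CountryB', '2021-01-03', 30);

-- Running total per location: NULL days keep the previous total,
-- and the total stays NULL until the first non-NULL value.
-- CAST to BIGINT so a large total cannot overflow int.
SELECT location,
       date,
       new_vaccinations,
       SUM(CAST(new_vaccinations AS BIGINT))
           OVER (PARTITION BY location ORDER BY date) AS RollingPeopleVaccinated
FROM DemoVaccinations
ORDER BY location, date;
```

For the made-up rows, CountryA's total goes 100, 100, 150, and CountryB's goes NULL, 20, 50.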
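6 - Percentage of Population that has received at least one Covid Vaccine
The query from step 5 is wrapped in a common table expression (CTE) named PopVac so that the running total can be divided by the population. Note that new_vaccinations counts vaccine doses administered. Once second doses and boosters begin, the running total overstates the number of people vaccinated, and PercPopVac can exceed 100%.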
WITH PopVac (Continent, Location, Date, Population, New_Vaccinations, RollingPeopleVaccinated)
AS
(
SELECT dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(vac.new_vaccinations) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.Date) AS RollingPeopleVaccinated
FROM CovidDeaths dea JOIN CovidVaccinations vac
ON dea.location = vac.location AND dea.date = vac.date
)
SELECT *, (RollingPeopleVaccinated/Population)*100 AS PercPopVac
FROM PopVac
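The CTE returns one row per location per day. To keep only the most recent figure for each location, a hypothetical extension can number each location's rows from newest to oldest with ROW_NUMBER() and keep row 1. The sketch below runs on a made-up DemoPopVac table. CountryA's invented figures are chosen to show that PercPopVac can pass 100% when doses are counted.

If the imported CovidVaccinations table includes the Our World in Data people_vaccinated column (people with at least one dose), it may match this question more closely. That column is not used in the queries above.

```sql
-- Hypothetical demo data with made-up figures.
DROP TABLE IF EXISTS DemoPopVac;
CREATE TABLE DemoPopVac (
    location         VARCHAR(50),
    date             DATE,
    population       BIGINT,
    new_vaccinations INT
);
INSERT INTO DemoPopVac (location, date, population, new_vaccinations) VALUES
    ('CountryA', '2021-01-01', 1000, 300),
    ('CountryA', '2021-02-01', 1000, 400),
    ('CountryA', '2021-03-01', 1000, 500),
    ('CountryB', '2021-01-01', 2000, 500),
    ('CountryB', '2021-02-01', 2000, NULL);

WITH PopVac AS (
    -- Running total of doses per location (cast to BIGINT to avoid int overflow)
    SELECT location, date, population,
           SUM(CAST(new_vaccinations AS BIGINT))
               OVER (PARTITION BY location ORDER BY date) AS RollingPeopleVaccinated
    FROM DemoPopVac
),
Latest AS (
    -- rn = 1 marks each location's most recent row
    SELECT location, date, population, RollingPeopleVaccinated,
           ROW_NUMBER() OVER (PARTITION BY location ORDER BY date DESC) AS rn
    FROM PopVac
)
SELECT location,
       date,
       RollingPeopleVaccinated,
       RollingPeopleVaccinated * 100.0 / population AS PercPopVac
FROM Latest
WHERE rn = 1
ORDER BY PercPopVac DESC;
```

With the made-up rows, CountryA ends at 120% (1,200 doses for a population of 1,000) and CountryB ends at 25%.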
A temporary table is useful when the results of a query need to be reused in other queries.
DROP TABLE IF EXISTS #PercentPopulationVaccinated
CREATE TABLE #PercentPopulationVaccinated
(
Continent nvarchar(255),
Location nvarchar(255),
Date datetime,
Population numeric,
New_vaccinations numeric,
RollingPeopleVaccinated numeric
)
Then I insert the results of the previous query into the table.
INSERT INTO #PercentPopulationVaccinated
SELECT dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(vac.new_vaccinations) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.Date) AS RollingPeopleVaccinated
FROM CovidDeaths dea JOIN CovidVaccinations vac
ON dea.location = vac.location AND dea.date = vac.date
Finally, I can use the new table in a new query.
SELECT *, (RollingPeopleVaccinated/Population)*100 AS PercPopVac
FROM #PercentPopulationVaccinated
I created a view with the code below so that the rolling totals can be queried like a table. In SSMS, CREATE VIEW must be the first statement in its batch, hence the GO separators.
GO
CREATE VIEW PercentPopVac AS
SELECT dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(vac.new_vaccinations) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.Date) AS RollingPeopleVaccinated
FROM CovidDeaths dea JOIN CovidVaccinations vac
ON dea.location = vac.location AND dea.date = vac.date
GO
Then I can create queries with it.
SELECT *
FROM PercentPopVac
I used the following SQL statements to create indexes on the location and date columns of CovidDeaths:
CREATE INDEX index1 ON CovidDeaths (location)
CREATE INDEX index2 ON CovidDeaths (date)
To create the multi-column index, I used the following SQL statement:
CREATE INDEX index3 ON CovidDeaths (location, date)
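Partitioning the CovidDeaths data by date
The script below does three things. It creates a partition function with boundaries at 2020-06-01 and 2020-07-01, creates a partition scheme that maps every partition to the PRIMARY filegroup, and creates a table stored on that scheme.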
USE CovidDeaths -- USE takes a database name; this assumes the database is also named CovidDeaths
GO
--- create partition function
CREATE PARTITION FUNCTION CovidDeaths_Partition (datetime2(0))
AS RANGE RIGHT FOR VALUES ('2020-06-01', '2020-07-01') ;
GO
--- create scheme
CREATE PARTITION SCHEME CovidDeaths_Scheme
AS PARTITION CovidDeaths_Partition
ALL TO ('PRIMARY') ;
GO
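--- create table (date and location together form the key, since each date appears once per location)
CREATE TABLE dbo.PartitionTable (
date datetime2(0) NOT NULL,
location varchar(255) NOT NULL,
new_deaths int,
CONSTRAINT PK_PartitionTable PRIMARY KEY (date, location))
ON CovidDeaths_Scheme (date) ;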
GO
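With RANGE RIGHT, each boundary value belongs to the partition on its right. The two boundaries above therefore produce three partitions:

- dates before 2020-06-01
- dates in June 2020
- dates from 2020-07-01 onward

The portable sketch below mimics that assignment with a CASE expression on a few hypothetical dates. DemoPartitionDates is made up for illustration, and the sketch does not call the partition function itself.

```sql
-- Hypothetical sample dates around the two partition boundaries.
DROP TABLE IF EXISTS DemoPartitionDates;
CREATE TABLE DemoPartitionDates (date DATE);
INSERT INTO DemoPartitionDates (date) VALUES
    ('2020-05-31'), ('2020-06-01'), ('2020-06-30'), ('2020-07-01'), ('2021-01-15');

-- RANGE RIGHT: a boundary value starts the next partition,
-- so 2020-06-01 lands in partition 2 and 2020-07-01 in partition 3.
SELECT date,
       CASE WHEN date < '2020-06-01' THEN 1
            WHEN date < '2020-07-01' THEN 2
            ELSE 3
       END AS PartitionNumber
FROM DemoPartitionDates
ORDER BY date;
```

On SQL Server itself, the built-in $PARTITION function returns the real partition number. For example, SELECT $PARTITION.CovidDeaths_Partition(date) AS PartitionNumber, COUNT(*) AS RowsInPartition FROM dbo.PartitionTable GROUP BY $PARTITION.CovidDeaths_Partition(date) counts the rows stored in each partition.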
© 2024 Victor Malheiro