commit 2bbc034 (1 parent: 7fd5d9a)
Showing 5 changed files with 136 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
/* | ||
# 1. Row count check | ||
Count the total number of records (or rows) in the SQL view | ||
*/ | ||
|
||
SELECT | ||
COUNT(*) AS no_of_rows | ||
FROM | ||
view_uk_youtubers_2024; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
/* | ||
# 2. Column count check | ||
Count the total number of columns (or fields) in the SQL view | ||
*/ | ||
|
||
|
||
SELECT | ||
COUNT(*) AS column_count | ||
FROM | ||
INFORMATION_SCHEMA.COLUMNS | ||
WHERE | ||
TABLE_NAME = 'view_uk_youtubers_2024'; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
/* | ||
# 3. Data type check | ||
Check the data type of each column in the view by querying the INFORMATION_SCHEMA.COLUMNS view | ||
*/ | ||
|
||
|
||
SELECT | ||
COLUMN_NAME, | ||
DATA_TYPE | ||
FROM | ||
INFORMATION_SCHEMA.COLUMNS | ||
WHERE | ||
TABLE_NAME = 'view_uk_youtubers_2024'; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
/* | ||
# 4. Duplicate records check | ||
-- 1. Check for duplicate rows in the view | ||
-- 2. Group by the channel name | ||
-- 3. Filter for groups with more than one row | ||
*/ | ||
|
||
|
||
-- 1. | ||
SELECT | ||
channel_name, | ||
COUNT(*) AS duplicate_count | ||
FROM | ||
view_uk_youtubers_2024 | ||
|
||
-- 2. | ||
GROUP BY | ||
channel_name | ||
|
||
-- 3. | ||
HAVING | ||
COUNT(*) > 1; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
/* | ||
# Data quality tests | ||
1. The data must contain 100 records of YouTube channels (row count test) --- (passed!!!) | ||
2. The data needs 4 fields (column count test) --- (passed!!!) | ||
3. The channel name column must be a string data type, and the other columns must be numeric data types (data type check) --- (passed!!!) | ||
4. Each record must be unique in the dataset (duplicate count check) --- (passed!!!) | ||
Row count - 100 | ||
Column count - 4 | ||
Data types | ||
channel_name = VARCHAR | ||
total_subscribers = INTEGER | ||
total_views = INTEGER | ||
total_videos = INTEGER | ||
Duplicate count = 0 | ||
*/ | ||
|
||
|
||
-- 1. Row count check | ||
|
||
SELECT | ||
COUNT(*) AS no_of_rows | ||
FROM | ||
view_uk_youtubers_2024; | ||
|
||
|
||
-- 2. Column count check | ||
|
||
SELECT | ||
COUNT(*) AS column_count | ||
FROM | ||
INFORMATION_SCHEMA.COLUMNS | ||
WHERE | ||
TABLE_NAME = 'view_uk_youtubers_2024'; | ||
|
||
|
||
|
||
-- 3. Data type check | ||
|
||
|
||
SELECT | ||
COLUMN_NAME, | ||
DATA_TYPE | ||
FROM | ||
INFORMATION_SCHEMA.COLUMNS | ||
WHERE | ||
TABLE_NAME = 'view_uk_youtubers_2024'; | ||
|
||
|
||
|
||
-- 4. Duplicate records check | ||
|
||
SELECT | ||
channel_name, | ||
COUNT(*) AS duplicate_count | ||
FROM | ||
view_uk_youtubers_2024 | ||
GROUP BY | ||
channel_name | ||
HAVING | ||
COUNT(*) > 1; |
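
The four checks in this commit could also be run together as one automated test. The sketch below is only an illustration, not part of the commit: it assumes an in-memory SQLite database as a stand-in for the real server, a hypothetical base table `youtubers` with three made-up rows instead of the real 100, and SQLite's `PRAGMA table_info` in place of `INFORMATION_SCHEMA.COLUMNS`, which SQLite does not provide. The helper names `build_demo_db` and `run_checks` are hypothetical.

```python
import sqlite3

# Hypothetical stand-in data: three made-up channels, not the real 100 records.
ROWS = [
    ("channel_a", 1000, 50000, 120),
    ("channel_b", 2000, 80000, 300),
    ("channel_c", 1500, 60000, 210),
]


def build_demo_db():
    """Create an in-memory SQLite database exposing view_uk_youtubers_2024."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE youtubers ("
        "channel_name VARCHAR, total_subscribers INTEGER, "
        "total_views INTEGER, total_videos INTEGER)"
    )
    conn.executemany("INSERT INTO youtubers VALUES (?, ?, ?, ?)", ROWS)
    conn.execute("CREATE VIEW view_uk_youtubers_2024 AS SELECT * FROM youtubers")
    return conn


def run_checks(conn, view="view_uk_youtubers_2024"):
    """Run the four data quality checks and return their results as a dict."""
    # 1. Row count check
    row_count = conn.execute(f"SELECT COUNT(*) FROM {view}").fetchone()[0]

    # 2 and 3. Column count and data type checks.
    # Assumption: PRAGMA table_info replaces INFORMATION_SCHEMA.COLUMNS here.
    # Each returned row is (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({view})").fetchall()
    data_types = {row[1]: row[2] for row in info}

    # 4. Duplicate records check, same query shape as in the commit
    duplicates = conn.execute(
        f"SELECT channel_name, COUNT(*) AS duplicate_count "
        f"FROM {view} GROUP BY channel_name HAVING COUNT(*) > 1"
    ).fetchall()

    return {
        "row_count": row_count,
        "column_count": len(data_types),
        "data_types": data_types,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    results = run_checks(build_demo_db())
    for name, value in results.items():
        print(name, value)
```

Against the real view, the expected values would be the ones recorded in the comment block above (100 rows, 4 columns, 0 duplicates); the demo data here only exercises the query logic. Note that the duplicate check groups by `channel_name` alone, so it treats two rows with the same channel name as duplicates even if their other fields differ.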