Project Description
Consider an online video-sharing platform like YouTube which hosts tens of thousands of channels and crores of users.
You have to analyse the data and provide meaningful insights on the type of content that drives engagement, users growth, and many more to all the stakeholders. Let’s roll our sleeves up for an insightful analysis!
Database
The sample database consists of tables that store the information of users, channels, videos, genres and likes/dislikes.
Note:
channel_user table
channel_id | user_id | subscribed_datetime |
---|---|---|
100 | 1 | 2020-12-10 10:30:45 |
100 | 7 | 2020-10-10 11:30:45 |
... | ... | ... |
First row in the table represents that the user with user_id = 1 is subscribed to the channel with channel_id = 100 at
user_likes table
user_id | video_id | reaction_type | reacted_at |
---|---|---|---|
1 | 10 | LIKE | 2020-12-10 10:30:45 |
7 | 10 | DISLIKE | 2020-10-10 11:30:45 |
... | ... | ... | ... |
Similarly,
video_genre table
video_id | genre_id |
---|---|
10 | 201 |
10 | 202 |
... | ... |
Similarly,
Let’s dive in to analyze the in and outs of each part of the data. Here we go!
1. Get all the videos with more than 1 lakh views.
Note:
Output must be in the alphabetical order of video
Expected Output Format
video_id | name | duration_in_secs | published_datetime | no_of_views | channel_id |
---|---|---|---|---|---|
... | ... | ... | ... | ... |
2. Get videos from TEDx channel (id=353) with more than 50 thousand views.
Note:
Sort the output in the descending order of
Expected Output Format
video_id | name | duration_in_secs | no_of_views |
---|---|---|---|
... | ... | ... | ... |
3. Get the top 10 most viewed videos till date.
Note:
Sort the output in the descending order of
Expected Output Format
name | channel_id | no_of_views |
---|---|---|
... | ... | ... |
4. Get all the recent movie trailers that have more than 1 lakh views.
Note:
- Consider the videos that have "trailer" in theirnameas trailers.
- Sort the output in the descending order ofno_of_viewsandpublished_datetime
Expected Output Format
name | channel_id | no_of_views | published_datetime |
---|---|---|---|
... | ... | ... | ... |
5. Get all the videos that are released in the year 2018.
Note:
Sort the output in the descending order of
Expected Output Format
video_id | name | duration_in_secs | no_of_views |
---|---|---|---|
... | ... | ... | ... |
6. Get the distinct ids of videos that belong to the following genres.
genre_id | genre |
---|---|
201 | Comedy |
202 | Action |
203 | Thriller |
211 | Scifi |
Note:
Sort the output in the descending order of
Expected Output Format
video_id |
---|
... |
7. Get all the esport videos that crossed one lakh views and were released between 2018 and 2020.
Note:
- Consider the videos that have "esport" in theirnameas gaming videos.
- Sort the output in the descending order ofno_of_viewsandpublished_datetime
Expected Output Format
name | published_datetime | no_of_views |
---|---|---|
... | ... | ... |
8. Get the total number of channels in the database as
Expected Output Format
channels_count |
---|
... |
9. Get the highest and least number of views for the videos in the database.
Expected Output Format
highest_number_of_views | least_number_of_views |
---|---|
... | ... |
10. Get the average number of views for the videos released by the "Single Shot" Channel (id = 373)
Expected Output Format
avg_views |
---|
... |
11. Get the total number of premium users in the platform as
Expected Output Format
premium_users_count |
---|
... |
12. Get the number of male and female premium users in the platform.
Expected Output Format
gender | total_users |
---|---|
F | ... |
M | ... |
..................