Project Cont'd
Consider an online video-sharing platform like YouTube which hosts tens of thousands of channels and crores of users.
You have to analyse the data and provide meaningful insights on the type of content that drives engagement, users growth, and many more to all the stakeholders. Let’s roll our sleeves up for an insightful analysis!
Database
The sample database consists of tables that store the information of users, channels, videos, genres and likes/dislikes.
Note:
channel_user table
channel_id | user_id | subscribed_datetime |
---|---|---|
100 | 1 | 2020-12-10 10:30:45 |
100 | 7 | 2020-10-10 11:30:45 |
... | ... | ... |
table stores the data of the channel_ids and their subscribers' user_ids.
First row in the table represents that the user with user_id = 1 is subscribed to the channel with channel_id = 100 at
user_likes table
user_id | video_id | reaction_type | reacted_at |
---|---|---|---|
1 | 10 | LIKE | 2020-12-10 10:30:45 |
7 | 10 | DISLIKE | 2020-10-10 11:30:45 |
... | ... | ... | ... |
Similarly,
table stores the data of video_id and the user_ids who reacted to the video.
video_genre table
video_id | genre_id |
---|---|
10 | 201 |
10 | 202 |
... | ... |
Similarly,
table stores the data of video_id and the ids of the genres that the corresponding video belongs to.
Let’s dive in to analyze the in and outs of each part of the data. Here we go!
1.
Fetch the top 10 videos having more number of views.
Note:
Sort the output in the descending order of no_of_views, and then in the ascending order of name.
Expected Output Format:
name | no_of_views |
---|---|
... | ... |
2.
Fetch the top 10 videos having more number of views, along with the channel details.
Note:
Sort the output in the descending order of no_of_views, and then in the ascending order of channel_name.
Expected Output Format:
video_name | no_of_views | channel_name |
---|---|---|
... | ... | ... |
3.
Get all the music videos released before the year 2016.
Note:
- You can consider the videos which contain "music" in name as music videos.
- Get the year in the integer format
- Sort the output in the descending order of year, and then in the ascending order of name.
Expected Output Format:
name | no_of_views | year |
---|---|---|
... | ... | ... |
4.
Get all distinct channels which published music videos before 2016.
Note:
- You can consider the videos which contain "music" in name as music videos.
- Sort the output in the ascending order of chanel_name.
Expected Output Format:
channel_id | channel_name |
---|---|
... | ... |
5.
Get all the review videos, i.e., videos which contain "review" in their name.
Note:
- Sort the output in the descending order of no_of_views.
Expected Output Format:
name | no_of_views |
---|---|
... | ... |
6.
Get all the unique channels that published review videos.
Note:
- You can consider the videos which contain "review" in name as review videos.
- Sort the output in the ascending order of channel_name.
Expected Output Format:
channel_id | channel_name |
---|---|
... | ... |
7.
Get all the videos that belong to "Action" genre (genre_id = 201) and have more than 1 lakh views.
Note:
- Sort the output in the ascending order of video_id.
Expected Output Format:
video_id | name | genre_id |
---|---|---|
... | ... | ... |
8.
Get all the Indian users details whose age is below 30 years and liked the video (video_id = 1011) in the year 2020.
Note:
- Consider the name of the country as "INDIA"
- Consider reaction_type LIKE as liked.
- Sort the output in the ascending order of name.
Expected Output Format:
name | gender | age | country | premium_membership |
---|---|---|---|---|
... | ... | ... | ... |
9.
Find the number of videos published between the years 2010 & 2016.
Note:
- Sort the output in the ascending order of the year
- Keep the year in the integer format
Expected Output Format:
year | no_of_videos |
---|---|
... | ... |
10.
Between 2010 & 2020, find the number of videos released in each of the below genres.
Note:
- genre_ids : 201, 202, 204, 205, 206, 207
- Sort the output in the descending order of no_of_videos, and then in the ascending order of genre_id.
Expected Output Format:
genre_id | no_of_videos |
---|---|
... | ... |
11.
Get the total number of channels in the database.
Expected Output Format
no_of_channels |
---|
... |
12.
Get the total number of users subscribed for the channel "Tyler Oakley" (channel_id = 376) in the year 2018.
Expected Output Format
no_of_subscribers |
---|
... |
13.
Get the total number of countries where the users of the platform are present.
Expected Output Format
country_count |
---|
... |
14.
Get the total number of countries where the subscribers of the Taylor Swift channel (channel_id = 399) are present.
Expected Output Format
country_count |
---|
... |
15.
Insights about users:
Get the geographic distribution of Taylor Swift channel (channel_id = 399) subscribers.
Note:
- Geographic distribution: Number of Taylor Swift subscribers present in each country. Ignore the countries whereno_of_subscribersis 0.
- Order the result in the ascending order of the country name.
Expected Output Format
country | no_of_subscribers |
---|---|
... | ... |
00000000