Project Cont'd
Consider an online video-sharing platform like YouTube which hosts tens of thousands of channels and crores of users.
You have to analyse the data and provide meaningful insights on the type of content that drives engagement, users growth, and many more to all the stakeholders. Let’s roll our sleeves up for an insightful analysis!
Database
The sample database consists of tables that store the information of users, channels, videos, genres and likes/dislikes.
Note:
channel_user table
channel_id | user_id | subscribed_datetime |
---|---|---|
100 | 1 | 2020-12-10 10:30:45 |
100 | 7 | 2020-10-10 11:30:45 |
... | ... | ... |
table stores the data of the channel_ids and their subscribers' user_ids.
First row in the table represents that the user with user_id = 1 is subscribed to the channel with channel_id = 100 at
user_likes table
user_id | video_id | reaction_type | reacted_at |
---|---|---|---|
1 | 10 | LIKE | 2020-12-10 10:30:45 |
7 | 10 | DISLIKE | 2020-10-10 11:30:45 |
... | ... | ... | ... |
Similarly,
table stores the data of video_id and the user_ids who reacted to the video.
video_genre table
video_id | genre_id |
---|---|
10 | 201 |
10 | 202 |
... | ... |
Similarly,
table stores the data of video_id and the ids of the genres that the corresponding video belongs to.
Let’s dive in to analyze the in and outs of each part of the data. Here we go!
1.
Get the top 10 channels for which more number of users are subscribed in the year 2018.
Note:
In case, if the no_of_subscribers are same, then sort the output in the ascending order of channel_name.
Expected Output Format:
channel_id | channel_name | no_of_subscribers |
---|---|---|
... | ... | ... |
2.
Get the number of users who possitively engaged with at least one video of Disney Channel (channel_id = 352).
Note:
- Consider possitive engagement as LIKE for a video uploaded by Disney channel.
Expected Output Format:
no_of_users_reached |
---|
... |
3.
Get the number of subscribers for each channel.
Note:
- Sort the output in the descending order of no_of_subscribers, and then in the ascending order of channel_name.
- If there are no subscribers for a channel is 0, then keep the no_of_subscribers as 0.
Expected Output Format:
channel_id | channel_name | no_of_subscribers |
---|---|---|
... | ... | ... |
4.
Get the number of videos uploaded by the "News for you" channel in the year in 2018.
Expected Output Format
no_of_videos |
---|
... |
5.
Get the number of users subscribed for the "Taylor Swift" channel in every month in the year 2018.
Note:
- Sort the output in the ascending order of month_in_2018.
- Ignore the months that have subscribers_per_month as 0.
Expected Output Format
month_in_2018 | subscribers_per_month |
---|---|
1 | .... |
2 | .... |
6.
Get the number of videos published by each channel.
Note:
- If a channel did not upload any video, keep the no_of_videos as 0.
- Sort the output in the ascending order of channel_name.
Expected Output Format
channel_name | no_of_videos |
---|---|
... | ... |
7.
Get all the channels that published at least 5 videos in the year 2018.
Note:
- Sort the output in the ascending order of channel_id.
Expected Output Format
channel_id | channel_name | no_of_videos |
---|---|---|
... | ... | ... |
8.
How many times each user has engaged with the videos of "News for you" channel (id = 366).
Note:
- Consider engagement as LIKE or DISLIKE for a video uploaded by News for you channel.
- Ignore the users who did not engage with the channel at least once.
- Sort the output in the descending order of no_of_reactions, and then in the ascending order of user_id.
Expected Output Format
user_id | no_of_reactions |
---|---|
... | ... |
9.
Get all the videos that have more than the average number of views.
Note:
- Sort the output in the ascending order of name.
Expected Output Format
name | no_of_views |
---|---|
... | ... |
10.
Get all the distinct user_ids who liked at least one video uploaded by Android Authority Channel (channel__id = 364) but didn't like the video uploaded by Tech savvy channel with video_id = 1005.
Note:
- Consider reaction_type LIKE as liked
- Sort the output in the ascending order of user_id.
Expected Output Format
user_id |
---|
... |
11.
Get to top 5 viewed videos which belong to both the genres "Action" (genre_id = 201 ) and "Comedy" (genre_id = 202).
Note:
- In case, if the no_of_views are same, then sort the output in the ascending order of video_name.
Expected Output Format
video_name | no_of_views |
---|---|
... | ... |
12.
Get the top 5 viewed videos that belong to the "GAMING" genre_type.
Note:
- Sort the output in the descending order of no_of_views, and then in the ascending order of video_name.
Expected Output Format
video_name | no_of_views |
---|---|
... | ... |
13.
Best time to upload a comedy video:
DunkFest channel is planning to upload a video in the "COMEDY" genre Give the channel the best suitable hour of the day when users positively engage more with comedy videos.
Note:
- Consider positive engagement as LIKE for the videos in the "COMEDY" genre_type.
- Consider reaction_type LIKE as liked.
- Return the hour in the integer format
Expected Output Format
hour_of_engagement | no_of_likes |
---|---|
5 | ... |
14.
Get the active users of the platform. Consider the users who liked at least fifty videos are considered active users.
Note:
- Consider reaction_type LIKE as liked.
- Sort the output in the ascending order of user_id.
Expected Output Format
active_user_id | no_of_likes |
---|---|
... | ... |
15.
Get all the user_ids who liked at least 5 videos published by "Tedx" channel.
Note:
- Consider reaction_type LIKE as liked.
- Sort the output in the descending order of no_of_likes, and then in the ascending order of active_user_id.
Expected Output Format
active_user_id | no_of_likes |
---|---|
... | ... |
16.
Get all the potential users. Fetch the user_ids who liked at least 2 videos published by "Disney" channel, and who did not subscribe to the channel (channel_id = 352).
Note:
- Consider reaction_type LIKE as liked.
- Sort the output in the descending order of no_of_likes, and then in the ascending order of potential_user_id.
Expected Output Format
potential_user_id | no_of_likes |
---|---|
... | ... |
17.
Get top 5 genres in the platform with most positive user activity, i.e., the genres with videos having more number of likes.
Note:
- Consider the reaction_type LIKE as liked.
- If a video belongs to 3 genres, then the likes of the video is counted in all the 3 genres.
- Sort the output in the descending order of no_of_likes, and then in the ascending order of the genre_type.
Expected Output Format
genre_id | genre_type | no_of_likes |
---|---|---|
... | ... | ... |
18.
Get the top 3 genre_ids that are liked by the users in India in the year 2018.
Note:
- Consider the name of the country as "INDIA"
- Consider reaction_type LIKE as liked. If a video belongs to 3 genres, then the like is counted in all the 3 genres.
- Sort the output in the descending order of no_of_likes, and then in the ascending order of the genre_id.
Expected Output Format
genre_id | no_of_likes |
---|---|
... | ... |