Project Cont'd
Consider an online video-sharing platform like YouTube which hosts tens of thousands of channels and crores of users.
You have to analyse the data and provide meaningful insights on the type of content that drives engagement, users growth, and many more to all the stakeholders. Let’s roll our sleeves up for an insightful analysis!
Database
The sample database consists of tables that store the information of users, channels, videos, genres and likes/dislikes.
Note:
channel_user table
channel_id | user_id | subscribed_datetime |
---|---|---|
100 | 1 | 2020-12-10 10:30:45 |
100 | 7 | 2020-10-10 11:30:45 |
... | ... | ... |
table stores the data of the channel_ids and their subscribers' user_ids.
First row in the table represents that the user with user_id = 1 is subscribed to the channel with channel_id = 100 at
user_likes table
user_id | video_id | reaction_type | reacted_at |
---|---|---|---|
1 | 10 | LIKE | 2020-12-10 10:30:45 |
7 | 10 | DISLIKE | 2020-10-10 11:30:45 |
... | ... | ... | ... |
Similarly,
table stores the data of video_id and the user_ids who reacted to the video.
video_genre table
video_id | genre_id |
---|---|
10 | 201 |
10 | 202 |
... | ... |
Similarly,
table stores the data of video_id and the ids of the genres that the corresponding video belongs to.
Let’s dive in to analyze the in and outs of each part of the data. Here we go!
1. Get the total number of users in the platform as
Expected Output Format
users_count |
---|
... |
2. Get the total number of distinct countries where the users are located. Country of the user is present in the
Expected Output Format
countries_count |
---|
... |
3. Get the number of videos uploaded by each channel.
Expected Output Format
channel_id | videos_count |
---|---|
... | ... |
4. Get the ids of all the channels that have uploaded at least 50 videos.
Note:
- Sort the output in the ascending order of thechannel_id
Expected Output Format
channel_id |
---|
... |
5. For all the videos, represent the number of views in multiples of thousands.
For example, if the number of views of a video is 17,200, it is represented as 17.2 in the output.
Note:
- Sort the output in the descending order ofno_of_views_in_thousands, and then in the alphabetical order ofname
Expected Output Format
video_id | name | no_of_views_in_thousands |
---|---|---|
... | ... | ... |
6. Find the sum of durations of the videos published by each channel in hours.
Note:
- The output must contain the duration as number of hours.
- Sort the output in the descending order of theno_of_hours
Expected Output Format
channel_id | no_of_hours |
---|---|
... | ... |
7. Categorise the performance of all the videos released by the “Motivation grid” Channel (id = 350).
Performance of a video is measured based on the number of views of the video.
Categorization
no_of_views | category |
---|---|
<= 10000 | poor |
10000 < views <= 100000 and | average |
> 100000 | good |
Note
Sort the output in the ascending order of
Expected Output Format
name | no_of_views | category |
---|---|---|
... | ... | ... |
8. Get the number of videos released in each year.
Note:
- For this question, convert theyearin string datatype to INT datatype.
- Sort the output in the ascending order ofyear
Expected Output Format
year | no_of_videos |
---|---|
... | ... |
9. For Marvel channel (id = 351), get the number of subscribers added in each month in the year 2020.
Note:
- You can find the subscribed date of a user for a channel in thechannel_usertable.
- For this question, convert themonth_of_yearin string datatype to INT datatype.
- Sort the output in the ascending order of the month.
Expected Output Format
month_of_year | no_of_subscribers |
---|---|
... | ... |
10. Get the number of reactions (likes/dislikes) generated in each hour of the day in the year 2020.
Note:
- For this question, convert thehour_of_dayin string datatype to INT datatype.
- Sort the output in the ascending order ofhour_of_day
Expected Output Format
hour_of_day | no_of_reactions |
---|---|
0 | 500 |
1 | 2450 |
.. | .. |
.. | .. |
23 | 400 |
11. Get all the channel_ids that uploaded at least one video in "AI/ML" or "Robotics" technologies between 2018 and 2021.
Note:
- Consider all the videos that have any of the technologies mentioned above in theirname
- Sort the output in the ascending order ofchannel_id
Expected Output Format
channel_id |
---|
... |
12.
Get all the channel_ids that uploaded at least 20 videos in "AI/ML", "Cyber Security", "Data Science" or "Robotics" technologies between 2018 and 2021.
Example: If a channel publishes 5 videos in AI/ML, 10 videos in Cyber Security and 5 videos in Data Science, consider the channel.
Note:
- Consider all the videos that have any of the technologies mentioned above in theirname
- Sort the output in the ascending order of channel_id.
Expected Output Format
channel_id |
---|
... |
...............