MIlestone 1 Assignment - 2

Project Cont'd

Consider an online video-sharing platform like YouTube which hosts tens of thousands of channels and crores of users.

You have to analyse the data and provide meaningful insights on the type of content that drives engagement, users growth, and many more to all the stakeholders. Let’s roll our sleeves up for an insightful analysis!

Database

The sample database consists of tables that store the information of users, channels, videos, genres and likes/dislikes.

Note:

channel_user table

channel_iduser_idsubscribed_datetime
10012020-12-10 10:30:45
10072020-10-10 11:30:45
.........

channel_user

table stores the data of the channel_ids and their subscribers' user_ids.

First row in the table represents that the user with user_id = 1 is subscribed to the channel with channel_id = 100 at

2020-12-10 10:30:45

user_likes table

user_idvideo_idreaction_typereacted_at
110LIKE2020-12-10 10:30:45
710DISLIKE2020-10-10 11:30:45
............

Similarly,

user_likes

table stores the data of video_id and the user_ids who reacted to the video.

video_genre table

video_idgenre_id
10201
10202
......

Similarly,

video_genre

table stores the data of video_id and the ids of the genres that the corresponding video belongs to.

Let’s dive in to analyze the in and outs of each part of the data. Here we go!

1. Get the total number of users in the platform as

users_count
.

Expected Output Format

users_count
...

2. Get the total number of distinct countries where the users are located. Country of the user is present in the

user
table.

Expected Output Format

countries_count
...

3. Get the number of videos uploaded by each channel.

Expected Output Format

channel_idvideos_count
......

4. Get the ids of all the channels that have uploaded at least 50 videos.

Note:

  • Sort the output in the ascending order of the
    channel_id

Expected Output Format

channel_id
...

5. For all the videos, represent the number of views in multiples of thousands.

For example, if the number of views of a video is 17,200, it is represented as 17.2 in the output.

Note:

  • Sort the output in the descending order of
    no_of_views_in_thousands
    , and then in the alphabetical order of
    name

Expected Output Format

video_idnameno_of_views_in_thousands
.........

6. Find the sum of durations of the videos published by each channel in hours.

Note:

  • The output must contain the duration as number of hours.
  • Sort the output in the descending order of the
    no_of_hours

Expected Output Format

channel_idno_of_hours
......

7. Categorise the performance of all the videos released by the “Motivation grid” Channel (id = 350).

Performance of a video is measured based on the number of views of the video.

Categorization

no_of_viewscategory
<= 10000poor
10000 < views <= 100000 andaverage
> 100000good

Note

Sort the output in the ascending order of

published_datetime

Expected Output Format

nameno_of_viewscategory
.........

8. Get the number of videos released in each year.

Note:

  • For this question, convert the
    year
    in string datatype to INT datatype.
  • Sort the output in the ascending order of
    year

Expected Output Format

yearno_of_videos
......

9. For Marvel channel (id = 351), get the number of subscribers added in each month in the year 2020.

Note:

  • You can find the subscribed date of a user for a channel in the
    channel_user
    table.
  • For this question, convert the
    month_of_year
    in string datatype to INT datatype.
  • Sort the output in the ascending order of the month.

Expected Output Format

month_of_yearno_of_subscribers
......

10. Get the number of reactions (likes/dislikes) generated in each hour of the day in the year 2020.

Note:

  • For this question, convert the
    hour_of_day
    in string datatype to INT datatype.
  • Sort the output in the ascending order of
    hour_of_day

Expected Output Format

hour_of_dayno_of_reactions
0500
12450
....
....
23400

11. Get all the channel_ids that uploaded at least one video in "AI/ML" or "Robotics" technologies between 2018 and 2021.

Note:

  • Consider all the videos that have any of the technologies mentioned above in their
    name
  • Sort the output in the ascending order of
    channel_id

Expected Output Format

channel_id
...

12.

Get all the channel_ids that uploaded at least 20 videos in "AI/ML", "Cyber Security", "Data Science" or "Robotics" technologies between 2018 and 2021.

Example: If a channel publishes 5 videos in AI/ML, 10 videos in Cyber Security and 5 videos in Data Science, consider the channel.

Note:

  • Consider all the videos that have any of the technologies mentioned above in their
    name
  • Sort the output in the ascending order of channel_id.

Expected Output Format

channel_id
...

...............

Post a Comment

Please Select Embedded Mode To Show The Comment System.*

Previous Post Next Post

Contact Form