Analysis of Netflix Movie Rating Popularity in the United States

Final Report

  • Motivation

    The rise of streaming platforms like Netflix has led to an explosion in the number of movies available to the public. With so much content to choose from, it can be overwhelming for viewers to decide what to watch. Our visualization tool aims to address this problem by giving users a way to better understand the landscape of film ratings throughout the years. By visualizing data about the Motion Picture Association rating system, as well as each movie’s complexity and duration, our tool will help users make more informed decisions about what to watch.

    Our tool is designed for movie enthusiasts and members of the general public who are interested in understanding the history and evolution of film culture. It will support two domain tasks: exploring a movie’s complexity and duration based on its rating, and analyzing how movie ratings have changed throughout the years. By supporting these tasks, our tool will enable users to better understand cultural trends and shifts in the film industry.

    This visualization tool will illustrate the progression of movie ratings throughout the years, covering both Motion Picture Association ratings and the general film ratings available on Netflix. The tool will display each movie’s name, release year, rating, and more. Users will also be able to interact with the plot to select periods of interest and to apply filters. By using this tool, the public will be able to get a better understanding of the Motion Picture Association rating system, as well as shifts in movie culture throughout the years.

    Background

    Data

    The selected data comes from Kaggle (Netflix Dataset) and is a compiled list of Netflix movies with related variables such as cast, director, release date, ratings, and more. We identified no inherent biases in the dataset, although we cannot verify the credibility of the Kaggle user who compiled the data. Social responsibility is a possible ethical consideration; however, the dataset contains no personal user data, and its societal impact is low.

    To clean the dataset, we used Python to sift through missing values and omit unnecessary data; rows with null values were dropped. We then filtered the data so that the United States was the only country included. These steps brought the row count from around 8,000 to roughly 1,800. For consistency, the date column was transformed into a proper datetime format. The word "minutes" was also removed from the duration column for a cleaner appearance. A new complexity column was added based on the original description column, using the automated readability index, which scores text from its character, word, and sentence counts. The description column was then dropped, as it was not relevant to our analysis.

    Originally, the "listed in" column, referring to genre, was a single column containing one or more genres that the movie was listed in. To make the data easier to manipulate, we found every genre listed in the rows and created a separate column for each, while still keeping the "listed in" column. Each new column shows a 1 or 0 (yes or no) depending on whether the movie was listed in that genre. Finally, we created a secondary CSV file called line.csv, which filters out films released before 1980. It was generated from the original CSV and contains the information necessary to generate the line plot: a column for release year, a column for rating, and a column for the count. After cleaning, there were no unexpected values or outliers.
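
    The cleaning steps above can be sketched in plain Python as follows. This is a minimal illustration and not necessarily the code used for the project: the column names (country, date_added, release_year, rating, duration, listed_in, description), the "Month day, year" date format, and the exact-match test on "United States" are assumptions about the Kaggle file, and the function names are hypothetical. The complexity score uses the standard automated readability index formula, with sentences approximated by runs of terminal punctuation.

```python
import re
from collections import Counter
from datetime import datetime


def automated_readability_index(text):
    """Standard ARI: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    words = text.split()
    if not words:
        return 0.0
    # Count only letters and digits as characters.
    chars = sum(ch.isalnum() for ch in text)
    # Approximate sentences by runs of terminal punctuation (at least one).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43


def clean_rows(rows):
    """Apply the described cleaning steps to an iterable of dict rows."""
    kept = []
    for row in rows:
        # Drop rows with any missing value.
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        # Assumption: keep rows whose country is exactly "United States".
        if row["country"].strip() != "United States":
            continue
        out = dict(row)
        # Assumption: dates look like "September 25, 2021".
        out["date_added"] = datetime.strptime(row["date_added"].strip(), "%B %d, %Y")
        # "90 min" -> 90: keep only the leading number.
        out["duration"] = int(row["duration"].split()[0])
        out["release_year"] = int(row["release_year"])
        # Derive complexity from the description, then drop the description.
        out["complexity"] = automated_readability_index(row["description"])
        del out["description"]
        kept.append(out)

    # One 0/1 indicator column per genre; the "listed_in" column is kept.
    genres = sorted({g.strip() for r in kept for g in r["listed_in"].split(",")})
    for r in kept:
        listed = {g.strip() for g in r["listed_in"].split(",")}
        for g in genres:
            r[g] = 1 if g in listed else 0
    return kept


def counts_by_year_and_rating(rows, min_year=1980):
    """Rows for a line.csv-style table: release year, rating, count."""
    counts = Counter(
        (r["release_year"], r["rating"]) for r in rows if r["release_year"] >= min_year
    )
    return [
        {"release_year": year, "rating": rating, "count": n}
        for (year, rating), n in sorted(counts.items())
    ]
```

    Writing the result of counts_by_year_and_rating with csv.DictWriter would give a file shaped like line.csv.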

    The design of the visualization changed after the submission of pm-02. Starting at pm-03, the plan was to have three visuals in total, each relating to some aspect of movies. The line plot would show the number of movies released over the years, grouped by rating, and hovering over a line would show which rating it represents via a tooltip. The bar chart would show the average duration of movies, grouped by rating. The final part of the visual, which had not yet been implemented, would appear when you click on a specific year in the line graph and would show the average complexity of movies for each rating in that year, in a movie ticket shape. This would provide important information about movies based on the rating they received.

    This design then changed significantly after pm-03 because we had to restart the code to fix linking. The design kept the line plot, as well as the tickets (as an added bonus for the theme), and added a bar plot of the ratings showing their average duration. The bar plot was then linked to a scatterplot showing duration by complexity for all movies. You can brush over points in the scatterplot to highlight the associated ratings in the bar chart. The bar chart also shows a tooltip of what ratings are included.
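
    The brushing link described above reduces to a simple filter: find the movies inside the brushed region and collect their ratings, which are then the bars to highlight. The sketch below shows that logic in Python for illustration only. The field names, the function name, and the rectangular range arguments are assumptions, and the tool itself may implement the link differently.

```python
# The five ratings the report says the bar chart uses.
MAIN_RATINGS = ("G", "PG", "PG-13", "R", "NR")


def ratings_in_brush(movies, complexity_range, duration_range):
    """Return the set of main ratings among movies inside the brushed rectangle.

    movies: iterable of dicts with "rating", "complexity", and "duration" keys
            (hypothetical field names).
    complexity_range, duration_range: (low, high) bounds of the brush, inclusive.
    """
    c_lo, c_hi = complexity_range
    d_lo, d_hi = duration_range
    return {
        m["rating"]
        for m in movies
        if c_lo <= m["complexity"] <= c_hi
        and d_lo <= m["duration"] <= d_hi
        and m["rating"] in MAIN_RATINGS
    }
```

    A bar would be highlighted whenever its rating is in the returned set; an empty set means nothing in the brush maps to a bar.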

    "Movies Released by Year per Rating": This graph looks at all ratings included in the dataset, with an emphasis on how much TV-MA contributes to the overall count of movies. Alongside the legend, the tooltip helps identify each rating when you hover over its line.

    "Complexity and Duration of All Movies": This graph plots complexity against duration for all movies in the dataset. You can brush over as many points as you want; the neighboring bar graph then highlights which of the five main ratings used (G, PG, PG-13, R, NR) appear in the selection.

    "Average Duration for Each Rating": This graph, as explained above, is linked to the scatterplot and shows the average duration of movies for each rating via a tooltip.

    Movie Tickets: As an added bonus, we included movie tickets with average complexities for each rating. Click on a rating in the bar chart to see your ticket stub with the complexity average of that rating (you can click on as many as you want). Click the "clear output" button to restart and clear the tickets.
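
    Both the bar chart and the ticket stubs rely on a per-rating average, of duration in one case and complexity in the other. A minimal sketch of that aggregation is shown below; the function and field names are hypothetical, and the actual tool may compute these values differently.

```python
from collections import defaultdict


def average_by_rating(movies, field):
    """Average a numeric field (e.g. "duration" or "complexity") per rating.

    movies: iterable of dicts with a "rating" key and the given numeric field.
    Returns a dict mapping each rating to the mean of that field.
    """
    totals = defaultdict(lambda: [0.0, 0])  # rating -> [running sum, count]
    for m in movies:
        acc = totals[m["rating"]]
        acc[0] += m[field]
        acc[1] += 1
    return {rating: total / n for rating, (total, n) in totals.items()}
```

    Calling it once with "duration" gives the bar heights, and once with "complexity" gives the value printed on each ticket.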

    Demo Video

    Visualization

    [Interactive line plot: "Movies Released Per Year By Rating", with a "Filter By Rating" control offering PG, PG-13, R, TV-14, TV-MA, and TV-PG]

    Acknowledgements