- Given the attributes of the movies, group similar movies together using clustering analysis.
- Given users' rating histories, identify association rules among the movies to predict which movies users are going to watch next.
We selected The Movies Dataset from Kaggle, which consists of 7 CSV files containing metadata for all 45,000 movies listed in the Full MovieLens Dataset and released on or before July 2017, along with 26 million ratings from 270,000 users for all 45,000 movies.
- K-Means and K-Modes Clustering
- Hierarchical Clustering
- Apriori Algorithm for association rules mining
- Apriori Algorithm for subsets of data (by movie genre)
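K-Means assigns each movie to the nearest cluster centre by Euclidean distance and recomputes each centre as the mean of its members, which suits numeric attributes such as runtime or vote average. K-Modes keeps the same assign/update loop but uses a matching dissimilarity (the number of attributes that differ) and replaces means with per-attribute modes, which suits categorical attributes such as genre. The project's own code is not shown here, so the following is only a minimal pure-Python sketch of K-Modes: the functions `k_modes` and `matching_dissimilarity`, the choice of (genre, original language, release decade) as attributes, and the sample records are all hypothetical, and the deterministic initialisation is a simplification.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=20):
    """Cluster categorical records into k groups; return (labels, modes)."""
    # Simplified deterministic initialisation: the first k distinct records.
    modes = []
    for record in records:
        if record not in modes:
            modes.append(record)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("need at least k distinct records")

    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the mode it mismatches least
        # (ties go to the lower-numbered cluster).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(record, modes[j]))
            for record in records
        ]
        # Update step: each attribute of a mode becomes the most frequent
        # value of that attribute among the cluster's members.
        new_modes = []
        for j in range(k):
            members = [r for r, label in zip(records, new_labels) if label == j]
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            new_modes.append(
                tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))
            )
        # Stop once neither the assignments nor the modes change.
        if new_labels == labels and new_modes == modes:
            break
        labels, modes = new_labels, new_modes
    return labels, modes


movies = [  # hypothetical (genre, original_language, release_decade) records
    ("Drama", "en", "1990s"),
    ("Drama", "en", "2000s"),
    ("Comedy", "fr", "2000s"),
    ("Comedy", "fr", "2010s"),
    ("Drama", "en", "1990s"),
    ("Comedy", "fr", "2000s"),
]
labels, modes = k_modes(movies, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
print(modes)   # [('Drama', 'en', '1990s'), ('Comedy', 'fr', '2000s')]
```

For numeric attributes, scikit-learn's `KMeans` implements the K-Means variant of this loop; real K-Modes runs usually also try several random initialisations and keep the lowest-cost result.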
You can view the team presentation via this link
K-Means and K-Modes clustering were chosen to cluster similar movies together. These clusters help service providers manage their content more efficiently, and they also make it easier for users to find movies they like. Furthermore, using association rules, we discovered many rules with high lift that can be used to recommend the right movies to users, which should improve the overall user experience. The figure below shows a movie recommender system interface for a user who has watched "The Sixth Sense".
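To make "rules with high lift" concrete: for a rule X → Y over users' watch histories, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y). A lift above 1 means users who watched X are more likely than average to watch Y. The sketch below is a minimal pure-Python illustration of Apriori (level-wise candidate generation with subset pruning), rule extraction, and a simple recommender. It is not the project's implementation. The watch histories, the thresholds, and the helper `recommend` are hypothetical; the real ratings would first need to be turned into one set of movies per user. Libraries such as mlxtend offer optimised versions of these steps.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    items = {item for basket in baskets for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    size = 2
    while current:
        # Join step: combine frequent (size-1)-itemsets into size-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        scored = {c: support(c) for c in candidates}
        current = {c for c, s in scored.items() if s >= min_support}
        frequent.update((c, scored[c]) for c in current)
        size += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence, lift) for rules X -> Y."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]  # P(Y | X)
                lift = confidence / frequent[consequent]  # P(Y | X) / P(Y)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence, lift))
    return rules


def recommend(watched, rules):
    """Rank unseen movies by the best lift among rules the user's history triggers."""
    watched = frozenset(watched)
    scores = {}
    for antecedent, consequent, _confidence, lift in rules:
        if antecedent <= watched:
            for movie in consequent - watched:
                scores[movie] = max(scores.get(movie, 0.0), lift)
    return sorted(scores, key=scores.get, reverse=True)


histories = [  # hypothetical per-user watch histories
    {"The Sixth Sense", "Unbreakable", "Signs"},
    {"The Sixth Sense", "Unbreakable"},
    {"The Sixth Sense", "Signs"},
    {"Toy Story", "Finding Nemo"},
    {"The Sixth Sense", "Unbreakable", "Toy Story"},
]
frequent = apriori(histories, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.7)
print(recommend({"The Sixth Sense"}, rules))  # ['Unbreakable']
```

With these toy histories, "The Sixth Sense" → "Unbreakable" has confidence 0.75 and lift 1.25, while "The Sixth Sense" → "Signs" falls below the confidence threshold, so only "Unbreakable" is recommended.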