Movie Recommendation System

Authors: Prof. D. P. Dhargalkar, Neha Gore, Rutuja Kadam, Gaurav Patil

DOI Link: https://doi.org/10.22214/ijraset.2022.40225

Abstract

Recommendation systems have changed the way we search for things that interest us. A recommender system is an information filtering approach used to predict a user's preferences. The most popular areas where recommender systems are applied are books, news, articles, music, videos, and movies. In this paper we propose a movie recommendation system named MOVREC. It is based on a collaborative filtering approach that takes the information provided by users, analyzes it, and recommends the movies best suited to the user at that time. The recommended movie list is sorted according to the ratings given to these movies by previous users, and the system uses the K-means algorithm for this purpose. MOVREC also helps users find movies of their choice based on the movie experience of other users, efficiently and without wasting time on useless browsing. The system has been developed in PHP using Dreamweaver 6.0 and Apache Server 2.0. It generates recommendations using various types of knowledge and data about users, the available items, and previous transactions stored in customized databases. The user can then browse the recommendations easily and find a movie of their choice.

I. INTRODUCTION

In today's world, where the internet has become an important part of human life, users often face the problem of too much choice. From looking for a motel to looking for good investment options, there is too much information available. To help users cope with this information explosion, companies have deployed recommendation systems to guide them. Research on recommendation systems has been going on for several decades, but interest remains high because of the abundance of practical applications and the problem-rich domain. Examples of online recommendation systems in use are those for books at Amazon.com, for movies at MovieLens.org, and for CDs at CDNow.com (from Amazon.com). Recommender systems generate recommendations; the user may accept them according to their choice and may also provide, immediately or at a later stage, implicit or explicit feedback. The actions of users and their feedback can be stored in the recommender database and used to generate new recommendations in subsequent user-system interactions. The economic potential of recommender systems has led some of the biggest e-commerce websites (like Amazon.com and snapdeal.com) and the online movie rental company Netflix to make these systems a salient part of their websites. High-quality personalized recommendations add another dimension to the user experience. Web-based personalized recommendation systems are now applied to provide different types of customized information to their users, and they are common across many kinds of applications.
Recommendation plays a crucial role in everyday life. Advice in the form of a recommendation is generally sought from an experienced person, because experience leads to better outcomes, and a person who has used a product can speak from personal experience. Similarly, in the modern world, where technology surrounds us, there is a need for recommendations from machines. A familiar example is the OTT platforms: the moment a user watches or searches for a specific movie, the platform shows movies recommended according to the user's interests at the top of the list. The technique is used to meet customer needs while the provider benefits from delivering content suited to the user.

We can classify recommender systems into two broad categories:

A. Collaborative Filtering

A collaborative filtering (CF) system recommends items based on similarity measures between users and/or items. The system recommends the items that are preferred by similar users. Collaborative filtering has several advantages: (1) it is content-independent, i.e. it relies on connections only; (2) because users give explicit ratings in CF, the real quality of items is assessed; (3) it provides serendipitous recommendations, because recommendations are based on the similarity of users rather than the similarity of items.
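As an illustration of recommending "items preferred by similar users", the following is a minimal sketch of user-based collaborative filtering. The data, the function names, and the choice of cosine similarity are assumptions made for this example; the paper does not specify a similarity measure.

```python
from math import sqrt

# Hypothetical explicit ratings (user -> {movie: rating}); illustrative data only.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "carol": {"Movie A": 1, "Movie C": 5, "Movie E": 4},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def recommend(target, ratings, top_n=3):
    """Rank movies the target has not rated by similarity-weighted ratings."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(ratings[target], their_ratings)
        for movie, rating in their_ratings.items():
            if movie in ratings[target]:
                continue  # skip movies the target user already rated
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {m: scores[m] / weights[m] for m in scores if weights[m] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]
```

Calling `recommend("alice", ratings)` ranks the movies Alice has not rated, weighting each rating by how similar its author's taste is to hers.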

B. Content-based Filtering

Content-based filtering (CBF) is based on a profile of the user's preferences and a description of each item. In CBF, items are described by keywords, and the user's profile indicates what the user likes or dislikes. In other words, CBF algorithms recommend the items that were liked in the past, or items similar to them.

A content-based system examines previously rated items and recommends the best-matching item. Various approaches have been proposed in the research papers listed below, and these approaches are often combined in hybrid recommender systems. An earlier study by Eyjolfsdottir et al. on movie recommendation through MOVIEGEN had certain drawbacks: it asked users a series of questions, which was time-consuming, and it was not user-friendly because users found it stressful to a certain extent. Keeping these shortcomings in mind, we have developed MovieREC, a movie recommendation system that recommends movies based on the information provided by the users themselves. In the present study, a user is given the option to select choices from a set of attributes that includes actor, director, genre, year, and rating. We predict the user's choices based on the choices recorded in the history of previous users. The system has been developed in PHP and currently uses a simple console-based interface.

II. RELATED WORK

Many recommendation systems have been developed over the past decades. These systems use different approaches, such as the collaborative approach, the content-based approach, the utility-based approach, and the hybrid approach. Looking at the purchase behavior and history of shoppers, Lawrence et al. (2001) presented a recommender system that suggests new products in the market; collaborative and content-based filtering were used to refine the recommendations. To find potential customers, most recommendation systems today use the ratings given by previous users. These ratings are then used to predict and recommend items of the user's choice. In 2007, Weng, Lin and Chen performed an evaluation study which found that multidimensional analysis and additional customer profile information increase recommendation quality. Weng used the multidimensional (MD) recommendation model for this purpose. The multidimensional recommendation model was proposed by Tuzhilin and Adomavicius (2001).

A. The Basic K-means Algorithm

The original K-means algorithm was proposed by MacQueen [20]. The ISODATA algorithm by Ball and Hall [22] was an early but sophisticated version of K-means. Clustering divides objects into meaningful groups and is a form of unsupervised learning; document clustering, for example, is automatic document organization. In the K-means clustering technique we choose K initial centroids, where K is the desired number of clusters. Each point is then assigned to the cluster with the nearest mean, i.e. the centroid of the cluster. We then update the centroid of each cluster based on the points assigned to it, and repeat the process until the cluster centers (centroids) no longer change. The algorithm aims at minimizing an objective function, in this case a squared error function.
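The squared error function is not written out in the text. In its usual form it reads as follows, where the symbols are notation introduced here for illustration: S_j is the set of points assigned to cluster j, c_j is its centroid, and x_i is a data point.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - c_j \rVert^{2},
\qquad
c_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i
```

The assignment step lowers J by moving each point to its nearest centroid, and the update step lowers J by replacing each centroid with the mean of its cluster, which is why the iteration stops once the centroids no longer change.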

B. Data Description

In the proposed model we apply a pre-filter before the K-means algorithm. The attributes used to calculate the distance of each point from a centroid are genre, actor, director, year, and rating, and different attributes have different weights. In our research we found that the most appropriate recommendations are those based on the ratings given to the movies by previous users, so we give more importance to the rating attribute than to the other attributes. The ratings have been taken from www.imdb.com, because it has perhaps the largest collection of movies, along with ratings given by a large number of users from different parts of the world. Another important parameter in our proposed model is the total number of votes received by a movie. We divide the number of votes into three categories: less than or equal to 1,000; more than 1,000 but less than or equal to 10,000; and greater than 10,000.

International Journal of Computer Applications (0975 – 8887), Volume 124 – No. 3, August 2015

The total weight of a movie is the sum of the weights of its rating, actor, director, genre, and year:

$$W_m = W_r + W_a + W_d + W_g + W_y$$

We found that as the number of votes increases, the weight of the rating should increase correspondingly. Therefore we use ratios of 1:1, 1:2, and 1:3, depending on the total number of votes received by a movie. We also found that movies with a rating of less than 5 are the least suitable for recommendation and the least desired by users. Users generally want to see a good movie, and a higher rating ensures that the predicted movie set consists of movies liked by a large number of users. The weight assigned to each of the other attributes is generally the ratio of the number of movies associated with that particular attribute value to the total number of movies in our data set.
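The weighting scheme can be sketched as below. This is one possible reading, not the authors' implementation: the text does not say exactly how the ratios 1:1, 1:2, and 1:3 enter the rating weight, so the sketch assumes the rating is multiplied by 1, 2, or 3 according to the vote category. The function and parameter names are hypothetical.

```python
def vote_multiplier(votes):
    """Map a vote count to an assumed rating multiplier (1, 2, or 3).

    Categories follow the text: <= 1,000; 1,001 to 10,000; > 10,000.
    Treating the ratios 1:1, 1:2, 1:3 as multipliers is an assumption.
    """
    if votes <= 1000:
        return 1
    if votes <= 10000:
        return 2
    return 3


def total_weight(rating, votes, w_actor, w_director, w_genre, w_year):
    """Total movie weight: W_m = W_r + W_a + W_d + W_g + W_y.

    W_r is assumed to be rating * vote_multiplier(votes).
    """
    w_rating = rating * vote_multiplier(votes)
    return w_rating + w_actor + w_director + w_genre + w_year
```

Under this reading, a heavily voted movie with a high rating dominates the total weight, which matches the stated intent of giving the rating attribute more importance than the others.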

C. Simulation of MOVREC

When a user enters MOVREC, he or she has several options: search for a particular movie, see the list of upcoming movies, or go to the recommendation page. On the recommendation page the user can select or input values for different attributes. On the basis of these input values, we search our database and prepare an array of suitable movies. A movie is included in the array if even one of its attribute values matches the user's input. We then count the movies in the array. If the count is less than or equal to twenty, we display the movie list sorted by rating. If the count is greater than twenty, we apply a pre-filter and select the top twenty movies by rating.
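The matching and pre-filter step might look like the following sketch. The record layout (dictionaries with `rating` and `votes` keys) and the function name are assumptions for illustration; ties in rating are broken by vote count, as the text describes.

```python
# Hypothetical movie records; field names are assumptions for this sketch.
movies = [
    {"title": "M1", "genre": "Drama", "actor": "A1", "director": "D1",
     "year": 2010, "rating": 8.1, "votes": 500},
    {"title": "M2", "genre": "Comedy", "actor": "A2", "director": "D2",
     "year": 2011, "rating": 8.1, "votes": 12000},
    {"title": "M3", "genre": "Action", "actor": "A3", "director": "D3",
     "year": 2012, "rating": 9.0, "votes": 100},
]


def prefilter(movies, query, limit=20):
    """Keep movies matching at least one queried attribute, best-rated first."""
    matches = [
        m for m in movies
        if any(m.get(attr) == value for attr, value in query.items())
    ]
    # Sort by rating, breaking ties by vote count (both descending).
    matches.sort(key=lambda m: (m["rating"], m["votes"]), reverse=True)
    # At most `limit` movies are kept; shorter lists are returned whole.
    return matches[:limit]
```

For example, `prefilter(movies, {"genre": "Drama", "actor": "A2"})` keeps M1 and M2, each of which matches one attribute, and lists M2 first because it has more votes at the same rating.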

If two movies have the same rating, priority is given to the movie with the larger number of votes. After filtering the movie list, we match the attribute values to their respective weights and compute the total weight of each movie. Once the total weight of each movie has been calculated, we apply the K-means clustering algorithm to this group of movies. In our research we also found that users generally prefer a list of five movies, so we set K = 4, where K is the number of clusters to be formed, so that on average each cluster holds five of the twenty movies. For the clusters k1, k2, k3, k4 we take the initial centroids c1, c2, c3, c4 to be the first, sixth, eleventh, and sixteenth movies in the movie array. After defining the initial centroids, we compute the distance of every other data point from each centroid and assign the remaining data points (movies) to the closest centroid to form clusters. The distance measure used between data points and centroids is the Euclidean distance. After forming the initial clusters, we take one cluster at a time and recalculate its centroid, this time as the mean of the points in that cluster. We then compute the distance of all data points from the newly formed centroids and reassign them to form clusters. We repeat this process until the centroids no longer change. This ensures that the final clusters are optimized and no further regrouping is possible. Once the final clusters are formed, we compute the average rating of all points belonging to each cluster, i.e. the cluster rating, and for the user's query we display the cluster with the highest cluster rating.
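The clustering and cluster-selection steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' PHP code: each movie is assumed to be represented by a tuple of numeric features (for instance a 1-tuple holding its total weight), and the names `kmeans` and `best_cluster` are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, k=4, max_iter=100):
    """K-means with evenly spaced initial centroids.

    For 20 points and k=4 the initial centroids are the 1st, 6th, 11th,
    and 16th points, as in the text. Each point is a tuple of numbers.
    """
    if len(points) < k:
        raise ValueError("need at least k points")
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids unchanged: converged
        centroids = new_centroids
    return labels, centroids


def best_cluster(movies, labels, k=4):
    """Return the movies of the cluster with the highest average rating."""
    def cluster_rating(j):
        rs = [m["rating"] for m, lab in zip(movies, labels) if lab == j]
        return sum(rs) / len(rs) if rs else float("-inf")

    top = max(range(k), key=cluster_rating)
    return [m for m, lab in zip(movies, labels) if lab == top]
```

Evenly spaced initial centroids make the result deterministic for a given movie ordering, at the cost of depending on how the pre-filtered list happens to be sorted.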

Weightage and matching of attributes

Actor ($W_a$):

$$W_a = \frac{\text{No. of movies of actor } a \text{ in the data set}}{\text{Total no. of movies in the data set}}$$
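The actor weight above is a simple frequency ratio. A minimal sketch, assuming movies are stored as dictionaries and using a hypothetical helper name, generalizes it to any attribute:

```python
def attribute_weight(movies, attribute, value):
    """Fraction of movies in the data set whose `attribute` equals `value`.

    For attribute="actor" this is W_a: the number of movies featuring
    actor `value` divided by the total number of movies.
    """
    if not movies:
        return 0.0
    count = sum(1 for m in movies if m.get(attribute) == value)
    return count / len(movies)
```

For instance, if one movie out of four in the data set features a given actor, that actor's weight is 0.25.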

D. Proposed Algorithm

Input: a number of movies: m
Output: a number of clusters: K

Step 1: Select n movies from the m movies according to the user query.
Step 2: If n > 20, select the top 20 movies from the n movies based on ratings; else display the output movies sorted by rating.
Step 3: If the ratings of two movies x, y are equal, i.e. Rx = Ry, select the movie that has the greater number of user votes.
Step 4: Assume K = 4.
Step 5: Choose initial centroids C1, C2, C3, C4.
Step 6: REPEAT (7)
Step 7: Calculate the Euclidean distance of all data points w.r.t. C1, C2, C3, C4, assign each point to its nearest centroid, and re-compute the centroid of each cluster.
Step 8: UNTIL the centroids do not change.

Where,

m: Total number of movies in database

n: Number of movies after user query

x, y: Two random movies

Rx , Ry : Rating of movies x, y

K: Number of clusters

C1, C2, C3, C4: Initial centroids

Conclusion

In this paper we have introduced MovieREC, a recommender system for movies. It allows a user to select choices from a given set of attributes and then recommends a movie list based on the cumulative weight of the different attributes, using the K-means algorithm. By the nature of our system, evaluating its performance is not an easy task, since there is no right or wrong recommendation; it is a matter of opinion. Informal evaluations carried out with a small set of users drew a positive response. We would like to have a larger data set, which would enable more meaningful results from our system. Additionally, we would like to incorporate different machine learning and clustering algorithms and study the comparative results. Eventually we would like to implement a web-based user interface that has a user database and a learning model tailored to each user.

References

[1] Han J., Kamber M., “Data Mining: Concepts and Techniques”, Morgan Kaufmann (Elsevier), 2006.
[2] Ricci and F. Del Missier, “Supporting Travel Decision making Through Personalized Recommendation,” Design Personalized User Experience for e-commerce, pp. 221-251, 2004.
[3] Steinbach M., P Tan, Kumar V., “Introduction to Data Mining”, Pearson, 2007.
[4] Jha N K, Kumar M, Kumar A, Gupta V K, “Customer classification in retail marketing by data mining”, International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April 2014, ISSN 2229-5518.

Copyright

Copyright © 2022 Prof. D. P. Dhargalkar, Neha Gore, Rutuja Kadam, Gaurav Patil. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET40225

Publish Date : 2022-02-04

ISSN : 2321-9653

Publisher Name : IJRASET
