#!/usr/bin/env python
# coding: utf-8
import numpy as np
import pandas as pd
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('u.data', sep='\t', names=column_names)
print(df.head())
movie_titles = pd.read_csv('Movie_Id_Titles')
print(movie_titles.head())
# merge dataset
df = pd.merge(df, movie_titles, on='item_id')
print(df.head())
import matplotlib.pyplot as plt
import seaborn as sns
get_ipython().run_line_magic('matplotlib', 'notebook')
sns.set_style('white')
# Create a ratings dataframe with average rating and number of ratings
# Top 10 movies by average rating
print(df.groupby('title')['rating'].mean().sort_values(ascending=False).head(10))
# Top 10 movies by number of ratings
print(df.groupby('title')['rating'].count().sort_values(ascending=False).head(10))
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
print(ratings.head())
# Set the number of ratings column:
ratings['rating_numbers'] = df.groupby('title')['rating'].count()
print(ratings.head())
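# A minimal sketch, not part of the original analysis: the mean and count columns
# built above can also be produced in a single groupby pass using named
# aggregation (assumes pandas >= 0.25). The names `toy_df` and `toy_ratings`
# and the sample values are hypothetical, chosen only for illustration.
import pandas as pd

toy_df = pd.DataFrame({
    'title': ['A', 'A', 'B'],
    'rating': [4, 2, 5],
})
# One row per title: average rating and number of ratings
toy_ratings = toy_df.groupby('title')['rating'].agg(rating='mean', rating_numbers='count')
print(toy_ratings)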
# Number of ratings histogram
ratings['rating_numbers'].hist(bins=70)
# Average rating per movie histogram
ratings['rating'].hist(bins=70)
# Relationship between the average rating and the number of ratings.
# The joint plot shows how a movie's average rating varies with how many ratings it has.
sns.jointplot(x='rating', y='rating_numbers', data=ratings, alpha=0.5)
# ## Recommending Similar Movies
# Let's create a matrix that has the user ids on one axis and the movie titles on the other.
# Each cell holds the rating that user gave to that movie.
# The NaN values are due to most people not having seen most of the movies.
moviemat = df.pivot_table(index='user_id', columns='title', values='rating')
print(moviemat.head())
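# A minimal sketch of what pivot_table does in the step above, on a tiny made-up
# frame. The names `toy` and `toy_mat` and the values are hypothetical and do not
# come from u.data; they only show where the NaN cells come from.
import pandas as pd

toy = pd.DataFrame({
    'user_id': [1, 1, 2],
    'title': ['A', 'B', 'A'],
    'rating': [5, 3, 4],
})
# Rows are users, columns are titles, cells are ratings
toy_mat = toy.pivot_table(index='user_id', columns='title', values='rating')
# User 2 never rated 'B', so that cell is NaN
print(toy_mat)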
# ##### Most rated movies
print(ratings.sort_values('rating_numbers', ascending=False).head(10))
# #### Let's choose two movies for our system: Star Wars, a sci-fi movie, and Liar Liar, a comedy.
# What are the user ratings for those two movies?
starwars_user_ratings = moviemat['Star Wars (1977)']
liar_liar_user_ratings = moviemat['Liar Liar (1997)']
print(starwars_user_ratings.head())