need help in creating the X and y dataset for fastFM rating prediction #128

@chrisbangun

Hi,

Am I building the dataset correctly for fastFM?

Basically, I have a dataframe containing my user-item interactions, along with the context/features and the labels. I split this dataframe into two parts: 1) X, which contains the user-item interactions along with the features, and 2) y, which is the rating.

I then convert the dataframe X into a list of Python dictionaries and use sklearn's DictVectorizer to create a scipy sparse matrix, which I feed to the fastFM model. Here is the code:

import scipy.sparse as sp
from fastFM import als
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import mean_squared_error

feature_cols = ['profile_id_encoded', 'item_id_encoded',
                'popularity_score', 'is_last_interaction']

# X_train and X_val are dataframes; y_train and y_val are np.arrays
X_train = train_interaction[feature_cols]
y_train = train_interaction['ratings'].values.squeeze()

X_val = val_interaction[feature_cols]
y_val = val_interaction['ratings'].values.squeeze()

# one dict per row: {column name: value}
X_train_dicts = X_train.to_dict('records')
X_val_dicts = X_val.to_dict('records')

vec = DictVectorizer()
vectorizer = vec.fit_transform(X_train_dicts)

# below I convert the csr matrix into a csc_matrix
fm_X_train = sp.csc_matrix(vectorizer)

fm = als.FMRegression(n_iter=10000, init_stdev=0.1, l2_reg_w=0, l2_reg_V=0, rank=5)

fm.fit(fm_X_train, y_train)

# prepare for prediction (note: a new DictVectorizer is fitted on the validation data here)
vec = DictVectorizer()
vectorizer = vec.fit_transform(X_val_dicts)
fm_X_val = sp.csc_matrix(vectorizer)

y_pred = fm.predict(fm_X_val)

print(mean_squared_error(y_pred, y_val))

The MSE is bad, though: 93%
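One thing that may be worth checking, sketched here with toy data rather than the real interactions: a `DictVectorizer` fitted separately on the validation records only learns the columns it sees there, so its output may not line up with the matrix the model was trained on. With purely numeric features the columns can happen to coincide, but with string-valued features they generally will not. The record values and variable names below are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer
import scipy.sparse as sp

# Toy records standing in for the real interaction data (hypothetical values).
train_records = [
    {"profile_id": "u1", "item_id": "i1", "popularity_score": 0.9},
    {"profile_id": "u2", "item_id": "i2", "popularity_score": 0.1},
]
val_records = [
    {"profile_id": "u2", "item_id": "i1", "popularity_score": 0.5},
]

vec = DictVectorizer()
X_train = sp.csc_matrix(vec.fit_transform(train_records))

# Reusing the fitted vectorizer keeps the training column layout.
X_val_aligned = sp.csc_matrix(vec.transform(val_records))

# A second vectorizer fitted on the validation records alone learns
# only the features present there, so the column count differs.
X_val_refit = sp.csc_matrix(DictVectorizer().fit_transform(val_records))

print(X_train.shape, X_val_aligned.shape, X_val_refit.shape)
```

If the shapes of the training and validation matrices differ, or the same column index means different features in each, predictions on the validation matrix are unlikely to be meaningful.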

Did I do things correctly here? I'd really appreciate any help. Thank you.
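A second point that might matter, assuming the `*_encoded` columns hold label-encoded integers: `DictVectorizer` passes numeric values through as a single column per key and only creates one indicator column per value when the value is a string. Factorization machines are usually fed one-hot user and item indicators, so integer IDs treated as magnitudes may be part of the problem. The sketch below uses hypothetical toy records to show the difference; it is not a claim about what fastFM itself requires.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy records; IDs are label-encoded integers.
records = [
    {"profile_id_encoded": 0, "item_id_encoded": 7, "popularity_score": 0.9},
    {"profile_id_encoded": 1, "item_id_encoded": 3, "popularity_score": 0.1},
    {"profile_id_encoded": 2, "item_id_encoded": 7, "popularity_score": 0.4},
]

# Integer IDs are treated as numeric values: one column per key.
numeric_vec = DictVectorizer()
X_numeric = numeric_vec.fit_transform(records)

# Casting the IDs to str makes DictVectorizer emit one indicator
# column per distinct ID ("key=value" feature names).
id_keys = ("profile_id_encoded", "item_id_encoded")
string_records = [
    {k: (str(v) if k in id_keys else v) for k, v in r.items()}
    for r in records
]
onehot_vec = DictVectorizer()
X_onehot = onehot_vec.fit_transform(string_records)

print(X_numeric.shape)
print(X_onehot.shape)
print(onehot_vec.feature_names_)
```

With a pandas dataframe, the same cast could be done before `to_dict('records')`, for example with `astype` on the two ID columns.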
