Built-in datasets

MovieLens

@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>

MovieLens: https://grouplens.org/datasets/movielens/

cornac.datasets.movielens.load_100k(fmt='UIR')

Load the MovieLens 100K dataset

Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose layout depends on the given data format.
Return type: array-like

cornac.datasets.movielens.load_1m(fmt='UIR')

Load the MovieLens 1M dataset

Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose layout depends on the given data format.
Return type: array-like
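
As a rough illustration of working with the returned list, the sketch below computes basic statistics over rating tuples. It assumes that the 'UIR' format yields (user, item, rating) tuples, which the text above does not spell out, and it runs on a small hand-made sample rather than the real download. The helper name summarize is hypothetical and is not part of Cornac.

```python
# Stand-in sample in the assumed 'UIR' layout: (user, item, rating).
# With Cornac installed, the call documented above would be:
#   from cornac.datasets import movielens
#   data = movielens.load_100k(fmt='UIR')
sample = [
    ("u1", "i1", 4.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 5.0),
    ("u3", "i3", 2.0),
]


def summarize(data):
    """Return basic statistics for a list of (user, item, rating) tuples."""
    users = {u for u, _, _ in data}
    items = {i for _, i, _ in data}
    mean_rating = sum(r for _, _, r in data) / len(data)
    return {
        "n_ratings": len(data),
        "n_users": len(users),
        "n_items": len(items),
        "mean_rating": mean_rating,
    }


stats = summarize(sample)
print(stats)
```

The same helper should apply unchanged to the output of load_1m, provided the tuple layout assumption holds.
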

cornac.datasets.movielens.load_plot()

Load the movie plots provided at http://dm.postech.ac.kr/~cartopy/ConvMF/

Returns: movie_plots – A dictionary whose keys are movie IDs and whose values are plot texts.
Return type: dict
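
One plausible use of the plot dictionary is to restrict a rating list to movies that have plot text. The sketch below uses stand-in data. It assumes that the dictionary keys use the same ID type as the item field of the rating tuples, which should be checked against the real data.

```python
# Stand-in data; real values would come from load_plot() and a rating loader.
movie_plots = {
    "i1": "A toy story about friendship.",
    "i3": "A heist goes wrong.",
}
ratings = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i3", 5.0)]

# Keep only ratings whose item has plot text available.
with_plot = [(u, i, r) for u, i, r in ratings if i in movie_plots]

# Map each remaining item to a naive whitespace token count of its plot.
plot_lengths = {i: len(movie_plots[i].split()) for _, i, _ in with_plot}

print(with_plot)
print(plot_lengths)
```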

Netflix

@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>

Data: https://www.kaggle.com/netflix-inc/netflix-prize-data/

cornac.datasets.netflix.load_data(fmt='UIR')

Load the entire Netflix dataset.

- Number of ratings: 100,480,507
- Number of users: 480,189
- Number of items: 17,770

Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose layout depends on the given data format.
Return type: array-like

cornac.datasets.netflix.load_data_small(fmt='UIR')

Load a small subset of the Netflix dataset. The subsample is drawn so that every user has rated at least 10 items and every item has been rated by at least 10 users.

- Number of ratings: 607,803
- Number of users: 10,000
- Number of items: 5,000

Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose layout depends on the given data format.
Return type: array-like
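
The text does not say how the subsample was drawn. One common way to enforce a "minimum k ratings per user and per item" constraint is iterative k-core filtering, sketched below under the assumption of (user, item, rating) tuples with no repeated (user, item) pairs. The function name k_core is hypothetical and is not part of Cornac.

```python
from collections import Counter


def k_core(data, k=10):
    """Iteratively drop users and items with fewer than k ratings.

    `data` is a list of (user, item, rating) tuples. Removing a sparse user
    can push an item below the threshold (and vice versa), so the filter
    repeats until a pass removes nothing.
    """
    while True:
        user_counts = Counter(u for u, _, _ in data)
        item_counts = Counter(i for _, i, _ in data)
        kept = [
            (u, i, r)
            for u, i, r in data
            if user_counts[u] >= k and item_counts[i] >= k
        ]
        if len(kept) == len(data):
            return kept
        data = kept


# Tiny demonstration with k=2: user u3 has a single rating and is removed.
demo = [
    ("u1", "i1", 5), ("u1", "i2", 4),
    ("u2", "i1", 3), ("u2", "i2", 2),
    ("u3", "i1", 1),
]
dense = k_core(demo, k=2)
print(dense)
```

Note that this only enforces the density constraint. Reaching exactly 10,000 users and 5,000 items would need an additional sampling step that is not shown here.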

Tradesy

@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>

Original data: http://jmcauley.ucsd.edu/data/tradesy/

This data is used in the VBPR paper. After cleaning, the dataset contains:

- Number of feedback observations: 394,421 (410,186 is reported, but that count includes duplicates)
- Number of users: 19,243 (19,823 is reported, due to duplicates)
- Number of items: 165,906 (166,521 is reported, due to duplicates)
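
The exact cleaning procedure is not described here. As a minimal sketch of one piece of it, the code below removes repeated (user, item) pairs from a feedback list while keeping first-seen order. It runs on stand-in data, the helper name deduplicate is hypothetical, and it is not claimed to reproduce the user and item counts listed above.

```python
def deduplicate(feedback):
    """Drop repeated (user, item) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for user, item, value in feedback:
        if (user, item) not in seen:
            seen.add((user, item))
            unique.append((user, item, value))
    return unique


# Stand-in feedback in the (user, item, 1) layout; the third entry is a repeat.
raw = [("u1", "i1", 1), ("u2", "i1", 1), ("u1", "i1", 1), ("u2", "i2", 1)]
clean = deduplicate(raw)
print(len(raw), "->", len(clean))
```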

cornac.datasets.tradesy.load_data()

Load the feedback observations

Returns: data – A list of tuples (user, item, 1).
Return type: array-like

cornac.datasets.tradesy.load_feature()

Load the item visual features

Returns: data – Item-feature dictionary. Each feature vector is a NumPy array of size 4096.
Return type: dict

Amazon Office

@author: Aghiles Salah <asalah@smu.edu.sg>

This data is built based on the Amazon datasets provided by Julian McAuley at: http://jmcauley.ucsd.edu/data/amazon/

cornac.datasets.amazon_office.load_context(data_format='UIR')

Load the item-item interactions

Parameters: data_format (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose layout depends on the specified data format.
Return type: array-like

cornac.datasets.amazon_office.load_rating(data_format='UIR')

Load the user-item ratings

Parameters: data_format (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose layout depends on the specified data format.
Return type: array-like
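
To show how the two Amazon Office loaders might be combined, the sketch below builds an item neighborhood map from item-item tuples and then checks which rated items have no context. It uses stand-in data and assumes that context tuples have the layout (item, related_item, weight) and that the relation can be treated as symmetric. Neither assumption is stated in the text above.

```python
from collections import defaultdict

# Stand-in data; real values would come from load_context() and load_rating().
context = [("i1", "i2", 1.0), ("i1", "i3", 1.0), ("i2", "i3", 1.0)]
ratings = [("u1", "i1", 5.0), ("u2", "i4", 3.0)]

# Build an adjacency map: item -> set of related items.
neighbors = defaultdict(set)
for item_a, item_b, _weight in context:
    neighbors[item_a].add(item_b)
    neighbors[item_b].add(item_a)  # assumption: the relation is symmetric

# Find rated items that never appear in the item-item data.
rated_items = {item for _, item, _ in ratings}
without_context = sorted(rated_items - set(neighbors))

print(dict(neighbors))
print(without_context)
```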