Built-in datasets

MovieLens

@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>

MovieLens: https://grouplens.org/datasets/movielens/

cornac.datasets.movielens.load_100k(fmt='UIR')

Load the MovieLens 100K dataset

Parameters:	fmt (str, default: 'UIR') – Data format to be returned, e.g. 'UIR' for (user, item, rating) tuples.
Returns:	data – Data in the form of a list of tuples depending on the given data format.
Return type:	array-like
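
The returned list can be iterated over directly. The snippet below is a minimal sketch of that, assuming the 'UIR' format yields (user, item, rating) tuples. To stay runnable without downloading anything, it uses a small hand-written sample in place of the real return value; the ids and ratings in it are illustrative only.

```python
from collections import defaultdict

# Hypothetical sample standing in for the list returned by
# cornac.datasets.movielens.load_100k(fmt='UIR').
data = [
    ("196", "242", 3.0),
    ("186", "302", 3.0),
    ("196", "302", 4.0),
    ("22", "377", 1.0),
]

# Group ratings by user: {user: {item: rating}}.
ratings_by_user = defaultdict(dict)
for user, item, rating in data:
    ratings_by_user[user][item] = rating

# Average rating over all tuples in the sample.
mean_rating = sum(rating for _, _, rating in data) / len(data)

print(dict(ratings_by_user))
print(mean_rating)
```

With the real dataset, the only change would be to replace the sample with the loader's return value.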

cornac.datasets.movielens.load_1m(fmt='UIR')

Load the MovieLens 1M dataset

Parameters:	fmt (str, default: 'UIR') – Data format to be returned.
Returns:	data – Data in the form of a list of tuples depending on the given data format.
Return type:	array-like

cornac.datasets.movielens.load_plot()

Load the movie plots provided at http://dm.postech.ac.kr/~cartopy/ConvMF/

Returns:	movie_plots – A dictionary whose keys are movie ids and whose values are plot texts.
Return type:	dict
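
A common follow-up is to line the plot dictionary up with the rating data. The sketch below assumes that the movie ids used as dictionary keys are the same identifiers (and the same type) as the item ids in the rating tuples, which is not stated here and should be checked against the real data. Both `ratings` and `movie_plots` are hypothetical placeholders for the values returned by the loaders.

```python
# Hypothetical stand-ins for the outputs of load_100k()/load_1m() and load_plot().
ratings = [
    ("u1", "1", 5.0),
    ("u2", "1", 4.0),
    ("u2", "2", 3.0),
    ("u3", "3", 2.0),
]
movie_plots = {
    "1": "placeholder plot text for movie 1",
    "2": "placeholder plot text for movie 2",
}

rated_items = {item for _, item, _ in ratings}
plot_items = set(movie_plots)

# Items that have both ratings and a plot, and rated items without a plot.
items_with_plots = sorted(rated_items & plot_items)
missing_plots = sorted(rated_items - plot_items)

# Keep only the ratings whose item has a plot available.
filtered_ratings = [(u, i, r) for u, i, r in ratings if i in movie_plots]

print(items_with_plots, missing_plots, len(filtered_ratings))
```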

Netflix

@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>

Data: https://www.kaggle.com/netflix-inc/netflix-prize-data/

cornac.datasets.netflix.load_data(fmt='UIR')

Load the entire Netflix dataset:

- Number of ratings: 100,480,507
- Number of users: 480,189
- Number of items: 17,770

Parameters:	fmt (str, default: 'UIR') – Data format to be returned.
Returns:	data – Data in the form of a list of tuples depending on the given data format.
Return type:	array-like

cornac.datasets.netflix.load_data_small(fmt='UIR')

Load a small subset of the Netflix dataset. The subsample is drawn such that every user has at least 10 items and every item has at least 10 users.

- Number of ratings: 607,803
- Number of users: 10,000
- Number of items: 5,000

Parameters:	fmt (str, default: 'UIR') – Data format to be returned.
Returns:	data – Data in the form of a list of tuples depending on the given data format.
Return type:	array-like
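
The "at least 10 items per user and at least 10 users per item" condition on the small subset can be checked or reproduced with an iterative filter. The code below is only a sketch of one way to enforce such a constraint, not the procedure actually used to build the subset. The helper name `k_core_filter` is hypothetical, and the code assumes each (user, item) pair appears at most once, so that counting tuples equals counting distinct items or users.

```python
from collections import Counter


def k_core_filter(triplets, k=10):
    """Repeatedly drop tuples whose user or item has fewer than k ratings."""
    data = list(triplets)
    while True:
        user_counts = Counter(user for user, _, _ in data)
        item_counts = Counter(item for _, item, _ in data)
        kept = [
            (user, item, rating)
            for user, item, rating in data
            if user_counts[user] >= k and item_counts[item] >= k
        ]
        # Removing tuples can push other users/items below k,
        # so iterate until nothing changes.
        if len(kept) == len(data):
            return kept
        data = kept


# Toy data with k=2 to show the cascading effect.
toy = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0),
    ("u2", "i1", 4.0), ("u2", "i2", 2.0),
    ("u3", "i1", 1.0), ("u3", "i3", 2.0),
    ("u4", "i3", 3.0),
]
core = k_core_filter(toy, k=2)
print(core)
```

In the toy run, dropping `u4` leaves item `i3` with a single rating, which in turn leaves `u3` with a single rating, so only the tuples of `u1` and `u2` survive.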

Tradesy

@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>

Original data: http://jmcauley.ucsd.edu/data/tradesy/

This data is used in the VBPR paper. After cleaning the data, we have:

- Number of feedback: 394,421 (410,186 is reported, but there are duplicates)
- Number of users: 19,243 (19,823 is reported, due to duplicates)
- Number of items: 165,906 (166,521 is reported, due to duplicates)

cornac.datasets.tradesy.load_data()

Load the feedback observations

Returns:	data – Data in the form of a list of tuples (user, item, 1).
Return type:	array-like
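
The Tradesy description mentions that duplicates were removed during cleaning, without saying how they were identified. As an illustration only, the sketch below removes exact repeats of a (user, item) pair and emits tuples in the (user, item, 1) form described above. The helper `dedupe_feedback` and the sample pairs are hypothetical.

```python
def dedupe_feedback(pairs):
    """Keep the first occurrence of each (user, item) pair as (user, item, 1)."""
    seen = set()
    unique = []
    for user, item in pairs:
        if (user, item) not in seen:
            seen.add((user, item))
            unique.append((user, item, 1))
    return unique


# Hypothetical raw feedback containing repeated pairs.
raw = [("u1", "a"), ("u1", "a"), ("u2", "a"), ("u2", "b"), ("u1", "a")]
feedback = dedupe_feedback(raw)

# Dataset statistics comparable to those listed in the description.
n_feedback = len(feedback)
n_users = len({user for user, _, _ in feedback})
n_items = len({item for _, item, _ in feedback})

print(feedback)
print(n_feedback, n_users, n_items)
```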

cornac.datasets.tradesy.load_feature()

Load the item visual features

Returns:	data – Item-feature dictionary. Each feature vector is a NumPy array of size 4096.
Return type:	dict

Amazon Office

@author: Aghiles Salah <asalah@smu.edu.sg>

This data is built from the Amazon datasets provided by Julian McAuley at: http://jmcauley.ucsd.edu/data/amazon/

cornac.datasets.amazon_office.load_context(data_format='UIR')

Load the item-item interactions

Parameters:	data_format (str, default: 'UIR') – Data format to be returned.
Returns:	data – Data in the form of a list of tuples depending on the specified data format.
Return type:	array-like

cornac.datasets.amazon_office.load_rating(data_format='UIR')

Load the user-item ratings

Parameters:	data_format (str, default: 'UIR') – Data format to be returned.
Returns:	data – Data in the form of a list of tuples depending on the specified data format.
Return type:	array-like