Built-in datasets
MovieLens
@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>
MovieLens: https://grouplens.org/datasets/movielens/
cornac.datasets.movielens.load_100k(fmt='UIR')
Load the MovieLens 100K dataset.
Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose structure depends on the given data format.
Return type: array-like
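For `fmt='UIR'`, each tuple is a (user, item, rating) triplet. The following is a minimal sketch, assuming the loader returns such a list, of deriving the kind of summary statistics quoted for the Netflix loaders. `summarize` is a hypothetical helper, and the toy triplets stand in for real MovieLens data.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# One UIR record: (user_id, item_id, rating).
Triplet = Tuple[Hashable, Hashable, float]


def summarize(data: List[Triplet]) -> Dict[str, Optional[float]]:
    """Count ratings, distinct users and distinct items in UIR triplets."""
    n_ratings = len(data)
    users = {u for u, _, _ in data}
    items = {i for _, i, _ in data}
    mean = sum(r for _, _, r in data) / n_ratings if n_ratings else None
    return {
        "n_ratings": n_ratings,
        "n_users": len(users),
        "n_items": len(items),
        "mean_rating": mean,  # None for empty input
    }


# Toy stand-in for the output of load_100k(fmt='UIR').
toy = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 3.0)]
stats = summarize(toy)
print(stats)  # → {'n_ratings': 3, 'n_users': 2, 'n_items': 2, 'mean_rating': 3.0}
```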
cornac.datasets.movielens.load_1m(fmt='UIR')
Load the MovieLens 1M dataset.
Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose structure depends on the given data format.
Return type: array-like
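`load_1m` has the same signature and return type as `load_100k`. Many models expect contiguous integer indices rather than raw IDs, so a common next step is to remap them. This is a minimal sketch using a hypothetical helper, `index_triplets`, applied to toy (user, item, rating) triplets. It assigns indices in order of first appearance, which keeps the mapping deterministic for a given input order.

```python
from typing import Dict, Hashable, List, Tuple

Triplet = Tuple[Hashable, Hashable, float]
IndexedTriplet = Tuple[int, int, float]


def index_triplets(
    data: List[Triplet],
) -> Tuple[List[IndexedTriplet], Dict[Hashable, int], Dict[Hashable, int]]:
    """Replace raw user/item IDs with contiguous 0-based integer indices."""
    user_idx: Dict[Hashable, int] = {}
    item_idx: Dict[Hashable, int] = {}
    indexed: List[IndexedTriplet] = []
    for u, i, r in data:
        # len() is evaluated before insertion, so a new ID gets the next free index.
        uid = user_idx.setdefault(u, len(user_idx))
        iid = item_idx.setdefault(i, len(item_idx))
        indexed.append((uid, iid, r))
    return indexed, user_idx, item_idx


# Toy stand-in for the output of load_1m(fmt='UIR').
triplets = [("u7", "i3", 5.0), ("u2", "i3", 4.0), ("u7", "i9", 2.0)]
indexed, users, items = index_triplets(triplets)
print(indexed)  # → [(0, 0, 5.0), (1, 0, 4.0), (0, 1, 2.0)]
```

Keep the returned dictionaries. They are needed to map model outputs back to the original IDs.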
cornac.datasets.movielens.load_plot()
Load the movie plots provided at http://dm.postech.ac.kr/~cartopy/ConvMF/
Returns: movie_plots – A dictionary whose keys are movie IDs and whose values are plot texts.
Return type: Dict
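Plot texts are typically used as item side information alongside ratings. The following is a minimal sketch of pairing the two. It assumes, without confirmation from this page, that the keys of the `load_plot()` dictionary use the same ID format as the item IDs returned by the rating loaders; check this on real data before relying on it. `attach_plots` is a hypothetical helper, and the ratings and plots are toy values.

```python
from typing import Dict, Hashable, List, Tuple

Triplet = Tuple[Hashable, Hashable, float]


def attach_plots(
    data: List[Triplet], movie_plots: Dict[Hashable, str]
) -> List[Tuple[Hashable, Hashable, float, str]]:
    """Pair each rating with its movie's plot, dropping movies that have no plot.

    ASSUMPTION: item IDs in `data` and keys in `movie_plots` share one format.
    """
    return [(u, i, r, movie_plots[i]) for u, i, r in data if i in movie_plots]


ratings = [("u1", "m1", 4.0), ("u2", "m2", 3.0), ("u1", "m3", 5.0)]
plots = {"m1": "A heist goes wrong.", "m3": "Two robots fall in love."}
joined = attach_plots(ratings, plots)
coverage = len(joined) / len(ratings)  # share of ratings whose movie has a plot
print(len(joined), f"{coverage:.0%}")  # → 2 67%
```

A coverage far below 100% on real data would suggest the ID formats do not line up.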
Netflix
@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>
Data: https://www.kaggle.com/netflix-inc/netflix-prize-data/
cornac.datasets.netflix.load_data(fmt='UIR')
Load the entire Netflix dataset:
- Number of ratings: 100,480,507
- Number of users: 480,189
- Number of items: 17,770
Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose structure depends on the given data format.
Return type: array-like
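The counts above determine how sparse the rating matrix is. As a worked example, density is the number of observed ratings divided by the number of user-item cells. The helper name `density` is illustrative.

```python
def density(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of user-item cells that hold an observed rating."""
    return n_ratings / (n_users * n_items)


# Figures from the load_data docstring.
full = density(100_480_507, 480_189, 17_770)
print(f"{full:.2%}")  # → 1.18%
```

The denominator, 480,189 × 17,770 = 8,532,958,530 cells, shows why a dense matrix is impractical here. A sparse representation of the roughly 100 million observed ratings is the usual choice.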
cornac.datasets.netflix.load_data_small(fmt='UIR')
Load a small subset of the Netflix dataset. The subsample is drawn such that every user has rated at least 10 items and every item has been rated by at least 10 users:
- Number of ratings: 607,803
- Number of users: 10,000
- Number of items: 5,000
Parameters: fmt (str, default: 'UIR') – Data format to be returned.
Returns: data – A list of tuples whose structure depends on the given data format.
Return type: array-like
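This page does not say how the subsample was drawn. One standard way to enforce a "at least k items per user and k users per item" constraint is iterative k-core filtering. The sketch below, using a hypothetical helper `k_core`, shows that generic technique and is not necessarily the procedure used to build this subset. It can also be used as a check: with `k=10`, data that already satisfies the constraint should come back unchanged.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Triplet = Tuple[Hashable, Hashable, float]


def k_core(data: List[Triplet], k: int = 10) -> List[Triplet]:
    """Drop ratings of users/items with fewer than k ratings, until stable.

    Removing a sparse user can push an item below k (and vice versa),
    so the filter repeats until a pass removes nothing.
    """
    while True:
        user_counts = Counter(u for u, _, _ in data)
        item_counts = Counter(i for _, i, _ in data)
        kept = [
            (u, i, r)
            for u, i, r in data
            if user_counts[u] >= k and item_counts[i] >= k
        ]
        if len(kept) == len(data):
            return kept
        data = kept


# Toy data with k=2: dropping u4 leaves i3 with one rating, which then drops u3.
toy = [
    ("u1", "i1", 1.0), ("u1", "i2", 1.0), ("u2", "i1", 1.0), ("u2", "i2", 1.0),
    ("u3", "i1", 1.0), ("u3", "i3", 1.0), ("u4", "i3", 1.0),
]
filtered = k_core(toy, k=2)
print(filtered)
```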
Tradesy
@author: Quoc-Tuan Truong <tuantq.vnu@gmail.com>
Original data: http://jmcauley.ucsd.edu/data/tradesy/
This dataset is used in the VBPR paper. After cleaning, it contains:
- Number of feedback records: 394,421 (the reported 410,186 includes duplicates)
- Number of users: 19,243 (the reported 19,823 includes duplicates)
- Number of items: 165,906 (the reported 166,521 includes duplicates)
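The text does not define what counts as a duplicate or describe the cleaning steps. Assuming a duplicate is a repeated (user, item) pair in implicit feedback, the following is a minimal sketch of removing them while preserving the original order. `drop_duplicate_pairs` is a hypothetical helper, and this is not the actual cleaning procedure behind the figures above.

```python
from typing import Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def drop_duplicate_pairs(pairs: List[Pair]) -> List[Pair]:
    """Keep the first occurrence of each (user, item) pair, preserving order.

    ASSUMPTION: a duplicate means an identical (user, item) pair.
    """
    # dict keys are unique and, since Python 3.7, keep insertion order.
    return list(dict.fromkeys(pairs))


feedback = [("u1", "i1"), ("u1", "i1"), ("u2", "i1"), ("u1", "i2")]
clean = drop_duplicate_pairs(feedback)
print(len(feedback), "->", len(clean))  # → 4 -> 3
```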
Amazon Office
@author: Aghiles Salah <asalah@smu.edu.sg>
This dataset is built from the Amazon datasets provided by Julian McAuley at: http://jmcauley.ucsd.edu/data/amazon/