Data

Contributors:

Onur Yürüten and Pearl Pu

Description:

This dataset was curated to discover people's common physical activity routines and to analyse the effects of social interventions on their behavior patterns. More specifically, it was collected through a longitudinal user study that involved a wearable sensor (Fitbit) and our custom mobile application, HealthyTogether. The dataset contains the calorie expenditure and step counts of each participant in the longitudinal study.

More details on the curated data can be found in:

  • Onur Yürüten and Pearl Pu. Factoring the Habits: Comparing Methods for Discovering Behavior Patterns from Large Scale Activity Datasets. International Conference on Big Data Analytics, Data Mining and Computational Intelligence (BIGDACI). Part of Multi Conference on Computer Science and Information Systems (MCCSIS) (forthcoming) 2016
  • Onur Yürüten, Jiyong Zhang, and Pearl Pu. Decomposing Activities of Daily Living to Discover Routine Clusters. In the 28th AAAI Conference on Artificial Intelligence (AAAI-14), Quebec City, Canada, July 27-31, 2014

We distribute two versions of this dataset:
– HT-48, the original version, created as explained in the AAAI paper;
– HT-83, an expanded version covering 83 users.

Contributors:

Onur Yürüten and Pearl Pu

Description:

This dataset contains the daily step counts of 1000 participants in a social exercising campaign, who wore different sensors to measure their activity levels. The dataset also contains each participant's gender, height, weight, exercise group id, company id, and age (all anonymized).

Contributors:

Onur Yürüten and Pearl Pu

Description:

This dataset was curated from Twitter using keywords obtained from the emotion lexicons curated in an earlier study. We distribute this dataset in Parquet format (75.5 MB, approximately 41,000 tweets).

Contributors:

Onur Yürüten and Pearl Pu

Description:

This dataset was curated from Twitter using emojis that are popular on the platform.

We distribute this dataset in Parquet format (3.13 GB, approximately 4 million tweets).

Contributors:

Onur Yürüten and Pearl Pu

Description:

This dataset contains the daily, self-reported number of snacks eaten by a group of people. During the data collection period, these people received messages motivating them to cut down on unhealthy snacks.

Contributors:

Valentina Sintsova and Pearl Pu

Description:

Tweets with explicit emotional hashtags

We collected 17.6 million tweets with explicit emotional hashtags corresponding to the GEW emotion categories, using the Twitter Streaming API between 27 February and 26 May 2014. Among them, we extracted 1,729,980 tweets that had those hashtags at the end of the text, were not repeated, were not retweets, did not contain URLs, and were assigned to only one emotion category. Using 500,000 of these pseudo-labeled tweets, we built the PMI-Hash emotion lexicon, as described above.
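The filtering criteria listed above can be illustrated with a minimal Python sketch. It is not the pipeline we used: the hashtag-to-category mapping, the function names, the "RT @" retweet check, and the choice to keep the first copy of a repeated tweet are all assumptions made for illustration.

```python
import re

# Hypothetical hashtag-to-category mapping, for illustration only.
# The real hashtag lists follow the GEW emotion categories.
HASHTAG_TO_EMOTION = {
    "#happy": "Happiness",
    "#joy": "Joy",
    "#angry": "Anger",
    "#sad": "Sadness",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#\w+")
# One or more hashtags forming the very end of the text.
TRAILING_HASHTAGS_RE = re.compile(r"(?:\s*#\w+)+\s*$")


def emotions_in(tags):
    """Map a list of hashtags to the set of emotion categories they denote."""
    return {
        HASHTAG_TO_EMOTION[tag.lower()]
        for tag in tags
        if tag.lower() in HASHTAG_TO_EMOTION
    }


def pseudo_label(text):
    """Return the single emotion category of a tweet, or None if rejected."""
    if text.startswith("RT @"):  # assumption: retweets carry this prefix
        return None
    if URL_RE.search(text):  # tweets containing URLs are dropped
        return None
    trailing = TRAILING_HASHTAGS_RE.search(text)
    if trailing is None:  # no hashtag at the end of the text
        return None
    end_emotions = emotions_in(HASHTAG_RE.findall(trailing.group()))
    all_emotions = emotions_in(HASHTAG_RE.findall(text))
    # Require an emotional hashtag at the end and exactly one category overall.
    if not end_emotions or len(all_emotions) != 1:
        return None
    return next(iter(all_emotions))


def filter_tweets(tweets):
    """Keep (text, emotion) pairs for tweets that pass every criterion."""
    seen = set()
    kept = []
    for text in tweets:
        emotion = pseudo_label(text)
        if emotion is None:
            continue
        key = " ".join(text.lower().split())  # normalised text for duplicates
        if key in seen:  # assumption: keep only the first copy
            continue
        seen.add(key)
        kept.append((text, emotion))
    return kept
```

For example, filter_tweets applied to a list of raw tweet texts returns only the tweets that end in an emotional hashtag, together with their pseudo-label; tweets mixing two categories, such as one tagged both #joy and #sad, are discarded.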

Unfortunately, Twitter's terms of service do not allow us to share this dataset directly. We therefore share the tweets via their identifiers.

Contributors:

Valentina Sintsova and Pearl Pu

Description:

This dataset contains a set of sports-related tweets manually annotated with emotion categories. The annotation was performed by workers from the Amazon Mechanical Turk platform. This dataset was the basis for the OlympLex emotion lexicon.

More details on the collection and annotation process can be found in: 
Valentina Sintsova, Claudiu Musat, and Pearl Pu. Fine-Grained Emotion Recognition in Olympic Tweets Based on Human Computation. In Proceedings of the NAACL/HLT Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), ACL, 2013.

Unfortunately, Twitter's terms of service do not allow us to share this dataset directly. We therefore share the tweets via their identifiers. In the current distribution (version 1.2), we provide annotations for the 1265 tweets for which we have Twitter identifiers, rather than for all 1957 tweets used in the paper.