Contributors:
Valentina Sintsova and Pearl Pu
Description:
We make available six emotion lexicons produced using the Dystemo framework and designed to recognize emotions within the domain of sport events. The construction process involved pseudo-labeling unlabeled in-domain data (Olympic tweets) with emotions by applying initial emotion lexicons. The six shared lexicons were trained with the Balanced Weighted Voting and PMI-based methods, each initialized from one of the initial lexicons (GALC-R, OlympLex-1.1, or PMI-Hash) and trained with the parameters optimized for that case.
More details on the construction of those lexicons can be found in:
Valentina Sintsova and Pearl Pu. Dystemo: Distant Supervision Method for Multi-Category Emotion Recognition in Tweets. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1), Article No. 13, 2016.
Contributors:
Valentina Sintsova and Pearl Pu
Description:
This emotion lexicon was designed to recognize emotions in the domain of sport events. More specifically, it was constructed via human computation using Twitter data collected during the gymnastics events of the 2012 Olympic Games. The tweets used contained the hashtag #gymnastics. The labeled tweets are provided in the section Sports-Related Emotion Corpus (SREC) below.
More details on the construction process of the lexicon can be found in:
Valentina Sintsova, Claudiu Musat, and Pearl Pu. Fine-Grained Emotion Recognition in Olympic Tweets Based on Human Computation. In Proceedings of the NAACL/HLT Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), ACL, 2013.
We distribute two versions of this lexicon:
– Version 1.0, the original one, created as explained in the WASSA paper;
– Version 1.1, where we removed ~100 terms that are not indicative of specific emotions and are often purely factual. ‘Event’, ‘women’, and ‘today’ are examples of removed terms. The removal was based on a manual review of the most frequent terms in our data (i.e. not all terms in the lexicon were checked).
In both versions, the lexicon terms are n-grams of up to 5 consecutive words. Each term is assigned an emotion distribution over the GEW 2.0 category set.
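To illustrate how a lexicon of this shape can be applied, the following is a minimal sketch that extracts all n-grams of up to 5 words from a text and sums the emotion weights of the matching terms. The toy lexicon entries, their weights, the whitespace tokenization, and the summing rule are all assumptions made for illustration; they are not taken from the distributed files or from the classifier described in the WASSA paper.

```python
from collections import defaultdict

# Hypothetical toy lexicon: term -> {emotion category: weight}.
# Real entries and weights come from the distributed lexicon files.
TOY_LEXICON = {
    "so proud": {"Pride": 0.8, "Happiness": 0.2},
    "proud": {"Pride": 1.0},
    "gutted": {"Sadness": 0.7, "Disappointment": 0.3},
}

MAX_N = 5  # lexicon terms are n-grams of up to 5 consecutive words


def extract_ngrams(tokens, max_n=MAX_N):
    """Return every n-gram of 1..max_n consecutive tokens as a string."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def score_text(text, lexicon, max_n=MAX_N):
    """Sum the emotion weights of all lexicon terms found in the text.

    Assumption: lowercase whitespace tokenization and plain summing.
    """
    tokens = text.lower().split()
    totals = defaultdict(float)
    for ngram in extract_ngrams(tokens, max_n):
        for emotion, weight in lexicon.get(ngram, {}).items():
            totals[emotion] += weight
    return dict(totals)


scores = score_text("So proud of her", TOY_LEXICON)
```

In this sketch both "so proud" and "proud" match, so overlapping terms each contribute to the total; a real application may prefer to count only the longest match.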
Contributors:
Valentina Sintsova and Pearl Pu
Description:
We generated this lexicon by computing PMI scores for the association between terms and each emotion over a pseudo-labeled dataset of tweets with emotional hashtags. This PMI-based method of training emotion lexicons was described in:
Saif M. Mohammad. #Emotional Tweets. In Proceedings of the First Joint Conference on Lexical and Computational Semantics and the Sixth International Workshop on Semantic Evaluation, 246–55, 2012.
We collected tweets with explicit emotional hashtags corresponding to the emotion categories used. Those tweets are considered to be pseudo-labeled. From them we randomly selected 500,000 tweets that were neither retweets nor duplicates, contained at least 3 words, and had only one of the considered hashtags. These tweets were used to learn the PMI-Hash lexicon, in which we assigned per-emotion PMI-based Strength-of-Association scores to each unigram and bigram appearing at least 5 times.
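The scoring step can be sketched as follows, using the Strength-of-Association form SoA(t, e) = PMI(t, e) − PMI(t, ¬e) from the cited paper. Several details here are assumptions rather than the settings used for PMI-Hash: frequencies are counted per tweet, the "at least 5 times" threshold is applied to tweet counts, and add-one smoothing is used to avoid taking the logarithm of zero. The function names are hypothetical.

```python
import math
from collections import Counter


def terms_of(tokens):
    """Return the set of unigrams and bigrams in a tokenized tweet."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


def soa_scores(labeled_tweets, min_count=5, smoothing=1.0):
    """Compute SoA(t, e) = PMI(t, e) - PMI(t, not e) for each term.

    labeled_tweets: list of (tokens, emotion) pairs.
    Returns {term: {emotion: score}}.
    """
    term_emotion = Counter()   # (term, emotion) -> tweets containing term
    term_total = Counter()     # term -> tweets containing term
    emotion_total = Counter()  # emotion -> tweets with that label
    for tokens, emotion in labeled_tweets:
        emotion_total[emotion] += 1
        for term in terms_of(tokens):
            term_emotion[(term, emotion)] += 1
            term_total[term] += 1
    if len(emotion_total) < 2:
        raise ValueError("need at least two emotion labels")

    n_tweets = sum(emotion_total.values())
    scores = {}
    for term, total in term_total.items():
        if total < min_count:
            continue  # drop rare terms
        scores[term] = {}
        for emotion, n_e in emotion_total.items():
            in_e = term_emotion[(term, emotion)] + smoothing
            not_e = total - term_emotion[(term, emotion)] + smoothing
            n_not_e = n_tweets - n_e
            # The PMI difference simplifies to this log ratio.
            scores[term][emotion] = math.log2((in_e * n_not_e) / (not_e * n_e))
    return scores


toy = [(["so", "happy"], "joy")] * 3 + [(["so", "sad"], "sadness")] * 3
toy_scores = soa_scores(toy, min_count=3)
```

On this toy data, "happy" receives a positive score for joy and a negative one for sadness, while "so", which appears equally often under both labels, scores zero for each.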
Contributors:
Source: Swiss Center for Affective Sciences
Description:
This is the original Geneva Affect Label Coder (GALC) lexicon of explicit emotional stems, provided with the emotion categories from the Geneva Emotion Wheel (GEW). This lexicon contains stemmed terms that allow any continuation, such as happ* for Happiness.
Contributors:
Valentina Sintsova and Pearl Pu
Description:
The original GALC lexicon contains stemmed terms that allow any continuation, such as happ* for Happiness. We produced a revised lexicon, GALC-R, with manually validated, instantiated words corresponding to the emotion categories used from GEW version 2.0. To do so, we first extracted the terms corresponding to the emotion categories used. Second, we instantiated the remaining stems on random tweets, i.e. we found the exact words that matched the stems, along with their frequencies. Finally, we manually reviewed the most frequent words to validate their correspondence to the associated emotion. This resulted in 1026 terms, 52.9 on average per emotion category.
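The stem instantiation step described above can be sketched as follows: each stem pattern is matched against the words of a tweet sample, and the matching words are counted so that the most frequent ones can be reviewed manually. The function name, the simple regular-expression tokenization, and the example tweets are assumptions for illustration, not the procedure's actual implementation.

```python
import re
from collections import Counter


def instantiate_stems(stem_patterns, tweets):
    """Find the exact words matching each stem pattern, with frequencies.

    stem_patterns: iterable of strings such as "happ*"; a trailing "*"
    allows any continuation, otherwise the whole word must match.
    Returns {pattern: Counter({word: frequency})}.
    """
    patterns = list(stem_patterns)
    matches = {pattern: Counter() for pattern in patterns}
    for tweet in tweets:
        # Assumption: lowercase words made of letters and apostrophes.
        for word in re.findall(r"[a-z']+", tweet.lower()):
            for pattern in patterns:
                if pattern.endswith("*"):
                    hit = word.startswith(pattern[:-1])
                else:
                    hit = word == pattern
                if hit:
                    matches[pattern][word] += 1
    return matches


sample_tweets = ["Happy happy day", "Such happiness!", "It just happened"]
found = instantiate_stems(["happ*"], sample_tweets)
candidates = found["happ*"].most_common()  # sorted for manual review
```

In this toy sample the stem happ* also matches "happened", which shows why the manual validation step is needed before a word is kept for an emotion category.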