
International Journal of Computer Applications Technology and Research
Volume 3- Issue 7, 468 - 472, 2014
Joint Sentiment-Topic Detection from Text Document
Gauri Nivrutti Tuplondhe
Department of Computer Engineering,
KKWIEER, University of Pune,
Nashik- 422003, India
Abstract: Automated tools are used to detect subjective information such as attitudes, opinions, and feelings; this process is called sentiment analysis. The Joint Sentiment-Topic (JST) model is a probabilistic model, an extension of the Latent Dirichlet Allocation (LDA) model, that detects sentiment and topic simultaneously from text. Supervised approaches to sentiment classification often fail to produce satisfactory results when applied to other domains, while the JST model is weakly supervised in nature, with supervision coming only from a domain-independent sentiment lexicon. This makes the JST model portable to other domains. The proposed system incorporates a small amount of domain-independent prior knowledge, namely a sentiment lexicon, to further improve sentiment classification accuracy. Experiments are carried out to evaluate the model's performance on different datasets.
Keywords: Joint sentiment-topic (JST) model, Latent Dirichlet Allocation (LDA), semi-supervised approach, sentiment analysis.
1. INTRODUCTION
Companies and consumers are influenced to a far greater extent by opinion-rich resources such as online reviews and social networks than by traditional media. Driven by the demand for gleaning insights from such vast amounts of user-generated data, work on developing new algorithms for automated sentiment analysis has bloomed in the past few years.

Sentiment classification is the major task of sentiment analysis. A large portion of the work concentrates on classifying a sentiment-bearing document according to its sentiment polarity, i.e., either positive or negative, as a binary classification problem [1], [2], [3], [9]. Most of this work relies on labeled corpora in which documents are labeled as positive or negative prior to training. In real-world applications, such labeled corpora may not be easily available, and sentiment classification models trained in one domain might not work well in another. Moreover, topic/feature detection and sentiment classification are mostly performed separately. But sentiments are context dependent, so sentiment expressions can be quite different for different topics or domains. For instance, when appearing under different topics within movie review data, the adjective "complicated" may have a negative sentiment orientation, as in "complicated role" in one topic, and a positive orientation, as in "complicated plot" in another topic. This suggests that modeling sentiment and topic simultaneously may help find better feature representations for sentiment classification. These problems motivate the need for weakly supervised, domain-independent sentiment classification.

The Joint Sentiment-Topic (JST) model, which is weakly supervised in nature, detects sentiment and topic simultaneously from text at the document level. A mechanism is introduced to incorporate prior information from sentiment lexicons into model learning by modifying the Dirichlet priors of the topic-word distributions. The model extends the topic model Latent Dirichlet Allocation (LDA) [6] by adding a sentiment layer. It differs from other sentiment-topic models in two respects: 1) it is weakly supervised, and 2) it can detect topics and sentiment simultaneously. Unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when applied to other domains, the weakly supervised nature of JST makes it highly portable to other domains, as will be verified by the experimental results on datasets from different domains.

2. RELATED WORK
2.1 Sentiment Classification
Standard machine learning techniques such as support vector machine (SVM) and Naive Bayes (NB) classifiers are widely used for sentiment classification. These approaches are corpus-based: a domain-specific classifier is trained with labeled training data. The work in [3] employed machine learning techniques including SVMs, NB, and Maximum Entropy to determine whether the sentiment expressed in a movie review was "thumbs up" or "thumbs down". In subsequent work [4], the authors further improved sentiment classification accuracy on the movie review dataset using a cascaded approach. The works [2], [3], [4] focus on sentiment classification in a single domain, while the work in [5] addresses customizing sentiment classifiers to new domains: a small number of labeled examples can be used as a training set, or labeled data can be combined with a large amount of unlabeled data from the target domain. All of the above work shares similar limitations: 1) the mixture of topics is ignored while performing sentiment classification, and 2) a supervised learning approach with labeled corpora is used for training, which is not suitable for cross-domain work.

2.2 Sentiment-Topic Models
Work on jointly determining sentiment and topic from text is relatively sparse. Most closely related to the present work are [7], [8], [9]. The topic-sentiment model (TSM) [7] models the mixture of topics and sentiments simultaneously from web-blogs. TSM is based on the

probabilistic latent semantic indexing (pLSI) model [13]. It finds the latent topics in a weblog collection, together with the sentiments and the subtopics in the results of a query. If a word is common English, the word is sampled from a background component model; otherwise, the word is sampled from either a topical model or a sentiment model. Thus, in TSM the word generation for sentiment is independent of topic, while in JST a word is drawn from the joint distribution of sentiment and topic labels. To obtain sentiment coverage, TSM performs postprocessing, whereas JST gives the document sentiment directly through the probability distribution of sentiment labels for a given document.

The Multi-Grain Latent Dirichlet Allocation (MG-LDA) model [8] is more appropriate for building topics in which a customer provides a rating for each aspect; that is, the customer annotates every sentence and phrase in a review as being relevant to some aspect. Each word is generated from either a global topic or a local topic. The model uses a topic model in the sense that it assigns words to a set of induced topics, each of which may represent one particular aspect. The limitation of MG-LDA is that it does not consider the associations between sentiments and topics.

The MG-LDA model is extended to the Multi-Aspect Sentiment (MAS) model [9]. The model extracts the ratable aspects of an object and clusters them into coherent topics. The model then uses various techniques to classify and aggregate sentiment over each of these aspects; thus, the limitation of MG-LDA is overcome by MAS. MAS differs from JST in that it is a supervised model: it requires that every aspect be rated, which may not be possible in real-world applications. JST, in contrast, is a weakly supervised model which requires only minimal prior information.

[Figure 1. Block Diagram]

3.1 Joint Sentiment-Topic Model
The JST model is an extension of the existing LDA framework, which has three hierarchical layers, where topics are associated with documents and words are associated with topics. JST [10] introduces a fourth layer into the LDA model, called the sentiment layer, in order to consider the sentiment of the document. Hence, JST becomes a four-layer model, where sentiment labels are associated with documents, topics are associated with sentiment labels, and words are associated with both sentiment labels and topics. The graphical model of JST is given in Figure 1.

Assume a corpus with a collection of D documents denoted by C = {d1, d2, ..., dD}, where each document in the corpus is a sequence of Nd words denoted by d = (w1, w2, ..., wNd), and each word in the document is an item from a vocabulary index with V distinct terms denoted by {1, 2, ..., V}. Let S be the number of distinct sentiment labels, and T the total number of topics. The procedure for generating a word wi in document d under JST can be given as follows: 1) choose a sentiment label l from the per-document sentiment distribution πd; 2) choose a topic from the topic distribution θd,l, where θd,l is conditioned on the sampled sentiment label l (each document is associated with S topic distributions, one for each sentiment label, with the same number of topics; thus the JST model can predict the sentiment associated with the extracted topics); 3) draw a word from the per-corpus word distribution conditioned on both the topic and the sentiment label.

Formally, the generative process of the JST graphical model shown in Figure 1 can be defined as follows:

1) For every sentiment label l ∈ {1, ..., S}:
   - For every topic j ∈ {1, ..., T}, draw φ_{l,j} ~ Dir(λ_l × β_{l,j}).
2) For every document d, choose a distribution π_d ~ Dir(γ).
3) For every sentiment label l ∈ {1, ..., S} under document d, choose a distribution θ_{d,l} ~ Dir(α).
4) For every word w_i in document d:
   - choose a sentiment label l_i ~ Mult(π_d),
   - choose a topic z_i ~ Mult(θ_{d,l_i}),
   - choose a word w_i from φ_{l_i,z_i}, the multinomial distribution over words conditioned on both sentiment label l_i and topic z_i.
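The generative process above can be illustrated in code. The following is a minimal sketch (not the authors' implementation) that samples a small synthetic corpus from the JST model; the dimensions and hyperparameter values are toy choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only)
S, T, V, D = 3, 4, 50, 5           # sentiment labels, topics, vocabulary size, documents
N_d = 20                           # words per document
alpha, beta, gamma = 0.1, 0.01, 0.3

# Per-corpus word distributions: one multinomial per (sentiment, topic) pair
phi = rng.dirichlet([beta] * V, size=(S, T))      # shape (S, T, V)

corpus = []
for d in range(D):
    pi_d = rng.dirichlet([gamma] * S)             # per-document sentiment distribution
    theta_d = rng.dirichlet([alpha] * T, size=S)  # S topic distributions, one per label
    doc = []
    for _ in range(N_d):
        l = rng.choice(S, p=pi_d)                 # 1) choose a sentiment label
        z = rng.choice(T, p=theta_d[l])           # 2) choose a topic given the label
        w = rng.choice(V, p=phi[l, z])            # 3) draw a word given (label, topic)
        doc.append((l, z, w))
    corpus.append(doc)

print(len(corpus), len(corpus[0]))                # D documents of N_d tokens each
```

Each sampled token carries its latent (sentiment, topic) pair, which is exactly what the Gibbs sampler described in Section 3.3 must recover from the observed words alone.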
The hyperparameters α and β can be interpreted as prior observation counts: respectively, the number of times topic j is associated with sentiment label l within a document, and the number of times words sampled from topic j are associated with sentiment label l, before any word from the corpus is observed. Similarly, the hyperparameter γ can be interpreted as the number of times sentiment label l is sampled from a document before any word from the corpus is observed. π is the per-document sentiment distribution, θ is the per-document, sentiment-label-specific topic distribution, and φ is the per-corpus joint sentiment-topic word distribution.

3.2 Incorporating Model Priors
The JST model extends LDA with an additional dependency link of φ on a matrix λ of size S × V, which is used to encode word prior sentiment information into the model. The transformation matrix λ modifies the Dirichlet priors β of size S × T × V, so that the word prior sentiment polarity can be captured. The process of incorporating prior knowledge into the JST model is as follows. First, λ is initialized with all elements equal to 1. Then, for every sentiment label l ∈ {1, ..., S} and every word w ∈ {1, ..., V} in the corpus vocabulary, if word w is also found in the sentiment lexicons used, the element λ_{l,w} is updated as follows: λ_{l,w} is set to 1 if S(w) = l, and to 0 otherwise, where S(w) is the function which returns the prior sentiment label (neutral, positive, or negative) of w found in a sentiment lexicon. Suppose, for example, that the word "bad" has negative prior polarity and vocabulary index i. The corresponding vector of λ is then [1, 0, 0], corresponding to the negative, positive, and neutral prior polarities. Now, for each topic j ∈ {1, ..., T}, λ_{l,i} is multiplied with β_{l,j,i}; here only the element of β corresponding to the negative label is retained, while the elements for the positive and neutral labels become 0.

3.3 Model Inference
To obtain the distributions π, θ, and φ, we first estimate the posterior distribution over z and l, i.e., the assignment of word tokens to topics and sentiment labels for the corpus. The sampling distribution for a word, given the remaining topic and sentiment label assignments, is P(z_t = j, l_t = k | w, z^{-t}, l^{-t}), where z^{-t} and l^{-t} are the vectors of topic and sentiment label assignments for all words in the collection except the word at position t.

The joint probability of the words and the topic and sentiment label assignments can be factored as:

P(w, z, l) = P(w | z, l) P(z, l) = P(w | z, l) P(z | l) P(l).    (1)

To estimate the posterior distribution by sampling the variables z_t and l_t, Gibbs sampling is used. Let the superscript -t denote a quantity that excludes the word at the t-th position. By marginalizing out the random variables φ, θ, and π, the conditional posterior for the variables of interest, z_t and l_t, is:

P(z_t = j, l_t = k | w, z^{-t}, l^{-t}, α, β, γ) ∝
    (N_{k,j,w_t}^{-t} + β) / (N_{k,j}^{-t} + Vβ) ×
    (N_{d,k,j}^{-t} + α) / (N_{d,k}^{-t} + Tα) ×
    (N_{d,k}^{-t} + γ) / (N_d^{-t} + Sγ),    (2)

where N_{k,j,i} is the number of times word i has appeared with topic j and sentiment label k, N_{k,j} is the total number of words assigned to topic j and sentiment label k, N_{d,k,j} is the number of times a word from document d has been associated with topic j and sentiment label k, N_{d,k} is the number of times sentiment label k has been assigned to a word token in document d, and N_d is the total number of words in document d.

Samples obtained from the Gibbs sampling are then used to approximate the per-corpus joint sentiment-topic word distribution, which can be given as:

φ_{k,j,i} = (N_{k,j,i} + β) / (N_{k,j} + Vβ),

the approximate per-document topic distribution specific to a sentiment label:

θ_{d,k,j} = (N_{d,k,j} + α) / (N_{d,k} + Tα),

and the approximate per-document sentiment distribution:

π_{d,k} = (N_{d,k} + γ) / (N_d + Sγ).

3.4 Algorithm
Algorithm 1: Gibbs sampling procedure for the JST model.
Input: corpus, α, β, γ
Output: sentiment and topic label assignments for all word tokens in the corpus
1: Initialize the S × T × V matrix Φ, the D × S × T matrix Θ, and the D × S matrix Π
2: for i = 1 to maximum number of Gibbs sampling iterations do
3:   for all documents d ∈ [1, D] do
4:     for all terms t ∈ [1, N_d] do
5:       Exclude term t, associated with topic label z and sentiment label l, from the count variables N_d, N_{d,k}, N_{d,k,j}, N_{k,j}, and N_{k,j,i}
6:       Sample a new sentiment-topic pair l and z using Equation (2)
7:       Update the variables N_d, N_{d,k}, N_{d,k,j}, N_{k,j}, and N_{k,j,i} using the new sentiment label l and topic label z
8:     end for
9:   end for
10:  for every 25 iterations do
11:    Update the hyperparameter α using maximum-likelihood estimation
12:  end for
13:  for every 100 iterations do
14:    Update the matrices Φ, Θ, and Π with the new sampling results
15:  end for
16: end for

3.5 Hyperparameter Settings
In the JST model implementation, the symmetric prior β is set to 0.01, and the symmetric prior γ is set to (0.05 × L) / S, where L is the average document length and S is the total number of sentiment labels. The asymmetric prior α is learned directly from the data using maximum-likelihood estimation [11] and is updated every 25 iterations during Gibbs sampling.
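The conditional posterior of Equation (2), the count bookkeeping of the algorithm in Section 3.4, and the λ-based prior masking of Section 3.2 can be sketched together as follows. This is a simplified, single-sweep illustration under toy dimensions and a hypothetical two-word lexicon, not the authors' implementation; the count arrays N_kji, N_kj, N_dkj, N_dk, and N_d mirror the notation above, and the word-specific β term generalizes the Vβ denominator to a sum over the λ-masked prior.

```python
import numpy as np

rng = np.random.default_rng(1)

S, T, V, D = 3, 4, 30, 4            # sentiment labels, topics, vocab size, documents
alpha, gamma = 0.1, 0.3
docs = [list(rng.integers(0, V, size=15)) for _ in range(D)]  # toy corpus of word ids

# Section 3.2: lambda matrix (S x V) masks the Dirichlet prior beta (S x T x V).
lam = np.ones((S, V))
lexicon = {0: 0, 1: 1}              # hypothetical: word 0 -> label 0, word 1 -> label 1
for w, label in lexicon.items():
    lam[:, w] = 0.0
    lam[label, w] = 1.0
beta = 0.01 * lam[:, None, :] * np.ones((S, T, V))   # broadcast lambda over topics

# Count variables, with a random initial (l, z) assignment for every token
N_kji = np.zeros((S, T, V)); N_kj = np.zeros((S, T))
N_dkj = np.zeros((D, S, T)); N_dk = np.zeros((D, S)); N_d = np.zeros(D)
assign = []
for d, doc in enumerate(docs):
    cur = []
    for w in doc:
        k, j = rng.integers(S), rng.integers(T)
        N_kji[k, j, w] += 1; N_kj[k, j] += 1
        N_dkj[d, k, j] += 1; N_dk[d, k] += 1; N_d[d] += 1
        cur.append((k, j))
    assign.append(cur)

# One Gibbs sweep: resample (l_t, z_t) for every token from Equation (2)
for d, doc in enumerate(docs):
    for t, w in enumerate(doc):
        k, j = assign[d][t]
        # exclude token t from the counts
        N_kji[k, j, w] -= 1; N_kj[k, j] -= 1
        N_dkj[d, k, j] -= 1; N_dk[d, k] -= 1; N_d[d] -= 1
        # unnormalized conditional posterior over all (k, j) pairs, shape (S, T)
        p = ((N_kji[:, :, w] + beta[:, :, w]) / (N_kj + beta.sum(axis=2))
             * (N_dkj[d] + alpha) / (N_dk[d] + T * alpha)[:, None]
             * ((N_dk[d] + gamma) / (N_d[d] + S * gamma))[:, None])
        p = p / p.sum()
        k, j = divmod(rng.choice(S * T, p=p.ravel()), T)
        # restore the counts with the new assignment
        N_kji[k, j, w] += 1; N_kj[k, j] += 1
        N_dkj[d, k, j] += 1; N_dk[d, k] += 1; N_d[d] += 1
        assign[d][t] = (k, j)

# Per-document sentiment distribution (pi) estimated from the counts
pi = (N_dk + gamma) / (N_d[:, None] + S * gamma)
print(pi.shape)                     # one row per document, rows sum to 1
```

In a full implementation this sweep would run for many iterations, with α re-estimated every 25 iterations and the Φ, Θ, and Π matrices updated every 100 iterations, as in the algorithm of Section 3.4.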

3.6 Classifying Document Sentiment
The document sentiment is classified based on the probability of a sentiment label given a document, P(l|d). The experiments only consider the probabilities of the positive and negative labels for a given document, while the probability of the neutral label is ignored. A document d is classified as positive if the probability of the positive sentiment label, P(l_pos|d), is greater than the probability of the negative sentiment label, P(l_neg|d), and vice versa.

4.1 Datasets
Two easily available data sets, the movie review (MR) data set and the multi-domain sentiment (MDS) data set, are used in the experiments. The MR data set contains 1,000 positive and 1,000 negative movie reviews, with an average of 30 sentences per document. The MDS data set is crawled from Amazon.com and includes reviews of four different products. Both data sets are first preprocessed: punctuation, non-alphabet characters, numbers, and stop words are removed. Two subjectivity lexicons, the appraisal lexicon and the MPQA lexicon, are combined and incorporated as model prior information. Stemming is performed on both data sets and both lexicons during preprocessing. The two lexicons used in this work are fully domain independent and do not bear any supervised information related to the MR and MDS data sets.

4.2 Performance Analysis
4.2.1 Sentiment Classification Results versus Different Numbers of Topics
As JST models sentiment and topic mixtures simultaneously, it is worth exploring how the sentiment classification and topic extraction tasks affect and benefit each other and, in addition, how the model behaves under different topic number settings on different data sets when prior information is incorporated. Modeling sentiment and topics simultaneously helps to improve sentiment classification. For the cases where a single topic performs best, it is observed that the drop in sentiment classification accuracy caused by additionally modeling mixtures of topics is only marginal, while the model gains the ability to extract sentiment-oriented topics in addition to performing document-level sentiment detection.

4.2.3 Topic Extraction
Manually examining the data reveals that terms which seem not to convey sentiment under a topic in fact appear in contexts expressing positive sentiment.

5. CONCLUSION
The JST model detects sentiment and topic simultaneously from text at the document level in a weakly supervised fashion. Only sentiment prior knowledge, which is independent of the domain, is incorporated. For general-domain sentiment classification, by incorporating a small amount of domain-independent prior knowledge, the JST model achieves better or comparable performance compared to existing semi-supervised approaches without using labeled documents, which makes JST flexible in the sentiment classification task. The weakly supervised nature of JST makes it highly portable to other domains. Moreover, the topics and topic sentiments detected by JST are indeed coherent and informative. In the future, incremental learning of the JST parameters can be performed when new data arrive. Also, the JST model can be modified by incorporating other supervised information into model learning, such as known topic knowledge for certain product reviews or document labels derived automatically from user-supplied review ratings.

REFERENCES
[1] C. Lin, Y. He, and R. Everson, "Weakly Supervised Joint Sentiment-Topic Detection from Text," IEEE Trans. Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134-1145, 2012.
[2] P.D. Turney, "Thumbs Up or Thumbs Down?: Semantic Orientation Applied to Unsupervised Classification of Reviews," Proc. Assoc. for Computational Linguistics (ACL), pp. 417-424, 2002.
[3] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs Up?: Sentiment Classification Using Machine Learning Techniques," Proc. ACL Conf. Empirical Methods in Natural Language Processing (EMNLP), pp. 79-86, 2002.
[4] B. Pang and L. Lee, "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts," Proc. 42nd Ann. Meeting of the Assoc. for Computational Linguistics (ACL), pp. 271-278, 2004.
[5] A. Aue and M. Gamon, "Customizing Sentiment Classifiers to New Domains: A Case Study," Proc. Recent Advances in Natural Language Processing (RANLP), 2005.
[6] D.M. Blei, A.Y. Ng, and M.I. Jordan, "Latent Dirichlet Allocation," J. Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[7] Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai, "Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs," Proc. 16th Int'l Conf. World Wide Web (WWW), pp. 171-180, 2007.
[8] I. Titov and R. McDonald, "Modeling Online Reviews with Multi-Grain Topic Models," Proc. 17th Int'l Conf. World Wide Web (WWW), pp. 111-120, 2008.
[9] I. Titov and R. McDonald, "A Joint Model of Text and Aspect Ratings for Sentiment Summarization," Proc. Assoc. Computational Linguistics--Human Language Technology (ACL-HLT), pp. 308-316, 2008.

[10] C. Lin and Y. He, "Joint Sentiment/Topic Model for Sentiment Analysis," Proc. 18th ACM Conf. Information and Knowledge Management (CIKM), pp. 375-384, 2009.
[11] T. Minka, "Estimating a Dirichlet Distribution," technical report, MIT, 2003.
[12] S. Li and C. Zong, "Multi-Domain Sentiment Classification," Proc. Assoc. Computational Linguistics--Human Language Technology (ACL-HLT), pp. 257-260, 2008.
[13] T. Hofmann, "Probabilistic Latent Semantic Indexing," Proc. 22nd Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 50-57, 1999.