Sentiment Analysis and Opinion Mining: English Paper with Chinese Translation (3)

Mining output: Given an evaluative document d, the mining result is a set of quadruples. Each quadruple is denoted by (H, O, f, SO), where H is the opinion holder, O is the object, f is a feature of the object and SO is the semantic orientation of the opinion expressed on feature f in a sentence of d. Neutral opinions are ignored in the output as they are not usually useful.
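The quadruple (H, O, f, SO) maps naturally onto a small record type. The following is a minimal sketch, not part of the original paper: the class name `Opinion`, its field names, the string labels for SO, and the sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Opinion:
    """One mined quadruple (H, O, f, SO). Field names are illustrative."""
    holder: str       # H: the opinion holder
    obj: str          # O: the object being evaluated
    feature: str      # f: the feature of the object commented on
    orientation: str  # SO: "positive", "negative" or "neutral" (assumed labels)


def mining_output(candidates: Iterable[Opinion]) -> List[Opinion]:
    """Keep only non-neutral quadruples, as neutral opinions are ignored in the output."""
    return [o for o in candidates if o.orientation != "neutral"]


# Hypothetical quadruples, invented for this example.
candidates = [
    Opinion("reviewer_1", "digital_camera_1", "picture quality", "positive"),
    Opinion("reviewer_1", "digital_camera_1", "size", "neutral"),
    Opinion("reviewer_2", "digital_camera_1", "size", "negative"),
]
result = mining_output(candidates)
```

Here `result` holds the two non-neutral quadruples; the neutral opinion on "size" is dropped.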
Given a collection of evaluative documents D containing opinions on an object, three main technical problems can be identified (clearly there are more):
Problem 1: Extracting object features that have been commented on in each document d ∈ D.
Problem 2: Determining whether the opinions on the features are positive, negative or neutral.
Problem 3: Grouping synonyms of features (as different opinion holders may use different words or phrases to express the same feature).
Opinion Summary: There are many ways to use the mining results. One simple way is to produce a feature-based summary of opinions on the object [6]. An example is used to illustrate what that means.
Fig. 1 summarizes the opinions in a set of reviews of a particular digital camera, digital_camera_1. The opinion holders are omitted. In the figure, “CAMERA” represents the camera itself (the root node of the object hierarchy). 125 reviews expressed positive opinions on the camera and 7 reviews expressed negative opinions on the camera. “picture quality” and “size” are two product features. 123 reviews expressed positive opinions on the picture quality, and only 6 reviews expressed negative opinions. The <individual review sentences> link points to the specific sentences and/or the whole reviews that give the positive or negative comments about the feature. With such a summary, the user can easily see how existing customers feel about the digital camera. If he/she is very interested in a particular feature, he/she can drill down by following the <individual review sentences> link to see why existing customers like it and/or dislike it.
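A feature-based summary of this kind can be built by grouping the mined opinions by feature and orientation while keeping the supporting sentences for drill-down. The sketch below is only one possible way to do this: the function names `summarize` and `format_summary`, the tuple layout, and the sample sentences are assumptions and do not come from the paper.

```python
from collections import defaultdict


def summarize(opinions):
    """Group (feature, orientation, sentence) tuples by feature and orientation.

    Neutral or unknown orientations are skipped. The supporting sentences are
    kept so that a reader can drill down from a count to the evidence.
    """
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, orientation, sentence in opinions:
        if orientation in ("positive", "negative"):
            summary[feature][orientation].append(sentence)
    return dict(summary)


def format_summary(summary):
    """Render one line per feature with its positive and negative counts."""
    return [
        f"{feature}: positive={len(groups['positive'])}, negative={len(groups['negative'])}"
        for feature, groups in summary.items()
    ]


# Hypothetical review sentences, invented for this example.
opinions = [
    ("CAMERA", "positive", "I love this camera."),
    ("picture quality", "positive", "The pictures are sharp."),
    ("picture quality", "negative", "Photos look washed out indoors."),
    ("size", "positive", "It fits in my pocket."),
    ("size", "neutral", "It is about the size of my old one."),
]
summary = summarize(opinions)
report = format_summary(summary)
```

Storing the sentences rather than bare counts is what makes the drill-down possible: `summary["picture quality"]["negative"]` returns the sentences behind the negative count.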
Fig. 2. Visualization of feature-based opinion summary and comparison
The summary in Fig. 1 can be easily visualized using a bar chart [10]. Fig. 2(A) shows such a chart. In the figure, each bar above the X-axis gives the number of positive opinions on a feature (listed at the top), and the bar below the X-axis gives the number of negative opinions on the same feature. Obviously, other visualizations are also possible. For example, one may only show the percentage of positive (or negative) opinions on each feature. Comparing opinion summaries of a few competing objects is even more interesting [10]. Fig. 2(B) shows a visual comparison of consumer opinions on two competing digital cameras. One can clearly see how consumers view different features of each camera.
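The percentage view mentioned above is simple arithmetic over the positive and negative counts. As a small sketch, the function below computes the share of positive opinions for the two entries whose counts are stated in the text (CAMERA: 125 positive, 7 negative; picture quality: 123 positive, 6 negative). The function name `positive_share` and the choice to return `None` when there are no opinions are assumptions.

```python
def positive_share(positive, negative):
    """Return the percentage of positive opinions, or None if there are no opinions."""
    total = positive + negative
    if total == 0:
        return None
    return 100.0 * positive / total


# Counts taken from the Fig. 1 description in the text: (positive, negative).
counts = {
    "CAMERA": (125, 7),
    "picture quality": (123, 6),
}
shares = {feature: positive_share(p, n) for feature, (p, n) in counts.items()}

for feature, share in shares.items():
    print(f"{feature}: {share:.1f}% positive")
```

The negative share is the complement, `100 - share`, since neutral opinions are not counted.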
Sentiment Classification
Sentiment classification has been widely studied in the natural language processing (NLP) community [e.g., 2, 11, 13]. It is defined as follows: Given a set of evaluative documents D, determine whether each document d ∈ D expresses a positive or negative opinion (or sentiment) on an object. For example, given a set of movie reviews, the system classifies them into positive reviews and negative reviews. This is clearly a classification learning problem. It is similar to, but also different from, classic topic-based text classification, which classifies documents into predefined topic classes, e.g., politics, sciences, and sports. In topic-based classification, topic-related words are important. In sentiment classification, however, topic-related words are unimportant. Instead, opinion words that indicate positive or negative opinions are important, e.g., great, excellent, amazing, horrible, bad, and worst.
There are many existing techniques. Most of them apply some form of machine learning for classification [e.g., 11]. Custom-designed algorithms specifically for sentiment classification also exist, which exploit opinion words and phrases together with some scoring functions [2, 13].
This classification is said to be at the document level, as it treats each document as the basic information unit. Sentiment classification thus makes the following assumption: each evaluative document (e.g., a review) focuses on a single object O and contains opinions of a single opinion holder. Since in the above opinion mining model an object O itself is also a feature (the root node of the object hierarchy), sentiment classification basically determines the semantic orientation of the opinion expressed on O in each evaluative document that satisfies the above assumption.
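To make the idea of opinion words combined with a scoring function concrete, here is a deliberately simple sketch. It is not the algorithm of [2], [11] or [13]: the scoring rule (count of positive words minus count of negative words), the tokenizer, the "undecided" label for a zero score, and the sample reviews are all assumptions. The word lists reuse the six example opinion words given in the text.

```python
import re

# Opinion words taken from the examples in the text; a real lexicon would be far larger.
POSITIVE_WORDS = {"great", "excellent", "amazing"}
NEGATIVE_WORDS = {"horrible", "bad", "worst"}


def score(document):
    """Assumed scoring function: number of positive words minus number of negative words."""
    tokens = re.findall(r"[a-z']+", document.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    return positives - negatives


def classify(document):
    """Classify a whole document, assuming it discusses a single object."""
    s = score(document)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "undecided"  # no opinion words found, or the counts cancel out


# Hypothetical movie reviews, invented for this example.
reviews = [
    "An amazing film with excellent acting.",
    "The worst movie of the year, with a horrible plot.",
    "I watched it on Tuesday.",
]
labels = [classify(r) for r in reviews]
```

Because the score is computed over the whole document, the sketch relies on the same assumption as document-level classification in general: one object and one opinion holder per document. It also ignores negation ("not great"), which a practical system would need to handle.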
Apart from document-level sentiment classification, researchers have also studied classification at the sentence level, i.e., classifying each sentence as a subjective or objective sentence and/or as expressing a positive or negative opinion [9, 14, 15]. Like document-level classification, sentence-level sentiment classification does not consider object features that have been commented on in a sentence. Compound sentences are also an issue. Such a sentence often expresses more than one opinion, e.g., “The picture quality of this camera is amazing and so is the battery life, but the viewfinder is too small”.