
Need for a systematic review

Several survey studies on opinion mining and sentiment analysis exist in the literature. Pang & Lee (2008) made a comprehensive contribution to opinion mining and sentiment analysis survey studies by covering applications, the major tasks of opinion mining, extraction and summarization, sentiment classification, and the common challenges in this research field. Tsytsarau & Palpanas (2012) surveyed the development of sentiment analysis and opinion mining research, including spam detection and contradiction analysis; their survey covered 26 additional papers compared to Pang & Lee's (2008) earlier survey. The survey of Tang et al. (2009) had a narrower scope, examining the opinion mining problem only for customer reviews on websites that couple reviews with e-commerce, such as Amazon.com, or on sites that specialize in collecting user reviews in a variety of areas, such as Rottentomatoes.com. Cambria et al. (2013) revealed the complexities involved in opinion mining with respect to current use, along with future research directions.

Even though some surveys have reviewed the techniques and methods for opinion mining and sentiment analysis from text, no Systematic Literature Review (SLR) had yet reviewed the literature on mobile app store data mining, opinion aggregation and spam detection. Martin et al. (2015) provided an initial survey of the literature from 2000 to November 2015; however, their survey is not an SLR, and they were particularly interested in studies that combine technical attributes (Application Program Interface (API) usage, size, platform version, etc.) and non-technical attributes (category, rating, reviews, installs, etc.) of mobile apps.

The goal of our SLR was to methodically review and gather research results for specific research questions and to develop evidence-based guidelines for app store practitioners. We developed a set of five research questions to guide the literature review process and performed an extensive search to find publications that answer the research questions.

The objectives of this systematic literature review are to identify:
– Proposed solutions for mining online opinions in app store user reviews;
– Challenges and unsolved problems in the domain;
– Any new contributions to software requirements evolution; and
– Future research directions.

Systematic review methodology

Planning

Research questions (RQ)

RQ1: Which specific data mining techniques are used for reviews on software distribution platforms?

Motivation: App stores provide a wealth of information in the form of customer reviews. Opinion mining and sentiment analysis systems have been applied to various kinds of texts, including newspaper headlines, novels, emails, blogs, tweets and customer reviews, and researchers have proposed different techniques and automated systems over the years to extract user opinions and sentiments from text. Unlike documents or other long texts, mobile app store reviews have unique characteristics: they are short, informal and sometimes even ungrammatical, consisting of incomplete sentences, elongations and abbreviations, which makes them difficult to handle. RQ1 targets approaches and techniques proposed specifically for app store user review mining and opinion extraction.
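
Purely as an illustrative sketch, not drawn from any particular primary study, the short Python snippet below shows one simple way such informal review text can be normalized before any mining step, by collapsing character elongations and expanding a few abbreviations; the abbreviation list and the regular expression are hypothetical choices made only for illustration.

    import re

    # Hypothetical abbreviation map, chosen only for illustration.
    ABBREVIATIONS = {"u": "you", "plz": "please", "thx": "thanks", "gr8": "great"}

    def normalize_review(text: str) -> str:
        """Normalize a short, informal app review before opinion mining."""
        text = text.lower()
        # Collapse elongations such as "loooove" -> "loove" (keep at most two repeats).
        text = re.sub(r"(.)\1{2,}", r"\1\1", text)
        # Expand a few common abbreviations, token by token.
        return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())

    print(normalize_review("Plz fix the crashes, I loooove this app but it is sooo slow"))
    # -> "please fix the crashes, i loove this app but it is soo slow"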

RQ2: How do the studies address the ‘domain dependency’ challenge for app store reviews?

Motivation: Vocabulary varies across contexts and domains, and the same term might convey different opinions; for example, ‘unpredictable’ may be positive when describing a movie plot but negative when describing an app’s behaviour. An opinion classifier trained using opinionated words from one domain might therefore perform poorly when applied to another domain: not only the words and phrases, but also the language structure can differ from one domain to another. Hence, the language structure and linguistic context of opinion and sentiment terms play a key role in opinion mining, and domain adaptation methods must also be considered when dealing with app store user reviews. RQ2 aims to identify how mobile app store opinion mining studies tackle the domain dependency problem.

RQ3: What criteria make a review useful?

Motivation: Quality varies from review to review, and low-quality reviews might not convey any signal that can be used for information extraction. To tackle the spam identification problem, it is critical to have a mechanism or criteria to assess the quality of reviews and to filter out low-quality or noisy reviews. While review helpfulness is assessed manually by users in mobile app stores, there also exist automated systems that assess and rank reviews according to their usefulness or helpfulness. RQ3 aims to identify the methods or criteria used to differentiate useful app store reviews from the others; furthermore, it also searches for automated systems that evaluate review usefulness and helpfulness.
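
As a minimal sketch of what such an automated helpfulness assessment could look like (the tiny dataset, labels, features and classifier below are hypothetical assumptions for illustration, not the approach of any specific primary study), a TF-IDF bag-of-words representation of the review text can be fed to a standard classifier that scores unseen reviews by their predicted probability of being helpful:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny hypothetical training set (1 = helpful, 0 = not helpful); real studies use far larger corpora.
    reviews = [
        "Crashes every time I open a PDF attachment since the last update",
        "good",
        "Battery drains twice as fast after updating, please fix",
        "nice app",
    ]
    labels = [1, 0, 1, 0]

    # TF-IDF features plus logistic regression as a simple helpfulness scorer.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(reviews, labels)

    # Rank unseen reviews by their predicted probability of being helpful.
    new_reviews = ["love it", "The app freezes whenever I rotate the screen"]
    scores = model.predict_proba(new_reviews)[:, 1]
    print(sorted(zip(new_reviews, scores), key=lambda pair: -pair[1]))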

RQ4: How can spam reviews be differentiated from legitimate reviews?

Motivation: As the number of online reviews increases, the number of fraudsters who produce deceptive or untruthful reviews increases as well, so identifying and filtering out opinion spam is an essential task. Different studies and techniques have been proposed for the spam review detection problem, and the opinion spam identification task has a great impact on both the industrial and academic communities. Our objective with RQ4 is to investigate spam review and ranking fraud detection methods and techniques for online stores and mobile app stores.

RQ5: Does the study extract targeted/desired software features from application reviews?

Motivation: Apart from an app’s average rating on a 5-star scale and its corresponding ranking in the app store, users would like to learn about others’ experiences with the app and which aspects/features they liked or disliked most. The information obtained from mobile app reviews is also valuable for developers to get users’ feedback about the most liked or expected features (e.g., requirements elicitation) as well as bugs in the application (e.g., software quality and software evaluation). RQ5 focuses on aspect-based opinion mining studies that extract application features, for example identifying ‘battery life’ as the aspect discussed in ‘the battery life drains too fast’, and aims to identify the studies that perform automated application feature extraction from user reviews.

Development and validation of the review protocol

The review protocol defines the activities required to carry out the literature review. A review protocol helps reduce researchers’ biases and defines the source selection and searching processes, the quality criteria and the information synthesis strategies. This subsection presents the details of our review protocol.

The following digital libraries were used to search for primary studies:
– Science Direct
– IEEE Xplore
– ACM Digital Library
– Citeseer library (citeseer.ist.psu.edu)
– Springer Link
– Google Scholar

Our search covered the period between January 2010 and November 2017. The following search query was created by augmenting the keywords with possible synonyms. While conducting the review, we also examined the reference lists of the primary studies to determine whether there were additional studies not captured by our search query.
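
The exact query string used in the review is not reproduced in this excerpt. Purely as a hypothetical illustration of the keyword-plus-synonym pattern described above, such a query typically takes a boolean form along the lines of:

    ("app store" OR "application store" OR "google play") AND (review OR "user feedback" OR opinion) AND (mining OR "sentiment analysis" OR "spam detection")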

Table of Contents

INTRODUCTION
CHAPTER 1 LITERATURE REVIEW
1.1 Need for a systematic review
1.2 Systematic review methodology
1.2.1 Planning
1.2.1.1 Research questions (RQ)
1.2.1.2 Development and validation of the review protocol
1.2.2 Conducting the review
1.2.2.1 Identification and selection of relevant studies
1.2.2.2 Extraction of data
1.2.2.3 Information synthesis
1.2.3 Reporting the review
1.3 Results
1.3.1 RQ1: Which specific data mining techniques are used for the reviews on software distribution platforms?
1.3.2 RQ2: How do the studies remedy the ‘domain dependency’ challenge for app store reviews?
1.3.3 RQ3: What criteria make a review useful?
1.3.4 RQ4: How can spam reviews be differentiated from legitimate reviews?
1.3.5 RQ5: Extracted Application Features from User Reviews
1.4 Discussion
1.4.1 Principal Findings
CHAPTER 2 PROBLEM STATEMENT & RESEARCH METHODOLOGY
2.1 Problem Statement
2.1.1 Research Motivation
2.1.2 Research Objectives
2.1.3 Target audiences
2.2 Research Methodology
2.2.1 Phase 1: Development of an app store crawler
2.2.2 Phase 2: Identification of opinion spam (deceptive) reviews
2.2.2.1 Construction of a spam review dataset
2.2.2.2 Development of an opinion spam detection model
2.2.2.3 Performance evaluation of the opinion spam detection model
2.2.3 Phase 3: Automated assessment of review helpfulness
2.2.3.1 Feature Generation
2.2.3.2 Selection of the model
2.2.3.3 Model Refinement
2.2.4 Phase 4: Extraction of Mobile App Features
2.2.4.1 Collection of a Dataset of Reviews
2.2.4.2 Filtering Opinion Spam and Non-Helpful Reviews
2.2.4.3 Development of Aspect Extraction Models
2.2.4.4 Performance Evaluation
CHAPTER 3 OPINION SPAM DETECTION
3.1 Introduction
3.2 Background Information
3.2.1 Spam Review Detection
3.2.2 Convolutional Neural Networks (CNNs)
3.2.3 One-Layer CNN for Text Classification Task
3.2.4 Pre-trained Word Embeddings
3.2.4.1 Word2Vec Model
3.2.4.2 GloVe Model
3.3 Methodology
3.3.1 Automated Review Generation
3.3.2 Two-layer CNN Architecture
3.4 Implementation
3.4.1 Hyper-parameters
3.4.2 Training and Evaluation Corpora
3.4.3 Performance Measures
3.5 Experiments and Results
3.6 Conclusion
CHAPTER 4 AUTOMATED ASSESSMENT OF REVIEW HELPFULNESS
4.1 Introduction
4.2 Methodology
4.3 Pre-Processing
4.3.1 Dataset Creation
4.3.2 Dataset Cleaning
4.3.3 Feature Generation
4.4 Selection of Appropriate Classifier
4.4.1 Logistic Regression
4.4.2 AdaBoosting
4.4.3 Gaussian Naive Bayes
4.4.4 Support Vector Machine (SVM)
4.4.5 Performance Measure
4.4.6 Evaluation Results
4.5 Experiments
4.5.1 Adding Meta-data Features
4.5.2 Grid Search
4.5.3 Results
4.6 Conclusion
CHAPTER 5 EXTRACTION OF MOBILE APP FEATURES
5.1 Introduction
5.2 Definitions and Related Work
5.2.1 Definitions
5.2.2 Related Work
5.3 Methodology
5.3.1 Collection of Review Dataset
5.3.2 Filtering Opinion Spam and Non-Helpful Reviews
5.3.3 Annotation of Mobile App Aspects
5.3.3.1 Annotation Task
5.3.3.2 App Review Annotation Guideline
5.3.4 Aspect Extraction Models
5.3.4.1 Bi-directional LSTM+CRF Model
5.3.4.2 Deep CNN+CRF Model
5.4 Implementation
5.4.1 Hyper-parameters
5.4.1.1 Bi-directional LSTM+CRF Model
5.4.1.2 Deep CNN+CRF Model
5.4.2 Training and Evaluation Corpora
5.5 Experiments and Results
5.6 Conclusion
CHAPTER 6 CONTRIBUTIONS AND FUTURE WORK
6.1 Contributions
6.2 Future Work
CONCLUSION 
