Users mostly choose which apps to download by looking for well-rounded,
efficiently working apps. General users assess these qualities by rating apps
on a scale of 5, and the top rated apps appear first when users search and sort
for the apps they want. However, these ratings are being manipulated and
fraudulently misrepresented so that apps appear on the popularity lists and
gain downloads. Users broadly agree that such misrepresentation must be kept in
check. This paper detects fraudulent mobile app ratings by first identifying
the leading sessions of an App, in which the fraudulent ratings appear.
Secondly, rating, ranking and review based evidences are mined by modelling
Apps’ rating, ranking and review behaviours using statistical hypothesis tests.
Furthermore, all the evidences for the detection of the fraud are integrated by
an optimization based aggregation method. The efficacy and the scalability of
the detection algorithm and the proposed system are validated by implementing
them on real-life App data collected from the iOS App Store.
With the widespread adoption of internet-connected mobile phones, which have
replaced the public switched telephone network (PSTN), communication and
connectivity across the globe have advanced by giant leaps. Mobile applications
have become the lifelines of these smartphones, which access the internet
through mobile broadband. In 2008, the App Store released by Apple drastically
changed how smartphones are used by introducing well-packaged, downloadable
apps for phones. Since then, the mobile application market has grown
exponentially. With gross annual revenue projected to surpass $189 billion by
the year 2020, the number of developers has risen sharply. With so much
collective enthusiasm in this field, the number of mobile applications in the
app stores has shot up, and app developers compete fiercely for a higher number
of downloads. As in any field, fraud has crept into this domain as well: some
App developers fake top rankings for their Apps, which dupes users into
downloading them. The fake top leader board positions are obtained by paying
for bot farms or human “internet water armies” that are hired to rate, rank and
favourably review the App. Notably, of the 6.2 billion app downloads in India
in 2016, about 16.2% showed some kind of fraud, and India ranked 10th among
countries by app install fraud rate according to Tune’s accounting. This fraud
must therefore be controlled, to provide users with an authentic list of Apps
to choose from and to give a fair chance to the Apps that genuinely belong at
the top of the App leader boards.
To curtail this fraud, the proposed system detects ranking fraud, which occurs
mainly during the leading sessions of an App rather than throughout its
lifecycle. Leading sessions are the part of the App lifecycle in which a red
flag in the ratings is most likely to be noticed, so the first module detects
these leading sessions. Once the leading sessions are identified, rating based,
ranking based and review based evidences are extracted by modelling the Apps’
rating, ranking and review behaviours with statistical hypothesis tests. These
evidences are then aggregated using an optimization based aggregation method.
If the evidences differ vastly from the App’s historical performance in terms
of ratings, rankings and reviews, there is an anomaly that must be addressed to
correct the App rankings.
Research papers were consulted in order to make this paper a well-rounded
reference for further work in this field of assessment.
There are three major categories into which the research work can be grouped.
Firstly, web ranking spam detection detects any incidence of web spamming. Web
spamming is the practice of raising the rank of particular web pages by
exploiting the page ranking algorithms of search engines. A. Ntoulas presented
a range of content-based heuristic methods for detecting spam on the web. Using
spamicity, Zhou et al. proposed online link spam and spam detection methods.
Secondly, online review spam detection deals with detecting spam in online
reviews. B. Spirin et al. conducted a survey that introduced many algorithms
and principles from the literature on Web Spam Detection.
Thirdly, mobile App recommendation lays emphasis on the algorithms, and the
factors affecting them, that are used to recommend mobile applications to
users. A flexible generative model for preference aggregation, authored by
M. N. Volkovs and R. Zemel, proposes a flexible model over comparisons in which
preferences for items can be conveyed in different forms that would otherwise
make the aggregation problem hard. Several experiments on benchmark datasets
show higher performance than existing methods.
Rank aggregation with domain-specific expertise, proposed by A. Klementiev, D.
Roth, K. Small and I. Titov, is a framework for learning to aggregate rankings
with domain-specific expertise without supervision. The authors apply it to the
scenarios of combining full rankings and aggregating top-k lists, and report
major progress over domain-agnostic baselines in these cases.
These are the sources of literature on which the proposed system is based.
Ranking fraud for Apps is a subject still under study. We propose a system that
fills part of this void by detecting such fraud. Doing so raises certain
challenges, which are listed below.
First challenge: ranking fraud does not occur all the time in the lifecycle of
an App. Hence, we need to detect the time at which it happens, which means
identifying a local anomaly instead of a global anomaly.
Second challenge: the method must be scalable and detect ranking fraud without
relying on any prior labelled information, because manually labelling ranking
fraud for each and every App is very difficult.
Third challenge: it is hard to find and verify the evidences associated with
ranking fraud because rankings in the charts are volatile, which motivates us
to discover implicit fraud patterns of mobile Apps as evidences.
of the Proposed System
We propose a simple and effective algorithm to detect the leading sessions of
each App from its historical ranking records. By analysing Apps’ ranking
behaviours, we find that fraudulent Apps have their ratings spiked during the
leading sessions, and that they habitually show different ranking patterns in
each leading session compared with normal Apps.
Furthermore, based on Apps’ past rating and review records, two more kinds of
fraud evidences are gathered. Any anomaly detected raises a red flag for fraud
detection. The leading sessions of an App reflect its periods of popularity, so
ranking fraud can be found by identifying suspicious leading sessions. The
major work here therefore involves extracting leading sessions from the Apps’
historical ranking records.
The two main segments of fraudulent ranking detection are as follows:
Detecting mobile apps’ leading sessions.
Detecting evidences that support ranking fraud detection.
These aspects are described briefly below.
1) Detecting mobile apps’ leading sessions.
This in turn is divided into two steps. Firstly, the leading events are
extracted from the App’s past ranking records. Secondly, leading sessions are
constructed by merging the leading events together. An algorithm identifies
leading events and sessions by scanning the historical ranking records of the
App.
Pseudocode for mining the leading sessions of a mobile App.
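The two steps described above can be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not the algorithm of the proposed
system: the function name `mine_leading_sessions`, the ranking threshold
`k_threshold` (an App counts as leading while its chart position is at or
above this rank) and the merge gap `max_gap` (events separated by at most this
many days are merged into one session) are all hypothetical names and
parameters introduced for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (first_day, last_day), both inclusive


def mine_leading_sessions(
    ranks: List[Optional[int]],
    k_threshold: int,
    max_gap: int,
) -> Tuple[List[Interval], List[Interval]]:
    """Return (leading_events, leading_sessions) for one App.

    ranks[i] is the App's chart position on day i (1 is best), or None
    if the App was not ranked that day. All parameters are assumptions.
    """
    # Step 1: a leading event is a maximal run of consecutive days on
    # which the App is ranked at position k_threshold or better.
    events: List[Interval] = []
    start: Optional[int] = None
    for day, rank in enumerate(ranks):
        in_top = rank is not None and rank <= k_threshold
        if in_top and start is None:
            start = day
        elif not in_top and start is not None:
            events.append((start, day - 1))
            start = None
    if start is not None:
        events.append((start, len(ranks) - 1))

    # Step 2: merge events that lie at most max_gap days apart into
    # a single leading session.
    sessions: List[Interval] = []
    for ev_start, ev_end in events:
        if sessions and ev_start - sessions[-1][1] <= max_gap:
            sessions[-1] = (sessions[-1][0], ev_end)
        else:
            sessions.append((ev_start, ev_end))
    return events, sessions
```

For example, with a daily rank history, `k_threshold=10` and `max_gap=3`, two
short stays in the top 10 that are two days apart would be reported as two
leading events but one leading session. Suitable values for both parameters
would have to be chosen from the data.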
2) Detecting evidences that support ranking fraud detection.
There are three types of evidences that support the detection of ranking
fraud.
a) Evidence based on Ranking: The leading sessions comprise leading events,
whose general behaviour can be analysed for anomalies against the App’s past
ranking records. It is observed that the App’s ranking behaviour in a leading
event always follows a certain ranking pattern.
b) Evidence based on Rating: The previous evidence is helpful but not
sufficient to reach a conclusion. To address the problem of “restrict time
depletion”, evidence is also gathered from the historical rating records of
mobile apps. Users rate an app after installing it, and the higher the rating,
the higher the app’s position on the leader board, which attracts new users and
leads to further downloads. Naturally, rating fraud occurs during the leading
sessions, so a rating anomaly in those sessions can be used as evidence of
fraudulent rating of the mobile app.
c) Evidence based on Review: Reviews contain textual comments on the app and
its performance. They are written by current users who have already installed
the app. This can be termed the hardest kind of evidence to gather. The reviews
are again compared with the app’s historical record of reviews, and an unusual
spike of good reviews during the leading sessions is taken as evidence.
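As an illustration of how a statistical hypothesis test can turn such a
comparison into evidence, the sketch below tests whether the ratings given
during a leading session are higher on average than the App's historical
ratings. The source does not specify which test is used; a one-sided
two-sample z-test under a normal approximation is assumed here, and the
function name `rating_evidence` and its inputs are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rating_evidence(
    session_ratings: Sequence[float],
    historical_ratings: Sequence[float],
) -> Tuple[float, float]:
    """Return (z, p_value) for H0: session mean <= historical mean.

    Assumption: a one-sided z-test with unequal variances and a normal
    approximation. Each sample needs at least two ratings, otherwise
    statistics.variance raises StatisticsError.
    """
    diff = mean(session_ratings) - mean(historical_ratings)
    std_err = math.sqrt(
        variance(session_ratings) / len(session_ratings)
        + variance(historical_ratings) / len(historical_ratings)
    )
    if std_err == 0.0:
        # Degenerate case: no spread in either sample, so the statistic
        # is undefined; report the direction of the difference only.
        return (math.inf, 0.0) if diff > 0 else (0.0, 1.0)
    z = diff / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

A small p-value means the session ratings are unusually high relative to the
App's own history, which is the kind of anomaly treated as rating based
evidence. The same pattern could be applied to counts of positive reviews,
with a test suited to that data.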
The three evidences mentioned above are merged using an unsupervised evidence
aggregation technique, which helps to assess the integrity of mobile Apps’
leading sessions. Statistical hypothesis tests model the Apps’ ranking, rating
and review behaviours to extract all the evidences. This framework is scalable
and can be extended with evidences generated from other domains for detecting
ranking fraud. Finally, the proposed system will be tested with real-world App
data collected from Apple’s App Store over a period of more than two years.
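One way to picture unsupervised, optimization based aggregation is sketched
below. It is an assumption-laden illustration and not the aggregation method
of the proposed system: each evidence type receives a weight, and
exponentiated-gradient steps shrink the weight of an evidence type whenever it
disagrees with the consensus of all evidence types for a session. The function
name `aggregate_evidence`, the score range, the learning rate and the number
of epochs are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def aggregate_evidence(
    scores: Sequence[Sequence[float]],
    learning_rate: float = 1.0,
    epochs: int = 5,
) -> Tuple[List[float], List[float]]:
    """Learn evidence weights without labels and combine the scores.

    scores has one row per leading session and one column per evidence
    type (e.g. ranking, rating, review), each value in [0, 1] where a
    higher value means more suspicious. Returns (weights, combined).
    """
    n_evidence = len(scores[0])
    weights = [1.0 / n_evidence] * n_evidence
    for _ in range(epochs):
        for row in scores:
            consensus = sum(row) / n_evidence
            # Exponentiated-gradient step on the objective
            # sum_i w_i * (x_i - consensus)^2, keeping weights positive.
            weights = [
                w * math.exp(-learning_rate * (x - consensus) ** 2)
                for w, x in zip(weights, row)
            ]
            # Renormalise so the weights stay on the probability simplex.
            total = sum(weights)
            weights = [w / total for w in weights]
    combined = [sum(w * x for w, x in zip(weights, row)) for row in scores]
    return weights, combined
```

No labelled fraud cases are needed: the weights depend only on how well the
evidence types agree with one another, and the combined score per session can
then be compared against a threshold chosen by the analyst.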
This paper reviews various existing methods for web spam detection, which is
related to ranking fraud for mobile Apps. We have also reviewed references on
online review spam detection and mobile App recommendation. By mining the
leading sessions of mobile Apps, we aim to locate ranking fraud; the leading
sessions serve to detect local anomalies in App rankings. The system aims to
detect ranking fraud based on three types of evidences: ranking based
evidences, rating based evidences and review based evidences. Furthermore, an
optimization based aggregation method combines all three evidences to detect
the fraud.