2. Machine learning based anti-fraud solutions: machine learning (ML) algorithms automate the investigation of far more data than a human could review, correlating thousands of types of data points such as time to install, IP address, domain, top cities, countries, devices, time to first transaction after sign-up, abnormal purchase patterns and so on. ML-based solutions learn from data much as a human learns from experience. As you go deeper into the topic, you will find that these solutions may use different algorithms, which determine how they work and how they detect fraudulent installs. Broadly, machine learning algorithms fall into three types based on how they “learn” from data to make the decisions and predictions used to fight mobile ad fraud: supervised (SML), semi-supervised (SSML) and unsupervised (UML) learning.
Supervised learning is so named because it is trained on labeled examples: a set of predictors (independent variables) paired with a known outcome. In other words, it has a guide in the person of a data scientist, who teaches the algorithm how it should operate and what conclusions it should output. As a result, SML shows good results at capturing already known types of fraud, but its effectiveness drops sharply when it comes to new, evolving types of fraud. Unsupervised machine learning, on the other hand, can identify complex processes and patterns by clustering and segmenting input data into groups for specific intervention, without any target variable to predict or estimate. UML algorithms are self-learning, so they are more effective at detecting new fraudulent activity that has not been learned from an existing dataset. However, although UML gives the most accurate result, that result is very difficult to interpret in terms of the causes of fraud. A semi-supervised ML algorithm, which uses a small amount of labeled data together with a large amount of unlabeled data, is well suited to this problem. Using SSML in addition to UML can considerably improve the accuracy and completeness of fraud identification, and it lets you interpret the reasons for fraud in a human-readable form. A well-rounded solution therefore uses both types of algorithms to build predictive data models that help companies counter fraud threats.
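To make the combination of UML and SSML more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method of any particular vendor: the install data, the two features, the threshold and the function names are all hypothetical. The unsupervised step flags installs whose click-to-install time is an outlier without using any labels, and the semi-supervised step spreads two hand-labeled examples to the rest of the data so that each flag comes with a readable reason.

```python
from math import dist
from statistics import median

# Hypothetical installs (made-up data for illustration):
# (click-to-install time in seconds, installs from the same IP in the last hour)
installs = [
    (35.0, 1), (42.0, 1), (38.0, 2), (51.0, 1), (47.0, 1),
    (40.0, 2), (2.0, 40), (3.0, 38), (1.5, 45),
]

def unsupervised_flags(rows, threshold=3.0):
    """Unsupervised step: flag rows whose install time lies far from the
    median, measured in units of the median absolute deviation (MAD).
    No labels are used. The threshold is an assumed value."""
    times = [row[0] for row in rows]
    med = median(times)
    mad = median([abs(t - med) for t in times]) or 1.0  # avoid dividing by zero
    return [i for i, t in enumerate(times) if abs(t - med) / mad > threshold]

def propagate_labels(rows, seeds):
    """Semi-supervised step: every row takes the label of the nearest
    hand-labeled seed row (a very simple form of label propagation)."""
    labels = {}
    for i, row in enumerate(rows):
        nearest = min(seeds, key=lambda s: dist(row, rows[s]))
        labels[i] = seeds[nearest]
    return labels

# Only two installs were labeled by a human analyst (assumed labels).
seeds = {0: "legit", 6: "fraud"}

flagged = unsupervised_flags(installs)
labels = propagate_labels(installs, seeds)

for i in flagged:
    print(f"install {i}: outlier install time, closest labeled example is '{labels[i]}'")
```

In a real system the features would need scaling, and the outlier score and label propagation would be replaced by proper clustering and semi-supervised models. The point of the sketch is only the division of labor: the unsupervised step finds the unusual traffic, and the small labeled set supplies the explanation.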
On the differences between rules-based and ML-based models, we should add that rules-based systems, whose model is based on metrics, flag up to 50 percent of users as suspicious and require additional review by a human. At that point, it can be hard for user acquisition managers to distinguish fraudulent traffic from non-fraudulent, especially considering that fraudsters have begun actively moving from affiliate networks into “trusted” sources. The share of fraud we now observe in such sources ranges from 2 to 14% on average, whereas just a few years ago such numbers were unthinkable. Machine learning solutions, on the other hand, are able to give an unambiguous result: fraud or not fraud. Below is the simplified working model of Scalarrs’ anti-fraud solution: