Methodology

By DANIIL GORBATENKO

The TestaCoin tool uses machine learning to estimate the probability that the project you are interested in is a scam. Read the description below to learn more.

Motivation for using machine learning

The idea of building the TestaCoin scam detection tool with machine learning (ML) came to us after reviewing the existing crypto scam monitoring sites. We noticed that most of them were either lists of short announcements about suspicious projects, given without any detailed explanation, or arbitrarily constructed rating scales.

The problem with the former approach is that it is easy to find faults in almost any crypto project and to make them seem so egregious that even a fairly legitimate project appears to be a total scam. Meanwhile, the key issue with arbitrary rating scales is that they attempt to quantify features that are hard to quantify or loosely formulated. There is also no clear-cut relationship between a project's rating and its likelihood of being a scam.

On the other hand, using ML forces you to only focus on a fixed number of quantifiable parameters or yes-or-no questions about a particular project. It also allows you to directly estimate the probability that the project is a scam given the values of its parameters.
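As a rough illustration of what "quantifiable parameters or yes-or-no questions" can look like, the sketch below turns a project's answers into a weighted sum and passes it through a logistic function to get a probability. The questions, the weights, and the bias are hypothetical placeholders invented for this example; they are not the features or the model that TestaCoin uses.

```python
import math
from typing import Mapping

# Hypothetical yes/no questions and weights, invented for illustration only.
# Positive weights push the estimate towards "scam", negative ones away from it.
WEIGHTS = {
    "team_is_anonymous": 1.5,
    "promises_guaranteed_returns": 2.0,
    "code_is_open_source": -1.0,
}
BIAS = -1.0  # hypothetical baseline log-odds when every answer is "no"


def scam_probability(answers: Mapping[str, bool]) -> float:
    """Map yes/no answers to a probability using a logistic function."""
    # Sum the weights of every question answered "yes"; missing answers count as "no".
    score = BIAS + sum(w for q, w in WEIGHTS.items() if answers.get(q, False))
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    example = {
        "team_is_anonymous": True,
        "promises_guaranteed_returns": True,
        "code_is_open_source": False,
    }
    print(f"Estimated scam probability: {scam_probability(example):.2f}")
```

The point of the sketch is only that every input is a fixed, answerable question and the output is a number between 0 and 1, which is what distinguishes this approach from a free-form rating scale.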



The methodology

For reasons of commercial secrecy and the need to prevent scammers from easily adjusting the appearance of their projects to our tool, we are not going to describe the model we use in great detail.

What we can disclose is the following:

  1. We first gathered the data on a number of definite past scams and clearly legitimate projects
  2. We labelled them as such, which means our ML approach is supervised
  3. We then randomly separated the labelled examples into the training set (the examples that the program sees before arriving at a model) and the test set (the examples which the program does not see)
  4. Then we trained the model to distinguish scams from non-scams on the basis of the training set
  5. We checked various candidate models' performance on the test set and chose the model that performed the best
  6. We derived from the model a formula for calculating for each new project the probability of it being a scam
  7. We developed a questionnaire and a back-end that uses that formula to automatically calculate the probability that the tested project is a scam and emails the estimate and explanations to the client.
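The workflow in steps 1 to 6 can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the actual TestaCoin model: the data is synthetic, the labelling rule is invented, logistic regression is swapped in as a stand-in model family, and the candidate models differ only in a hypothetical learning rate.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    """Probability of 'scam' under a logistic model; weights[0] is the bias."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train(examples, learning_rate, epochs=300):
    """Fit the weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, features) - label
            weights[0] -= learning_rate * error
            for i, x in enumerate(features):
                weights[i + 1] -= learning_rate * error * x
    return weights


def accuracy(weights, examples):
    """Share of examples classified correctly at a 0.5 threshold."""
    hits = sum((predict(weights, f) >= 0.5) == bool(y) for f, y in examples)
    return hits / len(examples)


def make_toy_data(n, rng):
    """Synthetic labelled examples: three yes/no features, invented rule."""
    data = []
    for _ in range(n):
        f = [rng.randint(0, 1) for _ in range(3)]
        label = 1 if f[0] + f[1] - f[2] >= 1 else 0  # 1 = scam, 0 = legitimate
        data.append((f, label))
    return data


def run_pipeline(seed=0):
    rng = random.Random(seed)
    data = make_toy_data(200, rng)                 # steps 1-2: gather and label
    rng.shuffle(data)                              # step 3: random split
    split = int(0.8 * len(data))
    train_set, test_set = data[:split], data[split:]
    # Step 4: train several candidate models (here: different learning rates).
    candidates = {lr: train(train_set, lr) for lr in (0.01, 0.1, 0.5)}
    # Step 5: keep the candidate that scores best on the held-out examples.
    best_lr = max(candidates, key=lambda lr: accuracy(candidates[lr], test_set))
    best = candidates[best_lr]
    return best, accuracy(best, test_set)          # step 6: weights = the formula


if __name__ == "__main__":
    weights, test_accuracy = run_pipeline()
    print("held-out accuracy:", test_accuracy)
    print("probability for answers [1, 1, 0]:", predict(weights, [1, 1, 0]))
```

In this sketch the fitted weights play the role of the "formula" from step 6: scoring a new project only requires its answers and a call to `predict`. Note that when the test set is also used to pick the winning model, its score tends to be optimistic, so a separate validation set is often held out for model selection.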

Examples

To show that our tool works, the table below gives the probabilities our model assigns to two known scams and two clearly legitimate projects. You can check these figures by retesting the projects yourself.

Scams        Scam probability    Legitimate projects    Scam probability
Sparkster    0.74                Litecoin               0.17
Pluscoin     0.9                 iExec                  0.004