Machine learning and math can’t trump smart attackers

When you’ve been fighting black-hat hackers for decades, you learn a thing or two about them. Obviously, they are bad, and they like to play with code. But most importantly, they’re continually learning and you need to keep up if you want to protect your customers’ businesses from their sticky fingers.

Now, if we were a post-truth security vendor, we would talk a lot about how our machine learning makes us fit for the fight, or how mathematics can predict an attacker’s every move. We would also try to downplay the fact that even advanced technologies can be fooled by adversaries.

But at ESET, we value the truth. No matter how smart a machine learning algorithm is, it has a narrow focus and learns from a specific data set. By contrast, attackers possess so-called general intelligence and are able to think outside of the box. They can learn from context and benefit from inspiration, which no machine or algorithm can predict.

Take self-driving cars as an example. These smart machines learn how to drive in an environment with road signs and pre-set rules.

But what if someone covers all the signs or manipulates them? Deprived of such a vital input, the cars start to make wrong decisions that can end in a fatal crash or simply leave the vehicle immobilized.

In cyberspace, malware writers specialize in such malicious behavior. They try to hide the true purpose of their code by “covering” it with obfuscation or encryption. If the algorithm cannot look behind this mask, it can make a wrong decision and label a malicious item as clean, causing a potentially dangerous miss.
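
To make the "mask" idea concrete, here is a minimal sketch in Python. It assumes a deliberately naive static scanner that only searches raw bytes for a known marker; the marker string, the single-byte XOR key and the function names are all hypothetical and chosen for illustration, not taken from any real product or malware family.

```python
# Hypothetical marker a naive static scanner looks for (not a real signature).
SIGNATURE = b"delete_all_files"


def static_scan(sample: bytes) -> bool:
    """Return True if the raw bytes contain the known marker."""
    return SIGNATURE in sample


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the data."""
    return bytes(b ^ key for b in data)


# The same payload, once in the clear and once behind a trivial "mask".
plain = b"header::" + SIGNATURE + b"::footer"
masked = xor_bytes(plain, 0x5A)

print(static_scan(plain))                    # True: the marker is visible
print(static_scan(masked))                   # False: same content, but masked
print(static_scan(xor_bytes(masked, 0x5A)))  # True: visible again once unmasked
```

The payload never changes, only its appearance does, and that alone is enough to flip the verdict of a scanner that judges by appearance.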

However, recognizing the mask doesn’t always reveal the code’s true nature, and without executing the sample there is no way of knowing what is under the hood. To execute samples safely, ESET uses a simulated environment, known as sandboxing, which many of the post-truth vendors dismiss. They claim their technology can recognize malice simply by looking at a sample and doing the “math”.

How would that work in real life? Try and determine a house’s price just by looking at a picture of it. You can use some features, such as the number of windows or floors to get a rough estimate. But without knowing where the house is located, what is inside, and other details, there is a high probability of error.

On top of that, mathematics itself contradicts these post-truth claims. Determining whether a program will behave maliciously from its external appearance alone is what’s known as an “undecidable problem”, as demonstrated by Fred Cohen, the computer scientist who formulated the definition of a computer virus.
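
The intuition behind that undecidability can be sketched in a few lines of Python. This is a simplified illustration of the diagonal-style argument, not Cohen’s formal proof: the names `make_contrarian`, `always_flag` and `always_clear` are hypothetical, and "behaving maliciously" is reduced to returning a string.

```python
from typing import Callable

Program = Callable[[], str]
Detector = Callable[[Program], bool]  # True means "this program is malicious"


def make_contrarian(detector: Detector) -> Program:
    """Build a program that asks the detector about itself and does the opposite."""
    def contrarian() -> str:
        if detector(contrarian):
            return "benign"      # flagged as malicious, so it behaves
        return "malicious"       # cleared as clean, so it misbehaves
    return contrarian


def verdict_is_correct(detector: Detector) -> bool:
    """Compare the detector's prediction with what the program actually does."""
    program = make_contrarian(detector)
    predicted_malicious = detector(program)
    actually_malicious = program() == "malicious"
    return predicted_malicious == actually_malicious


def always_flag(program: Program) -> bool:
    return True


def always_clear(program: Program) -> bool:
    return False


print(verdict_is_correct(always_flag))   # False
print(verdict_is_correct(always_clear))  # False
```

Whatever verdict a detector of this kind returns without running the program, the contrarian program is built to prove it wrong, which is why no detector can be right about every possible program.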

Moreover, some problems in cybersecurity require so much computational capacity, or take so long to solve, that even a machine learning algorithm cannot handle them effectively, which makes them practically undecidable.

Now put all this information into an equation with a smart, dynamic opponent and your endpoints can end up infected.

ESET has considerable experience with intelligent adversaries and knows that machine learning alone is not enough to protect endpoints. We have been using this technology for years and have fine-tuned it to work with a variety of other layers of protection that are under the hood of our security solutions.

Moreover, our detection engineers and malware researchers constantly supervise “the machine” to avoid unnecessary errors along the way, ensuring that detection runs smoothly without bothering ESET business customers with false positives.

The whole series: