The Hype-free blog at http://hype-free.blogspot.com/2009/12/congratulation-to-av-comparatives.html yesterday mentioned the latest AV-Comparatives round of test reports.

I have a pretty jaundiced view of testing organizations in general: after all, I see some pretty awful tests proclaimed by the testers and others as in some sense authoritative (or, even worse, AMTSO-compliant). But it's good to see AV-Comparatives recognized for its continuing efforts to provide a "great and impartial service", as I know how hard Andreas and Peter work.

I had to take issue, though, with Hype-free's surprise at "the high detection rates in the dynamic test - upward of 90% ... I would expect AV products to be around 60-70% effective against new threats." I think this is confusing two issues.

I wish anti-malware products could manage closer to 100% detection of new threats, but that's not very feasible unless they're used as part of a multi-layered defensive strategy incorporating some form of generic filtering. But I think it's a little strange to assume that testers can find and identify threats better than vendors. Of course, some mainstream testers do have enormous collections, and may have sources and resources that aren't necessarily available to vendors, such as their own honeypots/honeynets/crawlers, or access to samples from other vendors and testers. But most mainstream vendors also share samples with their peers through inter-lab or inter-researcher agreements, or through third parties. Often the same third parties, of course.

So it's not so surprising if, in a well-implemented test, mainstream products not only score pretty highly, but are pretty well-clustered around a fairly high mean. It's actually more surprising if there's a huge disparity between products from which you'd expect roughly comparable performance: this is likely to signify a methodological anomaly such as an over-specialized test focus, a sample selection bias, sample validation bias/error and so on.

Does this mean that detection testing isn't important? Well, detection certainly matters, and it would take a braver man than I to say that detection testing is irrelevant. However, there's a pretty wide margin for error - perhaps variation would be a better word in this instance - which goes some way towards explaining why the top-ranking products vary between tests. And, of course, there's a lot more to a good product than simple detection rates, which is why it's good to see more emphasis (as in AV-Comparatives' testing) on other functional aspects such as performance. In fact, there is an AMTSO paper on performance testing (as opposed to detection testing, of course) in progress right now.

While (as in the AV-Comparatives reports) there is a trend towards calling these convergent test methodologies "whole product testing", I'd like to see good testers broaden their scope even further and work towards standardized ergonomic testing, for example.

Make no mistake though: this will be no easy option. True dynamic testing is more difficult and more resource-intensive than static testing, and it's depressing to see some testers announcing that their tests are now "dynamic" as if the mere label in itself conferred validity. True whole product testing is an even more daunting prospect. Kudos to AV-Comparatives for having taken a significant step towards it.

David Harley
Director of Malware Intelligence