Wednesday, 14 April 2010
Static Analysis: A Science or An Art?
Last Monday marked the beginning of my seventh year at Fortify. Over the last six years, I've participated in several efforts to perform scientific experiments in the area of static analysis. Some focus on comparing different static analysis tools. Others aim to establish guidelines for what code static analysis tools should be able to analyze. Still others want to define the notion of compliance as it applies to the tools, that is, a set of requirements the tools need to meet in terms of the kinds of vulnerabilities they detect and the results they generate. I think static analysis is still a bit of an art. The knowledge we gain from such efforts is potentially enormously useful, but we must address the challenges involved before the outcomes of these endeavors become truly beneficial. Some of the challenges I've personally run into are discussed below.
- Different tools generate different results. Time and time again, we see differences in the output generated by static (and not just static) analysis tools. Even though the tools claim to look for the same kinds of problems, the techniques they use differ enough to produce results with very little overlap. Even where the results do overlap, correlating them is very difficult because the tools differ in output format and in the metadata associated with each finding.
- Evaluating generated results is not enough. When comparing static analysis tools, evaluating the generated results is certainly important, but considering other aspects of how the product is used is critical. A tool might produce excellent findings with a low false positive rate and still be completely unusable because it does not integrate well into build systems, does a poor job of managing and tracking results, or cannot be customized to fit specific user needs.
- The definition of a false positive is highly subjective. To some, a false positive means reporting something that is not true under any circumstances, while to others, a result that is not interesting in the current context is also a false positive. This brings me to the last and probably most interesting challenge:
- Tools perform differently in different contexts. The same tool can perform very poorly in one situation and remarkably well in another. It all depends on the kinds of constructs and libraries used in the code, the size of the codebase, the build system used to build the code, the application profile (whether it is internal or externally facing), whether the user wants to treat the database as a source of untrusted data, and many other factors that we sometimes don't even think about in advance. Obviously, it is impossible to build one tool that understands all of these contexts equally well out of the box. That's why it's so important to build a product that is extremely flexible: one that can be easily customized, configured, and extended in whatever ways make the most sense. It is not easy!
Perhaps things will change in the future, and the techniques the tools use to perform analysis, along with the interfaces for interacting with them, will converge. But I don't think we're there yet. That's why, before we attempt to compare tools scientifically and define binary compliance guidelines that require products to generate a specific set of results on a specific set of benchmarks, we need to acknowledge and address the challenges that the current state of the art (pun intended) presents.