Friday, 20 February 2009
SHA-3 Round 1: Buffer Overflows
« Gartner Magic Quadrant for Static Analysis | Main | SHA-3 Analysis Details »NIST is currently holding a competition to choose a design for the SHA-3 algorithm (Bruce Schneier has a good description of secure hashing algorithms and why this is important). The reference implementations of a few of the contestants have bugs in them that could cause crashes, performance problems, or security problems if they are used in their current state. Based on our bug reports, some of those bugs have already been fixed. Here's the full story:
The main idea behind the competition is to have the cryptographic community weed out the less secure algorithms and choose from the remainder. A couple of us at Fortify (thanks to Doug Held for his help) decided to do our part. We're not hard-core cryptographers, so we decided to take a look at the reference implementations.
This competition is to pick an algorithm, but all of the submissions had to include a C implementation, to demonstrate how it works and test the speed, which will be a factor in the final choice. We used Fortify SCA to audit the 42 projects accepted into Round 1. We were impressed with the overall quality of the code, but we did find significant issues in a few projects, including buffer overflows in two of the projects. We have emailed the submission teams with our findings and one team has already corrected their implementation.
Confirmed issues:
| Implementation | Buffer Overflow | Out-of-bounds Read | Memory Leak | Null Dereference |
|---|---|---|---|---|
| Blender | 1 | 0 | 0 | 0 |
| Crunch | 0 | 0 | 0 | 4 |
| FSB | 0 | 0 | 3 | 11 |
| MD6 | 3 | 2 | 0 | 0 |
| Vortex | 0 | 0 | 1 | 15 |
One of the projects with buffer issues was MD6, the implementation provided Professor Ron Rivest and his team. All of the problems came back to the hashval field of the md6_state struct:
unsigned char hashval[ (md6_c/2)*(md6_w/8) ];
The buffer size is determined by two constants:
#define w md6_w /* # bits in a word (64) */ #define c md6_c /* # words in compression output (16) */
At several points, this buffer is read or written to using a different bound:
if (z==1) /* save final chaining value in st->hashval */ { memcpy( st->hashval, C, md6_c*(w/8) ); return MD6_SUCCESS; } Further analysis showed that ANSI standard layout rules would make incorrect behavior unlikely, but other compilers may have allowed it to be exploited. The MD6 team has doubled the size of the vulnerable buffer, which eliminated the risk. In this case, Fortify SCA found an issue that would have been difficult to catch otherwise.
The other buffer overflow was found in the Blender implementation, from Dr. Colin Bradbury. This issue was a classic typo:
DataLength sourceDataLength2[3]; // high order parts of data length ... if (ss.sourceDataLength < (bcount | databitlen)) // overflow if (++ss.sourceDataLength2[0] == 0) // increment higher order count if (++ss.sourceDataLength2[1] == 0) // and the next higher order ++ss.sourceDataLength2[3]; // and the next one, etc.
The developer simply mistyped, using 3 instead of 2 for the array access. This issue was probably not caught because it would not be exposed without a very large input. The other issues we found were memory leaks and null dereferences from memory allocation.
This just emphasizes what we already knew about C, even the most careful, security conscious developer messes up memory management. Some of you are saying, so what? These are reference implementations and this is only Round 1. There are a few problems with that thought.
Reference implementations don't disappear, they serve as a starting point for future implementations or are used directly. A bug in the RSA reference implementation was responsible for vulnerabilities in OpenSSL and two separate SSH implementations. They can also be used to design hardware implementations, using buffer sizes to decide how much silicon should be used.
The other consideration is speed, which will be a factor in the choice of algorithm. The fix for the MD6 buffer issues was to double the size of a buffer, which could degrade the performance. On the other hand, memory leaks could slow an implementation. A correct implementation is an accurate implementation.
We will put out a more detailed report on all the results soon.
Technorati Tags: sha-3 buffer overflow
[Trackback URL for this entry]
Nice post. Do you guys plan on releasing all your confirmed findings? I believe these would be very interesting resources.
Why?
These reference code is _NOT_ the code that will be used in real life applications. It was written for a specific input, and no one should ever use it in any application...
A complete waste of time...
SHA-3 Round 1: Buffer Overflows
i SHA-3-implementationer Fortify har postat på sin blogg om en undersökning av säkerheten i referensimplementationerna av SHA-3-kandidaterna. Fortify har använt sitt verktyg
Kilmo is really wrong on this. Reference code lasts and lasts and lasts. In fact, people are reluctant to change it, precisely because they feel that it has been looked at carefully.
Please continue the good work.
Dyanmo is also wrong;
The point here is analyzing the underlying algorithms, not some arbitrary pseudocode implementation.
Why not analyze these functions for something that /matters/? Like collision resistance, differential cryptanalysis resistance, hash length extension attacks, etc.?
Oh, right, because that requires actual knowledge of cryptography. You fail.
Nick,
C is "arbitrary pseudocode?" Since when? Do all the people programming in C know their code can't be used in the real world?
I think the key was Name of a security developer *AND HIS TEAM*. What do we know of them? Apaprently they ARENT the most security concious developers. C is by far the most powerful flexible language, and a truely aware team can write secure code. To attack C is ludicrous. Yeah! Lets write everything in JAVA! or interpret/babysit everyone! This isn't the answer. If we truely make security a focus, C coders can be trained to do things right. Unfortunatly now, usually the focus is features, minimal budget, and short schedules. Note security wasn't listed.
The point of the competition isn't to produce great code; it's the high level algorithms behind the code.
Fortify should be reading the PDF algorithm descriptions, not the crap people threw together in C.
Bugs in the reference implementations have the potential to effect performance. Under-allocating a buffer can also skew the cost estimate for an embedded system. In other words, these bugs have the potential to effect the outcome of the competition.
The purpose here is not to judge the algorithm based on some implementation errors, it's to get the implementation errors fixed so that the best algorithm is selected and so that mistakes aren't propagated into production code.
Mike, if you offer an entry into a competition to choose an algorithm which will see world-wide distribution and be relied upon for security, and you throw together some crap in C for the reference code, then that indicates the quality of your entry.
As has been pointed out several times, the reference code of the top algorithms will both influence the winning choice and will no doubt inform/become several real-world implementations.
Performing this analysis on the reference code was a very good idea, which will contribute towards a better product in the end. Well done!
P.S. Nick, there's plenty of work to go around. If you want to do a cryptographic analysis of collision analysis, etc - go for it! Just because Fortify chose to analyse a different aspect of the code, doesn't mean that you can't contribute too. Unless you simply like criticising other people's work without doing any yourself...
I think some commenters are missing the point here - these code tests are only one of the tests being applied to the algorithms. There is, of course, exhaustive cryptanalysis being applied to them in parallel to check for the more fundamental problems in the algorithms. However, testing the reference code for standard coding errors is vital as reference code _does_ get used a lot, even where faster or more secure implementations might exist elsewhere. Having this kind of code audit is vital to the security process but it is by no means the only step.
Here's another way to think about the use of reference implementations in production code:
The reference implementation will be used to validate the production code. If the reference implementation has defects, the production code will reliably reproduce those defects.
What does "reference" mean? It's a well-known high-quality standard that everything is compared with.
On the audio production side we have reference recordings, reference microphones, reference monitors, reference headphones, etc. None of those are random crap to show a proof of a concept or something.
kilmo and Nick are just trolling. Anyone as incompetent and anti-social as them wouldn't survive five minutes when let out of the basement.
Sorry to all those in the ivory tower but in the real world the reference implementations often show up in real code (especially open source). The purpose of the competition is not to produce code snippets for use in real code but that is what happens. The comments of Brian, Brian and DaveM are spot on. In addition this is not just Fortify grandstanding but rather them providing valuable feedback to an important program. I personally appreciate this sort of industry involvement in the process. Correcting these defects early in the process is critical to insuring that clean, safe and secure code propagates in to production software.
It is certainly appreciated that work is put in to improve the implementations during the competition (my group did something similar for the Java parts of AES, so I know how much work it can be).
However I think it is not really efficient at this stage to insist on secure programming for submission implementations. For the simple reason that there are 42 submissions, and 41 of those will be thrown away, more or less. There isn't much point in making the 41 secure; better off to save the energy until "the one" is found. Then concentrate the energy, no?
1. What does it say about an algorithm that it's too complex for its designers to implement correctly?
2. All of the bugs described here would be bugs in other languages - they might be caught at different times, and/or cause different behaviours, but to use this as an indictment of C seems a little cheap.
4. Implementing a three byte counter manually as array elements seems to be a waste of compiler.
RE: The comment about the security problems in the reference code being a "non-issue" brought to mind something my old boss used to say: " I never met a prototype I didn't ship"
SHA-3 Round 1: Buffer Overflows
a competition to choose the next standard hash scheme. When the corpus of reference implementations were examined with Fortify’s code analysis tool, it was found that two of the submissions had buffer overflow errors. These problems don’t affect the cryptographic correctness of the
@Alun Jones Buffer overflows are certainly not a problem common to most modern languages. Of the four problems analyzed, only null dereferences are an issue.
They're a symptom of C's lack of real memory management. That the programmer has to allocate and deallocate static chunks of memory leads to mistakes. That C will let you walk off the end of an array or string without so much as a peep... until it crashes horribly. This is a major cause of security holes and crashes in software.
And it's not that the algorithm is too complicated to implement, its that memory management is too complicated to implement. This is why all modern languages handle all this for you, growing and shrinking data structures as necessary. The down side is less control, less performance. The up side is you never have to worry about a buffer overflow in Perl, Python, Ruby, Java, Haskell, Javascript.
C clearly trades off safety for performance and control. It is portable assembly. It is by design, but that choice has severe consequences.
Of course, all modern languages are all written in C...
C++ Coder writes, at 22 Feb 2:07 pm:
"I think the key was Name of a security developer *AND HIS TEAM*. What do we know of them? Apparently they AREN'T the most security conscious developers."
http://people.csail.mit.edu/rivest/
http://en.wikipedia.org/wiki/Ron_Rivest
Bravo, Fortify. I think the gold here is that Fortify found significant errors that human reviewers are likely to miss.
While I'm still a fan of manual code review as critical to risk reduction, I can't deny that there are some things that static analysis does very, very well. The OWASP ASVS does a good job of breaking this down.
Definetely very surprising, especially in the more 'pedigree' MD6. I'd already decided auditing the reference code was 100% pointless, guess I'll rethink that and go hunting.
No problems found in CubeHash's implementation I see. I wonder if djb will back the quality of his code with prizes :)
mandolin: ORLY? Why do people think great mathematical skills mean you're also a great programmer? A mathematician may be good at designing concepts and high-level algorhythms but actual programming is frequently too low-level. It's only human to overestimate your skills in one area you know a bit about if you excel in a related area.
Writing *correct* code has nothing to do with *security*. Of course, an incorrectly implemented, yet securely designed algorhythm, will likely compromise security but correctly implemented code is not secure per se at all. You wade deep in snake oil already - or just fail horribly at human language - if you use these terms as loosely.
I can tell that most people coding in C I've known, have never read any of the C standards. They only practiced learning by doing and sooner or later they'll become totally ignorant and arrogant towards the standards, so that whenever someone points out what they're doing is either not portable or obviously not standards-conform, these guys frequently ignore you, even if you provide a correction, or worse try to offend you and bash the standards for being stupid or whatever.
Schwern: C has never been a portable assembly language. Unfortunately, many people see it this way and try to use the same "tricks" that were acceptable in assembly language on a specific machine but a strictly forbidden in C. I see a parallel between modern languages and modern politics. C is one of these languages which give you a lot of freedom but also responsibility. Nowadays nobody wants to take responsibility anymore but everyone is crying for security despite knowing that perfect security exists nowhere except in theory perhaps. Likewise, this also means less freedom. I'm not saying this means anything in this context, I just find it interesting that this parallel exists.
By the way, you can do high-level programming in C, too. It'll simply take more effort but the result is likely more readable and maintainable. There are no rules that you have to obfuscate your code, use unintelligible identifiers, use brain-dead library functions etc. People who do this without any need and are even proud of it are simply abusing the freedom given to them by C.
Iang: Then why are they already providing optimized code for MD6?
Last but not least, it's fairly pathetic to see these professionals don't seem to use verification tools that have been around for years.
DaveH,
The FPRs are from an unreleased version of Fortify SCA, so we will not be making them generally available.
Regards,
Doug
Douglas Held,
Can you tell how many false positives Fortify found in each of these reference implementations and how many hours of manual review were required to sort out the false from true positives?
I'm curious about the performance of the unreleased version of SCA.
SHA-3 Round 1: Buffer Overflows
para seleccionar el algoritmo que se utilizará para SHA-3. Pues bien; una primera aproximación con la herramienta de análisis de código de Fortify ya ha detectado varios bugs en algunos de los algoritmos candidatos. Entre los más destacables, varios








Good write-up. You're absolutely correct, too - crypto code, as an underlying routine that's linked all over the place, has to be pretty much perfect.