I have previously recommended a course entitled Software Exploits by Open Security Training, and similarly, a book “called The Shellcoder’s Handbook: Discovering and Exploiting Security Holes”. Using some of the examples presented in the book, I thought it would be a good idea to explore how the theory on real code vulnerabilities stands true.
Copying data
One of the simplest scenarios in which vulnerable code can manifest itself - which can usually be spotted immediately - goes hand in hand with the copying of buffer data using functions such as strcpy, without performing any check on the size of the copy.
The above shows part of the vulnerable code on the University of Washington's IMAP server, which was corrected in 1998. We can see that it never checks the size of the data in mechanism before copying to tmp, which could result in a buffer overflow. The error is easily resolved by a check using strlen (mechanism) before copying, or by using n bytes copy functions, like strncpy.
As you can imagine, it’s very unusual these days for such vulnerabilities to be found in open-source applications; and, where they do exist, they are quickly corrected.
Wrong indexes in loops
When the indexes or cut conditions in iterative loops are badly programmed, it can result in more bytes being copied than was intended: either one byte (off-by-one) or several (off-by-a-few). Take, for example, this old version of the OpenBSD FTP demon:
While the purpose of the code is to reserve one byte for the null character at the end of the string, when the size of name is equal to or greater than npath, and the last byte to be copied is “ (double quotation mark), we can see that the index i increases by more in the highlighted command. This results in the null character being inserted one byte beyond the limit, generating an overflow.
Loops that parse strings or handle user inputs tend to be good places to look for vulnerabilities, as shown in the below example, again taken from the University of Washington's IMAP server (CVE-2005-2933):
In line 20, there is a search for a double quotation mark within the string being parsed. If it is found, the loop in line 22 will copy until a second double quotation mark is found. Clearly, if a string is entered that only has one such character, the loop will keep copying, resulting in a stack overflow.
Integer overflow
One thing that can often happen when attempting to avoid an excessive amount of data copying to a buffer - by checking the size to be copied - is that the number of bytes to be copied exceeds the largest number that can be represented by the data type, rotating toward small values. The check ends up being completed only because the size is interpreted as a small number, but the copy is actually made up of a large number of elements. This results in an overflow of the destination buffer.
An example of this is when a multiplication is used that produces very large results, like in this OpenSSH code, which appears in versions prior to 3.4:
We can see that nresp stores the size of a packet entered by the user. In the highlighted command, nresp is multiplied by the size of a cursor (4 bytes). Consequently, if nresp stores a value greater than 1073741823, the multiplication will exceed the maximum value of unsigned int (4294967295). In other words, malloc will reserve a small quantity of memory and the loop will copy a large quantity of data, producing an overflow. This type of vulnerability is still relevant and common nowadays.
These are just a few types of vulnerabilities, and I would recommend looking at the book I mentioned earlier for more examples. While some, like the strcpy type, are now very uncommon in today's open-source applications, their presence lives on in closed or proprietary applications, or those that have never undergone a code audit.
As a general rule, when looking for vulnerabilities in open-source code, it is advisable to check portions of code that are rarely executed, with special functionalities, as they are more prone to errors. Furthermore, as already mentioned, when developers apply their own functions for parsing or handling strings, data frames, and so on, it increases the chance that they will make errors.