Code security review is quite a hot topic nowadays. Many developers don't really think much of security when writing code. There are many great tools which can scan code in different programming languages for typical bugs, such as PVS Studio or checkmarx. However, many companies skip static code analysis at all, and some developers just ignore compiler or analysis tool warnings. Some bugs can be easily found manually, while being ignored by automated tools.
This short cheat-sheet describes typical vulnerabilities and unsafe code patterns, which were found in code of real projects. Many of the bugs were not reported by static analysis tools. The aim of this page is to show some patterns, which you have to pay attention to when manually analyzing the code. The code patterns you'll see here should trigger you when you encounter them in real code.
The cheat-sheet will somewhat focus on C/C++
, but will also touch some other languages.
- Unintended default enum values (Any language)
- Unhandled, unexpected, mishandled exceptions (Any language)
- Safe unsafe functions (C/C++)
- Race conditions and their possible consequences (Any language)
- Manual buffer manipulation and input parsing (Any language)
- Missed switch cases (All languages)
- Possibly vulnerable libraries or their incorrect usage (All languages)
- TOCTOU with file operations (All languages)
- Improper logging (All languages)
- Passing secrets to another program via the command line (All languages)
- Excessive privileges for a resource (All languages)
Enumeration is a well-known and wide-spread construct of many programming languages. Enumeration defines a set of distinct names in the common named scope and assigns some values to them. Unfortunately, enum mishandling may lead to critical vulnerabilities in some cases. Let's look at the following example:
enum class UserPrivileges {
SuperUser,
Admin,
User,
Guest
};
Nothing bad happens here yet, but you may have noticed, that the UserPrivileges::SuperUser
has the default value 0
. Now let's take a look at this simplified code fragment:
struct User {
string name;
timestamp_t loginTime;
bool isLoggedIn;
UserPrivileges privileges;
//...
};
User createDefaultUser(/* ... */) {
//...
return User {};
}
void loginUser(User& user, /* ... */) {
//authenticate user,
//fill user privileges,
//set user.privileges,
//set user.isLoggedIn = true
}
Now things get interesting. As you can see, the createDefaultUser
function creates a default user with privileges
field initialized to value 0
by default. This corresponds to the UserPrivileges::SuperUser
enum value. This means, that when a new user appears in the system, this user gets the SuperUser
privileges by default! Sometimes instead of the default constructor call or value initialization you will see a line like memset(&user, 0, sizeof(user))
.
Now you need to look through the code. There may be places, which first check the isLoggedIn
boolean flag, for example:
void readSecureData(const User& user) {
if (!user.isLoggedIn) {
throw NotAuthenticatedException();
}
//...
}
Such code fragments are secure, because isLoggedIn
flag is false
for a default user. However, there may be code fragments, which rely exclusively on the privileges
field check:
void changeDeviceSettings(const User& user) {
if (user.privileges != UserPrivileges::SuperUser) {
throw NotEnoughPrivilegesException();
}
//...
}
And here things can turn really bad. If there is a way to call this function for the unauthenticated user while bypassing other checks, privilege escalation becomes possible, and I've seen such bugs in real code. Basically, a guest user can execute some endpoints or functions, which must be accessible only to the authenticated super user!
Advice for code reviewers and for developers: pay attention to default enum values. Pay attention to default values of other critical fields of classes and structs. Static code analysis tools may not report such problems.
Exceptions is a big part of many programming languages. Unfortunately, many programmers fail to handle them correctly, which may lead to vulnerabilities in code. Let's look at some of such cases.
Some developers (especially C++) sometimes pretend, that exceptions do not exist at all. They don't use try
, catch
or throw
keywords, they handle errors using return codes. Some of them even pass the -fno-exceptions
flag to their C++ compiler. There are several major negative consequences of ignoring exceptions in languages, which support and use exceptions.
-
The first case is completely ignoring exceptions (by passing the
-fno-exceptions
flag). This basically disables all the exception-related keywords of the compiler. But the exceptions may still be there! They are extensively used in the standard C++ library. If a developer uses this library, it may throw an exception regardless of the-fno-exceptions
flag. The library must be compiled with the-fno-exceptions
flag, too! But this still does not solve the problem completely. Allthrows
in the standard library are now replaced with calls to theabort
function, which kills the program instantly instead of throwind an exception. There may be a bunch of denial-of-service vulnerabilities in such code! An attacker just needs to find a way to trigger any exception in the standard library, which is used by the code. -
The second common case is forgetting to handle exceptions. Let's look at this example code:
int parseHexInteger(const std::string& str) noexcept { if (str.empty()) { return 0; } return std::stoi(str, nullptr, 16); }
Here, the author of this code forgot to check, which exceptions may be thrown by the
std::stoi
function. There are several string-to-number conversion functions in C++, some of which do not throw exceptions, and some of them do. This particular one throwsstd::out_of_range
andstd::invalid_argument
exceptions, which the developer didn't expect. If this function is used to parse the untrusted input, it is extremely easy to kill the program, thus causing denial-of-service. -
Exception mishandling is the third case we'll look at. Consider the following pseudo-code based on some real code:
boolean checkSignatures(List<Blob> signedBlobs) { bool checkPassed = false; for (Blob blob : signedBlobs) { try { checkPassed = checkSignatureForBlob(blob); } catch (Exception) { Logger.log("Incorrect signature for blob"); } if (!checkPassed) { break; } } return checkPassed; }
The developer checks digital signatures for an array of blobs. If any signature is incorrect, the function returns
false
. Does it? Take a minute and try to find the bug yourself.This function checks the list of blobs one by one. The first blob may be signed correctly, so the
checkPassed
variable is now set totrue
. However, all other blob signatures may be intentionally corrupted, thus causing thecheckSignatureForBlob
function throw exceptions for all of them. Notice, that thecheckPassed
variable value remainstrue
in this case! When all blobs are enumerated, thecheckSignatures
function returnstrue
, even that only the first blob was signed correctly! This may lead to anything like privilege escalation, denial of service, etc, because the program will process untrusted unsigned data.
Advice for code reviewers: pay close attention to exception handling in code. Are exceptions ignored? Are some exceptions missed? Are they handled correctly?
Advice for developers: pay close attention to exception handling. Check all the external functions you use. Do they throw exceptions? Do you handle them correctly? Do not ignore exceptions in exception-enabled languages, as the performance boost you (probably) get may not be so drastic compared to the bugs you may unintentionally bring to your code.
Many companies currently enforce usage of so-called "safe" C/C++
functions instead of unsafe
ones. For example, developers must use strcpy_s
instead of strcpy
or scanf_s
instead of scanf
. Some companies enforce these rules via code checking during automated build, some of them even develop their own "safe" libraries, which provide a set of such functions.
Of course, all of the functions in the language are safe. It is only their usage that may be unsafe. If a programmer doesn't know how the function works, it is easy to make a mistake and write code with a buffer overflow (or some another) vulnerability.
Unfortunately, enforcing rules for inexperienced programmers is not the best tactics. Programmers will use whitelisted "safe" functions, yes. But they still may use them incorrectly, for example:
void processBuffer(const char* untrustedBuffer, size_t bufferSize) {
char tempBuf[64];
//...
//...
strcpy_s(tempBuf, bufferSize, untrustedBuffer);
//...
//...
}
The second argument of the strcpy_s function is destsize
- the size of the destination buffer you are copying the string to. However, in the above code the developer passed the source buffer size of an untrusted buffer.
The function is "safe", but it still will overflow the tempBuf
buffer. And I've seen similar misusage in the real code.
Advice for code reviewers: even if the function is "safe", check its arguments. They may be passed incorrectly, or they may be swapped.
Advice for developers: if you code in C++, just learn modern C++ and don't use low-level functions at all. There is an extensive standard library, which completely removes the need to use these low-level functions. If you still need to write low-level code, carefully check, which arguments and in which order you pass to low-level functions.
Race conditions and data races are quite common in code. Multithreaded code is hard to develop. Many programmers incorrectly utilize synchronization facilities in the language of their choice, or forget to synchronize access to some shared writable variable at all. Such bugs are sometimes hard to exploit, but sometimes they are extremely handy and easy to use for an attacker. Possible consequences of race conditions are program crashes, deadlocks and livelocks, or simply unintended program behavior. TOCTOU vulnerabilities are also race conditions, but for now I will talk about inter-thread race conditions. Let's look at the following code snippet (based on real code):
import org.restlet.resource.*;
public class LoginHandler extends ServerResource {
private static int failedLoginCount = 0;
//...
@Post
public LoginResponse login(LoginRequest request) {
if (LoginHandler.failedLoginCount == 3) {
logger.log("user is blocked");
return new LoginResponse("blocked");
}
if (!loginService.authenticate(request)) {
LoginHandler.failedLoginCount = LoginHandler.failedLoginCount + 1;
ScheduleUnblockInFiveMinutes();
return new LoginResponse("failed");
}
LoginHandler.failedLoginCount = 0;
return new LoginResponse("success");
}
}
The code seems completely fine, until you realize that it may be called by many threads of Apache Tomcat server simultaneously. When this happens, two threads may simultaneously pass the LoginHandler.failedLoginCount == 3
check, and then both may increment the LoginHandler.failedLoginCount
value. If the LoginHandler.failedLoginCount
was 2
, then it will become 4
in this case, and the check LoginHandler.failedLoginCount == 3
will always be passed by all future login attempts, enabling the attacker to perform unlimited brute-force of the user credentials!
Advice for code reviewers: carefully check multi-threaded code. It may be hard, but it may be worth your time. Such vulnerabilities may not be reported by static code analysis tools.
Advice for developers: carefully check the multi-threaded code you write. Ask your colleagues to review such code. Do not ever code by permutation when working with several threads!
Well, this is a wide topic, and very many vulnerabilities and bugs have been found in code which parses something (file formats, user input, etc). With modern languages it is much more difficult to do something wrong (compared to C, for example) when parsing untrusted input, but even with high-level language facilities it is possible to make mistakes.
Let's take a look at some examples.
//Parses the number out of the "value=123" or "value=h23f" string.
//"h" character means hexadecimal number.
//Returns 0 for invalid input.
int parseNumber(const std::string& input) {
auto pos = input.find("value=");
if (pos != 0) {
return 0;
}
auto number = input.substr(std::string("value=").size());
int base = 10;
if (number.at(0) == 'h') { //hex string
number = number.substr(1);
base = 16;
}
try {
return std::stoi(number, nullptr, base);
} catch (const std::exception&) {
return 0;
}
}
This code looks fine. It handles exceptions from the std::stoi
function, it checks if the string starts with value=
substring... Can you spot a bug?
What will happen if we pass the "value="
string to this function? The number.at(0)
will throw the std::out_of_range
exception! If the calling code doesn't handle it, the program will crash.
To the next one:
class VideoStream {
std::stringstream dataStream_;
public:
void OnDataReceived(const std::string& data, bool keyFrame) {
if (keyFrame) {
ProcessFrame();
dataStream_.str("");
}
dataStream_ << data;
}
void ProcessFrame() {
//...
}
//...
};
Here the developer is parsing the video stream from the possibly untrusted client. Data from the client is appended to the dataStream_
buffer until the keyFrame
flag is received. But what if the client never sends the keyFrame
flag? The dataStream_
buffer will infinitely grow and may take up all the available memory, disrupting other services on the server and possibly crashing!
The next example:
using CharCount = std::array<int, 256>;
CharCount countChars(const std::string& input) {
CharCount count {};
for (char ch : input) {
++count[ch];
}
return count;
}
This function counts, how many times each character is used in the input string, and returns the CharCount
array with the calculated counts for each character. What can go wrong here?!
Well, everything can. At first, the developer has incorrectly assumed that the maximum value for the char
type variabe is 256
. The developer should have used the numeric_limits<char>::max()
to determine this.
But what is much worse, the char
type can be either signed
or unsigned
, depending on the compiler settings. If this type is signed
for your compiler, then the attacker may overwrite memory outside of the count
array, passing the string like this: "test_str\xde"
. The "\xde"
character may have the -34
value (if represented as a signed char
). Basically, the attacker now can address negative array offsets, writing outside of the count
array memory!
Let's conclude with a really simple one (even detectable by automatic analysis tools, even reported by a compiler), but which I still encountered several times:
std::string str = getUntrustedString();
auto pos = str.find("text");
if (pos < 0) {
return;
}
The pos
variable is unsigned
(of the size_t
type). Thus, it can never be less then zero, and the check will always pass. The correct check is:
std::string str = getUntrustedString();
auto pos = str.find("text");
if (pos == std::string::npos) {
return;
}
Advice for code reviewers: check parsing code carefully, especially when it may parse untrusted user input. Read it manually, check it using automated tools, write fuzzing tests for it.
Advice for developers: Do not ever code by permutation when parsing something! Do not trust user input in any way, check it as carefully as possible.
This is the one automated tools may tell you about in some cases. Let's start with an example:
const int ChangePermissions = 1;
const int ReadData = 2;
const int WriteData = 3;
//...
bool HandleMessage(const Message& msg) {
bool handled = true;
switch (msg.getType()) {
case ChangePermissions:
handled = ChangePermissions(msg);
break;
case ReadData:
handled = ReadData(msg);
break;
case WriteData:
handled = WriteData(msg);
break;
}
if (!handled) {
return false;
}
SendAnswerForMessage(msg);
return true;
}
As you can see, there is no default
case in the switch-case
clause. This means, the developer may forget to add a message handling case
for a new message, which may be added many months after the initial code was written. This is especially true, when there are several separate functions handling different sets of messages. For this new message, handled
will be true
, and the code will call SendAnswerForMessage
for an unchecked and unhandled message, causing unintended behavior.
I believe, that for the code above, even automated static analysis tools will be silent. If there was enum
instead of just numeric constants, tools would warn of unhandled enum
values inside the switch
. With just constants (or macros), there is no way for a tool to determine, which values must be handled in the switch-case
!
Advice for code reviewers: check if all necessary cases are handled in switch-case
clauses, or if the default
case is present. Check the default
case for correctness.
Advice for developers: Carefully write switch-case
clauses. Do not miss cases. Add the default
case, which does something reasonable for unexpected cases. Put case
values to enumeration (don't use constants or macros), so that automated analysis tools could warn you, if you forgot to handle a value inside your switch
.
There are some libraries, which have to trigger you as a code reviewer, when you see them in code.
- Any ZIP archive unpacking library. Or code, which unpacks ZIP archives manually. There is that infamous Zip Slip vulnerability, which is still present in different libraries by design. If used incorrectly, it opens a hole in software, which may allow an attacker to overwrite any file on the system!
- Any library for the XML format processing. There is XXE vulnerabilities, and if you encounter XML processing in code, there is a chance, that these vulnerabilities may be present.
Any file operations present in code may be vulnerable to TOCTOU attacks. For example:
- The file signature is first checked, and then the file is reopened and processed.
- The file is first written, and then restrictive file permissions are set.
Advice for code reviewers: when you encounter file operations in code, pay close attention to presence of race conditions, which may lead to TOCTOU attacks. Race conditions with file operations are extremely common.
Advice for developers: Think about the order of file operations. Can it lead to race conditions in some cases?
Often software writes logs. However, often logging is implemented incorrectly from the security point of view. Let's look at some improper logging scenarios:
-
Improper log entries escaping. Logs often contain data received from the untrusted user. This data may contain any characters (including a newline character, null-bytes, etc). If this data is not properly escaped, an attacker may break log formatting. For example, new fake lines may be added to program logs, making log analysis more difficult or even impossible.
-
Writing unmasked secrets to logs. Often developers write user secrets to logs (and often unintentionally). One case of such unintentional secrets logging is exception logging. for example:
void Authenticate(string login, string password) { try { AuthenticateUser(login, password); } catch (AuthException e) { Logger.logException("Unable to authenticate", e); } }
Here,
e
may contain the user login and password, which will end up in logs in case an exception is thrown. An exception stacktrace may also contain function argument values (string login
,string password
).Passwords not only end up in a log file with possibly unrestricted read permissions. These log files also may be taken by the server administrator and transferred via insecure channels for additional analysis or backup!
Advice for code reviewers: pay attention to how logging is implemented. Are untrusted strings properly escaped? Is there any secrets that may end up in log files?
Advice for developers: Escape untrusted user data which is being written to logs. Do not write any user secrets to logs or mask them properly. Check how you log exceptions, as they may contain user secrets as well.
To pass secrets to another programs via the command line is quite a bad idea. On Linux, command lines with arguments may be visible in the process list (ps
, /proc
). They may be also logged to audit logs or /var/log/...
logs. On Windows, the situation is similar: command lines with arguments may be readable from the process list, and they also may be logged to the Event log.
Advice for code reviewers: If you see an invocation of an external program in code, check if there are any secrets, which are passed to that program via the command line.
Advice for developers: Do not pass secrets to external programs via the command line. Use interactive input (pipes) or other means to do this.
When any resource is acquired in code, it is in the best interest to request as less privileges as possible. Here are some examples:
- A file is opened. If the program is going to only read the file, then it doesn't need write access to that file.
- A database user is created. If the purpose of this user is to write data to tables, then it should not have
alter table
permissions. - A process is opened to query information about it. Thus, its handle doesn't need the
PROCESS_ALL_ACCESS
access attribute.
Excessive resource privileges may allow an attacker to perform an attack, which otherwise would not be possible.
Advice for code reviewers: Look through the acquired resource privileges, and what privileges they really require.
Advice for developers: Do not grant excessive permissions to the resource handles you acquire.