Recently, I have discovered a hidden bug in a computer program, which caused the program to crash under certain circumstances. The code looks like the following:
The program crashed when
tmpBlob is being written inside the for-loop. But why?
Maybe a bit more background information first: the function is used to copy an image blob into a
C++ string which will be used later in the program; the size of the image is passed in as
When I debugged the program, I found the image that caused the crash has
in_size around 70MB and
was filled and not empty. I suspected the large image size could be the root cause but I wasn’t sure. Then I found
the memory for
tmpBlob was not allocated at the moment the program crashed, since the memory of that array
was not accessible in the debugger. So … ah-ha!!!!! There it is!
tmpBlob is a statically allocated variable-length character array, which means it is usually allocated on statck.
If the array length is too long (or the available remaining space on stack is not enough), which was my case,
then stack overflow will occur and allocation will fail.
And no, not the stackoverflow website.
But wait, how do you know for sure the array is bigger than your stack? how big is your stack? Try
Looks like normal stack size is only a few mega bytes. Therefore, 70MB is no doubt way to big. So, the array space was not created, but the program still tried to write data to a non-existing memory space, segfault occured, mystery solved.
This type of error is one of the ‘NO-NOs’ your teacher taught you to avoid at school, but yet it happens in real-life,
production code (how embarrassed). The rule of thumb is, you shouldn’t allocate variable-length array on stack
if the size cannot be determined at compile-time; in such cases, you should allocate memory (using
malloc) on heap.
Stack is better-suited for fixed-size data structures.
However, we can also imagine in a reasonable-sized software, the need for dynamic memory allocation is quite large. And if
we keep using
malloc all over the place, the code will look like spaghetti and causes memory fragmentation, which might
drag down your software performance; in other words, this is not an efficient way to manage your memory EITHER. The more
structural solution is to implement a memory pool.
My colleauge provided an alternative solution to the code above, which prevents copying twice (and creating allocation issues):
Much better and simpler isn’t it? But why C++ has to make things so complicated in the first place so that the first person who wrote the original code didn’t figure out?
Also, the rule of thumb is to never use a C array in C++. Use
vector instead if you know the size in advance (which is the case
in the example shown above - a known fixed length at run time), and
vector is always allocated on stack which also solves our
problem mentioned in this post.