This question might seem simple, but I think it's not so trivial. Or maybe I'm overthinking this, but I'd still like to know.
Let's imagine we have to read data from a TCP socket until we encounter some special character. The data has to be saved somewhere. We don't know the size of the data, so we don't know how large to make our buffer. What are the possible options in this case?
1. **Extend the buffer as more data arrives using `realloc`.** This approach raises a few questions. What are the performance implications of `realloc`? It may move memory around, so if there's a lot of data in the buffer (and there can be a lot of data), we'll spend a lot of time copying bytes. How much should we grow the buffer each time? Do we double it? If so, what about all the wasted space? If we later call `realloc` with a smaller size, will it actually release the unused bytes? (A sketch of this approach follows the list.)
2. **Allocate new buffers in constant-size chunks and chain them together.** This would work much like the `deque` container from the C++ standard library, allowing new data to be appended quickly. It raises questions of its own, like how big the blocks should be and what to do with the unused space, but at least it has good performance. (A sketch of this approach, using the block recycling mentioned in the P.S., appears at the end.)
What is your opinion on this? Which of these two approaches is better? Or is there some other approach I haven't considered?
P.S.:
Personally, I'm leaning more towards the second solution, because I think it can be made pretty fast if we "recycle" the blocks instead of doing a dynamic allocation every time a block is needed. The only problem I can see is that it hurts locality, but I don't think that's terribly important for my purposes (processing HTTP-like requests).
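Here is a minimal sketch in C of what I mean by recycling: a singly linked chain of fixed-size chunks plus a free list, so a chunk is only `malloc`'d when the free list is empty. The names (`block`, `block_get`, `chain_recycle`) and the 4 KiB block size are just illustrative assumptions:

```c
#include <stdlib.h>

#define BLOCK_SIZE 4096

struct block {
    struct block *next;
    size_t        used;             /* bytes filled in data[] */
    char          data[BLOCK_SIZE];
};

/* Not thread-safe; real code would need a per-thread list or a lock. */
static struct block *free_list = NULL;

/* Take a block from the free list, or malloc one if the list is empty. */
static struct block *block_get(void)
{
    struct block *b = free_list;
    if (b)
        free_list = b->next;
    else
        b = malloc(sizeof *b);
    if (b) {
        b->next = NULL;
        b->used = 0;
    }
    return b;
}

/* Return a whole chain to the free list instead of freeing it. */
static void chain_recycle(struct block *head)
{
    while (head) {
        struct block *next = head->next;
        head->next = free_list;
        free_list = head;
        head = next;
    }
}
```

A reader would then `recv()` into the tail block's remaining space, appending a fresh block from `block_get()` whenever the tail fills up, and hand the whole chain back with `chain_recycle()` once the request has been processed.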
Thanks