This question might seem simple, but I think it's not so trivial. Or maybe I'm overthinking this, but I'd still like to know.
Let's imagine we have to read data from a TCP socket until we encounter some special character. The data has to be saved somewhere. We don't know the size of the data, so we don't know how large to make our buffer. What are the possible options in this case?
1. **Extend the buffer as more data arrives using `realloc`.** This approach raises a few questions. What are the performance implications of `realloc`? It may move memory around, so if there's a lot of data in the buffer (and there can be a lot of data), we'll spend a lot of time copying bytes. How much should we grow the buffer each time? Do we double it? If so, what about all the wasted space? If we later call `realloc` with a smaller size, will it actually release the unused bytes? (A sketch of this approach follows the list.)
2. **Allocate new buffers in constant-size chunks and chain them together.** This would work much like the `deque` container from the C++ standard library, allowing new data to be appended quickly. It raises questions of its own, like how big the blocks should be and what to do with the unused space, but at least it has good performance. (A sketch of this approach, using the block recycling mentioned in the P.S., appears at the end.)
What is your opinion on this? Which of these two approaches is better? Or is there some other approach I haven't considered?
P.S.:
Personally, I'm leaning more towards the second solution, because I think it can be made pretty fast if we "recycle" the blocks instead of doing a dynamic allocation every time a block is needed. The only problem I can see is that it hurts locality, but I don't think that's terribly important for my purposes (processing HTTP-like requests).
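Here is a minimal sketch in C of what I mean by recycling: a singly linked chain of fixed-size chunks plus a free list, so a chunk is only `malloc`'d when the free list is empty. The names (`block`, `block_get`, `chain_recycle`) and the 4 KiB block size are just illustrative assumptions:

```c
#include <stdlib.h>

#define BLOCK_SIZE 4096

struct block {
    struct block *next;
    size_t        used;             /* bytes filled in data[] */
    char          data[BLOCK_SIZE];
};

/* Not thread-safe; real code would need a per-thread list or a lock. */
static struct block *free_list = NULL;

/* Take a block from the free list, or malloc one if the list is empty. */
static struct block *block_get(void)
{
    struct block *b = free_list;
    if (b)
        free_list = b->next;
    else
        b = malloc(sizeof *b);
    if (b) {
        b->next = NULL;
        b->used = 0;
    }
    return b;
}

/* Return a whole chain to the free list instead of freeing it. */
static void chain_recycle(struct block *head)
{
    while (head) {
        struct block *next = head->next;
        head->next = free_list;
        free_list = head;
        head = next;
    }
}
```

A reader would then `recv()` into the tail block's remaining space, appending a fresh block from `block_get()` whenever the tail fills up, and hand the whole chain back with `chain_recycle()` once the request has been processed.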
Thanks