c - How to correctly implement memcpy

Question

I found the following implementation of memcpy (an interview question; the number of iterations is roughly size/4):

void memcpy(void* dest, void* src, int size)
{
    uint8_t *pdest = (uint8_t*) dest;
    uint8_t *psrc = (uint8_t*) src;

    int loops = (size / sizeof(uint32_t));
    for(int index = 0; index < loops; ++index)
    {
        *((uint32_t*)pdest) = *((uint32_t*)psrc);
        pdest += sizeof(uint32_t);
        psrc += sizeof(uint32_t);
    }

    loops = (size % sizeof(uint32_t));
    for (int index = 0; index < loops; ++index)
    {
        *pdest = *psrc;
        ++pdest;
        ++psrc;
    }
}

I am not sure I understand it:

1) Why are uint8_t *pdest and uint8_t *psrc defined and then cast to uint32_t* here?

*((uint32_t*)pdest) = *((uint32_t*)psrc);

I think pdest and psrc should be defined as uint32_t* from the start. What am I missing?

2) This implementation seems to have a problem: if src = 0x100 and dst = 0x104, and the original memory at src looks like this:

-------------------------
|  6  |  8  |  7  |  1  |
-------------------------    
0x100  0x104 0x108 0x10C

After execution it looks like this:

-------------------------
|  6  |  6  |  6  |  6  |.....
-------------------------
0x100  0x104 0x108 0x10C

Nevertheless, the following memory layout should be the result:

-------------------------
|  6  |  6  |  8  |  7  |....
-------------------------
0x100  0x104 0x108 0x10C

score 0 · Accepted Answer

Regarding the pointer type: the idea is to reduce both loop and copy overhead by copying in the largest data "chunks" possible (say, 32 bits). So you copy as much as possible using 32-bit words, and the remainder is then copied in smaller 8-bit "chunks". For example, to copy 13 bytes, you would do 3 iterations copying a 32-bit word each, plus 1 iteration copying a single byte. This is preferable to 13 iterations of single-byte copies. You could declare the pointers as uint32_t*, but then you would have to convert back to uint8_t* to copy the remainder.
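To make the 13-byte arithmetic concrete, here is a minimal sketch that only computes how many iterations each loop would run. The names `copy_plan` and `plan_copy` are hypothetical helpers for illustration and do not appear in the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how a copy of `size` bytes is split up. */
struct copy_plan {
    size_t word_iterations; /* iterations copying one 32-bit word each */
    size_t byte_iterations; /* iterations copying one leftover byte each */
};

/* Hypothetical helper: mirrors the two `loops` computations in the question. */
static struct copy_plan plan_copy(size_t size)
{
    struct copy_plan plan;

    plan.word_iterations = size / sizeof(uint32_t);
    plan.byte_iterations = size % sizeof(uint32_t);

    /* The two loops together must cover every byte exactly once. */
    assert(plan.word_iterations * sizeof(uint32_t)
           + plan.byte_iterations == size);
    return plan;
}
```

For size = 13 this gives 3 word iterations and 1 byte iteration, that is 4 iterations in total instead of 13.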

Regarding the second issue: this implementation will not work properly when the destination address overlaps the source buffer. If you want to support this kind of copy as well, it is a bug. Note that the standard memcpy leaves overlapping copies undefined; memmove is the standard function that handles them. This is a popular interview question pitfall ;).
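One possible way to support overlapping buffers is to pick the copy direction from the relative position of the two pointers. The sketch below is an illustration only, not the code from the question: `copy_overlap_safe` is a hypothetical name, and it copies byte by byte for simplicity, so it gives up the 32-bit word optimization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a byte-wise copy that tolerates overlapping buffers. */
void *copy_overlap_safe(void *dest, const void *src, size_t size)
{
    uint8_t *pdest = dest;
    const uint8_t *psrc = src;

    assert(size == 0 || (dest != NULL && src != NULL));

    /* Compare addresses as integers, since the buffers may be unrelated. */
    uintptr_t d = (uintptr_t)pdest;
    uintptr_t s = (uintptr_t)psrc;

    if (d <= s || d >= s + size) {
        /* dest does not start inside the source: copy front to back. */
        for (size_t i = 0; i < size; ++i) {
            pdest[i] = psrc[i];
        }
    } else {
        /* dest starts inside the source: copy back to front so that
           source bytes are read before they are overwritten. */
        for (size_t i = size; i > 0; --i) {
            pdest[i - 1] = psrc[i - 1];
        }
    }
    return dest;
}
```

With five 32-bit words holding 6, 8, 7, 1, 0 and a 16-byte copy from the first word to the second, this leaves 6, 6, 8, 7, 1 in memory, which is the layout the question expects.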

score 0 · Accepted Answer

The first loop goes through and copies memory from psrc to pdest in chunks of 4 bytes per loop, hence the cast to uint32_t*. The second loop copies the remaining memory in chunks of 1 byte. For large chunks of memory, this effectively cuts down the number of iterations by a factor of 4.

As for why the pointers are cast to uint8_t* instead of uint32_t*: with a cast directly to uint32_t*, the first loop would work fine; however, each iteration would increment the pointers by 1 instead of 4, because pointer arithmetic on a uint32_t* advances in 4-byte steps. You would get something like the following:

for(int index = 0; index < loops; ++index)
{
    *(pdest) = *(psrc); //no need for cast
    pdest++;            //increment by 1 not 4
    psrc++;
}

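However, the second loop has to advance one byte at a time, which on a uint32_t* would mean incrementing the pointer by 1/4. Pointer arithmetic cannot do that, so you would have to convert the pointers back to uint8_t* before copying the remainder. Keeping them as uint8_t* throughout and casting only for the word copy avoids that extra conversion.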

Another way to think about it:

loops1: the number of 4-byte chunks in the original memory block
loops2: the number of bytes left over
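For comparison, the alternative discussed in the answers (word pointers for the first loop, byte pointers for the remainder) might look roughly like the sketch below. The name `copy_words_then_bytes` is hypothetical, and the sketch assumes that the buffers do not overlap, that both are 4-byte aligned, and that the memory may legitimately be accessed as uint32_t; none of these assumptions is checked by the code in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Assumes non-overlapping, 4-byte-aligned buffers. */
void *copy_words_then_bytes(void *dest, const void *src, size_t size)
{
    /* Assumption: both addresses are suitably aligned for uint32_t. */
    assert((uintptr_t)dest % sizeof(uint32_t) == 0);
    assert((uintptr_t)src % sizeof(uint32_t) == 0);

    uint32_t *wdest = dest;
    const uint32_t *wsrc = src;

    /* First loop: one 32-bit word per iteration. */
    size_t words = size / sizeof(uint32_t);
    for (size_t i = 0; i < words; ++i) {
        *wdest = *wsrc; /* no cast needed */
        ++wdest;        /* advances by 4 bytes */
        ++wsrc;
    }

    /* Convert back to byte pointers for the leftover bytes. */
    uint8_t *bdest = (uint8_t *)wdest;
    const uint8_t *bsrc = (const uint8_t *)wsrc;

    /* Second loop: one byte per iteration. */
    size_t tail = size % sizeof(uint32_t);
    for (size_t i = 0; i < tail; ++i) {
        *bdest = *bsrc;
        ++bdest;        /* advances by 1 byte */
        ++bsrc;
    }
    return dest;
}
```

Both variants do the same work; the only difference is where the pointer conversion happens.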
