
Can someone explain to me why the output of this program is [nan, nan]? The code is supposed to load the value of d into the high and low 64 bits of the XMM1 register and then move the contents of XMM1 into a. Because a is not initialized to a set of specific values, D initializes each element to nan. If the movupd instruction were not in the objdump, I would understand the result, but the instruction is there. Thoughts?

import std.stdio;

void main()
{
    enum double d = 1.0 / cast(double)2;
    double[] a = new double[2];
    auto aptr = a.ptr;

    asm
    {
        movddup XMM1, d;
        movupd [aptr], XMM1;
    }
    writeln(a);
}

Here is the objdump of the main function:

0000000000426b88 <_Dmain>:
  426b88:       55                      push   %rbp
  426b89:       48 8b ec                mov    %rsp,%rbp
  426b8c:       48 83 ec 50             sub    $0x50,%rsp
  426b90:       f2 48 0f 10 05 77 81    rex.W movsd 0x28177(%rip),%xmm0
  426b97:       02 00 
  426b99:       f2 48 0f 11 45 b0       rex.W movsd %xmm0,-0x50(%rbp)
  426b9f:       48 be 02 00 00 00 00    movabs $0x2,%rsi
  426ba6:       00 00 00 
  426ba9:       f2 48 0f 10 05 66 81    rex.W movsd 0x28166(%rip),%xmm0
  426bb0:       02 00 
  426bb2:       48 8d 7d c0             lea    -0x40(%rbp),%rdi
  426bb6:       e8 65 d1 00 00          callq  433d20 <_memsetDouble>
  426bbb:       f2 48 0f 10 0d 4c 81    rex.W movsd 0x2814c(%rip),%xmm1
  426bc2:       02 00 
  426bc4:       f2 48 0f 11 4d c0       rex.W movsd %xmm1,-0x40(%rbp)
  426bca:       f2 48 0f 10 15 3d 81    rex.W movsd 0x2813d(%rip),%xmm2
  426bd1:       02 00 
  426bd3:       f2 48 0f 11 55 c8       rex.W movsd %xmm2,-0x38(%rbp)
  426bd9:       48 8d 45 c0             lea    -0x40(%rbp),%rax
  426bdd:       48 89 45 d0             mov    %rax,-0x30(%rbp)
  426be1:       48 8d 55 e0             lea    -0x20(%rbp),%rdx
  426be5:       48 b8 02 00 00 00 00    movabs $0x2,%rax
  426bec:       00 00 00 
  426bef:       48 89 c1                mov    %rax,%rcx
  426bf2:       49 89 d0                mov    %rdx,%r8
  426bf5:       51                      push   %rcx
  426bf6:       41 50                   push   %r8
  426bf8:       48 be 02 00 00 00 00    movabs $0x2,%rsi
  426bff:       00 00 00 
  426c02:       48 bf c0 84 65 00 00    movabs $0x6584c0,%rdi
  426c09:       00 00 00 
  426c0c:       e8 87 ce 00 00          callq  433a98 <_d_arrayliteralTX>
  426c11:       48 89 45 f0             mov    %rax,-0x10(%rbp)
  426c15:       f2 48 0f 10 05 02 81    rex.W movsd 0x28102(%rip),%xmm0
  426c1c:       02 00 
  426c1e:       f2 48 0f 11 00          rex.W movsd %xmm0,(%rax)
  426c23:       f2 48 0f 10 0d f4 80    rex.W movsd 0x280f4(%rip),%xmm1
  426c2a:       02 00 
  426c2c:       48 8b 45 f0             mov    -0x10(%rbp),%rax
  426c30:       f2 48 0f 11 48 08       rex.W movsd %xmm1,0x8(%rax)
  426c36:       48 8b 55 f0             mov    -0x10(%rbp),%rdx
  426c3a:       48 be 02 00 00 00 00    movabs $0x2,%rsi
  426c41:       00 00 00 
  426c44:       41 58                   pop    %r8
  426c46:       59                      pop    %rcx
  426c47:       48 bf 08 00 00 00 00    movabs $0x8,%rdi
  426c4e:       00 00 00 
  426c51:       e8 8e 95 00 00          callq  4301e4 <_d_arraycopy>
  426c56:       f2 0f 12 4d b0          movddup -0x50(%rbp),%xmm1
  426c5b:       66 0f 11 4d d0          movupd %xmm1,-0x30(%rbp)
  426c60:       ff 75 c8                pushq  -0x38(%rbp)
  426c63:       ff 75 c0                pushq  -0x40(%rbp)
  426c66:       e8 09 00 00 00          callq  426c74 <_D3std5stdio16__T7writelnTG2dZ7writelnFG2dZv>
  426c6b:       48 83 c4 10             add    $0x10,%rsp
  426c6f:       31 c0                   xor    %eax,%eax
  426c71:       c9                      leaveq 
  426c72:       c3                      retq   
  426c73:       90                      nop

1 Answer


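I looked into it, and apparently the compiler treats movupd [aptr], XMM1 as if you had written movupd aptr, XMM1. In other words, it stores XMM1 into the stack slot that holds the variable aptr, not into the memory that aptr points to. The objdump shows this: mov %rax,-0x30(%rbp) saves the pointer at -0x30(%rbp), and movupd %xmm1,-0x30(%rbp) then overwrites that same slot, so the array is never written and keeps its nan values. Loading aptr into a register first (mov RAX, aptr; movupd [RAX], XMM1) makes it work.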
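Here is a sketch of the workaround applied to the question's program. It assumes DMD targeting x86-64, as in the objdump above. The helper name fillDup is hypothetical, and d is copied into a local v so that movddup has an ordinary stack memory operand. The key change is that the pointer value is loaded into RAX before the store, so movupd writes through the pointer instead of over the variable.

```d
import std.stdio;

// Hypothetical helper (not from the original post): broadcasts d into both
// lanes of XMM1 and stores the 128-bit result into a new double[2].
// Assumes DMD inline asm on x86-64.
double[] fillDup(double d)
{
    double[] a = new double[2];
    double* aptr = a.ptr;   // the pointer value lives in a local stack slot
    double v = d;           // local copy, used as movddup's memory operand

    asm
    {
        movddup XMM1, v;      // XMM1 = [v, v]
        mov     RAX, aptr;    // load the pointer value (Intel syntax: dest first)
        movupd  [RAX], XMM1;  // store 16 bytes to a[0] and a[1]
    }
    return a;
}

void main()
{
    writeln(fillDup(1.0 / 2));
}
```

Writing movupd [RAX], XMM1 makes the indirection explicit, so nothing is left for the assembler to reinterpret as a plain variable reference.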

You should probably file a bug report.

Answered 2012-08-23T06:05:21.977