I have implemented a depth peeling algorithm using a GLSL spinlock (inspired by this). In the following visualization, notice how overall the depth peeling algorithm functions correctly (first layer top left, second layer top right, third layer bottom left, fourth layer bottom right). The four depth layers are stored into a single RGBA texture.
Unfortunately, the spinlock sometimes fails to prevent errors: you can see little white speckles, particularly in the fourth layer. There's also one on the wing of the spaceship in the second layer. These speckles vary each frame.
In my GLSL spinlock, when a fragment is to be drawn, the fragment program atomically reads and writes a locking value in a separate locking texture, waiting until a 0 shows up, which indicates that the lock is open. In practice, I found that the waiting must not stall the other threads: if two threads in the same warp are on the same pixel, the warp cannot continue (one must wait while the other continues, yet all threads in a GPU warp execute in lockstep).
My fragment program looks like this (comments and spacing added):
#version 420 core

//locking texture
layout(r32ui) coherent uniform uimage2D img2D_0;
//data texture, also render target
layout(RGBA32F) coherent uniform image2D img2D_1;

//Inserts "new_data" into "data", a sorted list
vec4 insert(vec4 data, float new_data) {
    if      (new_data<data.x) return vec4(       new_data,data.xyz);
    else if (new_data<data.y) return vec4(data.x,new_data,data.yz);
    else if (new_data<data.z) return vec4(data.xy,new_data,data.z);
    else if (new_data<data.w) return vec4(data.xyz,new_data       );
    else                      return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    //The idea here is to keep looping over a pixel until a value is written.
    //By looping over the entire logic, threads in the same warp aren't stalled
    //by other waiting threads. The first imageAtomicExchange call sets the
    //locking value to 1. If the locking value was already 1, then someone
    //else has the lock, and can_write is false. If the locking value was 0,
    //then the lock is free, and can_write is true. The depth is then read,
    //the new value inserted, but only written if can_write is true (the
    //locking texture was free). The second imageAtomicExchange call resets
    //the lock back to 0.
    bool have_written = false;
    while (!have_written) {
        bool can_write = (imageAtomicExchange(img2D_0,coord,1u) != 1u);
        memoryBarrier();

        vec4 depths = imageLoad(img2D_1,coord);
        depths = insert(depths,gl_FragCoord.z);

        if (can_write) {
            imageStore(img2D_1,coord,depths);
            have_written = true;
        }
        memoryBarrier();

        imageAtomicExchange(img2D_0,coord,0);
        memoryBarrier();
    }

    discard; //Already wrote to render target with imageStore
}
My question is: why does this speckling behavior occur? I want the spinlock to work 100% of the time! Could it relate to my placement of memoryBarrier()?
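
One thing that may be worth checking, offered as a sketch rather than a verified fix: in the loop above, the second imageAtomicExchange resets the lock to 0 on every iteration, including iterations where can_write is false. A thread that failed to take the lock can therefore reopen it while the actual holder is still between its imageLoad and imageStore. The variant below keeps the same "loop over the entire logic" structure, but reads, writes, and releases only when the lock was actually acquired. The name `acquired` is my own, the lowercase `rgba32f` spelling and the `0u` literal are cosmetic, and whether this removes the speckles on a particular driver is an assumption I have not tested.

```glsl
#version 420 core

// Same bindings as in the question.
layout(r32ui)   coherent uniform uimage2D img2D_0; // locking texture
layout(rgba32f) coherent uniform image2D  img2D_1; // data texture

// Same sorted insert as in the question.
vec4 insert(vec4 data, float new_data) {
    if      (new_data < data.x) return vec4(new_data, data.xyz);
    else if (new_data < data.y) return vec4(data.x, new_data, data.yz);
    else if (new_data < data.z) return vec4(data.xy, new_data, data.z);
    else if (new_data < data.w) return vec4(data.xyz, new_data);
    else                        return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    bool have_written = false;
    while (!have_written) {
        // Try to take the lock; a previous value of 0 means it was free.
        bool acquired = (imageAtomicExchange(img2D_0, coord, 1u) == 0u);
        if (acquired) {
            // Critical section: only the lock holder touches the data texel.
            vec4 depths = imageLoad(img2D_1, coord);
            depths = insert(depths, gl_FragCoord.z);
            imageStore(img2D_1, coord, depths);
            have_written = true;

            // Make the store visible before the lock is released.
            memoryBarrier();

            // Release only a lock this invocation actually holds.
            imageAtomicExchange(img2D_0, coord, 0u);
        }
        // Invocations that did not acquire the lock leave it untouched
        // and simply retry on the next iteration.
    }

    discard; // Already wrote to the data texture with imageStore
}
```

The difference from the posted loop is only in who is allowed to write 0 into the locking texture; the barrier before the release is kept so the stored depths are visible to the next holder.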