I am trying to build a kernel to do parallel string search. To this end I tend to use a finite state machine. The transition table of the fsm is in the kernel argument states. The code:
 __kernel void Find ( __constant char *text,
        const      int offset,
        const      int tlenght,
        __constant char *characters,
        const int clength,
        const int maxlength,
        __constant int *states,
        const int statesdim){
    private char c;
    private int state;
    private const int id = get_global_id(0);
    if (id<(tlenght-maxlength)) {
        private int cIndex,sd,s,k;
        for (int i=0; i<maxlength; i++) {
            c = text[i+offset];
            cIndex = -1;
            for (int j=0; j<clength; j++) {
                if (characters[j]==c) {
                    cIndex = j;
                }       
            }    
            if (cIndex==-1) {
                state = 0;
                break;
            }  else {
                s = states[state+cIndex*statesdim];
            }
            if (state<=0) break;
        }    
    }
}   
If I compile this kernel using iocgui, I get the result:
Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU       T4400  @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was successfully vectorized
Done.
Build succeeded!
When I change the line where the new state is determined to:
state = states[state+cIndex*statesdim];
The result is:
Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU       T4400  @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was not vectorized
Done.
Build succeeded!