SSE 4.2 performs comparisons on two 16-byte operands at a time. But it is also possible to compare two 8-byte operands at a time with ordinary processor instructions.
The difference does not seem large enough to justify a special hardware implementation of such comparisons. Is SSE 4.2 really so irrelevant, or am I missing something?
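
To make the comparison concrete, here is a minimal sketch of what I mean by comparing 8 bytes at a time with ordinary instructions. The function name `equal_by_words` is made up for illustration, and the sketch only tests for equality; it assumes a 64-bit target where a fixed 8-byte `memcpy` is typically compiled to a single 64-bit load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any library): returns 1 if the first
 * n bytes of a and b are equal, 0 otherwise. */
static int equal_by_words(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Main loop: compare 8 bytes per iteration with plain 64-bit
     * integer operations. memcpy avoids unaligned-access and aliasing
     * problems; compilers usually turn it into one load. */
    while (n >= 8) {
        uint64_t wa, wb;
        memcpy(&wa, pa, 8);
        memcpy(&wb, pb, 8);
        if (wa != wb)
            return 0;
        pa += 8;
        pb += 8;
        n -= 8;
    }

    /* Tail: compare the remaining 0..7 bytes one at a time. */
    while (n--) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```

Each iteration of the main loop handles 8 bytes, so for a plain equality test a 16-byte instruction would seem to save only half of the iterations.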