Your question is one of the classic problems in computer vision and image processing. Many doctoral theses and scores of conference and journal papers have been written about it.
In short, direct pixel comparisons will not work in this case. A transformation of some kind is needed to take you into a different feature space. You could do something simple or complex depending on the requirements you have in mind: you could compute edges or corners, for instance. One suggestion already mentioned is FAST corner detection, which would be a good choice, as would SIFT and similar descriptors. There are many others you could use, but the right one depends on how much the two images can vary and in what ways.
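To make the feature-space idea concrete, here is a minimal sketch with OpenCV in Python (my own illustration, not something from the question): it detects FAST corners, computes binary descriptors for them with ORB, and counts cross-checked matches between two images. File names and the FAST threshold are placeholders.

```python
import cv2

def match_score(path_a, path_b):
    """Rough similarity score: number of mutually-best descriptor matches."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    # FAST finds corner keypoints; ORB computes binary descriptors for them.
    fast = cv2.FastFeatureDetector_create(threshold=20)
    orb = cv2.ORB_create()

    kp_a = fast.detect(img_a, None)
    kp_b = fast.detect(img_b, None)
    kp_a, des_a = orb.compute(img_a, kp_a)
    kp_b, des_b = orb.compute(img_b, kp_b)

    # Brute-force Hamming matching; cross-check keeps only mutual best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    return len(matches)

# Higher score = more shared local structure between the two images.
print(match_score("scene1.png", "scene2.png"))
```

The score is only a crude proxy for similarity; in practice you would normalize by the number of keypoints and tune the detector threshold to your images.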
For example, if there are only going to be global color changes, tint, etc., the approach would be different than if the images could be rotated or the object could change position or size (i.e. camera zoom).
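For the simpler case where only the global color or tint changes and the geometry stays fixed, comparing edge maps is often enough, since edges depend on gradients rather than absolute intensities. A possible sketch (again my own, assuming aligned images of equal size and placeholder file names):

```python
import cv2
import numpy as np

def edge_overlap(path_a, path_b):
    """Compare edge maps; global tint/brightness shifts barely affect edges."""
    gray_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    gray_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    # Equalize first so a global brightness/contrast change does not shift
    # where the fixed Canny thresholds fire.
    edges_a = cv2.Canny(cv2.equalizeHist(gray_a), 100, 200)
    edges_b = cv2.Canny(cv2.equalizeHist(gray_b), 100, 200)

    # Fraction of edge pixels that coincide (a crude Dice-style overlap score).
    both = np.logical_and(edges_a > 0, edges_b > 0).sum()
    total = (edges_a > 0).sum() + (edges_b > 0).sum()
    return 2.0 * both / total if total else 1.0

print(edge_overlap("scene.png", "scene_tinted.png"))
```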
Strictly speaking, for the case you mention, features such as FAST, SIFT, or even edges would work reasonably well. Check http://en.wikipedia.org/wiki/Feature_detection_%28computer_vision%29 for more information.
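If rotation or zoom is in play, a scale- and rotation-invariant descriptor like SIFT with Lowe's ratio test is the usual route. One possible sketch (OpenCV again; `cv2.SIFT_create` needs a reasonably recent opencv-python build, and the file names are placeholders):

```python
import cv2

def sift_matches(path_a, path_b, ratio=0.75):
    """Count SIFT matches passing the ratio test; SIFT tolerates rotation
    and scale changes, so it suits the camera-zoom scenario."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)

    # For each descriptor, take the two nearest neighbours and keep the match
    # only if the best is clearly better than the second best (ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des_a, des_b, k=2)
            if m.distance < ratio * n.distance]
    return len(good)

print(sift_matches("scene.png", "scene_zoomed.png"))
```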