1) The proper approach is to put voice recognition into a service, as is done in the Google API, where callback methods are used to deliver results. To keep it running continuously, the service must hold a wake lock that prevents the device from falling into sleep mode. Some more information is provided here: Wake locks android service recurring. This has one big disadvantage: high battery usage, caused by the continuously working CPU and continuous computation on incoming sound data. (This can be reduced with filters, thresholds etc.)
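A minimal sketch of how such a service might hold a partial wake lock (the class name is a placeholder of mine, not anything from the question):

    import android.app.Service;
    import android.content.Context;
    import android.content.Intent;
    import android.os.IBinder;
    import android.os.PowerManager;

    public class RecognitionService extends Service {
        private PowerManager.WakeLock wakeLock;

        @Override
        public void onCreate() {
            super.onCreate();
            PowerManager pm = (PowerManager) getSystemService(Context.POWER_SERVICE);
            // PARTIAL_WAKE_LOCK keeps the CPU running even when the screen goes off.
            wakeLock = pm.newWakeLock(PowerManager.PARTIAL_WAKE_LOCK, "RecognitionService");
            wakeLock.acquire();
            // ... start capturing and processing audio here ...
        }

        @Override
        public void onDestroy() {
            if (wakeLock != null && wakeLock.isHeld()) {
                wakeLock.release(); // always release, or the battery drains even when idle
            }
            super.onDestroy();
        }

        @Override
        public IBinder onBind(Intent intent) {
            return null;
        }
    }

You also need the WAKE_LOCK permission in AndroidManifest.xml:

    <uses-permission android:name="android.permission.WAKE_LOCK" />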
2) Voice recognition is not a simple task. It requires a huge amount of computation and reference data. If the input audio is not clean (noise, multiple human voices etc.), it is harder to get correct output. What can be done to improve accuracy is to filter the input audio: noise suppression, a low-pass filter etc. You cannot expect 100% accuracy, but 80-95% can be achieved.
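Just to illustrate the filtering part, here is a one-pole low-pass filter over 16-bit PCM samples; the smoothing factor alpha is an assumption you would tune by ear for your sample rate:

    // One-pole low-pass filter over 16-bit PCM samples.
    // alpha is a smoothing factor to tune; smaller alpha = stronger filtering.
    public final class LowPassFilter {
        private final float alpha;
        private float last;

        public LowPassFilter(float alpha) {
            this.alpha = alpha; // e.g. new LowPassFilter(0.15f) - just a guess
        }

        public void process(short[] samples) {
            for (int i = 0; i < samples.length; i++) {
                last = last + alpha * (samples[i] - last);
                samples[i] = (short) last;
            }
        }
    }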
Filtering out multiple human voices is harder. But simple amplitude (audio level) algorithms with an adaptive threshold can be used to decide when a word begins and ends. The idea is that the right voice is the loudest one, i.e. the one nearest to the phone/device. So, regarding 4): accuracy is better when the user speaks close to the microphone, because that voice is the loudest.
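A toy sketch of that amplitude idea, assuming frames of 16-bit PCM; every constant below is a guess you would tune for your device and environment:

    // Decides word start/end from frame level against an adaptive threshold.
    public final class EndpointDetector {
        private double noiseFloor = 500;  // running estimate of background level
        private boolean inWord = false;

        /** Returns true while a word is (probably) being spoken in this frame. */
        public boolean isSpeech(short[] frame) {
            double sum = 0;
            for (short s : frame) sum += Math.abs(s);
            double level = sum / frame.length;   // mean absolute amplitude

            double threshold = noiseFloor * 2.5; // speech must be well above noise
            if (!inWord && level > threshold) {
                inWord = true;                   // word begins
            } else if (inWord && level < threshold * 0.6) {
                inWord = false;                  // word ends (hysteresis)
            }
            // Slowly adapt the noise floor, but only during silence.
            if (!inWord) noiseFloor = 0.95 * noiseFloor + 0.05 * level;
            return inWord;
        }
    }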
3) I don't know what you mean by a sensor, but there are algorithms that simply detect human voice rather than decode words. These algorithms are called Voice Activity Detection (VAD). Some code can be found in the Speex project documentation: http://www.speex.org/
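Speex implements a proper statistical VAD; just to show the general idea, here is the classic textbook heuristic based on frame energy and zero-crossing rate. This is not Speex's algorithm, and the thresholds are pure guesses:

    // Voiced speech tends to have high energy and a moderate zero-crossing
    // rate, while hiss/noise has low energy or a very high ZCR.
    public static boolean frameHasVoice(short[] frame) {
        double energy = 0;
        int crossings = 0;
        for (int i = 0; i < frame.length; i++) {
            energy += (double) frame[i] * frame[i];
            if (i > 0 && (frame[i] >= 0) != (frame[i - 1] >= 0)) crossings++;
        }
        energy /= frame.length;                       // mean energy per sample
        double zcr = (double) crossings / frame.length;
        return energy > 1e6 && zcr < 0.25;            // tune both for your input
    }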
The simplest way to handle voice recognition is to use the Google Speech API, which is pretty good and recognizes plenty of languages, but it needs an Internet connection and takes a while to return a result.
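A minimal sketch of using it through Android's built-in SpeechRecognizer (requires the RECORD_AUDIO permission; error handling omitted):

    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import java.util.ArrayList;

    // Inside an Activity that holds the RECORD_AUDIO permission:
    void startRecognition() {
        SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(this);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override public void onResults(Bundle results) {
                ArrayList<String> matches =
                        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                // matches.get(0) is the most likely transcription
            }
            // The interface requires the remaining callbacks; empty for brevity.
            @Override public void onReadyForSpeech(Bundle params) {}
            @Override public void onBeginningOfSpeech() {}
            @Override public void onRmsChanged(float rmsdB) {}
            @Override public void onBufferReceived(byte[] buffer) {}
            @Override public void onEndOfSpeech() {}
            @Override public void onError(int error) {}
            @Override public void onPartialResults(Bundle partialResults) {}
            @Override public void onEvent(int eventType, Bundle params) {}
        });

        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        recognizer.startListening(intent);
    }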
CMU Sphinx is faster, but it has few language models and needs more RAM and processor computation, since all decoding is done on the device. In my opinion it is very good when the dictionary (the set of recognized words) is small, like commands (left, right, backward, stop, start, etc.).
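A sketch of the command case with pocketsphinx-android. The model and dictionary file names are the ones shipped with the pocketsphinx-android demo, and commands.gram is a hypothetical JSGF grammar file listing your commands; all of these may differ in your project:

    import edu.cmu.pocketsphinx.Assets;
    import edu.cmu.pocketsphinx.RecognitionListener;
    import edu.cmu.pocketsphinx.SpeechRecognizer;
    import edu.cmu.pocketsphinx.SpeechRecognizerSetup;
    import java.io.File;
    import java.io.IOException;

    // Inside an Activity/Service; results arrive through the listener callbacks.
    void startCommandRecognition(RecognitionListener listener) throws IOException {
        Assets assets = new Assets(this);
        File assetsDir = assets.syncAssets(); // copies bundled models to storage
        SpeechRecognizer recognizer = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                .getRecognizer();

        // A small JSGF grammar keeps accuracy high; commands.gram is a
        // hypothetical file listing "left | right | backward | stop | start".
        recognizer.addGrammarSearch("commands", new File(assetsDir, "commands.gram"));
        recognizer.addListener(listener);
        recognizer.startListening("commands");
    }

Restricting the search to such a tiny grammar is exactly why Sphinx works well here: the decoder only has to distinguish a handful of words instead of a whole language.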