I need to read a dictionary file in order to filter the content specified in hdfs_input. I have uploaded it to the cluster with the put command, but I don't know how to access it from my program.
I tried to open it by its path on the cluster, as I would a normal file, but I get this error: IOError: [Errno 2] No such file or directory
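A minimal sketch of the kind of access that produces this error, assuming the job is a Python script. The path and the function names are hypothetical placeholders, not the real ones. Plain open() looks only at the local filesystem of the machine running the task, so a path that exists only in HDFS is not found:

```python
import errno

# Hypothetical path: the real location of the uploaded dictionary is not shown here.
DICT_PATH = "/user/hypothetical/dictionary.txt"


def load_dictionary(path):
    """Read one entry per line into a set, using the ordinary local-file API."""
    with open(path) as f:
        return set(line.strip() for line in f if line.strip())


def try_load(path):
    """Return (entries, None) on success, or (None, errno) if the file cannot be opened."""
    try:
        return load_dictionary(path), None
    except IOError as e:
        # errno.ENOENT is the "[Errno 2] No such file or directory" case.
        return None, e.errno


if __name__ == "__main__":
    entries, err = try_load(DICT_PATH)
    if err == errno.ENOENT:
        print("not found on the local filesystem:", DICT_PATH)
```

The same call succeeds when the file exists on the local disk of the machine that runs the code, which suggests the problem is where the file lives rather than how it is read.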
Also, is there any way to keep only one copy of the dictionary for all the machines that run the job?
So what is the correct way to access files other than the specified input in Hadoop jobs?