I need to read in a dictionary file to filter content specified in the hdfs_input
, and I have uploaded it to the cluster using the put
command, but I don't know how to access it in my program.
I tried to access it using path on the cluster like normal files, but it gives the error information: IOError: [Errno 2] No such file or directory
Besides, is there any way to maintain only one copy of the dictionary for all the machines that runs the job ?
So what's the correct way of access files other than the specified input
in hadoop jobs?