dataproc notes
copy the contents of a GCS bucket to a GCP instance:
gcloud storage cp -r gs://bucket_name/* .
command to run the mapper and reducer with Hadoop Streaming:
hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapreduce.job.reduces=1 -file *****.py -mapper "python ****.py" -file *****.py -reducer "python *****.py" -input ********* -output **********
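the same kind of job with the scripts and data in GCS, one argument per line (e.g. as the arguments of a Dataproc Hadoop job):
-files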
gs://bucket/map1.py,gs://databucketwasim/reducer1.py
-input
gs://bucket/words200.txt
-output
gs://bucket/output_wordcount4
-mapper
python3 map1.py
-reducer
python3 reducer1.py
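The names words200.txt and output_wordcount4 suggest a word count job. The contents of map1.py and reducer1.py are not recorded in these notes, so the following is only a minimal sketch of what such a pair might do, assuming tab-separated "word<TAB>count" records and lowercased words. The function names map_lines, reduce_lines and run_local are hypothetical; run_local imitates the sort that Hadoop performs between the two steps so the logic can be checked on a small sample before submitting a job.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step (assumed behaviour): emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts per word. Input must be sorted by word,
    which Hadoop Streaming guarantees for the records sent to a reducer."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Imitate map -> sort -> reduce on one machine for a quick check."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In an actual streaming script, the mapper would loop over map_lines(sys.stdin) and print each record, and the reducer would do the same with reduce_lines(sys.stdin).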