mapreduce - Hadoop Streaming with very large values on stdout
I have two programs for Hadoop Streaming: a mapper (which produces <k, v> pairs) and a reducer. Both, of course, emit their <k, v> pairs on stdout.
My question: if the value v in a <k, v> pair is very large, will this still run efficiently on Hadoop? I expect the values emitted by the mapper to be 1 GB or more (sometimes more than 4 GB).
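For reference, here is a minimal sketch of what I mean by emitting <k, v> on stdout (a hypothetical word-count pair, not my actual programs): Streaming passes tab-separated key/value lines between stages.

```python
#!/usr/bin/env python3
# mapper.py - hypothetical Streaming mapper: emits tab-separated <k, v>.
import sys

for line in sys.stdin:
    for word in line.split():
        # Hadoop Streaming treats text up to the first tab as the key.
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py - hypothetical Streaming reducer: input arrives sorted by key.
import sys

current, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current and current is not None:
        print(f"{current}\t{total}")
        total = 0
    current = key
    total += int(value)
if current is not None:
    print(f"{current}\t{total}")
```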
I think values of that size will cause problems, because it is problematic to manipulate them in memory. If you really do need such huge values, you can put them in HDFS and make v the name of the file. The problem you should consider in that case is that this approach is no longer functional: it has side effects, for example when a mapper fails and is retried, it may leave a partially written file behind.
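A rough sketch of that workaround (not a definitive implementation), assuming Python for the mapper, the `hdfs` CLI available on the task nodes, and that your Hadoop version exposes the task attempt id as the mapreduce_task_attempt_id environment variable; the payload directory and make_huge_value helper are hypothetical:

```python
#!/usr/bin/env python3
# mapper.py - sketch: write each huge value to HDFS, emit its path as v.
# Assumptions: the `hdfs` CLI is on PATH on the task nodes, and the attempt
# id is exposed as the mapreduce_task_attempt_id environment variable
# (this varies by Hadoop version; check your setup).
import os
import subprocess
import sys
import tempfile

PAYLOAD_DIR = "/user/example/payloads"  # hypothetical output location

# Path component per task attempt: a retried (failed) attempt then writes
# into a fresh directory instead of clobbering a partial file.
ATTEMPT = os.environ.get("mapreduce_task_attempt_id", "local-attempt")


def make_huge_value(line: str) -> bytes:
    # Hypothetical stand-in for whatever builds the multi-GB value.
    return line.encode()


def store_value(key: str, payload: bytes) -> str:
    """Stage the payload locally, copy it to HDFS, return the HDFS path."""
    hdfs_dir = f"{PAYLOAD_DIR}/{ATTEMPT}"
    hdfs_path = f"{hdfs_dir}/{key}"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        local = tmp.name
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local, hdfs_path], check=True)
    os.unlink(local)
    return hdfs_path


for line in sys.stdin:
    key = line.split("\t", 1)[0]
    # Emit <k, hdfs-path> instead of <k, huge-value>.
    print(f"{key}\t{store_value(key, make_huge_value(line))}")
```

Even with per-attempt paths you still have to garbage-collect directories left behind by failed attempts, which is exactly the side effect I mean.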