mapreduce - Hadoop Streaming with very large values on stdout
I have two programs for Hadoop Streaming: a mapper (which produces <k, v> pairs) and a reducer. The <k, v> pairs are, of course, emitted on stdout.

My question: if the v in a <k, v> pair is very large, will this run efficiently on Hadoop? I expect the values emitted by the mapper to be 1 GB or more (sometimes more than 4 GB).
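For context, this is the streaming contract in question: a minimal mapper/reducer pair in Python (the word-count logic is purely illustrative), where every <k, v> pair is one tab-separated line on stdout:

```python
#!/usr/bin/env python
# mapper.py -- emits one tab-separated <k, v> pair per line on stdout.
# Hadoop Streaming treats everything before the first tab as the key
# and everything after it as the value.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py -- receives lines sorted by key on stdin and sums the counts.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            print("%s\t%d" % (current_key, total))
        current_key, total = key, 0
    total += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, total))
```

The concern is that when v is a multi-gigabyte blob rather than a small count, each of these output lines, and the shuffle that sorts them, has to carry the whole value.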
I think values of such sizes will cause problems, because it is problematic to manipulate them in memory. If you really do need such huge values, you can put them in HDFS and make v the name of the file. The problem to consider with that approach is that it is no longer functional (side-effect free): a failed mapper, for example, leaves its side effects behind.
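A minimal sketch of that indirection, assuming the mapper can shell out to the hdfs CLI; the staging directory /user/example/blobs and the build_pair stand-in are hypothetical:

```python
#!/usr/bin/env python
# mapper.py (sketch) -- stores each huge value in HDFS and emits only its path,
# so the shuffle carries a short string instead of a multi-gigabyte blob.
# Assumes the `hdfs` CLI is on PATH; HDFS_DIR is a hypothetical staging directory.
import os
import subprocess
import sys
import tempfile
import uuid

HDFS_DIR = "/user/example/blobs"

def build_pair(line):
    # Stand-in for the real mapper logic that produces the multi-GB value.
    key, _, rest = line.rstrip("\n").partition("\t")
    return key, rest.encode("utf-8")

def stash_in_hdfs(payload):
    """Write payload to a local temp file, upload it to HDFS, return the HDFS path."""
    hdfs_path = "%s/%s" % (HDFS_DIR, uuid.uuid4().hex)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        local_path = tmp.name
    subprocess.check_call(["hdfs", "dfs", "-put", local_path, hdfs_path])
    os.remove(local_path)
    return hdfs_path

for line in sys.stdin:
    key, huge_value = build_pair(line)
    print("%s\t%s" % (key, stash_in_hdfs(huge_value)))
```

The side effect mentioned above is visible here: if the task fails after the -put and Hadoop retries it, the orphaned file stays in HDFS, so you would need some cleanup strategy (for example, paths scoped to the task attempt ID).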