mapreduce - Hadoop Streaming with very large stdout values


I have two programs for Hadoop Streaming:

  mapper (produces <k, v> pairs)
  reducer

Of course, the <k, v> pairs are emitted to stdout.
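
For context, a streaming mapper is just a script that reads records from stdin and prints tab-separated pairs to stdout. Here is a minimal sketch in Python; the word-count logic is an assumption for illustration only, since the post does not say what the mapper actually computes:

  #!/usr/bin/env python3
  # Minimal sketch of a Hadoop Streaming mapper: Hadoop feeds input
  # records on stdin and treats every stdout line as a tab-separated
  # <k, v> pair. The word-count logic is illustrative only.
  import sys

  for line in sys.stdin:
      for word in line.split():
          # Hadoop Streaming splits each output line on the first tab:
          # key = word, value = "1"
          print(word + "\t1")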

My question is: if the value v in a <k, v> pair is very large, will the job still run efficiently on Hadoop?

I expect the values emitted by the mapper to be 1 GB or more (sometimes more than 4 GB).

I think values of that size will cause problems, because it is problematic to manipulate them in memory. If you really need such huge values, you can put them in HDFS instead and make v the name of the file. The problem you should consider in that case is that the approach is no longer purely functional: writing to HDFS is a side effect, so a mapper that fails and is re-executed, for example, can leave orphaned files behind.
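
A minimal sketch of that workaround in Python, assuming a hypothetical staging directory and the standard `hdfs dfs -put` CLI (this is just one way to get data into HDFS):

  #!/usr/bin/env python3
  # Sketch of the workaround above: instead of printing a multi-gigabyte
  # value to stdout, write it to HDFS and emit the HDFS path as v.
  # HDFS_DIR and the tab-separated input layout are assumptions for
  # illustration.
  import subprocess
  import sys
  import tempfile
  import uuid

  HDFS_DIR = "/tmp/large-values"  # hypothetical HDFS staging directory

  def store_in_hdfs(payload):
      """Write payload (bytes) to a temp file, copy it into HDFS, return the path."""
      hdfs_path = "%s/%s" % (HDFS_DIR, uuid.uuid4().hex)
      with tempfile.NamedTemporaryFile() as tmp:
          tmp.write(payload)
          tmp.flush()
          subprocess.run(["hdfs", "dfs", "-put", tmp.name, hdfs_path], check=True)
      return hdfs_path

  for line in sys.stdin:
      if "\t" not in line:
          continue
      key, huge_value = line.rstrip("\n").split("\t", 1)
      # The side effect warned about above: if this mapper fails and is
      # re-executed, files already written to HDFS are left behind
      # unless you clean them up yourself.
      print(key + "\t" + store_in_hdfs(huge_value.encode("utf-8")))

The reducer then receives only short paths and can open each file lazily (e.g. with `hdfs dfs -cat <path>`), so nothing multi-gigabyte ever has to pass through the shuffle.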

