Huge files in Hadoop: how to store metadata?


I have a use case where I need to upload terabytes of text files as sequence files on HDFS.

These text files have several layouts, ranging from 32 to 62 columns (metadata).

What is a good way to upload these files along with their metadata:

  1. Create a key/value class per text file layout and use it to create and upload the sequence files?

  2. Create a SequenceFile.Metadata header in each sequence file as it is uploaded individually? (A rough sketch of what I mean is below.)
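
For option 2, I am thinking of something roughly like the following. This is only a minimal sketch using the SequenceFile.Writer.metadata option; the metadata keys and the column list are made-up placeholders, not my actual layouts.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.Metadata;
    import org.apache.hadoop.io.Text;

    public class WriteWithHeaderMetadata {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();

            // Describe this file's layout in the SequenceFile header.
            // The metadata keys and the column list are illustrative only.
            Metadata layout = new Metadata();
            layout.set(new Text("layout.columns"), new Text("32"));
            layout.set(new Text("layout.fields"), new Text("id,timestamp,source,..."));

            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path(args[0])),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.metadata(layout));
            try {
                writer.append(new Text("some-key"), new Text("some-value"));
            } finally {
                writer.close();
            }
            // A reader could later recover the layout via
            // SequenceFile.Reader#getMetadata() without scanning the records.
        }
    }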

Any input is appreciated!

Thanks.

The simplest thing to do is make the keys and values of the SequenceFiles Text. Pick a meaningful field from your data to be the key, and put the data itself into the value as Text. SequenceFiles are designed for storing key/value pairs; if your data isn't shaped that way, don't use a SequenceFile. You could instead upload the unprocessed text files and feed them to Hadoop as input.
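
For illustration, a minimal uploader along those lines could look roughly like this. It is a sketch against the Hadoop 2 SequenceFile writer API; the tab-delimited input and the first-column-as-key choice are assumptions for the example, not part of your stated format.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class TextToSequenceFile {
        // args[0]: local text file, args[1]: HDFS output path for the sequence file
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();

            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path(args[1])),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class));
            try (BufferedReader in = Files.newBufferedReader(
                    Paths.get(args[0]), StandardCharsets.UTF_8)) {
                Text key = new Text();
                Text value = new Text();
                String line;
                while ((line = in.readLine()) != null) {
                    // Treat the first tab-separated column as the meaningful key
                    // and the rest of the line as the value.
                    int tab = line.indexOf('\t');
                    key.set(tab >= 0 ? line.substring(0, tab) : line);
                    value.set(tab >= 0 ? line.substring(tab + 1) : "");
                    writer.append(key, value);
                }
            } finally {
                writer.close();
            }
        }
    }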

For the best performance, do not make each individual file terabytes in size. The map stage of Hadoop runs one map task per input file, so you want to have more files than there are CPU cores in your Hadoop cluster; otherwise you end up with one CPU doing a terabyte of work and a lot of idle CPUs. A good file size is 64-128 MB, but for the best results you should measure it yourself.
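
If you would rather size your files to the cluster's actual block size than guess, you can query HDFS for it. The following is a small sketch assuming the standard FileSystem API; the output path argument is just an example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Default block size used for new files under this path; files around
            // one block (or a small multiple of it) are a sensible upload target.
            long blockSize = fs.getDefaultBlockSize(new Path(args.length > 0 ? args[0] : "/"));
            System.out.printf("Default HDFS block size: %d MB%n", blockSize / (1024 * 1024));
        }
    }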

