Huge files in Hadoop: how to store metadata?
I have a use case to upload terabytes of text files as SequenceFiles on HDFS.
These text files have several layouts, ranging from 32 to 62 columns (metadata).
What is a good way to upload these files along with their metadata:
Creating a key/value class per text file layout and using it to create and upload the SequenceFiles?
Creating a SequenceFile.Metadata header in each file as it is uploaded as a SequenceFile individually?
Any input is appreciated!
Thanks
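For the SequenceFile.Metadata header approach mentioned above, here is a minimal sketch, assuming the Hadoop 2.x SequenceFile API with Text keys and values; the metadata key names (layout.id, column.count) and the sample record are placeholders for illustration, not a Hadoop convention:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class MetadataHeaderExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path(args[0]);

            // Per-file header: record which column layout this file uses.
            // These key names are made up for the example.
            SequenceFile.Metadata metadata = new SequenceFile.Metadata();
            metadata.set(new Text("layout.id"), new Text("layout-32"));
            metadata.set(new Text("column.count"), new Text("32"));

            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.metadata(metadata));
            try {
                writer.append(new Text("row-1"), new Text("col1\tcol2\tcol3"));
            } finally {
                writer.close();
            }

            // Any reader can recover the layout from the file header later.
            SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                    SequenceFile.Reader.file(out));
            try {
                Text layout = reader.getMetadata().get(new Text("layout.id"));
                System.out.println("layout = " + layout);
            } finally {
                reader.close();
            }
        }
    }

The metadata travels in the file header, so each SequenceFile stays self-describing without a separate key/value class per layout.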
The simplest thing is to make the keys and values of the SequenceFiles Text. Pick a meaningful field from your data to be the key, and store the data itself as the value as Text. SequenceFiles are designed for storing key/value pairs; if that's not what your data is, don't use a SequenceFile. You could just upload the unprocessed text files and feed those to Hadoop.
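As a rough sketch of that approach (not the poster's actual loader), assuming tab-delimited input with the first column as the meaningful key; the local input path and HDFS output path arguments are placeholders:

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class TextToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path(args[1]);   // destination SequenceFile on HDFS

            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class));

            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    // Key = first (meaningful) column, value = the whole record.
                    String key = line.split("\t", 2)[0];
                    writer.append(new Text(key), new Text(line));
                }
            } finally {
                in.close();
                writer.close();
            }
        }
    }
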
For best performance, do not make each file terabytes in size. The map stage of Hadoop runs roughly one map task per input file (more precisely, per input split), so you want to have more files than you have CPU cores in the Hadoop cluster. Otherwise you will have one CPU doing 1 TB of work and a lot of idle CPUs. A good file size is probably 64-128 MB (on the order of an HDFS block), but for best results you should measure this yourself. For example, 1 TB of input kept as 128 MB files yields roughly 8,192 files, which is plenty of map tasks to keep a cluster busy.