Huge files in Hadoop: how to store metadata?
I have a use case to upload terabytes of text files as SequenceFiles on HDFS.
These text files have several layouts, ranging from 32 to 62 columns (metadata).
What is a good way to upload these files along with their metadata:
Creating a key/value class per text file layout and using it to create and upload the SequenceFiles?
Creating a SequenceFile.Metadata header in each file as it is uploaded as a SequenceFile individually?
Any input is appreciated!
Thanks
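For the SequenceFile.Metadata header approach mentioned above, here is a minimal sketch, assuming the Hadoop 2.x SequenceFile API with Text keys and values; the metadata key names (layout.id, column.count) and the sample record are placeholders for illustration, not a Hadoop convention:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class MetadataHeaderExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path(args[0]);

            // Per-file header: record which column layout this file uses.
            // These key names are made up for the example.
            SequenceFile.Metadata metadata = new SequenceFile.Metadata();
            metadata.set(new Text("layout.id"), new Text("layout-32"));
            metadata.set(new Text("column.count"), new Text("32"));

            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.metadata(metadata));
            try {
                writer.append(new Text("row-1"), new Text("col1\tcol2\tcol3"));
            } finally {
                writer.close();
            }

            // Any reader can recover the layout from the file header later.
            SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                    SequenceFile.Reader.file(out));
            try {
                Text layout = reader.getMetadata().get(new Text("layout.id"));
                System.out.println("layout = " + layout);
            } finally {
                reader.close();
            }
        }
    }

The metadata travels in the file header, so each SequenceFile stays self-describing without a separate key/value class per layout.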
The simplest thing is to make the keys and values of the SequenceFiles Text. Pick a meaningful field from your data to be the key, and store the data itself as the value as Text. SequenceFiles are designed for storing key/value pairs; if that's not what your data is, don't use a SequenceFile. You could just upload the unprocessed text files and feed those to Hadoop.
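As a rough sketch of that approach (not the poster's actual loader), assuming tab-delimited input with the first column as the meaningful key; the local input path and HDFS output path arguments are placeholders:

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class TextToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path(args[1]);   // destination SequenceFile on HDFS

            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class));

            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    // Key = first (meaningful) column, value = the whole record.
                    String key = line.split("\t", 2)[0];
                    writer.append(new Text(key), new Text(line));
                }
            } finally {
                in.close();
                writer.close();
            }
        }
    }
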
For best performance, do not make each file terabytes in size. The map stage of Hadoop runs roughly one map task per input file (more precisely, per input split), so you want to have more files than you have CPU cores in the Hadoop cluster. Otherwise you will have one CPU doing 1 TB of work and a lot of idle CPUs. A good file size is probably 64-128 MB (on the order of an HDFS block), but for best results you should measure this yourself. For example, 1 TB of input kept as 128 MB files yields roughly 8,192 files, which is plenty of map tasks to keep a cluster busy.