python - Using cPickle to serialize a large dictionary causes MemoryError


I'm writing an inverted-index search engine over a collection of documents. Right now, I'm storing the index as a dictionary of dictionaries. That is, each keyword maps to a dictionary of docIDs -> positions of occurrence.

The data model looks like: {word: {doc_name: [location_list]}}
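A minimal sketch of building an index with this shape, assuming documents arrive as a `{doc_name: text}` dict and that a word's position is its offset in a lowercase whitespace split. `build_index` is a hypothetical name, not the asker's code.

```python
from collections import defaultdict


def build_index(documents):
    """Build {word: {doc_name: [positions]}} from {doc_name: text}.

    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_name, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_name].append(position)
    # Return plain dicts: a defaultdict whose factory is a lambda can't be pickled.
    return {word: dict(postings) for word, postings in index.items()}


documents = {"doc1": "The cat sat", "doc2": "the dog"}
index = build_index(documents)
print(index["the"])  # {'doc1': [0], 'doc2': [0]}
```

Converting back to plain dicts at the end matters for this question specifically: pickling a `defaultdict` also pickles its factory, and a lambda factory makes the whole index unpicklable.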

Building the index in memory works fine, but when I try to serialize it to disk, I hit a MemoryError. Here's my code:

    import sys
    import cPickle

    # write the index out to disk
    with open(sys.argv[3], 'wb') as serializedIndex:
        cPickle.dump(index, serializedIndex, cPickle.HIGHEST_PROTOCOL)
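The snippet above is Python 2. In Python 3 the `cPickle` module no longer exists, and the standard `pickle` module uses its C accelerator automatically. A minimal sketch of the same save step plus the matching load, with hypothetical helper names `save_index` and `load_index` and a temporary path standing in for `sys.argv[3]`:

```python
import os
import pickle
import tempfile


def save_index(index, path):
    """Write the index to path (Python 3 counterpart of cPickle.dump)."""
    with open(path, "wb") as f:
        pickle.dump(index, f, pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Read an index written by save_index."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage: round-trip a tiny index through a temporary file.
sample = {"cat": {"doc1": [1, 4]}, "sat": {"doc1": [2]}}
path = os.path.join(tempfile.mkdtemp(), "index.pkl")
save_index(sample, path)
restored = load_index(path)
print(restored == sample)  # True
```

Pickling still goes through the pickler's memo, so switching to Python 3's `pickle` alone is not expected to fix the memory problem.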

Right before serialization, my program is using 50% of memory (1.6 GB). As soon as I make the call to cPickle, memory usage skyrockets to 80% before crashing.

Why is cPickle using so much memory for serialization? Is there a better way of approaching this problem?

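cPickle needs to use a bunch of extra memory because of cycle detection: the pickler keeps a memo of every object it has already written, so that shared references and cycles can be written as back-references instead of being serialized again. With a huge nested dictionary, that memo grows with every inner dict and list. Try using the marshal module instead if you are sure your data has no cycles.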
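To make the memo concrete, here is a hedged Python 3 sketch. `count_memo_entries` (a hypothetical helper) uses `pickletools.genops` to count the opcodes that store an object in the pickler's memo, and `save_index_marshal`/`load_index_marshal` (also hypothetical names) show the suggested marshal alternative. It assumes the index holds only dicts, lists, strings, and ints, with no cycles. Two caveats from the `marshal` docs: the format may change between Python versions, so it is not a general persistence format; and format version 3 and later "adds support for object instancing and recursion", so on Python 3 marshal may also keep a reference table, and any memory saving is worth measuring on real data rather than assuming.

```python
import marshal
import os
import pickle
import pickletools
import tempfile

# Opcodes that store an object in the pickler's memo table.
MEMO_OPCODES = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}


def count_memo_entries(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj in memory and count how many objects were memoized."""
    data = pickle.dumps(obj, protocol)
    return sum(1 for opcode, _arg, _pos in pickletools.genops(data)
               if opcode.name in MEMO_OPCODES)


def save_index_marshal(index, path):
    # Sketch only: assumes index holds just dict/list/str/int and no cycles.
    with open(path, "wb") as f:
        marshal.dump(index, f)


def load_index_marshal(path):
    with open(path, "rb") as f:
        return marshal.load(f)


index = {"apple": {"doc1": [0, 7], "doc2": [3]}, "pie": {"doc1": [1]}}

# Expect at least one memo entry per dict and list (6 containers here).
memo_entries = count_memo_entries(index)
print("memo entries:", memo_entries)

path = os.path.join(tempfile.mkdtemp(), "index.marshal")
save_index_marshal(index, path)
restored = load_index_marshal(path)
print(restored == index)  # True
```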

