How do I pass large numpy arrays between python subprocesses without saving to disk?
Is there a way to pass a large chunk of data between two Python subprocesses without using the disk? Here's a cartoon example of what I'm hoping to accomplish:
import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, the data comes from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        data.dump('data.pkl')
        sys.stdout.write('data.pkl' + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c %s' % cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

for i in range(3):
    proc.stdin.write('data\n')
    print proc.stdout.readline().rstrip()
    a = numpy.load('data.pkl')
    print a.shape

proc.stdin.write('done\n')
This creates a subprocess which generates a numpy array and saves it to disk. The parent process then loads the array from disk. It works!
The problem is, our hardware can generate data 10x faster than the disk can read/write. Is there a way to transfer data from one Python process to another purely in-memory, maybe even without making a copy of the data? Can I do something like passing-by-reference?
My first attempt at transferring the data purely in-memory is pretty lousy:
import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, the data comes from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        ##Note this is NFG if there's a '10' in the array:
        sys.stdout.write(data.tostring() + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c %s' % cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

for i in range(3):
    proc.stdin.write('data\n')
    a = numpy.fromstring(proc.stdout.readline().rstrip(), dtype=numpy.uint8)
    print a.shape

proc.stdin.write('done\n')
This is extremely slow (much slower than saving to disk) and very, very fragile. There's got to be a better way!
I'm not married to the 'subprocess' module, as long as the data-taking process doesn't block the parent application. I briefly tried 'multiprocessing', but without success so far.
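For concreteness, the sort of thing I tried looked roughly like the sketch below (a reconstruction for illustration, not my exact code): handing the array back through a multiprocessing.Queue. The queue pickles the array and copies all of its bytes, so it's no real improvement over writing it across a pipe.

import multiprocessing as mp
import numpy as np

def worker(q):
    ##Fake data. In real life, the data comes from hardware.
    data = np.zeros(1000000, dtype=np.uint8)
    # Putting the array on the queue pickles it and copies the bytes,
    # so this doesn't avoid the serialization cost.
    q.put(data)

if __name__ == '__main__':
    q = mp.Queue()
    proc = mp.Process(target=worker, args=(q,))
    proc.start()
    a = q.get()   # the parent receives a freshly unpickled copy
    print(a.shape)
    proc.join()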
Background: we have a piece of hardware that generates ~2 GB/s of data in a series of ctypes buffers. The Python code that handles these buffers has its hands full just dealing with the flood of information. I want to coordinate this flow of information with several other pieces of hardware running simultaneously in a 'master' program, without the subprocesses blocking each other. My current approach is to boil the data down a little bit in the subprocess before saving to disk, but it'd be nice to pass the full monty to the 'master' process.
While googling around for more information about the code Joe Kington posted, I found the numpy-sharedmem package. Judging from the numpy/multiprocessing tutorial, it seems to share the same intellectual heritage (maybe largely the same authors? -- I'm not sure).
Using the sharedmem module, I can create a shared-memory numpy array (awesome!), and use it with multiprocessing like this:
import sharedmem as shm
import numpy as np
import multiprocessing as mp

def worker(q, arr):
    done = False
    while not done:
        cmd = q.get()
        if cmd == 'done':
            done = True
        elif cmd == 'data':
            ##Fake data. In real life, the data comes from hardware.
            rnd = np.random.randint(100)
            print('rnd={0}'.format(rnd))
            arr[:] = rnd
        q.task_done()

if __name__ == '__main__':
    N = 10
    arr = shm.zeros(N, dtype=np.uint8)
    q = mp.JoinableQueue()
    proc = mp.Process(target=worker, args=[q, arr])
    proc.daemon = True
    proc.start()

    for i in range(3):
        q.put('data')
        # Wait for the computation to finish
        q.join()
        print arr.shape
        print(arr)

    q.put('done')
    proc.join()
Running this yields:
rnd=53
(10,)
[53 53 53 53 53 53 53 53 53 53]
rnd=15
(10,)
[15 15 15 15 15 15 15 15 15 15]
rnd=87
(10,)
[87 87 87 87 87 87 87 87 87 87]
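The part that matters is that only the 'data' command travels through the queue; the array itself lives in shared memory, so the worker's writes are visible to the master with no serialization or copy. Since the real data arrives in ctypes buffers, the hand-off into such a shared array could be a single bulk slice assignment. Here's a minimal sketch, assuming the hardware returns a plain ctypes array of unsigned bytes (the buffer here is a made-up stand-in, not our actual driver interface):

import ctypes
import numpy as np
import sharedmem as shm

N = 1000000
arr = shm.zeros(N, dtype=np.uint8)        # shared-memory destination

##Stand-in for a hardware buffer: a plain ctypes array of bytes.
hw_buffer = (ctypes.c_uint8 * N)()

# View the ctypes buffer as a numpy array (no copy), then do one bulk
# copy into the shared array, which the master process can also see.
view = np.ctypeslib.as_array(hw_buffer)
arr[:] = view

print(arr.shape)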