I am trying to build a library that parallelises certain numerical computations for data analytics tasks. Here is what I have tried so far.
Python's standard multiprocessing module
Summary: works well; computation time drops drastically compared to serial execution. Uses pickle to serialise the parameters passed to the function.
Drawback: class variables remain in the state they held before multiprocessing began. Each worker process operates on a pickled copy of the object, so any changes a worker makes to instance state are not reflected back in the parent process.
Pathos' multiprocessing module
Summary: faster than the standard multiprocessing module in my tests; uses dill instead of pickle to serialise objects, so it can also handle lambdas, closures, and other objects that pickle rejects.
Drawback: same as above; class variables remain in the state they held before multiprocessing began, because workers still operate on serialised copies.
About to try Ray library (GitHub)