Multiprocessing
Conducting extensive data studies based on the HyperStudy
or
ChangepointStudy
classes may involve tens or even hundreds of thousands of
individual fits (see e.g.
here).
Since these individual fits with different hyper-parameter values are
independent of each other, the computational workload may be distributed
among the individual cores of a multi-core processor. To keep things
simple, bayesloop uses object
serialization to
create duplicates of the current HyperStudy
or ChangepointStudy
instance and distributes them across the predefined number of cores. In
principle, this procedure could be handled by the built-in Python module
multiprocessing.
However, multiprocessing relies on the built-in module
pickle for object
serialization, which fails to serialize the classes defined in
bayesloop. We therefore use a different version of the
multiprocessing module that is part of the
pathos module.
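The serialization limitation can be illustrated with the standard library alone. The following is a minimal sketch, not bayesloop code: the dictionaries `plain_state` and `state_with_lambda` are hypothetical stand-ins for the state of a study object, and a lambda attribute is only one common reason for pickle to fail, which may or may not be the exact cause for the bayesloop classes. The multiprocessing variant used by pathos relies on the dill serializer, which is designed to handle a wider range of objects.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Hypothetical stand-ins for the state of a study object (not bayesloop code).
plain_state = {"grid": [0.1, 0.2, 0.3]}
state_with_lambda = {"grid": [0.1, 0.2, 0.3], "transform": lambda x: x ** 2}

# A round trip through pickle yields an independent duplicate of the state.
duplicate = pickle.loads(pickle.dumps(plain_state))

print(can_pickle(plain_state))        # plain data can be serialized
print(can_pickle(state_with_lambda))  # the lambda cannot be pickled
```

Creating duplicates by serialization, as in the round trip above, is what allows each worker process to operate on its own copy of the study instance.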
The latest version of pathos can be installed directly via pip, but requires git:
pip install git+https://github.com/uqfoundation/pathos
Note: Windows users need to install a C compiler before installing pathos. One possible solution for 64-bit systems is to install Microsoft Visual C++ 2008 SP1 Redistributable Package (x64) and Microsoft Visual C++ Compiler for Python 2.7.
Once pathos is installed correctly, the number of cores to use in a hyper-study or
change-point study can be specified with the keyword argument
nJobs
of the fit
method. Example:
S.fit(silent=True, nJobs=4)
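Rather than hard-coding the number of cores, the value passed to nJobs may be derived from the machine at hand. The sketch below assumes a hypothetical helper `choose_n_jobs` (not part of bayesloop) that leaves one core free for other tasks; whether this is a sensible default depends on the workload.

```python
import os


def choose_n_jobs(reserve=1):
    """Return a worker count that leaves `reserve` cores free (at least 1)."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, available - reserve)


n_jobs = choose_n_jobs()
print(n_jobs)

# With a configured study instance S, the value could then be passed as:
# S.fit(silent=True, nJobs=n_jobs)
```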