First steps with bayesloop

bayesloop models feature a two-level hierarchical structure: the low-level observation model filters out measurement noise and provides the parameters one is interested in (volatility of stock prices, diffusion coefficient of particles, directional persistence of migrating cancer cells, rate of randomly occurring events, ...). The observation model is, in most cases, given by a simple and well-known stochastic process: price changes are Gaussian-distributed, turning angles of moving cells follow a von Mises distribution, and the number of rare events within a given interval of time is Poisson-distributed. The aim of the observation model is to describe the measured data on a short time scale, while the parameters may change on longer time scales. The high-level transition model describes how the parameters of the observation model change over time, i.e. whether there are abrupt parameter jumps or gradual variations. The transition model may itself depend on so-called hyper-parameters, for example the likelihood of parameter jumps, the magnitude of gradual parameter variations, or the slope of a deterministic linear trend. The following tutorials show how to use the bayesloop module to infer both the time-varying parameter values of the observation model and the hyper-parameter values of the transition model, and how to compare different hypotheses about the parameter dynamics by approximating the model evidence, i.e. the probability of the measured data, given the observation model and transition model.

The first section of the tutorial introduces the main class of the module, Study, which enables fits of time-varying parameter models with fixed hyper-parameter values and the optimization of such hyper-parameters based on the model evidence. We provide a detailed description of how to import data, set the observation model and transition model, and perform the model fit. Finally, a plotting function to display the results is discussed briefly. This tutorial therefore provides the basis for later tutorials that discuss the extended classes HyperStudy, ChangepointStudy and OnlineStudy.

Study class

To start a new data study/analysis, create a new instance of the Study class:

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt # plotting
import seaborn as sns           # nicer plots
sns.set_style('whitegrid')      # plot styling

import bayesloop as bl

S = bl.Study()
+ Created new study.

This object is central to an analysis conducted with bayesloop. It stores the data and further provides the methods to perform probabilistic inference on the models defined within the class, as described below.

Data import

In this first study, we use a simple, yet instructive example of heterogeneous time series, the annual number of coal mining accidents in the UK from 1851 to 1962. The data is imported as a NumPy array, together with corresponding timestamps. Note that setting timestamps is optional (if none are provided, timestamps are set to an integer sequence: 0, 1, 2,...).

In [2]:
import numpy as np

data = np.array([5, 4, 1, 0, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6, 3, 3, 5, 4, 5, 3, 1, 4,
                 4, 1, 5, 5, 3, 4, 2, 5, 2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3, 0,
                 0, 1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0,
                 0, 2, 1, 0, 0, 0, 1, 1, 0, 2, 3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 3, 3, 0,
                 0, 0, 1, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0])

S.loadData(data, timestamps=np.arange(1852, 1962))
+ Successfully imported array.

Note that this particular data set is also hard-coded into the Study class, for convenient testing:

In [3]:
S.loadExampleData()
+ Successfully imported example data.

In case you have multiple observations for each time step, you may also provide the data in the form np.array([[x1,y1,z1], [x2,y2,z2], ..., [xn,yn,zn]]). Missing data points should be included as np.nan.
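As a hypothetical sketch (the values and the measurement channels x, y, z are made up for illustration), such a data array could be prepared like this before passing it to S.loadData():

```python
import numpy as np

# three observations (hypothetical channels x, y, z) per time step
data = np.array([[1.2, 0.4, 3.1],
                 [0.9, np.nan, 2.8],   # a missing y-measurement, marked as np.nan
                 [1.1, 0.5, 3.0]])

print(data.shape)   # (3, 3): one row per time step, one column per observation
```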

Observation model

The first step to create a probabilistic model to explain the data is to define the observation model, or likelihood. The observation model states the probability (density) of a data point at time \(t\), given the parameter values at time \(t\) and possibly past data points. It therefore represents the low-level part of the model, in contrast to the transition model, which describes how the parameters of the observation model change over time.

As coal mining disasters fortunately are rare events, we may model the number of accidents per year by a Poisson distribution. In bayesloop, this is done as follows:

In [4]:
L = bl.observationModels.Poisson('accident rate', bl.oint(0, 6, 1000))
S.setObservationModel(L)
+ Observation model: Poisson. Parameter(s): ['accident rate']

We first define the observation model and provide two arguments: a name for the model's only parameter, the 'accident rate', and discrete values for this parameter, as bayesloop computes all probability distributions on grids. As the Poisson distribution expects its parameter to be greater than zero, we choose an open interval between 0 and 6 with 1000 equally spaced values, using the function bl.oint(). For closed intervals, one can also use bl.cint(), which acts exactly like NumPy's linspace function. To avoid singularities in the probability values of the observation model, however, bl.oint() is recommended in most cases. Finally, we assign the defined observation model to our study instance with the method setObservationModel().
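The behavior of the two grid functions can be mimicked with plain NumPy; the following is a sketch for illustration (an assumption about their behavior, not bayesloop's actual implementation):

```python
import numpy as np

def cint(a, b, n):
    # closed interval [a, b]: n values, endpoints included (identical to np.linspace)
    return np.linspace(a, b, n)

def oint(a, b, n):
    # open interval (a, b): n equally spaced values, both endpoints excluded
    return np.linspace(a, b, n + 2)[1:-1]

grid = oint(0, 6, 1000)
print(len(grid), grid[0] > 0, grid[-1] < 6)   # 1000 True True
```

Because the open-interval grid never touches zero, the Poisson likelihood is never evaluated at the singular value \(\lambda = 0\).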

As the parameter boundaries depend on the data at hand, bayesloop will estimate appropriate parameter values if one does not provide them:

In [5]:
L = bl.observationModels.Poisson('accident rate')
S.setObservationModel(L)
+ Estimated parameter interval for "accident rate": [0.00749250749251, 7.49250749251] (1000 values).
+ Observation model: Poisson. Parameter(s): ['accident rate']

Note that you can also use the following short form to define observation models: L = bl.om.Poisson(). All currently implemented observation models can be looked up in the API Docs or directly in observationModels.py. bayesloop further supports all probability distributions that are included in the scipy.stats and sympy.stats modules. See this tutorial for instructions on how to build custom observation models from arbitrary distributions.

In this example, the observation model only features a single parameter. If we wanted to model the annual number of accidents with a Gaussian distribution instead, we would have to supply two parameter names (mean and std) and corresponding values:

L = bl.om.Gaussian('mean', bl.cint(0, 6, 200), 'std', bl.oint(0, 2, 200))
S.setObservationModel(L)

Again, if we are not sure about parameter boundaries, we may assign None to one or all parameters, and bayesloop will estimate them:

L = bl.om.Gaussian('mean', None, 'std', bl.oint(0, 2, 200))
S.setObservationModel(L)

The order has to remain Name, Value, Name, Value, ..., which is why we cannot simply omit the values and have to write None instead.

Transition model

As the dynamics of many real-world systems are the result of a multitude of underlying processes that act on different spatial and time scales, common statistical models with static parameters often miss important aspects of the systems’ dynamics (see e.g. this article). bayesloop therefore calls for a second model, the transition model, which describes the temporal changes of the model parameters.

In this example, we assume that the accident rate itself may change gradually over time and choose a Gaussian random walk with standard deviation \(\sigma=0.2\) as the transition model. As for the observation model, we supply a unique name for the hyper-parameter \(\sigma\) (named sigma) that describes the standard deviation of the parameter fluctuations, and therefore the magnitude of changes. Again, we have to assign values for sigma, but here we choose only a single fixed value of 0.2, instead of a whole set of values. This single value can be optimized by maximizing the model evidence, see here. To analyze and compare a set of different values, one may use an instance of a HyperStudy, described in detail here. In this first example, we simply take the value of 0.2 as given. As the observation model may contain several parameters, we further have to specify the parameter accident rate as the target of this transition model.

In [6]:
T = bl.transitionModels.GaussianRandomWalk('sigma', 0.2, target='accident rate')
S.setTransitionModel(T)
+ Transition model: Gaussian random walk. Hyper-Parameter(s): ['sigma']

Note that you can also use the following short form to define transition models: M = bl.tm.GaussianRandomWalk(). All currently implemented transition models can be looked up in the API Docs or directly in transitionModels.py.

Model fit

At this point, the hierarchical time series model for the coal mining data set is properly defined and we may continue to perform the model fit. bayesloop employs a forward-backward algorithm based on Hidden Markov models, which breaks the high-dimensional inference problem over all time steps down into many low-dimensional ones for the individual time steps. The inference algorithm is implemented by the fit method:

In [7]:
S.fit()
+ Started new fit:
    + Formatted data.
    + Set prior (function): jeffreys. Values have been re-normalized.

    + Finished forward pass.
    + Log10-evidence: -74.63801

    + Finished backward pass.
    + Computed mean parameter values.

By default, fit computes the so-called smoothing distribution of the model parameters for each time step. This distribution states the probability (density) of the parameter value at a time step \(t\), given all past and future data points. All distributions have the same shape as the parameter grid, and are stored in S.posteriorSequence for further analysis. Additionally, the mean values of each distribution are stored in S.posteriorMeanValues, as point estimates. Finally, the (natural) logarithmic value of the model evidence, the probability of the data given the chosen model, is stored in S.logEvidence (more details on evidence values follow).

To simulate an on-line analysis, where at each step in time \(t\), only past data points are available, one may provide the keyword argument forwardOnly=True. In this case, only the forward part of the algorithm is run. The resulting parameter distributions are called filtering distributions.
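To make the forward (filtering) pass more tangible, here is a toy implementation for a Poisson observation model with a Gaussian random-walk transition. This is a simplified sketch for illustration only, not bayesloop's actual code; grid size, prior, and normalization details are assumptions:

```python
import numpy as np
from math import lgamma

def forward_pass(data, grid, sigma):
    """Toy filtering pass: Poisson likelihood, Gaussian random-walk transition."""
    # Jeffreys prior for the Poisson rate, normalized on the grid
    posterior = 1.0 / np.sqrt(grid)
    posterior /= posterior.sum()

    # discretized transition model: Gaussian kernel with standard deviation sigma
    diff = grid[:, None] - grid[None, :]
    trans = np.exp(-0.5 * (diff / sigma) ** 2)
    trans /= trans.sum(axis=0)          # each column sums to one

    means = []
    for k in data:
        posterior = trans @ posterior   # predict: the rate may have drifted
        posterior *= np.exp(k * np.log(grid) - grid - lgamma(k + 1))  # update: Poisson pmf
        posterior /= posterior.sum()    # normalize on the grid
        means.append(np.sum(grid * posterior))
    return np.array(means)

grid = np.linspace(0.01, 7.5, 1000)
rates = forward_pass([5, 4, 1, 0, 4, 3], grid, sigma=0.2)
```

The full fit additionally runs a backward pass and combines both, so that each smoothing distribution is conditioned on all data points, not just past ones.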

Plotting

To display the temporal evolution of the model parameters, the Study class provides the method plotParameterEvolution, which displays the mean values together with the marginal distributions for one parameter of the model. The parameter to be plotted is chosen by providing its name.

Here, we plot the original data (in red) together with the inferred disaster rate (mean value in black). The marginal parameter distribution is displayed as a blue overlay, by default with a gamma correction of \(\gamma=0.5\) to enhance relative differences in the width of the distribution (this behavior can be changed by the keyword argument gamma):

In [8]:
plt.figure(figsize=(8, 4))

# plot of raw data
plt.bar(S.rawTimestamps, S.rawData, align='center', facecolor='r', alpha=.5)

# parameter plot
S.plotParameterEvolution('accident rate')

plt.xlim([1851, 1961])
plt.xlabel('year');
../_images/tutorials_firststeps_15_0.png

From this first analysis, we may conclude that before 1880, an average of \(\approx 3\) accidents per year was recorded. This changes significantly between 1880 and 1900, when the accident rate drops to \(\approx 1\) per year. We can also directly inspect the distribution of the accident rate at specific points in time, using the getParameterDistribution() method:

In [9]:
plt.figure(figsize=(8, 4))

r1, p1 = S.getParameterDistribution(1880, 'accident rate', plot=True, facecolor='r', alpha=0.5, label='1880')
r2, p2 = S.getParameterDistribution(1900, 'accident rate', plot=True, facecolor='b', alpha=0.5, label='1900')

plt.legend()
plt.xlim([0, 5]);
../_images/tutorials_firststeps_17_0.png

Without the plot=True argument, this method only returns the parameter values (r1, r2, as specified when setting the observation model) as well as the corresponding probability values p1 and p2. Note that the returned probability values are always normalized to 1, so that we may easily evaluate the probability of certain conditions, like the probability of an accident rate < 1 in the year 1900:

In [10]:
np.sum(p2[r2 < 1.])
Out[10]:
0.4219825605704649

Here, we first created a mask for all parameter values smaller than one (r2 < 1.), then extracted the corresponding probability values (p2[r2 < 1.]) and finally evaluated the sum to compute p(accident rate < 1) = 42%.

Saving studies

As the Study class instance (above denoted by S) of a conducted analysis contains all information about the inferred parameter values, it may be convenient to store the entire instance S to file. This way, it can be loaded again later, for example to refine the study, create different plots or perform further analyses based on the obtained results. bayesloop provides two functions, bl.save() and bl.load() to store and retrieve existing studies:

bl.save('file.bl', S)

...

S = bl.load('file.bl')