mbtr package

Submodules

mbtr.losses module

class mbtr.losses.FourierLoss(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

Loss for the Fourier regression:

\[\mathcal{L} = \Vert y - P x\Vert_2^2\]

where \(P\) is the projection matrix:

\[P=\left[\left[\cos \left(k \frac{2 \pi t}{n_{t}}\right)^{T}, \sin \left(k \frac{2 \pi t}{n_{t}}\right)^{T}\right]^{T}\right]_{k \in \mathcal{K}}\]
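
As an illustration, here is a minimal numpy sketch of how such a projection matrix can be assembled from the n_harmonics parameter and the target dimension n_t (an assumption about the construction, not the package's exact code):

import numpy as np

def fourier_projection(n_t, n_harmonics):
    # Stack one cosine and one sine row per harmonic k; t spans the target dimension.
    t = np.arange(n_t)
    rows = []
    for k in range(1, n_harmonics + 1):
        rows.append(np.cos(k * 2 * np.pi * t / n_t))
        rows.append(np.sin(k * 2 * np.pi * t / n_t))
    return np.vstack(rows)  # shape (2 * n_harmonics, n_t)

P = fourier_projection(n_t=24, n_harmonics=3)
assert P.shape == (6, 24)
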
eval_optimal_loss(G2, H)

Evaluate the optimal loss (using the response obtained by minimizing the second order loss approximation).

Parameters:
  • G2 – squared sum of gradients in the current leaf
  • H – sum of Hessians diags in the current leaf
Returns:

optimal loss, scalar

eval_optimal_response(G, H)

Evaluate optimal response, given G and H.

Parameters:
  • G – mean gradient for the current leaf.
  • H – mean Hessian for the current leaf.
Returns:

optimal response under second order loss approximation.

get_initial_guess(y)

Return an initial guess for the prediction. This can be loss-specific.

Parameters:y – target matrix of the training set
Returns:np.ndarray with initial guess
projection_matrix

Return projection matrix for the Fourier coefficient estimation.

Parameters:n – number of observations
Returns:projection matrix P, (2*n_harmonics, n_t), where n_harmonics is the number of harmonics to fit and n_t is the target dimension

required_pars = ['n_harmonics']
set_dimension(n_dim)

Initialize all the properties which depend on the dimension of the target.

Parameters:n_dim – dimension of the target
Returns:None
class mbtr.losses.LatentVariable(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

Loss for the hierarchical reconciliation problem, in the form:

\[\mathcal{L} = \Vert y - S x\Vert_2^2\]

where \(S\) is the hierarchy matrix. The initial guess is the mean of the last columns of y.
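
As a concrete toy example (hypothetical, not taken from the package), a hierarchy with two bottom series and their total can be encoded as:

import numpy as np

# Rows index the n_t = 3 observed series [total, bottom_1, bottom_2];
# columns index the n_b = 2 bottom series, so total = bottom_1 + bottom_2.
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])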

compute_H_inv
compute_fast_H_hat
eval_optimal_loss(yy, H)

Evaluate the optimal loss (using the response obtained by minimizing the second order loss approximation).

Parameters:
  • yy – squared sum of gradients in the current leaf
  • H – sum of Hessians diags in the current leaf
Returns:

optimal loss, scalar

eval_optimal_response(G, H)

Evaluate optimal response, given G and H.

Parameters:
  • G – mean gradient for the current leaf.
  • H – mean Hessian for the current leaf.
Returns:

optimal response under second order loss approximation.

get_grad_and_hessian_diags(y, y_hat, iteration, leaves_idx)

Return the loss gradient and loss Hessian’s diagonals based on the current model estimation y_hat and target y matrices. Instead of returning the full Hessian (a 3rd order tensor), the method returns only the Hessian diagonals for each observation, stored in a (n_obs, n_t) matrix. These diagonals are then used by the loss to reconstruct the full Hessian with appropriate dimensions and structure. Currently, full Hessians inferred from data are not supported.

Parameters:
  • y – target matrix (n_obs, n_t)
  • y_hat – current target estimation matrix (n_obs, n_t)
  • iteration – current iteration number, generally not needed
  • leaves_idx – leaves’ indices for each observation in y, (n_obs, 1). This is needed, for example, by mbtr.losses.QuadraticQuantileLoss.
Returns:

grad, hessian_diags tuple, each of which is a (n_obs, n_t) matrix

get_initial_guess(y)

The initial guess is generated from the last columns of the target matrix, as:

\[y_0 = \left( \mathbb{E} y_b \right) S^T\]

where \(\mathbb{E}\) is the expectation (row-mean), \(S\) is the hierarchy matrix, and \(y_b\) stands for the last columns of y, with dimension (n_obs, n_b), where n_b is the number of bottom series.

Parameters:y – target matrix of the training set
Returns:np.ndarray with initial guess
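
A hedged numpy transcription of the formula above (names follow the equation; the package's exact broadcasting may differ):

import numpy as np

def latent_initial_guess(y, S):
    n_b = S.shape[1]                     # number of bottom series
    y_b_mean = y[:, -n_b:].mean(axis=0)  # row-mean of the last n_b columns, (n_b,)
    return y_b_mean @ S.T                # initial guess of dimension n_t
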
required_pars = ['S', 'precision']
set_dimension(n_dims)

For the latent variable loss, the number of dimensions is equal to the second dimension of the S matrix and must not be inferred from the target.

class mbtr.losses.LinRegLoss(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

eval_optimal_loss(G, x)

Evaluate the optimal loss (using the response obtained by minimizing the second order loss approximation).

Parameters:
  • G – gradient for the current leaf.
  • x – linear regression features for the current leaf.
Returns:

optimal loss, scalar

eval_optimal_response(G, x)

Evaluate optimal response, given G and x. This is done computing a Ridge regression with intercept

\[w = \left(\tilde{x}^T \tilde{x} + \lambda I \right)^{-1} \left(\tilde{x}^T G \right)\]

where \(\tilde{x}\) is the \(x\) matrix augmented with a column of ones and \(\lambda\) is the ridge coefficient.

Parameters:
  • G – gradient for the current leaf.
  • x – linear regression features for the current leaf.
Returns:

optimal response under second order loss approximation.
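
A minimal numpy sketch of this ridge solve, assuming lam plays the role of the lambda_weights regularizer (an assumption about the mapping):

import numpy as np

def ridge_response(G, x, lam=0.1):
    # Augment x with a column of ones for the intercept.
    x_tilde = np.hstack([x, np.ones((x.shape[0], 1))])
    A = x_tilde.T @ x_tilde + lam * np.eye(x_tilde.shape[1])
    return np.linalg.solve(A, x_tilde.T @ G)  # (n_f + 1, n_t) weights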

set_dimension(n_dim)

Initialize all the properties which depend on the dimension of the target.

Parameters:n_dim – dimension of the target
Returns:None
class mbtr.losses.Loss(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: object

Loss function class. A loss is defined by its gradient and Hessians. Note that if your specific loss function requires additional arguments, you can specify them in required_pars. Upon instantiation, this list is used to check that loss_kwargs contains all the needed parameters. Each class inheriting from mbtr.losses.Loss must provide an H_inv method, computing the inverse of the Hessian.

Parameters:
  • lambda_weights – quadratic penalization parameter for the leaves weights
  • lambda_leaves – quadratic penalization parameter for the number of leaves
  • loss_kwargs – additional parameters needed for a specific loss type
H_inv

Computes the inverse of the Hessian, given the Hessian’s diagonal of the current leaf. The default implements the MSE inverse.

Parameters:H – current leaf Hessian’s diagonal (n_t)
Returns:inv(H), (n_t, n_t)
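
A plausible sketch of this default behaviour, assuming the lambda_weights regularizer is simply added to the diagonal (the exact regularization handling is internal to the package):

import numpy as np

def h_inv(H, lambda_weights=0.1):
    # With a diagonal Hessian, inversion reduces to inverting the diagonal entries.
    return np.diag(1.0 / (H + lambda_weights))  # (n_t, n_t)
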
eval(y, y_hat, trees)

Evaluates the overall loss, which is composed of the tree’s loss plus the weights and total-leaves penalizations.

Parameters:
  • y – observations
  • y_hat – current mbtr.MBT estimations
  • trees – array of fitted trees up to the current iteration
Returns:

tree loss and regularizations loss tuple, scalars

eval_optimal_loss(G2, H)

Evaluate the optimal loss (using the response obtained by minimizing the second order loss approximation).

Parameters:
  • G2 – squared sum of gradients in the current leaf
  • H – sum of Hessians diags in the current leaf
Returns:

optimal loss, scalar

eval_optimal_response(G, H)

Evaluate optimal response, given G and H.

Parameters:
  • G – mean gradient for the current leaf.
  • H – mean Hessian for the current leaf.
Returns:

optimal response under second order loss approximation.

get_grad_and_hessian_diags(y, y_hat, iteration, leaves_idx)

Return the loss gradient and loss Hessian’s diagonals based on the current model estimation y_hat and target y matrices. Instead of returning the full Hessian (a 3rd order tensor), the method returns only the Hessian diagonals for each observation, stored in a (n_obs, n_t) matrix. These diagonals are then used by the loss to reconstruct the full Hessian with appropriate dimensions and structure. Currently, full Hessians inferred from data are not supported.

Parameters:
  • y – target matrix (n_obs, n_t)
  • y_hat – current target estimation matrix (n_obs, n_t)
  • iteration – current iteration number, generally not needed
  • leaves_idx – leaves’ indices for each observation in y, (n_obs, 1). This is needed, for example, by mbtr.losses.QuadraticQuantileLoss.
Returns:

grad, hessian_diags tuple, each of which is a (n_obs, n_t) matrix
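
Under the MSE loss, for example, these quantities reduce to the residuals and a constant diagonal (a sketch, up to constant factors):

import numpy as np

def mse_grad_and_hessian_diags(y, y_hat):
    grad = y_hat - y                 # (n_obs, n_t)
    hessian_diags = np.ones_like(y)  # (n_obs, n_t)
    return grad, hessian_diags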

get_initial_guess(y)

Return an initial guess for the prediction. This can be loss-specific.

Parameters:y – target matrix of the training set
Returns:np.ndarray with initial guess
required_pars = []
set_dimension(n_dims)

Initialize all the properties which depend on the dimension of the target.

Parameters:n_dims – dimension of the target
Returns:None
tree_loss(y, y_hat)

Compute the tree loss (without penalizations)

Parameters:
  • y – observations of the target on the training set
  • y_hat – current estimation of the MBT
Returns:

tree loss

class mbtr.losses.MSE(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

Mean Squared Error loss, a.k.a. L2, Ordinary Least Squares.

\[\mathcal{L} = \Vert y - w\Vert_2^2 + \frac{1}{2} w^T \Lambda w\]

where \(\Lambda\) is the quadratic penalization matrix.

Parameters:
  • lambda_weights – quadratic penalization parameter for the leaves weights
  • lambda_leaves – quadratic penalization parameter for the number of leaves
  • loss_kwargs – additional parameters needed for a specific loss type
eval_optimal_loss(G2, H)

Evaluate the optimal loss (using the response obtained by minimizing the second order loss approximation).

Parameters:
  • G2 – squared sum of gradients in the current leaf
  • H – sum of Hessians diags in the current leaf
Returns:

optimal loss, scalar

eval_optimal_response(G, H)

Evaluate optimal response, given G and H.

Parameters:
  • G – mean gradient for the current leaf.
  • H – mean Hessian for the current leaf.
Returns:

optimal response under second order loss approximation.

set_dimension(n_dim)

Initialize all the properties which depend on the dimension of the target.

Parameters:n_dim – dimension of the target
Returns:None
class mbtr.losses.QuadraticQuantileLoss(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

H_inv(H)

Computes the inverse of the Hessian, given the Hessian’s diagonal of the current leaf. The default implements the MSE inverse.

Parameters:H – current leaf Hessian’s diagonal (n_t)
Returns:inv(H), (n_t, n_t)
exact_response(y)
get_grad_and_hessian_diags(y, y_hat, iteration, leaves_idx)

Return the loss gradient and loss Hessian’s diagonals based on the current model estimation y_hat and target y matrices. Instead of returning the full Hessian (a 3rd order tensor), the method returns only the Hessian diagonals for each observation, stored in a (n_obs, n_t) matrix. These diagonals are then used by the loss to reconstruct the full Hessian with appropriate dimensions and structure. Currently, full Hessians inferred from data are not supported.

Parameters:
  • y – target matrix (n_obs, n_t)
  • y_hat – current target estimation matrix (n_obs, n_t)
  • iteration – current iteration number, generally not needed
  • leaves_idx – leaves’ indices for each observation in y, (n_obs, 1). This is needed, for example, by mbtr.losses.QuadraticQuantileLoss.
Returns:

grad, hessian_diags tuple, each of which is a (n_obs, n_t) matrix

get_initial_guess(y)

The initial guess is given by the quantiles of the target matrix y at the levels specified in alphas.

Parameters:y – target matrix of the training set
Returns:np.ndarray with initial guess
required_pars = ['alphas']
tree_loss(y, y_hat)

Compute the tree loss (without penalizations)

Parameters:
  • y – observations of the target on the training set
  • y_hat – current estimation of the MBT
Returns:

tree loss

class mbtr.losses.QuantileLoss(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

H_inv(H)

Computes the inverse of the Hessian, given the Hessian’s diagonal of the current leaf. The default implements the MSE inverse.

Parameters:H – current leaf Hessian’s diagonal (n_t)
Returns:inv(H), (n_t, n_t)
exact_response(y)
get_grad_and_hessian_diags(y, y_hat, iteration, leaves_idx)

Return the loss gradient and loss Hessian’s diagonals based on the current model estimation y_hat and target y matrices. Instead of returning the full Hessian (a 3rd order tensor), the method returns only the Hessian diagonals for each observation, stored in a (n_obs, n_t) matrix. These diagonals are then used by the loss to reconstruct the full Hessian with appropriate dimensions and structure. Currently, full Hessians inferred from data are not supported.

Parameters:
  • y – target matrix (n_obs, n_t)
  • y_hat – current target estimation matrix (n_obs, n_t)
  • iteration – current iteration number, generally not needed
  • leaves_idx – leaves’ indices for each observation in y, (n_obs, 1). This is needed, for example, by mbtr.losses.QuadraticQuantileLoss.
Returns:

grad, hessian_diags tuple, each of which is a (n_obs, n_t) matrix

get_initial_guess(y)

The initial guess is given by the quantiles of the target matrix y at the levels specified in alphas.

Parameters:y – target matrix of the training set
Returns:np.ndarray with initial guess
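
For instance, with alphas = [0.1, 0.5, 0.9] (a sketch of the documented behaviour on a toy target):

import numpy as np

y = np.random.randn(500, 1)
y0 = np.quantile(y, [0.1, 0.5, 0.9], axis=0)  # one initial guess per quantile level
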
quantile_loss(y, q_hat)

Quantile loss function, a.k.a. pinball loss.

\[\begin{aligned}
\epsilon(y,\hat{q})_{\alpha} &= \hat{q}_{\alpha} - y \\
\mathcal{L}(y,\hat{q})_{\alpha} &= \epsilon(y,\hat{q})_{\alpha} \left( I_{\epsilon_{\alpha}\geq 0} - \alpha \right)
\end{aligned}\]
Parameters:
  • y – observations of the target on the training set
  • q_hat – current estimation matrix of the quantiles
Returns:

quantile loss
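
A direct numpy transcription of the pinball loss above for a single quantile level alpha (a sketch, not the package's vectorized implementation):

import numpy as np

def pinball(y, q_hat, alpha):
    eps = q_hat - y
    return eps * ((eps >= 0).astype(float) - alpha)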

required_pars = ['alphas']
class mbtr.losses.TimeSmoother(lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: mbtr.losses.Loss

Time-smoothing loss function. Penalizes the time-derivative of the predicted signal.

\[\mathcal{L} = \frac{1}{2}\Vert y-w\Vert_2^{2} + \frac{1}{2} w^T \left(\lambda_s D^T D + \lambda I \right) w\]

where \(D\) is the second order difference matrix

\[D=\left[\begin{array}{cccccc}
1 & -2 & 1 & & & \\
 & 1 & -2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & -2 & 1
\end{array}\right]\]

and \(\lambda_s\) is the coefficient for the quadratic penalization of time-derivatives.

Required parameters:
  • lambda_smooth – coefficient for the quadratic penalization of time-derivatives

static build_filter_mat(n)

Build the second order difference matrix

Parameters:n – target dimension
Returns:D, second order difference matrix
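
A minimal numpy sketch of such a matrix (the package builds it via build_filter_mat; the exact shape convention is an assumption):

import numpy as np

def second_diff_matrix(n):
    # Each row applies the [1, -2, 1] stencil at one position.
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D
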
compute_fast_H_inv
required_pars = ['lambda_smooth']
set_dimension(n_dim)

Initialize all the properties which depend on the dimension of the target.

Parameters:n_dim – dimension of the target
Returns:None
update_smoothing_mat(smoothing_weights)

mbtr.mbtr module

class mbtr.mbtr.MBT(n_boosts: int = 20, early_stopping_rounds: int = 3, learning_rate: float = 0.1, val_ratio: int = 0, n_q: int = 10, min_leaf: int = 100, loss_type: str = 'mse', lambda_weights: float = 0.1, lambda_leaves: float = 0.1, verbose: int = 0, refit=True, **loss_kwargs)

Bases: object

Multivariate Boosted Tree class. Fits an ensemble of multivariate trees using boosting.

Parameters:
  • n_boosts – maximum number of boosting rounds. Default: 20
  • early_stopping_rounds – if the total loss is non-decreasing after early_stopping_rounds, stop training. The final model is the one which achieved the lowest loss up to the final iteration. Default: 3.
  • learning_rate – in [0, 1]. A learning rate < 1 helps reduce overfitting. Default: 0.1.
  • val_ratio – in [0, 1]. If provided, the early stop is triggered by the loss computed on a validation set, randomly extracted from the training set. The length of the validation set is val_ratio * len(training set). Default: 0.
  • n_q – number of quantiles for the split search. Default: 10.
  • min_leaf – minimum number of observations in one leaf. This parameter greatly affects generalization ability. Default: 100.
  • loss_type

    loss type for choosing the best splits. Currently the following losses are implemented:

    mse: mean squared error loss, a.k.a. L2, ordinary least squares

    time_smoother: mse with an additional penalization on the second order differences of the response function. Also requires the lambda_smooth parameter.

    latent_variable: generates a response function of dimension n_t from an arbitrary linear combination of n_r responses. Also requires the S and precision parameters.

    linear_regression: mse with a linear response function. When using this loss function, the fit and predict methods must also receive x_lr, the matrix of features used to train the linear response inside the leaf (which can differ from the features x used to grow the tree).

    fourier: mse with a linear response function, fitted on the first n_harmonics (the fundamental harmonic has a wavelength equal to the target dimension). Also requires the n_harmonics parameter.

    quantile: quantile loss function, a.k.a. pinball loss. Also requires the alphas parameter, a list of quantiles to be fitted.

    quadratic_quantile: quadratic quantile loss function tailored for trees; its derivative is continuous. Also requires the alphas parameter, a list of quantiles to be fitted.

  • lambda_weights – coefficient for the quadratic regularization of the response’s parameters. Default: 0.1
  • lambda_leaves – coefficient for the quadratic regularization of the total number of leaves. This is only used when the Tree is used as a weak learner by MBT. Default: 0.1
  • verbose – in {0, 1}. If set to 1, the MBT reports fitting information at each iteration.
  • refit – if True and the loss function has an exact_response method, use it to refit the tree
  • loss_kwargs – possible additional arguments for the loss function
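
A minimal usage sketch based on the signatures documented here (random toy data; hyperparameter values are illustrative only):

import numpy as np
from mbtr.mbtr import MBT

# Toy data: 1000 observations, 5 features, 3-dimensional target.
x = np.random.randn(1000, 5)
y = np.random.randn(1000, 3)

m = MBT(n_boosts=30, min_leaf=100, loss_type='mse')
m.fit(x, y)
y_hat = m.predict(x)  # (1000, 3) predictions
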
fit(x, y, do_plot=False, x_lr=None)

Fits an MBT using the features specified in the matrix \(x\in\mathbb{R}^{n_{obs} \times n_{f}}\), in order to predict the targets in the matrix \(y\in\mathbb{R}^{n_{obs} \times n_{t}}\), where \(n_{obs}\) is the number of observations, \(n_{f}\) the number of features and \(n_{t}\) the dimension of the target.

Parameters:
  • x – feature matrix, np.ndarray.
  • y – target matrix, np.ndarray.
  • x_lr – features for fitting the linear response inside the leaves. This is only required if the LinRegLoss is being used.
predict(x, n=None, x_lr=None)

Predicts the target based on the feature matrix x (and linear regression features x_lr).

Parameters:
  • x – feature matrix, np.ndarray.
  • n – predict up to the nth fitted tree. If None, predict all the trees. Default: None
  • x_lr – linear regression feature matrix, np.ndarray. Only required if the LinRegLoss has been used.
Returns:

target’s predictions

class mbtr.mbtr.Tree(n_q: int = 10, min_leaf: int = 100, loss_type: str = 'mse', lambda_weights: float = 0.1, lambda_leaves: float = 0.1, **loss_kwargs)

Bases: object

Tree class. Fits both univariate and multivariate targets. It implements a histogram-based search for choosing the splitting points.

Parameters:
  • n_q – number of quantiles for the split search
  • min_leaf – minimum number of observations in one leaf. This parameter greatly affects generalization ability.
  • loss_type

    loss type for choosing the best splits. Currently the following losses are implemented:

    mse: mean squared error loss, a.k.a. L2, ordinary least squares

    time_smoother: mse with an additional penalization on the second order differences of the response function. Also requires the lambda_smooth parameter.

    latent_variable: generates a response function of dimension n_t from an arbitrary linear combination of n_r responses. Also requires the S and precision parameters.

    linear_regression: mse with a linear response function. When using this loss function, the fit and predict methods must also receive x_lr, the matrix of features used to train the linear response inside the leaf (which can differ from the features x used to grow the tree).

    fourier: mse with a linear response function, fitted on the first n_harmonics (the fundamental harmonic has a wavelength equal to the target dimension). Also requires the n_harmonics parameter.

    quantile: quantile loss function, a.k.a. pinball loss. Also requires the alphas parameter, a list of quantiles to be fitted.

    quadratic_quantile: quadratic quantile loss function tailored for trees; its derivative is continuous. Also requires the alphas parameter, a list of quantiles to be fitted.

  • lambda_weights – coefficient for the quadratic regularization of the response’s parameters
  • lambda_leaves – coefficient for the quadratic regularization of the total number of leaves. This is only used when the Tree is used as a weak learner by MBT.
  • loss_kwargs – possible additional arguments for the loss function

compute_loss(G2_left, G2_right, H_left, H_right, j)
fit(x, y, hessian=None, learning_rate=1.0, x_lr=None)

Fits a tree using the features specified in the matrix \(x\in\mathbb{R}^{n_{obs} \times n_{f}}\), in order to predict the targets in the matrix \(y\in\mathbb{R}^{n_{obs} \times n_{t}}\), where \(n_{obs}\) is the number of observations, \(n_{f}\) the number of features and \(n_{t}\) the dimension of the target.

Parameters:
  • x – feature matrix, np.ndarray.
  • y – target matrix, np.ndarray.
  • hessian – diagonals of the hessians \(\in\mathbb{R}^{n_{obs} \times n_{t}}\). If None, each entry is set equal to one (this will result in the default behaviour under MSE loss). Default: None
  • learning_rate – learning rate used by the MBT instance. Default: 1
  • x_lr – features for fitting the linear response inside the leaves. This is only required if the LinRegLoss is being used.
predict(x, x_lr=None)

Predicts the target based on the feature matrix x (and linear regression features x_lr).

Parameters:
  • x – feature matrix, np.ndarray.
  • x_lr – linear regression feature matrix, np.ndarray.
Returns:target’s predictions
mbtr.mbtr.bin_sums
mbtr.mbtr.leaf_stats

mbtr.utils module

class mbtr.utils.LightGBMMISO(n_estimators, lgb_pars=None)

Bases: object

fit(x, y)
predict(x)
mbtr.utils.check_pars(required_pars, **kwargs)
mbtr.utils.download_dataset()
mbtr.utils.load_dataset()
mbtr.utils.set_figure(size, subplots=(1, 1), context='paper', style='darkgrid', font_scale=1, l=0.2, w=0.1, h=0.1, b=0.1)

Module contents