Example usage

Here, we will test-drive latentmi by estimating the mutual information (MI) between two multivariate Gaussians.

First, we’ll import the necessary packages.

import numpy as np
from latentmi import lmi

import torch

torch.manual_seed(2121)
np.random.seed(2121)

Generating synthetic multivariate Gaussian data

To generate two high-dimensional multivariate Gaussians with known MI, we’ll sample from a single two-dimensional multivariate Gaussian and linearly project each of its components into 100 dimensions. Each projection is a one-to-one map of its one-dimensional “intrinsic” component, and MI is unchanged by one-to-one maps, so the MI between the high-dimensional projections equals the MI between the intrinsic components. That value can be computed analytically from the correlation between the two intrinsic components. We’ll use a generous $10^4$ samples.

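# Sample the 2-D intrinsic Gaussian; its covariance fixes the correlation and hence the MI.
intrinsic = np.random.multivariate_normal([0, 0], cov=[[6, 3], [3, 3.5]], size=10**4)
X_intrinsic = intrinsic[:, [0]]  # shape (10000, 1)
Y_intrinsic = intrinsic[:, [1]]  # shape (10000, 1)

# Random 1 x 100 projection vectors, one per variable.
X_proj = np.random.normal(size=(1, 100))
Y_proj = np.random.normal(size=(1, 100))

# Embed each intrinsic coordinate in 100 dimensions.
Xs = X_intrinsic @ X_proj
Ys = Y_intrinsic @ Y_proj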

print(Xs.shape)
print(Ys.shape)

(10000, 100)
(10000, 100)
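As a quick sanity check on the idea that a projection of this kind loses no information, the sketch below builds a stand-in for one projected variable and confirms two things: the 100-dimensional matrix has rank 1, and the intrinsic coordinate can be recovered from it by least squares. The names `x_intrinsic`, `x_proj`, and `xs` are illustrative stand-ins rather than the variables defined above, and a separate random generator is used so the global seed set earlier is not disturbed.

```python
import numpy as np

# Separate generator so the global NumPy seed used in the tutorial is untouched.
rng = np.random.default_rng(0)

x_intrinsic = rng.normal(size=(10**4, 1))  # stand-in for a 1-D intrinsic variable
x_proj = rng.normal(size=(1, 100))         # random 1 x 100 projection vector
xs = x_intrinsic @ x_proj                  # shape (10000, 100)

# Every column is a multiple of the same intrinsic column, so the rank is 1.
rank = np.linalg.matrix_rank(xs)

# Least-squares inverse of the projection recovers the intrinsic coordinate.
x_recovered = xs @ x_proj.T / (x_proj @ x_proj.T)
max_error = np.max(np.abs(x_recovered - x_intrinsic))

print(rank, max_error)
```

Because the intrinsic coordinate is recoverable up to floating-point error, the projected data carry the same information as the one-dimensional variable.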

Estimating MI with the LMI approximation

Next, we’ll estimate the MI between the two high-dimensional variables from the $10^4$ samples. The latent MI approximation first learns a low-dimensional representation using neural networks, then applies the Kraskov, Stögbauer, and Grassberger (KSG) estimator to that learned representation.
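To make the second step more concrete, here is a minimal sketch of a KSG-style nearest-neighbour estimator (the first of the two KSG algorithms), written with SciPy. It is an illustration only: the function name `ksg_mi_bits` and the choice `k=3` are assumptions made for this example, and latentmi’s internal implementation may differ. The sketch is applied directly to a low-dimensional correlated Gaussian rather than to a learned representation.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def ksg_mi_bits(x, y, k=3):
    """KSG (algorithm 1) MI estimate in bits for samples x, y of shape (N, d)."""
    n = x.shape[0]
    joint = np.hstack([x, y])

    # Max-norm distance to the k-th neighbour in the joint space
    # (k + 1 because each point is returned as its own nearest neighbour).
    dists, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = dists[:, -1]

    # Count marginal neighbours strictly closer than eps, excluding the point itself.
    radius = np.nextafter(eps, 0)
    n_x = cKDTree(x).query_ball_point(x, radius, p=np.inf, return_length=True) - 1
    n_y = cKDTree(y).query_ball_point(y, radius, p=np.inf, return_length=True) - 1

    mi_nats = digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1))
    return mi_nats / np.log(2)  # convert nats to bits


rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0, 0], [[6, 3], [3, 3.5]], size=2000)
estimate_bits = ksg_mi_bits(sample[:, [0]], sample[:, [1]])
truth_bits = -0.5 * np.log2(1 - 3**2 / (6 * 3.5))
print(estimate_bits, truth_bits)
```

Nearest-neighbour estimators of this kind tend to degrade as dimensionality grows, which is the motivation for applying them to a learned low-dimensional representation rather than to the raw 100-dimensional data.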

The lmi function wraps the whole process into one function call, and returns three things:

1. Pointwise mutual information estimates (which can be averaged to obtain an MI estimate)
2. Coordinates of each sample in the low-dimensional representation space
3. PyTorch object for the representation learning model

By default, the learned representation has 8 dimensions, though this can be increased or decreased as desired. Also, the function defaults to quiet so training progress is not displayed. Many other parameters of the representation learning network and training can be adjusted in the function call – though in practice, we find extensive parameter tuning to be unnecessary.

If we only care about the MI estimate, we can ignore the second and third outputs and simply average the array returned as the first output. By default, the lmi function estimates MI using only validation samples, not training samples, so the pointwise mutual information array contains NaN for every sample in the training set. To get the MI estimate, we therefore need a mean that excludes NaN values; NumPy’s nanmean function does exactly this.

pmis, embedding, model = lmi.estimate(Xs, Ys, quiet=False,
                                # N_dims=8, validation_split=0.5,...
                                )

epoch 187 (of max 300) 🌻🌻🌻🌻🌻🌻 🎉🎉
success! training stopped at epoch 187
final validation loss: 1.2238998651504516

print(pmis)

[        nan  0.80080795         nan ...         nan  0.11582631
 -0.20801842]

As you can see, there are some NaN in the pointwise mutual information array. If we take the mean excluding NaN, we get our estimate.

print("LMI estimate: %.3f"  % np.nanmean(pmis))

LMI estimate: 0.433
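The difference between a plain mean and a NaN-aware mean is easy to see on a small array. The values in `toy_pmis` below are made up for illustration and are not output from latentmi.

```python
import numpy as np

# Hypothetical pointwise MI values; NaN marks samples used for training.
toy_pmis = np.array([np.nan, 0.8, np.nan, 0.1, -0.2])

plain_mean = np.mean(toy_pmis)     # NaN propagates through an ordinary mean
nan_mean = np.nanmean(toy_pmis)    # averages only the non-NaN entries
n_validation = np.count_nonzero(~np.isnan(toy_pmis))  # entries that contribute

print(plain_mean, nan_mean, n_validation)
```

Counting the non-NaN entries, as in the last line, is also a simple way to check how many validation samples the estimate is based on.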

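We can then compare this to the analytically determined ground truth. For a bivariate Gaussian with correlation $\rho$, the MI is $-\frac{1}{2}\log_2(1-\rho^2)$ bits, where here $\rho = 3/\sqrt{6 \cdot 3.5}$: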

-0.5*np.log2((1-(3/(np.sqrt(6*3.5)))**2))

0.4036774610288021

Not too bad!
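For reuse with other covariance matrices, the ground-truth calculation can be wrapped in a small helper. This is a sketch: `gaussian_mi_bits` is a name introduced here, not part of latentmi, and it assumes a 2 x 2 covariance matrix for a bivariate Gaussian. The value 0.433 is the LMI estimate printed above.

```python
import numpy as np


def gaussian_mi_bits(cov):
    """MI in bits between the two components of a bivariate Gaussian."""
    cov = np.asarray(cov, dtype=float)
    # Squared correlation coefficient from the covariance entries.
    rho_sq = cov[0, 1] ** 2 / (cov[0, 0] * cov[1, 1])
    return -0.5 * np.log2(1.0 - rho_sq)


truth = gaussian_mi_bits([[6, 3], [3, 3.5]])
lmi_estimate = 0.433  # value reported in the run above
relative_error = (lmi_estimate - truth) / truth

print("ground truth: %.3f bits, relative error: %.1f%%" % (truth, 100 * relative_error))
```

In this run the estimate overshoots the ground truth by roughly 7%.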