[Bug] Multitask-ExactGPs seem to not use mBCG algorithm as Singletask-ExactGPs do #2523

OliverAh commented May 7, 2024

🐛 Bug

This might also explain #2306. It is only a potential bug, but since the contributing guidelines ask for an issue type, I labelled it as a bug right away. Please feel free to change the label if that is not correct.

Using the gpytorch.settings.verbose_linalg(state=True) context revealed that, when switching from a single-task to a multi-task GP, the linalg debug output changes from "CG" to "symeig". This is unexpected to me, because from the paper gardner2018gpytorch I do not see a reason why the mBCG algorithm (which I assume is what the output below reports as "CG") should not be applicable in the multi-task case. Of course I could be missing something; in that case, please be so kind as to point it out.
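
To illustrate why I would expect a matrix-free solver to apply here: as far as I understand, the covariance produced by MultitaskKernel has Kronecker structure K_xx ⊗ K_tt, and mBCG/CG only needs matrix-vector products, which such a structure supports without ever forming the full (n·t) × (n·t) matrix. A toy sketch in plain PyTorch (not GPyTorch internals, just the algebraic identity):

import torch

# Toy sketch: a Kronecker-structured matrix-vector product only needs the two
# small factors, which is all a CG/mBCG-type solver requires.
n, t = 5, 2                                      # toy sizes: n data points, t tasks
A = torch.randn(n, n)
K_xx = A @ A.T + n * torch.eye(n)                # SPD "data" kernel (n x n)
B = torch.randn(t, t)
K_tt = B @ B.T + t * torch.eye(t)                # SPD "task" kernel (t x t)
v = torch.randn(n * t)

dense   = torch.kron(K_xx, K_tt) @ v                          # builds the full (n*t, n*t) matrix
matfree = (K_xx @ v.reshape(n, t) @ K_tt.T).reshape(-1)       # never builds it
print(torch.allclose(dense, matfree, atol=1e-4))              # True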

To reproduce

Essentially, I adapted the GPyTorch Regression Tutorial (GPU) from the documentation.
I wanted to make it convenient to switch back and forth between single- and multi-task GPs, tensor shapes, and CPU/GPU, so I wrapped the code to reproduce in a function run_gpytorch(...), which is called with the desired kwargs. The signature is

run_gpytorch(dims_in:int, dims_out:int, num_samples:int, device:str)  # device is either 'cpu' or 'gpu'
  • num_samples:int specifies the number of data points that are generated for training the GP. I included it because I came across gpytorch.settings.max_cholesky_size(value) in the documentation, and num_samples makes it easy to change the size of the matrices (see also the short sketch after this list)
  • dims_in:int specifies the dimensionality of the inputs to the GP; one might also call it the number of features of the input data
  • dims_out:int specifies the dimensionality of the outputs from the GP. It is also equal to num_tasks in the gpytorch.means.MultitaskMean, ...MultitaskKernel, and ...MultitaskGaussianLikelihood classes
  • device:str specifies whether the GP should be trained and tested using the CPU or the GPU
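
For completeness, here is how I understand the relevant settings contexts can be combined to rule out the Cholesky path entirely, instead of just increasing num_samples (a sketch only; I have not verified that this changes the multi-task behaviour):

import gpytorch

# Sketch: disable the Cholesky fallback and enable the iterative code paths,
# so that any remaining symeig/Cholesky calls are not due to small matrix size.
with gpytorch.settings.verbose_linalg(state=True), \
     gpytorch.settings.max_cholesky_size(0), \
     gpytorch.settings.fast_computations(covar_root_decomposition=True, log_prob=True, solves=True):
    ...  # training / prediction code as in the snippet below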

Code snippet to reproduce

import torch
import gpytorch
import numpy as np
def run_gpytorch(dims_in:int, dims_out:int, num_samples:int, device:str='cpu'):  # device: 'cpu' or 'gpu'
    # Set context for mBCG and linalg debugging
    with gpytorch.settings.verbose_linalg(state=True), \
         gpytorch.settings.fast_computations(covar_root_decomposition=True, log_prob=True, solves=True):
        
        # Generate inputs for training and testing
        samples_train = np.linspace(start=[0.]*dims_in, stop=[1.]*dims_in, num=num_samples)
        samples_test = np.linspace(start=[0.]*dims_in, stop=[1.]*dims_in, num=np.floor(num_samples/2.67).astype(int))
        
        # For dims_in=1 reshaping is necessary, because gpytorch.models.ExactGP expects inputs as 1-D arrays (n,) [not (n,1)]
        if dims_in == 1:
            samples_train = samples_train.reshape(-1,)
            samples_test = samples_test.reshape(-1,)
        train_x = torch.tensor(samples_train).to(torch.float)
        test_x = torch.tensor(samples_test).to(torch.float)

        # Generate outputs for training (that the model should learn to predict)
        if (dims_in>1) and (dims_out>1):
            train_y = torch.stack([torch.sin(2*torch.pi*train_x[:,0])] * dims_out , 1).to(torch.float)
        elif (dims_in==1) and (dims_out>1):
            train_y = torch.stack([torch.sin(2*torch.pi*train_x     )] * dims_out , 1).to(torch.float)
        elif (dims_in>1) and (dims_out==1):
            train_y =              torch.sin(2*torch.pi*train_x[:,0])                 .to(torch.float)
        elif (dims_in==1) and (dims_out==1):
            train_y =              torch.sin(2*torch.pi*train_x     )                 .to(torch.float)
        
        print(f'Shape of train_x: {train_x.shape}')
        print(f'Shape of test_x:  {test_x.shape}')
        print(f'Shape of train_y: {train_y.shape}' + '\n')
        
        # Define class for single-/multitask GP model
        if dims_out == 1:
            print(f'Using single-task GP as dims_out = {dims_out}' + '\n')
            class ExactGPModel(gpytorch.models.ExactGP):
                def __init__(self, train_x, train_y, likelihood):
                    super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
                    self.mean_module = gpytorch.means.ConstantMean()
                    self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

                def forward(self, x):
                    mean_x = self.mean_module(x)
                    covar_x = self.covar_module(x)
                    return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

            # Instantiate single-task likelihood and GP model
            likelihood = gpytorch.likelihoods.GaussianLikelihood()
            model = ExactGPModel(train_x, train_y, likelihood)

        else:
            print(f'Using multi-task GP as dims_out = {dims_out}' + '\n')
            class MultitaskGPModel(gpytorch.models.ExactGP):
                def __init__(self, train_x, train_y, likelihood):
                    super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
                    self.mean_module = gpytorch.means.MultitaskMean(
                        gpytorch.means.ConstantMean(), num_tasks=dims_out
                    )
                    self.covar_module = gpytorch.kernels.MultitaskKernel(
                        gpytorch.kernels.RBFKernel(), num_tasks=dims_out
                    )

                def forward(self, x):
                    mean_x = self.mean_module(x)
                    covar_x = self.covar_module(x)
                    return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)

            # Instantiate multi-task likelihood and GP model 
            likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=dims_out)
            model = MultitaskGPModel(train_x, train_y, likelihood)

        # Move data, GP model, and likelihood to gpu if desired
        if device == 'gpu':
            print('Move all structures to GPU since device=gpu' + '\n')
            train_x = train_x.cuda()
            test_x = test_x.cuda()
            train_y = train_y.cuda()
            model = model.cuda()
            likelihood = likelihood.cuda()
        else:
            print('Do not move structures to GPU since device=cpu' + '\n')
        
        # Switch to training mode
        model.train()
        likelihood.train()
        
        # Define optimizer and loss function
        optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

        # Train GP using training data
        print('Start training')
        training_iterations = 1
        for i in range(training_iterations):
            optimizer.zero_grad()
            output = model(train_x)
            loss = -mll(output, train_y)
            loss.backward()
            optimizer.step()
            print('  Finished training iteration %i/%i' % (i + 1, training_iterations))
        print('Finished training' + '\n')

        # Switch to evaluation mode, and probe trained GP using testing data
        model.eval()
        likelihood.eval()
        print('Start testing')
        with torch.no_grad():
            observed_model = model(test_x)
            print('  Testing: Finished evaluation')
            observed_pred = likelihood(observed_model)
            print('  Testing: Finished likelihood')
            mean = observed_pred.mean
            lower, upper = observed_pred.confidence_region()
        print('Finished testing' + '\n')

        return
# Create dict of inputs to run_gpytorch(...)
kwargs_profile = {'dims_in':1, 'dims_out':2, 'num_samples':1000, 'device':'gpu'}
run_gpytorch(**kwargs_profile)
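
As an additional debugging aid, one could inspect which structured operator backs the prior covariance in the multi-task case, since I suspect that operator type is what determines which solver gets dispatched (a sketch only; it assumes model and train_x as built inside run_gpytorch above are made available at top level, e.g. by returning them, and I have not verified that the operator type alone decides between CG and symeig):

import gpytorch

# Sketch: print the structured operator that backs the prior covariance.
# Assumes `model` and `train_x` from run_gpytorch(...) are in scope.
model.train()
with gpytorch.settings.verbose_linalg(state=True):
    prior = model(train_x)
    print(type(prior.lazy_covariance_matrix))   # which LinearOperator subclass?
    print(prior.lazy_covariance_matrix.shape)   # expected (num_samples*dims_out, num_samples*dims_out) for the multi-task case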

Outputs

The code above generates the following output, in which the use of symeig shows up:

Shape of train_x: torch.Size([1000])
Shape of test_x:  torch.Size([374])
Shape of train_y: torch.Size([1000, 2])

Using multi-task GP as dims_out = 2

Move all structures to GPU since device=gpu

LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([1000, 1000]).
Start training
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([2, 2]).
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([1000, 1000]).
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([2, 2]).
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([1000, 1000]).
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([2, 2]).
c:\Users\{user}\micromamba\envs\gpytorch_mwe\Lib\site-packages\linear_operator\utils\interpolation.py:71: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\tensor\python_tensor.cpp:80.)
  summing_matrix = cls(summing_matrix_indices, summing_matrix_values, size)
c:\Users\{user}\micromamba\envs\gpytorch_mwe\Lib\site-packages\linear_operator\utils\interpolation.py:71: UserWarning: torch.sparse.SparseTensor(indices, values, shape, *, device=) is deprecated.  Please use torch.sparse_coo_tensor(indices, values, shape, dtype=, device=). (Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\utils\tensor_new.cpp:623.)
  summing_matrix = cls(summing_matrix_indices, summing_matrix_values, size)
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([1000, 1000]).
  Finished training iteration 1/1
Finished training

Start testing
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([2, 2]).
  Testing: Finished evaluation
  Testing: Finished likelihood
Finished testing

Expected Behavior

I expected the lines in the output starting with LinAlg (Verbose) ... to say something like
LinAlg (Verbose) - DEBUG - Running CG ... instead of
LinAlg (Verbose) - DEBUG - Running symeig...

This is because testing a single-task GP, run by the following call (note that dims_out is changed to 1),

kwargs_profile = {'dims_in':1, 'dims_out':1, 'num_samples':1000, 'device':'gpu'}
run_gpytorch(**kwargs_profile)

generates the following output, which shows CG being used:

Shape of train_x: torch.Size([1000])
Shape of test_x:  torch.Size([374])
Shape of train_y: torch.Size([1000])

Using single-task GP as dims_out = 1

Move all structures to GPU since device=gpu

LinAlg (Verbose) - DEBUG - Running CG on a torch.Size([1000, 11]) RHS for 1000 iterations (tol=1). Output: torch.Size([1000, 11]).
Start training
LinAlg (Verbose) - DEBUG - Running symeig on a matrix of size torch.Size([10, 11, 11]).
LinAlg (Verbose) - DEBUG - Running CG on a torch.Size([1000, 1]) RHS for 1000 iterations (tol=0.01). Output: torch.Size([1000, 1]).
LinAlg (Verbose) - DEBUG - Running CG on a torch.Size([1000, 374]) RHS for 1000 iterations (tol=0.01). Output: torch.Size([1000, 374]).
  Finished training iteration 1/1
Finished training

Start testing
  Testing: Finished evaluation
  Testing: Finished likelihood
Finished testing

System information

  • GPyTorch Version (run print(gpytorch.__version__)) --> 1.11
  • PyTorch Version (run print(torch.__version__)) --> 2.3.0
  • Python Version (run print(sys.version)) --> 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:20:11) [MSC v.1938 64 bit (AMD64)]
  • micromamba Version (run micromamba info) --> libmamba version: 1.5.8
  • VSCode Version (Help >> About ) --> Version: 1.89.0
  • Computer OS --> Windows 11

Additional context

I was profiling GPyTorch the other day (using cProfile) and noticed that for the multi-task GP the linalg solver called by GPyTorch was torch._C._linalg.linalg_eigh. That led to the investigation above. If you are interested, I can also provide the profiling information.
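
If it helps, the profiling information mentioned above could be collected with something along these lines (a sketch; run_gpytorch and kwargs_profile as defined in the snippet above, the output file name is only illustrative):

import cProfile
import pstats

# Sketch: profile one full run and list cumulative time spent in linalg-related calls.
cProfile.run('run_gpytorch(**kwargs_profile)', 'run_gpytorch.prof')
stats = pstats.Stats('run_gpytorch.prof')
stats.sort_stats('cumulative').print_stats('linalg')
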
I am using a Jupyter notebook inside VSCode and a micromamba environment created from the following YAML file via micromamba create -f .\{file_name}.yml, which resulted in the specs listed after it (exported via micromamba env export > {other_file_name}.yml).

name: gpytorch_mwe
channels:
  - conda-forge
  - pytorch
  - nvidia
  - gpytorch
dependencies:
  - python
  - pytorch 
  - torchvision
  - torchaudio
  - pytorch-cuda==12.1
  - gpytorch::gpytorch
  - pandas
  - matplotlib
  - ipykernel
name: gpytorch_mwe
channels:
- conda-forge
- gpytorch
- nvidia
- pytorch
dependencies:
- asttokens=2.4.1=pyhd8ed1ab_0
- blas=1.0=mkl
- brotli=1.1.0=hcfcfb64_1
- brotli-bin=1.1.0=hcfcfb64_1
- brotli-python=1.1.0=py312h53d5487_1
- bzip2=1.0.8=hcfcfb64_5
- ca-certificates=2024.2.2=h56e8100_0
- cccl=2.3.1=h84bb9a4_0
- certifi=2024.2.2=pyhd8ed1ab_0
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- colorama=0.4.6=pyhd8ed1ab_0
- comm=0.2.2=pyhd8ed1ab_0
- contourpy=1.2.1=py312h0d7def4_0
- cuda-cccl=12.4.127=h57928b3_1
- cuda-cccl_win-64=12.4.127=h57928b3_1
- cuda-cudart=12.1.105=0
- cuda-cudart-dev=12.1.105=0
- cuda-cupti=12.1.105=0
- cuda-libraries=12.1.0=0
- cuda-libraries-dev=12.1.0=0
- cuda-nvrtc=12.1.105=0
- cuda-nvrtc-dev=12.1.105=0
- cuda-nvtx=12.1.105=0
- cuda-opencl=12.4.127=h63175ca_0
- cuda-opencl-dev=12.4.127=h63175ca_0
- cuda-profiler-api=12.4.127=h57928b3_1
- cuda-runtime=12.1.0=0
- cuda-version=12.4=h3060b56_3
- cycler=0.12.1=pyhd8ed1ab_0
- debugpy=1.8.1=py312h53d5487_0
- decorator=5.1.1=pyhd8ed1ab_0
- exceptiongroup=1.2.0=pyhd8ed1ab_2
- executing=2.0.1=pyhd8ed1ab_0
- filelock=3.14.0=pyhd8ed1ab_0
- fonttools=4.51.0=py312he70551f_0
- freetype=2.12.1=hdaf720e_2
- gettext=0.22.5=h5728263_2
- gettext-tools=0.22.5=h7d00a51_2
- glib=2.80.0=h39d0aa6_6
- glib-tools=2.80.0=h0a98069_6
- gpytorch=1.11=0
- gst-plugins-base=1.24.1=h001b923_1
- gstreamer=1.24.1=hb4038d2_1
- icu=73.2=h63175ca_0
- idna=3.7=pyhd8ed1ab_0
- importlib-metadata=7.1.0=pyha770c72_0
- importlib_metadata=7.1.0=hd8ed1ab_0
- intel-openmp=2024.1.0=h57928b3_965
- ipykernel=6.29.3=pyha63f2e9_0
- ipython=8.22.2=pyh7428d3b_0
- jaxtyping=0.2.28=pyhd8ed1ab_0
- jedi=0.19.1=pyhd8ed1ab_0
- jinja2=3.1.3=pyhd8ed1ab_0
- joblib=1.4.0=pyhd8ed1ab_0
- jupyter_client=8.6.1=pyhd8ed1ab_0
- jupyter_core=5.7.2=py312h2e8e312_0
- khronos-opencl-icd-loader=2023.04.17=h64bf75a_0
- kiwisolver=1.4.5=py312h0d7def4_1
- krb5=1.21.2=heb0366b_0
- lcms2=2.16=h67d730c_0
- lerc=4.0.0=h63175ca_0
- libasprintf=0.22.5=h5728263_2
- libasprintf-devel=0.22.5=h5728263_2
- libblas=3.9.0=1_h8933c1f_netlib
- libbrotlicommon=1.1.0=hcfcfb64_1
- libbrotlidec=1.1.0=hcfcfb64_1
- libbrotlienc=1.1.0=hcfcfb64_1
- libcblas=3.9.0=5_hd5c7e75_netlib
- libclang13=18.1.3=default_hf64faad_0
- libcublas=12.1.0.26=0
- libcublas-dev=12.1.0.26=0
- libcufft=11.0.2.4=0
- libcufft-dev=11.0.2.4=0
- libcurand=10.3.5.147=h63175ca_1
- libcurand-dev=10.3.5.147=h63175ca_1
- libcusolver=11.4.4.55=0
- libcusolver-dev=11.4.4.55=0
- libcusparse=12.0.2.55=0
- libcusparse-dev=12.0.2.55=0
- libdeflate=1.20=hcfcfb64_0
- libexpat=2.6.2=h63175ca_0
- libffi=3.4.2=h8ffe710_5
- libgettextpo=0.22.5=h5728263_2
- libgettextpo-devel=0.22.5=h5728263_2
- libglib=2.80.0=h39d0aa6_6
- libhwloc=2.10.0=default_h2fffb23_1000
- libiconv=1.17=hcfcfb64_2
- libintl=0.22.5=h5728263_2
- libintl-devel=0.22.5=h5728263_2
- libjpeg-turbo=3.0.0=hcfcfb64_1
- liblapack=3.9.0=5_hd5c7e75_netlib
- libnpp=12.0.2.50=0
- libnpp-dev=12.0.2.50=0
- libnvjitlink=12.1.105=0
- libnvjitlink-dev=12.1.105=0
- libnvjpeg=12.1.1.14=0
- libnvjpeg-dev=12.1.1.14=0
- libogg=1.3.4=h8ffe710_1
- libpng=1.6.43=h19919ed_0
- libsodium=1.0.18=h8d14728_1
- libsqlite=3.45.3=hcfcfb64_0
- libtiff=4.6.0=hddb2be6_3
- libuv=1.48.0=hcfcfb64_0
- libvorbis=1.3.7=h0e60522_0
- libwebp-base=1.4.0=hcfcfb64_0
- libxcb=1.15=hcd874cb_0
- libxml2=2.12.6=hc3477c8_2
- libzlib=1.2.13=hcfcfb64_5
- linear_operator=0.5.2=pyhd8ed1ab_0
- m2w64-gcc-libgfortran=5.3.0=6
- m2w64-gcc-libs=5.3.0=7
- m2w64-gcc-libs-core=5.3.0=7
- m2w64-gmp=6.1.0=2
- m2w64-libwinpthread-git=5.0.0.4634.697f757=2
- markupsafe=2.1.5=py312he70551f_0
- matplotlib=3.8.4=py312h2e8e312_0
- matplotlib-base=3.8.4=py312h26ecaf7_0
- matplotlib-inline=0.1.7=pyhd8ed1ab_0
- mkl=2023.1.0=h6a75c08_48682
- mpmath=1.3.0=pyhd8ed1ab_0
- msys2-conda-epoch=20160418=1
- munkres=1.1.4=pyh9f0ad1d_0
- nest-asyncio=1.6.0=pyhd8ed1ab_0
- networkx=3.3=pyhd8ed1ab_1
- numpy=1.26.4=py312h8753938_0
- openjpeg=2.5.2=h3d672ee_0
- openssl=3.3.0=hcfcfb64_0
- packaging=24.0=pyhd8ed1ab_0
- pandas=2.2.2=py312h2ab9e98_0
- parso=0.8.4=pyhd8ed1ab_0
- pcre2=10.43=h17e33f8_0
- pickleshare=0.7.5=py_1003
- pillow=10.3.0=py312h6f6a607_0
- pip=24.0=pyhd8ed1ab_0
- platformdirs=4.2.1=pyhd8ed1ab_0
- ply=3.11=pyhd8ed1ab_2
- prompt-toolkit=3.0.42=pyha770c72_0
- psutil=5.9.8=py312he70551f_0
- pthread-stubs=0.4=hcd874cb_1001
- pthreads-win32=2.9.1=hfa6e2cd_3
- pure_eval=0.2.2=pyhd8ed1ab_0
- pygments=2.17.2=pyhd8ed1ab_0
- pyparsing=3.1.2=pyhd8ed1ab_0
- pyqt=5.15.9=py312he09f080_5
- pyqt5-sip=12.12.2=py312h53d5487_5
- pysocks=1.7.1=pyh0701188_6
- python=3.12.3=h2628c8c_0_cpython
- python-dateutil=2.9.0=pyhd8ed1ab_0
- python-tzdata=2024.1=pyhd8ed1ab_0
- python_abi=3.12=4_cp312
- pytorch=2.3.0=py3.12_cuda12.1_cudnn8_0
- pytorch-cuda=12.1=hde6ce7c_5
- pytorch-mutex=1.0=cuda
- pytz=2024.1=pyhd8ed1ab_0
- pywin32=306=py312h53d5487_2
- pyyaml=6.0.1=py312he70551f_1
- pyzmq=26.0.2=py312hd7027bb_0
- qt-main=5.15.8=hcef0176_21
- requests=2.31.0=pyhd8ed1ab_0
- scikit-learn=1.4.2=py312hcacafb1_0
- scipy=1.13.0=py312h8753938_0
- setuptools=69.5.1=pyhd8ed1ab_0
- sip=6.7.12=py312h53d5487_0
- six=1.16.0=pyh6c4a22f_0
- stack_data=0.6.2=pyhd8ed1ab_0
- sympy=1.12=pyh04b8f61_3
- tbb=2021.12.0=h91493d7_0
- threadpoolctl=3.5.0=pyhc1e730c_0
- tk=8.6.13=h5226925_1
- toml=0.10.2=pyhd8ed1ab_0
- tomli=2.0.1=pyhd8ed1ab_0
- torchaudio=2.3.0=py312_cu121
- torchvision=0.18.0=py312_cu121
- tornado=6.4=py312he70551f_0
- traitlets=5.14.3=pyhd8ed1ab_0
- typeguard=2.13.3=pyhd8ed1ab_0
- typing-extensions=4.11.0=hd8ed1ab_0
- typing_extensions=4.11.0=pyha770c72_0
- tzdata=2024a=h0c530f3_0
- ucrt=10.0.22621.0=h57928b3_0
- urllib3=2.2.1=pyhd8ed1ab_0
- vc=14.3=hcf57466_18
- vc14_runtime=14.38.33130=h82b7239_18
- vs2015_runtime=14.38.33130=hcb4865c_18
- wcwidth=0.2.13=pyhd8ed1ab_0
- wheel=0.43.0=pyhd8ed1ab_1
- win_inet_pton=1.1.0=pyhd8ed1ab_6
- xorg-libxau=1.0.11=hcd874cb_0
- xorg-libxdmcp=1.1.3=hcd874cb_0
- xz=5.2.6=h8d14728_0
- yaml=0.2.5=h8ffe710_2
- zeromq=4.3.5=h63175ca_1
- zipp=3.17.0=pyhd8ed1ab_0
- zstd=1.5.5=h12be248_0