ShapleySobolKNN

We now demonstrate how to use ShapleySobolKNN for estimating Shapley Sobol’ indices (Owen, 2014; Song et al., 2016) from scattered data. If you have not installed pyfirst, please uncomment and run %pip install pyfirst below before proceeding.

# %pip install pyfirst

Imports

import numpy as np
from pyfirst import ShapleySobolKNN

Simulate Data

We simulate clean data from the Ishigami function

$$ y = f(X) = \sin(X_{1}) + 7\sin^2(X_{2}) + 0.1X_{3}^{4}\sin(X_{1}), $$

where the inputs \(X\) are independent features, each uniformly distributed on \([-\pi,\pi]\).

def ishigami(x):
    # x is a single sample in [0, 1]^3; rescale it to [-pi, pi]^3 first
    x = -np.pi + 2 * np.pi * x
    y = np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])
    return y

np.random.seed(43)
n = 10000
p = 3
X = np.random.uniform(size=(n,p))
y = np.apply_along_axis(ishigami, 1, X)
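The row-by-row evaluation above can also be written with NumPy array operations, which is usually faster for large n. The sketch below is only an illustration: `ishigami_vectorized`, `ishigami_rowwise`, and `X_demo` are names introduced here and are not part of pyfirst. It checks that both versions return the same values.

```python
import numpy as np

def ishigami_vectorized(X):
    """Evaluate the Ishigami function on every row of X, with X in [0, 1]^3."""
    Z = -np.pi + 2 * np.pi * X  # rescale each column to [-pi, pi]
    return np.sin(Z[:, 0]) + 7 * np.sin(Z[:, 1])**2 + 0.1 * Z[:, 2]**4 * np.sin(Z[:, 0])

def ishigami_rowwise(x):
    """Same function for a single sample, as in the tutorial cell."""
    x = -np.pi + 2 * np.pi * x
    return np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])

rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(1000, 3))

y_fast = ishigami_vectorized(X_demo)
y_slow = np.apply_along_axis(ishigami_rowwise, 1, X_demo)

# Both evaluations should agree up to floating-point error
agree = np.allclose(y_fast, y_slow)
print(agree)
```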

Run ShapleySobolKNN

ShapleySobolKNN(X, y, noise=False)
array([0.42633061, 0.4428562 , 0.13081318])
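As a sanity check, the estimates can be compared with the closed-form Shapley effects of the Ishigami function. The sketch below assumes the standard Sobol' decomposition for this function, where only the main effects of \(X_{1}\) and \(X_{2}\) and the \(X_{1}\)-\(X_{3}\) interaction are nonzero, and uses the fact that the Shapley effect splits each interaction variance equally among the inputs involved. The variable names are chosen here for illustration.

```python
import numpy as np

a, b = 7.0, 0.1  # coefficients of the Ishigami function used above

# Closed-form variance components for inputs uniform on [-pi, pi]
V1 = 0.5 * (1 + b * np.pi**4 / 5) ** 2  # main effect of X1
V2 = a**2 / 8                            # main effect of X2
V13 = 8 * b**2 * np.pi**8 / 225          # X1-X3 interaction
V = V1 + V2 + V13                        # total variance (all other terms are zero)

# Each input receives its main effect plus an equal share of its interactions
shapley_exact = np.array([V1 + V13 / 2, V2, V13 / 2]) / V
print(shapley_exact)
```

The exact values are roughly (0.436, 0.442, 0.122) and sum to one, so the estimates above are within about 0.01 of the truth for each input.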

Speeding Up ShapleySobolKNN

The speed-up tricks available for TotalSobolKNN are also available for ShapleySobolKNN. Please check the speed-up tricks in the TotalSobolKNN page for more details.

Noisy Data

We now look at the estimation performance on noisy data \(y = f(X) + \epsilon\), where \(\epsilon\sim\mathcal{N}(0,1)\) is the random error. For noisy data, ShapleySobolKNN implements the Noise-Adjusted Nearest-Neighbor estimator of Huang and Joseph (2025), which corrects the bias that the Nearest-Neighbor estimator of Broto et al. (2020) incurs when applied to noisy data.

np.random.seed(43)
n = 10000
p = 3
X = np.random.uniform(size=(n,p))
y = np.apply_along_axis(ishigami, 1, X) + np.random.normal(size=n)

ShapleySobolKNN(X, y, noise=True)
array([0.43060553, 0.4407385 , 0.12865597])
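To get a sense of how much noise this setting involves, one can compare the noise variance with the variance of the response. The sketch below is a rough illustration under the same simulation design; it uses its own random generator and names (`X_chk`, `f_chk`, `eps`, `noise_share`), so the draws differ from the cells above.

```python
import numpy as np

def ishigami(x):
    # x is a single sample in [0, 1]^3, rescaled to [-pi, pi]^3
    x = -np.pi + 2 * np.pi * x
    return np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])

rng = np.random.default_rng(43)
n = 10000
X_chk = rng.uniform(size=(n, 3))
f_chk = np.apply_along_axis(ishigami, 1, X_chk)  # noise-free response
eps = rng.normal(size=n)                         # N(0, 1) random error
y_chk = f_chk + eps                              # noisy response

# Fraction of the observed variance that is due to the random error
noise_share = eps.var() / y_chk.var()
print(noise_share)
```

Since the noise-free Ishigami response has variance of about 13.8, unit-variance noise accounts for only a few percent of the variance of \(y\) here.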

For more details about ShapleySobolKNN, please see Huang and Joseph (2025).

References

Huang, C., & Joseph, V. R. (2025). Factor Importance Ranking and Selection using Total Indices. Technometrics.

Owen, A. B. (2014). Sobol’ indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2, 245-251.

Song, E., Nelson, B. L., & Staum, J. (2016). Shapley effects for global sensitivity analysis: Theory and computation. SIAM/ASA Journal on Uncertainty Quantification, 4, 1060-1083.

Broto, B., Bachoc, F., & Depecker, M. (2020). Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 693-716.

Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P.E., Lomeli, M., Hosseini, L., & Jégou, H. (2024). The Faiss library. arXiv preprint arXiv:2401.08281.

Vakayil, A., & Joseph, V. R. (2022). Data twinning. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(5), 598-610.