{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ShapleySobolKNN\n", "\n", "We now demonstrate how to use `ShapleySobolKNN` to estimate Shapley Sobol' indices (Owen, 2014; Song et al., 2016) from scattered data. If you have not installed `pyfirst`, uncomment and run `%pip install pyfirst` below before proceeding." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# %pip install pyfirst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from pyfirst import ShapleySobolKNN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulate Data\n", "\n", "We simulate clean (noise-free) data from the Ishigami function\n", "\n", "$$\n", " y = f(X) = \\sin(X_{1}) + 7\\sin^2(X_{2}) + 0.1X_{3}^{4}\\sin(X_{1}),\n", "$$\n", "\n", "where the inputs $X = (X_{1}, X_{2}, X_{3})$ are independent and uniformly distributed on $[-\\pi,\\pi]^{3}$."
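, "\n", "\n", "Because the inputs are independent, the Shapley effects of this function can be written in closed form, which gives a reference point for judging the estimates in this notebook. The block below is a sketch based on the standard variance decomposition of the Ishigami function with $a = 7$ and $b = 0.1$ and on the identity $Sh_{i} = \\sum_{u \\ni i} V_{u}/|u|$ for independent inputs (Owen, 2014). The symbols $a$, $b$, $V$, $V_{u}$, and $Sh_{i}$ are notation introduced here, and the effects are normalized by the total variance $V$ on the assumption that `ShapleySobolKNN` reports normalized indices. The only nonzero variance components are $V_{1}$, $V_{2}$, and $V_{13}$:\n", "\n", "$$\n", "\\begin{aligned}\n", "V &= \\frac{a^{2}}{8} + \\frac{b\\pi^{4}}{5} + \\frac{b^{2}\\pi^{8}}{18} + \\frac{1}{2} \\approx 13.845, \\\\\n", "V_{1} &= \\frac{1}{2}\\left(1 + \\frac{b\\pi^{4}}{5}\\right)^{2} \\approx 4.346, \\quad V_{2} = \\frac{a^{2}}{8} = 6.125, \\quad V_{13} = \\frac{8b^{2}\\pi^{8}}{225} \\approx 3.374, \\\\\n", "Sh_{1} &= \\frac{V_{1} + V_{13}/2}{V} \\approx 0.436, \\quad Sh_{2} = \\frac{V_{2}}{V} \\approx 0.442, \\quad Sh_{3} = \\frac{V_{13}/2}{V} \\approx 0.122.\n", "\\end{aligned}\n", "$$\n", "\n", "The interaction variance $V_{13}$ is split equally between $X_{1}$ and $X_{3}$, so the three effects sum to one."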
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def ishigami(x):\n", "    # Map a point from the unit cube [0, 1]^3 to [-pi, pi]^3\n", "    x = -np.pi + 2 * np.pi * x\n", "    y = np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])\n", "    return y\n", "\n", "np.random.seed(43)\n", "n = 10000  # number of samples\n", "p = 3  # number of inputs\n", "# Inputs are drawn on the unit cube; ishigami rescales them internally\n", "X = np.random.uniform(size=(n,p))\n", "y = np.apply_along_axis(ishigami, 1, X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run ShapleySobolKNN" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.42633062, 0.44285617, 0.1308132 ])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ShapleySobolKNN(X, y, noise=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Speeding Up ShapleySobolKNN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The speed-up tricks available for `TotalSobolKNN` are also available for `ShapleySobolKNN`. See the `TotalSobolKNN` page for details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Noisy Data\n", "\n", "We now examine the estimation performance on noisy data $y = f(X) + \\epsilon$, where $\\epsilon\\sim\\mathcal{N}(0,1)$ is the random error. For noisy data, `ShapleySobolKNN` implements the Noise-Adjusted Nearest-Neighbor estimator of Huang and Joseph (2025), which corrects the bias that the Nearest-Neighbor estimator of Broto et al. (2020) incurs when applied to noisy data."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.43060419, 0.4407395 , 0.12865631])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.seed(43)\n", "n = 10000\n", "p = 3\n", "X = np.random.uniform(size=(n,p))\n", "y = np.apply_along_axis(ishigami, 1, X) + np.random.normal(size=n)\n", "\n", "ShapleySobolKNN(X, y, noise=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more details about `ShapleySobolKNN`, please see Huang and Joseph (2025)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "Huang, C., & Joseph, V. R. (2025). Factor Importance Ranking and Selection using Total Indices. Technometrics.\n", "\n", "Owen, A. B. (2014). Sobol' indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2, 245-251.\n", "\n", "Song, E., Nelson, B. L., & Staum, J. (2016). Shapley effects for global sensitivity analysis: Theory and computation. SIAM/ASA Journal on Uncertainty Quantification, 4, 1060-1083.\n", "\n", "Broto, B., Bachoc, F., & Depecker, M. (2020). Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 693-716.\n", "\n", "Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P.E., Lomeli, M., Hosseini, L., & Jégou, H. (2024). The Faiss library. arXiv preprint arXiv:2401.08281.\n", "\n", "Vakayil, A., & Joseph, V. R. (2022). Data twinning. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(5), 598-610."
] } ], "metadata": { "kernelspec": { "display_name": "3.9.7", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }