{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ShapleySobolKNN\n",
    "\n",
    "We now demonstrate how to use `ShapleySobolKNN` for estimating Shapley Sobol' indices (Owen, 2014; Song et al., 2016) from scattered data. If you have not installed `pyfirst`, please uncomment and run `%pip install pyfirst` below before proceeding. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install pyfirst"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from pyfirst import ShapleySobolKNN"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulate Data\n",
    "\n",
    "We simulate clean data from the Ishigami function \n",
    "\n",
    "$$\n",
    "    y = f(X) = \\sin(X_{1}) + 7\\sin^2(X_{2}) + 0.1X_{3}^{4}\\sin(X_{1}),\n",
    "$$\n",
    "\n",
    "where the input $X$ are independent features uniformly distributed on $[-\\pi,\\pi]^{3}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ishigami(x):\n",
    "    x = -np.pi + 2 * np.pi * x\n",
    "    y = np.sin(x[0]) + 7 * np.sin(x[1])**2 + 0.1 * x[2]**4 * np.sin(x[0])\n",
    "    return y\n",
    "\n",
    "np.random.seed(43)\n",
    "n = 10000\n",
    "p = 3\n",
    "X = np.random.uniform(size=(n,p))\n",
    "y = np.apply_along_axis(ishigami, 1, X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run ShapleySobolKNN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.42633062, 0.44285617, 0.1308132 ])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ShapleySobolKNN(X, y, noise=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Speeding Up ShapleySobolKNN"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The speed-up tricks available for `TotalSobolKNN` are also available for `ShapleySobolKNN`. Please check the speed-up tricks in the `TotalSobolKNN` page for more details. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Noisy Data\n",
    "\n",
    "We now look at the estimation performance on the noisy data $y = f(X) + \\epsilon$ where $\\epsilon\\sim\\mathcal{N}(0,1)$ is the random error. For noisy data, `ShapleySobolKNN` implements the Noise-Adjusted Nearest-Neighbor estimator in Huang and Joseph (2025), which corrects the bias by the Nearest-Neighbor estimator from Broto et al. (2020) when applied on noisy data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.43060419, 0.4407395 , 0.12865631])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.random.seed(43)\n",
    "n = 10000\n",
    "p = 3\n",
    "X = np.random.uniform(size=(n,p))\n",
    "y = np.apply_along_axis(ishigami, 1, X) + np.random.normal(size=n)\n",
    "\n",
    "ShapleySobolKNN(X, y, noise=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more details about `ShapleySobolKNN`, please Huang and Joseph (2025)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "Huang, C., & Joseph, V. R. (2025). Factor Importance Ranking and Selection using Total Indices. Technometrics.\n",
    "\n",
    "Owen, A. B. (2014), “Sobol’indices and Shapley value,” SIAM/ASA Journal on Uncertainty Quantification, 2, 245–251.\n",
    "\n",
    "Song, E., Nelson, B. L., & Staum, J. (2016), “Shapley effects for global sensitivity analysis: Theory and computation,” SIAM/ASA Journal on Uncertainty Quantification, 4, 1060-1083.\n",
    "    \n",
    "Broto, B., Bachoc, F., & Depecker, M. (2020). Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 693-716.\n",
    "\n",
    "Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P.E., Lomeli, M., Hosseini, L., & Jégou, H., (2024). The Faiss library. arXiv preprint arXiv:2401.08281.\n",
    "    \n",
    "Vakayil, A., & Joseph, V. R. (2022). Data twinning. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(5), 598-610."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "3.9.7",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}