Machine Learning issues

Environment setup

# Install LIME
!pip install lime
Requirement already satisfied: lime in /Users/baptiste/miniconda3/envs/tf2/lib/python3.7/site-packages (0.2.0.1)
import platform

print(f"Python version: {platform.python_version()}")
assert platform.python_version_tuple() >= ("3", "6")

import numpy as np
import pandas as pd

print(f"NumPy version: {np.__version__}")

from IPython.display import YouTubeVideo
Python version: 3.7.5
NumPy version: 1.18.1
import sklearn

print(f"scikit-learn version: {sklearn.__version__}")
assert sklearn.__version__ >= "0.20"

# Explicitly import the scikit-learn submodules used below
import sklearn.datasets
import sklearn.ensemble
import sklearn.metrics
import sklearn.model_selection

import lime
import lime.lime_tabular
scikit-learn version: 0.22.1

Explainability and interpretability

A growing need

  • ML-based systems are increasingly used in a wide variety of scenarios:

    • High-stakes decisions (examples: autonomous driving, grid monitoring, cyber-warfare…)

    • Direct impact on people’s lives (examples: medical diagnosis, loan approval, judicial decisions…).

  • In some contexts, being able to understand and justify the system’s decision is critical for its acceptance.

White box models

Some ML models, like this decision tree, are explainable by design.

Decision Tree example
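
As a minimal illustration (not part of the original code), the sketch below trains a shallow decision tree on the Iris dataset used later in this notebook and prints its learned rules with scikit-learn's export_text helper: the split thresholds can be read directly as the model's decision logic.

# Minimal sketch: a shallow decision tree whose decision logic can be read directly
import sklearn.datasets
import sklearn.tree

iris = sklearn.datasets.load_iris()
tree_model = sklearn.tree.DecisionTreeClassifier(max_depth=2, random_state=0)
tree_model.fit(iris.data, iris.target)

# Each split is a human-readable threshold on a single feature
print(sklearn.tree.export_text(tree_model, feature_names=iris.feature_names))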

Black box models

Most ML models act as black boxes. For example, a neural network performs a series of non-linear transformations on its inputs to compute its result. The deeper the network, the less its decision process is intelligible to humans.

Dog or cat?

Explainability vs. interpretability

These two terms are often used interchangeably in the scientific community.

The following distinction can nonetheless be useful:

  • Interpretability is about discerning the internal mechanics of a model, i.e. understanding how it works.

  • Explainability is about justifying the model’s decision, i.e. understanding why it was made.

A possible taxonomy

XAI taxonomy

Explanation methods

Explanations can:

  • take various forms: textual, visual, symbolic.

  • be global, i.e. characterise the model’s behaviour on the whole dataset, or local, i.e. explain an individual classification or regression outcome.

  • be model-specific, i.e. capable of explaining only a restricted class of models, or model-agnostic, i.e. applicable to an arbitrary model.

Explanation examples

“This person hasn’t been approved for a loan because her financial behavior has been questionable for 10 years”.

Visual explainability example

An attribution method: LIME

LIME (Local Interpretable Model-agnostic Explanations) aims to explain the rationale behind a model’s predictions, in order to help users decide when to trust or not to trust these predictions.

LIME objective

How LIME works

LIME trains a local linear approximation of the model’s behaviour. The linear coefficients represent the contributions of the features to the prediction.

In the following picture, the linear model (dashed line) is a good approximation in the neighborhood of the explained instance (bright red cross).

LIME in action
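
To make this concrete, here is a toy sketch of a local linear surrogate (the core idea behind LIME, not LIME's actual implementation): a hand-made black-box function is sampled around the instance to explain, the samples are weighted by their proximity to that instance, and a weighted linear model is fitted; its coefficients play the role of local feature contributions.

# Toy sketch of a local linear surrogate, the core idea behind LIME
import numpy as np
import sklearn.linear_model

def black_box(x):
    # A non-linear "model" of 2 features that we want to explain locally
    return np.sin(3 * x[:, 0]) + x[:, 1] ** 2

x0 = np.array([0.5, 1.0])  # instance to explain

# Sample perturbations around x0 and query the black box
rng = np.random.default_rng(0)
perturbations = x0 + rng.normal(scale=0.3, size=(500, 2))
predictions = black_box(perturbations)

# Weight each perturbed sample by its proximity to x0 (RBF kernel)
distances = np.linalg.norm(perturbations - x0, axis=1)
proximity = np.exp(-(distances ** 2) / 0.25)

# Fit a weighted linear model: its coefficients approximate the local contributions
surrogate = sklearn.linear_model.Ridge(alpha=1.0)
surrogate.fit(perturbations - x0, predictions, sample_weight=proximity)
print(f"Local feature contributions: {surrogate.coef_}")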

LIME for images (1/2)

When explaining classification results for an image, LIME starts by dividing it into interpretable components called superpixels (contiguous patches of similar pixels).

Image division into superpixels
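
As a sketch of this first step (using a sample image from scikit-image rather than the picture above), the quickshift algorithm, which LIME's image explainer uses by default, splits an image into superpixels; the parameter values below are merely illustrative.

# Sketch: divide a sample image into superpixels with quickshift
import matplotlib.pyplot as plt
import skimage.data
import skimage.segmentation

image = skimage.data.astronaut()  # any RGB image would do
segments = skimage.segmentation.quickshift(image, kernel_size=4, max_dist=200, ratio=0.2)
print(f"Number of superpixels: {segments.max() + 1}")

# Overlay the superpixel boundaries on the original image
plt.imshow(skimage.segmentation.mark_boundaries(image, segments))
plt.axis("off")
plt.show()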

LIME for images (2/2)

A dataset of “perturbed” instances (images for which some of the interpretable components are grayed out) and the corresponding model probabilities is used to train a simple (linear) model. The superpixels with the highest positive weights are presented as an explanation.

LIME process with images
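
The sketch below puts the whole image pipeline together with LIME's LimeImageExplainer; the classify_fn used here is a dummy stand-in (it scores images on their mean red-channel intensity) for a real image classifier such as a convolutional network.

# Sketch of the full LIME image pipeline on a sample image, using a dummy
# classifier as a stand-in for a real model
import numpy as np
import matplotlib.pyplot as plt
import skimage.data
import skimage.segmentation
import lime.lime_image

image = skimage.data.astronaut()

def classify_fn(images):
    # Dummy "model": scores each image on its mean red-channel intensity
    scores = images[..., 0].reshape(len(images), -1).mean(axis=1) / 255.0
    return np.column_stack([1.0 - scores, scores])

image_explainer = lime.lime_image.LimeImageExplainer()
explanation = image_explainer.explain_instance(
    image,
    classify_fn,       # black-box prediction function (batch of images -> probabilities)
    top_labels=1,      # explain only the most probable class
    hide_color=0,      # "perturbed" superpixels are replaced by black pixels
    num_samples=1000,  # number of perturbed images used to fit the linear model
)

# Keep only the superpixels with the highest positive weights for the top class
top_class = explanation.top_labels[0]
masked_image, mask = explanation.get_image_and_mask(
    top_class, positive_only=True, num_features=5, hide_rest=True
)
plt.imshow(skimage.segmentation.mark_boundaries(masked_image, mask))
plt.axis("off")
plt.show()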

LIME in action

# Load the Iris dataset (https://archive.ics.uci.edu/ml/datasets/iris)
iris = sklearn.datasets.load_iris()

# Create train/test sets
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    iris.data, iris.target, test_size=0.20
)

print(f"x_train: {x_train.shape}. y_train: {y_train.shape}")
print(f"x_test: {x_test.shape}. y_test: {y_test.shape}")
x_train: (120, 4). y_train: (120,)
x_test: (30, 4). y_test: (30,)
# Put it into a DataFrame for visualization purposes
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add target and class columns to DataFrame
df_iris["target"] = iris.target
df_iris["class"] = iris.target_names[iris.target]

# Show 8 random samples
df_iris.sample(n=8)
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target       class
70                 5.9               3.2                4.8               1.8       1  versicolor
129                7.2               3.0                5.8               1.6       2   virginica
120                6.9               3.2                5.7               2.3       2   virginica
18                 5.7               3.8                1.7               0.3       0      setosa
66                 5.6               3.0                4.5               1.5       1  versicolor
58                 6.6               2.9                4.6               1.3       1  versicolor
46                 5.1               3.8                1.6               0.2       0      setosa
29                 4.7               3.2                1.6               0.2       0      setosa
# Train a Random Forest on the dataset
rf_model = sklearn.ensemble.RandomForestClassifier(n_estimators=50)
rf_model.fit(x_train, y_train)

# Compute accuracy on test data
test_acc = sklearn.metrics.accuracy_score(y_test, rf_model.predict(x_test))
print(f"Test accuracy: {test_acc:.5f}")
Test accuracy: 0.93333
# Create the LIME explainer
# Continuous features are discretized into quartiles for more intuitive explanations
explainer = lime.lime_tabular.LimeTabularExplainer(
    x_train,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    discretize_continuous=True,
)
# Select a random test sample
index = np.random.randint(0, x_test.shape[0])
sample = x_test[index]

# Explain this sample
exp = explainer.explain_instance(sample, rf_model.predict_proba, num_features=2, top_labels=1)
exp.show_in_notebook(show_all=False)
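
The same explanation can also be read programmatically; as a small addition, the snippet below uses LIME's available_labels and as_list accessors to print the (feature, weight) pairs for the explained class.

# Print the explanation as (feature, weight) pairs for the explained class
predicted_class = exp.available_labels()[0]
print(f"Explained class: {iris.target_names[predicted_class]}")
for feature, weight in exp.as_list(label=predicted_class):
    print(f"{feature}: {weight:+.3f}")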