Advent of 2022, Day 17 – Building responsible AI dashboard with Python SDK

Posted on December 17, 2022 by tomaztsql in R bloggers | 0 Comments

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers.]

In the series of Azure Machine Learning posts:

Dec 01: What is Azure Machine Learning?
Dec 02: Creating Azure Machine Learning Workspace
Dec 03: Understanding Azure Machine Learning Studio
Dec 04: Getting data to Azure Machine Learning workspace
Dec 05: Creating compute and cluster instances in Azure Machine Learning
Dec 06: Environments in Azure Machine Learning
Dec 07: Introduction to Azure CLI and Python SDK
Dec 08: Python SDK namespaces for workspace, experiments and models
Dec 09: Python SDK namespaces for environment, and pipelines
Dec 10: Connecting to client using Python SDK namespaces
Dec 11: Creating Pipelines with Python SDK
Dec 12: Creating jobs
Dec 13: Automated ML
Dec 14: Registering the models
Dec 15: Getting to know MLflow
Dec 16: MLflow in action with xgboost

Responsible AI is an approach to assessing, developing, and deploying AI systems in a safe, trustworthy, and ethical manner, and to taking responsible decisions and actions (source: the Responsible AI Toolbox, available on Microsoft’s GitHub).

Azure ML provides users with a collection of model and data exploration tools in the Studio user interface. It also offers a compatible solution through the Python package responsibleai. With the help of widgets, we will create a sample dashboard to explore this solution and assess responsible decisions and actions.

Create a new notebook and install all the needed packages. Once you have installed raiwidgets, make sure to restart the kernel!
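Before moving on, it can help to confirm that the packages are actually visible to the restarted kernel. The following is a minimal sketch using only the standard library; the helper name installed_version and the list of package names are my own choice for illustration, not something taken from the original notebook.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Packages used in this post (the list itself is an assumption for illustration).
for name in ("raiwidgets", "responsibleai", "lightgbm", "scikit-learn", "pandas"):
    version = installed_version(name)
    if version is None:
        print(f"{name}: not installed (try: pip install {name})")
    else:
        print(f"{name}: {version}")
```

If any package is reported as missing, install it, restart the kernel and run the check again.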

I am using Python 3.8 with the AzureML kernel, and the installation went without problems.

I am using the Adult dataset, with income as the Y-variable and all other demographic variables as X-variables, to train and test the model.

I will use the following code to split the data and train the model:

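import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


# Helper assumed here (it is not shown in the post); it mirrors the Microsoft demo notebook.
def split_label(dataset, target_feature):
    # Separate the features (X) from the target column (y, kept as a one-column DataFrame).
    X = dataset.drop([target_feature], axis=1)
    y = dataset[[target_feature]]
    return X, y


def create_classification_pipeline(X):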
    pipe_cfg = {
        'num_cols': X.dtypes[X.dtypes == 'int64'].index.values.tolist(),
        'cat_cols': X.dtypes[X.dtypes == 'object'].index.values.tolist(),
    }
    num_pipe = Pipeline([
        ('num_imputer', SimpleImputer(strategy='median')),
        ('num_scaler', StandardScaler())
    ])
    cat_pipe = Pipeline([
        ('cat_imputer', SimpleImputer(strategy='constant', fill_value='?')),
        # Note: scikit-learn 1.2+ renames `sparse` to `sparse_output`.
        ('cat_encoder', OneHotEncoder(handle_unknown='ignore', sparse=False))
    ])
    feat_pipe = ColumnTransformer([
        ('num_pipe', num_pipe, pipe_cfg['num_cols']),
        ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])
    ])

    # Append classifier to preprocessing pipeline.
    # Now we have a full prediction pipeline.
    pipeline = Pipeline(steps=[('preprocessor', feat_pipe),
                               ('model', LGBMClassifier(random_state=0))])

    return pipeline

target_feature = 'income'
categorical_features = ['workclass', 'education', 'marital-status',
                        'occupation', 'relationship', 'race', 'gender', 'native-country']


train_data = pd.read_csv('adult-train.csv', skipinitialspace=True)
test_data = pd.read_csv('adult-test.csv', skipinitialspace=True)

X_train_original, y_train = split_label(train_data, target_feature)
X_test_original, y_test = split_label(test_data, target_feature)

pipeline = create_classification_pipeline(X_train_original)

y_train = y_train[target_feature].to_numpy()
y_test = y_test[target_feature].to_numpy()

test_data_sample = test_data.sample(n=500, random_state=5)
model = pipeline.fit(X_train_original, y_train)

We train this classification model with lightgbm's LGBMClassifier and keep a random sample of 500 rows from the test data (test_data_sample).

Once the classification is completed, we can start creating a dashboard.

Fig 4: Code for creating responsible dashboard
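Since the figure is an image, here is a minimal sketch of what such dashboard code can look like with the responsibleai and raiwidgets packages. It follows the pattern of the Microsoft demo notebook and is not necessarily identical to the code in Fig 4. The wrapper function build_rai_dashboard is a hypothetical name introduced for illustration; the arguments are the objects created in the training code above.

```python
def build_rai_dashboard(model, train_data, test_data_sample,
                        target_feature, categorical_features):
    """Compute Responsible AI insights and launch the dashboard widget."""
    # Imported lazily so the function can be defined even before
    # raiwidgets and responsibleai are installed.
    from raiwidgets import ResponsibleAIDashboard
    from responsibleai import RAIInsights

    # Both data frames must still contain the target column.
    rai_insights = RAIInsights(
        model, train_data, test_data_sample, target_feature,
        'classification', categorical_features=categorical_features)

    # Register the analyses to compute.
    rai_insights.explainer.add()       # feature importances
    rai_insights.error_analysis.add()  # decision tree and heat map views
    rai_insights.compute()             # run everything registered above

    # Starts the dashboard service for these insights.
    return ResponsibleAIDashboard(rai_insights)
```

In the notebook this would be called as build_rai_dashboard(model, train_data, test_data_sample, target_feature, categorical_features). Using the 500-row sample rather than the full test set keeps the computation short.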

Once completed, you can run the dashboard. It will be delivered at a separate URL, so just copy and paste it into a new window:

https://amlblog2022-ds12-v2-5000.germanywestcentral.instances.azureml.ms/ (URL will not work for you).

Interpretation

Now that we have the error analysis in front of us, we can immediately see that 24.11% of the error rate is contributed by respondents with marital-status == Married-civ-spouse.

Error Analysis identifies cohorts of data with a higher error rate than the overall benchmark. These discrepancies might occur when the system or model underperforms for specific demographic groups or for input conditions that are infrequently observed in the training data. There are two views: decision tree and heat map. The decision tree helps you discover cohorts with higher error rates across multiple features using a binary tree visualisation.

Fig 6: Error analysis with Decision tree
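To make the idea of a cohort error rate concrete, here is a small sketch in plain Python. It is not how the dashboard computes its tree; it only shows the two quantities involved for a single feature: the error rate inside a cohort and the share of all errors that fall into that cohort. The function name cohort_error_stats and the toy data are made up for illustration.

```python
from collections import defaultdict


def cohort_error_stats(feature_values, y_true, y_pred):
    """Per cohort: size, error rate (errors / cohort size) and
    error coverage (cohort errors / all errors)."""
    sizes = defaultdict(int)
    errors = defaultdict(int)
    for value, truth, pred in zip(feature_values, y_true, y_pred):
        sizes[value] += 1
        if truth != pred:
            errors[value] += 1
    total_errors = sum(errors.values())
    return {
        value: {
            "size": size,
            "error_rate": errors[value] / size,
            "error_coverage": errors[value] / total_errors if total_errors else 0.0,
        }
        for value, size in sizes.items()
    }


# Toy data (invented for this example, not taken from the Adult dataset).
marital = ["Married-civ-spouse", "Never-married", "Married-civ-spouse", "Divorced",
           "Married-civ-spouse", "Never-married", "Married-civ-spouse", "Divorced"]
y_true = [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"]
y_pred = [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"]

stats = cohort_error_stats(marital, y_true, y_pred)
for cohort, s in stats.items():
    print(f"{cohort:<20} n={s['size']}  error rate={s['error_rate']:.0%}  "
          f"error coverage={s['error_coverage']:.0%}")
```

A cohort is interesting when its error rate is clearly above the overall error rate and it also covers a large share of the errors.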

Once you decide which node you want to focus on, you can switch to the heat map to investigate further and explain which features cause discrepancies in the model:

Fig 7: Using heatmap to better understand cohorts
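The heat map view boils down to crossing two features and computing the error rate in every cell. A minimal sketch of that calculation, again in plain Python with invented toy records; error_heatmap is a hypothetical helper, not part of raiwidgets.

```python
def error_heatmap(records, row_feature, col_feature, target, prediction):
    """Return {(row_value, col_value): (errors, count)} for a list of dict records."""
    grid = {}
    for record in records:
        key = (record[row_feature], record[col_feature])
        errors, count = grid.get(key, (0, 0))
        is_error = int(record[target] != record[prediction])
        grid[key] = (errors + is_error, count + 1)
    return grid


# Toy records (invented for this example).
records = [
    {"marital-status": "Married-civ-spouse", "gender": "Male", "income": ">50K", "predicted": "<=50K"},
    {"marital-status": "Married-civ-spouse", "gender": "Male", "income": ">50K", "predicted": ">50K"},
    {"marital-status": "Married-civ-spouse", "gender": "Female", "income": "<=50K", "predicted": ">50K"},
    {"marital-status": "Never-married", "gender": "Male", "income": "<=50K", "predicted": "<=50K"},
    {"marital-status": "Never-married", "gender": "Female", "income": "<=50K", "predicted": "<=50K"},
    {"marital-status": "Never-married", "gender": "Female", "income": ">50K", "predicted": "<=50K"},
]

grid = error_heatmap(records, "marital-status", "gender", "income", "predicted")

# Print the grid as a small text table of error rates.
row_values = sorted({row for row, _ in grid})
col_values = sorted({col for _, col in grid})
print(" " * 20 + "".join(f"{col:>10}" for col in col_values))
for row in row_values:
    cells = []
    for col in col_values:
        errors, count = grid.get((row, col), (0, 0))
        cells.append(f"{errors / count:>10.0%}" if count else f"{'-':>10}")
    print(f"{row:<20}" + "".join(cells))
```

Cells with a high error rate and enough rows behind them are the ones worth turning into a cohort for further inspection.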

In addition, you can examine the accuracy of the model by cohort and also check the probability distributions.

You can also analyse the feature importances by cohort and by given variables.

This notebook was prepared on the basis of a demo use case from Microsoft’s GitHub notebook.

The complete set of code, documents, notebooks, and all of the materials will be available at the GitHub repository: https://github.com/tomaztk/Azure-Machine-Learning

Happy Advent of 2022!
