explainableai: making ML models explain themselves in plain english

how i built a python package used by 5,000+ developers that turns black-box models into transparent, auditable systems

october 2024. i'm sitting in a college lab watching a classmate present their ML project. the model works. 94% accuracy. but when the professor asks "why does it predict this?" the room goes silent.

that moment kept coming back to me.

not because it was unusual. because it happens every single time. someone builds a model, it performs well on the test set, and then the first non-technical person in the room asks the obvious question: why?

nobody has a good answer. and that's a problem.

the problem

ML models are opaque by default. a random forest with 500 trees doesn't explain itself. a neural network with 12 layers doesn't tell you which input mattered most. they just output a number.

data scientists can't explain WHY predictions happen. stakeholders don't trust what they can't understand. regulators increasingly require explanations (GDPR Article 22, the EU AI Act).

tools exist. SHAP exists. LIME exists. permutation importance exists. but they're fragmented. you need SHAP for feature importance. LIME for local explanations. a separate fairness library for bias audits. another tool for PDF reports. and then you still have to translate all of that into language a product manager or a CEO can understand.

no single package does all four: interpret, visualize, report, and explain in natural language.

so i built one.

the approach

one class. XAIWrapper. you fit your model once, and you get everything: SHAP values, LIME explanations, permutation importance, partial dependence plots, fairness audit, and natural language explanations via LLM.

the design principle was simple. a data scientist should be able to go from "i have a trained model" to "here's a PDF report my VP can read" in five lines of code or fewer.

not 5 libraries. not 200 lines of boilerplate. five lines.

how it works

preprocessing pipeline

handles mixed numerical and categorical data automatically. detects column types, applies appropriate transformations, and keeps track of feature names through the entire pipeline. you don't have to manually encode your categoricals or normalize your numericals. the wrapper handles it.

model selection

compares 4 models (logistic regression, random forest, gradient boosting, XGBoost) via 5-fold cross-validation. picks the best performer automatically. you can also bring your own pre-trained model and skip this step entirely.
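the comparison step can be sketched with scikit-learn's cross_val_score. this is an illustration under assumptions, not the wrapper's actual code: the synthetic dataset, the candidate names, and accuracy as the scoring metric are my choices here, and XGBoost is left out so the example only needs scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic data stands in for a real dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# mean accuracy over 5 folds for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# keep the best performer and refit it on the full data
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
print(best_name, round(scores[best_name], 3))
```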

SHAP + LIME integration

TreeExplainer for tree-based models. KernelExplainer for everything else. LIME runs alongside SHAP for local explanations. the wrapper selects the right explainer based on model type, so you never have to think about which SHAP backend to use.
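one way to do that selection, as a rough sketch. the function names, the class-name lookup, and the list of tree models are assumptions for illustration; the package may detect model types differently. shap.TreeExplainer and shap.KernelExplainer are the real SHAP classes, and shap is imported lazily so the selection logic runs even without it installed.

```python
# class names treated as tree-based (an illustrative, non-exhaustive list)
TREE_MODEL_NAMES = {
    "DecisionTreeClassifier",
    "RandomForestClassifier",
    "GradientBoostingClassifier",
    "XGBClassifier",
}


def pick_explainer_kind(model) -> str:
    """Return "tree" for known tree-based models, "kernel" for everything else."""
    return "tree" if type(model).__name__ in TREE_MODEL_NAMES else "kernel"


def build_explainer(model, background):
    """Build the matching SHAP explainer for a fitted classifier.

    `background` is a small sample of training rows; KernelExplainer
    needs it as a reference distribution.
    """
    import shap  # lazy import: only needed when an explainer is actually built

    if pick_explainer_kind(model) == "tree":
        return shap.TreeExplainer(model)
    return shap.KernelExplainer(model.predict_proba, background)
```

the split matters for speed: TreeExplainer exploits the tree structure, while KernelExplainer is model-agnostic and much slower, so it should only be the fallback.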

fairness module

demographic parity, equal opportunity, disparate impact. all computed without external fairness packages. you specify the sensitive attribute, and the module tells you if your model is biased, by how much, and which subgroups are affected.
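the three metrics are simple enough to compute in plain python, which is why no external fairness package is needed. a minimal sketch, assuming binary labels and predictions and a single privileged group; fairness_report and its argument names are hypothetical, and the data is made up.

```python
def rate(values):
    """Share of positive (1) entries; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def fairness_report(y_true, y_pred, sensitive, privileged):
    """Compare the privileged group against everyone else (hypothetical helper)."""
    rows = list(zip(y_true, y_pred, sensitive))

    # selection rate: how often each group receives a positive prediction
    priv_rate = rate([p for _, p, s in rows if s == privileged])
    unpriv_rate = rate([p for _, p, s in rows if s != privileged])

    # true positive rate: positive predictions among rows that are truly positive
    priv_tpr = rate([p for t, p, s in rows if s == privileged and t == 1])
    unpriv_tpr = rate([p for t, p, s in rows if s != privileged and t == 1])

    return {
        "demographic_parity_diff": unpriv_rate - priv_rate,
        "equal_opportunity_diff": unpriv_tpr - priv_tpr,
        "disparate_impact": unpriv_rate / priv_rate if priv_rate else float("nan"),
    }


y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = fairness_report(y_true, y_pred, group, privileged="a")
print(report)
```

in this toy data, group "a" gets a positive prediction 75% of the time and group "b" only 25%, so the parity difference is -0.5 and the disparate impact ratio is about 0.33.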

LLM-as-translator

this is the piece that makes the whole thing click. takes technical metrics (SHAP values, accuracy scores, fairness ratios) and produces human-readable explanations via Gemini. not summaries. actual explanations. "the model predicts this customer will churn primarily because their usage dropped 40% in the last 30 days and they haven't opened a support ticket, which historically correlates with silent departure."

that sentence is what a VP needs. not a waterfall plot.
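the translation step starts with turning numbers into a prompt. a sketch of that half only, under assumptions: build_explanation_prompt, the prompt wording, and the SHAP values below are all invented for illustration, and the actual call to Gemini is left out.

```python
def build_explanation_prompt(prediction, shap_contributions, top_k=3):
    """Turn the top-k SHAP contributions into a prompt for an LLM (hypothetical helper)."""
    # rank features by absolute contribution, largest first
    ranked = sorted(
        shap_contributions.items(), key=lambda kv: abs(kv[1]), reverse=True
    )[:top_k]
    lines = [
        f"- {name}: {'raises' if value > 0 else 'lowers'} the prediction (SHAP {value:+.3f})"
        for name, value in ranked
    ]
    return (
        "Explain this model prediction to a non-technical stakeholder in two sentences.\n"
        f"Prediction: {prediction}\n"
        "Most influential features:\n" + "\n".join(lines)
    )


# made-up SHAP values for a single churn prediction
prompt = build_explanation_prompt(
    "customer will churn",
    {
        "usage_change_30d": 0.41,
        "support_tickets": 0.18,
        "tenure_months": -0.07,
        "plan_tier": 0.02,
    },
)
print(prompt)
```

limiting the prompt to the top few features keeps the LLM anchored to what the model actually used, instead of letting it narrate every column.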

one-line PDF report

all visualizations + all explanations + fairness audit + model comparison results. one PDF. one function call. ready to attach to an email or drop in a Confluence page.

what makes this different

all-in-one orchestration

no tool-chaining. no "install this for SHAP, that for LIME, another thing for fairness." one pip install. one import. one class.

LLM-powered explanations

bridges the communication gap between data scientists and stakeholders. the technical analysis still happens (SHAP, LIME, permutation importance). but the output is bilingual: technical artifacts for the data team, natural language for everyone else.

built-in fairness audit

without external packages. fairness shouldn't be an afterthought you bolt on. it's part of the explanation. "your model is accurate, but it's 23% less accurate for this subgroup" is an explanation the same way "feature X matters most" is.

batch processing + Dask support

large datasets don't break the pipeline. Dask parallelization for datasets that don't fit in memory. batch SHAP computation for production workloads.
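the batching idea, sketched in plain python. the names iter_batches and explain_in_batches are hypothetical, and the stand-in explain function just sums each row; in a real pipeline it would compute SHAP values for the batch, and each batch could be handed to a Dask worker instead of processed in a loop.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def explain_in_batches(
    rows: Sequence,
    explain_batch: Callable[[Sequence], List],
    batch_size: int = 1000,
) -> List:
    """Run explain_batch on one slice at a time and stitch the results together."""
    results: List = []
    for batch in iter_batches(rows, batch_size):
        results.extend(explain_batch(batch))
    return results


rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# stand-in for a per-batch SHAP computation: one number per row
fake_explain = lambda batch: [sum(row) for row in batch]

results = explain_in_batches(rows, fake_explain, batch_size=2)
print(results)
```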

automated PDF reports

ReportLab generates publication-quality PDFs with embedded charts, tables, and LLM-generated narrative. the report tells a story instead of just dumping numbers.

traction

5,000+ developers actively using it. 16,000+ downloads on PyPI. accepted to GSSoC 2024 and Hacktoberfest 2024. 20+ community contributors. 185 commits across the repository.

co-built with mihir amin, palak boricha, and sairaj bokand. open source from day one.

the growth was organic. no marketing budget. no product hunt launch. developers found it because they were searching for exactly this: "explain ML model python" or "SHAP + LIME together." the package shows up because it solves a real, searchable problem.

tech stack

core: Python, scikit-learn
explainability: SHAP, LIME, permutation importance
models: XGBoost, Random Forest, Gradient Boosting, Logistic Regression
LLM: Google Gemini (natural language explanations)
reporting: ReportLab (PDF generation)
visualization: Matplotlib, Seaborn, Plotly
scale: Dask (parallel processing for large datasets)

what i learned

the hard part isn't the ML. it's the translation.

turning a SHAP waterfall plot into a sentence a CEO understands. that's where LLMs actually shine. not generating code. not writing essays. generating explanations of technical systems for non-technical humans.

the other thing i learned: developer tools grow when they solve a problem people are already googling. i didn't have to convince anyone they needed explainability. GDPR did that. the EU AI Act did that. the professor asking "why does it predict this?" did that. i just had to make the solution easy enough that people would actually use it.

five lines of code. that was the bar. and it worked.
