Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
[SIGIR 2022] Source code and datasets for "Bias Mitigation for Evidence-aware Fake News Detection by Causal Intervention".
Demystifying Verbatim Memorization in Large Language Models
[EMNLP 2023] A Causal View of Entity Bias in (Large) Language Models
A framework for evaluating auto-interp pipelines, i.e., natural language explanations of neurons.