CCS on compound sentences (C3S)

Repository for the publication on an unsupervised method for lie detection.

Finding internal knowledge representations inside transformer models without supervision is a challenging task, and it matters both for scalable oversight and for mitigating the risk of deception. We test Contrast-Consistent Search (CCS) on compound sentences built from the TruthfulQA dataset: conjunctions and disjunctions, each composed of several answers to a question. The aim is to see whether unsupervised probes work as well on compound sentences as on the simple statements they consist of, with the goal of improving unsupervised methods for discovering latent knowledge.

See https://www.lesswrong.com/posts/Lgvw4rFsGcXoyYZbw/ccs-on-compound-sentences
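
As a rough illustration of the setup described above, the following minimal sketch shows how a compound sentence and its truth label could be derived from individual answers, together with the standard CCS objective (a consistency term plus a confidence term). The names `Answer`, `make_compound`, and `ccs_loss` and the example answers are hypothetical and chosen for illustration only; the prompt templates and training code in this repository may differ.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Answer:
    """Hypothetical container for one answer to a question."""
    text: str      # a single answer to the question
    is_true: bool  # ground-truth label, needed only for evaluation


def make_compound(answers: Sequence[Answer], op: str) -> Tuple[str, bool]:
    """Join answers into one compound sentence and derive its truth label.

    A conjunction ("and") is true only if every part is true;
    a disjunction ("or") is true if at least one part is true.
    """
    if op == "and":
        label = all(a.is_true for a in answers)
    elif op == "or":
        label = any(a.is_true for a in answers)
    else:
        raise ValueError(f"unknown connective: {op!r}")
    # Strip trailing periods so the joined sentence ends with a single one.
    parts = [a.text.rstrip(".") for a in answers]
    return f" {op} ".join(parts) + ".", label


def ccs_loss(p_pos: Sequence[float], p_neg: Sequence[float]) -> float:
    """CCS objective averaged over contrast pairs.

    p_pos[i] is the probe output for statement i framed as true,
    p_neg[i] the output for the same statement framed as false.
    """
    total = 0.0
    for pp, pn in zip(p_pos, p_neg):
        consistency = (pp - (1.0 - pn)) ** 2  # p(x+) should equal 1 - p(x-)
        confidence = min(pp, pn) ** 2         # penalize the degenerate 0.5/0.5 solution
        total += consistency + confidence
    return total / len(p_pos)


# Illustrative answers, not taken from TruthfulQA.
answers = [
    Answer("Water boils at 100 degrees Celsius at sea level.", True),
    Answer("Water boils at 50 degrees Celsius at sea level.", False),
]
conj_text, conj_label = make_compound(answers, "and")
disj_text, disj_label = make_compound(answers, "or")
```

Here the conjunction is labeled false and the disjunction true, while `ccs_loss` is zero for a perfectly consistent and confident probe (for example `p_pos=[1.0]`, `p_neg=[0.0]`). The labels are used only to evaluate the probe; the CCS objective itself never sees them.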

Folders and files:

.ipynb_checkpoints
.vscode
conf
data
lib
utils
.gitattributes
.gitignore
.gitmodules
C3S.py
CCS.py
LICENSE
LogisticRegression.py
README.md
__init__.py
artifacts.tbz
environment-cpu.yml
environment-cuda.yml
environment_2024-04-29.yml
eval.py
launch.batch
model.py
report.py
requirements.txt
run.sh
train.py
train_eval.py
truthful_qa_ds.py

artkpv/C3S
