Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning

Reaction_Data_Cleaning

This repository contains best practices for curating chemical reaction data.

In the 'scripts' package you will find:

  • the standardization protocol;
  • a script that runs the standardization protocol in parallel, processing the data in chunks;
  • a script for comparing atom-to-atom mappers.
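
The idea behind the parallel mode can be illustrated with a minimal sketch. It is not the repository's script: the chunk size, the worker count, and `standardize_reaction` are hypothetical placeholders, and the real protocol parses and standardizes each reaction instead of stripping whitespace. The sketch only shows the general pattern of splitting records into fixed-size chunks and handing each chunk to a separate worker process.

```python
"""Hypothetical sketch of chunked parallel processing (not the repository's script)."""
from itertools import islice
from multiprocessing import Pool
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def standardize_reaction(record: str) -> str:
    # Placeholder for the real standardization protocol.
    return record.strip()


def process_chunk(chunk: List[str]) -> List[str]:
    """Standardize every record of one chunk (runs inside a worker process)."""
    return [standardize_reaction(record) for record in chunk]


def run_parallel(records: Iterable[str], chunk_size: int = 1000, workers: int = 4) -> List[str]:
    """Split records into chunks and process the chunks in parallel."""
    with Pool(processes=workers) as pool:
        # Pool.map returns per-chunk results in the same order as the chunks.
        results = pool.map(process_chunk, chunked(records, chunk_size))
    # Flatten the per-chunk results back into a single list.
    return [record for chunk in results for record in chunk]


if __name__ == "__main__":
    print(run_parallel([" A>>B ", "C>>D", " E>>F"], chunk_size=2, workers=2))
    # prints ['A>>B', 'C>>D', 'E>>F']
```

Because `Pool.map` preserves the order of its input, the flattened output lines up record by record with the input, which keeps the results easy to match back to their reaction IDs.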

In the 'data' directory you will find:

  • a zip archive of our golden dataset, curated and atom-mapped manually;
  • the USPTO dataset, curated with the standardization protocol and atom-mapped with RXNMapper.
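
Because the golden dataset was mapped manually, it can serve as a reference for judging a mapper's output. The following is a naive comparison written under stated assumptions and is not the repository's mapper-comparison script. Each mapped reaction is represented as a hypothetical pair of dictionaries (atom index to atom-map number, one for the reactant side and one for the product side). Two mappings count as identical when they pair the same reactant atoms with the same product atoms, regardless of the actual map numbers. Symmetry-equivalent atoms are not taken into account, so this check is stricter than a chemically aware comparison.

```python
"""Naive atom-mapping comparison (hypothetical helper, ignores atom symmetry)."""
from typing import Dict, Mapping, Sequence, Tuple

AtomMap = Mapping[int, int]              # atom index -> atom-map number (0 = unmapped)
MappedReaction = Tuple[AtomMap, AtomMap]  # (reactant side, product side)


def correspondence(reactant_maps: AtomMap, product_maps: AtomMap) -> Dict[int, int]:
    """Return {reactant atom index: product atom index} implied by shared map numbers."""
    product_by_number = {num: idx for idx, num in product_maps.items() if num}
    return {
        r_idx: product_by_number[num]
        for r_idx, num in reactant_maps.items()
        if num and num in product_by_number  # skip unmapped and one-sided atoms
    }


def same_mapping(a: MappedReaction, b: MappedReaction) -> bool:
    """True if both mappings pair the same reactant atoms with the same product atoms."""
    return correspondence(*a) == correspondence(*b)


def agreement_rate(reference: Sequence[MappedReaction],
                   predicted: Sequence[MappedReaction]) -> float:
    """Fraction of reactions whose predicted mapping matches the reference mapping."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        return 0.0
    matches = sum(same_mapping(r, p) for r, p in zip(reference, predicted))
    return matches / len(reference)
```

For example, a reference mapping ({0: 1, 1: 2, 2: 3}, {0: 2, 1: 1, 2: 3}) and a predicted mapping ({0: 7, 1: 5, 2: 9}, {0: 5, 1: 7, 2: 9}) use different map numbers but pair the atoms identically, so `same_mapping` returns True.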

Recommended way to run standardizer.py:

python3.7 standardizer.py -i ../data/golden_dataset.rdf -o ../data/golden_dataset_out.rdf -id Reaction_ID --logFile ../data/golden_dataset.log --skip_tautomerize --keep_unbalanced_ions
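
To standardize several RDF files in the same way, the recommended command can be wrapped in a small Python helper. This is a sketch that assumes the following: it is run from the 'scripts' directory, as the relative ../data paths in the command imply; `sys.executable` is the Python 3.7 interpreter with the dependencies installed; and output and log files follow the `_out.rdf` / `.log` naming used in the example. `build_command`, `output_paths`, and `standardize_all` are hypothetical helper names. The flags are copied verbatim from the command above, and nothing further is assumed about their meaning.

```python
"""Hypothetical batch wrapper around the recommended standardizer.py command."""
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def build_command(rdf_in: Path, rdf_out: Path, log_file: Path,
                  id_field: str = "Reaction_ID") -> List[str]:
    """Return the recommended standardizer.py invocation as an argument list."""
    return [
        sys.executable, "standardizer.py",
        "-i", str(rdf_in),
        "-o", str(rdf_out),
        "-id", id_field,  # same value as in the recommended command
        "--logFile", str(log_file),
        "--skip_tautomerize",
        "--keep_unbalanced_ions",
    ]


def output_paths(rdf_in: Path) -> Tuple[Path, Path]:
    """Derive output and log paths, e.g. x.rdf -> x_out.rdf and x.log."""
    return rdf_in.with_name(rdf_in.stem + "_out.rdf"), rdf_in.with_suffix(".log")


def standardize_all(data_dir: Path) -> None:
    """Run standardizer.py on every input RDF file in data_dir, one after another."""
    for rdf_in in sorted(data_dir.glob("*.rdf")):
        if rdf_in.stem.endswith("_out"):
            continue  # skip files produced by a previous run
        rdf_out, log_file = output_paths(rdf_in)
        # check=True raises CalledProcessError if the standardizer exits with an error
        subprocess.run(build_command(rdf_in, rdf_out, log_file), check=True)
```

Calling standardize_all(Path("../data")) from the 'scripts' directory would process every .rdf file in ../data (for example golden_dataset.rdf, once extracted from the archive) with the same options as the recommended command.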

Corresponding Authors:

Alexandre Varnek (varnek@unistra.fr)
Timur Madzhidov (tmadzhidov@gmail.com)

Contributors:

Arkadii Lin (arkadiyl18@gmail.com)
Ramil Nugmanov (nougmanoff@hotmail.com)
Natalia Dyubankova (NDyubank@its.jnj.com)
Jonas Verhoeven (jverhoe9@its.jnj.com)
Timur Madzhidov (tmadzhidov@gmail.com)
Alexandre Varnek (varnek@unistra.fr)
Joerg Wegner (jwegner@its.jnj.com)

Copyright:

Copyright 2020, MaDeSmart (Machine Design of Small Molecules by AI), VLAIO project HBC.2018.2287

Credits:

Kazan Federal University, Russia
University of Strasbourg, France
University of Linz, Austria
University of Leuven, Belgium
Janssen Pharmaceutica N.V., Beerse, Belgium
Rail Suleymanov, Arcadia, St. Petersburg, Russia

Reference

Please cite the following paper if you use the data or the scripts:

Lin, Arkadii; Dyubankova, Natalia; Madzhidov, Timur; Nugmanov, Ramil; Rakhimbekova, Assima; Ibragimova, Zarina; Akhmetshin, Tagir; Gimadiev, Timur; Suleymanov, Rail; Verhoeven, Jonas; Wegner, Jörg Kurt; Ceulemans, Hugo; Varnek, Alexandre (2020): Atom-to-Atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.13012679.v1

Dependencies

  • python: 3.7
  • CGRtools: 4.0.36
  • ordered-set: 4.0.2
  • pyjnius: 1.3.0
  • JChemSuite package from ChemAxon: 19.9.0
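
Before running the scripts, it may help to confirm that the pinned Python packages are installed. The check below is a hypothetical helper that is not part of the repository. It assumes the PyPI distribution names CGRtools, ordered-set, and pyjnius. It cannot verify the JChem Suite, which is a Java library from ChemAxon rather than a Python package (presumably reached from Python through pyjnius).

```python
"""Hypothetical dependency check against the versions listed above."""
import sys
from typing import Dict, List, Optional

# Pinned Python packages; JChem Suite is a Java library, so it is not checked here.
PINNED = {"CGRtools": "4.0.36", "ordered-set": "4.0.2", "pyjnius": "1.3.0"}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.7: fall back to setuptools' pkg_resources
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_dependencies(pinned: Dict[str, str] = PINNED) -> List[str]:
    """List mismatches between the running environment and the pinned versions."""
    problems = []
    if sys.version_info[:2] != (3, 7):
        found = "{}.{}".format(*sys.version_info[:2])
        problems.append("python: expected 3.7, found " + found)
    for name, expected in pinned.items():
        found = installed_version(name)
        if found != expected:
            problems.append("{}: expected {}, found {}".format(
                name, expected, found or "not installed"))
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print(problem)
```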

About

Chemical reaction data cleaning
