This repository has been archived by the owner on Dec 7, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
167 lines (111 loc) · 10.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
title: "chemspiderapi"
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
<!-- badges: start -->
[![R build status](https://github.com/NIVANorge/chemspiderapi/workflows/R-CMD-check/badge.svg)](https://github.com/NIVANorge/chemspiderapi/actions)
[![CodeCov test coverage](https://codecov.io/gh/NIVANorge/chemspiderapi/branch/master/graph/badge.svg)](https://codecov.io/gh/NIVANorge/chemspiderapi?branch=master)
<!-- badges: end -->
> R functionalities for ChemSpider's API services
ChemSpider has introduced a new API syntax in late 2018, and [the old ChemSpider API syntax will be shut down at the end of November 2018](http://link.rsc.org/rsps/m/xSq8Cm8ovjN8-Elm0eYB3Sey61zutqNIUUaMcyc14sQ). `{chemspiderapi}` provides R wrappers around the new API services from ChemSpider.
The aim of this package is to:
1) Translate the new ChemSpider API services into R-friendly functions.
2) Include thorough quality checking *before* the query is send, to avoid using up the query quota on, e.g., spelling errors.
3) Implement the R functionality in a way that is suitable for both `{base}`- and `{tidyverse}`-style programming.
`{chemspiderapi}` relies on API keys to access ChemSpider's API services. For this, we recommend storing the ChemSpider API key using the [`{keyring}`](https://cran.r-project.org/package=keyring) package.
To limit the rate of API queries, we recommend using the [`{ratelimitr}`](https://cran.r-project.org/package=ratelimitr) package or `purrr::slowly()` within the [`{tidyverse}`](https://cran.r-project.org/package=tidyverse) package collection.
To handle PNG images, the [`{magick}`](https://cran.r-project.org/package=magick) package is recommended.
We furthermore recommend the [`{memoise}`](https://cran.r-project.org/package=memoise) package to "remember" the results of API queries (*i.e.*, to not ruin the API allowance).
Additionally, the superb [`{webchem}`](https://cran.r-project.org/web/packages/webchem/) package provides access to a lot of additional chemistry-related API services and is highly recommended.
## Installation
### R package
Install the package from GitHub (using the `{remotes}` package; automatically installed as part of [`{devtools}`](https://cran.r-project.org/package=devtools)):
```{r,eval=FALSE}
# install.packages("devtools")
remotes::install_github("NIVANorge/chemspiderapi")
```
The development version can be installed with:
```{r,eval=FALSE}
# install.packages("devtools")
remotes::install_github("NIVANorge/chemspiderapi", ref = "dev")
```
Currently the only tested continuous integration environment for `{chemspiderapi}` is Linux. It *should* install smoothly on Windows 10 and mac OS machines as well. Please open an issue if you run into any troubles.
### Dependencies
`{chemspiderapi}` relies on two essential dependencies. The [`{curl}`](https://cran.r-project.org/package=curl) package is used to handle the API queries and the [`{jsonlite}`](https://cran.r-project.org/package=jsonlite) package is necessary to wrap and unwrap information for the queries.
If not already installed, these packages will be installed automatically when installing `{chemspiderapi}`. Should this result in trouble, the dependency packages can be installed manually:
```{r, eval=FALSE}
install.packages(c("curl", "jsonlite"))
```
If `{curl}` or `{jsonlite}` are missing, (almost) all functions of `{chemspiderapi}` will fail and throw an error.
## Coverage
As of `r Sys.Date()`, the following functionalities are implemented (`r signif((27 / 27) * 100, digits = 2)`% functionality with `r signif((27 / 27) * 100, digits = 2)`% annotation):
**FILTERING**
| ChemSpider Compound API | `chemspiderapi` Wrapper | `chemspiderapi` Help File |
|:---------------------------------------- |:------------------------------------- |:-------------------------:|
| filter-element-post | `post_element()` | yes |
| filter-formula-batch-post | `post_formula_batch()` | yes |
| filter-formula-batch-queryId-results-get | `get_formula_batch_queryId_results()` | yes |
| filter-formula-batch-queryId-status-get | `get_formula_batch_queryId_status()` | yes |
| filter-formula-post | `post_formula()` | yes |
| filter-inchi-post | `post_inchi()` | yes |
| filter-inchikey-post | `post_inchikey()` | yes |
| filter-intrinsicproperty-post | `post_intrinsicproperty()` | yes |
| filter-mass-batch-post | `post_mass_batch()` | yes |
| filter-mass-batch-queryId-results-get | `get_mass_batch_queryId_results()` | yes |
| filter-mass-batch-queryId-status-get | `get_mass_batch_queryId_status()` | yes |
| filter-mass-post | `post_mass()` | yes |
| filter-name-post | `post_name()` | yes |
| filter-queryId-results-get | `get_queryId_results()` | yes |
| filter-queryId-results-sdf-get | `get_queryId_results_sdf()` | yes |
| filter-queryId-status-get | `get_queryId_status()` | yes |
| filter-smiles-post | `post_smiles()` | yes |
**LOOKUPS**
| ChemSpider Compound API | `chemspiderapi` Wrapper | `chemspiderapi` Help File |
|:----------------------- |:----------------------- |:-------------------------:|
| lookups-datasources-get | `get_datasources()` | yes |
**RECORDS**
| ChemSpider Compound API | `chemspiderapi` Wrapper | `chemspiderapi` Help File |
|:--------------------------------------- |:----------------------------------- |:-------------------------:|
| records-batch-post | `post_batch()` | yes |
| records-recordId-details-get | `get_recordId_details()` | yes |
| records-recordId-externalreferences-get | `get_recordId_externalreferences()` | yes |
| records-recordId-image-get | `get_recordId_image()` | yes |
| records-recordId-mol-get | `get_recordId_mol()` | yes |
**TOOLS**
| ChemSpider Compound API | `chemspiderapi` Wrapper | `chemspiderapi` Help File |
|:---------------------------- |:-------------------------- |:-------------------------:|
| tools-convert-post | `post_convert()` | yes |
| tools-validate-inchikey-post | `post_validate_inchikey()` | yes |
## Best practices for ChemSpider's Compound APIs
_This section will be updated with practical examples in the future._
The basic workflow order for the above **FILTERING** queries is:
1) POST Query
2) GET Status
3) GET Results (after GET Status returns `"Complete"`)
In practice, this means the following possible workflows can be implemented (from left to right):
| POST Query | GET Status | GET Results |
:--------------------------- |:------------------------------------ |:------------------------------------- |
| `post_element()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_formula()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_formula_batch()` | `get_formula_batch_queryId_status()` | `get_formula_batch_queryId_results()` |
| `post_inchi()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_inchikey()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_intrinsicproperty()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_mass()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_mass_batch()` | `get_mass_batch_queryId_status()` | `get_mass_batch_queryId_results()` |
| `post_mass()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_name()` | `get_queryId_status()` | `get_queryId_results()` |
| `post_smiles()` | `get_queryId_status()` | `get_queryId_results()` |
Typically, the result will be one or multiple ChemSpider IDs (`recordId`). They can be used as input into the above **RECORDS** queries, e.g., `get_recordId_details()`.
## Vignettes
As of `r Sys.Date()`, the following five vignettes are available:
* **Storing and Accessing API Keys**: A basic example on how to safely store and retrieve API keys using the `keyring` package.
* **Rate-Limiting API Queries**: Multiple examples on how to slow down API queries. `base`, `ratelimitr`, and `tidyverse` workflows are provided.
* **Remembering API Queries**: A basic example on how to "remember" the results of API queries using the `memoise` package. This can greatly reduce redundant API queries.
* **Saving MOL and SDF Files**: Examples on how to save MOL and SDF files as returned by `get_recordId_mol()` and `get_queryId_results_sdf()`.
* **Saving PNG Images**: Examples on how to save PNG files, as returned by `get_recordId_image()`, using functionalities of the `magick` package.
## Funding
This package was created at the [Norwegian Institute for Water Research (NIVA)](https://www.niva.no/en) in conjunction with [NIVA's Computational Toxicology Program (NCTP)](https://www.niva.no/en/projectweb/nctp) at NIVA's [Section for Ecotoxicology and Risk Assessment](https://www.niva.no/en/research/ecotoxicology_and_risk_assessment) and funded by [The Research Council of Norway (RCN)](https://www.forskningsradet.no/en/), project 268294: [Cumulative Hazard and Risk Assessment of Complex Mixtures and Multiple Stressors (MixRisk)](https://www.forskningsradet.no/prosjektbanken/#/project/NFR/268294/Sprak=en).
## License
MIT © Raoul Wolf