---
title: "motif_practice"
author: "Andrew"
date: "11/4/2022"
output: html_document
editor_options:
  chunk_output_type: console
---
## Package stuff setup
```{r}
library(devtools)
library(roxygen2)
library(lintr)
library(testthat)
library(tidyr)
# lint("motif_practice.qmd")
```
## Todo
- full workflow template (plug and play mode)
- generate spectrograms
- plots
- summary statistics
- maps showing study site locations
- import metadata (temp, sensor locations, habitat type)
- Generate summary description sentences and/or a markdown table
  (automatically generated using in-text R) based on the metadata, e.g.
  - This site has _properties_ and is located _sitelocation_.
- a way to label / confirm motifs in R??
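For the metadata summary item in the list above, here is a minimal sketch of how the description sentences could be generated in R. The `site_metadata` table, its columns (`site`, `habitat`, `region`) and the `describe_sites()` helper are hypothetical placeholders. Real metadata (temperature, sensor locations, habitat type) would need its own columns.

```r
# Hypothetical metadata table: the column names and values are placeholders,
# not a format the package defines.
site_metadata <- data.frame(
  site    = c("Site A", "Site B"),
  habitat = c("woodland", "grassland"),
  region  = c("Region 1", "Region 2")
)

# Build one description sentence per row of the metadata
describe_sites <- function(meta) {
  sprintf("%s has %s habitat and is located in %s.",
          meta$site, meta$habitat, meta$region)
}

describe_sites(site_metadata)
```

In the rendered document, a sentence could then be inserted with inline R code calling `describe_sites(site_metadata)[1]`. `knitr::kable(site_metadata)` would render the same metadata as a markdown table.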
## Setup
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Testing
Testing setup in progress.
## Install/reinstall
```{r}
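# detach the currently attached version (only if it is attached), then reinstall
# from the local source; the package name is case-sensitive
if ("package:MIMiCS" %in% search()) detach("package:MIMiCS", unload = TRUE)
devtools::install("/Users/andrew/Documents/GitHub/mimics")
library(MIMiCS)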
```
## Terminology
- Site = a geographic region/area
- Location = an individual sensor (recorder) within a site
## Step 1 - prepare the acoustic indices data
### Download or provide your own data
In this example workflow we will use data available from the Australian
Acoustic Observatory (A2O). If you would like to download data from the A2O,
follow this guide (TODO: refactor this into its own section).
```{r}
list.files(path = "data/A20", recursive = TRUE)
```
```text
data
└── A20
    ├── 305_BooroopkiDryA
    │   ├── 20201230T050000+1100_Booroopki-Dry-A_271849.flac
    │   ├── 20201230T070000+1100_Booroopki-Dry-A_271857.flac
    │   ├── 20201230T090000+1100_Booroopki-Dry-A_271854.flac
    │   ├── 20201231T050000+1100_Booroopki-Dry-A_271865.flac
    │   ├── 20201231T070000+1100_Booroopki-Dry-A_271867.flac
    │   └── 20201231T090000+1100_Booroopki-Dry-A_271866.flac
    ├── 307_BooroopkiDryB
    │   ├── 20201230T050000+1100_Booroopki-Dry-B_278581.flac
    │   ├── 20201230T070000+1100_Booroopki-Dry-B_278580.flac
    │   ├── 20201230T090000+1100_Booroopki-Dry-B_278587.flac
    │   ├── 20201231T050000+1100_Booroopki-Dry-B_278593.flac
    │   ├── 20201231T070000+1100_Booroopki-Dry-B_278600.flac
    │   └── 20201231T090000+1100_Booroopki-Dry-B_278597.flac
    ├── 6_BonBonStationDryA
    │   ├── 20201230T050000+1030_Bon-Bon-Station-Dry-A_418011.flac
    │   ├── 20201230T070000+1030_Bon-Bon-Station-Dry-A_418014.flac
    │   ├── 20201230T090000+1030_Bon-Bon-Station-Dry-A_418017.flac
    │   ├── 20201231T050000+1030_Bon-Bon-Station-Dry-A_418021.flac
    │   └── 20201231T070000+1030_Bon-Bon-Station-Dry-A_418020.flac
    └── 8_BonBonStationDryB
        ├── 20201230T050000+1030_Bon-Bon-Station-Dry-B_426598.flac
        ├── 20201230T070000+1030_Bon-Bon-Station-Dry-B_426601.flac
        ├── 20201230T090000+1030_Bon-Bon-Station-Dry-B_426596.flac
        ├── 20201231T050000+1030_Bon-Bon-Station-Dry-B_426604.flac
        └── 20201231T070000+1030_Bon-Bon-Station-Dry-B_426607.flac
```
See some summary information about the files you're working with. My files are
from __, covering two sites with two locations within each site.
```{r}
# TODO: nice summary info, number of recordings/files, total hours, total sites
```
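Here is a minimal sketch for the summary TODO above. It assumes the `<location folder>/<recording>.flac` layout shown in the tree and A2O file names that start with a `YYYYMMDD` date. `summarise_recordings()` is a hypothetical helper and is not part of the package. Total hours is left out because it needs each file's duration, which the file names alone don't provide.

```r
# Hypothetical helper: count recordings, locations and recording days
# under a folder laid out as <location folder>/<recording>.flac
summarise_recordings <- function(path) {
  files <- list.files(path, pattern = "\\.flac$", recursive = TRUE)
  location <- dirname(files)              # e.g. "305_BooroopkiDryA"
  day <- substr(basename(files), 1, 8)    # "YYYYMMDD" at the start of the name
  list(
    n_files            = length(files),
    n_locations        = length(unique(location)),
    n_days             = length(unique(day)),
    files_per_location = table(location)
  )
}

summarise_recordings("data/A20")
```

For the tree shown above, this should report 22 files from 4 locations over 2 recording days.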
### Calculate acoustic indices
We will now calculate acoustic indices using a wrapper function that calls the
`AnalysisPrograms` (`AP`) software. At the moment, `motifR` only supports the
folder structure generated by `AP`; other folder structures may be supported in
the future. Note: if you have already generated indices with AP, you can skip
this step. Alternatively, you can generate the indices directly in your terminal
with AP.
#### Check if AP is available
If you don't have AP installed, visit this
[link](https://ap.qut.ecoacoustics.info/).
Problem: the PATH seen by RStudio doesn't match the PATH in the system terminal.
I can call `AP` in a system terminal but not in the RStudio terminal, and the
PATH imported into an RStudio R session and the one in the RStudio terminal also
seem to differ. To compare them, run `Sys.getenv("PATH")` in an R Markdown
document, run `echo $PATH` in the RStudio terminal, and compare both with
`echo $PATH` in a regular terminal. The code below adds
`/Users/andrew/.local/bin` to the PATH for the R session in RStudio, but not for
the RStudio terminal. This is the path to AP on my system (based on the
automatic installer for AP). I still need to test this workflow on Windows.
```{r}
# check a program you know should be available on your PATH
find_program("ffmpeg")
# check if AP is available
find_program("AP")
# if you have installed AP but the above does not work, append the folder that
# contains AP to the PATH for this R session (replace it with your own path);
# .Platform$path.sep is ":" on macOS/Linux and ";" on Windows
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(old_path, "/Users/andrew/.local/bin",
                        sep = .Platform$path.sep))
# now check AP again
find_program("AP")
```
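Here is an alternative sketch that uses only base R. `Sys.which()` returns the full path of a program found on the PATH, or an empty string if it is not found. The hypothetical `add_to_path()` helper appends a folder only if it isn't already there, so re-running the chunk doesn't keep growing the PATH. Like the code above, this only changes the PATH of the current R session, not that of the RStudio terminal.

```r
# Hypothetical helper: append a folder to this R session's PATH, once
add_to_path <- function(dir) {
  dir <- path.expand(dir)
  entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  if (!dir %in% entries) {
    Sys.setenv(PATH = paste(c(entries, dir), collapse = .Platform$path.sep))
  }
  invisible(Sys.getenv("PATH"))
}

add_to_path("~/.local/bin")    # AP's folder on my system; change for yours
ap_location <- Sys.which("AP") # full path, or "" if AP is still not found
if (!nzchar(ap_location)) message("AP was not found on the PATH")
```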
Now we can easily run AP commands in R:
```{r}
# AP("-help")
```
#### Errors in sound files - what happens?
Let's generate indices. All audio should be in one folder per location (i.e. per
recorder).
```{r}
# AP_prepare("data/A2O_with_errors/", "output/indices-output-ap")
```
You might notice that the above example uses the input directory
`A2O_with_errors/`. This is data I downloaded fresh from the A2O. I gave it this
name because I had already tried to run AP on it and knew it had errors, and I
wanted to keep it as-is to demonstrate what happens if you come across this
issue. The error given is: `System.FormatException: Failed parsing 'N/A' to get
FORMAT duration.` Nowadays, all files uploaded to the A2O are automatically
checked for errors and fixed, but older files may still have errors. These files
are from January 2021 and have this issue. Don't worry: it is fixable, and I
will show you how. If your files are OK, you can skip this step.
We can fix the files with EMU (Ecoacoustics Metadata Utility), run from a
terminal:
```bash
# TODO: get the metadata from the A2O files and output it as a .csv
# Fix broken audio files. Pay attention to the directory structure: escape
# spaces with backslashes (quoting the path with ' did not work on my Mac),
# and use cd to check that your path is correct, e.g.
#   cd ~/Documents/Data\ Science/Projects/Ecoacoustics/motifR/data/A2O/
# check the files
./emu fix check --all ~/Documents/Data\ Science/Projects/Ecoacoustics/motifR/data/A2O/**/*.flac
# the files have the FL010 metadata bug; fix them (dry run first)
./emu fix apply -f FL010 --dry-run ~/Documents/Data\ Science/Projects/Ecoacoustics/motifR/data/A2O/**/*.flac
# real run
./emu fix apply -f FL010 ~/Documents/Data\ Science/Projects/Ecoacoustics/motifR/data/A2O/**/*.flac
```
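The same EMU commands can also be sketched from R. Listing the files with `list.files()` and passing them to `system2()` as separate quoted arguments sidesteps the shell-escaping issues with spaces. `emu_fix_args()` is a hypothetical helper. The `./emu` location is an assumption (the executable in the working directory, as in the terminal commands). The `system2()` call is left commented out so that you can inspect the arguments first.

```r
# Hypothetical helper: EMU arguments for the three steps shown above
emu_fix_args <- function(files, mode = c("check", "dry-run", "apply")) {
  mode <- match.arg(mode)
  quoted <- shQuote(path.expand(files))
  switch(mode,
    "check"   = c("fix", "check", "--all", quoted),
    "dry-run" = c("fix", "apply", "-f", "FL010", "--dry-run", quoted),
    "apply"   = c("fix", "apply", "-f", "FL010", quoted)
  )
}

audio_files <- list.files("data/A2O_with_errors", pattern = "\\.flac$",
                          recursive = TRUE, full.names = TRUE)
args <- emu_fix_args(audio_files, "dry-run")
# system2("./emu", args)   # uncomment to run once the arguments look right
```

With a very large number of files the command line can get long. Running `system2()` on batches of files would avoid that.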
#### Generating indices on fixed sound files
Now that I've fixed the files, I'll try to run the AP analysis again. This will
take some time. If you had errors, clear the `indices-output` folder before
running this step on your fixed audio. Remember that AP calculates acoustic
indices for each one-minute segment of audio.
```{r}
AP_prepare("data/A2O-mini-test-rename", "output/indices-output-ap-mini")
```
![Example output](motif_practice_insertimage_1.png)
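After AP has run, its per-minute CSV output can be collected into one data frame. Here is a minimal sketch. It assumes the summary index files end in `Indices.csv` (for example `<recording>__Towsey.Acoustic.Indices.csv`; check the names in your own output folder) and that each row is one one-minute segment. `read_ap_indices()` and the `segment` column are hypothetical. The package's `time_series()` function in the next step does the actual combining for the workflow.

```r
# Hypothetical helper: stack all AP summary-index CSVs under a folder
read_ap_indices <- function(indices_dir, pattern = "Indices\\.csv$") {
  files <- list.files(indices_dir, pattern = pattern, recursive = TRUE,
                      full.names = TRUE)
  tables <- lapply(files, function(f) {
    df <- utils::read.csv(f, check.names = FALSE)
    df$source_file <- basename(f)         # keep track of the recording
    df$segment <- seq_len(nrow(df)) - 1L  # 0-based one-minute segment number
    df
  })
  do.call(rbind, tables)                  # NULL if no files were found
}

indices <- read_ap_indices("output/indices-output-ap-mini")
```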
## Step 2 - Time series
Construct a time series for the acoustic indices. The motif analysis has to be
run separately for each ecosystem and month.
```{r}
# Notes from time_series.R to remember: ======
# next we run the motif analysis on a geographic and month subset:
# - loop over the files and make one big data frame with everything
# - subset by geographic region and month id
# - drop columns to get one data frame per index
# - run HIME per month, per region, per index: keep the data frame as is and
#   subset what you want when running HIME (the data has to be ordered, so use
#   a function that orders the files by date, time and result minutes)
# - HIME takes .txt files
# - HIME runs in pwsh from R
# - the HIME results files have to be saved
# - everything can be output to the same directory
# - set a seed when randomising the labels etc.: for a reproducible example,
#   anyone following along should get exactly the same results, for RF and everything
```
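The split described in these notes (per location, per month, per index, kept in time order) could look roughly like the sketch below. It rests on several assumptions. The input has a `location` column, a POSIXct `datetime` column and one numeric column per index. `write_index_series()` is hypothetical. Writing one value per line is only a guess at a HIME-compatible `.txt` format, so check the HIME documentation before relying on it.

```r
# Hypothetical helper: one ordered .txt series per location x month x index
write_index_series <- function(df, index_cols, outdir) {
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  # ordering by location and time is the critical step
  df <- df[order(df$location, df$datetime), ]
  month <- format(df$datetime, "%Y-%m")
  groups <- split(df, list(df$location, month), drop = TRUE)  # keeps row order
  written <- character(0)
  for (g in names(groups)) {              # names look like "A.2020-12"
    for (idx in index_cols) {
      path <- file.path(outdir, paste0(g, "_", idx, ".txt"))
      writeLines(as.character(groups[[g]][[idx]]), path)  # one value per line
      written <- c(written, path)
    }
  }
  invisible(written)
}
```

Because each series is written from the sorted data, the order of rows in the input doesn't matter. Subsetting for HIME can then work directly on the saved files.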
```{r}
# run time_series()
# the output is a data frame of all the indices data combined, but this function
# also creates a folder that stores the subset indices data used as HIME input
# TODO: might need a function that returns the data frame without creating the
# outputs again? But if you need to recreate the data frame, it's probably safest
# to re-run the subsequent steps anyway, for consistency
# IMPORTANT: time_series() relies on your file dates and times being labelled
# correctly; otherwise it will not work correctly. Ordering the time series is a
# critical step in the process.
my_indices_data <-
motifR::time_series(
indicesfolder = "backup/indices-output-ap",
outputfolder = "output/timeseries"
)
```
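Because the ordering depends on correctly labelled dates and times, here is a small sketch of recovering them from A2O file names such as `20201230T050000+1100_Booroopki-Dry-A_271849.flac`. It assumes the timestamp always comes before the first underscore, and the helper names are hypothetical. Converting to UTC makes recordings from different time zones (+1100 and +1030 in the example data) directly comparable, and each one-minute AP segment gets its own timestamp.

```r
# Hypothetical helper: recording start time from an A2O-style file name
parse_a2o_start <- function(file_name) {
  stamp <- sub("_.*$", "", basename(file_name))   # e.g. "20201230T050000+1100"
  as.POSIXct(stamp, format = "%Y%m%dT%H%M%S%z", tz = "UTC")
}

# Hypothetical helper: one timestamp per one-minute segment of a recording
segment_times <- function(file_name, n_minutes) {
  parse_a2o_start(file_name) + 60 * (seq_len(n_minutes) - 1)
}

segment_times("20201230T050000+1100_Booroopki-Dry-A_271849.flac", 3)
# -> 18:00, 18:01 and 18:02 UTC on 2020-12-29
```

Grouping by month in UTC can move recordings made shortly after local midnight on the first of a month into the previous month. If that matters, format the times in local time before grouping.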
## Step 3 - HIME
This step starts with running HIME, then HIME processing, and then
`hime_processing_cont`. HIME can be used in a terminal if you like. `run_hime`
is just a helpful wrapper function that calls HIME on each of the acoustic index
time series files and stores each result as a named text file in a separate
directory. HIME was created by:
```{r}
# we can pass our own path
run_hime(
timeseriesdata = "output/timeseries",
himeoutput = "output/hime",
himepath = "~/HIME/bin/HIME_release.jar"
)
# or leave out himepath to use the default location of hime in your project working directory
# run_hime(timeseriesdata, himeoutput)
```
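To show what such a wrapper involves, here is a sketch that calls HIME once per time-series file with `system2()`. Several parts are assumptions: that HIME is started with `java -jar`, that its result goes to standard output, and the `_hime.txt` naming. `hime_args` is a hypothetical placeholder for whatever arguments your HIME version needs. With `run = FALSE` (the default) the function only returns the planned input/output pairs.

```r
# Hypothetical helper: plan (and optionally run) one HIME call per .txt file
plan_hime_runs <- function(timeseries_dir, output_dir,
                           jar = "~/HIME/bin/HIME_release.jar",
                           hime_args = character(0), run = FALSE) {
  inputs <- list.files(timeseries_dir, pattern = "\\.txt$", full.names = TRUE)
  outputs <- file.path(output_dir,
                       sub("\\.txt$", "_hime.txt", basename(inputs)))
  if (run) {
    dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
    for (i in seq_along(inputs)) {
      # stdout = <file name> makes system2() write HIME's output to that file
      system2("java",
              c("-jar", shQuote(path.expand(jar)), hime_args, shQuote(inputs[i])),
              stdout = outputs[i])
    }
  }
  data.frame(input = inputs, output = outputs, stringsAsFactors = FALSE)
}

plan <- plan_hime_runs("output/timeseries", "output/hime")
```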
## Step 4 - generating some plots
Generates some time series and motif plots, and also outputs the motifs.csv
files (motif and time series data combined).
```{r}
motif_plots(
data_indices_all = my_indices_data,
outputfigures = "output/figures",
himedatapath = "output/hime-clean"
)
# TODO: add finer-scale options for generating plots, e.g. returning a plot as an object instead of saving them all?
```
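On the TODO about returning plots as objects, here is a sketch using ggplot2 that returns the plot instead of saving it, so it can be tweaked or saved later with `ggsave()`. The inputs are assumptions: a `series` table with `minute` and `value` columns and a `motifs` table with `start` and `end` minutes. This is not the format `motif_plots()` uses. The data at the bottom is a toy example for illustration only.

```r
library(ggplot2)

# Hypothetical helper: index time series with motif windows shaded
plot_index_motifs <- function(series, motifs, title = NULL) {
  p <- ggplot(series, aes(x = minute, y = value)) +
    geom_line() +
    labs(x = "Minute", y = "Index value", title = title)
  if (nrow(motifs) > 0) {
    # shade each motif window across the full height of the panel
    p <- p + annotate("rect", xmin = motifs$start, xmax = motifs$end,
                      ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "orange")
  }
  p
}

# toy data for illustration only
series <- data.frame(minute = 0:9, value = c(1, 3, 2, 5, 4, 6, 2, 3, 1, 2))
motifs <- data.frame(start = c(2, 6), end = c(4, 8))
p <- plot_index_motifs(series, motifs, title = "ACI")
```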
`-- up to this point is working --`
## Step 5 -
- step 6: `6_CompleteMotif`
- step 7: `7_CropSpectrogram`
- step 8: `8_FeatureExtraction` (at the moment this script is called
  `wavelet.R`, but that might change; rename it to feature extraction)
```{r}
feature_extraction(
data_indices_all = my_indices_data,
himeclean = "output/hime-clean",
outputspecpath = "output/specs",
indicespath = "output/indices-output-ap",
outwavelet = "output/wavelets"
)
```
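The feature extraction script is currently `wavelet.R`, and its method isn't shown here. As a stand-in illustration only, here is a sketch of one simple wavelet feature: the energy of the Haar detail coefficients at each decomposition level of a motif's index values. This technique is swapped in and is not necessarily what `feature_extraction()` computes. The helper names are hypothetical.

```r
# Hypothetical helper: one level of the Haar wavelet transform
haar_step <- function(x) {
  n <- length(x) - length(x) %% 2           # drop a trailing odd sample
  odd <- x[seq(1, n, by = 2)]
  even <- x[seq(2, n, by = 2)]
  list(approx = (odd + even) / sqrt(2), detail = (odd - even) / sqrt(2))
}

# Hypothetical helper: detail-coefficient energy per level, as named features
haar_energy_features <- function(x, levels = 3) {
  energies <- numeric(0)
  for (lvl in seq_len(levels)) {
    if (length(x) < 2) break                # nothing left to decompose
    step <- haar_step(x)
    energies[paste0("detail_", lvl)] <- sum(step$detail^2)
    x <- step$approx                        # next level works on the smooth part
  }
  energies
}

haar_energy_features(c(1, 3, 2, 5, 4, 6, 2, 3), levels = 3)
```

A flat segment gives zero energy at every level. Rapid minute-to-minute changes show up mostly in `detail_1`, and slower trends in the higher levels.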