Alex J H Fedorec 02/06/2021
- Prerequisite Knowledge
- Installation
- A Quick Note on .csv Files
- Plate Reader Calibration
- Processing Plate Reader Data
- Flow Cytometry Processing
We have attempted to make our software usable with minimal prior knowledge of the R programming language, or of programming in general. You will need to be familiar with the idea of running commands from a console or writing basic scripts. For R beginners, this is a great starting point and there are some good resources here. We suggest using the RStudio application, which provides an environment for writing and running R code.
FlopR relies on several other R packages. Most of them are available through the “Comprehensive R Archive Network (CRAN)” which just means that they can be automatically installed. There are, however, a couple that need to be manually installed if you are using the flow cytometry processing functions. To do this use the following commands:
install.packages("devtools", repos = "https://cloud.r-project.org/")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("flowCore", "flowClust", "flowStats"))
Once those packages have been installed we are ready to install FlopR.
devtools::install_github("ucl-cssb/flopr")
This grabs the latest version of FlopR from GitHub and builds it on your computer. If it installed successfully we should now be able to load the FlopR package in R.
library(flopr)
The examples in this document can be run using the data in the "example" folder. Download the whole folder and make sure that you have set your current working directory (you can use the setwd() command in R) to the location where you saved it on your computer.
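For example, if you saved the folder to your home directory (the path below is purely illustrative; use the location on your computer), setting the working directory would look like this:
setwd("~/Downloads/example")  # illustrative path
getwd()                       # confirm the working directory has been set correctly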
This package makes extensive use of .csv (comma-separated values) files for data and meta-data. Unfortunately, not all .csv files are created equal.
- When using a Windows OS computer we recommend that you save your .csv files explicitly with a UTF-8 file encoding (as shown below). This means that your files can be used by someone running a different operating system without error.
- Some parts of the world use decimal commas instead of decimal points. In these places, .csv files use “;” as a separator instead of “,”. Unfortunately, we are unable to automatically detect which format your .csv files are in. As such, you will have to ensure that your files are saved using “,” as the separator and decimal points for the decimal mark. This article has some help for doing so.
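A quick way to check that a file has been saved in the expected format is to read it back into R. This is just a sketch, assuming the example plate layout file is in your current working directory:
layout <- read.csv("calibration_plate_layout.csv", fileEncoding = "UTF-8")
head(layout)  # each field should be in its own column; if everything lands in a
              # single column, the file was probably saved with ";" as the separator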
The experimental protocols for producing absorbance and fluorescence calibration data can be found here and here.
The data produced by plate readers from different manufacturers comes in different formats. The first step is to "parse" the data from the plate reader into a standard format. We have written example parsers for use with data from Tecan plate readers and the Biotek Neo 2. If you need help writing a parser for your plate reader, please contact us and we'll see what we can do.
For a Tecan plate reader, the data is saved as an Excel .xls file. As of flopR version 0.4.02, the spark_parse function accepts Excel files (.xls and .xlsx). If you are using an earlier version, the data first needs to be saved as a .csv file (open in Excel and "Save As" .csv) that can be read by R. We also need a .csv file telling us what is in each well of our microtitre plate. An example can be found in the "examples/plate_reader/tecan_spark" folder, but the first few rows look like this:
calibrant | fluorophore | media | concentration | replicate | well |
---|---|---|---|---|---|
fluorescein | GFP | PBS | 7.53000e+14 | 1 | A1 |
fluorescein | GFP | PBS | 3.76500e+14 | 1 | A2 |
fluorescein | GFP | PBS | 1.88250e+14 | 1 | A3 |
fluorescein | GFP | PBS | 9.41250e+13 | 1 | A4 |
fluorescein | GFP | PBS | 4.70625e+13 | 1 | A5 |
fluorescein | GFP | PBS | 2.35313e+13 | 1 | A6 |
Once we have the calibration data and layout .csv files we can parse the data.
flopr::spark_parse(data_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane.csv",
layout_csv = "examples/plate_reader/tecan_spark/calibration_plate_layout.csv",
timeseries = FALSE)
The data_csv argument takes the path to the calibration data. layout_csv is the path to the plate layout .csv file. Finally, Tecan plate readers save timeseries data differently from single timepoint data, so we have a Boolean flag, timeseries, that lets the parser know that this is not a timeseries.
The spark_parse() function saves the parsed calibration data in a new .csv file, in the same location as the calibration data, with "_parsed" appended to the filename. The first few rows look like this:
calibrant | fluorophore | media | concentration | replicate | well | OD600 | OD700 | GFP 40 | GFP 50 | GFP 60 | GFP 70 | GFP 80 | GFP 90 | GFP 100 | GFP 110 | GFP 120 | row | column |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
fluorescein | GFP | PBS | 10.0000 | 1 | A1 | 0.0921 | 0.0831 | 1830 | 9811 | 36911 | NA | NA | NA | NA | NA | NA | A | 1 |
fluorescein | GFP | PBS | 5.0000 | 1 | A2 | 0.0942 | 0.0852 | 932 | 4993 | 19221 | 54510 | NA | NA | NA | NA | NA | A | 2 |
fluorescein | GFP | PBS | 2.5000 | 1 | A3 | 0.1015 | 0.0928 | 453 | 2434 | 9448 | 27864 | NA | NA | NA | NA | NA | A | 3 |
fluorescein | GFP | PBS | 1.2500 | 1 | A4 | 0.0985 | 0.0899 | 232 | 1253 | 4853 | 14513 | 36349 | NA | NA | NA | NA | A | 4 |
fluorescein | GFP | PBS | 0.6250 | 1 | A5 | 0.0957 | 0.0905 | 114 | 617 | 2404 | 7128 | 18282 | 41812 | NA | NA | NA | A | 5 |
fluorescein | GFP | PBS | 0.3125 | 1 | A6 | 0.1027 | 0.0965 | 54 | 294 | 1137 | 3388 | 8672 | 20095 | 42244 | NA | NA | A | 6 |
The parser has extracted the information we need from the calibration data and merged it with the plate layout information so that we now have columns containing each of the measurements for each well.
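If you want to double-check the output in R, you can read the parsed file back in (the filename below follows the "_parsed" naming convention described above):
parsed <- read.csv("examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed.csv")
str(parsed)  # one row per well, with the layout columns followed by the measurement columns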
Now we can actually calculate our calibration coefficients. To do this we just need to use one function and give it our parsed data.
flopr::generate_cfs(calibration_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed.csv")
For details about how this process works, you can read our paper here. At the end, there should be two .pdf images showing the calibration curves for absorbance and fluorescence, along with a new .csv file, with "_cfs" appended to the filename, containing the parameters for future use. The first few rows look like this:
cf | beta | calibrant | fluorophore | measure |
---|---|---|---|---|
184.7274 | 0.0057476 | fluorescein | GFP | GFP 40 |
996.4472 | -0.0035424 | fluorescein | GFP | GFP 50 |
3821.0497 | -0.0004737 | fluorescein | GFP | GFP 60 |
11298.9531 | 0.0002015 | fluorescein | GFP | GFP 70 |
29128.3783 | -0.0007481 | fluorescein | GFP | GFP 80 |
65834.9342 | 0.0012751 | fluorescein | GFP | GFP 90 |
The “cf” column contains the calibration coefficients that will be used to calibrate our data in later experiments.
Before we get too excited we need to check the images to make sure that the calibration curves look sensible. The software attempts to remove data points which it deems invalid, but this process isn't perfect and occasionally you may need to remove data points from the "*_parsed.csv" file yourself. Using the example data, you can see (here) that some of the fluorescein wells are considered valid absorbance measurements. In this case, it isn't the end of the world since we would never use the parameters produced from those two curves.
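If you do need to remove data points, one approach (a sketch; the well IDs below are purely illustrative) is to drop the offending rows from the parsed file and then re-run generate_cfs():
parsed <- read.csv("examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed.csv")
parsed <- parsed[!(parsed$well %in% c("A1", "A2")), ]  # illustrative wells to discard
write.csv(parsed,
          "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed.csv",
          row.names = FALSE)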
n.b. Currently the software is set up to work with "microspheres" for calibrating cell number and "fluorescein" for calibrating GFP fluorescence. We hope to extend it in the near future to work with other calibrants.
As mentioned above, because plate readers from different manufacturers save the data in different formats, the first step we need to do is parse our raw data. The parser that we provide takes Tecan plate reader data in the form of a .csv file. We also need a .csv file telling us what is in each well of your microtitre plate. This can include any information that you wish; we include as much meta-data as possible as it makes our data analysis later much smoother. (Before version 0.4.01: The only requirement is that the last column must be named "well" and include an identifier (usually the well id i.e. B2) that can be matched to the same identifier in the plate reader data.) Here's an example where we include information about the strains, plasmids, media, inducers, etc.:
strain | host | plasmid | plasmid_2 | strain_2 | media | sugar | amino_acids | inducer | concentration | init_ratio | init_dilution | well |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NA | NA | NA | | | | | | | | | | A1 |
NA | NA | NA | | | | | | | | | | A2 |
Now we can use our parsing function.
flopr::spark_parse(data_csv = "examples/plate_reader/tecan_spark/200228_example_data.csv",
layout_csv = "examples/plate_reader/tecan_spark/200228_example_layout.csv",
timeseries = TRUE)
The data is extracted and the meta-data from the layout is attached. Note that this time the data is a timeseries, so we set the timeseries flag to TRUE. A new .csv file is produced containing the parsed data, with "_parsed" appended to the filename.
Now we can start processing our data. There is one function that does all the work for us: process_plate(). There are a few arguments that we need to give the function, which control what happens.
flopr::process_plate(data_csv = "examples/plate_reader/tecan_spark/200228_example_data_parsed.csv",
blank_well = c("C12", "D12"),
neg_well = c("C6", "D6", "E6"),
od_name = "OD700",
flu_names = c("GFP", "mCherry"),
af_model = "spline",
to_MEFL = TRUE,
flu_gains = 135,
conversion_factors_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed_cfs.csv")
Let's walk through what each of the arguments does:
- data_csv is the path to our parsed data.
- blank_well are the well identifiers of wells containing media blanks. These are used for normalising absorbance. If you only have one blank well you can specify it using blank_well = "C12", for example.
- neg_well are the well identifiers of wells containing negative controls. These are used for normalising fluorescence. As above, if you only have one negative control well you can specify it using neg_well = "C6", for example.
- od_name is the name of the column containing our absorbance values in the parsed data .csv file. Currently we can only use one absorbance column, so if you record absorbance at multiple wavelengths (like I do), you will have to pick one.
- flu_names are the names of the columns containing our fluorescence values. You can include as many or as few of your fluorescence columns as you like. Whichever columns are named here, we will attempt to normalise.
- af_model allows you to choose the type of model that we are going to use for fluorescence normalisation. We'll discuss the available choices below.
- to_MEFL is a Boolean flag that lets you tell the function whether you want to convert the fluorescence data into calibrated units. You can only do this if you have carried out the calibration as detailed above.
- flu_gains is where you specify the gain at which your fluorescence data was recorded for each fluorescence channel. Here we only have calibration parameters for "GFP" so we only specify one gain value.
- conversion_factors_csv is the path to the calibration parameters that you generated using the protocol detailed above.
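If you have not generated calibration parameters, you can skip the calibration step. A minimal sketch of such a call, using the same example files but with to_MEFL = FALSE, might look like this:
flopr::process_plate(data_csv = "examples/plate_reader/tecan_spark/200228_example_data_parsed.csv",
                     blank_well = c("C12", "D12"),
                     neg_well = c("C6", "D6", "E6"),
                     od_name = "OD700",
                     flu_names = c("GFP", "mCherry"),
                     af_model = "spline",
                     to_MEFL = FALSE)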
When we run this function the absorbance is normalised, then the fluorescence, and finally (if desired) the absorbance and fluorescence values are calibrated. The processed data is saved in a new .csv file with "_processed" appended to the filename and with additional columns for each of the processed values.
We also save some .pdf images comparing the raw and normalised data, and images showing the fluorescence normalisation curves. Using these plots, there are a few checks that we should make before celebrating.
- Check that none of the blank wells you used showed any contamination (it happens to the best of us). If there is growth in a blank well it will throw off the absorbance normalisation.
- Check that the fluorescence normalisation curves fit the data. Read below for more detail on possible issues.
Autofluorescence is the fluorescence produced by anything other than the fluorophores that we are interested in measuring. A small amount usually comes from growth media and can be minimised by choosing certain media. A large contribution comes from molecules produced and secreted by our cells. Some of these molecules show particularly strong emission at similar wavelengths to GFP. We observe that the level of autofluorescence is not simply proportional to the number of cells or optical density of our culture. As cells enter stationary phase, autofluorescence increases, perhaps due to increased production of the autofluorescent molecules and changes in cell size.
In order to remove this autofluorescence from our sample data we fit a curve to our negative control data. We provide four different models that the user can choose to fit their data (and if desired more models can be added). There are two smoothing models: "loess" and "spline". The primary difference to the user between these two models is that the "spline" is able to extrapolate beyond the negative control data provided. This means that if the range of absorbance values at which you have measurements of your negative control is smaller than the range of your samples, we can still make an attempt at normalisation. However, this extrapolation is very crude (linear from the last data point) and can produce poor normalisation in the extrapolated range. Fortunately, negative controls tend to grow better than fluorescent samples, so extrapolation is often not an issue. We also provide a second-order polynomial and an exponential model, specified by "polynomial" and "exponential" respectively. These are inherently able to make predictions beyond the range of the negative control data and therefore may be good starting points.
Prolonged periods in stationary phase can cause autofluorescence to increase while absorbance remains stable and in some cases absorbance can start decreasing while autofluorescence does not. In these circumstances, none of the models perform particularly well. The performance of each of them should be checked and if none of them perform satisfactorily it may be necessary to trim the data to remove confounding timepoints.
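If you do need to trim the data, one option (a sketch; the column name "time" and the 12 hour cut-off are assumptions about your parsed data) is to remove late timepoints from the parsed .csv file and then point process_plate() at the trimmed file:
parsed <- read.csv("examples/plate_reader/tecan_spark/200228_example_data_parsed.csv")
trimmed <- parsed[parsed$time <= 12 * 60 * 60, ]  # keep the first 12 hours, assuming time is in seconds
write.csv(trimmed,
          "examples/plate_reader/tecan_spark/200228_example_data_trimmed.csv",
          row.names = FALSE)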
We have two functions for processing flow cytometry data:
- process_fcs takes a single .fcs file, removes debris and doublets, and saves the trimmed data in a new .fcs file.
- process_fcs_dir takes a folder of .fcs files and performs the same trimming on each. It can also perform fluorescence normalisation if you have a negative control, and fluorescence calibration if you have measured a calibrant.
To process a single .fcs file we can run the following command:
flopr::process_fcs(fcs_file = "examples/flow_cytometry/DATA/20191121/pWeak_None_0_1.fcs",
flu_channels = "BL1-H",
pre_cleaned = TRUE,
do_plot = TRUE)
We need to give the function four bits of information:
- fcs_file is the path to the .fcs file that we want to process.
- flu_channels are the names of the fluorescence channels that we recorded. When processing a single .fcs file like this we don't actually do anything with the fluorescence data. However, if you include the channel names you can see the data in a plot that is saved. If you have more than one fluorescence channel, the argument needs to be a vector, which will look something like this: flu_channels = c("BL1-H", "BL2-H", "YL2-H")
- pre_cleaned lets the function know if you have gated out debris on the flow cytometer. Most people do this when running their experiments by setting a threshold on forward-scatter and side-scatter, but some people like to record everything and process the data later.
- do_plot lets the function know if you want a plot of the trimming process to be saved.
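For example, a call for a sample recorded with two fluorescence channels might look like the following sketch (the second channel name, "YL2-H", is just an illustration; use the channel names from your own cytometer):
flopr::process_fcs(fcs_file = "examples/flow_cytometry/DATA/20191121/pWeak_None_0_1.fcs",
                   flu_channels = c("BL1-H", "YL2-H"),  # "YL2-H" is illustrative
                   pre_cleaned = TRUE,
                   do_plot = TRUE)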
In the end we have a new .fcs file with "_processed" appended to the filename, and if you asked the function to save a plot we will have a .pdf that looks something like this.
It’s much more likely that we have more than one sample that we want to process. This is where the other function comes in handy.
flopr::process_fcs_dir(dir_path = "examples/flow_cytometry/DATA/20191121",
pattern = "*Med*.fcs",
flu_channels = "BL1-H",
pre_cleaned = TRUE,
do_plot = TRUE,
neg_fcs = "pNeg_None_0_1.fcs",
calibrate = TRUE,
mef_peaks = list(list(channel = "BL1-H",
peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558))))
There are a few more arguments to this function and some of them look a bit complicated, so let's go through them.
- dir_path is the path to the folder containing your .fcs files.
- pattern allows us to process just a subset of the .fcs files in the folder. Here we are just going to process files with "Med" in the filename. Without going into too much detail, this uses a simplified version of "regular expressions" called "globbing patterns". We can use "wildcard" symbols to represent any character (? is a placeholder for a single character and * is a placeholder for zero to any number of characters). In this example all we know is that "Med" appears somewhere in the filenames and they all end with ".fcs", so we use the * character to show that there are some unknown characters before "Med" and between "Med" and ".fcs". If you want to process all .fcs files in your folder, the pattern would be "*.fcs".
- flu_channels is as above.
- pre_cleaned is as above.
- do_plot is as above.
- neg_fcs is the filename of your negative control sample. It must be in the folder with the other .fcs files. This is used to normalise autofluorescence. If you don't specify a filename here, the processing will be carried out without any normalisation steps.
- calibrate tells the function whether you want to calibrate the fluorescence measurements. To be able to calibrate you must have an .fcs file with data from calibration beads, and it must have "beads" somewhere in the filename. For discussion about calibration beads, read our paper or check out TASBE and FlowCal.
- mef_peaks is where we tell the function what the true fluorophore values are for our beads. These will be available from the bead manufacturer. We need to specify peaks for each of the channels that we want to calibrate, and each channel name needs to correspond to one given in flu_channels. Here we are just calibrating the "BL1-H" channel, which corresponds to GFP on our flow cytometer. If you wanted to calibrate two channels it would look something like this:
mef_peaks = list(list(channel = "BL1-H",
peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558)),
list(channel = "YL2-H",
peaks = c(0, 218, 581, 1963, 6236, 15267, 68766, 181945)))
The function carries out the trimming as described above. Then, if a negative control file is given, it performs fluorescence normalisation by subtracting the geometric mean of the negative control's fluorescence in each channel from each of the other samples. Finally, if the files are going to be calibrated, a calibration curve is fit to the bead peaks in each of the fluorescence channels in mef_peaks. We use a model developed for FlowCal for our calibration curve. We then calibrate both the raw and normalised data, since normalisation can produce fluorescence values less than or equal to 0, which get removed during calibration because we work with logged data (the output will contain both sets of calibrated values). The new data, any plots produced, and a data_summary.csv file (containing geometric statistics for each .fcs file) will be saved to a new folder with the same name as the original but with "_processed" appended.
There are a few checks that should be made to reassure yourself that everything has worked.
- To confirm correct trimming, we recommend setting do_plot = TRUE so that you can see which events have been removed during processing. Check that the debris, if there is any, and the doublets have been correctly identified and removed.
- For each fluorescence calibration channel, a plot will be saved showing the bead peaks that have been identified and the calibration curve that we have fit to them. Sometimes the bead peaks aren't identified correctly. In this case, there is another argument that the function can take, beads_dens_bw, which has a default value of 0.025. To identify the beads we use something called a Gaussian kernel to smooth the fluorescence data and pick the highest points. Sometimes this smoothing isn't quite right; we might not smooth enough and one peak is identified as two, or we smooth too much and lose peaks. In the former case we need to increase beads_dens_bw and in the latter we decrease it. If tweaking this value doesn't work, we also provide a way to manually specify the peaks using the manual_peaks argument. For this data, a manually specified set of peaks would look like this:
manual_peaks = list(list(channel = "BL1-H",
peaks = c(1.9, 2.5, 2.9, 3.3, 3.7, 4.2, 4.6, 4.95)))
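And if you want to adjust the peak-finding smoothing instead, a sketch of a re-run with a larger bandwidth (the value 0.05 is purely illustrative) would be:
flopr::process_fcs_dir(dir_path = "examples/flow_cytometry/DATA/20191121",
                       pattern = "*Med*.fcs",
                       flu_channels = "BL1-H",
                       pre_cleaned = TRUE,
                       do_plot = TRUE,
                       neg_fcs = "pNeg_None_0_1.fcs",
                       calibrate = TRUE,
                       beads_dens_bw = 0.05,  # illustrative; increase if a single peak is being split into two
                       mef_peaks = list(list(channel = "BL1-H",
                                             peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558))))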