Is there a FROGS guideline or standard procedure?
FROGS' design is highly modular, so users can choose their tools and processing order. However, default values are advised where possible, and a standard amplicon analysis should follow these steps:
- Pre-processing: depending on your data, merge your paired-end reads or leave them unmerged. Fill in the size parameters and primer fields according to the amplicon you are studying.
- Clustering: thanks to Swarm's capabilities, clustering can be performed early in the process. It should be performed with an aggregation distance of 3 and with a denoising step.
- Removing chimeras: chimeras are PCR artifacts and should be removed at this step, using the clusters produced by Swarm.
- Filtering OTUs: a 0.005% abundance threshold should be applied to remove the remaining noisy clusters and obtain your OTUs. If your experimental design contains replicates, you should also filter out clusters that are not present in at least two, three or more samples (depending on your design).
- Taxonomic affiliation: this step should be executed at the end of the process because it is the most time-consuming one. The default is to produce BLAST affiliations and multi-affiliations, but RDP affiliation can be added.
- Visualization (optional): use “Cluster stat” after steps 2, 3 and 4 (clustering, chimera removal and OTU filtering) for supplementary figures and statistics about your clusters (numbers, distributions, etc.). Use “Affiliations stat” after step 5 (taxonomic affiliation) for supplementary figures and statistics about your affiliations.
- Tree construction (optional): use it after step 4 (OTU filtering) if you want a phylogenetic tree of your OTUs.
- Export functions (optional): use “BIOM to TSV” if you want an abundance table in tabular format. Use “BIOM to standard BIOM” if you need a BIOM file for your statistical analyses.
What data are processed by FROGS?
FROGS is designed for the study of microbial communities from amplicon sequencing. The amplified region is chosen to be as distinctive as possible within the community you are interested in. For example, researchers favour a region of the 16S ribosomal RNA gene when studying the bacterial composition of an environment.
However, FROGS can be used on any amplicon as long as the area of interest respects the constraints mentioned below:
- FROGS works on 16S, 18S and 23S ribosomal RNA, but also on amplicons from functional genes such as dsrB. The only limit is the reference database provided to the software.
- FROGS has been designed to manage data from Illumina sequencers (MiniSeq, MiSeq, NextSeq, HiSeq, ...). It can, however, accept 454 data if it is in FASTQ format. If your data is in SFF format, you can use the SFF converter software to produce FASTQ data (don't forget to use the option that removes 454 adapters and low-quality sequences).
- The diagram below shows the main sequencing modes supported or not supported by FROGS:
In the standard protocol, the target DNA must be completely covered by the reads: either a single-end read that starts at the 5’ primer and finishes at the end of the 3’ primer or, in the paired-end case, forward and reverse reads that overlap.
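The overlap requirement for paired-end reads can be checked with simple arithmetic before starting an analysis. The sketch below is only an illustration: the function names and the `min_overlap` default of 20 bases are assumptions made for this example, not FROGS parameters, and the amplicon length is taken to include both primers.

```python
def expected_overlap(read1_len: int, read2_len: int, amplicon_len: int) -> int:
    """Number of bases covered by both reads; a negative value means a gap."""
    return read1_len + read2_len - amplicon_len


def reads_overlap(read1_len: int, read2_len: int, amplicon_len: int,
                  min_overlap: int = 20) -> bool:
    """True if the two reads share at least `min_overlap` bases.

    `min_overlap` is a hypothetical safety margin chosen for this sketch.
    """
    return expected_overlap(read1_len, read2_len, amplicon_len) >= min_overlap


# 2 x 250 bp reads on a 420 bp amplicon: 250 + 250 - 420 = 80 shared bases.
print(expected_overlap(250, 250, 420))  # 80
print(reads_overlap(250, 250, 420))     # True

# 2 x 150 bp reads on the same amplicon leave a 120 bp gap in the middle.
print(expected_overlap(150, 150, 420))  # -120
print(reads_overlap(150, 150, 420))     # False
```

In practice the usable overlap is usually smaller than this figure, because read quality drops towards the 3’ end of each read.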
Which filters should I use in FROGS-Filters?
FROGS offers various filters to meet users' needs but, depending on your data and scientific question, only some of them should be used. The most commonly used filters are the OTU filters based on samples and abundances.
- The “minimum number of samples” parameter should be used when processing datasets with replicated or repeated samples. Unless you are particularly interested in OTUs that are not shared between your replicates or repetitions, this parameter allows you to remove all OTUs that are not detected in at least X samples (X corresponding to your level of replication/repetition).
- The “minimum proportion/number” parameter should always be used unless you are focusing on extremely rare OTUs and don’t care about false-positive OTUs. To ensure a good community description, a 0.005% abundance filter should be applied to your data.
- The “N biggest OTU” parameter is available if you have a reason to focus only on the biggest OTUs and want to remove the smallest ones. It should be used only with well-known samples or when you are interested in specific major OTUs.
- The taxonomic filters should be used only in specific situations: with well-known communities when you are not interested in poorly characterized taxa, or when you are interested only in a few chosen and known taxa. They allow you to filter out OTUs with poor taxonomic affiliations, based on thresholds applied to RDP bootstraps or to BLAST e-value, identity or coverage.
- Finally, the contamination filter allows you to remove known contaminants, using a user-provided list. Remaining phiX sequences, which are technical artifacts introduced during Illumina sequencing, can also be removed with this filter.
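To make the two most commonly used filters concrete, here is a minimal sketch of an abundance filter combined with a minimum-number-of-samples filter. It is not the FROGS implementation: the function name, the table layout (a dict of per-sample counts), the example counts and the choice of an inclusive `>=` comparison are all assumptions made for illustration.

```python
def filter_otus(counts, min_proportion=0.00005, min_samples=1):
    """Keep OTUs that are abundant enough and seen in enough samples.

    counts: dict mapping an OTU name to its list of per-sample read counts.
    min_proportion: minimum share of all reads (0.00005 is 0.005%).
    min_samples: minimum number of samples in which the OTU must be detected.
    """
    total_reads = sum(sum(per_sample) for per_sample in counts.values())
    kept = {}
    for otu, per_sample in counts.items():
        abundance = sum(per_sample)
        n_present = sum(1 for c in per_sample if c > 0)
        if abundance / total_reads >= min_proportion and n_present >= min_samples:
            kept[otu] = per_sample
    return kept


# Hypothetical table: 3 samples, 1,000,000 reads in total,
# so the 0.005% threshold corresponds to 50 reads.
table = {
    "OTU_1": [500000, 300000, 199000],
    "OTU_2": [500, 0, 460],
    "OTU_3": [30, 0, 0],
    "OTU_4": [0, 10, 0],
}

# OTU_3 (30 reads) and OTU_4 (10 reads) fall below the abundance threshold.
print(sorted(filter_otus(table, min_samples=2)))  # ['OTU_1', 'OTU_2']

# Requiring presence in all 3 replicates also removes OTU_2.
print(sorted(filter_otus(table, min_samples=3)))  # ['OTU_1']
```

With one million reads, 0.005% corresponds to 50 reads, which shows how low this threshold is for typical sequencing depths.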
How to fill the 5’ and 3’ primers fields in FROGS-Preprocess tool?
The most frequent error encountered by new FROGS users is filling in the 5’ and 3’ primer fields incorrectly. Make sure that you follow the instructions available at the bottom of the tool: primers should be provided as they are read on a 5’->3’ sequence. Generally, that means that your 3’ primer should be reverse-complemented.
Example: 5' ATGCCC GTCGTCGTAAAATGC ATTTCAG 3'
Value for parameter 5' primer: ATGCCC
Value for parameter 3' primer: ATTTCAG
Degenerate nucleotides are accepted.
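If your reverse primer is written as it was ordered (5’->3’ on the opposite strand), it has to be reverse-complemented before being entered as the 3’ primer. The sketch below shows one way to do this in Python, including IUPAC degenerate codes. The function name and the input primers are hypothetical and chosen to match the example above; this is not part of FROGS.

```python
# Complement of each IUPAC nucleotide code (degenerate codes included).
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVN",
    "TGCAAYRSWMKVHDBN",
)


def reverse_complement(primer: str) -> str:
    """Return the reverse complement of a primer, keeping degenerate codes."""
    return primer.upper().translate(IUPAC_COMPLEMENT)[::-1]


# Hypothetical reverse primer as ordered; its reverse complement is the
# value used for the 3' primer in the example above.
print(reverse_complement("CTGAAAT"))  # ATTTCAG

# A degenerate position (R = A or G) becomes its complement (Y = C or T).
print(reverse_complement("CTGRAAT"))  # ATTYCAG
```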