Command lines
General
FROGS, MOTHUR, USEARCH and QIIME were run on all the datasets by the script <FROGS_dir>/assessment/bin/assessment.py.
The script is available at: github.com/geraldinepascal/FROGS/tree/master/assessment/bin
It is called with the following command:
<FROGS_dir>/assessment/bin/assessment.py \
--nb-cpus 1 \
--datasets-directory /save/frogs/assessment_datasets/datasets_utax \
--frogs-directory <FROGS_dir> \
--affiliation-databank-fasta /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa \
--affiliation-databank-tax /save/frogs/assessment_datasets/databank/uparse/refdb_clean.tax \
--affiliation-databank-udb /save/frogs/assessment_datasets/databank/uparse/refdb.udb \
--mothur-databank /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.fasta \
--mothur-taxonomy /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.rdp6.tax \
--output-directory /work/frogs/assessment_datasets_utax/last/results
FROGS
Version: 1.4.0
Protocol: guidelines as of September 2016
Example command lines:
preprocess.py illumina \
--nb-cpus 1 \
--min-amplicon-size 350 --max-amplicon-size 550 \
--without-primers --already-contiged \
--input-R1 reads/sample01-20sp-Powerlaw.fastq reads/sample02-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq reads/sample10-20sp-Powerlaw.fastq \
--output-dereplicated frogs/prepro.fasta \
--output-count frogs/prepro.tsv \
--summary frogs/prepro_summary.html \
--log-file frogs/prepro_log.txt
clustering.py \
--nb-cpus 1 \
--input-fasta frogs/prepro.fasta \
--input-count frogs/prepro.tsv \
--output-biom frogs/clustering.biom \
--output-fasta frogs/clustering.fasta \
--output-compo frogs/clustering_compo.tsv \
--log-file frogs/clustering_log.txt \
--distance 3 --denoising
remove_chimera.py \
--nb-cpus 1 \
--input-fasta frogs/clustering.fasta \
--input-biom frogs/clustering.biom \
--non-chimera frogs/removeChimera.fasta \
--out-abundance frogs/removeChimera.biom \
--summary frogs/removeChimera_summary.html \
--log-file frogs/removeChimera_log.txt
filters.py \
--input-biom frogs/removeChimera.biom \
--input-fasta frogs/removeChimera.fasta \
--output-fasta frogs/frogs.fasta \
--output-biom frogs/filters.biom \
--excluded frogs/filters_excluded.txt \
--summary frogs/filters_summary.html \
--log-file frogs/filters_log.txt \
--min-abundance 0.00005
affiliation_OTU.py \
--nb-cpus 1 \
--reference /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa \
--input-fasta frogs/frogs.fasta \
--input-biom frogs/filters.biom \
--output-biom frogs/frogs.biom \
--summary frogs/affiliationOTU_summary.html \
--log-file frogs/affiliationOTU_log.txt \
--java-mem 20
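The value `--min-abundance 0.00005` given to filters.py is read here as a proportion of the total number of sequences rather than as a raw count. As an illustration only, the sketch below applies such a threshold to a toy abundance table. The function name, the toy counts and the choice to keep an OTU when its share is greater than or equal to the threshold are assumptions, not code taken from FROGS.

```python
def filter_by_min_abundance(otu_counts, min_proportion=0.00005):
    """Keep OTUs whose share of all sequences reaches min_proportion.

    otu_counts maps an OTU name to its number of sequences.
    The ">=" boundary is an assumption made for this sketch.
    """
    total = sum(otu_counts.values())
    if total == 0:
        return {}
    return {
        otu: count
        for otu, count in otu_counts.items()
        if count / total >= min_proportion
    }


# Toy table with 1,000,000 sequences in total (hypothetical values).
counts = {
    "Cluster_1": 600000,
    "Cluster_2": 399960,
    "Cluster_3": 30,   # 0.00003 of the total: below the threshold
    "Cluster_4": 10,   # 0.00001 of the total: below the threshold
}
kept = filter_by_min_abundance(counts)
print(sorted(kept))
```

With one million sequences, a proportion of 0.00005 corresponds to 50 sequences, so only the two large clusters remain in this toy example.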
USEARCH
Version: v8.1.1861_i86linux32
Protocol: guidelines as of September 2016 from http://drive5.com/usearch/manual/uparse_pipeline.html
Example command lines:
usearch -fastq_filter reads/sample02-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_1.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample03-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_2.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample04-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_3.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample05-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_4.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample06-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_5.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample07-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_6.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample08-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_7.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample09-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_8.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample10-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_9.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample01-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_10.fastq -threads 1 -relabel @
cat uparse/uparse_tmp_spl_1.fastq uparse/uparse_tmp_spl_2.fastq uparse/uparse_tmp_spl_3.fastq uparse/uparse_tmp_spl_4.fastq uparse/uparse_tmp_spl_5.fastq uparse/uparse_tmp_spl_6.fastq uparse/uparse_tmp_spl_7.fastq uparse/uparse_tmp_spl_8.fastq uparse/uparse_tmp_spl_9.fastq uparse/uparse_tmp_spl_10.fastq > uparse/uparse_tmp_merged.fastq
usearch \
-fastq_filter uparse/uparse_tmp_merged.fastq \
-fastaout uparse/uparse_tmp_filtered.fasta \
-threads 1 \
-fastq_maxee 1.0 -fastq_maxns 0
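In the command above, `-fastq_maxee 1.0` discards reads whose expected number of errors exceeds 1.0 and `-fastq_maxns 0` discards reads containing any N. The expected number of errors is the sum of the per-base error probabilities given by the Phred scores. A minimal sketch of that criterion, assuming Phred+33 quality strings (the function names are illustrative and this is not USEARCH code):

```python
def expected_errors(quality, phred_offset=33):
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-(ord(char) - phred_offset) / 10) for char in quality)


def passes_filter(sequence, quality, max_ee=1.0, max_ns=0):
    """True when the read has at most max_ns N and at most max_ee expected errors."""
    n_count = sequence.upper().count("N")
    return n_count <= max_ns and expected_errors(quality) <= max_ee


# "I" encodes Q40 (P = 0.0001) and "!" encodes Q0 (P = 1.0) in Phred+33.
print(passes_filter("ACGT", "IIII"))  # high quality, no N
print(passes_filter("ACGN", "IIII"))  # contains an N
print(passes_filter("ACGT", "!!II"))  # two Q0 bases: about 2 expected errors
```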
usearch \
-derep_fulllength uparse/uparse_tmp_filtered.fasta \
-fastaout uparse/uparse_tmp_uniques.fasta \
-threads 1 \
-sizeout
usearch \
-sortbysize uparse/uparse_tmp_uniques.fasta \
-fastaout uparse/uparse_tmp_sorted \
-minsize 2
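The two preceding steps collapse identical sequences while recording their abundance (`-derep_fulllength` with `-sizeout`), then order the unique sequences by decreasing abundance and drop singletons (`-sortbysize` with `-minsize 2`). The sketch below mimics that logic on a toy list of reads. The `Uniq` labels are hypothetical, and the `;size=N;` annotation is assumed to follow the USEARCH convention.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, abundance) pairs."""
    return list(Counter(sequences).items())


def sort_by_size(uniques, min_size=2):
    """Drop sequences rarer than min_size, then sort by decreasing abundance."""
    kept = [(seq, size) for seq, size in uniques if size >= min_size]
    return sorted(kept, key=lambda item: -item[1])


def to_fasta(uniques, prefix="Uniq"):
    """Write the pairs as FASTA text with a size annotation in each header."""
    lines = []
    for index, (seq, size) in enumerate(uniques, start=1):
        lines.append(f">{prefix}{index};size={size};")
        lines.append(seq)
    return "\n".join(lines)


reads = ["TTGA", "ACGT", "ACGT", "GGCC", "TTGA", "ACGT"]
print(to_fasta(sort_by_size(dereplicate(reads))))
```

In this toy input `GGCC` occurs once, so it is removed before clustering, as singletons are with `-minsize 2`.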
usearch \
-cluster_otus uparse/uparse_tmp_sorted \
-otus uparse/uparse_tmp_seeds.fasta \
-uparseout uparse/uparse_tmp_clusters.txt \
-relabel Cluster_ \
-sizein -sizeout
usearch \
-utax uparse/uparse_tmp_seeds.fasta \
-db /save/frogs/assessment_datasets/databank/uparse/refdb.udb \
-fastaout uparse/uparse_tmp_affiliation.fasta \
-strand both \
-threads 1
usearch \
-usearch_global uparse/uparse_tmp_merged.fastq \
-db uparse/uparse_tmp_seeds.fasta \
-biomout uparse/uparse_tmp_woAffi.biom \
-strand both -id 0.97 \
-threads 1
addUtaxFromFasta.py \
--input-fasta uparse/uparse_tmp_affiliation.fasta \
--input-biom uparse/uparse_tmp_woAffi.biom \
--output-biom uparse/uparse.biom \
--taxonomy-tag taxonomy
addSeedsRef.py \
--seeds-fasta uparse/uparse_tmp_seeds.fasta \
--reads reads/sample02-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq reads/sample10-20sp-Powerlaw.fastq reads/sample01-20sp-Powerlaw.fastq \
--annotated-fasta uparse/uparse.fasta
Mothur
Version: v.1.33.1
Protocol: guidelines as of September 2016 from http://www.mothur.org/wiki/MiSeq_SOP
Example command lines:
rvc.py --input reads/sample10-20sp-Powerlaw.fastq --output reads/sample10-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample06-20sp-Powerlaw.fastq --output reads/sample06-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample03-20sp-Powerlaw.fastq --output reads/sample03-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample05-20sp-Powerlaw.fastq --output reads/sample05-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample01-20sp-Powerlaw.fastq --output reads/sample01-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample08-20sp-Powerlaw.fastq --output reads/sample08-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample02-20sp-Powerlaw.fastq --output reads/sample02-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample07-20sp-Powerlaw.fastq --output reads/sample07-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample04-20sp-Powerlaw.fastq --output reads/sample04-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample09-20sp-Powerlaw.fastq --output reads/sample09-20sp-Powerlaw.fastq.RC
mothur "#make.contigs(file=stability.file, processors=1)"
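The reads are already contiged, so `rvc.py` presumably writes the reverse complement of each read to give `make.contigs` a forward/reverse pair per sample. Neither `rvc.py` nor `stability.file` is reproduced in this document, so the sketch below is an assumption about both. It reverse-complements one FASTQ record and builds tab-separated rows in the three-column layout (group, forward file, reverse file) that mothur accepts for `make.contigs(file=...)`.

```python
# Complement table for unambiguous bases and N (other IUPAC codes are left as-is).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, sequence, quality):
    """Reverse-complement the bases and reverse the quality string."""
    return header, sequence.translate(_COMPLEMENT)[::-1], quality[::-1]


def stability_rows(sample_numbers, suffix="-20sp-Powerlaw.fastq"):
    """One row per sample: group name, forward reads, reverse-complemented reads.

    The group names and the pairing with the .RC files are assumptions.
    """
    rows = []
    for number in sample_numbers:
        forward = f"reads/sample{number:02d}{suffix}"
        rows.append(f"sample{number:02d}\t{forward}\t{forward}.RC")
    return rows


print(reverse_complement_record("@read1", "AACGT", "IIH#!"))
print("\n".join(stability_rows(range(1, 11))))
```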
mothur "#screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, maxn=0, minlength=350, maxlength=550, processors=1)"
mothur "#unique.seqs(fasta=stability.trim.contigs.good.fasta)"
mothur "#count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)"
cp /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.fasta restriction_db.fasta
mothur "#pcr.seqs(fasta=restriction_db.fasta, keepprimer=T, start=6000, end=27000, keepdots=F, processors=1)"
mothur "#align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=restriction_db.pcr.fasta, processors=1)"
mothur "#summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, processors=1)"
mothur "#screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=428, end=16451, processors=1)"
mothur "#filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=., processors=1)"
mothur "#unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)"
mothur "#pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)"
mothur "#chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=1)"
mothur "#remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)"
cp /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.rdp6.tax restriction_db.tax
mothur "#classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=restriction_db.fasta, taxonomy=restriction_db.tax, cutoff=80)"
mothur "#remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.restriction_db.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)"
mothur "#dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=1)"
mothur "#cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)"
cp /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa affiliation_db.fasta
cp /save/frogs/assessment_datasets/databank/uparse/refdb_clean.tax affiliation_db.tax
mothur "#classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=affiliation_db.fasta, taxonomy=affiliation_db.tax, cutoff=0)"
mothur "#classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.affiliation_db.wang.taxonomy, label=0.03)"
mothur "#make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03)"
mothur "#make.biom(shared=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared, constaxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy)"
ln -sf stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.biom mothur/mothur.biom
mothur "#get.oturep(method=abundance, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.03)"
mothurDeGapSeeds.py \
--input stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.rep.fasta \
--output stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.rep.degap.fasta
mothurAddSeedRef.py \
--input stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.rep.degap.fasta \
--reads reads/sample10-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample01-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample02-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq \
--trimmed-reads stability.trim.contigs.good.unique.good.filter.fasta \
--output mothur/mothur.fasta
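The representative sequences returned by `get.oturep` come from the filtered alignment, so they may still contain gap characters. `mothurDeGapSeeds.py` appears to remove them before the seeds are matched against the original reads. The script is not shown here, so the following is only a sketch of such a step, assuming `-` and `.` are the gap symbols and that sequence lines never start with `>`.

```python
def degap_fasta(lines):
    """Remove alignment gap characters from the sequence lines of a FASTA file."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are kept unchanged.
            cleaned.append(line)
        else:
            cleaned.append(line.replace("-", "").replace(".", ""))
    return cleaned


aligned = [">Otu001", "..AC-G--T..", ">Otu002", ".-GGA-TC-.."]
print("\n".join(degap_fasta(aligned)))
```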
QIIME
Version: v.1.9.0
Protocol: guidelines as of September 2016 from http://qiime.org/tutorials/otu_picking.html and from Rideout et al. (2014), with de novo chimera removal.
Example command lines:
qiime; split_libraries_fastq.py \
-i reads/sample01-20sp-Powerlaw.fastq,reads/sample02-20sp-Powerlaw.fastq,reads/sample03-20sp-Powerlaw.fastq,reads/sample04-20sp-Powerlaw.fastq,reads/sample05-20sp-Powerlaw.fastq,reads/sample06-20sp-Powerlaw.fastq,reads/sample07-20sp-Powerlaw.fastq,reads/sample08-20sp-Powerlaw.fastq,reads/sample09-20sp-Powerlaw.fastq,reads/sample10-20sp-Powerlaw.fastq \
--sample_ids sample01,sample02,sample03,sample04,sample05,sample06,sample07,sample08,sample09,sample10 \
-o qiime/qiime_workdir/qiime_preprocess \
--barcode_type 'not-barcoded' \
--phred_offset 33
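`split_libraries_fastq.py` takes comma-separated lists in which the i-th entry of `--sample_ids` names the i-th file given to `-i`. In the command above, each sample identifier is the file name truncated at the first hyphen. A small sketch of how both arguments could be derived from the read paths (the helper name is hypothetical and is not part of the assessment scripts):

```python
import os


def qiime_inputs(read_paths):
    """Return the comma-separated values for -i and --sample_ids."""
    # "reads/sample01-20sp-Powerlaw.fastq" -> "sample01"
    sample_ids = [os.path.basename(path).split("-")[0] for path in read_paths]
    return ",".join(read_paths), ",".join(sample_ids)


paths = [f"reads/sample{number:02d}-20sp-Powerlaw.fastq" for number in range(1, 11)]
input_arg, sample_ids_arg = qiime_inputs(paths)
print(sample_ids_arg)
```

Keeping both lists in one helper makes it harder for the file order and the identifier order to drift apart.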
qiime; identify_chimeric_seqs.py \
-i qiime/qiime_workdir/qiime_preprocess/seqs.fna \
-m usearch61 \
--suppress_usearch61_ref \
-o qiime/qiime_workdir/usearch61_chimeras
qiime; filter_fasta.py \
-f qiime/qiime_workdir/qiime_preprocess/seqs.fna \
-o qiime/qiime_workdir/usearch61_chimeras/seqs_chimeras_filtered.fna \
-s qiime/qiime_workdir/usearch61_chimeras/chimeras.txt \
-n
qiime; pick_open_reference_otus.py \
-i qiime/qiime_workdir/usearch61_chimeras/seqs_chimeras_filtered.fna \
-o qiime/qiime_workdir/pick_open_reference_otus \
-r /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa \
--suppress_align_and_tree \
--suppress_taxonomy_assignment
qiime; assign_taxonomy.py \
-o qiime/qiime_workdir/uclust_assigned_taxonomy \
-i qiime/qiime_workdir/pick_open_reference_otus/rep_set.fna \
-t /save/frogs/assessment_datasets/databank/uparse/refdb_clean.tax \
-r /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa
qiime; completTax.py \
-i qiime/qiime_workdir/uclust_assigned_taxonomy/rep_set_tax_assignments.txt \
-o qiime/qiime_workdir/uclust_assigned_taxonomy/rep_set_completeTax_assignments.txt
biom add-metadata \
-i qiime/qiime_workdir/pick_open_reference_otus/otu_table_mc2.biom \
--observation-metadata-fp qiime/qiime_workdir/uclust_assigned_taxonomy/rep_set_completeTax_assignments.txt \
-o qiime/qiime_workdir/otu_table_mc2_w_tax.biom \
--sc-separated taxonomy \
--observation-header OTUID,taxonomy
biom convert \
-i qiime/qiime_workdir/otu_table_mc2_w_tax.biom \
-o qiime/qiime.biom \
--table-type="OTU table" \
--to-json
addSeedsRef.py \
-s qiime/qiime_workdir/pick_open_reference_otus/rep_set.fna \
-r reads/sample01-20sp-Powerlaw.fastq reads/sample02-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq reads/sample10-20sp-Powerlaw.fastq \
-a qiime/qiime.fasta
Execution time
For this evaluation, the FROGS pipeline was run on an Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz.