
assay metagenomics annotation template

Template for file-based Metagenomics annotations [Download]

Attribute Description Required columnType DependsOn Source Parent Valid Values
ModelSystemType Type of model system False BOOLEAN Sage Bionetworks ManifestColumn animal, cerebral organoid, immortalized cell line, iPSC, organoid, primary cell culture, Not assigned
isModelSystem Boolean flag indicating whether or not a file has data from a model system True BOOLEAN Sage Bionetworks ManifestColumn True, False, Not assigned
DNAextraction DNA extraction kit method. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
adapters Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported, in uppercase letters. Example - 'AATGATACGGCGACCACCGAGATCTACACGCT; CAAGCAGAAGACGGCATACGAGAT'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
analysisType Type of analysis False STRING Sage Bionetworks ManifestColumn ANOVA,batch effect correction,clustering,data normalization,de-novo assembly,dose response study,enrichment analysis,genome-wide association,mendelian randomization analysis,network analysis,polygenic risk scores, Not assigned
assemblyName Name/version of the assembly provided by the submitter that is used in the genome browsers and the community. Example - 'HuRef, JCVI_ISG_3_1.0'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
assemblyQual Assembly quality. The assembly quality category is based on sets of criteria outlined for each assembly quality category. High Quality - Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality - Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality - Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to, total assembly size, number of contigs, contig N50/L50, and maximum contig length. Example values - high-quality genome, medium-quality genome, low-quality genome True STRING ManifestColumn
assemblySoftware Tool(s) used for assembly, including version number and parameters. Example - 'metaSPAdes;3.11.0; kmer set 21,33,55,77,99,121, default parameters otherwise'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
associatedResource Relevant electronic resources. A related resource that is referenced, cited, or otherwise associated with the sequence. Example - 'http://www.earthmicrobiome.org/'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
consortium The name of the consortium True STRING Sage Bionetworks ManifestColumn ELITE, ELITE CDCP, Not assigned
dataFile Name of additional file(s) accompanying the data. The file(s) provide, for example, the name of any raw files generated by the instrument, generated reports from a vendor, an Rscript, or an instructions document. The files can be instrument raw files, converted peak lists such as mzML, MGF, or result files like mzIdentML. The files can be additional documentation submitted alongside the data needed for reuse and sharing purposes. Provide the file names of any additional files submitted alongside the data (ex., an ADAT file). Multiple file names can be separated by a semicolon (;). Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
dataSubtype Further qualification of dataType, which may be used to indicate the state of processing of the data, aggregation of the data, or presence of metadata True STRING Sage Bionetworks ManifestColumn raw,processed,results,normalized,metadata, Not assigned
dataType The category or format of data generated or collected in an experiment, describing the kind of information the dataset contains (for example, genomics, imaging, proteomics, or behavioral data). True STRING_LIST Sage Bionetworks ManifestColumn clinical,drugScreen,electrophysiology,epigenetics,geneExpression,genomeAssembly,genomicVariants,imaging,lipidomics,metabolomics,metagenomics,Not applicable,Not collected,Not specified,Other,phenotype,proteomics,Unknown,wearableData
databaseLibrary The name(s) of the associated database library. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
databaseName The name of the search database (nr, SwissProt or est_human, and/or mass spectral library). Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
databaseSource The name of the organization, project, or laboratory from where the database is obtained (UniProt, NCBI, EBI, other). True STRING ManifestColumn
databaseVersion The version of the database. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
databaseWeblink An internet address that may provide details of a database or software search, as well as for labs that generate their own search method (web link). Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
experimentType The type of experiment used. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
experimentalFactor Variable aspects of an experiment design that can be used to describe an experiment or set of experiments in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and/or Ontology for Biomedical Investigations (OBI). Example - 'time series design [EFO-0001779]'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
fileFormat Defined format of the data file, typically corresponding to extension, but sometimes indicating more general group of files produced by the same tool or software True STRING Sage Bionetworks ManifestColumn FASTQ, FASTA, SAM, BAM, CRAM, VCF, BCF, GTF, GFF, GFF3, BED, BigBed, WIG, BigWig, CSV, TSV, MTX, H5AD, LOOM, RDS, PED, MAP, BED_PLINK, TPED, TFAM, BEDGRAPH, NARROWPEAK, BROADPEAK, TAGALIGN, OME-TIFF, ND2, CZI, LSM, H5, HDF5, PARQUET, PDB, MMCIF, JSON, YAML, XML, TXT, XLSX, Not assigned
instrumentModel The model of the instrument used. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
libLayout Library layout. Specify whether to expect single, paired, or other configurations of reads. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
libReadsSeqd Library reads sequenced. Total number of clones sequenced from the library. Example - '20'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True INTEGER ManifestColumn
libSize Library size. Total number of clones in the library prepared for the project. Example - '50'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True INTEGER ManifestColumn
libVector Library vector. Cloning vector type(s) used in the construction of libraries. Example - 'Bacteriophage P1'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
measurementTechnique The name of the measurement technique describing the assay method. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn Bisulfite sequencing,Clinical data,Genome-wide association study,High-performance liquid chromatography tandem mass spectrometry,Liquid chromatography mass spectrometry,Liquid chromatography tandem mass spectrometry,Mass spectrometry,Metabolomics,MetaX-processed metabolomics data,Proximity extension assay,RNA sequencing,Single-cell RNA sequencing,Shotgun metagenomic sequencing,Single nucleotide polymorphism array,Tandem Mass Tag proteomics,Whole-genome sequencing,Unknown,Other,Not collected,Not applicable,Not specified
mid Multiplex identifiers. Molecular barcodes, called Multiplex Identifiers (MIDs), are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters. Example - 'GTGAATAT'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
nucleAcidExt Nucleic acid extraction. A link to a literature reference, electronic resource or a standard operating procedure (SOP) that describes the material separation to recover the nucleic acid fraction from a sample (example, https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf). Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
numberContig Number of contigs. Total number of contigs in the cleaned/submitted assembly that comprise a given genome, SAG, MAG, or UViG. Example - '40'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified False INTEGER ManifestColumn
resourceType The type of resource being stored and annotated True STRING Sage Bionetworks ManifestColumn experimentalData,metadata,tool,analysis,computationalNotebook,softwareTool,Not assigned
samplePrepProtocol An internet address that may provide more details of the protocol information (web link). Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
seqMethod Sequencing method. Sequencing method used. Example - 'Sanger, pyrosequencing, ABI-solid'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
seqPlatform Sequencing platform information. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
sop Relevant standard operating procedures. Standard operating procedures used in the assembly and/or annotation of genomes, metagenomes, or environmental sequences. Example - 'http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/'. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
sourceMatID Source material identifiers. A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer to the original material collected or any derived sub-samples. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
speciesGroup The taxonomic ranking including both species and subspecies the individual belongs to. True STRING ManifestColumn Amphibian,Bird,Fish,Invertebrate,Mammal,Not applicable,Not collected,Not specified,Reptile,Unknown
speciesName The scientific name of the species (typically a taxonomic group, ex. "Eremophila alpestris") the individual belongs to. True STRING ManifestColumn Acomys cahirinus, Acomys russatus, Accipiter cooperii, Actitis macularius, Aix sponsa, Anas acuta, Anas carolinensis, Anas platyrhynchos, Antigone canadensis, Archilochus colubris, Artibeus jamaicensis, Baeolophus bicolor, Balaena mysticetus, Blarina brevicauda, Bombycilla cedrorum, Bonasa umbellus, Bos taurus, Branta canadensis, Bubo virginianus, Buteo jamaicensis, Buteo lineatus, Canis latrans, Cardinalis cardinalis, Castor canadensis, Cavia porcellus, Charadrius vociferus, Chinchilla lanigera, Columba livia, Condylura cristata, Corvus brachyrhynchos, Cricetomys ansorgei, Cricetulus barabensis, Cricetulus griseus, Cryptomys damarensis, Cuniculus paca, Cyanocitta cristata, Cygnus olor, Dryobates pubescens, Dumetella carolinensis, Ellobius lutescens, Ellobius talpinus, Eonycteris spelaea, Eptesicus fuscus, Equus caballus, Eremophila alpestris, Fukomys damarensis, Haemorhous mexicanus, Heterocephalus glaber, Hirundo rustica, Homo sapiens, Hydrochoerus hydrochaeris, Hydroprogne caspia, Hylocichla mustelina, Icteria virens, Larus argentatus, Larus delawarensis, Macaca fascicularis, Macaca mulatta, Mareca strepera, Marmota monax, Melanerpes carolinus, Meleagris gallopavo, Melospiza melodia, Meriones unguiculatus, Mesocricetus auratus, Microtus pennsylvanicus, Mimus polyglottos, Molothrus ater, Multi-species, Mus musculus, Myocastor coypus, Myotis lucifugus, Nannospalax galili, Neosciurus carolinensis, Neotoma cinerea, Neotoma floridana, Not applicable, Not collected, Not provided, Not specified, Octodon degus, Odocoileus virginianus, Ondatra zibethicus, Other, Pan troglodytes, Papio anubis, Passer domesticus, Passerina caerulea, Passerina cyanea, Peromyscus gossypinus, Peromyscus leucopus, Peromyscus maniculatus, Phalacrocorax auritus, Phasianus colchicus, Picoides villosus, Pipilo erythrophthalmus, Poecile 
carolinensis, Quiscalus quiscula, Rattus norvegicus, Rattus rattus, Regulus calendula, Saimiri boliviensis, Sayornis phoebe, Scalopus aquaticus, Sciurus carolinensis, Sciurus niger, Sciurus vulgaris, Scolopax minor, Setophaga citrina, Setophaga coronata, Setophaga dominica, Setophaga petechia, Setophaga pinus, Sialia sialis, Sigmodon hispidus, Sitta carolinensis, Spatula clypeata, Spinus tristis, Spizella passerina, Spizella pusilla, Spizelloides arborea, Struthio camelus, Sturnus vulgaris, Sus scrofa, Sylvilagus floridanus, Tachycineta bicolor, Tamias striatus, Tamiasciurus hudsonicus, Thryothorus ludovicianus, Toxostoma rufum, Troglodytes aedon, Turdus migratorius, Tursiops truncatus, Vicugna pacos, Vireo griseus, Vireo olivaceus, Zalophus californianus, Zenaida macroura, Zonotrichia albicollis, Unknown
specimenType Type of biological material sample taken from a biological entity for research purposes True STRING_LIST ManifestColumn blood, brain, buffy coat, cell line, cells, nervous system, organoid, plasma, saliva, serum, skin, stool, tissue, urine, Not applicable, Not collected, Not specified, Other, Unknown
studyKey The short acronym for a study name in a URL-friendly format (ex. LLFS_Metabolomics OR ELPSCRNA) True STRING Sage Bionetworks ManifestColumn
targetGene Targeted gene or locus name for marker gene studies. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
technologyPlatformVersion The specific version (application, manufacturer, model, lab, etc.) of a technology that is used to carry out a laboratory or computational experiment. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING ManifestColumn
Filename False STRING
metadataType For files of dataSubtype: metadata, a description of the type of metadata in the file False STRING Sage Bionetworks ManifestColumn individual,biospecimen,assay,supplementary files,Not Assigned
project The ELITE project short name associated with the tool True STRING Sage Bionetworks ILO BU, ILO TGEN, LC, LG, LLFS, NECS APOE, Not assigned
organ Indicate the organ the specimen is from. An organ is a unique macroscopic (gross) anatomic structure that performs specific functions. It is composed of various tissues. True STRING sage.annotations-experimentalData.organ-0.0.4 ManifestColumn blood, bone marrow, brain, breast, Bursa Of Fabricius, cerebrospinal fluid, colon, kidney, large intestine, liver, lung, lymph node, mammary gland, nerves, nose, ovary, pancreas, prostate, skin, spleen, Not collected, Not specified, Not applicable, Other, Unknown, plasma, gonadal fat, inguinal fat, gastrocnemius muscle
tissue Indicate the tissue the specimen is from. A tissue is a multicellular anatomical structure that consists of many cells of one or a few types arranged in an extracellular matrix. True STRING_LIST sage.annotations-experimentalData.tissue-0.0.11 ManifestColumn amygdala, amygdaloid complex, anterior cingulate cortex, angular gyrus, blood, bone marrow, Buccal Mucosa, Buffy Coat, caudate nucleus, cecum derived fecal material, cerebellar cortex, cerebellum, cerebral cortex, cortical plate, dorsal anterior cingulate cortex, dorsal pallium, Dorsal Root Ganglion, dorsolateral prefrontal cortex, dorsomedial prefrontal cortex, embryonic tissue, entorhinal cortex, fecal material, forebrain, frontal cortex, frontal lobe, frontal pole, fusiform gyrus, hippocampus, head of caudate nucleus, inferior frontal gyrus, inferior temporal cortex, inferior temporal gyrus, inferolateral temporal cortex, insula, insular cortex, lateral entorhinal cortex, left cerebral hemisphere, liver, mammillary body, medial dorsal nucleus of thalamus, medial entorhinal cortex, medial frontal cortex, medial ganglionic eminence, medial orbital frontal cortex, medial prefrontal cortex, meninges, midbrain, middle frontal gyrus, middle temporal gyrus, nerve tissue, Not Applicable, nucleus accumbens, occipital lobe, occipital visual cortex, olfactory neuroepithelium, orbitofrontal cortex, parahippocampal gyrus, parietal cortex, parietal lobe, plasma, posterior cingulate cortex, posteroinferior parietal cortex, posterior inferior parietal cortex, posterior superior temporal cortex, precentral gyrus, prefrontal cortex, primary auditory cortex, primary motor cortex, primary somatosensory cortex, primary tumor, primary visual cortex, putamen, right cerebral hemisphere, serum, splenocyte, striatum, subgenual anterior cingulate cortex, subgenual cingulate cortex, superior parietal lobe, superior temporal gyrus, temporal cortex, temporal lobe, temporal pole, thalamus, unspecified, ventricular zone, 
ventrolateral prefrontal cortex, VZ/SVZ, whole brain, gonadal fat, inguinal fat, kidney, plasma, liver, gastrocnemius muscle
familyStudyParticipant Indicates whether or not a file has data from a human participant involved in a family study (ex. LLFS) False STRING sage.annotations-demographics.ethnicityfamilyStudyParticipant-0.0.2 ManifestColumn Yes, No, Not Assigned
assay The analysis or technology used to generate the data in this file True STRING_LIST sage.annotations-experimentalData.assay-0.0.26 ManifestColumn 10x multiome, 16SrRNAseq, active avoidance learning behavior, anxiety-related behavior, ATACSeq, atomicForceMicroscopy, autoradiography, Baker Lipidomics, Biocrates Bile Acids, Biocrates p180, Biocrates Q500, bisulfiteSeq, Blood Chemistry Measurement, brightfieldMicroscopy, cellViabilityAssay, ChIPSeq, CITESeq, contextual conditioning behavior, CUT&Tag, DIA, DNA optical mapping, electrochemiluminescence, elevated plus maze test, elevated T maze apparatus method, ELISA, errBisulfiteSeq, exomeSeq, FIA-MSMS, FitBark, frailty assessment, Genotyping, HI-C, HiChIPseq, high content screen, HPLC, HPLC-MSMS, Immunocytochemistry, immunofluorescence, immunohistochemistry, in vivo bioluminescence, ISOSeq, jumpingLibrary, kinesthetic behavior, label free mass spectrometry, Laser Speckle Imaging, LC-MS, LC-MSMS, LC-SRM, Leiden Oxylipins, lentiMPRA, LFP, liquid chromatography-electrochemical detection, lncrnaSeq, locomotor activation behavior, long-read rnaSeq, LTP, MDMS-SL, memory behavior, Metabolon, methylationArray, MIB/MS, microRNAcounts, mirnaArray, mirnaSeq, MRI, mRNAcounts, MudPIT, m6A-rnaSeq, nextGenerationTargetedSequencing, Nightingale NMR, NOMe-Seq, novelty response behavior, open field test, oxBS-Seq, pharmacodynamics, pharmacokinetics, photograph, polymeraseChainReaction, Positron Emission Tomography, proximity extension assay, questionnaire, Rader Lipidomics, Real Time PCR, Ribo-Seq, rotarod performance test, rnaArray, rnaSeq, RPPA, sandwich ELISA, Sanger sequencing, scale, scATACSeq, scCGIseq, scirnaSeq, scrnaSeq, scwholeGenomeSeq, SiMoA, snpArray, snATACSeq, snrnaSeq, spontaneous alternation, STARRSeq, TMT quantitation, tractionForceMicroscopy, UPLC-MSMS, UPLC-ESI-QTOF-MS, UC Davis GCTOF, UCSD Untargeted Metabolomics, Vernier Caliper, von Frey test, westernBlot, wheel running, whole-cell patch clamp, 
wholeGenomeSeq, Wishart Catecholamines, Wishart High Value Metabolites, Zeno Electronic Walkway, Not collected, Not specified, Not applicable, Other, Unknown
platformLocation The name of the laboratory, facility, vendor, company, or location where the data generation platform was located, provided by the data contributor. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING sage.annotations-experimentalData.platformLocation-0.0.2 ManifestColumn
isMultiSpecimen Boolean flag indicating whether or not a file has data for multiple specimens True STRING Sage Bionetworks ManifestColumn true,false,Not assigned
libraryPrep The general strategy by which the library was prepared. Provide a value OR provide one of these values - Unknown, Not collected, Not applicable, Not specified True STRING sage.annotations-ngs.libraryPrep-0.0.13 ManifestColumn amplicon, cellHashing, Chromium Single Cell 3', DNALibraryConstruction, EndItDNAEndRepairKit, KapaHyperPrep, lncRNAenrichment, methylSeq, miRNAenrichment, multiome, MULTIseq, PCRfree, polyAselection, proximity ligation, rRNAdepletion, snIsoSeq, SPLITseq, STARRSeq, SureCell, totalRNA, Unknown, Not collected, Not applicable, Not specified
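
The rules in the table above (Required, columnType, Valid Values, and the "Provide a value OR provide one of these values" fallback) can be checked before a manifest is submitted. The following is a minimal sketch in Python, not an official validator: the names PLACEHOLDERS, RULES, and validate_row are hypothetical, a manifest row is assumed to be a plain dict of strings, and only four attributes from the table are covered for illustration.

```python
# Sketch only: PLACEHOLDERS, RULES, and validate_row are hypothetical names
# made up for this example; they are not part of any Sage Bionetworks tool.

# Fallback values accepted by attributes whose description says
# "Provide a value OR provide one of these values".
PLACEHOLDERS = {"Unknown", "Not collected", "Not applicable", "Not specified"}

# A small subset of the template, copied from the table above.
RULES = {
    "libSize": {"required": True, "type": "INTEGER"},
    "numberContig": {"required": False, "type": "INTEGER"},
    "adapters": {"required": True, "type": "STRING"},
    "dataSubtype": {
        "required": True,
        "type": "STRING",
        "valid": {"raw", "processed", "results", "normalized",
                  "metadata", "Not assigned"},
    },
}


def validate_row(row):
    """Return a list of human-readable problems found in one manifest row."""
    problems = []
    for name, rule in RULES.items():
        value = (row.get(name) or "").strip()
        if not value:
            # Empty cells are only a problem for required attributes.
            if rule["required"]:
                problems.append(f"{name}: required value is missing")
            continue
        if "valid" in rule:
            # Controlled vocabulary: the value must match one listed entry.
            if value not in rule["valid"]:
                problems.append(f"{name}: '{value}' is not a valid value")
            continue
        if value in PLACEHOLDERS:
            # A fallback value is accepted in place of a real one.
            continue
        if rule["type"] == "INTEGER" and not value.isdecimal():
            problems.append(f"{name}: '{value}' is not an integer")
        if name == "adapters" and value != value.upper():
            problems.append(f"{name}: sequences should be uppercase")
    return problems


if __name__ == "__main__":
    example = {"libSize": "50", "adapters": "aatgat; CAAG", "dataSubtype": "raw"}
    for problem in validate_row(example):
        print(problem)
```

Extending the sketch to the full template would mean adding one RULES entry per row of the table. The fallback check is applied only to attributes without a Valid Values list, on the assumption that a controlled vocabulary already enumerates every accepted entry.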