Data Sources for Bioinformatics
This page lists established databases and archives for sequence, structure, expression, and pathway analysis. Each entry gives a URL and common use cases; citation guidance appears under Citation tips.
Core sequence and genome resources
- NCBI (GenBank, RefSeq, GEO, SRA)
  - https://www.ncbi.nlm.nih.gov/
  - Use for: sequence records (GenBank), curated reference sequences (RefSeq), gene expression data (GEO), and raw reads (SRA).
  - Cite: follow the accession page for the dataset; many provide a preferred citation.
- ENA (European Nucleotide Archive)
  - https://www.ebi.ac.uk/ena/
  - Use for: raw sequencing data and assembled sequences.
- Ensembl
  - https://www.ensembl.org/
  - Use for: genome browsing, gene annotations, and comparative genomics.
- UCSC Genome Browser
  - https://genome.ucsc.edu/
  - Use for: visualization, genome tracks, and custom annotations.
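As a minimal sketch of retrieving a sequence record by accession, the Python below builds an NCBI E-utilities `efetch` URL and parses FASTA text. The helper names (`efetch_fasta_url`, `parse_fasta`) and the example accession are illustrative assumptions, not part of this page; check the current E-utilities documentation before relying on the parameters.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that asks for one record as plain-text FASTA.

    `db` is the Entrez database name, e.g. "nuccore" for nucleotide
    records or "protein" for protein records.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Example accession chosen for illustration (a RefSeq mRNA record).
url = efetch_fasta_url("NM_000546.6")
print(url)
```

The URL can be opened with `urllib.request.urlopen` and the response text passed to `parse_fasta`. NCBI asks clients to limit their request rate, so batch downloads should pause between calls.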
Protein, structure, and function
- UniProt
  - https://www.uniprot.org/
  - Use for: protein sequences, functional annotation, and cross-references to other databases.
- PDB (Protein Data Bank)
  - https://www.rcsb.org/
  - Use for: experimentally determined 3D structures of proteins and nucleic acids.
- AlphaFold DB
  - https://alphafold.ebi.ac.uk/
  - Use for: predicted protein structures.
- InterPro
  - https://www.ebi.ac.uk/interpro/
  - Use for: protein domains, families, and functional analysis.
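The protein resources above are commonly queried by accession. The sketch below builds download URLs for a UniProt FASTA record, an RCSB PDB structure file, and an AlphaFold DB prediction entry. The function names are hypothetical, and the URL patterns are assumptions based on the services' public endpoints; verify them against each service's current documentation.

```python
def uniprot_fasta_url(accession):
    """URL for a UniProtKB entry in FASTA format."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def rcsb_mmcif_url(pdb_id):
    """URL for an experimentally determined structure in mmCIF format."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.cif"


def alphafold_prediction_url(accession):
    """URL for AlphaFold DB prediction metadata, keyed by UniProt accession."""
    return f"https://alphafold.ebi.ac.uk/api/prediction/{accession}"


# Illustrative identifiers: a UniProt accession and a PDB ID.
for url in (
    uniprot_fasta_url("P04637"),
    rcsb_mmcif_url("1tup"),
    alphafold_prediction_url("P04637"),
):
    print(url)
```

Keeping the UniProt accession as the shared key makes it easier to line up an experimental structure with a predicted one for the same protein.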
Expression, variation, and pathways
- GEO (Gene Expression Omnibus)
  - https://www.ncbi.nlm.nih.gov/geo/
  - Use for: microarray and RNA-seq expression datasets.
- GTEx (Genotype-Tissue Expression)
  - https://gtexportal.org/
  - Use for: tissue-specific gene expression patterns in human tissues.
- ClinVar
  - https://www.ncbi.nlm.nih.gov/clinvar/
  - Use for: clinical significance of human genetic variants.
- KEGG
  - https://www.kegg.jp/
  - Use for: pathway maps and biochemical reactions.
- Reactome
  - https://reactome.org/
  - Use for: curated biological pathways.
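Identifiers from these resources follow recognizable prefixes, which helps when sorting a mixed list of accessions. The sketch below classifies a few common identifier shapes with regular expressions. The patterns are simplified assumptions for illustration, not official validation rules, and the example identifiers are only meant to show the format.

```python
import re

# Simplified patterns (assumptions); real identifier rules may be stricter.
ACCESSION_PATTERNS = [
    (re.compile(r"^GSE\d+$"), "GEO series"),
    (re.compile(r"^GSM\d+$"), "GEO sample"),
    (re.compile(r"^GPL\d+$"), "GEO platform"),
    (re.compile(r"^GDS\d+$"), "GEO curated dataset"),
    (re.compile(r"^[VRS]CV\d+(\.\d+)?$"), "ClinVar record"),
    (re.compile(r"^R-[A-Z]{3}-\d+(\.\d+)?$"), "Reactome stable identifier"),
    (re.compile(r"^[a-z]{3,4}\d{5}$"), "KEGG pathway map"),
]


def classify_accession(accession):
    """Return a label for the first matching pattern, or "unknown"."""
    cleaned = accession.strip()
    for pattern, label in ACCESSION_PATTERNS:
        if pattern.match(cleaned):
            return label
    return "unknown"


# Identifiers shown only to illustrate the formats.
for acc in ["GSE12345", "VCV000012345.3", "R-HSA-69278", "hsa04115"]:
    print(acc, "->", classify_accession(acc))
```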
Citation tips
- Prefer accession numbers over informal dataset names.
- Check the dataset page for a citation or DOI.
- Include database name and version when available.
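The tips above can be sketched as a small helper that assembles a data citation from its parts. The function name, the field order, and the optional access date are assumptions made for illustration; this is not a journal style, and the preferred citation on a dataset's own page takes precedence.

```python
from datetime import date


def format_data_citation(database, accession, version=None, doi=None, accessed=None):
    """Assemble a one-line data citation from the available parts.

    Only `database` and `accession` are required; the other fields are
    included when supplied.
    """
    parts = [database]
    if version:
        parts.append(f"release {version}")
    parts.append(f"accession {accession}")
    if doi:
        parts.append(f"https://doi.org/{doi}")
    if accessed:
        parts.append(f"accessed {accessed.isoformat()}")
    return ", ".join(parts) + "."


# Illustrative values only.
print(format_data_citation("Ensembl", "ENSG00000141510", version="110"))
print(format_data_citation("GEO", "GSE12345", accessed=date(2024, 1, 15)))
```

Putting the accession in its own field keeps it from being replaced by an informal dataset name when the citation is edited later.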
Suggested class activities
- Pick one database and write a short guide: what it stores, how to search it, and how to cite it.
- Compare two sources for the same gene or protein and note differences in annotation.
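For the comparison activity, one possible starting point is a field-by-field diff of two annotation records. The sketch below assumes each record has already been copied into a plain dictionary; the function name and the hand-typed example values are illustrative, not fetched from either database.

```python
def compare_annotations(record_a, record_b):
    """Return the fields whose values differ between two records.

    The result maps each differing field to a (value_a, value_b) pair.
    A field missing from one record is reported with None on that side.
    """
    differences = {}
    for field in sorted(set(record_a) | set(record_b)):
        value_a = record_a.get(field)
        value_b = record_b.get(field)
        if value_a != value_b:
            differences[field] = (value_a, value_b)
    return differences


# Hand-typed example records for the same protein from two sources.
source_one = {"symbol": "TP53", "length": 393, "name": "Cellular tumor antigen p53"}
source_two = {"symbol": "TP53", "length": 393, "name": "tumor protein p53"}

for field, (a, b) in compare_annotations(source_one, source_two).items():
    print(f"{field}: {a!r} vs {b!r}")
```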
