Computational Bioinformatics

Data Sources for Bioinformatics

This page lists reliable datasets and databases for sequence, structure, expression, and pathway analysis. Each resource includes common use cases and citation guidance.

Core sequence and genome resources

  • NCBI (GenBank, RefSeq, GEO, SRA)
    • https://www.ncbi.nlm.nih.gov/
    • Use for: general sequence records (GenBank), curated reference genomes (RefSeq), gene expression data (GEO), and raw sequencing reads (SRA).
    • Cite: follow the citation guidance on the record's accession page; many records list a preferred citation.
  • ENA (European Nucleotide Archive)
  • Ensembl
  • UCSC Genome Browser
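
Records in the NCBI sequence databases can also be retrieved programmatically by accession. The sketch below only builds a request URL for NCBI's E-utilities `efetch` endpoint and does not contact the network. The helper name `build_efetch_url` and its defaults are assumptions made for illustration, and the accession shown is just an example value; check the E-utilities documentation for current parameters and usage limits before relying on it.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one accession (hypothetical helper, no network access)."""
    params = {
        "db": db,            # target database, e.g. "nuccore" or "protein"
        "id": accession,     # accession, ideally with its version suffix
        "rettype": rettype,  # record format requested from the server
        "retmode": "text",   # plain-text response
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


# Example accession used for illustration only.
url = build_efetch_url("NM_000546.6")
print(url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the record as FASTA text. Keeping the version suffix in the accession makes the retrieval reproducible.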

Protein, structure, and function

Expression, variation, and pathways

Citation tips

  • Prefer accession numbers over informal dataset names.
  • Check the dataset page for a citation or DOI.
  • Include database name and version when available.
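
The tips above can be sketched as a small helper that assembles a data citation from the database name, accession, and optional version and DOI. The function name `format_data_citation` and the output layout are assumptions for illustration, not a required citation style; the example values are illustrative only, and the format requested by the database or the target journal takes precedence.

```python
def format_data_citation(database, accession, version=None, doi=None):
    """Build a short data citation string (hypothetical layout, not a standard)."""
    parts = [database]
    if version:
        # Include the database release when it is known.
        parts.append(f"release {version}")
    # The accession is the stable identifier, so it always appears.
    parts.append(f"accession {accession}")
    text = ", ".join(parts)
    if doi:
        # Append the DOI as a resolvable link when the dataset page gives one.
        text += f". https://doi.org/{doi}"
    return text


# Example values for illustration only.
citation = format_data_citation("Ensembl", "ENSG00000141510", version="110")
print(citation)  # Ensembl, release 110, accession ENSG00000141510
```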

Suggested class activities

  • Pick one database and write a short guide: what it stores, how to search it, and how to cite it.
  • Compare two sources for the same gene or protein and note differences in annotation.
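
For the comparison activity, one possible approach is to copy the annotation fields from each source into a dictionary and list the fields that disagree. The sketch below assumes the fields have already been collected by hand; the function name `compare_annotations`, the field names, and the two records are made-up placeholders, not real database entries.

```python
def compare_annotations(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    diffs = {}
    for field in sorted(set(a) | set(b)):
        # A field missing from one source is reported as None for that source.
        va, vb = a.get(field), b.get(field)
        if va != vb:
            diffs[field] = (va, vb)
    return diffs


# Placeholder records standing in for the same gene in two databases.
source_a = {"symbol": "GENE1", "strand": "+", "exon_count": 11}
source_b = {"symbol": "GENE1", "strand": "+", "exon_count": 12,
            "biotype": "protein_coding"}

differences = compare_annotations(source_a, source_b)
for field, (va, vb) in differences.items():
    print(f"{field}: source A = {va!r}, source B = {vb!r}")
```

Recording the database name and release next to each dictionary makes it easier to explain where a difference comes from.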