Computational Genomics

Central Dogma of Biology

  • DNA: DNA is the genetic code

    • A, C, G, T
    • DNA is transcribed into RNA in a sequence-dependent fashion
  • Messenger RNA (mRNA): DNA is converted into messenger RNA through a process called transcription

    • A, C, G, U
    • Three letters constitute a codon
    • A codon encodes one amino acid (or a stop signal)
    • Messenger RNA is read 3 letters at a time to give the amino acids that make up a protein
    • The sequence is read sequentially, in non-overlapping triplets
  • Proteins: Messenger RNA is converted to proteins through a process called translation

    • Proteins are composed of amino acids
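
The codon reading described above can be sketched in Python. This is a minimal illustration, not a full translation tool: the names `split_codons`, `translate`, and `CODON_TABLE` are made up for this example, the table is the standard genetic code written with one-letter amino acid codes, and the input is assumed to contain only A, C, G, U.

```python
# Standard genetic code, with codons ordered U, C, A, G at each position.
# "*" marks a stop codon. One-letter amino acid codes (M = methionine).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(mrna):
    """Read an mRNA string three letters at a time, without overlap."""
    mrna = mrna.replace(" ", "").upper()
    # Ignore one or two trailing bases that do not form a full codon.
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def translate(mrna):
    """Translate codons into amino acids, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(mrna):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(split_codons("AUG AUC UCG UAA"))  # ['AUG', 'AUC', 'UCG', 'UAA']
print(translate("AUG AUC UCG UAA"))     # MIS
```

Here AUG gives M (methionine), AUC gives I (isoleucine), UCG gives S (serine), and UAA is a stop codon, so translation ends there.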

Complementary Bases

  • Say we have a DNA sequence
    • a DNA strand: ATG ATC TCG TAA
    • DNA has a direction; marking the 5’ end and 3’ end gives: 5’ ATG ATC TCG TAA 3’
    • The other strand of DNA runs back the other way with complementary bases
      • C <-> G
      • T <-> A
    • Once we have one strand, we can always calculate the other
  • During transcription, the double-stranded DNA opens into its top and bottom strands

    • Full DNA strand:

      • Top strand: 5’ ATG ATC TCG TAA 3’
      • Bottom strand: 3’ TAC TAG AGC ATT 5’
    • An RNA is made from the template strand (here, the bottom strand), one base at a time

      • T -> A
      • A -> U
      • C <-> G
    • Resulting RNA:

      • Coding strand (DNA): 5’ ATG ATC TCG TAA 3’
      • Template strand (DNA): 3’ TAC TAG AGC ATT 5’
      • Messenger RNA: 5’ AUG AUC UCG UAA 3’
    • Codon Table

      • For example, AUG codes for methionine
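
The base-pairing rules in this section can be sketched in Python. This is a minimal sketch that assumes strands are plain strings of A, C, G, T with no spaces; the function names `complement`, `reverse_complement`, and `transcribe` are illustrative choices for this example.

```python
# DNA base pairing: C <-> G, T <-> A
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Template DNA base -> RNA base: T -> A, A -> U, C <-> G
TEMPLATE_TO_RNA = {"T": "A", "A": "U", "C": "G", "G": "C"}


def complement(strand):
    """Pair each base in place; a 5'->3' input yields the other strand 3'->5'."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def reverse_complement(strand):
    """The other strand, rewritten so that it also reads 5'->3'."""
    return complement(strand)[::-1]


def transcribe(template_3_to_5):
    """Build the mRNA (5'->3') from a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5)


top = "ATGATCTCGTAA"       # top (coding) strand, 5' -> 3'
bottom = complement(top)   # bottom (template) strand, 3' -> 5'
mrna = transcribe(bottom)  # messenger RNA, 5' -> 3'

print(bottom)  # TACTAGAGCATT
print(mrna)    # AUGAUCUCGUAA
```

Note that the mRNA comes out identical to the top strand with every T replaced by U, which is why the top strand is called the coding strand.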

Sequence a Genome

How big is a human Genome?

  • Human Genome: 3.1 billion base pairs (3.1 Gbp)
  • Bacterial genome: about 2 million bp (2 Mbp), varying by species


  • Sequencing the genome: Understanding and deciphering the order of A, G, C, T in a genome
  • Although the human genome has 3.1 Gbp, it is only a combination of 4 bases: A, G, C, T
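
Because there are only 4 bases, each base can in principle be stored in 2 bits. The sketch below works out the resulting size for the genome lengths quoted above. It is back-of-the-envelope arithmetic that assumes ideal packing of a single strand with no ambiguity codes or metadata; the function name `packed_size_bytes` is made up for this example.

```python
import math


def packed_size_bytes(n_bases, alphabet="ACGT"):
    """Bytes needed if every base uses the minimum whole number of bits."""
    bits_per_base = math.ceil(math.log2(len(alphabet)))  # 2 bits for 4 letters
    return (n_bases * bits_per_base + 7) // 8            # round up to whole bytes


human = packed_size_bytes(3_100_000_000)  # 3.1 Gbp
bacterial = packed_size_bytes(2_000_000)  # 2 Mbp

print(human / 1e6)      # 775.0 (megabytes)
print(bacterial / 1e6)  # 0.5 (megabytes)
```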

What is genome sequencing

  • Bacteria: mostly circular genome

Some Sequencing Terms

  • Open Reading Frame (ORF): A stretch of codons (read three bases at a time) with no stop codon
  • Coding Sequence (CDS): An ORF that could encode a protein
  • Protein-encoding gene (PEG): A gene that could encode a protein
  • Hypothetical protein: A predicted protein that has not been experimentally shown
  • Polypeptide: A chain of amino acids; a short stretch (typically about 20 amino acids or fewer) is usually just called a peptide
  • Contig: A contiguous piece of DNA sequence that has been assembled from more than one read. Reads can be joined because the 5’ end of one sequence overlaps the 3’ end of another.
  • Read: The unit of DNA sequence that comes from a sequencing instrument. A single piece of DNA sequence.
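
Two of the terms above can be illustrated with a short Python sketch: joining two overlapping reads into a contig, and scanning a DNA sequence for stop-free stretches (ORFs). This is a toy illustration under simplifying assumptions: overlaps must match exactly, only the three forward reading frames are scanned, and the names `overlap_length`, `merge_reads`, and `find_orfs` are hypothetical. Real assemblers and gene finders are far more involved.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads into one contig if they overlap, else return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for stop-free codon runs in the forward frames."""
    orfs = []
    for frame in range(3):
        # Index just past the last complete codon in this frame.
        last = len(dna) - (len(dna) - frame) % 3
        run_start = frame
        for pos in range(frame, last, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, pos))
                run_start = pos + 3
        # A run may also reach the end of the sequence without a stop codon.
        if (last - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, last))
    return orfs


contig = merge_reads("ATGATCTC", "TCTCGTAA")
print(contig)             # ATGATCTCGTAA
print(find_orfs(contig))  # includes (0, 0, 9): ATG ATC TCG, ending before TAA
```

The two example reads share the four bases TCTC, so they merge into the 12-base strand used earlier in these notes.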

   Reprint policy

《Computational Genomics》 by Isaac Zhou is licensed under a Creative Commons Attribution 4.0 International License