Amino Acid | 3-Letter[122] | 1-Letter[122] | Side-chain polarity[122] | Side-chain charge (pH 7.4)[122] | Hydropathy index[123] | Absorbance λmax (nm)[124] | ε at λmax (mM−1 cm−1)[124] | MW (weight)[125] |
---|---|---|---|---|---|---|---|---|
Alanine | Ala | A | nonpolar | neutral | 1.8 | | | 89 |
Arginine | Arg | R | basic polar | positive | −4.5 | | | 174 |
Asparagine | Asn | N | polar | neutral | −3.5 | | | 132 |
Aspartic acid | Asp | D | acidic polar | negative | −3.5 | | | 133 |
Cysteine | Cys | C | nonpolar | neutral | 2.5 | 250 | 0.3 | 121 |
Glutamic acid | Glu | E | acidic polar | negative | −3.5 | | | 147 |
Glutamine | Gln | Q | polar | neutral | −3.5 | | | 146 |
Glycine | Gly | G | nonpolar | neutral | −0.4 | | | 75 |
Histidine | His | H | basic polar | positive (10%), neutral (90%) | −3.2 | 211 | 5.9 | 155 |
Isoleucine | Ile | I | nonpolar | neutral | 4.5 | | | 131 |
Leucine | Leu | L | nonpolar | neutral | 3.8 | | | 131 |
Lysine | Lys | K | basic polar | positive | −3.9 | | | 146 |
Methionine | Met | M | nonpolar | neutral | 1.9 | | | 149 |
Phenylalanine | Phe | F | nonpolar | neutral | 2.8 | 257, 206, 188 | 0.2, 9.3, 60.0 | 165 |
Proline | Pro | P | nonpolar | neutral | −1.6 | | | 115 |
Serine | Ser | S | polar | neutral | −0.8 | | | 105 |
Threonine | Thr | T | polar | neutral | −0.7 | | | 119 |
Tryptophan | Trp | W | nonpolar | neutral | −0.9 | 280, 219 | 5.6, 47.0 | 204 |
Tyrosine | Tyr | Y | polar | neutral | −1.3 | 274, 222, 193 | 1.4, 8.0, 48.0 | 181 |
Valine | Val | V | nonpolar | neutral | 4.2 | | | 117 |
In some species, two additional amino acids are encoded by codons that are usually interpreted as stop codons:
21st and 22nd amino acids | 3-Letter | 1-Letter |
---|---|---|
Selenocysteine | Sec | U |
Pyrrolysine | Pyl | O |
In addition to the specific amino acid codes, placeholders are used in cases where chemical or crystallographic analysis of a peptide or protein cannot conclusively determine the identity of a residue.
Ambiguous Amino Acids | 3-Letter | 1-Letter |
---|---|---|
Asparagine or aspartic acid | Asx | B |
Glutamine or glutamic acid | Glx | Z |
Leucine or isoleucine | Xle | J |
Unspecified or unknown amino acid | Xaa | X |
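
The placeholder codes above can be treated as small sets of candidate residues. The following is a minimal sketch of that idea, not part of NtSeq: the names `AMBIGUOUS_RESIDUES`, `expandResidue`, and `residuesCompatible` are hypothetical, and `X` is assumed here to stand for any of the 20 standard residues only (it does not cover `U` or `O`).

```javascript
// The 20 standard one-letter amino acid codes.
const STANDARD_RESIDUES = 'ACDEFGHIKLMNPQRSTVWY'.split('');

// Placeholder codes mapped to the residues they may stand for.
// These names are illustrative only; they are not an NtSeq API.
const AMBIGUOUS_RESIDUES = new Map([
  ['B', ['N', 'D']],          // Asx: asparagine or aspartic acid
  ['Z', ['Q', 'E']],          // Glx: glutamine or glutamic acid
  ['J', ['L', 'I']],          // Xle: leucine or isoleucine
  ['X', STANDARD_RESIDUES],   // Xaa: assumed to mean "any standard residue"
]);

// Expand a one-letter code into the list of residues it could represent.
// Unambiguous codes expand to themselves.
function expandResidue(code) {
  const upper = code.toUpperCase();
  return AMBIGUOUS_RESIDUES.get(upper) || [upper];
}

// Two codes are compatible if they could refer to the same residue.
function residuesCompatible(a, b) {
  const candidates = new Set(expandResidue(b));
  return expandResidue(a).some((residue) => candidates.has(residue));
}
```

With these definitions, `residuesCompatible('B', 'D')` is true because Asx may be aspartic acid, while `residuesCompatible('B', 'Z')` is false because the two placeholders share no residue.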
BACKGROUND OF PROJECT
The initial purpose for developing this library was to find all sequences similar to a consensus sequence for a protein's DNA-binding domain in a genome. It was hypothesized that this protein could act to inhibit transcription by occluding the binding of RNA polymerase in multiple locations. I wanted a tool that could generate a list of all of these potential sites of inhibition (sites that the protein could potentially bind) ordered by their similarity to a consensus sequence.
I had previous experimental results listing a number of nucleotide sequences that this DNA-binding domain had high affinity for. I had to use multiple tools to A) generate the consensus from the identified binding sequences for this protein, and B) use BLAST to try to find sequences that matched. Unfortunately, BLAST did not support the use of the degenerate consensus sequence that I felt would give the best and largest set of results (potential binding sites in the genome) to test.
With NtSeq, the Nt.Seq#cover method can generate consensus sequences quickly (though the resulting sequence is unweighted), and Nt.MatchMap supports degenerate nucleotide matching and can provide all ungapped matches (ordered by relevance) of moderately sized query sequences in the genomic data I was looking through (~200 kbp) in milliseconds.
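
To make the two ideas concrete, here is a self-contained sketch of an unweighted consensus and an ungapped degenerate scan. It does not call NtSeq and is not its implementation: `coverSequences` and `ungappedMatches` are hypothetical names, the 4-bit encoding is one possible choice, and a position is assumed to "match" when the two IUPAC codes share at least one base, which may differ from how Nt.MatchMap scores matches.

```javascript
// One bit per base, so a degenerate code is the OR of its bases.
const BASE_BITS = { A: 1, C: 2, G: 4, T: 8 };

// Standard IUPAC nucleotide codes and the bases each one stands for.
const IUPAC = {
  A: 'A', C: 'C', G: 'G', T: 'T',
  R: 'AG', Y: 'CT', S: 'CG', W: 'AT', K: 'GT', M: 'AC',
  B: 'CGT', D: 'AGT', H: 'ACT', V: 'ACG', N: 'ACGT',
};

// Build lookup tables in both directions (code <-> bitmask).
const CODE_TO_MASK = {};
const MASK_TO_CODE = {};
for (const [code, bases] of Object.entries(IUPAC)) {
  let mask = 0;
  for (const base of bases) mask |= BASE_BITS[base];
  CODE_TO_MASK[code] = mask;
  MASK_TO_CODE[mask] = code;
}

// Convert a sequence string into an array of bitmasks.
function toMasks(seq) {
  return Array.from(seq.toUpperCase(), (ch) => {
    const mask = CODE_TO_MASK[ch];
    if (mask === undefined) throw new Error(`Unknown nucleotide code: ${ch}`);
    return mask;
  });
}

// Unweighted consensus (hypothetical helper): OR the masks at each position,
// so the result covers every input sequence but ignores base frequencies.
function coverSequences(seqs) {
  const length = seqs[0].length;
  const cover = new Array(length).fill(0);
  for (const seq of seqs) {
    if (seq.length !== length) throw new Error('Sequences must have equal length');
    toMasks(seq).forEach((mask, i) => { cover[i] |= mask; });
  }
  return cover.map((mask) => MASK_TO_CODE[mask]).join('');
}

// Ungapped scan (hypothetical helper): slide the query along the target,
// count positions whose masks overlap, and sort the offsets best-first.
function ungappedMatches(query, target) {
  const q = toMasks(query);
  const t = toMasks(target);
  const results = [];
  for (let pos = 0; pos + q.length <= t.length; pos++) {
    let matches = 0;
    for (let i = 0; i < q.length; i++) {
      if ((q[i] & t[pos + i]) !== 0) matches++;
    }
    results.push({ position: pos, matches });
  }
  // Highest match count first; ties broken by position.
  return results.sort((a, b) => b.matches - a.matches || a.position - b.position);
}
```

For example, `coverSequences(['ATGC', 'ATGT', 'AAGC'])` yields `'AWGY'`, and scanning `'CCATGTCCAAGCAA'` with that query ranks offsets 2 and 8 first, each with all four positions matching. This naive scan does work proportional to query length times target length; it only illustrates the matching rule and makes no claim about how NtSeq achieves its speed.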
This project sat unfinished for years, and I felt the need to clean it up and release it. I hope a new generation of young scientists and developers will help develop and spread small, focused, well-documented open source JavaScript libraries to create beautiful online experiences. :)
FASTA Mode?
===