Table of standard amino acid abbreviations and properties

Main article: Proteinogenic amino acid
| Amino acid | 3-letter[122] | 1-letter[122] | Side-chain polarity[122] | Side-chain charge (pH 7.4)[122] | Hydropathy index[123] | Absorbance λmax (nm)[124] | ε at λmax (mM−1 cm−1)[124] | Molecular weight (g/mol)[125] |
|---|---|---|---|---|---|---|---|---|
| Alanine | Ala | A | nonpolar | neutral | 1.8 | | | 89 |
| Arginine | Arg | R | basic polar | positive | −4.5 | | | 174 |
| Asparagine | Asn | N | polar | neutral | −3.5 | | | 132 |
| Aspartic acid | Asp | D | acidic polar | negative | −3.5 | | | 133 |
| Cysteine | Cys | C | nonpolar | neutral | 2.5 | 250 | 0.3 | 121 |
| Glutamic acid | Glu | E | acidic polar | negative | −3.5 | | | 147 |
| Glutamine | Gln | Q | polar | neutral | −3.5 | | | 146 |
| Glycine | Gly | G | nonpolar | neutral | −0.4 | | | 75 |
| Histidine | His | H | basic polar | positive (10%), neutral (90%) | −3.2 | 211 | 5.9 | 155 |
| Isoleucine | Ile | I | nonpolar | neutral | 4.5 | | | 131 |
| Leucine | Leu | L | nonpolar | neutral | 3.8 | | | 131 |
| Lysine | Lys | K | basic polar | positive | −3.9 | | | 146 |
| Methionine | Met | M | nonpolar | neutral | 1.9 | | | 149 |
| Phenylalanine | Phe | F | nonpolar | neutral | 2.8 | 257, 206, 188 | 0.2, 9.3, 60.0 | 165 |
| Proline | Pro | P | nonpolar | neutral | −1.6 | | | 115 |
| Serine | Ser | S | polar | neutral | −0.8 | | | 105 |
| Threonine | Thr | T | polar | neutral | −0.7 | | | 119 |
| Tryptophan | Trp | W | nonpolar | neutral | −0.9 | 280, 219 | 5.6, 47.0 | 204 |
| Tyrosine | Tyr | Y | polar | neutral | −1.3 | 274, 222, 193 | 1.4, 8.0, 48.0 | 181 |
| Valine | Val | V | nonpolar | neutral | 4.2 | | | 117 |

Two additional amino acids are in some species coded for by codons that are usually interpreted as stop codons:

| 21st and 22nd amino acids | 3-letter | 1-letter |
|---|---|---|
| Selenocysteine | Sec | U |
| Pyrrolysine | Pyl | O |

In addition to the specific amino acid codes, placeholders are used in cases where chemical or crystallographic analysis of a peptide or protein cannot conclusively determine the identity of a residue.

| Ambiguous amino acids | 3-letter | 1-letter |
|---|---|---|
| Asparagine or aspartic acid | Asx | B |
| Glutamine or glutamic acid | Glx | Z |
| Leucine or isoleucine | Xle | J |
| Unspecified or unknown amino acid | Xaa | X |
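
The placeholder codes above can be treated as small sets of possible residues. The following is a minimal sketch of that idea, not part of any library: the names `SPECIFIC`, `AMBIGUOUS`, `possibleResidues` and `residueCouldBe` are hypothetical, and expanding X to all 22 listed one-letter codes (including U and O) is an assumption made for illustration.

```javascript
// One-letter codes for the 20 standard amino acids plus Sec (U) and Pyl (O).
const SPECIFIC = 'ACDEFGHIKLMNPQRSTVWYUO';

// Placeholder codes mapped to the residues they may stand for.
const AMBIGUOUS = {
  B: ['D', 'N'],        // Asx: aspartic acid or asparagine
  Z: ['E', 'Q'],        // Glx: glutamic acid or glutamine
  J: ['I', 'L'],        // Xle: isoleucine or leucine
  X: SPECIFIC.split('') // Xaa: assumed here to mean "any listed residue"
};

// Return every residue a one-letter code could represent.
function possibleResidues(code) {
  const c = code.toUpperCase();
  if (AMBIGUOUS[c]) return AMBIGUOUS[c];
  if (SPECIFIC.includes(c)) return [c];
  throw new Error(`Unknown amino acid code: ${code}`);
}

// True when the observed (possibly ambiguous) code is compatible
// with a specific candidate residue.
function residueCouldBe(observed, candidate) {
  return possibleResidues(observed).includes(candidate.toUpperCase());
}

console.log(possibleResidues('B'));    // [ 'D', 'N' ]
console.log(residueCouldBe('Z', 'Q')); // true
console.log(residueCouldBe('J', 'V')); // false
```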

BACKGROUND OF PROJECT

The initial purpose for developing this library was to find all sequences similar to a consensus sequence for a protein's DNA-binding domain in a genome. It was hypothesized that this protein could act to inhibit transcription by occluding the binding of RNA polymerase in multiple locations. I wanted a tool that could generate a list of all of these potential sites of inhibition (sites that the protein could potentially bind) ordered by their similarity to a consensus sequence.

I had previous experimental results listing a number of nucleotide sequences that this DNA-binding domain bound with high affinity. I had to use multiple tools: A) one to generate the consensus from the identified binding sequences for this protein, and B) BLAST to try to find sequences that matched. Unfortunately, BLAST did not support the use of the degenerate consensus sequence that I felt would give the best and largest set of results (potential binding sites in the genome) to test.

With NtSeq, the Nt.Seq#cover method generates consensus sequences quickly (though the resulting sequence is unweighted), and Nt.MatchMap supports degenerate nucleotide matching and can provide all ungapped matches (ordered by relevance) of moderately sized query sequences in the genomic data I was looking through (~200 kbp) in milliseconds.
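
To make the two ideas concrete, here is a minimal, self-contained sketch of an unweighted degenerate consensus and of ranked ungapped matching. It is not NtSeq's API or implementation: the functions `cover` and `ungappedMatches`, the 4-bit mask table, and the scoring (count of compatible positions, full overlaps only) are assumptions chosen for illustration, and input is assumed to be uppercase IUPAC nucleotide codes.

```javascript
// IUPAC nucleotide codes as 4-bit masks (A=1, C=2, G=4, T=8).
// A degenerate code is the bitwise OR of the bases it allows.
const IUPAC = {
  A: 1, C: 2, G: 4, T: 8,
  M: 3, R: 5, S: 6, V: 7,
  W: 9, Y: 10, H: 11, K: 12,
  D: 13, B: 14, N: 15, '-': 0
};

// Reverse lookup: mask value -> IUPAC letter.
const CODE_FOR_MASK = Object.fromEntries(
  Object.entries(IUPAC).map(([letter, mask]) => [mask, letter])
);

// Unweighted consensus: at each position, OR together the masks of all
// input sequences (assumed to be aligned and of equal length).
function cover(sequences) {
  let consensus = '';
  for (let i = 0; i < sequences[0].length; i++) {
    let mask = 0;
    for (const seq of sequences) mask |= IUPAC[seq[i]];
    consensus += CODE_FOR_MASK[mask];
  }
  return consensus;
}

// Slide the query along the target with no gaps. Two positions are
// compatible when their masks share at least one base. Results are
// ordered by number of compatible positions, best first.
function ungappedMatches(query, target) {
  const results = [];
  for (let pos = 0; pos + query.length <= target.length; pos++) {
    let matches = 0;
    for (let i = 0; i < query.length; i++) {
      if (IUPAC[query[i]] & IUPAC[target[pos + i]]) matches++;
    }
    results.push({ position: pos, matches });
  }
  return results.sort(
    (a, b) => b.matches - a.matches || a.position - b.position
  );
}

const consensus = cover(['ATGC', 'ATGG', 'TTGC']);
console.log(consensus); // WTGS

const ranked = ungappedMatches(consensus, 'CCATGCAATTGGA');
console.log(ranked.slice(0, 2));
// [ { position: 2, matches: 4 }, { position: 8, matches: 4 } ]
```

Encoding each base as a bit mask turns "is this degenerate code compatible with that one" into a single AND, which is the kind of operation that stays cheap even when every offset of a genome-sized target is scored.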

This project sat unfinished for years, and I felt the need to clean it up and release it. I hope a new generation of young scientists and developers will help develop and spread small, focused, well-documented open source JavaScript libraries to create beautiful online experiences. :)


FASTA Mode?
===