Sample NtSeq Project

Table of standard amino acid abbreviations and properties

Amino Acid	3-Letter^[122]	1-Letter^[122]	Side-chain polarity^[122]	Side-chain charge (pH 7.4)^[122]	Hydropathy index^[123]	Absorbance λ_max (nm)^[124]	ε at λ_max (mM⁻¹ cm⁻¹)^[124]	Molecular weight (Da)^[125]
Alanine	Ala	A	nonpolar	neutral	1.8			89
Arginine	Arg	R	basic polar	positive	−4.5			174
Asparagine	Asn	N	polar	neutral	−3.5			132
Aspartic acid	Asp	D	acidic polar	negative	−3.5			133
Cysteine	Cys	C	nonpolar	neutral	2.5	250	0.3	121
Glutamic acid	Glu	E	acidic polar	negative	−3.5			147
Glutamine	Gln	Q	polar	neutral	−3.5			146
Glycine	Gly	G	nonpolar	neutral	−0.4			75
Histidine	His	H	basic polar	positive (10%), neutral (90%)	−3.2	211	5.9	155
Isoleucine	Ile	I	nonpolar	neutral	4.5			131
Leucine	Leu	L	nonpolar	neutral	3.8			131
Lysine	Lys	K	basic polar	positive	−3.9			146
Methionine	Met	M	nonpolar	neutral	1.9			149
Phenylalanine	Phe	F	nonpolar	neutral	2.8	257, 206, 188	0.2, 9.3, 60.0	165
Proline	Pro	P	nonpolar	neutral	−1.6			115
Serine	Ser	S	polar	neutral	−0.8			105
Threonine	Thr	T	polar	neutral	−0.7			119
Tryptophan	Trp	W	nonpolar	neutral	−0.9	280, 219	5.6, 47.0	204
Tyrosine	Tyr	Y	polar	neutral	−1.3	274, 222, 193	1.4, 8.0, 48.0	181
Valine	Val	V	nonpolar	neutral	4.2			117

In some species, two additional amino acids are encoded by codons that are usually interpreted as stop codons:

21st and 22nd amino acids	3-Letter	1-Letter
Selenocysteine	Sec	U
Pyrrolysine	Pyl	O

In addition to the specific amino acid codes, placeholders are used in cases where chemical or crystallographic analysis of a peptide or protein cannot conclusively determine the identity of a residue.

Ambiguous Amino Acids	3-Letter	1-Letter
Asparagine or aspartic acid	Asx	B
Glutamine or glutamic acid	Glx	Z
Leucine or isoleucine	Xle	J
Unspecified or unknown amino acid	Xaa	X

BACKGROUND OF PROJECT

The initial purpose for developing this library was to find all sequences similar to a consensus sequence for a protein's DNA-binding domain in a genome. It was hypothesized that this protein could act to inhibit transcription by occluding the binding of RNA polymerase in multiple locations. I wanted a tool that could generate a list of all of these potential sites of inhibition (sites that the protein could potentially bind) ordered by their similarity to a consensus sequence.

I had previous experimental results listing a number of nucleotide sequences that this DNA-binding domain bound with high affinity. I had to use multiple tools to A) generate the consensus from the identified binding sequences for this protein, and B) use BLAST to try to find sequences that matched. Unfortunately, BLAST did not support the use of the degenerate consensus sequence that I felt would give the best and largest set of results (potential binding sites in the genome) to test.
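
To make the idea of a degenerate consensus concrete, here is a minimal standalone sketch in plain JavaScript. It does not use NtSeq and is not how NtSeq is implemented; the function name coverConsensus and the example binding sites are hypothetical. It assumes equal-length, already-aligned input sequences and encodes each IUPAC nucleotide code as a 4-bit mask, so the consensus at each position is simply the bitwise OR of every base observed there.

```javascript
// IUPAC nucleotide codes as 4-bit masks: A=1, C=2, G=4, T=8.
// A degenerate code is the OR of the bases it stands for (e.g. W = A|T = 9).
const IUPAC_BITS = {
  A: 1, C: 2, G: 4, T: 8,
  M: 3, R: 5, S: 6, V: 7,
  W: 9, Y: 10, H: 11, K: 12,
  D: 13, B: 14, N: 15,
};

// Reverse lookup: bit mask -> IUPAC code.
const BITS_TO_IUPAC = Object.fromEntries(
  Object.entries(IUPAC_BITS).map(([code, bits]) => [bits, code])
);

// Hypothetical helper: builds an unweighted consensus that "covers"
// every input sequence. Inputs are assumed to be aligned and equal in length.
function coverConsensus(sequences) {
  if (sequences.length === 0) return '';
  const length = sequences[0].length;
  const masks = new Array(length).fill(0);
  for (const seq of sequences) {
    if (seq.length !== length) {
      throw new Error('sequences must have equal length');
    }
    for (let i = 0; i < length; i++) {
      const bits = IUPAC_BITS[seq[i].toUpperCase()];
      if (bits === undefined) {
        throw new Error(`unknown nucleotide: ${seq[i]}`);
      }
      masks[i] |= bits; // accumulate every base seen at this position
    }
  }
  return masks.map((mask) => BITS_TO_IUPAC[mask]).join('');
}

// Made-up binding sites, for illustration only.
const bindingSites = ['ATGCA', 'ATGCC', 'TTGCA'];
console.log(coverConsensus(bindingSites)); // WTGCM
```

Because the result is unweighted, a base seen once counts as much as a base seen in every input, which is why such a consensus tends to be broad.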

With NtSeq, the Nt.Seq#cover method generates consensus sequences quickly (though the resulting sequence is unweighted). Nt.MatchMap supports degenerate nucleotide matching and can return all ungapped matches, ordered by relevance, of moderately sized query sequences against the genomic data I was searching (~200 kbp) in milliseconds.
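
The following is a rough sketch of what ungapped degenerate matching means, written as standalone JavaScript rather than with NtSeq itself. The function name findUngappedMatches and the result shape are assumptions made for illustration, and this naive scan is far slower than an optimized matcher would be. It slides the query along the target without gaps, counts positions where the two IUPAC codes share at least one base, and sorts the offsets by that count.

```javascript
// IUPAC nucleotide codes as 4-bit masks: A=1, C=2, G=4, T=8.
const IUPAC_BITS = {
  A: 1, C: 2, G: 4, T: 8,
  M: 3, R: 5, S: 6, V: 7,
  W: 9, Y: 10, H: 11, K: 12,
  D: 13, B: 14, N: 15,
};

// Convert a sequence string to an array of bit masks (unknown characters -> 0,
// which never matches anything).
function toMasks(sequence) {
  return [...sequence.toUpperCase()].map((c) => IUPAC_BITS[c] || 0);
}

// Hypothetical helper: scores every ungapped placement of `query` fully
// inside `target` and returns them ordered from best to worst.
function findUngappedMatches(query, target) {
  const q = toMasks(query);
  const t = toMasks(target);
  const results = [];
  for (let position = 0; position + q.length <= t.length; position++) {
    let matches = 0;
    for (let i = 0; i < q.length; i++) {
      // Two codes match when their masks overlap in at least one base.
      if (q[i] & t[position + i]) matches++;
    }
    results.push({ position, matches });
  }
  // Highest score first; ties broken by earliest position.
  results.sort((a, b) => b.matches - a.matches || a.position - b.position);
  return results;
}

// A degenerate query (W = A or T, M = A or C) against a short sample sequence.
const hits = findUngappedMatches('WTGCM', 'ATGCCCGACTGCA');
console.log(hits.slice(0, 2));
// [ { position: 0, matches: 5 }, { position: 8, matches: 4 } ]
```

This scan costs time proportional to query length times target length; it is only meant to show the scoring and ordering, not to reproduce the millisecond performance described above.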

This project sat unfinished for years, and I felt the need to clean it up and release it. I hope a new generation of young scientists and developers will help develop and spread small, focused, well-documented open-source JavaScript libraries that create beautiful online experiences. :)

ATGCCCGACTGCA