What is an Expression Vector?
Expression vectors are vectors that act as vehicles for a DNA insert and also allow the insert to be expressed efficiently. They may be plasmids or viruses, and are also known as expression constructs.
The expression vectors are genetically engineered for the introduction of genes into the target cells. In addition to the gene of interest, these expression constructs also contain regulatory elements like enhancers and promoters so that efficient transcription of the gene of interest occurs.
The simplest expression constructs are known as transcription vectors because they allow transcription of the cloned foreign gene but not its translation.
The vectors which facilitate both transcription and translation of the cloned foreign gene are known as protein expression vectors. These protein expression constructs also lead to the production of recombinant protein.
For transcription, a promoter and a termination sequence are required: transcription initiates at the promoter and ends at the termination site. The promoters of expression vectors should have on/off switches.
These switches regulate production of the gene product, because excessive amounts of the product of the gene of interest can be toxic to the cell. A common promoter used in expression constructs is a mutant version of the lac promoter, lacUV5.
The lacUV5 promoter initiates a high level of transcription under induced conditions. In addition, some expression vectors carry a ribosomal binding site upstream of the start codon, which facilitates efficient translation of the cloned foreign gene.
Expression vectors are used extensively in molecular biology, for example in techniques like site-directed mutagenesis.
How do Expression Vectors work?
Once the expression construct is inside the host cell, the gene of interest is transcribed into mRNA. The mRNA is then translated into protein by the translation machinery and ribosomal complexes of the host organism.
Frequently, the plasmid is genetically engineered to harbor regulatory elements like enhancers and promoters. These regulatory sequences aid efficient transcription of the gene of interest.
Expression vectors are extensively used as tools for the production of mRNAs and, in turn, stable proteins. They are of much interest in biotechnology and molecular biology for the production of proteins like insulin, which is central to the treatment of diabetes.
Once the protein product has been expressed, it has to be purified. Purification is a challenge because the protein of interest, whose gene is carried on the expression vector, must be separated from the proteins of the host organism.
To simplify purification, the gene of interest carried on the expression vector is usually fused to a 'tag'. This tag can be any marker peptide, such as a short run of histidine residues (His tag).
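As a minimal sketch of what tagging means at the sequence level, the following Python snippet inserts histidine codons just before the stop codon of a coding sequence. The function name `add_his_tag`, the choice of six histidines, the use of the CAC codon, and the toy sequence are all assumptions made for illustration; real vectors often place the tag at the N-terminus or add a protease cleavage site.

```python
# Hypothetical helper: append a C-terminal His tag to a coding sequence (CDS).
HIS_CODON = "CAC"  # one of the two histidine codons (CAT and CAC)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_his_tag(cds: str, n_his: int = 6) -> str:
    """Return the CDS with n_his histidine codons inserted before its stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    stop = cds[-3:]
    if stop not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    # Keep the reading frame: whole codons are added, and the stop codon stays last.
    return cds[:-3] + HIS_CODON * n_his + stop


# Toy CDS (Met-Ala-Ser-stop), not a real gene.
tagged = add_his_tag("ATGGCTAGCTAA")
print(tagged)
```

Because only whole codons are added, the original reading frame and stop codon are preserved.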
Expression vectors differ from cloning vectors. A cloning vector introduces the gene of interest into a plasmid that replicates in bacteria, but it need not result in the expression of a protein.
Therefore, expression vectors must have the following expression signals:
- Strong promoter,
- Strong termination codon,
- Adjustment of distance between the promoter and cloned gene,
- Inserted transcription termination sequence, and
- Portable translation initiation sequence.
What is a Promoter Expression Vector?
A promoter ensures reliable transcription of the gene of interest. Strong promoters are necessary for efficient mRNA synthesis by RNA polymerase.
Regulation of the promoter is another critical aspect which should always be kept in mind while constructing an expression vector.
Among the strongest promoters are those found in bacteriophages T5 and T7.
In E. coli, the promoter is regulated in two ways:
- Induction: the addition of a chemical switches on transcription of the gene.
- Repression: the addition of a chemical switches off transcription of the gene.
The most commonly used promoters in E. coli expression system are:
#1. lac promoter.
It regulates the transcription of the lacZ gene, which encodes β-galactosidase.
The lac promoter can be induced by IPTG (isopropyl β-D-1-thiogalactopyranoside).
The lac promoter sequences can be fused to the target gene, which makes expression of the target gene dependent on lactose or IPTG.
Nevertheless, the lac promoter has its drawbacks. It is quite weak and cannot be used for high-level production of the desired protein. In addition, it is leaky: a basal level of transcription occurs even in the absence of the inducer molecule.
#2. trp promoter.
- It is responsible for the regulation of a cluster of genes which are involved in tryptophan biosynthesis.
- Tryptophan acts as a corepressor: bound to the trp repressor, it switches the promoter off. The promoter is induced by 3-β-indoleacrylic acid.
#3. tac promoter.
- It is a hybrid of the trp and lac promoters (the −35 region of trp joined to the −10 region of lacUV5) and is stronger than either of them.
- The tac promoter is induced by IPTG (isopropyl β-D-1-thiogalactopyranoside).
#4. λPL.
It is a strong promoter and is responsible for transcription of λ DNA in E. coli.
The product of the λ cI gene, called the λ repressor, acts as its repressor.
An expression construct with the λPL promoter is used in combination with a mutant E. coli host that produces a temperature-sensitive form of the λ repressor (cI857).
At low temperature (about 30 °C), the repressor protein blocks transcription. At high temperature (about 42 °C), the repressor is inactivated and the cloned gene is transcribed.
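The on/off behaviour of the four E. coli promoters described above can be summarized in a small lookup function. This is only an illustrative sketch: the function name `is_induced`, its parameters, and the 42 °C threshold for inactivating the temperature-sensitive λ repressor are assumptions, and the model ignores basal (leaky) transcription.

```python
# Assumed threshold at which the temperature-sensitive lambda repressor is inactivated.
HEAT_SHIFT_C = 42.0


def is_induced(promoter: str, *, iptg: bool = False,
               indoleacrylic_acid: bool = False,
               temperature_c: float = 30.0) -> bool:
    """Return True if the named promoter is switched on under the given conditions."""
    if promoter in ("lac", "tac"):
        return iptg  # both are induced by IPTG
    if promoter == "trp":
        return indoleacrylic_acid  # induced by 3-beta-indoleacrylic acid
    if promoter == "lambda_PL":
        return temperature_c >= HEAT_SHIFT_C  # repressor is inactive at high temperature
    raise KeyError(f"unknown promoter: {promoter}")


print(is_induced("tac", iptg=True))
print(is_induced("lambda_PL", temperature_c=30.0))
```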
For the expression of proteins in mammalian cells, the promoter must be located upstream of the cloned cDNA for its efficient transcription.
In most cases, viral promoters are employed because they reliably drive strong constitutive expression.
Widely used promoters are the CMV promoter (derived from cytomegalovirus) and the SV40 promoter (derived from simian virus 40).
The promoters in commercially available yeast expression vectors may be constitutive or inducible.
Constitutive Promoter
A constitutive promoter is a kind of promoter which is unregulated and allows continual transcription of its associated gene.
Example of a constitutive promoter: GAP promoter of the gene encoding glyceraldehyde-3-phosphate dehydrogenase
Inducible promoter
An inducible promoter works in a regulated manner: expression of the gene associated with it can be switched on or off at a particular stage of development or at a chosen point in time.
Examples of inducible promoters: AOX1, GAL1, GAL10, nmt1, nmt41, and nmt81.
The AOX1 promoter belongs to the gene encoding alcohol oxidase. It is induced by methanol and is best suited for protein expression in Pichia pastoris.
The GAL1 and GAL10 promoters are induced by galactose and are suitable for protein expression in Saccharomyces cerevisiae.
The nmt1, nmt41, and nmt81 promoters are used for protein expression in Schizosaccharomyces pombe. They are repressed by thiamine and induced when thiamine is removed from the medium.
What is a Reporter Gene Expression Vector?
A reporter gene encodes a protein that can be detected and quantified with a simple assay.
Reporter genes serve as a tool to measure the efficiency of gene expression and to detect the intracellular localization of a protein.
The rate of expression of the structural gene is dependent upon the regulatory sequences which are located upstream to it.
The rate of expression of a gene can be measured by replacing its protein-encoding portion with a reporter gene. Alternatively, the gene can be fused to a second gene whose protein product is easily detected.
Reporter genes are useful in the identification of promoters, enhancers, and other proteins or regulatory elements which bind to them.
The most commonly utilized reporter genes are:
1. lac Z gene of E. coli
- It acts as a reporter in the presence of X-gal, which β-galactosidase converts into a blue product.
- Its expression level is read from the intensity of the blue colour produced, which can be quantified.
2. CAT (chloramphenicol acetyltransferase) encoding gene of E. coli
- The CAT gene encodes chloramphenicol acetyltransferase.
- The transferase enzyme is responsible for the transfer of acetyl groups from acetyl CoA to the recipient antibiotic, chloramphenicol
3. Luciferase encoding gene of firefly, Photinus pyralis
- Luciferase is accountable for the oxidation of luciferin.
- The oxidation of luciferin results in the emission of yellow-green light, which is easily detected even at low levels.
4. Green fluorescent protein (GFP) encoding gene of jellyfish, Aequorea victoria
- GFP was discovered by Shimomura.
- It is an autofluorescent protein with 238 amino acid residues produced by the bioluminescent jellyfish Aequorea victoria.
- In GFP, a β-barrel is formed by eleven β strands, and an α-helix runs through its center. The chromophore is located in the middle of the barrel. Residues 65 to 67 (Ser-Tyr-Gly) form the fluorescent chromophore, p-hydroxybenzylideneimidazolinone. The chromophore fluoresces at a peak wavelength of 508 nm (green light) when it is irradiated with UV or blue light (around 400 nm).
- GFP serves as a tool for determining protein localization.
- It serves as a tag whereby it is fused with a protein whose expression is to be monitored. Basically, the subcellular localization of the protein is investigated.
- Genetic engineering techniques are used to produce vectors in which the coding sequence of the protein under study, X, is fused in frame to the coding sequence of GFP.
- This fusion product of GFP-X can now be transfected into target cells and the expression, as well as the subcellular location of the X protein, can easily be monitored and detected.
Ribosome Binding Site and Translation Initiation Site
The ribosomal binding site (RBS) follows the promoter. It is responsible for the efficient translation of the cloned gene.
In prokaryotes, the translation initiation site is known as the Shine-Dalgarno sequence. This sequence lies within the RBS.
The Shine-Dalgarno consensus sequence (AGGAGG) is located about 8 base pairs upstream of the AUG start codon.
In eukaryotes, translation is initiated at a particular sequence called the Kozak sequence.
The ribosomal machinery for the translation of mRNA is assembled on this site.
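To make the prokaryotic arrangement concrete, here is a small sketch that looks for the Shine-Dalgarno consensus AGGAGG followed by an ATG start codon a short distance downstream. The function name, the allowed spacer window of 4 to 12 nucleotides, and the toy sequence are assumptions for illustration; real ribosome binding sites often deviate from the consensus, so an exact-match search like this one would miss many of them.

```python
import re

SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus, written as DNA


def find_shine_dalgarno(dna: str, min_gap: int = 4, max_gap: int = 12):
    """Return (sd_index, start_codon_index) pairs for each SD motif that is
    followed by an ATG within the assumed spacer window."""
    seq = dna.upper()
    hits = []
    for match in re.finditer(SD_CONSENSUS, seq):
        for gap in range(min_gap, max_gap + 1):
            start = match.end() + gap
            if seq[start:start + 3] == "ATG":
                hits.append((match.start(), start))
                break  # report only the nearest start codon
    return hits


# Toy sequence: SD motif at index 3, 7-nt spacer, ATG at index 16.
toy = "TTTAGGAGGAACAGCTATGGCT"
print(find_shine_dalgarno(toy))
```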
Polylinkers
- Each vector contains specific recognition sites for restriction enzymes. The vector is cut at one of these restriction sites so that the foreign gene of interest can be cloned into it.
- These sites often lie close together and, hence, are called polylinkers or multiple cloning sites (MCS).
- These regions are 50 to 100 base pairs in length and may have a cluster of up to 25 restriction sites.
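A polylinker can be inspected computationally by scanning it for known recognition sequences. The sketch below uses three well-known enzymes (EcoRI, BamHI, HindIII) and a made-up 18-bp polylinker; the function names and the toy sequence are assumptions, and a real analysis would use a full enzyme database and also check the rest of the vector, since a useful cloning site normally cuts the vector only once.

```python
# Recognition sequences of three common restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def map_sites(seq: str, sites: dict = SITES) -> dict:
    """Return {enzyme: [0-based positions]} for every recognition site in seq."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i)
            i = seq.find(site, i + 1)  # step by 1 so overlapping sites are found
        found[enzyme] = positions
    return found


def unique_cutters(seq: str, sites: dict = SITES) -> list:
    """Enzymes that cut the sequence exactly once."""
    return [e for e, pos in map_sites(seq, sites).items() if len(pos) == 1]


toy_mcs = "GAATTCGGATCCAAGCTT"  # hypothetical polylinker: EcoRI-BamHI-HindIII
print(map_sites(toy_mcs))
print(unique_cutters(toy_mcs))
```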
Poly-A (polyadenylation) Tail
- The poly-A tail at the 3' end of the mRNA protects the mRNA from degradation by nucleases.
- It is critical for the stability of the mRNA.
- The polyadenylation signal is also involved in the termination of transcription, and the tail supports efficient translation of the mRNA.
- A nucleolytic enzyme complex and a poly-A-polymerase are prerequisites for the addition of poly-A tail at the end of the mRNA.
Expression System
The production of a protein requires an expression system. There are two types of expression systems: prokaryotic and eukaryotic.
Each has its own advantages and drawbacks, which should be taken into consideration when choosing an expression system. No single expression system can be considered universal for heterologous protein production.
Prokaryotic Expression System
- In prokaryotes, the promoter specificity of RNA polymerase is mediated by the sigma factor.
- E. coli is the most widely used prokaryotic expression system.
- It expresses high levels of the protein.
- The E. coli strains used for recombinant protein production are genetically manipulated so that they are safe for large-scale experiments and fermentation.
- Purification of the protein has become easier since recombinant fusion proteins, for example glutathione-S-transferase and maltose-binding protein fusions, can be purified by affinity chromatography.
Regardless of the advances in the prokaryotic expression system, many difficulties remain in producing protein from cloned foreign genes. These challenges can be grouped into 2 categories:
1. Challenges because of the nature of the sequence of the foreign gene
The presence of introns in foreign genes,
The presence of the termination signals, and
Genes in both prokaryotic and eukaryotic expression systems show a defined, non-random use of synonymous codons. This is referred to as codon bias. Because the genetic code is degenerate, most amino acids are encoded by two or more codons, and an organism tends to prefer some of them over others.
These preferred codons are used frequently regardless of the abundance of the protein under consideration; for example, CCG is the preferred codon for proline in E. coli.
Highly expressed genes show a stronger bias towards certain codons than genes expressed at low levels. The frequency with which synonymous codons are used reflects the abundance of their corresponding tRNAs.
It follows that genes containing codons that are rarely used by E. coli may not be expressed efficiently in E. coli.
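A quick way to screen a foreign gene for this problem is to count its codons and measure what fraction of them are rare in the host. The sketch below is illustrative only: the function names are hypothetical, the set of rare codons is a small hand-picked list often cited as problematic in E. coli rather than a measured usage table, and the toy sequence is not a real gene.

```python
from collections import Counter

# Illustrative set of codons commonly described as rare in E. coli (an assumption;
# a real analysis would use a codon usage table for the specific host strain).
RARE_CODONS = frozenset({"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"})


def split_codons(cds: str) -> list:
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def codon_usage(cds: str) -> Counter:
    """Count how often each codon occurs in the coding sequence."""
    return Counter(split_codons(cds))


def rare_codon_fraction(cds: str, rare=RARE_CODONS) -> float:
    """Fraction of codons in the CDS that belong to the rare set."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    return sum(c in rare for c in codons) / len(codons)


toy_cds = "ATGAGAAGGCCGCCCTAA"  # ATG AGA AGG CCG CCC TAA
print(codon_usage(toy_cds))
print(rare_codon_fraction(toy_cds))
```

A high fraction suggests that the gene may benefit from codon optimization or from a host strain that supplies the corresponding tRNAs.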
2. Challenges due to the prokaryotic host, E. coli
The processing of proteins poses one challenge. Prokaryotes carry out post-translational modifications of their proteins in a rather different way than eukaryotes do. This, in turn, can affect a protein’s stability, activity, and response to antibiotics.
The folding of proteins renders yet another challenge. The protein products of eukaryotic foreign cloned genes may fold incorrectly in the prokaryotic expression system.
This may lead to the formation of insoluble aggregates, also known as inclusion bodies, which are not recovered as functional proteins.
Foreign proteins may fold incorrectly for several reasons: exposure of hydrophobic residues that are normally buried in the core of the protein, the absence of interactions that occur in the protein's normal environment, or inappropriate post-translational modifications.
Eukaryotic Expression System
Many of the challenges posed when cloning genes in a prokaryotic expression system can be overcome by using a eukaryotic expression system.
Eukaryotic expression systems include yeast, mammalian cells, and insect cells infected with baculovirus.
Advantages of using a eukaryotic expression system are:
- The protein product of the cloned gene is expressed at high levels.
- The proteins can be easily purified by using tags that are included in the vector itself, such as His and Myc.
Disadvantages of a eukaryotic expression system include:
- Eukaryotic cells grow slowly compared to prokaryotic cells.
Fusion Proteins
Proteins expressed by the expression vectors may be expressed as native polypeptides or fusion proteins.
Fusion proteins facilitate the purification and analysis of the protein.
Fusion proteins, also referred to as chimeric or hybrid proteins, are produced when the coding sequences of different genes are joined together and translated into a single polypeptide.
They protect the product of the gene of interest from the proteases present in the host cell.
The cloned gene product is more resistant to degradation when it is fused to a partner protein. When it is expressed on its own, it is vulnerable to proteolysis.
In a fusion vector system, the target gene is inserted into the coding sequence of a cloned host gene.
At the level of DNA, fusion proteins are constructed by ligating the coding sequences of different genes. Knowledge of the nucleotide sequence of the coding genes or segments is a prerequisite for ensuring that ligation gives rise to the correct reading frame.
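The reading-frame requirement can be checked before any ligation is attempted. The following sketch joins a fusion partner, an optional linker, and a target gene, and refuses to build the fusion if the upstream part would shift the frame or contains a stop codon. The function names and all sequences are hypothetical; the linker used in the example is simply a BamHI site (GGATCC), which happens to encode Gly-Ser.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list:
    """Split a sequence into consecutive codons, ignoring any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def fuse_in_frame(partner: str, target: str, linker: str = "") -> str:
    """Join partner + linker + target, checking that the target stays in frame."""
    partner, target, linker = partner.upper(), target.upper(), linker.upper()
    upstream = partner + linker
    if len(upstream) % 3 != 0:
        raise ValueError("partner + linker must be a multiple of 3 nt to keep the target in frame")
    if any(c in STOP_CODONS for c in codons(upstream)):
        raise ValueError("a stop codon upstream of the target would truncate the fusion")
    return upstream + target


# Toy sequences: partner without its stop codon, BamHI-site linker, target with stop.
fusion = fuse_in_frame("ATGGCTAGC", "ATGAAATAA", linker="GGATCC")
print(fusion)
```

A partner sequence of 8 nucleotides, for example, would raise an error here because every codon of the target would be read one base out of frame.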