Bioinformaticians from the Institute of Mathematical Problems of Biology of the Russian Academy of Sciences and MIPT have discovered previously unknown features in the structure of genes associated with brain function. The study was published in the journal PLOS ONE.
DNA encodes information about the structure and functioning of living organisms. In this “book of life”, information about all the proteins and RNA produced in the cell is recorded sequentially, nucleotide by nucleotide. The DNA fragment that encodes a single protein is called a gene, and the rules for “translating” a DNA sequence into the amino acid sequence of a protein are called the genetic code. The main properties of the genetic code were discovered back in the 1960s, and one of the most important is its triplet nature: three consecutive nucleotides (a codon) encode one amino acid. For example, the nucleotide sequence ATG (adenine-thymine-guanine) encodes the amino acid methionine, which is usually the first amino acid of a protein at the synthesis stage.
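The triplet property can be illustrated with a short Python sketch. It is only an illustration, not part of the study: the function name `translate` and the deliberately partial codon table are assumptions made here, and the real genetic code has 64 codons.

```python
# Partial codon table for illustration only; the full genetic code has 64 codons.
CODON_TABLE = {
    "ATG": "Met",  # methionine, the usual start of a protein
    "GCT": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def translate(dna):
    """Read a coding DNA sequence three nucleotides (one codon) at a time."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon, "???")  # codon missing from the toy table
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGGCTAAATGGTAA"))  # ['Met', 'Ala', 'Lys', 'Trp']
```

Each step of the loop consumes exactly three nucleotides, which is why shifting the reading position by one or two nucleotides changes every codon that follows.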
Since the discovery of the genetic code, a great deal has been learned about the gene structure of living organisms. It has become known that the genes of eukaryotes (organisms whose cells have a nucleus) are “fragmented”: inside a gene, non-coding fragments of DNA called introns lie between the coding regions, which are called exons. When the RNA of such a gene matures, the introns are cut out and the exons are joined together, a process called splicing.
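As a rough sketch of what splicing does to a sequence, the Python snippet below cuts the introns out of a toy gene and joins the exons. The sequence, the exon coordinates, and the function name `splice` are hypothetical and chosen only for illustration; for simplicity the sketch works on the DNA alphabet rather than on RNA.

```python
def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) and drop everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical toy gene: uppercase letters are exons, lowercase letters are introns.
gene = "ATGGCT" + "gtaagtccag" + "AAATGG" + "gtctttacag" + "TAA"
exons = [(0, 6), (16, 22), (32, 35)]

mature = splice(gene, exons)
print(mature)  # ATGGCTAAATGGTAA
```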
Scientists have put forward various hypotheses about how and when introns arose. In particular, the presence of introns makes alternative splicing possible: the selective joining of different exons, which allows different protein sequences to be obtained from a single gene. As a result, the number of different proteins in cells far exceeds the total number of genes. Another important mechanism of gene evolution in which introns participate is so-called “exon shuffling”. In this process, for example, an “extra” exon can be inserted between two other exons of a gene during recombination. In this way, new genes are created.
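To see why alternative splicing lets one gene yield several products, consider the simplified Python sketch below. It assumes a toy model in which the first and last exons are always kept and each internal exon may be either included or skipped; real splicing is regulated in far more complex ways, and the exon sequences and the function name `isoforms` are hypothetical.

```python
from itertools import combinations

def isoforms(exons):
    """Yield spliced sequences that keep the first and last exon and
    include any subset of the internal exons, preserving their order."""
    first, *internal, last = exons
    for k in range(len(internal) + 1):
        for chosen in combinations(internal, k):
            yield "".join([first, *chosen, last])

# Hypothetical exons; the internal ones are 3 nucleotides long,
# so skipping them does not shift the reading frame.
exon_seqs = ["ATGGCT", "AAA", "TGG", "TAA"]
variants = list(isoforms(exon_seqs))
print(variants)
# ['ATGGCTTAA', 'ATGGCTAAATAA', 'ATGGCTTGGTAA', 'ATGGCTAAATGGTAA']
```

In this toy model, two optional exons already give four different mature sequences from a single gene.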
Because complete genome sequences are now available for many organisms, scientists can analyze the evolution of introns in detail. Introns are known to differ in length (from several tens to several hundred thousand nucleotide pairs) and in phase. Phase 0 introns are located between codons, phase 1 introns after the first nucleotide of a codon, and phase 2 introns after the second nucleotide.
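The definition of phase reduces to simple arithmetic: count the coding nucleotides that precede the intron and take the remainder after division by three. The Python sketch below shows this under the assumption that the coding length of each exon is known; the function name `intron_phases` and the example lengths are hypothetical, not taken from the study.

```python
def intron_phases(exon_coding_lengths):
    """Return the phase of each intron, given the coding lengths of consecutive exons.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first nucleotide of a codon.
    Phase 2: the intron follows the second nucleotide of a codon.
    """
    phases = []
    coding_so_far = 0
    for length in exon_coding_lengths[:-1]:  # there is no intron after the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases

# Hypothetical gene with four exons and therefore three introns.
print(intron_phases([6, 7, 5, 3]))  # [0, 1, 0]
```

Here the first intron follows 6 coding nucleotides (two full codons, phase 0), the second follows 13 (four codons plus one nucleotide, phase 1), and the third follows 18 (phase 0).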
Bioinformatics specialists from MIPT and IMPB RAS analyzed the relationship between the length and phase of introns in humans and mice.
“Before us, it never occurred to anyone to study the relationship between the length of an intron and its phase, because common sense says that there should be no relationship between them (just as there is none between a person's height and the color of their eyes, for example),” comments Eugene Baulin, a researcher at the Laboratory of Applied Mathematics, IMPB RAS, and the Department of Algorithms and Programming Technology, MIPT.
To the authors’ surprise, they discovered a group of genes containing unusually long (more than 50 thousand nucleotide pairs) phase 1 introns. These genes proved to be associated with the transmission of nerve impulses in the brain.
After a detailed analysis of many scientific publications, the researchers were able to piece disparate facts together into a coherent picture. It turned out that the presence of phase 1 introns in this group of genes is explained in most cases by a special amino acid sequence at the beginning of the proteins, a signal peptide. The task of this peptide is to direct the protein to its place of work, which in the case of nerve cell receptors is the plasma membrane. The considerable length of these introns is also indirectly related to the signal peptide. The signal peptide in such proteins is always located at the beginning of the molecule, so the DNA fragment encoding it is always located at the beginning of the gene. And it is at the beginning of a gene that long introns are very often found, because they contain regulatory DNA sequences important for the synthesis of the protein.
As a result of this work, the authors were able to put together a coherent and complete picture of the mechanism of exon shuffling and of the role that long phase 1 introns play in it.