Why isoleucine?
Harris Wang, a systems and synthetic biologist at Columbia University, has long been interested in whether organisms can function with fewer amino acids. The first forms of life could not have emerged with a complete set of twenty — evolution must have started from something simpler.
His team first analyzed how often individual amino acids are naturally replaced by others in proteins. Isoleucine was by far the most frequently substituted amino acid — typically replaced by the structurally similar valine or leucine. This made it an ideal candidate for experimental elimination.
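The kind of tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `substitution_counts` is hypothetical, the aligned sequences are made-up toy data, and nothing here reflects the team's actual pipeline or dataset.

```python
from collections import Counter


def substitution_counts(reference: str, homologs: list[str]) -> Counter:
    """Count (reference residue, replacement residue) pairs across aligned sequences."""
    counts: Counter = Counter()
    for seq in homologs:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the same length")
        for ref_aa, aa in zip(reference, seq):
            # Skip identical positions and alignment gaps ("-").
            if aa != ref_aa and aa != "-" and ref_aa != "-":
                counts[(ref_aa, aa)] += 1
    return counts


# Toy alignment, invented purely for illustration (I = isoleucine, V = valine, L = leucine).
reference = "MIKIAL"
homologs = ["MVKIAL", "MIKLAL", "MVKVAL"]

counts = substitution_counts(reference, homologs)

# Total number of times each reference residue was replaced by anything else.
replaced = Counter()
for (ref_aa, _), n in counts.items():
    replaced[ref_aa] += n

print(counts.most_common())
print(replaced.most_common(1))
```

In this toy data, isoleucine is the only residue that ever changes, mostly to valine, which mirrors the pattern the article describes.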
The team then chose one of the most difficult possible targets: the ribosome, a giant complex of RNA and proteins that serves as the cell's protein factory. "Eliminating an amino acid from the ribosome is almost the hardest thing imaginable. It's the largest and most complex protein complex in the cell," explains Wang.
How artificial intelligence entered the game
The scientists modified the genes encoding all 50 ribosomal protein subunits that contain isoleucine. In each gene, they rewrote the codons for isoleucine as codons for valine. For 18 of the 50 subunits, this direct exchange was sufficient, and the resulting bacterial strains grew normally.
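At the DNA level, the direct exchange amounts to swapping codons in the reading frame. The sketch below is a minimal illustration, assuming the standard genetic code (isoleucine: ATT, ATC, ATA; valine: GTT, GTC, GTA, GTG). The particular mapping, which changes only the first base, is an assumption made for simplicity. The article does not say which valine codons the team chose, and the example sequence is invented.

```python
# Assumed mapping: each isoleucine codon becomes the valine codon that differs
# only in its first base. The study's actual codon choices are not stated.
ILE_TO_VAL = {"ATT": "GTT", "ATC": "GTC", "ATA": "GTA"}


def recode_ile_to_val(cds: str) -> str:
    """Replace every in-frame isoleucine codon in a coding sequence with a valine codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split into codons so that only in-frame triplets are touched.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(ILE_TO_VAL.get(codon, codon) for codon in codons)


# Invented toy gene: Met-Ile-Lys-Ile-Ala-Ile-Stop.
example = "ATGATTAAAATCGCTATATAA"
recoded = recode_ile_to_val(example)
print(recoded)
```

Splitting by reading frame matters: the stretch `AAAATC` contains the letters `AAT` and `ATC` across codon boundaries, and only the in-frame `ATC` should be changed.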
The remaining 32 subunits, however, could not tolerate such a simple modification — the proteins lost their spatial structure and ceased to function. This is where artificial intelligence tools for protein structure prediction came into play, operating on a principle similar to Google DeepMind's AlphaFold or Meta AI's ESMFold.
AI models proposed compensatory changes: sometimes it was enough to swap amino acids in close proximity to the replaced isoleucine, other times it was necessary to modify a distant amino acid that interacts with the problematic site in the folded protein. In several cases, scientists had to resort to systematic testing — replacing one amino acid after another until they found a combination that restored bacterial growth.
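The systematic testing mentioned above is, at its core, an enumeration of candidate variants that are then tried in the lab. The following sketch lists every single-residue substitution at a set of chosen positions, using only the 19 amino acids other than isoleucine. The function name, the toy sequence, and the chosen positions are all hypothetical. The growth test itself is an experiment and is not modeled here.

```python
from typing import Iterator

# The 20 standard amino acids in one-letter code, minus isoleucine (I).
AMINO_ACIDS_WITHOUT_ILE = "ACDEFGHKLMNPQRSTVWY"


def single_substitution_variants(
    sequence: str, positions: list[int]
) -> Iterator[tuple[int, str, str, str]]:
    """Yield (position, original, replacement, variant) for each single substitution."""
    for pos in positions:
        original = sequence[pos]
        for aa in AMINO_ACIDS_WITHOUT_ILE:
            if aa != original:
                yield pos, original, aa, sequence[:pos] + aa + sequence[pos + 1:]


# Invented toy protein in which isoleucine has already been replaced by valine.
protein = "MVKVAL"
# Hypothetical positions (0-based) near the replaced site to scan.
variants = list(single_substitution_variants(protein, [2, 4]))

print(len(variants))
print(variants[0])
```

Each candidate in such a list would still have to be built and checked for whether it restores bacterial growth, which is why the article describes this route as a last resort.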
After successfully modifying all proteins individually, the team attempted to combine 21 rewritten ribosomal proteins in a single bacterium. It didn't work the first time, but after further modifications a strain emerged that grew, albeit more slowly than standard E. coli.
What this means for science and biotechnology
"It's truly a monumental achievement — to trim the alphabet of life to 19 amino acids," comments Christopher Snow, a protein engineer at Colorado State University, who was not involved in the research. According to him, the study "deepens our understanding of life's design rules."
The research has several practical implications. Organisms with reduced dependence on specific amino acids could better survive in hostile environments or resist viral infections, Wang speculates. If a bacterium no longer needs isoleucine, a virus that hijacks its protein synthesis machinery but still requires isoleucine for its own proteins could struggle to replicate.
Even more interesting is the second implication: by removing isoleucine, the DNA sequences that normally code for this amino acid are "freed up." These freed codons could be reassigned to code for other, perhaps synthetic, amino acids, explains Tom Ellis from Imperial College London. This would allow the creation of proteins with new functions — for example, more stable drugs or enzymes for industrial use.
Certain synthetic amino acids are already used in some drugs today, for example in semaglutide, the active ingredient in several weight-loss and type 2 diabetes medications. These so-called non-canonical amino acids increase the stability of such compounds in the body. Wang's research could push this technology to a new level.
450 generations of stability — what's next?
An interesting finding is the stability of the changes made. After 450 generations, the modified E. coli did not revert to using isoleucine in the altered proteins. This, according to Snow, "supports the idea that early life probably managed with a smaller set of colors on its palette for some time."
Wang's team now plans to apply the approach to the entire E. coli genome and, in the future, perhaps attempt a bacterium functioning with only 18 amino acids. The work shows that AI tools for protein structure prediction — which were considered purely academic curiosities just a few years ago — are finding concrete application in synthetic biology and genetic engineering.
For the Czech and European scientific scene, the research is relevant on several levels. The European Union regulates genetically modified organisms (GMOs) through Directive 2001/18/EC and related regulations — any organism with such a modified genome would be subject to a strict approval process. At the same time, the EU invests in synthetic biology within the Horizon Europe program, and similar research is also being conducted at European institutions, for example, at EMBL (European Molecular Biology Laboratory) in Heidelberg or at ETH Zürich.
AI as an indispensable tool in modern biology
Wang's experiment is another example of how artificial intelligence is becoming an indispensable tool in laboratory science. Before 2020, when AlphaFold 2 set a new standard for protein structure prediction accuracy, designing such complex protein mutations would have taken months of experimental work. Today, AI can propose candidate designs in a matter of hours.
Similar tools are now freely available to researchers worldwide, including Czech scientific institutions. AlphaFold and its protein structure database are publicly accessible, ESMFold from Meta AI is open-source, and many other tools offer free academic licenses. For Czech researchers in molecular biology, bioinformatics, and synthetic biology, this means access to technologies that did not exist five years ago.
Could a bacterium with 19 amino acids survive outside the laboratory?
Not yet. The modified E. coli strains in the current experiment grow more slowly than standard bacteria and were modified only in a subset of ribosomal proteins. For survival in a natural environment, the elimination of isoleucine would have to be completed throughout the entire genome, which is the next step in the research. Furthermore, any release of such an organism would be subject to strict EU GMO regulations.
What specific AI tool was used in the experiment?
According to the Science article, the authors of the study used tools that predict protein structure from amino acid sequences and vice versa, that is, technologies similar to Google DeepMind's AlphaFold. The specific implementation is not publicly specified, but in principle these are the same deep-learning models that are now standard in structural biology. In expert circles, this type of AI is described as "protein structure prediction tools" (sequence to structure) or "inverse folding models" (structure to sequence).
Is this research relevant to medicine?
Yes, in two ways. Firstly, understanding how proteins tolerate amino acid substitutions helps in designing more stable therapeutic proteins and enzymes. Secondly, freeing up codons (the three-letter DNA "words" that normally code for isoleucine) would allow these sequences to be reassigned to synthetic amino acids. Non-canonical amino acids already appear in drugs such as semaglutide (the active ingredient in Ozempic and Wegovy), where they increase the stability of the molecule in the bloodstream.