Boğaziçi University scientists aim to decode the language of proteins using artificial intelligence

The "Language of Life" (LifeLU) project is led by Associate Professor Arzucan Özgür from the Department of Computer Engineering at Boğaziçi University. The project aims to decode the language of proteins, the basic building blocks of life, using artificial intelligence algorithms that the team will develop. Funded under the European Research Council's prestigious Consolidator Grant programme (ERC Consolidator Grant), the project could, if successful, pave the way for a better understanding of diseases such as cancer.

Serkan Karakoyun

The project "Understanding the Language of Life: Identifying and Characterising Language Units in Protein Sequences (LifeLU)", led by Assoc. Prof. Dr. Arzucan Özgür and her team from the Department of Computer Engineering at Boğaziçi University, has been selected by the European Commission to receive a European Research Council (ERC) Consolidator Grant.

Under this competitive and prestigious programme, the project will receive a grant of about EUR 2 million. The language of proteins, which are built from 20 standard amino acids, will be decoded using artificial intelligence algorithms being developed at Boğaziçi University.
Noting that the team will work intensively for five years to understand proteins in the way we understand human language, Assoc. Prof. Dr. Özgür describes the groundbreaking nature of the project as follows:

"Proteins play an important role in life-sustaining biological processes. Although proteins are three-dimensional molecules, they can also be represented textually as amino acid sequences. In other words, protein sequences can be thought of as texts written in a special language, the language of life. The main objective of the LifeLU project can be summarized as follows: to develop innovative methods that identify the smallest meaningful units of proteins, like words in human language, together with their meaning and function, and so provide a basis for a better understanding of the language of life. If this high-risk project succeeds, it will be groundbreaking. This ambitious challenge is what led to the project being selected for the European Research Council's Consolidator Grant. Our PhD students Gökçe Uludoğan, Burak Suyunu and Enes Taylan are also part of the project team."
According to Özgür, deciphering the language of proteins will build a linguistic bridge between humans and their basic organic structures. Assoc. Prof. Dr. Özgür explained that this could change our daily lives, adding, "If we understand what the proteins in our bodies are telling us, we can more easily understand diseases like cancer, which can turn our lives upside down. Just as the bases in DNA are represented by letters, each of the standard amino acids corresponds to a letter. We can think of proteins as texts written with a 20-letter alphabet. With the artificial intelligence algorithms that our team will develop, we want to understand which meaningful words, and even sentences, these letters form. For this reason, our project is interdisciplinary and involves not only computer science but also many other fields, such as molecular biology."

Who is Assoc. Prof. Dr. Arzucan Özgür?

Assoc. Prof. Dr. Özgür is one of the founding members of the Text Analytics and Bioinformatics Research Laboratory (TABILAB) at Boğaziçi University. She received her PhD in Computer Science and Engineering from the University of Michigan, USA, in 2010, and her BSc and MSc degrees in Computer Engineering from Boğaziçi University in 2002 and 2004, respectively. Her research interests include bioinformatics and natural language processing. She has received the Science Academy Young Scientist Award (BAGEP 2016), the Turkish Academy of Sciences Outstanding Young Scientist Award (TÜBA-GEBİP 2019), and the Boğaziçi University Foundation (BÜVAK) Outstanding Achievement in Research Award. Assoc. Prof. Dr. Arzucan Özgür has been a faculty member in the Department of Computer Engineering at Boğaziçi University since 2011.