Skip to main content

Using Deep Learning to Accelerate Protein-Folding Prediction

By Jas Mehta

For decades, a fundamental question in biology remained largely unanswered: how do proteins fold? Proteins, large, complex molecules, play crucial roles in virtually every biological process within our cells. These building blocks, the workhorses of our cells, contort their amino acid chains into intricate 3D shapes that dictate their function. Unveiling these structures has been a slow and expensive endeavor, hindering progress in medicine, drug discovery, and our understanding of life itself.

Researchers have grappled with the challenge of deciphering protein structures using methods such as X-ray crystallography and computational modeling. However, these approaches often fell short in terms of accuracy and efficiency. Scientists and software developers using artificial intelligence (AI) concepts are creating powerful new tools to address this challenge. One example, called AlphaFold, was developed by the DeepMind subsidiary of Google’s parent company, Alphabet. AlphaFold represents a paradigm shift in protein structure prediction, building upon decades of research engaging with the intricate puzzle of protein folding, and leveraging the power of deep learning to achieve near-atomic accuracy in predicting 3D protein structures from amino acid sequences. (See the image below for an example of how researchers are computing protein structure from amino acid sequence data.)

Computing protein structure from amino acid sequence


This breakthrough has not only streamlined the process, reducing prediction times from months to minutes, but has also opened new avenues for drug discovery and biomedical research, promising to revolutionize our understanding of proteins and their functions within cells. This represents a monumental leap compared to traditional methods like X-ray crystallography, which can take months or even years. This breakthrough not only accelerates research cycles and slashes costs but also holds profound implications for fields ranging from medicine to materials science.

In the 14th Critical Assessment of Protein Structure Prediction (CASP), a biennial competition, AlphaFold achieved a staggering feat. It matched or surpassed the accuracy of experimental methods for a whopping 90% of proteins, showcasing the immense power of deep learning for this complex task. Historically, determining a protein structure could cost upwards of $100,000 and take months. AlphaFold slashes this time to minutes, with a projected cost per prediction of mere cents. This translates to significant cost savings and faster research cycles.

Designing drugs often hinges on knowing a protein’s structure. AlphaFold’s speed and accuracy streamline this process. A recent study used AlphaFold to identify a potential drug target for a baffling neurodegenerative disease, a process that would have taken significantly longer using traditional methods. Moving beyond snapshots, the next frontier is understanding how proteins fold, move, and interact within the cell. This will provide invaluable insights into cellular processes and protein function. Deep learning thrives on data. Integrating protein interaction databases, cellular environment data, and real-time folding kinetics will further enhance the accuracy and applicability of protein structure prediction. Open-source platforms like AlphaFold are making these powerful tools accessible to researchers worldwide. This fosters collaboration and accelerates scientific progress across disciplines.

The success of AlphaFold stands as a testament to the indispensable role played by the Protein Data Bank (PDB), a vast repository housing experimentally determined protein structures. Mr. Darnell Granberry, a distinguished machine-learning (ML) engineer at the New York Structural Biology Center, sheds light on the critical importance of open data in driving groundbreaking advancements in protein research. “The PDB contains nearly all of the protein structures that have been experimentally determined, and the fact that it’s open source is a major enabler of AlphaFold and other protein ML models,” remarks Mr. Granberry. “If we didn’t have it, I think we’d likely have been limited to in-house models developed at pharma/biologics companies on proprietary data.”

His insights offer a nuanced understanding of the symbiotic relationship between computational methods and protein research, emphasizing the transformative impact of accessible data on scientific innovation. Furthermore, Mr. Granberry eloquently articulates a foundational principle of biology, stating, “There’s that central dogma of biology: DNA to RNA, RNA to protein, protein to function. So basically, anything that you’re interested in, basically in any living thing, is going to be rooted in some sort of protein or complex of proteins, or collection of them that interact with each other.”

In his words, we discern a profound appreciation for the pivotal role played by proteins in shaping the essence of life itself, underscoring the fundamental importance of unraveling their structures and functions in driving progress across diverse realms of scientific inquiry.

In a recently published study, researchers used AlphaFold to predict the structure of a protein implicated in amyotrophic lateral sclerosis (ALS), a debilitating neurodegenerative disease. The predicted structure revealed a never-before-seen binding site, paving the way for the design of drugs that could potentially slow or halt disease progression. This exemplifies AlphaFold’s potential to revolutionize drug discovery, particularly for complex and previously untreatable diseases.


The chart above depicts the median accuracy of protein-folding predictions in the free-modeling category of the CASP competition over the years. As you can see, there was a significant jump in accuracy in 2018 and 2020, coinciding with the introduction of DeepMind’s AlphaFold systems. This dramatic improvement highlights the transformative power of deep learning in protein-folding prediction.

Deep learning has irrevocably transformed protein-folding prediction. As we delve deeper into protein dynamics and leverage the power of big data, the potential applications are truly boundless. From developing new medicines and biomaterials to a fundamental understanding of how life works at the molecular level, AlphaFold and its successors promise to usher in a new era of biological discovery.

Get Involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities. The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Indiana University, Iowa State University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

Toward Building Quality Relationships: How Chatbots Can Help Us Practice Self-Disclosure

By Qining Wang

Under the turmoil of social events, from global pandemics to wars and social unrests, mental health is becoming an increasingly greater concern among the public.

According to the Anxiety and Depression Association of America (AADA), anxiety disorders are the most common mental illness in the USA, affecting 40 million adults. Another common mental health illness, depression, affects 16 million adults in the USA, according to statistics from the Centers for Disease Control and Prevention (CDC). The greater awareness and gradual destigmatization of mental health issues have led more people to seek professional help to improve their overall mental well-being.

When working with mental health professionals, self-disclosure is vital to finding the roots and triggers of mental health issues. Self-disclosure is a process through which a person reveals personal or sensitive information to others. It is a crucial way to relieve stress, anxiety, and depression.

Meanwhile, self-disclosure is a skill that one needs to cultivate through practice. It’s a skill we can only practice through constant self-exploration and the courage to be vulnerable.

To investigate alternative ways of practicing self-disclosure, a research team at the University of Illinois at Urbana-Champaign (UIUC) explored chatbots and conversational AIs as potential mediators in the self-disclosure process in a study in 2020. The team leader, Dr. Yun Huang, is an assistant professor in the School of Information Sciences at UIUC and the co-director of the Social Computing Systems (SALT) Lab. The team is mainly interested in context-based social computing system research.

Chatbots are ubiquitous in today’s online world. They are computer programs interacting with humans back-and-forth, like having a conversation. Some chatbots are task-oriented. An example can be a frequently-asked-questions (FAQ) chatbot that recognizes the keywords a person types and spits out a preset answer according to the keywords. Other more sophisticated chatbots, such as Apple’s Siri and Amazon’s Alexa, are data-driven. They are more contextually aware and can tailor their responses based on user input. Both are ideal qualities for designing an empathetic and tone-aware chatbot capable of self-disclosure.

As such, Dr. Huang’s team built a self-disclosing chatbot that can engage in conversation more naturally and spontaneously. The chatbot would initiate self-disclosure during small-talk sessions. It would gradually move to more sensitive questions that encourage users to self-disclose.

To study how chatbots’ self-disclosure can affect humans’ willingness to self-disclose, the team recruited university students and divided them into three groups. Each group would interact with the chatbot at different levels of self-disclosure, from no self-disclosure to low and high levels of self-disclosure.

During the four-week study, the student participants would interact with the chatbot every day for 7–10 minutes. At the end of the third week, the chatbot would recommend that students interact with a human mental health specialist. The researchers would then evaluate students’ willingness to self-disclose to the professional.

The team found that the groups that self-disclosed to the chatbot reported greater trust in the mental health professional than the control group. Participants felt “confused” when the chatbot brought up the human professional. In the experimental groups, they felt that they could listen to the chatbot and share sensitive experiences.

The team noted that, for participants interacting with the chatbot with the highest level of self-disclosure, their trust for the mental health professional stemmed from the trust of the chatbot. Participants’ trust was mainly directed toward the research team and professionals behind the chatbot for the other two groups.

This study highlights how chatbots can be a great tool to help users practice self-disclosure, making them more comfortable seeking human professionals. It is worth noting that, regardless of how sophisticated chatbots can be, they are just mediators between users and mental health professionals.

At the end of the day, the most meaningful kind of self-disclosure can only be found through care, empathy, and understanding. Human to human.

Get Involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities. The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Indiana University, Iowa State University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.