By Ken Ogata
Artificial intelligence (AI) is advancing in language and vision in ways that were unimaginable just a few years ago. At the forefront of this technological revolution is Ali Taghibakhshi, a Deep Learning Algorithm Engineer at NVIDIA, whose work epitomizes the blending of these two realms.
Taghibakhshi describes his role at NVIDIA as a mixture of research and engineering centered on large-scale generative vision and language models. In the vocabulary of AI and machine learning, he primarily works with large language models (LLMs) and multimodal generative AI models. In other words, he helps create methods for machine-learning models to generate accurate, high-quality images from a text input.
While text-to-image can be a harder concept to grasp than text-to-text, Taghibakhshi states that the machine-learning methods used for text-to-image models are not so different.
“The components are the same for both,” Taghibakhshi said. “They all use the transformer architecture that has been revolutionizing the field since its introduction in 2017. Although [text-to-text and text-to-image] are different modalities, they still have a lot of things in common. Essentially, you’re combining these two modalities, and they have to be in the same space.”
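To make “the same space” concrete, here is a minimal sketch, not NVIDIA’s or Taghibakhshi’s code, of how a text encoder and an image encoder can each project their inputs into one shared embedding space, where matching captions and images land close together. The toy encoders, dimensions, and random data below are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoders; real systems use large pretrained transformers.
class TextEncoder(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, shared_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, shared_dim)

    def forward(self, token_ids):
        # Mean-pool token embeddings, then project into the shared space.
        pooled = self.embed(token_ids).mean(dim=1)
        return F.normalize(self.proj(pooled), dim=-1)

class ImageEncoder(nn.Module):
    def __init__(self, shared_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, stride=2)
        self.proj = nn.Linear(16, shared_dim)

    def forward(self, images):
        feats = self.conv(images).mean(dim=(2, 3))  # global average pool
        return F.normalize(self.proj(feats), dim=-1)

text_enc, image_enc = TextEncoder(), ImageEncoder()
tokens = torch.randint(0, 1000, (4, 12))  # a batch of 4 token sequences
images = torch.randn(4, 3, 32, 32)        # a matching batch of images

text_vecs, image_vecs = text_enc(tokens), image_enc(images)
# Cosine similarity between every caption and every image; training
# (e.g., a CLIP-style contrastive loss) pulls matching pairs together.
similarity = text_vecs @ image_vecs.T
print(similarity.shape)  # torch.Size([4, 4])
```

In production systems, both encoders are transformers trained contrastively on hundreds of millions of caption-image pairs, which is what makes the shared space meaningful.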
At NVIDIA, Taghibakhshi works on projects such as NeMo, a platform that allows individuals to develop custom generative AI models, ranging from language to vision and speech, starting from pretrained models. He is currently working on methods for fine-tuning text-to-image diffusion models to ensure more accurate image generation. (For more information, Taghibakhshi summarizes his team’s research in this NVIDIA Developer blog.)
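The fine-tuning Taghibakhshi describes rests on the standard denoising objective behind diffusion models: corrupt an image with noise and train the network to predict that noise, conditioned on the caption. The sketch below shows one such training step with toy stand-ins; the ToyUNet, the linear noise schedule, and the random data are assumptions for illustration, not the NeMo implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in: in a real pipeline the U-Net, text encoder, and noise
# schedule come from a pretrained checkpoint, not from scratch like this.
class ToyUNet(nn.Module):
    def __init__(self, cond_dim=32):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.cond_proj = nn.Linear(cond_dim, 3)

    def forward(self, noisy, t, cond):
        # Real U-Nets also embed the timestep t; here we inject only
        # the text conditioning, as a per-channel bias.
        bias = self.cond_proj(cond).view(-1, 3, 1, 1)
        return self.conv(noisy) + bias

unet = ToyUNet()
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)
images = torch.randn(4, 3, 32, 32)  # fine-tuning images (toy data)
cond = torch.randn(4, 32)           # stand-in text embeddings

# One step of the denoising objective: add noise to an image and train
# the network to predict that noise, given the text condition.
T = 1000
t = torch.randint(0, T, (images.shape[0],))
alpha = (1.0 - t.float() / T).view(-1, 1, 1, 1)  # simple linear schedule
noise = torch.randn_like(images)
noisy = alpha.sqrt() * images + (1 - alpha).sqrt() * noise

pred = unet(noisy, t, cond)
loss = F.mse_loss(pred, noise)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"denoising loss: {loss.item():.4f}")
```

Fine-tuning simply runs steps like this on a new, curated set of image-caption pairs so the pretrained model's generations drift toward the desired style or accuracy.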
NeMo follows in the footsteps of previous image-generation models created by NVIDIA, namely GauGAN, a generative adversarial network that allowed individuals to draw simple blobs on a screen, from which the model would produce a high-fidelity, picturesque landscape. The second version, GauGAN2, added a text-to-image feature, able to turn simple phrases such as “misty mountains covered in snow” or “sunset at rocky beach” into photorealistic images in real time. According to its creators, the model was named after the French post-impressionist painter Paul Gauguin.
Despite the exponential growth of AI and machine learning in recent years, there remains a great white whale that Taghibakhshi and other deep-learning engineers continue to pursue: getting AI to think outside the box.
“These models are good at interpolation. We provide all the data within a circle, and it learns that circle pretty well. However, [these models] can’t extrapolate. This isn’t limited to certain models; it applies to all machine-learning models in general,” Taghibakhshi said. “If you only train it on cat images, it’s never going to generate a horse or something like that.”
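His circle analogy is easy to reproduce in a toy experiment. The sketch below (an illustration, not his work) fits a polynomial to sin(x) on a limited range, then queries points inside and outside that range; the degree and ranges are arbitrary choices.

```python
import numpy as np

# Train on a limited "circle" of data: x in [-3, 3].
rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 200)
y_train = np.sin(x_train)

coeffs = np.polyfit(x_train, y_train, deg=9)  # the "model"
model = np.poly1d(coeffs)

for x in [1.5, 2.9, 6.0, 10.0]:
    print(f"x={x:5.1f}  true={np.sin(x):+.3f}  model={model(x):+10.3f}")
```

Inside the training range the fit is excellent (interpolation); beyond it the polynomial blows up (failed extrapolation), echoing Taghibakhshi's point that a model trained only on cats won't produce horses.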
In January 2024, Google DeepMind published a paper in Nature introducing AlphaGeometry, an AI model that can solve geometry problems at the level of an International Mathematical Olympiad gold medalist. While such models may seem to be thinking outside the box, Taghibakhshi explains that they are still far from it.
“It’s really impressive, but Mathematical Olympiad questions and their solutions are known, and it has been trained on thousands and thousands of problems. [AlphaGeometry] cannot solve unsolved problems in mathematics yet because again, they’re really good at interpolation and not extrapolation,” Taghibakhshi said.
The potential of AI to begin thinking outside the box and even surpass human intelligence is what many call “technological singularity”—a hypothetical point in time in the near future when technological growth becomes uncontrollable, whether that be to the benefit or detriment of civilization.
“Things are moving super fast. For example, I was reading a paper and we were trying to prove it, and then the next week, another paper with the same idea had already come out,” Taghibakhshi said. “The window is getting smaller and smaller for AIs to surpass human ability and we get the AGI that OpenAI is after.”
The “AGI” that Taghibakhshi mentions is short for artificial general intelligence, a type of AI that would perform cognitive tasks at a human level or better. It remains up for debate whether AGI could pose an existential threat to humanity.
“Not only is AI improving, but computing power is increasing every single day as well. So there’s a lot of things that promote each other,” Taghibakhshi said. “If you consider the videos that OpenAI’s Sora generated recently, versus the videos that were generated just one year ago, it’s amazing how different they are. Again, all these things are only five, six years old.”
While some AI researchers estimate that AGI could be achieved by 2050, there are many sectors of life that AI is influencing today, even in its current, interpolation-only form. One of the most controversial topics surrounding AI is its implications for the world of art. While Taghibakhshi agrees that AI will have a significant effect on human artists, he doesn’t believe artists will be replaced completely.
“I think [AI] will change the nature of how artists work. Maybe they [use AI] to narrow down to a certain style or ask it to redefine their work,” Taghibakhshi said. “I don’t think it will completely take away all artists. You don’t want a robot to start playing guitar for you.”
As we venture deeper into the terra incognita of the AI world, it remains up for debate whether the pursuit of AGI and superintelligent machine-learning models will benefit humanity or sink us all down with it. However, even after years of working with machine learning and mathematics, Ali Taghibakhshi’s sense of awe toward AI remains unclouded.
“Even though it’s stapled to the Earth and I know how these diffusion and language models work, it is still amazing. It doesn’t matter how much you understand these things. It’s still super magical to me.”
Get Involved
Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities. The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Indiana University, Iowa State University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.