
Physics-Based Machine Learning for Sub-Seasonal Climate Forecasting

By Raleigh Butler

We’ve all heard the old adage that if you don’t like the weather in the Midwest, wait a minute and it will change. So how can we possibly forecast conditions weeks in advance?

In 2019, an NSF collaborative grant was awarded to six institutions to study sub-seasonal climate forecasting (SSF) with machine learning (ML). This topic addresses three core themes of the Midwest Big Data Innovation Hub: resilient communities, digital agriculture, and cyberinfrastructure. The award, part of the NSF Harnessing the Data Revolution (HDR) program, went to researchers at the following six universities: University of Minnesota–Twin Cities, University of Chicago, University of Wisconsin–Madison, Carnegie Mellon University, George Mason University, and the University of Illinois at Urbana-Champaign.

What is Sub-Seasonal Climate Forecasting?

Sub-seasonal climate forecasting focuses on predicting weather 2–8 weeks away. Interestingly, this window is harder to forecast than either shorter or longer ranges. As the research team states on its website, “SSF is considered more challenging than either weather forecasting or even seasonal forecasting.” The project brings ML to bear on these difficult predictions, with agriculture among its key applications.

Computing’s Place in Forecasting

What is ML compared with deep learning (DL)? Machine learning builds methods that let machines “learn,” changing their procedures based on input over time. Deep learning is a specific type of ML built on artificial neural networks with many layers, an approach loosely inspired by how the human brain operates.

The SSF team’s article, linked below, discusses some of the difficulties in building these models. Many of them stem from the relationship between ML and physics, which is why researchers have developed approaches for both physics-guided ML and ML-enhanced physics. Here is how these approaches try to overcome the difficulties:

  • Physics-guided ML takes physics into account when producing output (for example, the forces that move clouds or the role of gravity in rainfall). Unfortunately, existing data that includes physics-related information is limited.
  • The other approach is ML-enhanced physics. One example among many is Monte Carlo Tree Search (MCTS), which here applies a hierarchical partition tree to the data. Rather than evaluating every possibility, MCTS repeatedly follows the sub-“branches” that look most promising, samples outcomes from them, and uses the results to decide where to search next. In short, MCTS works somewhat like a decision tree that is grown and refined toward the most likely path at each decision. A visual is provided in the image below.
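Physics guidance can take many forms. One of the simplest, shown here as a toy sketch rather than the SSF team's method, adds a penalty that pulls a fitted parameter toward a value known from physics. This helps most when data is scarce. The example estimates the gravitational acceleration g from three made-up measurements of distance fallen. The function name `fit_with_physics_prior`, the data, and the weight `lam` are all hypothetical.

```python
def fit_with_physics_prior(xs, ys, w_phys, lam):
    """Fit y ≈ w * x with a penalty pulling w toward a physics-based value.

    Minimizes  sum_i (y_i - w * x_i)**2 + lam * (w - w_phys)**2.
    Setting the derivative with respect to w to zero gives the closed form
        w = (sum_i x_i * y_i + lam * w_phys) / (sum_i x_i**2 + lam).
    lam = 0 is a purely data-driven fit; a large lam trusts the physics.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * w_phys) / (sxx + lam)


# Toy problem: distance fallen d = (g / 2) * t**2, so with x = t**2 / 2
# the slope w is the gravitational acceleration g.
# The measurements below are made up for illustration.
times = [0.5, 1.0, 1.5]
xs = [t * t / 2 for t in times]
distances = [1.3, 4.7, 11.6]
G_PHYSICS = 9.81  # known value of g in m/s^2

w_data = fit_with_physics_prior(xs, distances, G_PHYSICS, lam=0.0)
w_blend = fit_with_physics_prior(xs, distances, G_PHYSICS, lam=1.0)
print(f"data only: {w_data:.2f}, physics-guided: {w_blend:.2f}")
# prints: data only: 10.16, physics-guided: 10.02
```

With `lam = 0` the estimate follows the noisy data alone. As `lam` grows, it moves toward the physics value.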

Drawing of a decision-tree flowchart. Credit: Kelly Sikkema via Unsplash.
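As a rough illustration of the MCTS idea, here is a minimal UCT-style (UCB1) Monte Carlo tree search over a hierarchical partition of a 1-D interval. Each node is a region, and its children are the region's two halves. Random samples ("rollouts") inside a region update that region's statistics. The objective function, the halving rule, the depth limit, and the exploration constant `c = 1.4` are all assumptions for this sketch. It is not the SSF team's implementation.

```python
import math
import random


class Node:
    """One region [lo, hi) of a 1-D search space in the partition tree."""

    def __init__(self, lo, hi, parent=None):
        self.lo, self.hi = lo, hi
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0  # sum of rewards observed in this region

    def ucb(self, c=1.4):
        """UCB1 score: average reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf  # try every region at least once
        mean = self.total / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts_search(f, lo, hi, iterations=500, max_depth=10, seed=0):
    rng = random.Random(seed)
    root = Node(lo, hi)
    best_x, best_y = None, -math.inf
    for _ in range(iterations):
        # 1. Selection: follow the highest-UCB child down to a leaf.
        node, depth = root, 0
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
            depth += 1
        # 2. Expansion: split a previously visited leaf into two halves.
        if node.visits > 0 and depth < max_depth:
            mid = (node.lo + node.hi) / 2
            node.children = [Node(node.lo, mid, node), Node(mid, node.hi, node)]
            node = node.children[0]
        # 3. Simulation: score one random point inside the chosen region.
        x = rng.uniform(node.lo, node.hi)
        y = f(x)
        if y > best_y:
            best_x, best_y = x, y
        # 4. Backpropagation: update statistics from the leaf up to the root.
        while node is not None:
            node.visits += 1
            node.total += y
            node = node.parent
    return best_x, best_y


def objective(x):
    # Hypothetical stand-in for a forecast-skill score; it peaks at x = 0.7.
    return -(x - 0.7) ** 2


best_x, best_y = mcts_search(objective, 0.0, 1.0, iterations=500)
print(f"best x is about {best_x:.3f}")
```

The UCB1 bonus keeps rarely visited regions in play, while high-reward regions get split more finely. This is the "follow the most promising branches" behavior described above.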

Sub-Seasonal Agriculture

How does this tie into agriculture? First, we will examine the key planning that takes place during sub-seasonal periods. According to a graph on the SSF project site, these are some important decisions that are made during those periods:

  • Maritime Planning: Designate ship routing
  • Agriculture: Schedule planting
  • Agriculture: Irrigate and apply nutrients
  • Emergency Management: Pre-stage emergency supplies
  • Aviation: Plan evacuations and sorties
  • Water Resources: Manage reservoir levels for flood control
  • Energy: Plan for spikes in energy demand

Making these decisions is a delicate process, and there is a high price to pay if predictions are incorrect. Better sub-seasonal forecasts are, of course, financially valuable, but they also improve the production and delivery of goods.

These studies have resulted in several scientific publications since the grant was awarded. One of these papers, written by many members of the original team, is available here (as a PDF download). The paper, published in June 2020, discusses challenges, analyses, and advances associated with ML climate forecasting. It includes several diagrams showing how various models predict sub-seasonal weather differently, and it compares forecasts across climate zones (over the ocean and over different areas of land).

Scientists are still collecting data to use as input for the models and to increase accuracy. As mentioned, this area of forecasting is more difficult than forecasting over time horizons that are nearer or further away. Although climate prediction may still be difficult, there is progress being made in the field. The paper mentioned above states, “Overall, XGBoost and Encoder (LSTM)-Decoder (FNN) perform the best. Qualitatively, coastal and south regions are easier to predict than inland regions (e.g., Midwest).”

Get Involved

Learn more about the SSF project on their site.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities. The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Iowa State University, Indiana University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

University of Nebraska researchers extend smart rural bridge health initiatives

By Raleigh Butler

Did you know that, despite advances in technology, bridge health across the United States is declining? Bridges currently score a C on the American Society of Civil Engineers’ Infrastructure Report Card, a drop from their grade on the previous report card.

Within the Midwest, the percentage of structurally deficient bridges varies widely by state:

  • Iowa has the largest percentage, 19.0%.
  • Minnesota has the smallest percentage, 4.7%.

The Midwest Big Data Innovation Hub’s Smart & Resilient Communities priority area spans a range of disciplines, sectors, data, and cyberinfrastructure in its work to connect researchers and practitioners focused on community resilience. Bridges play key roles in community planning, resilient supply chains for food and goods, and transportation capacity management.


In 2018, a new regional innovation center project, “Smart Big Data Pipeline for Aging Rural Bridge Transportation Infrastructure (SMARTI),” was funded by a $1 million National Science Foundation (NSF) grant. The grant was aimed toward “rural bridge health management” and included faculty from both the University of Nebraska–Lincoln (UNL) and University of Nebraska Omaha (UNO). The work began with a planning grant in 2016, and both awards were part of the NSF’s Big Data Spoke program, in collaboration with the regional Big Data Innovation Hub program.

The principal investigator for the project, Robin Gandhi, is from UNO’s College of Information Science and Technology. The 16-member research team also includes Daniel Linzell and Chungwook Sim, both from UNL’s College of Engineering.

The SMARTI project focused on “mining existing data sets from private, state and federal partners, as well as collect[ing] new data through sensors on targeted rural bridges throughout Nebraska.” The outputs of this work were presented through workshops and made available to researchers through the Bridging Big Data website.

“Our government and industry partners can better manage their aging rural bridges, improve their health and ultimately keep people safe using data and tools developed from our research,” said Robin Gandhi. “We continue to engage stakeholders through companion research projects and by presenting our work at relevant technical meetings and conferences. For example, we will be presenting at the Midwest Bridge Preservation Partnership, the American Society of Civil Engineers Structures Congress in April, and the International Association for Bridge Management and Safety Conference in July 2022.”

Student engagement

Six students from both the Lincoln and Omaha campuses who are working on these projects presented their research in October 2021 at the Midwest Big Data Innovation Hub’s Regional Community Meeting, with a focus on the data sets and data science tools that are important to this work. Recordings of their presentations are available on the MBDH YouTube channel.

Next steps

Approximately three years after the start of the SMARTI project, the Nebraska team was awarded $5 million by the U.S. Army Corps of Engineers, part of the Department of Defense, for research to extend the lifespan of bridges through new monitoring technology. This award was announced in October 2021.

The researchers will continue their work on bridge safety. The team will use sites in rural Nebraska as testbeds where data can be collected safely, and will also analyze “socio-technical impacts such as fairness of data, algorithms, and analysis; and intelligent decision-making and support systems.”

“This project brings bridge owners, designers, and builders, big data solution providers, and academics together to discuss data-informed bridge infrastructure health and resilience in times of crisis,” said Daniel Linzell. “Attendees at our last workshop heard from several stakeholders about the pandemic’s impact on bridge infrastructure resilience from design, sensing, economic, and socio-political perspectives. Discussions such as these keep the research team focused on the importance of the work: developing sensing and big data technology applications that support smart, resilient, big data pipelines for aging rural bridge transportation infrastructure; highlighting solutions to data discovery and controlled sharing challenges; and unveiling novel data-driven decision-making tools.”

Get involved

New activities to build the community of Midwest researchers and practitioners in the Smart & Resilient Communities priority area of the Midwest Big Data Innovation Hub are beginning in spring 2022. Contact the Midwest Big Data Innovation Hub if you’re interested in participating, or aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

Professor Kimberly Zarecor on Community-Based Research and Building Interdisciplinary Research Teams

By Qining Wang

An expert in Eastern European architecture, Professor Kimberly Zarecor tells us about her journey of building a highly interdisciplinary research team that brings data science into research on rural communities in Iowa.

Kimberly Zarecor

To some, architectural history and data science research may sound like oil and water—two fields that are almost impossible to mix well. However, Kimberly Zarecor, professor of Architecture at Iowa State University (ISU), leads her research team to create the perfect emulsion of many seemingly unrelated fields: sociology, statistics, industrial design, data science, architecture, and beyond.

With a research focus on small and shrinking communities in rural Iowa, not only does the team uncover the community efforts that keep some of these towns thriving, but the team is also offering the broader research community a valuable lesson on how to bring a wide range of expertise to projects and how experts from different fields can work together in harmony.

Zarecor found her inspiration to study Iowa’s shrinking towns in Ostrava, in the Czech Republic, a city she studied during her PhD research and later lived in for a semester as a Fulbright scholar. “[Ostrava] was part of a study in Europe called the Shrink Smart project, where [researchers] were looking at Ostrava as a shrinking post-industrial European city and questioned how to manage the governance of a relatively large city in the context of population loss.” As Zarecor shifted her primary research focus from architectural history in Eastern European cities to rural population loss in the Midwest, she realized the concept of shrinking smart could also be applied.

Zarecor and her collaborators started exploring the data-science component of shrinking smart with funding from a Smart & Connected Communities planning grant from the National Science Foundation (NSF) in 2017. Researchers at Iowa State University have been collecting data about the quality of life in small Iowa towns through the Iowa Small Town Poll since 1994, but “nobody had ever brought a data-science mindset to the analysis of [this] data.” The sociologists who had been collecting the data did not “think of [the poll] as a large set” and had not thought to build “a predictive model” from it.

Zarecor invited a computer scientist to be part of the planning grant team to transform the Small Town Poll data into training data, from which they could construct models to understand and predict the factors that influence people’s perceptions of quality of life in small rural communities. “We realized that what we were trying to understand is what are the actions that people in communities take as inputs into a system that results as outputs on the other side, as increases in perceptions of quality of life,” Zarecor explained. The planning grant team, consisting of a computer scientist, a sociologist, a community and regional planner, and two architects, found that “the best way to define [rural smart shrinkage] is that you are actively pursuing specific activities that you as a community can do together” that contribute to improved perceptions of quality of life even as population loss continues.
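The idea of treating community actions as inputs and perceived quality of life as an output can be illustrated with the simplest possible predictive model. The sketch below assumes a made-up poll with one hypothetical feature (`community_activities`) and fits an ordinary least-squares line. The real Iowa Small Town Poll variables and the team's actual models are not described here, so this only shows the general step of turning survey responses into training data and then into a model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical poll rows: how many community activities a respondent takes
# part in (input) and their quality-of-life rating on a 1-5 scale (output).
poll_rows = [
    {"community_activities": 0, "quality_of_life": 2.1},
    {"community_activities": 1, "quality_of_life": 2.4},
    {"community_activities": 2, "quality_of_life": 3.1},
    {"community_activities": 3, "quality_of_life": 3.4},
    {"community_activities": 4, "quality_of_life": 4.0},
]

# Turn survey responses into (feature, label) training pairs.
xs = [row["community_activities"] for row in poll_rows]
ys = [row["quality_of_life"] for row in poll_rows]

slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 5  # forecast for a respondent in 5 activities
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.2f}")
# prints: slope=0.48, intercept=2.04, prediction=4.44
```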

In 2020, Zarecor received another NSF grant of $1.5 million to continue this research and investigate strategies to address the data deficit in shrinking rural communities.

As the scope of the research has expanded, so has Zarecor’s team. In addition to Zarecor and rural sociologist David Peters, who was also a Co-PI on the planning grant, the team now includes a community economic development specialist and a community arts specialist from ISU Extension and Outreach (both are also faculty in the College of Design at Iowa State), an industrial design faculty member, master’s students from industrial design and community and regional planning, and, for the data science work, three statistics faculty and three statistics PhD students. The Iowa League of Cities is also a partner on the project.

Coming to data science with little technical background, Zarecor approaches the project’s data science component from an intuitive rather than a conceptual perspective: “It’s not that I understand the statistics, but I understand [the goals] as we go step by step . . . [and] the power of the tools that [the statisticians] are building.”

To lead such a highly interdisciplinary team, Zarecor thinks of herself as a bridge-builder. She helps her teammates understand data science by asking questions whose answers deepen the understanding of the nontechnical team members. “I like having that [bridge] function because it’s asking questions as a way of learning. For me, just the conversations with the data scientists helped me to better understand the data science part of our project.”

And the bridge function goes both ways. In addition to helping non-data-science experts learn more about the potential of data science, Zarecor also cultivates data scientists’ ability to contribute to projects that are community-based. “When it comes to community-based work, the assumption that this is not an expertise of its own is something that’s a challenge for the field, because doing work in communities is its own expertise,” Zarecor explained. Even though the residents in rural Iowa are the direct beneficiaries of the work from Zarecor’s team, the knowledge gap with respect to finding and using data makes those benefits inaccessible to some residents. Meanwhile, data scientists often lack the skills to convey their findings to an audience outside their academic circle. “As a field, data science, in my opinion, has not done a good job to educate necessarily well-rounded [data scientists].”

To overcome this bottleneck, Zarecor’s team builds dashboards that visualize the data and make it more interpretable for rural communities. Zarecor also encourages the statisticians on her team to talk to residents of the communities they study and ask what kind of data they would like to have. “When we ask what they want, it’s not because they know everything that’s available. We’re doing a mix of hearing from them what they want, and also guessing some things that they probably don’t know are out there that we can also give them in a usable form.”

Zarecor believes that similar types of highly collaborative and interdisciplinary research would benefit the entire research community, and that such collaborations start with abandoning assumptions about other fields.

She gives an example from architecture, where architects may assume they are capable of doing graphic design or planning. Many don’t realize that those tasks are outside their expertise, even though the fields seem adjacent. “And I would transfer that over to data scientists who know that data science is a synthetic and integrative discipline. [. . .] It doesn’t mean, though, that there are not all of these soft skills, all of this other communication, and people-related aspects of the data science work that you can handle without help.”

Therefore, Zarecor suggests that data scientists should work in conjunction with domain experts to make their research more relatable to a broader audience. Team members also need to respect the importance and specificity of other kinds of expertise beyond the technical or data-driven parts of a project. When a team successfully works this way, “the data science gets improved and amplified and becomes more useful. If you actually think horizontally on the project, you know that there’s not a pyramid, but that you are a team that’s working across the group [of collaborators]. This would be a much healthier way of [working with] data and for data scientists to interact with people.”

In this regard, Zarecor noted that the Midwest Big Data Innovation Hub, as a highly integrated and inclusive organization, has the potential to cultivate different layers of collaboration across various disciplines. “But it does require the data scientists who were the first audience, or the more explicit audience [for the Hub], to be willing to open up.”

Get Involved

New community-building activities in the Smart & Resilient Communities priority area of the Midwest Big Data Innovation Hub are beginning in spring 2022. Contact the Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

How Do Scientists Help AI Cope with a Messy Physical World?

By Qining Wang

When we see a stop sign at an intersection, we won’t mistake it for a yield sign. Our eyes recognize the white “STOP” letters printed on the red octagon. It doesn’t matter if the sign is under sunlight or streetlight. It doesn’t matter if a tree branch gets in the way or someone puts graffiti and stickers on the sign. In other words, our eyes can perceive objects under different physical conditions.

A stop sign. Photo by Anwaar Ali via Unsplash.

However, identifying road signs accurately is a different, and often more difficult, task for artificial intelligence (AI). Even though AIs are often described, following Alan Turing, as systems that can “think like humans,” they can still fall short of mimicking the human mind, depending on how they acquire their intelligence.

One potential hurdle is correctly interpreting variations in the physical environment. Inputs that exploit this weakness, often through changes too subtle for people to notice, are commonly referred to as “adversarial examples.”

What Are Adversarial Examples?

Currently, the most common method to train an AI application is machine learning, a type of AI process that helps AI systems learn and improve from experience. Machine learning is like the driving class an AI needs to take before it can hit the road. Yet machine-learning-trained AIs are not immune to adversarial examples.

Circling back to reading the stop sign, an adversarial example could be the stop sign turning a slightly darker shade of red at night. The machine-learning model may pick up these tiny color differences, which human eyes cannot discern, and interpret the sign as something else. Another adversarial example could be a spam email formatted to look like a normal email so that the spam detector fails to filter it.
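To make the idea of tiny changes causing big mistakes concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a well-known way to craft adversarial examples, applied to a toy linear classifier. The 100-pixel "image", the weights, and `eps = 0.03` are invented for illustration. Real sign classifiers are deep networks, but the principle is the same: many small, targeted pixel changes add up and flip the prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Linear score; a positive value means the model predicts "stop sign"."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method against a logistic-regression classifier.

    With p = sigmoid(w.x + b) and cross-entropy loss, the gradient of the
    loss with respect to input x_i is (p - y) * w_i. FGSM nudges every input
    by eps in the direction that increases the loss, then clips to [0, 1].
    """
    p = sigmoid(score(w, b, x))
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical toy "image": 100 pixel intensities and fixed model weights.
n = 100
w = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]
b = 0.0
x = [0.5 + 0.02 * sign(wi) for wi in w]  # a clean image labeled "stop" (y = 1)

x_adv = fgsm(w, b, x, y=1, eps=0.03)
print(f"clean score: {score(w, b, x):+.2f}")            # +0.20, classified as stop
print(f"adversarial score: {score(w, b, x_adv):+.2f}")  # -0.10, misclassified
print(f"largest pixel change: {max(abs(a - c) for a, c in zip(x_adv, x)):.2f}")  # 0.03
```

No pixel moves by more than 0.03, yet the 100 small shifts together push the score from positive to negative.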

Just as individual human minds can be unpredictable, it is difficult to pinpoint exactly why a machine-learning model makes a particular prediction. Nor is it simple to develop a machine-learning model that comprehends the messiness of the physical world. To improve the safety of self-driving cars and the quality of spam filters, data scientists are continuously tackling the vulnerabilities in the machine-learning processes that help AI applications “see” and “read” better.

What Are Humans Doing to Correct AI’s Mistakes?

To defend against adversarial examples, the most straightforward mechanism is to train machine-learning models on existing adversarial examples. For instance, to help a self-driving car’s AI recognize stop signs under different physical circumstances, we could expose the machine-learning model behind it to pictures of stop signs under different lighting, at various distances, and from various angles.

Google’s reCAPTCHA service is an example of such a defense. As an online safety measure, users need to click on images of traffic lights or road signs from a selection of pictures to prove that they are humans. What users might not be aware of is that they are also teaching the machine-learning model what different objects look like under different circumstances at the same time.
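Exposing a model to the same sign under different lighting is commonly called data augmentation, a simpler relative of full adversarial training. The sketch below assumes tiny grayscale images stored as nested lists, and it uses hypothetical brightness factors and function names. Real pipelines would also vary distance, angle, and occlusion, and would typically use an image library rather than plain lists.

```python
def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clipping to the valid range [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in image]


def augment_with_lighting(images, labels, factors=(0.6, 0.8, 1.2)):
    """Return the original examples plus brightness-shifted copies.

    Each copy keeps the label of the image it came from, so the model learns
    that a stop sign is still a stop sign at dusk or in harsh sunlight.
    """
    aug_images, aug_labels = list(images), list(labels)
    for image, label in zip(images, labels):
        for factor in factors:
            aug_images.append(adjust_brightness(image, factor))
            aug_labels.append(label)
    return aug_images, aug_labels


# A hypothetical 2x2 grayscale "photo" of a stop sign.
stop_sign = [[0.9, 0.4], [0.5, 0.7]]
images, labels = augment_with_lighting([stop_sign], ["stop"])
print(len(images), labels)  # 4 ['stop', 'stop', 'stop', 'stop']
```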

Alternatively, data scientists can improve AI by training it on simulated adversarial examples during the machine-learning process. One way to generate such examples is with a Generative Adversarial Network (GAN).

GANs consist of two components: a generator and a discriminator. The generator “translates” a “real” input image from the training set (clean example) into an almost indistinguishable “fake” output image (adversarial example) by introducing random variations to the image. This “fake” image is then fed to the discriminator, where the discriminator tries to tell the modified and unmodified images apart.

The generator and the discriminator are inherently in competition: The generator strives to “fool” the discriminator, while the discriminator attempts to see through its tricks. This cycle of fooling and being fooled repeats, and both become better at their designated tasks over time. The cycle continues until the generator outcompetes the discriminator, creating adversarial examples that are indistinguishable to the discriminator. In the end, the examples the generator produces can be used to train the target model to withstand different types of real-life adversarial attacks.
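The competition between the generator G and the discriminator D can be written down precisely. In the original GAN formulation (Goodfellow et al., 2014), the two play a minimax game over the value function below. D(x) is the discriminator's estimated probability that x is real, and G(z) is the generator's output from random noise z. A standard result is that, for a fixed generator, the best discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). This explains the end point described above: once the generator's distribution p_g matches the data distribution, D* equals 1/2 everywhere, and the discriminator can do no better than guessing. This is the textbook noise-to-image GAN. Variants that perturb real images, like the one described above, typically add extra loss terms to a similar objective.

```latex
\min_{G}\max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{g}(x)},
\qquad p_{g} = p_{\text{data}} \;\Rightarrow\; D^{*}_{G}(x) = \tfrac{1}{2}
```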

AI Risks and Responses

GANs can be valuable tools for tackling adversarial examples in machine learning, but they can also serve malicious purposes. For instance, another common application of GANs is face generation. These so-called “deepfakes” can make it virtually impossible for humans to tell a real face from a GAN-generated one. Deepfakes could have devastating consequences, such as corporate scams, social media manipulation, identity theft, or disinformation attacks, to name a few.

This shows that, as our physical lives become more and more entangled with our digital presence, we should never neglect the other side of the coin while enjoying the benefits of technological breakthroughs. Understanding both sides is a starting point for practicing responsible AI principles and creating policies that enforce data ethics.

Tackling vulnerabilities in machine learning matters, and so does protecting ourselves and the community from the damage that those technologies could cause.

Learn More and Get Involved

Curious whether you can tell a real human face from a GAN-generated face? Check out this website. And keep an eye out for the Smart & Resilient Communities priority area of MBDH, if you wish to learn more about how data scientists use novel data science research to benefit communities in the Midwest. There are also several NSF-funded AI Institutes in the Midwest that are engaged in related research and education.

Guest post – Diverse programs from ISU address sustainable cities challenges

By Iowa State University’s Sustainable Cities team

Researchers with the Sustainable Cities team at Iowa State University recognize the difficulty public officials face in transforming vast amounts of climate and energy research into contextualized public policy. To address this critical issue, the team’s mission goes beyond creating new climate analysis tools to also investigating new methods for integrating communities into the discourse of data creation and energy conservation. To accomplish this agenda, our team pursues research avenues that range from the creation of new spatial-data tools to enabling community youth activism. Here are just a few highlights of the team’s most recent achievements:

Sustainable Cities’ team leader Ulrike Passe, associate professor of architecture, presented our hybrid physics-data modeling framework at the National Science Foundation-sponsored Research Coordination Network (RCN) workshop held at Carnegie Mellon University on May 17, 2018. The presentation, which caps one of the major branches of the Sustainable Cities initiatives, showed how integrating our recently developed thermo-physical data simulator with our research into human energy-use behavior can produce a more holistic neighborhood energy model. Graduate research assistant Himanshu Sharma presented the same model at the fifth High Performance Buildings Conference at Purdue University on July 9, 2018.

Image from Krejci et al. (2016)

The Community Growers Program, a public-engagement initiative started in March 2017, has become another core pillar of the Sustainable Cities group’s research. Over the course of eight weeks, researchers worked with 22 leadership-minded youth in the Baker Chapter of the Boys and Girls Club at Hiatt Middle School in Des Moines, Iowa, to create a community garden based on a methodology of spatial, socio-technical storytelling. Through this process, the youth participants learned more about their community using geographic information system (GIS) and spatial mapping tools. Associate English professor Linda Shenk, our community engagement lead, and Mallory Riesberg, a collaborator with the Baker Chapter of the Boys and Girls Club, presented this methodology in a talk titled “Fostering the Next Generation of Big Data Scientists and Sustainable City Planners” at The Growing Sustainable Communities Conference in Dubuque, Iowa, on Oct. 4, 2017. Team members Linda Shenk, Passe, and Alenka Poplin, assistant professor of community and regional planning, later published this work in issue 35 of the Journal of Interaction Design and Architectures, in an article titled “Engaging Youth with Pervasive Technologies for Resilient Communities.”

Poplin, an established researcher in the field of geospatial mapping, also leads a research group that studies how to develop better feedback loops through innovative user interfaces. An inquiry into mapping places of emotional power was highlighted in a 2017 paper, “Mapping Expressed Emotions: Empirical Experiments on Power Places,” in the second edition of Kartographische Nachrichten on Empirical Cartography Journal. More recently, Poplin and her research team have begun testing E-Footprints, an energy survey game they developed. The game extracts user-performance data to measure and analyze which learning opportunities may help guide more environmentally efficient decision making. This feedback is then incorporated into learning mini-games throughout the game, so that users become more “energy savvy” as they play. This project begins field-testing in November 2018.

With a diverse, multifaceted research team of nearly 50 members, the Sustainable Cities group continues to advance the capabilities of communities and cities to think sustainably about a better future.


Image reference:

Krejci, C. C., Passe, U., Dorneich, M. C., & Peters, N. (2016), “A Hybrid Simulation Model for Urban Weatherization Programs”, Proceedings of the 2016 Winter Simulation Conference, Arlington, VA, December 11–14. T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds. (pdf)


Read more about the MBDH’s Smart, Connected, and Resilient Communities initiatives.

MBDH partners on US Ignite Reverse Pitch challenge

Part of the Hub’s focus on Smart, Connected, and Resilient Communities

US Ignite Hackathon
UIUC collaborators and mentors meet with HackIllinois teams on US Ignite Challenge

The University of Illinois at Urbana-Champaign (UIUC) was awarded a $20,000 grant from US Ignite to host a Smart Gigabit Communities Reverse Pitch Challenge. The MBDH, along with other local partners (see below), contributed towards matching the grant, bringing to $40,000 the total resources available to support the development of smart gigabit applications for the benefit of the local community.