

Meet the MBDH Fall 2022 science communications and outreach interns

For Fall 2022, the Midwest Big Data Innovation Hub (MBDH) has four new interns joining the team to work on a variety of projects. One intern, Shruti Ravichandran, is focused on outreach to help build our student community platforms. Three others, Aisha Tepede, Isabel Alviar, and Sasha Zvenigorodsky, will be focused on science communication, helping to tell the stories of our collaborators and amplify the many community-led data science projects in the Hub’s 12-state region. All will learn about the range of activities and communities the MBDH is involved in, will receive mentoring, and will have opportunities for career development. Below are details on the great backgrounds and interests the students bring to the MBDH community.

Aisha Tepede

Aisha Tepede (she/her) is a Science Communications Intern at MBDH this semester. She is a second-year Master of Public Health (MPH) student in the College of Applied Health Sciences at the University of Illinois at Urbana-Champaign (UIUC). She will be graduating this December with a concentration in Health Promotion and Education. Aisha has previously worked with Cook County Health as a case investigator for COVID-19 and with the National Institutes of Health as a clinical research fellow, focusing on a rare disease called multiple endocrine neoplasia type 1.

She has a social and behavioral health background involving chronic diseases and underrepresented populations. With this interest, she branched out into the global health research realm. She had the opportunity to spend the summer in Kenya by participating in the Minority Health and Health Disparities Research Training Program (MHRT), funded by the National Institute on Minority Health and Health Disparities (NIMHD), where she spent time researching sexual and reproductive health training with adolescent students.

Long term, Aisha has a goal of becoming a public health physician-scientist. She states, “I plan to use my experiences and background to be able to improve communication between physicians and marginalized patients—whether that means patients with a rare disease or a part of an underserved community.” Apart from her aspiration for proper clinician and patient communication, she says “I envision myself as a physician who will actively engage in improving the health of underserved populations, through a combination of community health research and culturally sensitive approaches to medicine and patient care.”

Isabel Alviar

Isabel Alviar is joining MBDH as a Science Communications Intern this semester. She is a senior at UIUC studying Computer Engineering with a minor in Statistics. Next year, she plans on pursuing her master’s degree in Computer Science, specializing in either artificial intelligence or data science. Currently, she is developing parallel-computing machine problems for programming classes at UIUC, and analyzing and summarizing data for an engineering education research conference.

Isabel is interested in pursuing a career that revolves around using data, whether as a software engineer or data scientist/analyst. This summer, she worked at Procter & Gamble (P&G) as a software engineer intern in their Data & Analytics department, automating the process of importing and updating metadata between objects in data platforms to a central Data Catalog. She also pitched the idea of a smart chatbot for the catalog and created a prototype using artificial intelligence/machine learning (AI/ML) that will continue being implemented by P&G based on her code and research.

She believes that the work being done by the Midwest Big Data Innovation Hub is exciting and inspiring. Isabel hopes to use her passion for science and technology to bring people’s stories, research, and scientific discoveries to life through writing. One of her favorite quotes is, “The science of today is the technology of tomorrow.”

Sasha Zvenigorodsky

Sasha Zvenigorodsky is joining MBDH this semester as a Science Communications Intern. As a senior at UIUC, Sasha is pursuing a BS degree in Crop Sciences. Outside of class, Sasha has been conducting research with UIUC’s Small Grains Improvement lab under Dr. Jessica Rutkoski, studying the correlation between vernalization and overall grain yield of winter wheat.

As a scientific researcher herself, Sasha is conscious of the important intersections between science and writing. Sasha says, “A major part of scientific research is the process of converting it into a language that can be easily understood by both experts and nonexperts alike.” Through writing, she hopes to make new scientific findings and developments more accessible to the public.

Sasha aspires to use her own experience working within a STEM field as well as her passion for creative writing to raise awareness for new innovations and findings in science. “Ultimately, giving individuals the right tools to stay educated and aware is the best way to catalyze positive change in society today,” she says.

Shruti Ravichandran

Shruti Ravichandran is joining MBDH as a Project Coordination Intern in Fall 2022. She is a first-year master’s student majoring in Information Management.

She became interested in the field of data during her undergraduate degree in Electronics and Telecommunication Engineering, while researching the field online to write an article for a technical magazine published by her school. She began building her skill set in analytics and landed a job at ZS Associates, India, as a Decision Analyst after she graduated in 2020. At ZS, she worked in the healthcare vertical on several big data analytics and data science projects in therapy areas such as leukemia, multiple sclerosis, and glaucoma. These experiences brought her the realization that information management has immense potential to influence actions and decisions that make the world a better place. She aspires to work on such endeavors during her career as a data professional.

She sees working at the Midwest Big Data Innovation Hub as a huge opportunity for her to give back to the community of data professionals by bringing together student groups across the region that are interested in this field. Her goal is to help build a community of data enthusiasts that understand the power of analytics, the responsibility they have to uphold the ethics of handling information, and the positive change that it can bring in a wide range of fields such as education, agriculture, and healthcare, among others.

MBDH Executive Director John MacMullen said, “We’re excited to be able to continue this intern program for another year. The incoming students bring diverse experiences and a wide range of interests. We look forward to having the MBDH community engage with them to tell the stories of the innovative work happening across the region.”

The MBDH has a number of events planned for Fall 2022, including our ongoing webinar series: the Collaboration Cafe, Midwest Carpentries Community, and Data Science Student Groups series, and the Water Data Forum, all open to participation from people across the region.

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in our activities, which include a data science student community.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Iowa State University, Indiana University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

National Workshop on Data Science Education Featured Multiple Hub Talks

Kim Bruch, West Big Data Innovation Hub Science Writer

Organized by UC Berkeley’s Division of Computing, Data Science, and Society, with support from Microsoft and the West Big Data Innovation Hub, the Summer 2022 National Workshop on Data Science Education offered an array of insights about current data science education initiatives across the academic spectrum, from high school to undergraduate and graduate-level programs as well as adult learners.

The latter two days of the workshop focused on national perspectives and programs for data science education, including student-driven data science communities of support and learning. The National Science Foundation (NSF) Big Data Innovation Hubs hosted two panels alongside a program of presenters that discussed topics such as investigating the ethics behind algorithms, incorporating Python into statistics and computer science classes, and the latest developments in data science education and community building.

“The West Hub was happy to coordinate the NSF Big Data Hubs’ contribution to this workshop,” said West Big Data Innovation Hub Executive Director Ashley Atkins. “It was an opportunity to share with a national audience some of the undergraduate-focused work the Hubs are pursuing across the country.”

Many lessons learned were discussed during the NSF Big Data Hub panel entitled “Building National Capacity for Student-Driven Data Science Communities.” The panel was moderated by Northeast Big Data Innovation Hub Executive Director Florence Hudson and included presentations by John MacMullen, Emily Rothenberg, Scott Blender, Abhishek Sinha and Rajeev Bukralia.

“The National Student Data Corps began as a grassroots effort in the Northeast region in 2021, and grew to nearly 3,000 community members by June 2022 across the U.S. and in 20 countries around the world,” said Hudson. “Students, professors, industry and nonprofit data science professionals worked together to build this dynamic community of support to increase data science awareness and provide free open online data science resources for students and educators, along with data science career panels, mentoring via a 500-person slack channel, career and chapter resources. We are working together to democratize data science for all.”

Temple University Engineering and Data Science Student Scott Blender talked about the National Student Data Corps (NSDC) from a student perspective—focusing on goals of the chapter systems. He said that their aim is “to inspire, educate, and serve local communities with professional development opportunities by leveraging NSDC resources and events.”

A similar student-aimed program discussed was the Midwest Big Data Innovation Hub’s Data Science Student Groups Community. Rajeev Bukralia, professor at Minnesota State University, Mankato, also spoke about his development of the Data Resources for Eager and Analytical Minds (DREAM) student group, which is the largest registered student organization on campus, and brings data science perspectives to students from many disciplines. Details about both DREAM and NSDC can be found on their respective websites.

“We are focused on building a group of student leaders to share best practices about how to grow inclusive, multi-disciplinary student organizations,” said Executive Director of the Midwest Big Data Innovation Hub John MacMullen. “Learning from more established groups such as DREAM can help newer student organizations understand how to build strong, diverse teams with engaged participants.”

Another great NSF Big Data Hub panel at the workshop was entitled “Data Science Program Development.” South Big Data Hub Executive Director Renata Rawlings-Goss of Georgia Tech opened the panel with a thorough explanation of how they developed their data science education efforts.

West Hub principal investigator Jennifer Chayes gave an overview of Berkeley’s Division of Computing, Data Science, and Society (CDSS), where she serves as associate provost.

Workshop organizer Eric Van Dusen, outreach and technology lead for the Data Science Undergraduate Studies program, speaks during a panel discussion. (Photo/ KLCfotos)

“This is the fifth annual conference and the West Big Data Hub has always been a key partner-stakeholder in convening folks in this space. It was great to have multiple hubs collaborating to share so many perspectives,” said CDSS Technology and Outreach Lead Eric Van Dusen, who organized the workshop.

New Precision Agriculture Initiatives in the Midwest

By Raleigh Butler

Recently, there has been a large amount of U.S. federal funding directed toward next-generation precision-agriculture initiatives. This article summarizes a few such projects based in the Midwest.


A new project called I-FARM was recently awarded funding by the USDA’s National Institute of Food and Agriculture (NIFA) in May 2022 under the “Farm of the Future” program. The Illinois Farming and Regenerative Management project will focus on sustainability in farming practices. I-FARM, led from the University of Illinois, is a collaborative study across the Institute for Sustainability, Energy, and Environment (iSEE) and the Center for Digital Agriculture (CDA), which is based at the National Center for Supercomputing Applications (NCSA). The project, funded with $3.9 million in grant money, is planned to last three years. For this very competitive program, only one project across the nation received funding.

According to the NIFA website, “The Farm of the Future Program integrates advances in precision agriculture, smart automation, resilient agricultural practices, socioeconomics, and plant and animal performance.”

The I-FARM project will focus on bettering these aspects of agriculture. Of course, as the world changes due to climate change and pollution, sustainability is an area of increasing concern. “Together, this integrated suite of solutions will lead to sustainable ways of meeting growing demand for agriculture in a changing climate,” said Co-PI and iSEE Interim Director Madhu Khanna, the Distinguished Professor of Agricultural & Consumer Economics at the University of Illinois.

I-FARM was seed-funded by iSEE’s “Campus as a Living Laboratory” program and now has received the grant from USDA NIFA. During the three years, the 80-acre I-FARM test bed “will feature improved precision farming with remote sensing; new under-canopy autonomous robotic solutions for cover-crop planting, variable-rate input applications, and mechanical weeding; and artificial intelligence-enabled remote sensing for animal health prediction, nutrient quantification, and soil health.”


Other recently funded projects focus on leveraging artificial intelligence (AI) to benefit agricultural research and translations of this work to impact practitioners and communities. One project is AIFARMS, or “Artificial Intelligence for Future Agricultural Resilience, Management, and Sustainability.” Led by PI Vikram Adve in the Center for Digital Agriculture at the National Center for Supercomputing Applications, AIFARMS “covers autonomous farming, efficiency for livestock operations, environmental resilience, soil health, and technology adoption.”


The ICICLE project combines elements similar to those of both I-FARM and AIFARMS. Led by The Ohio State University (OSU), the institute’s acronym stands for “Intelligent Cyberinfrastructure with Computational Learning in the Environment.” The project will integrate AI (like AIFARMS) but focus primarily on crops and soil. It will use technology such as field sensors to help maximize agricultural production. According to an OSU article, “The institute (led by Dhabaleswar K. Panda) will build the next generation of cyberinfrastructure with a goal of making AI data and infrastructure more accessible to the larger society.”


AIFARMS, ICICLE, and a third project, AIIRA, were all funded under the NSF AI Institutes program. The program includes a partnership with the USDA’s National Institute of Food and Agriculture (NIFA), which is providing the funding for the AIIRA project. AIIRA is the “AI Institute for Resilient Agriculture,” and includes stakeholders from academia, government, and industry. Led by PI Baskar Ganapathysubramanian from Iowa State University, the project has a vision “to create new AI-driven, predictive digital twins for modeling plants, and deploy them to increase the resiliency of the nation’s agricultural systems.”

All of these projects demonstrate high interest across sectors in precision-agriculture innovations that can make the transition from academic research labs and demonstration projects to deployment at scale for agricultural production that can meet the country’s changing needs.

Get Involved

The Midwest Big Data Innovation Hub (MBDH) co-leads a new working group sponsored by the Institute of Electrical and Electronics Engineers Standards Association (IEEE SA) to understand agricultural data needs across the food supply chain.

Contact the Midwest Big Data Innovation Hub to learn more, or if you’re aware of other people or projects we should profile here. We invite participation in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.


Toward Building Quality Relationships: How Chatbots Can Help Us Practice Self-Disclosure

By Qining Wang

Amid the turmoil of recent events, from global pandemics to wars and social unrest, mental health is becoming an increasing concern among the public.

According to the Anxiety and Depression Association of America (ADAA), anxiety disorders are the most common mental illness in the USA, affecting 40 million adults. Another common mental illness, depression, affects 16 million adults in the USA, according to statistics from the Centers for Disease Control and Prevention (CDC). The greater awareness and gradual destigmatization of mental health issues have led more people to seek professional help to improve their overall mental well-being.

When working with mental health professionals, self-disclosure is vital to finding the roots and triggers of mental health issues. Self-disclosure is a process through which a person reveals personal or sensitive information to others. It is a crucial way to relieve stress, anxiety, and depression.

Meanwhile, self-disclosure is a skill that one needs to cultivate through practice. It’s a skill we can only practice through constant self-exploration and the courage to be vulnerable.

To investigate alternative ways of practicing self-disclosure, a research team at the University of Illinois at Urbana-Champaign (UIUC) explored chatbots and conversational AIs as potential mediators in the self-disclosure process in a study in 2020. The team leader, Dr. Yun Huang, is an assistant professor in the School of Information Sciences at UIUC and the co-director of the Social Computing Systems (SALT) Lab. The team is mainly interested in context-based social computing system research.

Chatbots are ubiquitous in today’s online world. They are computer programs that interact with humans back and forth, as if holding a conversation. Some chatbots are task-oriented. One example is a frequently-asked-questions (FAQ) chatbot that recognizes the keywords a person types and returns a preset answer according to those keywords. Other, more sophisticated chatbots, such as Apple’s Siri and Amazon’s Alexa, are data-driven. They are more contextually aware and can tailor their responses based on user input. Both are ideal qualities for designing an empathetic and tone-aware chatbot capable of self-disclosure.
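To make the task-oriented case concrete, here is a minimal sketch of a keyword-matching FAQ chatbot in Python. It is not the chatbot built by the UIUC team; the keywords, answers, and the names FAQ_ANSWERS, FALLBACK, and faq_reply are all invented for illustration.

```python
import re

# Hypothetical keyword-to-answer table; every entry is made up for illustration.
FAQ_ANSWERS = {
    "hours": "We are open 9 a.m. to 5 p.m., Monday through Friday.",
    "contact": "You can reach us through the contact form on our website.",
    "webinar": "Upcoming webinars are listed on the events page.",
}
FALLBACK = "Sorry, I don't have an answer for that yet."


def faq_reply(message: str) -> str:
    """Return the preset answer for the first known keyword in the message."""
    # Lowercase the message and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", message.lower()))
    for keyword, answer in FAQ_ANSWERS.items():
        if keyword in words:
            return answer
    # No keyword recognized: fall back to a generic reply.
    return FALLBACK


if __name__ == "__main__":
    print(faq_reply("What are your hours?"))
    print(faq_reply("Tell me a joke"))
```

A bot like this has no memory of earlier messages and no sense of tone, which is why the data-driven approach is the better fit for the conversational behavior described here.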

As such, Dr. Huang’s team built a self-disclosing chatbot that can engage in conversation more naturally and spontaneously. The chatbot would initiate self-disclosure during small-talk sessions. It would gradually move to more sensitive questions that encourage users to self-disclose.

To study how chatbots’ self-disclosure can affect humans’ willingness to self-disclose, the team recruited university students and divided them into three groups. Each group would interact with the chatbot at different levels of self-disclosure, from no self-disclosure to low and high levels of self-disclosure.

During the four-week study, the student participants would interact with the chatbot every day for 7–10 minutes. At the end of the third week, the chatbot would recommend that students interact with a human mental health specialist. The researchers would then evaluate students’ willingness to self-disclose to the professional.

The team found that the groups that self-disclosed to the chatbot reported greater trust in the mental health professional than the control group. Participants felt “confused” when the chatbot brought up the human professional. In the experimental groups, they felt that they could listen to the chatbot and share sensitive experiences.

The team noted that, for participants interacting with the chatbot with the highest level of self-disclosure, their trust in the mental health professional stemmed from their trust in the chatbot. For the other two groups, participants’ trust was mainly directed toward the research team and professionals behind the chatbot.

This study highlights how chatbots can be a great tool to help users practice self-disclosure, making them more comfortable seeking human professionals. It is worth noting that, regardless of how sophisticated chatbots can be, they are just mediators between users and mental health professionals.

At the end of the day, the most meaningful kind of self-disclosure can only be found through care, empathy, and understanding. Human to human.

Get Involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

Physics-Based Machine Learning for Sub-Seasonal Climate Forecasting

By Raleigh Butler

We’ve all heard the old adage that if you don’t like the weather in the Midwest, wait a minute and it will change. So how can we possibly forecast conditions weeks in advance?

In 2019, an NSF collaborative grant was awarded to six institutions to study sub-seasonal climate forecasting (SSF) with machine learning (ML). This topic addresses three core themes of the Midwest Big Data Innovation Hub: resilient communities, digital agriculture, and cyberinfrastructure. A project of the NSF Harnessing the Data Revolution (HDR) program, the award went to researchers at the following six universities: University of Minnesota–Twin Cities, University of Chicago, University of Wisconsin–Madison, Carnegie Mellon University, George Mason University, and the University of Illinois at Urbana-Champaign.

What is Sub-Seasonal Climate Forecasting?

Sub-seasonal climate forecasting focuses on predicting weather 2–8 weeks away. Interestingly, this is an area of higher difficulty than other types of forecasting. As the research team states on its website, “SSF is considered more challenging than either weather forecasting or even seasonal forecasting.” The project ties ML together with agriculture in an effort to make these difficult predictions.

Computing’s Place in Forecasting

What is ML, and how does it compare with deep learning (DL)? Machine learning builds methods that let machines “learn,” that is, change their behavior based on the data they receive over time. Deep learning is a specific type of ML that uses multilayer neural networks, loosely modeled on how the human brain operates.
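As a rough illustration of what “learning” from data means, the sketch below fits a straight line to a handful of points by gradient descent. It is a toy example, not code from the SSF project; the function name fit_line, the data, and the parameter values are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the data
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge the parameters in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"learned slope: {w:.3f}, learned intercept: {b:.3f}")
```

The program is never told the rule behind the data; it adjusts w and b a little on each pass until its predictions match the examples. Deep learning applies the same idea to networks with many more adjustable parameters.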

In the linked article below from the SSF team, some difficulties in building models are discussed. Many of these difficulties are tied to the relationship between ML and physics. Therefore, systems have been created for physics-guided ML and ML-enhanced physics. Here’s what some of these systems take into account to overcome the difficulties:

  • Physics-guided ML takes physics into account to produce output (such as forces affecting movement of clouds, gravity in rainfall, etc.). Unfortunately, existing data that includes physics-related information is limited.
  • The other approach is ML-enhanced physics. One example of this, among many, is the Monte Carlo Tree Search (MCTS). The MCTS works by applying a hierarchical partition tree to the data. By using this approach, the program follows the sub-“branches” that are most likely in a given situation to produce a prediction. In short, the MCTS works as a decision tree and is optimized to predict the most likely path down each branch with each decision. A visual is provided in the image below.

Drawing of a decision-tree flowchart. (Photo by Kelly Sikkema/Unsplash)
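The branch-following idea described above can be sketched in a few lines of Python. This is only a greedy walk down a tree whose branch probabilities are already known; a full Monte Carlo Tree Search also estimates those values through repeated random simulations, which this sketch leaves out. The Node class, the labels, and the probabilities are invented for illustration and do not come from the SSF team’s models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    probability: float = 1.0  # assumed likelihood of taking this branch
    children: list[Node] = field(default_factory=list)


def most_likely_path(root: Node) -> list[str]:
    """Walk from the root to a leaf, always taking the most probable branch."""
    path = [root.label]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.probability)
        path.append(node.label)
    return path


# Hypothetical tree with made-up probabilities.
tree = Node("start", children=[
    Node("wetter than normal", 0.6, [
        Node("heavy rain", 0.7),
        Node("light rain", 0.3),
    ]),
    Node("drier than normal", 0.4, [
        Node("drought", 0.2),
        Node("mild dryness", 0.8),
    ]),
])

print(most_likely_path(tree))
```

Always taking the locally most likely branch is the simplest possible policy; it is shown here because it matches the plain-language description, not because it is how a production forecasting system would search the tree.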

Sub-Seasonal Agriculture

How does this tie into agriculture? First, we will examine the key planning that takes place during sub-seasonal periods. According to a graph on the SSF project site, these are some important decisions that are made during those periods:

  • Maritime Planning: Designate ship routing
  • Agriculture: Schedule planting
  • Agriculture: Irrigate and apply nutrients
  • Emergency Management: Pre-stage emergency supplies
  • Aviation: Plan evacuations and sorties
  • Water Resources: Manage reservoir levels for flood control
  • Energy: Plan for spikes in energy demand

Making these decisions is a delicate process; there is a high price to pay if predictions are incorrect. Increasing the ability to accurately forecast sub-seasonally is, of course, monetarily valuable; however, it is also valuable in terms of product production and delivery.

These studies have resulted in several scientific publications since the conclusion of the funding. One of these papers, written by many members of the original study team, is available for download as a PDF. The paper, published in June 2020, discusses challenges, analyses, and advances associated with ML climate forecasting. It includes several diagrams of how various models predict sub-seasonal weather differently. It also discusses forecasting in various climate zones (over the ocean and over different areas of land).

Scientists are still collecting data to use as input for the models and to increase accuracy. As mentioned, this area of forecasting is more difficult than forecasting over time horizons that are nearer or further away. Although climate prediction may still be difficult, there is progress being made in the field. The paper mentioned above states, “Overall, XGBoost and Encoder (LSTM)-Decoder (FNN) perform the best. Qualitatively, coastal and south regions are easier to predict than inland regions (e.g., Midwest).”

Get Involved

Learn more about the SSF project on their site.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

Building an accessible agricultural data community with the National Agricultural Producers Data Cooperative

By Raleigh Butler

Romaine lettuce crop grown on a city farm in Moscow. (Photo by Petr Magera/Unsplash)

Entities around the world gather data focused on various aspects of agriculture. Unfortunately, this information is not always accessible or easily available for those who need it. The National Agricultural Producers Data Cooperative (NAPDC) project recognizes that agriculture is a keystone of society and a critical piece of national solutions to climate-related challenges. The NAPDC, with support from the United States Department of Agriculture (USDA), aims to enable agricultural producers to benefit from the massive amounts of data generated by members of their community. As the NAPDC site states, the goal of the project is to create a “blueprint” for a national data framework where agricultural entities “can store and share data . . . to maximize their production and profitability.”

With enough available data and methods to extract relevant information, national agricultural systems can become more efficient and profitable. The framework being developed by the NAPDC will include data from many types of agricultural contexts and agricultural institutions, first and foremost the producers that drive agricultural productivity. Making the system diverse yet robust while safeguarding farmer privacy will result in a more reliable set of data for the entire agricultural community.

The NAPDC project emphasizes providing resources to community partners through webinars and seed grants in order to “identify needs and opportunities as well as challenges in physical infrastructure, education and human resources, and critical use cases” that are essential to the success of a future data framework. The project recognizes that a secure framework is necessary to protect privacy and governance information; these aspects will be carefully considered. The project also recognizes the importance of land-grant institutions and agricultural extension in the successful deployment of any framework.

The NAPDC project has a seed grant program to support development of community activities, with a deadline of June 1, 2022. It will be granting 4–6 awards; complete guidelines are listed on the project site. The grants will not be limited to principal investigators at universities; rather, any institution eligible for USDA funding may apply. As stated on the website, “individuals willing and qualified to lead representation for a national or regional agroecosystem are encouraged to apply.”

“The work of the NAPDC aligns well with the Digital Agriculture community of the Midwest Big Data Innovation Hub,” said MBDH Executive Director John MacMullen. “We anticipate integrating findings from our Community Data Needs Assessment (Community DNA) activities, which are helping to understand the data needs of stakeholders across the food supply chain, with the work of the NAPDC. We also look forward to partnering with the NAPDC team on our agricultural data work with the IEEE Standards Association and other partners.”

Jennifer Clarke, lead PI of the NAPDC project and faculty at the University of Nebraska–Lincoln, hopes the project serves as an initial step towards a national framework. “This project represents the willingness of the USDA to listen to agricultural producers and support the data needs of producer communities,” said Dr. Clarke. “This project provides producers and stakeholders with a vehicle for communicating their challenges related to data, and provides educators and researchers with a vehicle for proposing solutions to these challenges.”

The NAPDC will host an All-Hands Meeting in the spring of 2023 at the University of Nebraska–Lincoln that will highlight the work of the NAPDC and discussions of specific areas for future USDA investment. Interested members of the community can sign up for the project listserv through the project website to receive updates about this meeting as well as project information.

Get involved

Do you have an agricultural data success story or case study to share from your organization? Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Iowa State University, Indiana University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

I-GUIDE: Increasing Sustainability by Harnessing Data

By Raleigh Butler

Gravity dam in Marion County, Oregon. Photo by Dan Meyers/Unsplash.

Sustainability is not just achieved through solar panels and windmills. Of course these help, but one organization is working to tackle sustainability on a larger scale: I-GUIDE is a collaborative environment for sharing and using geospatial data. It is community-oriented and works to address sustainability challenges.

“I-GUIDE” stands for “Institute for Geospatial Understanding through an Integrative Discovery Environment.” This project is funded by the National Science Foundation (NSF) under the Harnessing the Data Revolution program. Awarded in 2021, the institute is led by PI Shaowen Wang, head of the Department of Geography and Geographic Information Science at the University of Illinois. The institute has partners from across the country, including MBDH collaborators such as EarthCube, CUAHSI, the University of Minnesota, Columbia University, and the Discovery Partners Institute.

As the I-GUIDE site states, “most challenging sustainability and resilience problems today require expertise from multiple domains and geospatial data science.” I-GUIDE acts as a central access point where qualified entities can reach many types of data. For example, I-GUIDE allows participating entities to access the data stored in HydroShare, a system from CUAHSI, the Consortium of Universities for the Advancement of Hydrologic Science, Inc. The HydroShare infrastructure can be used to share data as well as to analyze and visualize those data. By bringing related programs together, I-GUIDE helps expand knowledge about sustainability and the data that support it. Data are currently being added to I-GUIDE in the fields of water, geospace, geography, and the atmosphere.

“The institutional collaborations facilitated by this project will enable the I-GUIDE team as well as the broader community to explore a wide range of interdisciplinary science questions that leverage an interconnected network of software and cloud infrastructure,” said Dr. Anthony Castronova, Senior Research Hydrologist at CUAHSI. “These types of institutional connections are critical to support water science research around pressing environmental issues that require modern software, data, and modeling approaches.”

Environmental issues often present themselves in one way (e.g., a drought) when the problem at hand is much larger than the assumed cause (a lack of rainfall). As the climate changes, droughts and other environmental changes can become increasingly harmful to current ecosystems. HydroShare cultivates collaboration in water-focused areas such as drought conditions, water quality, temperature, and soil moisture. These data act as the first step to help promote sustainability and resilience.

I-GUIDE holds regular webinars. The first in the series, held on March 23, 2022, explored the need for geospatial education when sustainability is growing more important every day. Led by Eric Shook from the University of Minnesota, the webinar emphasized the need for building diverse communities of instructors and learners to build best practices for cyberinfrastructure (CI) literacy, and lower the barriers for learners new to CI.

“The Midwest Big Data Innovation Hub is pleased to be a partner on the I-GUIDE project,” said MBDH Executive Director John MacMullen. “This is a diverse and talented team that will have important impacts on key areas of focus for the MBDH, including water data, CI workforce development, and data-enabled resilient communities.”

“MBDH is a great example of how our I-GUIDE Partners are organizations and institutions that share common goals and objectives,” said George Percival, co-lead of I-GUIDE’s Engagement and Partnership Team. “The I-GUIDE Partnership Program provides the pathway for Partners to contribute to and gain from the I-GUIDE activities based on mutually beneficial agreements. As the MBDH objective ‘to build and cultivate communities around data’ is highly aligned with I-GUIDE, it is anticipated that the MBDH and I-GUIDE partnership will benefit both activities.”

If you’re interested in getting involved with I-GUIDE, please take a look at their News & Events page. The site often lists such events as webinars and symposiums. The I-GUIDE team held its first All-Hands Meeting in May 2022.

Get Involved

Activities to build the community of Midwest researchers and practitioners in the Smart & Resilient Communities priority area of the Midwest Big Data Innovation Hub are continuing throughout 2022. Contact the Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

Exploring Nature Through Imageomics with Professor Tanya Berger-Wolf

By Erica Joo and Qining Wang

We recently spoke with Professor Tanya Berger-Wolf, a pioneer who is leading a team to establish the new field of imageomics. She is a computational ecologist and the director and co-founder of the nonprofit organization “Wild Me.” Berger-Wolf is also the Director of the Translational Data Analytics Institute (TDAI) and a Professor of Computer Science and Engineering, Electrical and Computer Engineering, as well as Evolution, Ecology, and Organismal Biology, at The Ohio State University.

Tanya Berger-Wolf

Observation is fundamental to any biological research. The development of optics technology, such as the inventions of the microscope and the telescope, allowed biologists to observe the world at different scales, from animals living in jungles of millions of acres to DNA in animal cells of several micrometers.

However, as Prof. Berger-Wolf pointed out, those inventions only serve to “augment our ability to look” or “look at more things more carefully.” We are still making observations and searching for patterns with our own eyes, from which arises the caveat: We are not so good at finding patterns when things appear to be random, or when patterns are rare, sparse, subtle, or complex. We can’t answer, for example, whether the stripe patterns of mother zebras are similar to their babies’. The patterns appear to be too similar and too random at the same time to our eyes because human brains did not evolve to “take [the stripe patterns] holistically and quantify them in any meaningful way.”

And that’s where imageomics comes in. Imageomics takes its cue from genomics, a field in which researchers understand the biology of an organism or a species through its genetic information. In a similar vein, imageomics aims to understand nature through biological information extracted from images.

Computers are the perfect information extractors, because they “perceive” the world differently. Computers can quantify images down to pixels and find patterns that humans do not, or cannot, comprehend. Berger-Wolf pointed out that imageomics, as a “whole new field of science,” allows scientists to answer biological questions that weren’t answerable before because it provides scientists with a new way of observing nature.

The complementary vision of computers is especially prominent in the studies of biological traits, according to Berger-Wolf. Biological traits are the interplay between genes and the environment. They can be physical characteristics such as “beak colors, stripe patterns, fin curvatures, the curves of the belly or the back.” They can also be behavioral characteristics such as possums playing dead or pollen feeding in birds. Being able to observe traits “is the foundation of our understanding of how these traits are inherited and the understanding of genetics,” insights into animal behavior, and ecological and evolutionary theories.

In order for biologists to propose new evolutionary hypotheses to explain biological traits, it is crucial to “make these traits computable.” Starting from a project funded by the National Science Foundation, Berger-Wolf founded Wild Me. This nonprofit organization has an ongoing initiative, Wildbook, that collects images containing animals from numerous sources, including camera traps, drones, and even tourists’ social media posts on YouTube, Instagram, and Flickr.

Those source images serve as a starting point for a branch of research in imageomics, which will allow researchers to develop open software and artificial intelligence for the research community. Those tools would allow biologists to discern biological traits that are too similar or too subtle to their eyes, such as animal coat patterns or species that look alike yet are genomically different. Computer vision would allow scientists to find out whether traits are inheritable or shared by multiple species. Based on those new insights, biologists could then conjure new evolutionary hypotheses and start asking even more interesting questions, to which only imageomics can provide the answers.

Berger-Wolf jokes that she has “multiple research personality,” with a passion for bringing her diverse backgrounds together. By helping to found the new Imageomics Institute, her interests were able to converge. Participating in both worlds—natural and technical—allows her to see “the better way” of working and increasing effectiveness.

She commented that starting conversations between fields increases “mutual respect and understanding of each other’s questions and where we can come together.” Berger-Wolf sums up her career by describing her work as “creating tools that expand our ability to look at more things more carefully and even be able to ask questions that people have never been able to ask before.”

Berger-Wolf is currently working on several projects. One looks at animal coat patterns and correlates them with genetics, heritability, and the overall scientific structure of why some traits are inheritable and others are not. Because humans cannot attend to every detail, imageomics makes a deeper level of understanding possible. In another project, she is working on species-level traits of butterflies that mimic other species. Computer algorithms can identify what is similar and different in their appearances, down to the small details. Computers can extract complex information, and people can start asking different questions using information normally beyond the scope of human perception.

Berger-Wolf’s recent award for the new Imageomics Institute under the NSF Harnessing the Data Revolution program is extending this work and bringing it to a wider audience. The images to be used as sources come from existing research projects, citizen scientists, organizations like iNaturalist, eBird, and Wild Me, as well as the digitization of the natural history museum collections through the iDigBio project.

There are various opportunities for students at any level and researchers from all over the world to participate in the field of imageomics. Berger-Wolf emphasized that the goal is to have people understand what imageomics is and how it’s significant so that it can be accessible to all.

“It’s not just an opportunity to advance science, but also to engage people in science,” she explains. Her team is made up of researchers and students who share the goal of building a community around imageomics. More direct community engagement, outreach events, and conferences are great ways of informing people about imageomics and how it can change the way traits are seen.

“We have incredible privilege to do science. To spend time answering scientific questions that are interesting to us while the public is paying us to do so. It’s important to tell the science to the public, communicate why, and what science brings to the world.”

Get Involved

New community-building activities facilitated by the Midwest Big Data Innovation Hub are continuing throughout 2022. Contact the Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

Accelerating Data-Driven Materials Discovery at the Molecule Maker Lab Institute

By Qining Wang

Cancer scientist loading tubes into a lab machine. Photo by the National Cancer Institute via Unsplash.

Despite being a fundamental process for innovations in chemistry, biology, pharmaceuticals, materials science, etc., molecular discovery can be a time-consuming and labor-intensive endeavor. The traditional trial-and-error approach through experimentation does not always yield promising results. According to a Chemical Abstracts Service (CAS) Registry analysis, scientists predict the number of stable light- and moderate-weight organic molecules to be more than 10^180. Among those, only 10^20 to 10^60 are biologically relevant. That’s a lot of molecules, to say the least, let alone discovering the ones that we can use. In the meantime, hundreds of years of research hunting for molecules has yielded an array of successes and failures that we can harvest for data-driven molecule discovery.

To that end, the Molecule Maker Lab Institute (MMLI) and many other AI Institutes funded by the National Science Foundation (NSF) (highlighted in the map below) decided to take this data-driven approach to find the needles in haystacks of molecules quickly and accurately.

Map of NSF-funded AI Institutes across the United States

MMLI is a partnership between the University of Illinois at Urbana-Champaign, Pennsylvania State University, and Rochester Institute of Technology. The institute fosters extensive collaborations among researchers in artificial intelligence (AI) and in chemical and biological synthesis. Those collaborations serve to develop frontier AI tools and dynamic open-access databases. Current research at MMLI involves both small molecule discoveries and manufacturing.

For molecule discoveries, the Institute is currently focusing on improving the performance of organic solar cells. Compared to silicon-based solar cells, the state-of-the-art materials for solar energy harvesting, organic solar cells are more flexible. They can also be manufactured at large scales at relatively low prices.

However, certain drawbacks prevent organic solar cells from replacing silicon-based solar cells. Compared with silicon, organic molecules are less efficient at converting solar power into other forms of energy like electricity. They also cannot endure sunlight irradiation for a long time. (Think of pigments on your outdoor furniture that gradually fade away under sunlight. That is sunlight irradiation degrading organic molecules on display.)

To overcome these challenges, MMLI is currently developing AI-enabled tools such as AlphaSynthesis to accelerate the discovery of long-lasting and more efficient organic molecules for sunlight harvesting. Guided by machine-learning models, the team led by Martin Burke is able to screen through potential candidates at high throughput. “The team has an ambitious ‘10-10’ target to create organic photovoltaics with a greater than 10% efficiency and a 10-year lifetime,” said Celine Young, Managing Director of MMLI. “Led by a team of experts in AI, automated chemical synthesis, and automated additive manufacturing, the MMLI is employing a closed design-build-test-learn loop to work towards this goal.”

In terms of chemical manufacturing, MMLI primarily focuses on catalyst discovery. Catalysts are a crucial component for efficient chemical production, as they lower the energy barriers of chemical reactions. A catalyst is like a local guide who can always tell you the fastest route to a specific destination. Without an efficient catalyst, commercializing any chemicals beyond lab-scale syntheses would be a great challenge.

To find the best catalysts for certain chemical transformations, MMLI developed new AI algorithms to find catalysts that can assist in making the desired molecules. Currently, the team led by Scott Denmark is using AI-enabled tools to search for hard-to-find catalysts for carbon-hydrogen (C-H) bond oxidation reactions. These reactions can change the properties of a molecule. In C-H bond oxidation reactions, a catalyst breaks the C-H bonds in the molecule and facilitates the formation of new chemical bonds like carbon-oxygen (C-O) bonds. Those reactions are crucial in drug synthesis and in converting feedstock chemicals into higher-value chemicals.

MMLI not only stands at the forefront of innovations in AI-based molecule syntheses, but also recognizes the barriers to entering the field of molecule synthesis and manufacturing. Broadly speaking, the field is only accessible to a handful of experienced specialists with years of training. To break down such barriers, MMLI created Thrust 5, which aims to train junior scientists, engineers, educators, and practitioners on advanced chemical synthesis and AI skills. They deliver “MMLI in a Box” to classrooms in the USA and launch the Molecule Maker Digital Learning Platform to expose K–12 students to molecule making early on in their education.

Get Involved

MMLI is currently seeking applicants for their MMLI Seed Grant Program. Find out more about this opportunity and submit your grant proposal here by April 30, 2022. The Institute is also seeking industry partners that foster knowledge sharing between the MMLI and industry researchers.

The Midwest Big Data Innovation Hub will be doing a community data needs assessment in the advanced materials space later this year to understand key challenges around materials data management. Contact us if you’re interested in participating, or if you’re aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

Student Group Profile: DREAM at Minnesota State University, Mankato

The Midwest Big Data Innovation Hub is developing a community of data science student groups across the Midwest region to share their experiences and best practices. This story is part of a series of student group profiles.

For this profile, we spoke with leaders from DREAM, Data Resources for Eager & Analytical Minds, a recognized student organization at Minnesota State University, Mankato. It has over 300 student members who focus on data science, data analytics, machine learning, artificial intelligence, information technology, and computer science. DREAM organizes and hosts conferences, trainings, competitions, and industry talks to support the students’ academic and professional development. The DREAM members have won many awards at various data science competitions and have authored dozens of research papers and conference presentations. DREAM is a past recipient of the Outstanding RSO of the Year award.

Minnesota State University, Mankato DREAM logo

What are the goals of your group, and who is your core audience?
DREAM was founded in 2016 when one dedicated data science professor at Minnesota State University, Mankato (MNSU), Dr. Rajeev Bukralia (the esteemed faculty advisor of DREAM), excited students about the potential of and career opportunities in data science. Since the start, DREAM’s goal has been to explore, raise interest in, and share the wonders of data science and related fields. Our mission is to help students venture into the more interesting aspects of data science and corresponding fields, and in the process, connect students to industry mentors and professionals. We want to support anyone from any background who has interest in data analytics, data science, or computer science. Our core audience is varied because data itself is varied and can come from any field. Our audience is anyone who wants to understand that data on a deeper level, be they business majors, biology students, or just about anything else; we welcome anyone from any background who wants to participate!

What kinds of activities have you done previously, and what do you have planned for this year?
COVID has changed the format of our group considerably, but we still have regular industry talks and we act as a center for communicating events and opportunities to students interested in data science. Recently, we have had multiple industry leaders speak on their experiences working in the industry. They shared their experiences and tips to help set students up for success. So far this semester, we have hosted four industry talks with professionals from big companies such as UnitedHealth, One Drop, and Ovative. The larger projects we have planned for this semester focus around supporting students through the 2022 Data Derby Hackathon, setting up the spring election, and creating fun, themed training sessions for students to dip their toes into key tools for data science, such as Python and Power BI. We also hope to involve the members of our club in a student research showcase this spring in collaboration with MinneAnalytics.

As DREAM grows, we hope to expand our reach into the community. Through school or library programs, we hope to spark an interest in data science in kids in grades 6 through 12. Programs like this would not only have to be volunteer-run, but also volunteer-created. So, after completing a few training sessions at the university, we hope to create an introductory data science curriculum that is interesting enough to captivate young students, but also approachable enough for them.

What challenges have you faced in starting or maintaining your group?
The pandemic, of course, has been a large shift for a group like ours, which has over 300 students, dozens of whom would be packed into a room eating pizza together on any given Thursday night pre-COVID. Since then, we have had to switch to Zoom for our meetings, although we’re trying to get back in person soon. There are also the general challenges of collaborating with university administration to secure and maintain the backend functions of the club and making sure to bring in a constant stream of new students to sustain the club.

What suggestions do you have for others who want to start a group on their campus, or expand their current group?
Reach out and promote your group through classes on your campus that are relevant—for example, we promote DREAM in the introductory data science courses and the database management courses.

Run events regularly. Consistency will help build up more engagement, both from members of the group who are excited to participate more and from members of the student body who just decide to pop into one meeting because they see it happening every week.

Keep a careful eye on your roster. Make sure you always have a copy backed up. Also, keep it organized so you can keep track of current students, alumni, etc. Your email roster is your direct point of contact with your group, so be sure to communicate with them regularly and to always maintain the current contact details.

Stay true to the mission. Be active and involved in community events. Try different channels to promote your group’s spirit and resources, such as Twitter and LinkedIn.

Get involved

You can find the DREAM club on Twitter and their website.

Are you a student group leader or advisor? We’d like to hear more about your group’s activities. Contact us if you’d like us to profile your organization or participate in our student groups webinar series. You can also join our new Slack community to continue the discussion and make new connections.

University of Nebraska researchers extend smart rural bridge health initiatives

By Raleigh Butler

Did you know that, despite advances in technology, bridge health across the United States is declining? Bridges currently score a C on the country’s infrastructure report card, which is a drop from last year’s grade.

Within the Midwest, the percentages of structurally deficient bridges by state include the following:

  • Iowa has the largest percentage, 19.0%.
  • Minnesota has the smallest percentage, 4.7%.

The Midwest Big Data Innovation Hub’s Smart & Resilient Communities priority area spans a range of disciplines, sectors, data, and cyberinfrastructure in its work to connect researchers and practitioners focused on community resilience. Bridges play key roles in community planning, resilient supply chains for food and goods, and in transportation capacity management.


In 2018, a new regional innovation center project, “Smart Big Data Pipeline for Aging Rural Bridge Transportation Infrastructure (SMARTI),” was funded by a $1 million National Science Foundation (NSF) grant. The grant was aimed toward “rural bridge health management” and included faculty from both the University of Nebraska–Lincoln (UNL) and University of Nebraska Omaha (UNO). The work began with a planning grant in 2016, and both awards were part of the NSF’s Big Data Spoke program, in collaboration with the regional Big Data Innovation Hub program.

The principal investigator for the project, Robin Gandhi, is from UNO’s College of Information Science and Technology. The 16 research team members also include Daniel Linzell and Chungwook Sim, both from UNL’s College of Engineering.

The SMARTI project focused on “mining existing data sets from private, state and federal partners, as well as collect[ing] new data through sensors on targeted rural bridges throughout Nebraska.” The outputs of this work were presented through workshops and made available to researchers through the Bridging Big Data website.

“Our government and industry partners can better manage their aging rural bridges, improve their health and ultimately keep people safe using data and tools developed from our research,” said Robin Gandhi. “We continue to engage stakeholders through companion research projects and by presenting our work at relevant technical meetings and conferences. For example, we will be presenting at the Midwest Bridge Preservation Partnership, the American Society of Civil Engineers Structures Congress in April, and the International Association for Bridge Management and Safety Conference in July 2022.”

Student engagement

Six students from both the Lincoln and Omaha campuses who are working on these projects presented their research in October 2021 at the Midwest Big Data Innovation Hub’s Regional Community Meeting, with a focus on the data sets and data science tools that are important to this work. Recordings of their presentations are available on the MBDH YouTube channel.

Next steps

Approximately three years after the start of the SMARTI project, the Nebraska team was awarded $5 million by the Department of Defense Army Corps of Engineers for research to extend the lifespan of bridges through new monitoring technology. This award was announced in October 2021.

The researchers will continue with their work on bridge safety. The team will use sites in rural Nebraska as testbeds to safely collect data, as well as to analyze “socio-technical impacts such as fairness of data, algorithms, and analysis; and intelligent decision-making and support systems.”

“This project brings bridge owners, designers, and builders, big data solution providers, and academics together to discuss data-informed bridge infrastructure health and resilience in times of crisis,” said Daniel Linzell. “Attendees at our last workshop heard from several stakeholders about the pandemic’s impact on bridge infrastructure resilience from design, sensing, economic, and socio-political perspectives. Discussions such as these keep the research team focused on the importance of the work: developing sensing and big data technology applications that support smart, resilient, big data pipelines for aging rural bridge transportation infrastructure; highlighting solutions to data discovery and controlled sharing challenges; and unveiling novel data-driven decision-making tools.”

Get involved

New activities to build the community of Midwest researchers and practitioners in the Smart & Resilient Communities priority area of the Midwest Big Data Innovation Hub are beginning in spring 2022. Contact the Midwest Big Data Innovation Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

Water Data Forum Webinar Series

Water Data Forum, the virtual series presented by the Cleveland Water Alliance, Water Environment Federation, and Midwest Big Data Innovation Hub, is returning for a second season in 2022!
In 2021, the Forum assembled expert panels to engage in timely topics such as new sensor and control technologies as well as water data for environmental justice and climate resilience. This year, interactive web sessions will engage a diverse array of experts across sectors in an exploration of topics ranging from the intersection of cyber security and water to STEM and youth empowerment.

2022 Sessions

The new season will kick off on March 30 at 12 p.m. ET with a session titled “Innovations in Water Quality: The Real-Time Revolution.” This session will convene industry, government, and research experts to explore the next generation of water quality sensing technologies. In a facilitated discussion, panelists will use specific case studies to examine the challenges posed by new, or more recently understood, sources of water pollution and the opportunities surrounding real-time networks and new sensing modalities.

May Session: Cyber and Water: Driving Digital Security across the Water Sector
July Session: Smart Stormwater: Data-Driven Response to Flooding, Erosion and other Natural Hazards
September Session: Water Education: STEM, Youth Empowerment and Workforce Development
November Session: Smart Water Equity: Data-Enabled Affordability, Justice and Sovereignty

Robust, accurate data are crucial for the future of water resource management, economic and workforce development, and technological advancement. Water Data Forum aims to demystify the complexities of water data for seasoned experts as well as the general public. For more information and updates around speakers and registration, visit

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas, which include Water Quality. The MBDH has a variety of ways to get involved with our community and activities.

Student Group Profile: Girls Who Code, University of Michigan DCMB

The Midwest Big Data Innovation Hub is developing a community of data science student groups across the Midwest region to share their experiences and best practices. This story is part of a series of student group profiles.

In light of Women’s History Month and International Women’s Day on March 8th, we talked with the leaders of the Girls Who Code club at the University of Michigan about their work on empowering young girls to participate in coding projects and in STEM fields at large.

What are the goals of your group, and who is your core audience?
We are an organization founded by doctoral students from the Department of Computational Medicine and Bioinformatics at the University of Michigan. Our goal is to provide a collaborative and supportive environment for students of all skill levels and backgrounds interested in learning to code. Our club curriculum focuses on computational data analysis and the Python programming language. Participants learn fundamental coding concepts and implement their new skills in their chosen data science capstone project. Our core audience includes girls, women, and allies who support our mission of closing the gender gap in technology.

What kinds of activities have you done previously, and what do you have planned for this year?
Our Girls Who Code club meets weekly from September through May. During the summer, we offer a two-week intensive Summer Experience (SE) program. During club and SE, students participate in live coding lectures, work through paired programming exercises, hear from guest speakers, and complete a data science capstone project. We have also facilitated field trips to the Ann Arbor Google office and connected students to faculty at the University of Michigan for long-term research experiences. Along the way, we have partnered with other STEM outreach organizations at the University of Michigan. For instance, this year, we will collaborate with FEMMES (Women+ Excelling More in Math, Engineering, and the Sciences) and DFB (Developing Future Biologists) to provide hands-on programming activities.

University of Michigan Girls Who Code group photo

What challenges have you faced in starting or maintaining your group?
A primary challenge we faced in starting the club and SE programs was the lack of a live-coded Python-for-data-science curriculum for our target age group (high school). However, given the expertise of our student facilitators, we were able to develop, from scratch, a custom curriculum teaching Python fundamentals and data science skills, including statistical analysis. We rely entirely on hard-working undergraduate, graduate, and postdoctoral volunteers, and recruiting volunteers who can dedicate time to this extracurricular activity is often difficult. To help address this challenge, we have started paying our SE instructors. The pandemic created a massive shift in how we delivered our programming, and we had to move the club to a virtual format within a week. We have continued virtual instruction, and despite its challenges, we have been able to expand our reach.

What suggestions do you have for others who want to start a group on their campus, or expand their current group?
Find ways to collaborate with existing organizations so that you can build on their previous work instead of reinventing the wheel. Identify and understand the needs of the communities that you’re interested in working with to ensure that your programming aligns with your target audience. It’s also a good idea to consider your organization’s longevity and plan at the outset for the transfer of leadership responsibilities after the original leadership moves on. Creating documents that allow for knowledge transfer and working with faculty who can provide continuity are two ways to address this.

Get involved

You can find the Girls Who Code club on Twitter, Facebook, and their website. The club has also compiled resources on coding, online teaching, and fostering diversity, equity, and inclusion on their GitHub page.

Are you a student group leader or advisor? We’d like to hear more about your group’s activities. Contact us if you’d like us to profile your organization or participate in our student groups webinar series. You can also join our new Slack community to continue the discussion and make new connections.

Student Group Profile: Iowa State University Data Science Club

The Midwest Big Data Innovation Hub is developing a community of data science student groups across the Midwest region to share their experiences and best practices. This story is part of a series of student group profiles.

For this profile, we talked with leaders of the Iowa State University Data Science Club.

What are the goals of your group, and who is your core audience?
Our main goal is to promote the field of Data Science, whether it be information on the field, internship opportunities, school resources, or skills you need to learn to get a job in the field.

Our main audience is data science majors and students in adjacent majors with some prior coding experience, as well as anyone who would be interested in this type of career.

What kinds of activities have you done previously, and what do you have planned for this year?
We have focused a lot on company presentations and internship opportunities in the field. More recently, we have been focusing on workshops covering data science essentials, like Google Cloud, machine learning, or Tableau basics.

What challenges have you faced in starting or maintaining your group?
One of the main challenges has been keeping people engaged. Workshops aren’t super fun, but they are essential to learning about the field. Company presentations are nice but don’t appeal strongly to freshmen and sophomores. We have been working on making the club more of a community: having members help each other with homework, talk about outside activities, and occasionally hold fun events that don’t relate to data science, so that the club becomes a place to collaborate and talk with others about their love for the field.

What suggestions do you have for others who want to start a group on their campus, or expand their current group?
Start big, expect small. In the beginning, focus on appealing to as many as possible. Do as many things as you can to interest people. But always have a foundation for your goal as a group, stay centered, stay consistent. You may have a ton of people at the first meeting and very few at the next, but the key is to stay consistent and think big picture.

In terms of expansion, bring in outside help: see if your school can help, and collaborate with outside companies. Put yourself in a position where your group will not just be a fun place to hang out but a place that could benefit your resume and help you gain experience for future internship opportunities.

MBDH Learning Innovation Fellows Program Builds on Success with Second Cohort

The Midwest Big Data Innovation Hub and the Gala Sustainability Learning Initiative at the University of Michigan School for Environment and Sustainability continue to build on the success of last year’s Learning Innovation Fellows pilot program with a second cohort of fellows. The student fellows, hailing from a range of midwestern institutions, work with faculty advisors at the intersections of the Midwest Hub’s “Cyberinfrastructure and Data Sharing” and “Data Science Education and Workforce Development” themes. The program brings together data science and sustainability, delivering open-access, data-enriched learning tools on the Gala platform, along with experiences and mentoring for student fellows.


Alternative Transportation Scenarios
Shanshan (Shirley) Liu

Shanshan (Shirley) Liu (Student Fellow) is a PhD student from the Department of Civil and Environmental Engineering at the University of Illinois at Urbana-Champaign. Her research interests include transportation electrification policy and planning, sustainable transportation systems, and transportation energy. Shirley’s project is based around Shelie Miller’s case study, Assembling Our Transportation Future, which asks readers to think about transportation policy hinge points in American history. She is using Python to create tools that allow students to analyze scenarios of alternative vehicle adoption and evaluate them from the perspective of energy consumption and carbon emissions.

Shelie Miller

Shelie Miller (Faculty Advisor) is a professor at the University of Michigan School for Environment and Sustainability. Her research uses life-cycle assessment and scenario modeling to identify environmental problems before they occur. Miller’s research group works on a variety of energy-related topics, including the energy-water nexus, bioenergy, refrigeration in the food system, and autonomous vehicles.
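
Scenario tools of the kind described for the transportation case can be quite small. As a rough sketch, and not the fellows' actual code, the following Python compares two vehicle adoption scenarios by total energy use and carbon emissions. The `VehicleType` class, the `evaluate_scenario` function, and every numeric value are hypothetical placeholders chosen for illustration, not measured data.

```python
from dataclasses import dataclass


@dataclass
class VehicleType:
    name: str
    energy_mj_per_km: float  # placeholder energy use per kilometer
    co2_g_per_mj: float      # placeholder emissions intensity of that energy


def evaluate_scenario(shares, vehicle_types, fleet_km):
    """Return (total energy in MJ, total CO2 in tonnes) for one adoption scenario.

    `shares` maps each vehicle type name to its fraction of total distance driven.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("adoption shares must sum to 1")
    energy_mj = 0.0
    co2_g = 0.0
    for vehicle in vehicle_types:
        km = fleet_km * shares[vehicle.name]
        vehicle_energy = km * vehicle.energy_mj_per_km
        energy_mj += vehicle_energy
        co2_g += vehicle_energy * vehicle.co2_g_per_mj
    return energy_mj, co2_g / 1e6  # grams to tonnes


# Placeholder parameters, invented for this example only.
vehicle_types = [
    VehicleType("gasoline", energy_mj_per_km=2.5, co2_g_per_mj=70.0),
    VehicleType("electric", energy_mj_per_km=0.7, co2_g_per_mj=100.0),
]
fleet_km = 1_000_000

baseline = evaluate_scenario({"gasoline": 1.0, "electric": 0.0}, vehicle_types, fleet_km)
half_electric = evaluate_scenario({"gasoline": 0.5, "electric": 0.5}, vehicle_types, fleet_km)

for label, (energy_mj, co2_t) in [("baseline", baseline), ("50% electric", half_electric)]:
    print(f"{label}: {energy_mj:,.0f} MJ, {co2_t:.1f} t CO2")
```

Keeping the parameters in one place lets students swap in sourced values and add further scenarios without touching the calculation itself.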

Modeling Rainforest Carbon Cycling
Anneke van Oosterom

Anneke van Oosterom (Student Fellow) is a sophomore double majoring in biology and data science at St. Catherine University. She is currently involved with the biology department at St. Kate’s through the Biology Club and as a microbiology lab prep assistant. Through the fellowship she is creating a systems model using the Insight Maker modeling tool to demonstrate carbon cycling in tropical rainforests for Ann Russell’s forthcoming case Healing the Scars: Tropical Rainforest Carbon Cycling (developed through the OCELOTS network for tropical ecology).

Ann Russell

Ann Russell (Faculty Advisor) is a terrestrial ecosystems ecologist at Iowa State University, with special expertise in the biogeochemistry of tropical and managed ecosystems. Her research addresses links between traits of plant species and ecosystem processes, focusing on species and management effects on belowground processes, and subsequent implications for human impacts on soil fertility and carbon sequestration. Her research is designed to enhance our understanding of human impacts on the biosphere, improve biogeochemical models, and help guide selection of species for sustainable management of agroecosystems.

Scenario Planning for the Rouge River
Julie Arbit

Julie Arbit (Student Fellow) is in her final semester as an environmental policy and planning student within the School for Environment and Sustainability at the University of Michigan (UM). She works as a research associate for the Center for Social Solutions at UM, where her main project focuses on equity in flood risk, response, and recovery. Julie is using ArcGIS Online and Python to create scenario planning tools for the case The Rouge River: Redlining, Riverbanks, and Restoration in Metro Detroit.

Perrin Selcer

Perrin Selcer (Faculty Advisor) is an associate professor and director of undergraduate studies at the University of Michigan Department of History. He works at the intersection of environmental history, history of science, and international relations.

Accessible Data Science Tools for Water Utilities
Thien Nguyen

Thien Nguyen (Student Fellow) is a second-year computer science undergraduate and sustainability enthusiast at the University of Minnesota, Twin Cities (UMN). He has previously worked with UMN’s Institute on the Environment, writing geospatial analysis algorithms in Google Earth Engine to observe soil degradation in Senegal’s Peanut Basin. Thien is working with PhD student Matt Vedrin to create tools for a PIT-UN funded collaboration working to help classrooms, communities, and workforces confront challenges in the monitoring and improvement of drinking water distribution systems.

Lutgarde Raskin

Lutgarde Raskin (Faculty Advisor) is a professor at the University of Michigan School for Civil & Environmental Engineering. She works to rethink engineered systems to better harness the power of microorganisms to treat water and recover resources from waste streams. Dr. Raskin and her team work to understand and improve various aspects of the engineered water cycle microbiome to improve human health using sustainable design approaches, with a focus on biofiltration, disinfection, distribution, and building plumbing biostability.

Get involved

This work was supported by the National Science Foundation through the MBDH Community Development and Engagement (CDE) Program.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

In Memoriam: Val Pentchev

Val Pentchev portrait

The Midwest Big Data Innovation Hub team is saddened to announce the passing of our longtime colleague Valentin (Val) Pentchev on December 31, 2021.

Val was most recently the PI on the MBDH partner award to Indiana University, which leads the Smart & Resilient Communities priority area. Val had a long, valued history with the MBDH, beginning in the first phase of the project when NSF initially funded the national network of four Regional Big Data Innovation Hubs.

After participating in the community in the early days of the Hub, Val was elected to the MBDH Steering Committee for the 2018–2020 term.

“Val was especially generous with his time, and was committed to the success of the Hub. In addition to regular participation at Steering Committee meetings, he was always willing to join new activities to help the MBDH to grow and mature. We partnered to develop a session at Indy Big Data that was aimed across Industry, Government, and Academia. Val’s leadership and engagement with the organizers got us into the program where we delivered a comprehensive and well-received presentation. His kindness and collaboration will be greatly missed.” —Melissa Cragin, MBDH Executive Director in phase 1 of the Hub

Val Pentchev leading 2019 MBDH All Hands Meeting panel

A regular presence at the annual MBDH All-Hands Meetings, Val often served as a reviewer of student research poster submissions.

At our 2019 All-Hands Meeting in Chicago, the last in-person event sponsored by the Hub prior to the pandemic, Val co-organized and moderated one of the spotlight panels, “The ‘Smart’ Challenge: Delivering on Data-Enabled Decision-Making for Governments and Communities,” with panelists Amy Glasscock, Meera Raja, Ruby Mendenhall, and Charlie Catlett. At that meeting, Val also led a related breakout discussion with other interested participants.

2019 MBDH All Hands Meeting, with Alice Delage

“Val was always an energetic and friendly presence at our MBDH meetings, and just simply a wonderful person to be around. He was faithfully involved with the Hub since the very beginning and contributed to this community in countless precious ways over the years. His loss is not only an absolute tragedy for all the many important projects he worked on, but also for all the people who worked beside him and loved him.” —Alice Delage, Program Manager and Community Liaison for the MBDH in phase 1

In 2019, NSF awarded the BD Hubs an additional four years of support to continue regional and national data science community development. During this second phase, the MBDH continued to grow its work on the Smart & Resilient Communities theme. Val became a co-PI on the Indiana University team, and later became PI in 2021.

Val served on the Hub-wide leadership team throughout 2021, and contributed to our discussions about strategy, partnerships, and long-term sustainability.

“I worked with Val from 2015 to 2021. Val was a wonderful human being. A positive coworker with contagious enthusiasm and energy that directly influenced me and others at the time at Indiana University. I have fond memories of Val and I will take the time to remember what Val has taught me over the years, primarily: passion for work and new projects and compassion for coworkers and human beings.” —Franco Pestilli, past PI of the Indiana University MBDH award

At Indiana, Val also led the Collaborative Archive & Data Research Environment (CADRE) project, of which the MBDH is a partner, and helped bring members of the academic library and research data management communities to the Hub.

2019 MBDH All Hands Meeting

“Val was a tremendous colleague. His positive attitude, passion, and commitment to his work made him stand out. He had a way of seeing the big picture and his enthusiasm was contagious. He was a remarkable human being and it was a privilege to know him.” —Lourdes Gonzalez, MBDH Site Coordinator at Indiana University

Val represented the Hub at the Midwest AI Day in-person cross-sector conference in Indiana in August 2021, bringing the MBDH story to attendees from industry, government, and academia.

In October 2021, Val co-organized and participated on a panel discussion at the online MBDH Regional Community Meeting, with a focus on community building across the Smart & Resilient Communities and Data Science for Social Good spaces, with panelists Kimberly Zarecor (Iowa State), Tayo Fabusui (University of Michigan), and moderator Anita Say Chan (UIUC). In 2022, we had planned to continue this work with Val co-leading and helping to establish new partnerships in the region.

“The MBDH will continue to build on the legacy of work that Val helped create,” said John MacMullen, MBDH Executive Director. “His goal with the Hub was to broaden the impact of data science in addressing societal challenges. Due to his dedicated engagement, we are ready to accelerate our data needs assessment and community development efforts in the Data Science for Social Good and Smart & Resilient Communities spaces across the region in 2022.”

Professor Kimberly Zarecor on Community-Based Research and Building Interdisciplinary Research Teams

By Qining Wang

An expert in Eastern European Architecture, Professor Kimberly Zarecor tells us about her journey of building a highly interdisciplinary research team that takes data science into research on rural communities in Iowa.

Kimberly Zarecor

To some, architectural history and data science research may sound like oil and water—two fields that are almost impossible to mix well. However, Kimberly Zarecor, professor of Architecture at Iowa State University (ISU), leads her research team to create the perfect emulsion of many seemingly unrelated fields: sociology, statistics, industrial design, data science, architecture, and beyond.

With a research focus on small and shrinking communities in rural Iowa, not only does the team uncover the community efforts that keep some of these towns thriving, but the team is also offering the broader research community a valuable lesson on how to bring a wide range of expertise to projects and how experts from different fields can work together in harmony.

Zarecor found her inspiration to study Iowa’s shrinking towns from Ostrava, in the Czech Republic, a city she studied during her PhD research and later lived in for a semester as a Fulbright scholar. “[Ostrava] was part of a study in Europe called the Shrink Smart project, where [researchers] were looking at Ostrava as a shrinking post-industrial European city and questioned how to manage the governance of a relatively large city in the context of population loss.” As Zarecor shifted her primary research focus from architectural history in Eastern European cities to rural population loss in the Midwest, she realized the concept of shrinking smart could also be applied.

Zarecor and her collaborators started exploring the data-science component of shrinking smart with funding from a Smart & Connected Communities planning grant from the National Science Foundation (NSF) in 2017. Researchers at Iowa State University have been collecting data about the quality of life in small Iowa towns through the Iowa Small Town Poll since 1994, but “nobody had ever brought a data-science mindset to the analysis of [this] data.” The sociologists who had been collecting the data did not “think of [the poll] as a large set” and had not thought to build “a predictive model” from it.

Zarecor invited a computer scientist to be part of the planning grant team to transform the Small Town Poll data into training data, from which they could construct models to understand and predict the factors that influence people’s perceptions of quality of life in small rural communities. “We realized that what we were trying to understand is what are the actions that people in communities take as inputs into a system that results as outputs on the other side, as increases in perceptions of quality of life,” Zarecor explained. The planning grant team, consisting of a computer scientist, a sociologist, a community and regional planner, and two architects, found that “the best way to define [rural smart shrinkage] is that you are actively pursuing specific activities that you as a community can do together” that contribute to improved perceptions of quality of life even as population loss continues.

In 2020, Zarecor received another NSF grant of $1.5 million to continue this research and investigate strategies to address the data deficit in shrinking rural communities.

As the scope of the research has expanded, so has Zarecor’s team. In addition to Zarecor and rural sociologist David Peters, who was also a Co-PI on the planning grant, the team now includes a community economic development specialist and a community arts specialist from ISU Extension and Outreach (both are also faculty in the College of Design at Iowa State), an industrial design faculty member, master’s students from industrial design and community and regional planning, and for the data science work, three statistics faculty and three statistics PhD students. The Iowa League of Cities is also a partner on the project.

Coming to data science with little technical understanding, Zarecor approaches the data science component more from an intuitive rather than conceptual perspective: “It’s not that I understand the statistics, but I understand [the goals] as we go step by step . . . [and] the power of the tools that [the statisticians] are building.”

To lead such a highly interdisciplinary team, Zarecor thinks of herself as a bridge-builder within the team. Zarecor helps the members of her team understand data science by asking questions in a way that elicits responses that deepen the understanding of the nontechnical team members. “I like having that [bridge] function because it’s asking questions as a way of learning. For me, just the conversations with the data scientists helped me to better understand the data science part of our project.”

And the bridge function goes both ways. In addition to helping non-data-science experts learn more about the potential of data science, Zarecor also cultivates data scientists’ ability to contribute to projects that are community-based. “When it comes to community-based work, the assumption that this is not an expertise of its own is something that’s a challenge for the field, because doing work in communities is its own expertise,” Zarecor explained. Even though the residents in rural Iowa are the direct beneficiaries of the work from Zarecor’s team, the knowledge gap with respect to finding and using data makes those benefits inaccessible to some residents. Meanwhile, data scientists often lack the skills to convey their findings to an audience outside their academic circle. “As a field, data science, in my opinion, has not done a good job to educate necessarily well-rounded [data scientists].”

To overcome this bottleneck, Zarecor’s team works on creating dashboards that visualize the data and make the data more interpretable to the rural communities. Zarecor also encourages the statisticians on her team to talk to residents of the communities they study and ask what kind of data they would like to have. “When we ask what they want, it’s not because they know everything that’s available. We’re doing a mix of hearing from them what they want, and also guessing some things that they probably don’t know are out there that we can also give them in a usable form.”

Zarecor believes that similar types of highly collaborative and interdisciplinary research would benefit the entire research community, and those collaborations start with abandoning assumptions about different fields.

She gives an example in the discipline of architecture, where architects would assume themselves to be capable of doing graphic design or planning. Many don’t realize that those tasks are outside of their expertise even though these fields are seemingly adjacent. “And I would transfer that over to data scientists who know that data science is a synthetic and integrative discipline. [. . .] It doesn’t mean, though, that there are not all of these soft skills, all of this other communication, and people-related aspects of the data science work that you can handle without help.”

Therefore, Zarecor suggests that data scientists should work in conjunction with domain experts to make their research more relatable to a broader audience. Team members also need to respect the importance and specificity of other kinds of expertise beyond the technical or data-driven parts of a project. When a team successfully works this way, “the data science gets improved and amplified and becomes more useful. If you actually think horizontally on the project, you know that there’s not a pyramid, but that you are a team that’s working across the group [of collaborators]. This would be a much healthier way of [working with] data and for data scientists to interact with people.”

In this regard, Zarecor noted that the Midwest Big Data Innovation Hub, as a highly integrated and inclusive organization, has the potential to cultivate different layers of collaboration across various disciplines. “But it does require the data scientists who were the first audience, or the more explicit audience [for the Hub], to be willing to open up.”

Get Involved

New community-building activities in the Smart & Resilient Communities priority area of the Midwest Big Data Innovation Hub are beginning in spring 2022. Contact the Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

New MBDH Community Development and Engagement partners

By Qining Wang

The Midwest Big Data Innovation Hub (MBDH) recently partnered with multiple institutions in the region for new data science activities under its Community Development and Engagement Program. This program incubates new projects and provides support to help them grow.

In the last proposal cycle, the MBDH Seed Fund Steering Committee selected three projects to support, led by the Tribal Nations Research Group (TNRG), St. Catherine University, and Trinity Christian College.

TNRG Digital Agriculture Meeting

The TNRG, together with the University of North Dakota and Grand Farm/Emerging Prairie, will host a one-day workshop in 2022, at the Microsoft Business Center in Fargo, North Dakota. This workshop will connect tribal colleges and universities working with their local tribal governments to extend digital agriculture and educational opportunities to Native farmers.

Approximately 30% of the nation’s Native population and 20 of the nation’s 37 tribal colleges and universities are located in the MBDH service area. Because of this, the MBDH is well-positioned to engage tribal stakeholders on issues related to Data Science Education and Workforce Development. This is especially true in the context of Digital Agriculture, where many of these institutions are working with their local tribal governments to extend agricultural programs and educational opportunities to Native farmers.

For a long time, tribal communities have lacked the dedicated capital to build resilient and sustainable infrastructure for producing food on their lands. The lack of such infrastructure creates food insecurity that can be detrimental to Indigenous peoples. In addition, due to climate change, it is crucial to build sustainable farming practices that can provide sufficient food and preserve the ecosystem everywhere in the long run.

One way to realize optimal farming practices is to incorporate digital agriculture, which integrates digital technologies into crop and livestock management. Technologies such as machine learning and big data analysis tools can improve agricultural production while minimizing harm to the ecosystem. For instance, by using machine learning to correlate multiple parameters related to crop growth, farmers can better predict crop yield from parameters such as soil nutrients, weather, and fertilization. Those technologies can therefore make information on ecosystems, crops, and animals more findable and interpretable to farmers.
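
To make the yield-prediction idea concrete, here is a minimal sketch of one of the simplest such models, ordinary least-squares linear regression, written in plain Python. It is an illustration only: the feature names, the six field records, and the rule used to generate the yields are all invented for this example, and real digital agriculture systems typically rely on far richer data and models.

```python
def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 to each row so the model learns an intercept.
    X = [[1.0] + list(row) for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply the fitted weights (intercept first) to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Synthetic features per field, invented for illustration:
# soil nitrogen (ppm), seasonal rainfall (mm), fertilizer applied (kg/ha).
fields = [
    (20.0, 300.0, 50.0),
    (25.0, 350.0, 40.0),
    (30.0, 280.0, 60.0),
    (22.0, 400.0, 55.0),
    (28.0, 320.0, 45.0),
    (35.0, 360.0, 65.0),
]
# Synthetic yields (t/ha) generated from a made-up linear rule, not real data.
yields = [1.0 + 0.05 * n + 0.004 * r + 0.02 * f for n, r, f in fields]

weights = fit_linear_model(fields, yields)
new_field = (27.0, 340.0, 52.0)
estimate = predict(weights, new_field)
print(f"Estimated yield: {estimate:.2f} t/ha")
```

Because the synthetic yields follow an exact linear rule, the fit recovers that rule almost perfectly; with real field data the relationship would be noisy, and the fitted weights would only approximate how each parameter relates to yield.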

However, implementing digital agriculture on tribal lands involves extra layers of nuance. Data scientists and agricultural experts must conduct digital agriculture research in tribal regions under proper data sovereignty standards, such as the CARE Principles for Indigenous Data Governance. Indigenous peoples are entitled to know what data is collected and how data scientists use and analyze their data. The data should enable Indigenous peoples to derive benefit from any fruits of the research involving tribal communities.

This workshop will serve to increase the accessibility of digital agriculture in Native communities, with an emphasis on respecting the culture, traditions, and sovereignty of the Native people. In addition, this workshop will enlist more tribal stakeholders nationwide for broader engagement in digital agriculture, potentially developing a Data Science Workforce Development and Education proposal for Native communities. Anita Frederick, the President of TNRG, will lead this workshop and present the importance of Data Management and Data Sovereignty.

“Outreach to Indian tribes is often difficult for non-tribal entities and individuals,” Frederick said. “As a direct result, tribal populations are often left out of initiatives that could help to address some of the economic, health, and other societal conditions that tribes face. Clearly, American Indian citizens must have access to the opportunities envisioned in the Big Data Revolution. The proposed project is a first step in helping to close the growing Big Data gap that is emerging between Indian country and the rest of the nation.”

St. Catherine Data Science Boot Camp

MBDH will also support a data science program “created by women for women” at St. Catherine University (aka St. Kate’s), one of the USA’s largest private women’s universities, located in St. Paul, Minnesota. This program aims to cultivate a new generation of women and historically underrepresented data scientists. In addition to teaching data science and data analytic principles, this program will also raise students’ awareness of using data science in ethically, socially, and environmentally just ways.

Introduced in the fall semester of 2018, the data science program at St. Kate’s reaches both current and prospective students of the University. Monica Brown, the Mary T. Hill Director of Data Science at St. Kate’s, will lead the program’s two initiatives in 2021-2022. Working alongside her colleagues at St. Kate’s for over 13 years, Brown aspires to make data science and data analytics principles accessible to every student in the St. Kate’s community.

Brown will launch a one-week Data Science Boot Camp in the summer of 2022. This boot camp will provide hands-on coding experience to middle- and high-school students, particularly those historically excluded from data science. In addition, Brown will invite data science professionals to speak about future career opportunities. Overall, this program aims to enable younger students to envision themselves as future data scientists and to elicit their passion for coding and data science. The lessons learned from organizing this event will be shared with others who wish to run similar events for their own student populations.

“St. Kate’s is grateful for the partnership with MBDH towards the support of a boot camp,” said Brown. “We very much look forward to bringing younger students onto our campus to encourage and empower them through data science activities.”

Trinity Data Science for Social Good Workshop

The third project to be incubated under the MBDH’s Community Development and Engagement program will be an annual workshop and conference on Teaching with Data for Social Good (DSG) in summer 2022. DSG addresses the importance of teaching data science for positive social impact, and this conference serves as an opportunity that encourages teaching faculty to include DSG in their curricula proactively.

Trinity Christian College, a faith-based institution located on the outskirts of Chicago, will host this meeting. The workshop chair will be Dr. Karl Schmitt, an assistant professor in the Data Analytics department at Trinity and the coordinator of the Data Analytics program.

The meeting format resembles that of regional professional society meetings, consisting of a workshop, keynotes, and contributed talks. To provide more practical assistance to teaching faculty incorporating DSG, faculty will directly generate teaching materials that include DSG in the primary workshop sessions. Faculty will also have a chance to practice teaching DSG by actively advising student teams participating in a colocated datathon. In this student competition, student teams will use data science to solve practical problems.

“An important component of increasing persistence and success for our current generation of students is connecting their coursework to meaningful change or outcomes,” Schmitt said. “Through the Workshop on Data for Good in Education, the MBDH will be supporting faculty in developing their teaching to better incorporate the Data for Social Good (DSG) movement. This provides a natural connection to relevance with grass-roots level improvements in our society while promoting the broad applicability of data science.”

Beyond these outcomes, Schmitt said, “the workshop will be a professional development opportunity for all instructors seeking to more deeply engage their students through meaningful social good projects within a classroom setting. It will inspire, educate, and most importantly, allow faculty the chance to share, and prepare, materials for use within their own teaching context.”

Get involved

Learn more about other Community Development and Engagement partnerships, and contact the MBDH if you have an idea for a project to help build the data science community in the Midwest.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Iowa State University, Indiana University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

Agroterrorism: Cybersecurity Incidents Affect Agriculture and Water

By Raleigh Butler

You may not think that agriculture and cybersecurity, both themes of the Midwest Big Data Innovation Hub, are linked, but recent events demonstrate there are connections between the two that pose risks to our food security.

The “food and agriculture” industry is publicly defined as a critical infrastructure sector by the U.S. Department of Homeland Security. The Cybersecurity & Infrastructure Security Agency (CISA) states that food and agriculture is one of sixteen essential critical infrastructure sectors that provide “the essential services that underpin American society and serve as the backbone of our nation’s economy, security, and health. We know it as the power we use in our homes, the water we drink, the transportation that moves us, and the communication systems we rely on to stay in touch with friends and family.” Those statements highlight the urgency of building robust cyberinfrastructure to prevent massive disruptions to crucial public services.

A recent cyberattack targeting an Iowa-based agriculture company called New Cooperative illustrates the severity and consequences of those incidents. The group claiming responsibility, BlackMatter, deals in blackmail, Reuters reports. The hackers from BlackMatter locked New Cooperative’s access to data that support the food supply chains and detail the feeding schedule of the livestock. To obtain the decryption key for its data and resume its farming activities, New Cooperative was ordered to pay $5.9 million.

As Bobby J. Martens, an associate professor of Economics at Iowa State University, was quoted as saying, “This event wasn’t long enough to cause a change in the commodity price, but certainly it will have ramifications in terms of the food supply system. If they do it to this company, they could do it to one of the majors. They can block the food chain. They attacked in the heartland of all agriculture. It’s a new form of terrorism.”

Regardless of the source, and whether it is purposeful or accidental, a failure in any other critical sector could be life threatening for US citizens. For example, Water and Wastewater Systems is a related sector on CISA’s list, and in fact, water system attacks did occur early in 2021, the most prominent being the attack in Oldsmar, Florida, that February. While the breach nearly allowed a mass poisoning to occur, the mayor viewed the event as a “success.” According to ProPublica, cybersecurity experts view the breach not as a success, but instead as a “frightening near-miss.” Retired Admiral Mark Montgomery, a panelist on the MBDH Water Data Forum webinar on water and cybersecurity in May 2021, was quoted as saying, “Frankly, they got very lucky. They averted a disaster through a lot of good fortune.”

Nontechnical companies are extremely vulnerable to cyberattacks. According to the 2020 state of ransomware report, manufacturing, government, services, and healthcare are among the top sectors prone to cyberattacks. The report was published by BlackFog, a company that specializes in ransomware protection.

Moving forward, it is possible for businesses and governmental sectors to make cybersecurity an integral part of their practices. Even seemingly trivial data maintenance, such as regularly backing up data in multiple storage devices and encrypting data during transfer, can improve data security in the long run. The key is to operate under the mindset of protecting data and to be more intentional about data protection at any point. The U.S. National Institute of Standards and Technology (NIST) and CISA developed the NIST Cybersecurity Framework, a comprehensive approach to security for critical infrastructure, and there are subsets of that work to support small businesses and other organizations with cybersecurity risks that may not have extensive resources.
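
To make the backup habit concrete, here is a minimal Python sketch that copies a file into several backup locations and confirms each copy with a SHA-256 checksum. The function names and the idea of passing a list of backup directories are assumptions made for illustration. The sketch is not part of the NIST Cybersecurity Framework, and it does not cover encryption during transfer.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dirs):
    """Copy source into each backup directory and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    verified = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / source.name
        shutil.copy2(source, copy_path)
        # A mismatch means the copy was corrupted or altered
        if sha256_of(copy_path) != expected:
            raise IOError(f"Backup at {copy_path} does not match the original")
        verified.append(copy_path)
    return verified


if __name__ == "__main__":
    # Demonstration in a temporary folder; the two subfolders stand in
    # for separate storage devices
    with tempfile.TemporaryDirectory() as workspace:
        record = Path(workspace) / "feeding_schedule.csv"
        record.write_text("barn,time\nA,06:00\n")
        copies = back_up(record, [Path(workspace) / "drive_1",
                                  Path(workspace) / "drive_2"])
        print(f"Verified {len(copies)} backup copies")
```

In practice the backup directories would sit on physically separate devices or services, so that a single incident cannot lock or destroy every copy.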

On the management level, designated information security officers can build more secure databases and data management systems. The information security officers can also perform routine testing for weaknesses in the existing systems. They could also work with the risk managers to develop preventative measures in case of cyberattacks. Other preventive measures include purchasing cyber insurance.

An additional benefit of developing systems for monitoring and collecting data is the ability to assess the impact of other external events. We previously published a story on how researchers were assessing the spread of COVID-19 by examining the relative levels of the virus in wastewater systems. Since many infrastructure systems, such as agriculture, water, and food, are an interconnected web of dependencies, threats to one can have cascading impacts across others. For academic organizations that manage research data repositories, the MBDH and its partners developed a guidance document on data security for open science, through our Trustworthy Data Working Group.

Get involved

Do you have a cybersecurity success story or case study to share from your organization? Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

Climate Change Affecting Crops in Iowa

By Raleigh Butler

In 2010, the University of Minnesota received a grant from the National Science Foundation to study climate change using data-driven methods. The project included Midwest Big Data Innovation Hub co-PI Shashi Shekhar and a team of researchers from across the country. The research led, in part, to explorations of connections between food, energy, water, and climate change.

Because greenhouse gases contribute heavily to climate change, activities that contribute to their release are becoming more divisive with time. There’s no doubt that the food we eat is becoming an increasingly political statement. According to the 2019 Environmental Protection Agency report, agriculture was responsible for 10% of all greenhouse gas emissions, amounting to 650 million metric tons of CO2. A quarter of those emissions (about 2.5% of all greenhouse gas emissions) come from livestock before they are butchered.

The Coupled Model Intercomparison Project 3, a project of the World Climate Research Programme, predicts when the global average temperature will increase by 2°C. The approximately 0.75°C increase in temperature since 1950 has caused a huge increase in natural disasters. This can be seen in an increase in hurricanes, such as Katrina, and the melting of the polar ice caps, among other issues.

According to the graphic from their report, the global average temperature has already increased by about 1°C (1.8°F) relative to preindustrial levels, and will continue rising to as much as 7°C in some regions by the end of the century.

Climate change is the culprit behind many natural disasters, as more than 170 scientific reports covering 190 extreme-weather events found that around two-thirds of extreme-weather events likely originated from, or were exacerbated by, anthropogenic hazards.

How does this apply to the Midwest? Let’s look at Iowa, where over 90% of the land is used for agriculture. In recent years, extreme-weather events have wreaked havoc on crops.

2020: Inland Hurricanes

Farmers and agricultural specialists were worried in August 2020 when portions of Iowa experienced derechos. Pronounced deRAYchos, these are widespread, long-lived thunderstorms mixed with 100–130 mph winds. According to the National Weather Service, a derecho like this was “a roughly once-in-a-decade occurrence” in the Midwest.

These immensely strong storms destroyed crops and decreased crop output for the season in Iowa. According to the power-outage map published by the University of Wisconsin–Madison’s Cooperative Institute for Meteorological Satellite Studies (CIMSS) below, a quarter of the counties in Iowa caught the worst of the storm. All the affected counties were in the east-central portion of the state.

2021: Drought

There were high hopes that 2021 would bring a better crop return. However, when agricultural scouts crossed Iowa in mid-August, they found that the state was suffering from extreme drought.

Although derechos and rain-damaged fields were no longer the main concern, 2021 brought high levels of drought. According to the U.S. Drought Monitor, on August 17, 2021, 79% of Iowa was impacted by some degree of drought.

National Drought Mitigation Center map of the 2021 drought in Iowa
The U.S. Drought Monitor is jointly produced by the National Drought Mitigation Center (NDMC) at the University of Nebraska–Lincoln, the United States Department of Agriculture, and the National Oceanic and Atmospheric Administration. Map courtesy of NDMC.

The areas were being scouted out ahead of time for the upcoming 2021 Pro Farmer Crop Tour. Scouts on crop tours have the job of evaluating likely crop production in each region. For more information on crop tours, visit this link.

2021: Storms

Drought became an afterthought just days later. On August 24, 2021, the Midwest experienced multiple storms. Although the severity of the storms did not come close to derechos, they still left behind large paths of downed corn and soybeans. On August 28, 2021, South Dakota and southwest Minnesota even experienced baseball-sized hail.

According to Iowa State University (ISU) Extension Field Agronomist Terry Basol, “The storms hit northeast Iowa farms pretty good, honestly.” “It’s amazing the scope of the crop damage,” he continued, concerned about the pace of harvest and crop quality.

Unfortunately, the rain has come too late for many crops, and on top of that, some areas are even flooding. Iowa State University Extension crop specialist Angie Rieck-Hinz said, “The crop is highly variable. Crop conditions are literally all over the place.”

What’s to Come?

Amidst all these natural disasters and climate change, what can be expected for the future of agriculture in Iowa? In April 2021, the Environmental Defense Fund commissioned KCoe Isom, an agricultural consultancy, to model the potential climate change impacts on Iowa corn, soy, and silage production over the next two decades. According to the resulting analysis, “Iowa farmers could see statewide gross farm revenues reduced by as much as $4.9 billion per decade. Because with climate change agricultural prices are likely to rise, relative to without climate change, the impact to gross farm revenues from yield impacts will be offset to some degree by higher prices.”

Unfortunately, worsening climate change (and the resulting natural disasters) is likely to continue reducing levels of crop production. This will result in an increase in food prices where those crops are sold, affecting consumers across the country.

The roles for data science and related research around climate and agriculture are growing: in September 2021, the National Science Foundation funded a new multidisciplinary institute led by the University of Illinois, called I-GUIDE, which is focused on better understanding the risks associated with climate change.

Get Involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

How Do Scientists Help AI Cope with a Messy Physical World?

By Qining Wang

When we see a stop sign at an intersection, we won’t mistake it for a yield sign. Our eyes recognize the white “STOP” letters printed on the red octagon. It doesn’t matter if the sign is under sunlight or streetlight. It doesn’t matter if a tree branch gets in the way or someone puts graffiti and stickers on the sign. In other words, our eyes can perceive objects under different physical conditions.

A stop sign. Photo by Anwaar Ali via Unsplash

However, identifying road signs accurately is very different, if not more difficult, for artificial intelligence (AI). Even though, according to Alan Turing, AIs are systems that can “think like humans,” they can still present limitations in mimicking the human mind, depending on how they acquire their intelligence.

One of the potential hurdles is correctly interpreting variations in the physical environment. An input that exposes this limitation is commonly referred to as an “adversarial example.”

What Are Adversarial Examples?

Currently, the most common method to train an AI application is machine learning, a type of AI process that helps AI systems learn and improve from experience. Machine learning is like the driving class an AI needs to take before it can hit the road. Yet machine-learning-trained AIs are not immune to adversarial examples.

Circling back to reading the stop sign, an adversarial example could be the stop sign turning into a slightly darker shade of red at night. The machine-learning model captures these tiny color differences that human eyes cannot discern and might interpret the signs as something else. Another adversarial example could be a spam detector that fails to filter a spam email formatted like a normal email.
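
The effect of a tiny change on a model’s decision can be sketched with a toy example. The Python snippet below uses a hypothetical linear classifier over a four-pixel “image” and nudges every pixel by a small amount in the direction that lowers the classifier’s score, in the spirit of the fast gradient sign method. The weights, bias, pixel values, and function names are all invented for illustration and do not describe how any real road-sign recognition system works.

```python
def score(weights, bias, pixels):
    """Linear classifier score: a positive value means 'stop sign'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def classify(weights, bias, pixels):
    """Turn the score into a label."""
    return "stop sign" if score(weights, bias, pixels) > 0 else "other sign"


def perturb(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score."""
    adversarial = []
    for w, p in zip(weights, pixels):
        step = epsilon if w > 0 else -epsilon if w < 0 else 0.0
        # Keep pixel intensities inside the valid range [0, 1]
        adversarial.append(min(1.0, max(0.0, p - step)))
    return adversarial


# Hypothetical weights and a four-pixel "image" with intensities in [0, 1]
weights = [0.9, -0.6, 0.8, -0.7]
bias = -0.6
image = [0.8, 0.3, 0.7, 0.4]

adversarial_image = perturb(weights, image, epsilon=0.1)
print(classify(weights, bias, image))              # label for the clean image
print(classify(weights, bias, adversarial_image))  # label after the small change
```

Here no pixel moves by more than 0.1, yet the label flips from “stop sign” to “other sign” because all the small changes push the score in the same direction.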

Just as individual human minds can be unpredictable, it is difficult to pinpoint exactly why a machine-learning model makes a certain prediction. Neither is it a simple task to develop a machine-learning model that comprehends the messiness of a physical world. To improve the safety of self-driving cars and the quality of spam filters, data scientists are continuously tackling the vulnerabilities in the machine-learning processes that help AI applications “see” and “read” better.

What Are Humans Doing to Correct AI’s Mistakes?

To defend against adversarial examples, the most straightforward mechanism is to let machine-learning models analyze existing adversarial examples. For example, to help the AI of a self-driving car to recognize stop signs under different physical circumstances, we could expose the machine-learning model that controls the AI to pictures of stop signs under different lightings or at various distances and angles.
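
One simple way to produce such varied pictures is to generate them from an existing image. The Python sketch below creates darker and brighter copies of a tiny grayscale “image” by scaling its pixel intensities. The four-pixel image, the brightness factors, and the function names are hypothetical, and real training pipelines would also vary distance, angle, and occlusion.

```python
def adjust_brightness(pixels, factor):
    """Scale every pixel intensity by factor, clipping to the range [0, 1]."""
    return [min(1.0, max(0.0, p * factor)) for p in pixels]


def augment(pixels, factors):
    """Return one brightness-adjusted copy of the image per factor."""
    return [adjust_brightness(pixels, f) for f in factors]


# Hypothetical four-pixel grayscale "image" with intensities in [0, 1]
image = [0.8, 0.3, 0.7, 0.4]

# Darker (night), unchanged, and brighter (direct sunlight) variants
training_variants = augment(image, factors=[0.5, 1.0, 1.5])
for variant in training_variants:
    print(variant)
```

Each variant would keep the same “stop sign” label during training, so the model sees that the label should not depend on lighting.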

Google’s reCAPTCHA service is an example of such a defense. As an online safety measure, users need to click on images of traffic lights or road signs from a selection of pictures to prove that they are humans. What users might not be aware of is that they are also teaching the machine-learning model what different objects look like under different circumstances at the same time.

Alternatively, data scientists can improve AI systems by training them on simulated adversarial examples during the machine-learning process. One way is to implement a Generative Adversarial Network (GAN).

GANs consist of two components: a generator and a discriminator. The generator “translates” a “real” input image from the training set (clean example) into an almost indistinguishable “fake” output image (adversarial example) by introducing random variations to the image. This “fake” image is then fed to the discriminator, where the discriminator tries to tell the modified and unmodified images apart.

The generator and the discriminator are inherently in competition: The generator strives to “fool” the discriminator, while the discriminator attempts to see through all its tricks. This cycle of fooling and being fooled repeats. Both become better at their own designated tasks over time. The cycle continues until the generator outcompetes the discriminator, creating adversarial examples that are indistinguishable to the discriminator. In the end, the generator is kept to defend against different types of real-life adversarial attacks.
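
The back-and-forth between the two components can be sketched with a deliberately tiny example. In the Python snippet below, the “images” are single numbers, the generator is a line G(z) = a * z + b that turns random noise into a fake sample, and the discriminator is a logistic function that outputs the probability that a sample is real. This is a simplified, generic GAN training loop written under those assumptions. It does not modify real images as described above, and every name, parameter, and data distribution in it is hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator_step(w, c, real_x, fake_x, lr):
    """One gradient-descent step on -log D(real) - log(1 - D(fake))."""
    d_real = sigmoid(w * real_x + c)
    d_fake = sigmoid(w * fake_x + c)
    grad_w = -(1.0 - d_real) * real_x + d_fake * fake_x
    grad_c = -(1.0 - d_real) + d_fake
    return w - lr * grad_w, c - lr * grad_c


def generator_step(a, b, w, c, z, lr):
    """One gradient-descent step on -log D(G(z)), where G(z) = a * z + b."""
    fake_x = a * z + b
    d_fake = sigmoid(w * fake_x + c)
    grad_x = -(1.0 - d_fake) * w  # derivative of the loss w.r.t. the fake sample
    return a - lr * grad_x * z, b - lr * grad_x


def train(steps=2000, lr=0.05, seed=0):
    """Alternate discriminator and generator updates."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        real_x = rng.gauss(4.0, 0.5)  # hypothetical "real" data
        z = rng.gauss(0.0, 1.0)       # random noise fed to the generator
        w, c = discriminator_step(w, c, real_x, a * z + b, lr)
        a, b = generator_step(a, b, w, c, z, lr)
    return a, b, w, c


a, b, w, c = train()
print(f"generator: G(z) = {a:.2f} * z + {b:.2f}")
```

Each discriminator step makes it better at separating real from fake samples, and each generator step moves the fake samples toward whatever the discriminator currently considers real. Training of this kind is not guaranteed to settle, so the printed parameters should be read as a snapshot rather than a final answer.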

AI Risks and Responses

GANs can be valuable tools to tackle adversarial examples in machine learning, but they can also serve malicious purposes. For instance, one other common application of GANs is face generation. These so-called “deepfakes” make it virtually impossible for humans to tell a real face from a GAN-generated face. Deepfakes could result in devastating consequences, such as corporate scams, social media manipulation, identity theft, or disinformation attacks, to name a few.

This shows how, as our physical lives become more and more entangled with our digital presence, we can never neglect the other side of the coin while enjoying the benefits brought to us by technological breakthroughs. Understanding both would serve as a starting point for practicing responsible AI principles and creating policies that enforce data ethics.

Tackling vulnerabilities in machine learning matters, and so does protecting ourselves and the community from the damage that those technologies could cause.

Learn More and Get Involved

Curious whether you can tell a real human face from a GAN-generated face? Check out this website. And keep an eye out for the Smart & Resilient Communities priority area of MBDH, if you wish to learn more about how data scientists use novel data science research to benefit communities in the Midwest. There are also several NSF-funded AI Institutes in the Midwest that are engaged in related research and education.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

Teaching During a Pandemic

By Raleigh Butler

This story is part of a series on coronavirus research in the Midwest region. To explore other NSF-funded research addressing the COVID-19 pandemic, please visit the COVID Information Commons, a project of the four NSF Big Data Innovation Hubs.

In early 2020, the USA was brought to a standstill when stores, schools, and other everyday locations closed to help fight the COVID-19 pandemic. This piece will discuss a variety of RAPID awards funded by the National Science Foundation in the Midwest to support research to mitigate a variety of education-related pandemic challenges. Links to the NSF award abstracts are included in the article.

Remote Learning Research

A common topic for early school-related COVID-19 research was the exploration of remote or online styles of teaching and learning. In the early stages of the pandemic, it was unclear when schools would be able to reopen. Teachers and administrators were attempting to learn from prior work on remote learning at all levels, kindergarten through college.

For instance, in a project led by the Chicago Board of Education, researchers discuss the impact of public school students studying computer science remotely. Issues broached include access to the appropriate technology outside of school and possible socioeconomic variables at play. An award to the University of Kansas touches on a similar topic at the graduate level. This work is STEM-focused, addressing COVID-19-related challenges in engineering. One question the researchers ask is: “To what extent do the relationships between perceived e-mentoring support and student outcomes vary by demographics, disciplines, and institutional characteristics?”

A project at the University of Nebraska–Lincoln explores this topic from the faculty perspective. The project goal is described as “to identify cognitive and emotional themes concerning faculty and staff adaptability and community engagement during a crisis compared to those found under typical teaching circumstances.” Adaptability is a key theme here: professors, regardless of their experience using online-teaching technology, were expected to learn how to do so. In Illinois, the Chicago Public Schools did its own research on forced remote learning and “mitigating the impact” of the sudden transition from in-person to online learning.

Remote Learning Activities

Researchers at the University of Minnesota–Twin Cities put an interesting spin on remote learning with their work on virtual reality (VR). Virtual reality allows learners to immerse themselves in a different world using a pair of electronic goggles or a headset. This device is used to portray a different world, like being “inside” a video game. The Minnesota researchers recognize that “many people, especially young adults, typically being used to active social life, can find this physical/social distancing leading to social isolation. Unfortunately, social isolation is strongly associated with negative outcomes for mental health and therefore represents a serious threat to long-term compliance.” The project aims to promote web-based VR as a way for people to interact safely in a shared environment, despite not actually being physically together.

The need for activities—especially for young learners—is addressed in Indiana University Bloomington’s research project. They have launched a Facebook group called CoBuild19, which works on making STEM activities more accessible to children.

Remote learning has also been addressed in an award granted to the Georgia Research Alliance. This award focuses on “ALOE”—Adult Learning and Online Education. Given the current state of the pandemic, it is immensely difficult, if not impossible, to continue education safely in person. The National AI Institute for Adult Learning and Online Education (AI-ALOE) addresses this by working to move adult-education opportunities online.


There’s a lot of information available about COVID-19. Some is accurate, some is misinformation (simply incorrect), and some is disinformation (incorrect with the intent to deceive). Many people have trouble deciding what sources to trust. A project led by the University of Michigan–Ann Arbor plans to follow a sample of university students. The premise of the research is to determine “whether and to what extent people follow recommendations and change behavior.”

Teaching About the Pandemic

Teaching about the pandemic itself is important. A relevant award is an exploration by researchers at the University of Nebraska–Lincoln into using popular media to educate youth on COVID-19-related issues. By using illustrated media such as comics to raise youth awareness of accurate coronavirus-related information, perhaps it’s possible to lessen the mis/disinformation discussed in the University of Michigan’s work. The project proposes assembling “an integrated package of high-quality, widely accessible media and other outreach materials designed to engage middle school youth, educators, and libraries in learning about viruses in relation to COVID-19.”

The researchers, wanting the material to be accessible, propose that “[t]hese resources will be disseminated broadly and at no cost to youth and educators of all kinds, including schools, libraries, museums, and other established networks for formal and informal science education.” Indeed, children’s lives are being altered drastically by COVID-19. It’s important that they know what is occurring to cause such alterations. In fact, the University of Missouri and the University of North Carolina at Chapel Hill collaborated to develop a curriculum for high school students. The curriculum covers epidemics in both scientific and social contexts.

Get involved

The projects described above were all funded by the NSF, which published a related story.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas. The MBDH has a variety of ways to get involved with our community and activities.

Community Engagement through Open Watersheds

By Qining Wang

Lake Michigan and the Chicago skyline. Photo by Muzammil Soorma via Unsplash

Many people, both inside and outside the USA, presume that clean water is universally accessible there. In reality, many people are living in a water crisis.

A recent study published in Nature Communications assesses the degree of the clean water crisis. Researchers found that as of August 18, 2020, more than 22,000 community water systems are either serious violators of or in significant noncompliance with the Safe Drinking Water Act. As the researchers point out, “our findings demonstrate that the problem of water hardship in the United States is hidden, but not rare.”

A major underlying cause is inaccessible data on water quality. Different federal and state agencies, such as the National Oceanic and Atmospheric Administration and the Environmental Protection Agency, collect data on various water sources. Yet the lack of communication among these agencies fragments watershed data. As a result, watershed data is scattered, difficult to locate, and sometimes wholly inaccessible. Such fragmentation significantly limits policymakers’ ability to make informed decisions to improve water quality, and it leaves consumers unable to tell when their water is unsafe.

One solution is to create open data hubs that centralize accessible and interpretable data, which would require both governmental and nongovernmental efforts. As such, the creation of open watersheds manifests in interesting intersections of community empowerment, resident engagement, and watershed management. In a recent panel discussion titled “Open watersheds: Innovations in Community Water Data,” four panelists involved in open watersheds in the Midwest discussed the benefits, challenges, and opportunities in open-source environmental monitoring.

This panel discussion is part of the monthly Water Data Forum webinar series, co-hosted by the Midwest Big Data Innovation Hub, the Cleveland Water Alliance, and the Water Environment Federation. The participating panelists were Whitnye Long-Jones, founder and executive director of Organic Connects; Mark App, project manager of the Great Lakes Data Watershed; Barb Horn, expert facilitator and steering committee member of the Water Data Collaborative; and Brandon P. Wong, president and co-founder of Hyfi.

Envisioning Open Watersheds

Each panelist had their own vision for open watersheds. Despite coming from different backgrounds, all agreed on the importance of recruiting community effort. Wong spoke from a technological standpoint by relating open-source technologies to open watersheds. He said: “I think there’s a responsibility to see what’s going on underneath the hood.” When it comes to open watersheds, there should be transparency in the process, from data collection from physical devices to data storage. Wong believes that open technologies will allow people to speak openly about what they know about the watershed, knowing how the data is collected and processed.

Long-Jones emphasized creating connections with community members. Since data scientists tend to use jargon and technical terminology to describe water data, it is crucial to train community members to make water data more interpretable. Horn echoed this point and talked about cultivating relationships between residents and institutions.

Tools for Open Watersheds

Panelists further discussed building connections and relationships when talking about different tools that can benefit the communities. Horn made a crucial point about building healthy relationships with new technologies that facilitate data collection: “Too often technology comes in with its excitement [. . .], but no one has spent that time helping them build the context on how to use it, so it just becomes a strategy that actually has a short-term gain but not long still sustainability even if it could have.”

Horn also explained using data and technology to serve “wholism.” In other words, we should use new technologies to collect data that foster collaboration and innovation. New technologies should not be meant to create competition that profits only a minority of people. “It’s not the community versus the agency or the Agency. It’s not the company industry against the community. It’s like, how can this [new technology] serve wholism? How can this technology serve us coming together in a whol[ly] innovative way?” Horn said.

Regarding the current challenges and barriers, App discussed how the lack of consistent standards in water data collection makes data integration difficult. He suggested creating new visions around watershed data. In those visions, data collection would not merely be the responsibility of isolated entities but would be up to the whole community. Skilled community members such as retired NASA engineers, motivated high school students, and computer professionals in pattern recognition can all contribute to monitoring water quality. He believes that those new visions will be the driving force to create open standards in open watersheds.

Open Watersheds and Beyond

The panelists also discussed the benefits of open watersheds beyond open-source data collection and environmental monitoring. Long-Jones emphasized that the community efforts in open watersheds can greatly benefit areas experiencing disinvestment due to historical redlining. “Now you’re seeing some of that even still continuing when we’re talking about cities [that] are experiencing dirty, unclear drinking water,” she said. Many residents in those communities struggle to meet their basic needs, and their survival priorities come before monitoring the contaminants coming out of their faucets.

In this regard, Long-Jones encourages us to envision communities beyond geographic boundaries and be open-minded and humble when engaging with community members bearing diverse backgrounds. Only by truly listening to each other and understanding where each of us comes from can we realize that open watersheds and improving water quality require everyone’s involvement.

Get involved

You can get more involved with open watersheds by participating in the Cleveland Water Alliance’s Smart Citizen Scientist Initiative, a movement that encourages youth, elder, and underrepresented citizen scientists to collect open-source data on Lake Erie with simple technologies. You can also join upcoming Water Data Forum sessions.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas, which include Water Quality. The MBDH has a variety of ways to get involved with our community and activities.

Midwest researchers address food insecurity and transportation access during the pandemic

By Raleigh Butler and Qining Wang

This story is part of a series on coronavirus research in the Midwest region. To explore other NSF-funded research addressing the COVID-19 pandemic, please visit the COVID Information Commons, a project of the four NSF Big Data Innovation Hubs.

The University of Michigan received a RAPID award from the National Science Foundation in the early stages of the pandemic to explore improving food-insecurity conditions driven by the pandemic. The project, titled “Improving Transportation Equity to Enhance Food Security for Families Vulnerable to COVID-19,” is led by Robert Hampshire, in collaboration with H. V. Jagadish, Tayo Fabusuyi, and Aditi Misra.

The project builds on earlier NSF-funded research that developed the Transportation Equity Open Knowledge Network (OKN). The researchers integrated data from the Food Security Index and other sources into the Transportation Equity OKN. The researchers proposed to “investigate, and begin to develop mechanisms to address, the lack of access to food (i.e., food insecurity) associated with COVID-19 and the role of transportation challenges leading to food insecurity.” The research builds on prior work to support the development and evaluation of a meal-delivery program, as well as the identification of people and places most at risk of food insecurity due to a lack of access to transportation.


As a part of the project, the research team provided background context and technical assistance to the City of Detroit’s pilot program that delivers meals to vulnerable families.

The project aims to address the food insecurity that results from underlying inequalities exacerbated by the COVID-19 pandemic. During the pandemic, many low-income, marginalized, and vulnerable households struggled with access to food because of reduced public-transit services and an inability to access internet services. Consequently, these households could not place food orders or call for food delivery. Fearful of contracting COVID, many also avoided in-person grocery shopping. Considering the underlying broader social inequality, the food-insecurity situation isn’t just about food. In times of COVID-19, it translates into broader issues of health insecurity.

To address this issue, Hampshire’s team took a data-driven approach to estimate the number and key demographics of households facing food insecurity. They also worked with the City’s pilot program, the Covid Food Delivery Program (CFDP), to provide meal-delivery services for food-insecure households in the City of Detroit that rely on public transit or that were identified through health referrals. By making their results publicly available, the team hopes its findings can help policymakers create more effective mitigation measures.


Choosing the City of Detroit as their case study, the team used data from multiple sources to identify the key demographic characteristics of households receiving Supplemental Nutrition Assistance Program/Electronic Benefits Transfer (SNAP/EBT) benefits. Based on the US Census Public Use Microdata Sample (PUMS), the team estimated that 71,600 households across Michigan met criteria for both food and transportation insecurity, of which 20,800 are in the City of Detroit. Finer segmentation based on geography and household composition was also carried out. By narrowing the sample to a finer geographic region, the team can easily replicate their approach for other local regions with more accurate Census tracts and more consistent information on food services.
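To make the idea of a survey-based estimate concrete, here is a minimal Python sketch of how weighted microdata records could be combined into a household count. It is an illustration only, not the research team’s method: the `Household` fields, the choice of SNAP receipt and zero vehicles as proxies for the two criteria, and the sample numbers are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # All fields are hypothetical stand-ins for microdata variables.
    weight: int          # survey weight: how many real households this record represents
    receives_snap: bool  # assumed proxy for food insecurity
    vehicles: int        # assumed proxy for transportation insecurity (0 = no vehicle)
    in_detroit: bool     # whether the record falls inside the city


def estimate_at_risk(records, detroit_only=False):
    """Sum the weights of records that meet both assumed criteria."""
    total = 0
    for r in records:
        if detroit_only and not r.in_detroit:
            continue
        if r.receives_snap and r.vehicles == 0:
            total += r.weight
    return total


# Made-up sample records, for illustration only.
sample = [
    Household(weight=120, receives_snap=True, vehicles=0, in_detroit=True),
    Household(weight=80, receives_snap=True, vehicles=1, in_detroit=True),
    Household(weight=200, receives_snap=False, vehicles=0, in_detroit=False),
    Household(weight=50, receives_snap=True, vehicles=0, in_detroit=False),
]

statewide = estimate_at_risk(sample)
detroit = estimate_at_risk(sample, detroit_only=True)
print(statewide, detroit)
```

Summing weights rather than counting rows is what lets a small sample stand in for a full population, and restricting the same calculation to a subset of records mirrors the finer geographic segmentation described above.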

The team randomly selected 350 patrons from the CFDP data dashboard to investigate the benefits of food delivery during the pandemic. They found that even though the program’s service only accounted for roughly 70% of the households’ weekly food consumption, 86% of the recipients of the program’s service reported having sufficient food each week.

However, many also reported that the food deliveries lacked refrigerated items such as dairy and meat. Alarmingly, the team also found that more than a third of the patrons were first-time beneficiaries of CFDP, suggesting the pandemic is creating new cases of food insecurity in Detroit.

Through the analysis, the research team was able to identify the key benefits and issues of CFDP, which enabled CFDP to secure additional resources to redesign and expand their program. This program has now received an additional $1.5 million that can sustain the program until 2024. The work of the research team serves as an example of data for social good, in which a data-driven approach provides insightful guidance on how to mitigate issues around food insecurity.

Tayo Fabusuyi, the lead author of the project’s report, stated that “by documenting the program’s process issues and demonstrating how food insecurity severity could be estimated for different geographic areas, the program could easily be replicated at city or neighborhood level across the US. The project allows for learnings and adaptations not only by the City of Detroit, but also other cities that may be grappling with similar challenges.” The report closes by saying, “We believe that other cities will benefit from our documentation, learn from our experience and be able to modify similar program designs to address local peculiarities.”

Learn more

If you’re interested in learning more about how data directly connects to societal issues and human lives, consider attending MBDH’s “Smart & Resilient Communities / Data for Social Good” panel discussion, which will be held Thursday, October 28, 2021 – 2:00–3:00 p.m. CT / 3:00–4:00 p.m. ET. This panel includes one of the project’s Co-Principal Investigators, Tayo Fabusuyi.

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other projects we should include here, or to participate in any of our community-led Priority Areas.

Big Data Neuroscience Workshop Brings Together a Transdisciplinary Research Community

By Erica Joo

Researchers working at the interface of computational neuroscience, big data science, and health analytics held the latest in a series of workshops designed to virtually bring together their community to explore new research and opportunities. The 2021 Advanced Computational Neuroscience Network (ACNN) meeting was held September 2–3, 2021. This is the sixth year the workshop series has been held since the initial seed funding from the Midwest Big Data Innovation Hub in 2016. Although this was the second year in which the COVID-19 pandemic moved the meeting online, participation remained strong, with over 180 participants from over 40 institutions across the Midwest, the USA, and several other countries.

“The success of this workshop series in bringing together researchers across the Midwest has gone beyond our initial expectations,” said co-organizer and ACNN co-founder Franco Pestilli. “Every year for the last six years we have had students, postdocs, and faculty join the events. There is a thirst for connection across the Midwest.”

“This is how I have come to think about the Midwest region: It is similar to Boston or New York City but with a geographical barrier,” Pestilli said. “Large hubs such as those in the East Coast have an incredible amount of talent compressed within a small urban area. That allows researchers to share scientific ideas, results, and resources just by walking into a building at the other side of town. The Midwest has similar talent but spread across an incredibly large geographic region. What our workshop series aimed at doing is to break the barriers to scientific research and education created by the geography of the Midwest region. We did so first by using support by the NSF to bring students and scientists together from across the Midwest.”

“We learned a lot by going virtual,” Pestilli continued. “In 2020, we had over 450 participants, and double that of the years before. This year’s event was hybrid and we learned that it is possible to successfully bring together talent across the Midwest using hybrid events. We think that if more of these events are organized, the data science and neuroscience talents across large U.S. regions can come together more often, and effectively, just like it can more naturally happen in the East Coast hubs. We can break the geographical barriers to science and education in the Midwest. We also think that the southern States possibly have a similar challenge, with talent dispersed across a large geographic area. I am looking forward to expanding our Neuroscience network to the South.”

The 2021 meeting included multiple research presentation sessions, lightning talks, and keynote talks. Dr. Kamil Ugurbil from the University of Minnesota delivered the “Nalbandov Public Lecture” on Harnessing Imaging towards meeting a Central Scientific Challenge of the 21st Century: Understanding Human Brain Function. And Charles Springer from the Oregon Health and Science University presented a keynote lecture on Celebrating the 50th anniversary of first human MRI for non-invasive 3D imaging of water molecules, or protons, bones and soft tissues.

Reports of some exciting new research included recent work of Monica Rosenberg from the University of Chicago on building generalizable models of human behavior using Big Data neuroimaging data, and Archana Venkataraman from Johns Hopkins University, who demonstrated novel strategies for understanding structural and functional brain connectivity and its applications to multidimensional clinical phenotyping.

“The ACNN workshop was fantastic,” said Rosenberg. “It was a great way to hear about cutting-edge theoretical and methodological work in the field and connect with the computational and network neuroscience communities here in the Midwest. I’d love to participate in the future.”

A number of talks and lightning presentations introduced powerful multimodal techniques for data-driven inference in structural, functional, and diffusion imaging (Shella Keilholz), contrasting population-based and individual differences in functional brain networks (Caterina Gratton), and deriving and utilizing proxy measures of brain connectivity (Joaquin Goni). One lightning talk, given by Dr. Bradly Alicea from the University of Illinois at Urbana-Champaign, covered network science and its application to neuroscience and biology. Another presenter, Paul Camacho, a neuroscience doctoral student at the University of Illinois at Urbana-Champaign, shared his experience from the workshop.

“The workshop was a fantastic event with a rare balance of world-class keynote speakers and a well-curated set of lightning talks from our Midwest community,” said Camacho. “The level of discussion in each session was greater than I had come to expect from virtual conferences over the past couple of years. Although I did not personally know all of my fellow presenters, there was a sense of camaraderie that is emblematic of the Midwest and very appreciated in the scientific community. As a mark of how successful the workshop was in fostering collaboration, I have noticed an uptick in traffic to the GitHub repositories for the work I presented in my lightning talk.”

Dr. Bradly Alicea from the University of Illinois at Urbana-Champaign gave a lightning talk on his research applying network science to neuroscience and biology. “The conference went well. I’ve attended other conferences before, and there were some great keynote speakers as well as interesting discussions at this one,” Alicea said. “I’m looking forward to next year’s conference and hope to present again.”

Next year, the 2022 ACNN meeting is scheduled to be held in person in Texas. “After five tremendous years in the Midwest I relocated to the South, to the University of Texas at Austin,” said Franco Pestilli. “I am currently in the process of exporting the model for the Big Data Neuroscience workshops to the South, building a new team of collaborators across the Southern states. I am sure the Midwest team, Ivo Dinov, Rich Gonzalez, and the others, will continue the work we have initiated in the region. The Midwest Big Data Hub has been fundamental in supporting our activities and I am sure it has interests at stake to continue the ‘good’ that it has been started and to connect the human infrastructure resources the Midwest has available.”

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas.

National Science Foundation (NSF) Awards $2 Million for COVID Information Commons Extension for Pandemic Recovery (CIC-E)

New York, NY – October 5, 2021

The COVID Information Commons (CIC) project, a program led by the Northeast Big Data Innovation Hub in the Data Science Institute at Columbia University, in collaboration with the Midwest Big Data Innovation Hub, the South Big Data Innovation Hub, and the West Big Data Innovation Hub, received additional funding from the National Science Foundation (NSF) to support the COVID Information Commons Extension for Pandemic Recovery (CIC-E) proposal (NSF #2139391). This new grant will provide an additional $2 million in funding to the COVID Information Commons project through September 30, 2025.

The COVID Information Commons (CIC) was established in May 2020 via an NSF COVID Rapid Response Research (RAPID) award (NSF #2028999) to facilitate information sharing and collaboration across NSF-funded COVID research efforts. The initial focus was on compiling publicly available information from COVID-related RAPID projects funded by the various NSF Directorates in order to create an easily searchable corpus. In addition to the publicly available information, the CIC also collected self-reported information from the project leaders via a voluntary survey. A CIC research webinar series was created, featuring talks by researchers from the NSF-funded COVID RAPID research projects. The CIC Extension will extend this initial CIC effort to include all projects funded by NSF related to COVID-19 including the pandemic recovery phase. In addition, it will seek to include publicly available information on COVID-related efforts beyond those funded by the NSF.

The initial CIC effort clearly demonstrated the benefits of bringing together information about a diverse set of COVID-related projects into a single place, thereby enabling interested users to efficiently search for information and discover linkages among diverse efforts. This helped foster the creation of a CIC community of researchers and students, and helped catalyze local and global collaborations. The CIC Extension will carry forward this idea to include projects in the pandemic recovery phase, and will additionally incorporate contemporary ways of interacting with the information such as via search and discovery of linked information using semantic search methods, and the use of domain ontologies and knowledge graph mechanisms.

Broad impact is central to the idea of the COVID Information Commons, which pulls together publicly available information along with voluntary self-reported information on NSF-funded COVID-related research projects in order to enable search and discovery of information and collaborations among individual efforts. The CIC has demonstrated early successes in creating such collaborations among researchers from diverse scientific disciplines and from different parts of the country, and around the world, drawn together by their common interest in studying the COVID pandemic. By extending the CIC effort to the pandemic recovery phase, the CIC Extension will reach an even larger and more diverse community of COVID researchers and facilitate networking among researchers engaged in COVID-related research. The CIC Extension will also build upon and expand the successful CIC research webinar series and undergraduate engagement programs initiated in the initial phase of this effort. COVID researchers funded by NSF and NIH, including those newly funded through the American Rescue Plan of 2021 (ARP), will be invited to join the open CIC community and participate in collaborative webinars and events to increase researcher collaboration and accelerate COVID-19 recovery. Visit the CIC website to learn more and join the CIC community.

The Northeast Big Data Innovation Hub

The mission of the Northeast Big Data Innovation Hub is to build and strengthen partnerships across industry, academia, nonprofits, and government to address societal and scientific challenges, spur economic development, and accelerate innovation in the national big data ecosystem.

The Northeast Hub is a community convener, collaboration hub, and catalyst for data science innovation in the Northeast Region. The Hub amplifies successes of the community and shares credit across the community to encourage collaboration and mutual success in data science endeavors.

The goals of the Northeast Hub are to: build collaborations to address real-world challenges through translational data science approaches; foster innovation and scale endeavors that reflect regional interests and align with national priorities related to data science; support and promote representative community engagement/impact across all Hub activities; and increase data science capacity and talent, emphasizing underserved communities. Visit the Northeast Hub’s website to learn more.

The COVID Information Commons

The COVID Information Commons (CIC) is an open website to facilitate knowledge sharing and collaboration across various COVID research efforts, initiated by the NSF Convergence Accelerator. The initial focus of the CIC website was on NSF-funded COVID Rapid Response Research (RAPID) projects. The CIC serves as a resource for researchers, students and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other’s research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

The CIC community is a dynamic, collaborative community of over 1,500 researchers, practitioners and students working on COVID-19 research and insights to enable pandemic recovery and mitigation. The entire CIC community is invited to monthly CIC PI lightning talk webinars, which have attracted over 835 participants from the CIC launch webinar in July 2020 through September 2021. The monthly CIC webinars have featured 78 PI lightning talks which are individually available on demand on the CIC website on the “Meet the Researchers” page. The full recordings of all monthly webinars are also available in the CIC Video Library on the CIC website. COVID researchers find research collaborators by participating in the live webinars and by watching recordings through the CIC portal. The addition of more researchers, research, publications, datasets, and metadata will further accelerate and increase collaboration on COVID research, through the CIC-E funded by NSF.

Upcoming COVID Information Commons Events

Every month, the CIC brings together a group of researchers studying wide-ranging aspects of the current pandemic, to share their research and answer questions from our community. Attend this event to learn more about their ongoing efforts in the fight against COVID-19, including opportunities for collaboration. Register here for your unique Zoom link and calendar information.

Media Contacts

Florence Hudson
Executive Director, Northeast Big Data Innovation Hub

Lauren Close
Operations & Communications Manager, Northeast Big Data Innovation Hub

Sign up for the COVID Information Commons newsletter to receive future updates, including event notifications and program announcements.

Building a Midwest Carpentries Community

By Raleigh Butler

The Midwest Big Data Innovation Hub is committed to building data science instructional capacity in the Midwest region, particularly at smaller colleges and universities, such as predominantly undergraduate institutions (PUIs).

One avenue for this is the Midwest Carpentries Community, a partnership between the MBDH and the University of Wisconsin–Madison, under the Hub’s Community Development and Engagement (CDE) incubator program.

The project aims to build “hands-on data science instruction capacity,” by using the existing curriculum and workshop model of The Carpentries, an international member-supported organization that strives to teach data science and coding skills on a global scale. The organization is structured around three lesson programs: Software Carpentry, Data Carpentry, and Library Carpentry, which are “communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.”

In this post, we will focus on a discussion with Sarah Stevens, who leads the Midwest Carpentries Community. Stevens is a 2021 member of the Executive Council for The Carpentries. She is also a Data Science Facilitator at the University of Wisconsin–Madison, in the Data Science Hub within the Wisconsin Institute for Discovery and American Family Insurance Data Science Institute.

How did you get involved with The Carpentries?
“I did my undergrad at the University of Illinois. My degree was in molecular and cellular biology, but I did a minor in informatics. And when I came to graduate school, I found that none of my classmates had done any coding and they didn’t know computation. And almost all of them had to learn how to do some computational analysis over the course of grad school. So to help support [them], I started a community of practice around helping each other with our computational needs and learning from one another. I was trying to bring people together not just to discuss the biology in our research, but actually the computation in our research, and in doing so I also got connected with The Carpentries community. There’s been an ongoing Carpentries community since long before my time at the University of Wisconsin-Madison. And my advisor recommended ‘maybe you should sign up for instructor training so you can learn how to teach these things better.’”

What are some of the main projects you’ve worked on during your time there, specifically in the Midwest?
“I’ve been trying to bring together researchers in the Midwest who are either running Carpentries communities of their own or want to get started with Carpentries communities. We’ve been hosting a monthly call to bring those people together to help each other, similar to the community of practice I started in grad school. I’d say probably instructor training is one of the things that I find the most useful and interesting in The Carpentries. I think it’s really cool to talk to other instructors about how to teach, and how to teach using evidence-based research, and how to teach computational skills and learn from one another.”

What are some of the skills that people develop in Carpentries workshops?
“They [the learners] come to learn R, Python, the Unix shell, and Git, but what I really want them to get is a foundation where they believe that they can learn more. I feel like a lot of people come to our workshops feeling like computing and technology is not for them. Maybe they’ve even had bad experiences trying to learn coding in the past. What I really want people to learn and come away with from our workshops is that they can learn this.”

What has been different about doing Carpentries-related activities specifically during the pandemic?
“Moving online has its own challenges. Being a part of a community of instructors, who are also all dealing with this transition to online at the same time, I got to learn a lot from what other people did and how it worked for them. So, as a community, we were able to share tips and tricks and best practices for moving online and learn from one another. That’s really one of the things I love most about The Carpentries community is being able to benefit from other instructors’ experiences.”

“I will say the worst part about moving online is that while I totally respect folks not turning on their video, it’s a little less rewarding to teach to a screen. You do get feedback, like the sticky note feedback we collect in Google forms and people typing in chat, ‘this was a great workshop.’ But you don’t get to see them actually overcome that boundary of ‘I didn’t think I could do it—and I can do it now or this makes sense to me suddenly.’ And so it’s a little less rewarding to teach online, I will say, but I do feel like it’s been a good learning experience of having to pivot and practice these skills in a different way of teaching and checking in with learners.”

You proposed the Midwest Carpentries Community project for the MBDH CDE program—what did you perceive as the need for that?
“I’m seeing communities start to form in other places across the world. And I think it’s really great for creating new Carpentry communities and teaching these important skills across the globe. I was running into people from other institutions who had interacted with The Carpentries in some way. I wanted to be able to share my experience with The Carpentries like at UW–Madison; what works well with the UW–Madison Carpentries community, with other folks in the Midwest and working to learn from them as well.”

“So, what works well at Illinois, what are they doing that we can learn from? Are they creating new workshops that we too could use? That’s where I saw the need—I wanted to be able to support these new instructors and new communities that we’re developing in the Midwest, and learn from the existing communities that have been teaching Carpentries workshops for a while and doing new and interesting things.”

What would you say to someone new to The Carpentries world about why it’s valuable to participate in the community beyond attending a workshop?
In addition to offering the teaching of various skills, Stevens says “I think it’s really valuable. There’s so many things you get from it, you learn a lot about building an inclusive community as that is a big part of the Carpentry community.”

She adds, “I see a lot of networking—developing an interpersonal network and being able to find employment in the future is also a benefit of this, but you make connections with other institutions and learn from them and other organizations across the globe, really, and so it’s a great opportunity to learn from others, not just being in the workshop, but observing other people in our community and their activities they’re up to.”

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in our activities, which include a data science student community and the national BD Hubs monthly webinar on data science education and workforce development.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Iowa State University, Indiana University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

Meet the MBDH Fall 2021 science writing and coordination interns

For Fall 2021, the Midwest Big Data Innovation Hub has four new interns joining the team to work on a variety of projects. One intern, Sushma Mahadevaswamy, will be working on project and events coordination. Three others, Raleigh Butler, Erica Joo, and Qining Wang, will be science writers, helping to amplify the many community-led projects in the Hub’s 12-state region. All will learn about the range of activities and communities the MBDH is involved in, and will receive mentoring and have opportunities for career development.

The MBDH has a number of events planned for Fall 2021, including ongoing webinar series (Water Data Forum, Data Science Student Groups), a new research development series called the Collaboration Cafe, and a two-day Regional Community Meeting, open to all.

To help develop these events, and do outreach to our student community, Sushma Mahadevaswamy has joined the MBDH team as a project coordination and events intern. She’s currently pursuing her master’s degree in information management at UIUC. Previously, she was a software developer for 3 years at Cisco. Hailing from the silicon city of India, she’s well versed in cloud computing, problem solving and algorithms (she knows her Big O’s), and software development.

While working at Cisco, she handled application security, across six cross-geographical teams based in India and the USA, through collaboration and communication. She loves to organize events to motivate her team. She’s a vibrant individual, who was an MC for various global events. Her strengths lie in development as well as efficient management of projects.

Her goal is to bridge the gap between technical and business aspects of product/project management. She’s excited to put her skill set to good use at MBDH. She will be engaging with the student community to organize knowledge-sharing events that will enrich the data science community.

In her spare time, she usually paints or goes on a hike. She’s done three Himalayan treks and hopes to ascend Mt. Everest one day. She also believes in giving back to society, and she regularly volunteers to teach underprivileged children. Her favorite quote is, “Make a difference, not a living.”

With programmatic activities ranging from the MBDH’s partnerships in its Community Development and Engagement (CDE) program, to other Priority Area work, exciting new projects in the region, and the events described above, there is a lot for the science-writing interns to draw from. They will be focused on telling the stories of the projects and the people—researchers, students, partners, and collaborators—and how the work they are doing is impacting the Midwest region, the nation, and the world.

Raleigh Butler is one of the three science writers interning at MBDH for the fall semester. Her undergraduate degree was a dual major in Linguistics and French at the University of Tennessee, Knoxville. She recently got her MS degree in UIUC’s Journalism program, graduating summa cum laude. Between the two degrees, she pursued a post-bac, focusing on introductory science courses.

Raleigh views science writing as a wonderful opportunity to combine STEM and the humanities. She aspires to “translate” technical verbiage into phrasing easily understood by the average reader. She emphasizes, “during these times of great scientific developments—not to mention health-related developments—it’s critical that the wider population have an understanding of what’s going on. By providing a reliable source of information that is also more understandable, perhaps we can assist in this education process.” Indeed, people frequently want to learn without necessarily reading a full-length technical article.

She believes that access to easy-to-understand material instead of difficult-to-parse journal articles will reach the population more successfully and wants to do her best on that front. For example, recently, she has been writing about COVID-19.

Raleigh says “I’m extremely excited about this opportunity to begin pursuing my dream job and to learn more about the field.”

Qining Wang (she/her) also joins MBDH this semester as a science-writing intern. Born and raised in China, Qining moved to the USA in 2013 and received her BA degree in chemistry from Rutgers University in 2018. She is now in her fourth year of pursuing a PhD in chemistry at Northwestern University. Co-advised by Prof. Joe Hupp and Prof. Justin Notestein, she synthesizes heterogeneous catalysts supported on metal-organic frameworks and investigates their gas-phase reactivities.

Aside from conducting scientific research, Qining is also conscious of the broader impact of science. She strives to inform the public of the progress in science and technology by making cutting-edge science more accessible to a lay audience. She wants to tell the stories of scientific discoveries and scientists through a curious lens without invoking intimidating equations and jargon. Therefore, in addition to writing, she also explores different approaches to effectively communicate science, such as videos, podcasts, and social media.

Qining says, “there are so many barriers to accessing and understanding science, from the intricate language scientists use to talk about science to the academic publications behind paywalls. As a scientist, I am responsible for removing those barriers.”

Erica Joo (she/her) is the third science-writing intern at MBDH this semester. As a junior at the University of Illinois at Urbana-Champaign, Erica is pursuing her BS degree in Molecular and Cellular Biology with a minor in Journalism. Additionally, she is an undergraduate researcher in Dr. Joe Qiao’s lab, and her research project is focused on meiotic checkpoint pathways and investigating certain enzymes involved with DNA repair pathways.

While working on the frontlines as a healthcare worker during the pandemic, she noticed a disparity in information about COVID-19, especially with the perpetuation of misinformation across the media. Erica recalls, “I felt that I wanted to be a part of the change that the world desperately needed at the time.” Combining her two passions, science and writing stories, was a catalyst in the evolution of her life. Erica has a strong interest in social issues and science research, and as a biology student herself, she understands the difficulty in understanding science at face value. “Navigating from one discipline to the other, I’m ultimately trying to create a common ground in my versatility.”

She aspires to take her experiences and academic background to not only help readers make sense of the science behind various types of research but to also address questions that the general public may wonder about and make it easily accessible. With high hopes and ambitions, Erica imparts, “from my experience in both fields, my job is always to write effectively so that audiences without extensive knowledge on a particular field can also learn and develop their own thoughts.”

MBDH Executive Director John MacMullen said, “We’re excited to have such a talented group of interns who bring a diverse set of skills and experiences to the Hub this semester. We look forward to seeing the work they produce and having the community engage with them on the wide range of data science activities happening across the region.”

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in our activities, which include a data science student community.


Researcher Profile: Aditya Kulkarni

By Raleigh Butler

On May 19, 2021, five researchers joined the COVID Information Commons (CIC) “Lightning Talks” webinar hosted by the National Science Foundation-funded Big Data Innovation Hubs. Each speaker was involved in COVID-19 research and gave a brief presentation on their project.

One of the presenters, Minnesota high school student Aditya Kulkarni, was almost indistinguishable from the other researchers in terms of his preparation and professional presentation.

Kulkarni is about to enter his senior year of high school. He has been taking college classes since seventh grade. He started off just taking dual-enrollment math courses and now takes all of his classes at the University of Minnesota.

Though he has always been fascinated with programming and data science, the COVID-19 pandemic spurred Kulkarni on to explore data related to that specific issue. He submitted a paper entitled “Human Mobility Patterns Linked to COVID-19 Prone Locations” to the COVID Information Commons (CIC) Student Paper Challenge. His paper won third place, and he was invited to present the research alongside his more senior colleagues on the CIC webinar.

Needless to say, all this is an impressive feat, so I sat down and spoke with him a bit about his interests, school life, and hopes for the future.

How has taking college courses so early in your school career affected you? Do you think you’re more driven or serious than normal?
“Yeah, I think it’s actually been pretty helpful, because . . . I do feel like there’s some differences between taking high school classes and college classes. I mean, high school classes are like, fine—you have your different social groups, but with college, you’re also able to get exposed [to] the cutting-edge research that’s happening, [in] these fields that you’re learning about.”

Do you do dual-enrollment classes where the professors come to your high school, or do you go to the university?
“In this term, the high school isn’t really involved. I’m basically just like a college student traveling to campus coming back later in the evenings. And I’m still in the class with the other college students interacting with them, doing projects.”

Yeah, I was going to ask, if you were socially involved with college students; if you’re more mature than most people your age, then that would be something to appreciate.
“Yeah, and . . . it’s not like people even treat me weird. I just blend in with everyone else, just participating in things.”

Did you take any programming classes? And if so, like, did you enjoy them?
Kulkarni stated that his school offered a small programming course. “It was called Hour of Code. So there was a website, and we would have around an hour a day for one week. And we would just spend [time] seeing how to develop code, mainly block code. But at that time, it was kind of interesting to me seeing how I was able to create things just by dragging and dropping things. And yeah, it was pretty interesting. And [I] think it was mainly animation based . . . just making things move on the screen doing simple tasks. But from there, I think I saw the power and the capabilities that were there with coding.”

Do you and your peers participate in datathons, hackathons, and other kinds of science and computing activities?
This coming term, Kulkarni said, “the high school [won’t be] really involved,” but in the past, he started a STEM-related club at his high school and was very active in terms of connecting fellow students with professionals. Students from the club also team up to participate in hackathons and datathons. Kulkarni says he finds these competitions interesting “especially if there’s a sponsor, I’ll do something related to what they’re doing.”

For the CIC Student Paper Challenge, Kulkarni focused on a data set obtained from a site that tracks device movement (no personal information tied in) across the USA. Kulkarni used the available information to create related datasets and compare similar locations in Minnesota. For instance, he found 15 public places with June-July outbreaks and 15 places with no June-July outbreaks. His results show that longer-duration visits to an establishment are associated with COVID outbreaks. He received feedback and mentoring from Midwest Big Data Hub co-PI Shashi Shekhar, a professor of computer science at the University of Minnesota. His final paper is available online in the Columbia University Academic Commons repository.

Are there opportunities for you to build on this specific project that you submitted?
Currently, Kulkarni is pursuing “another direction of economic metrics.” “Even though it’s a human mobility data set, seeing the economic aspect in terms of socioeconomic groups, how [those people] were affected during the pandemic, and then their mobility in terms of that.”

So, I get the feeling you’re wanting to officially pursue computer science and data. If you had to choose a specific subfield to go into, what would you choose?
“I think I would actually go [into] data science. I think that’s the main thing. Then AI, with data sets, just seeing what are the possibilities to explore.” He went on to emphasize how technology could be of use in terms of bettering health situations and other human issues: “There’s just so much [COVID] data going further beyond into the predictive capabilities that can just be done with this much data. Because if there’s a future pandemic, which even though happens pretty rarely, if it happens, then maybe there’s something that we can learn from this one and apply it to the future.”

So, basically, what you like about research is the ability to help and provide insight into what can make the world a better place; is that how you would say it?
“Just because I can, through this mode . . . I can help the community as . . . a broader world or even as a small, small subsection. That’s a way where I can contribute to society, I guess.”

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led activities. The MBDH also has a data science student community, with a monthly webinar. Learn more about the COVID Information Commons webinar series and community.


Integrating Regional Water Quality Data with the Upper Mississippi Information System (UMIS) Project

By KJ Naum

Photo of the Mississippi river near Fort Snelling & Minnehaha, Minnesota
Photo by Mathew Benoit on Unsplash

As the Mississippi River flows from its source in northern Minnesota to its mouth on the Louisiana coast, its waters cross the boundaries of ten states, picking up a lot along the way. This includes nutrients such as nitrogen and phosphorus, which contribute to “dead zones” where the river drains into the Gulf of Mexico. Dead zones occur when too much nutrient pollution causes algae to grow excessively. When the algae die, the decaying cells consume oxygen, depriving other life forms of the oxygen they need to survive. This condition, known as hypoxia, can lead to the devastation of entire ecosystems if left unchecked.

There’s not a lot of mystery about what causes nutrient pollution. Widespread agricultural practices in the Midwest’s Corn Belt encourage the plentiful use of nutrient-based fertilizer, so much so that much of it washes away even before the crops can use it. But trying to understand how it’s happening remains a challenge. The data on the river is as free-flowing as the water itself—and often just as slippery.

“Lots of people are doing water quality monitoring, and there are maybe hundreds or thousands of water quality parameters that can be tracked,” says Chris Jones. Jones is a research engineer at the University of Iowa who works with the Upper Mississippi Information System (UMIS), an online platform that aims to make this deluge of data more accessible and manageable. Jones also works on the Iowa Water Quality Information System (IWQIS), an ongoing effort that informs this newer project. IWQIS makes real-time water quality data from within the state of Iowa available to researchers and the general public. However, the UMIS team is thinking bigger than that. Jones notes, “Watershed boundaries are different from political boundaries. We have to think within their context if we’re going to improve water quality, and so our vision was to bring the IWQIS concept to a larger geographical area.” The Upper Mississippi Information System aims to do exactly that. A team of researchers at the University of Iowa, Iowa State University, and the University of Illinois at Urbana-Champaign is working together on building the UMIS platform and wrangling the data for public consumption. The online platform provides one-stop access to independently managed data streams, both real-time and historical.

The initial site is live, and Jones characterizes it as about halfway complete. The biggest task for the team is to acquire still more data through building partnerships with other organizations. “We’re mainly focused on nutrients like nitrogen and phosphorus right now, but some other data will likely be available,” Jones says. “We had to start somewhere. This is a good place to start because it’s what many people are most interested in.”

Despite the widespread interest, combating nutrient pollution in the Midwest is an uphill battle. Unlike in other U.S. water systems such as the Chesapeake Bay, the states of the Mississippi basin have chosen not to regulate nutrient reduction, thanks to a powerful agricultural lobby that is opposed to such mandates. Instead, the state governments each try to promote and incentivize more widespread adoption of practices that reduce nutrient flow.

Jones, however, is skeptical that meaningful change can happen without collaboration. “The states will have to work in concert in order to have any meaningful impact on solving hypoxia,” he says. “That means giving scientists access to a lot of data. Having access to sound scientific data is critical for making policy.”

Individuals and organizations that are interested in the UMIS project can sign up to be a data partner or beta user via the UMIS website, or contact the team via email. Jones and the team are hopeful that UMIS will help drive change at the scale that is needed. “Nutrient pollution is one of the wicked problems, along with climate change, but we know there are solutions out there,” he says. “Solving this is a sociological and economic issue. Hopefully, UMIS can be a tool for policymakers to do just that.”

Get involved

Contact the Midwest Big Data Innovation Hub to suggest other projects we should highlight on this blog, or to participate in any of our community-led Priority Areas.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Indiana University, Iowa State University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing data science collaborations in the 12-state Midwest region. Learn more about the NSF Big Data Innovation Hubs community.

Big data aids PPE research

By Barbara Jewett

This story is part of a series on coronavirus research in the Midwest region

Many researchers in the Midwest received awards from the National Science Foundation last year for developing novel masks and other personal protective equipment.

One of those researchers, Leonardo P. Chamorro, an associate professor in the Department of Mechanical Engineering at the University of Illinois at Urbana-Champaign, was awarded a special one-year, $200,000 RAPID grant to design a 3D-printable medical mask inspired by the nasal structures of animals. Working with Associate Professor Sunghwan Jung at Cornell University and Assistant Professor Saikat Basu at South Dakota State University, the team hopes their design addresses mask shortages and improves existing face protection by providing an open-source template for use with 3D printers.

The team captured small aerosol droplets that can carry viruses from inhaled air using a combination of copper-based filters and twisted periodic thermal gradients induced by spiral copper wires that mimic nasal pathways. The aerosol capture was achieved by modulating the dynamics of flow structures in the convoluted geometry (a vortex trap) and by thermophoresis action along the respirator’s internal walls (a thermal trap). Cyclic cold/hot temperature changes on the walls, along with ionic activity from the copper material, are used to inactivate the trapped viruses.

Dr. Chamorro took time away from his research to answer five questions about his COVID-19 research:

What’s the problem you’re trying to solve, and how is your team addressing it?
We are focused on exploring ways to mitigate the COVID-19 pandemic transmission and understand the role of turbulence [in virus spread]. In particular, we are collaborating with Sunny Jung at Cornell University and Saikat Basu at South Dakota State University in the development of a novel bio-inspired protective mask based on thermal and vortex traps. [We are also collaborating] with researchers at Purdue, Rensselaer Polytechnic Institute, the National Autonomous University of Mexico, and Tsinghua University in Beijing in the development of an autonomous robot for scanning, data mining, and disinfection. [In another project] we are also collaborating with a team at Northwestern on the description of contaminated droplet dynamics. My team uses theory, state-of-the-art flow diagnostics tools at various scales, and in-house analysis tools.

What’s changed since this project started last year?
It is a question that has many layers. The more we learn, the more we realize that several fundamental gaps need to be addressed to prepare for the next pandemic. Changes have occurred at various levels.

What data are you working with? Are there data challenges you’re dealing with? Are you using public data resources? Are you producing data that others are using?
We focus on the dynamics of droplets and aerosols and the interaction with closed domains at a range of scales. It requires performing experiments, capturing three-dimensional particle and flow dynamics, and, consequently, we produce our data. High-fidelity tracking of many particles and the flow field simultaneously in space and time is not trivial; however, my team has developed the needed technology to face those challenges.

Is your team seeking collaborators, subject matter experts, or other resources that you’d like to put a call out for?
Yes, we would very much like to collaborate at the fundamental and applied levels on various pressing problems, including, but not limited to, the role of turbulence across scales, ventilation, and boundary conditions.

Where can people learn more about your progress?
So far, we have contributed to two peer-reviewed papers. One paper [is] in Extreme Mechanics Letters on the performance of various fabrics in homemade masks, and another paper is in advanced stages of review in PNAS. My group also gave four technical talks on COVID research at the last American Physical Society [meeting] in November, and we are updating our webpage to share recent findings.

Other PPE Projects
There are numerous other PPE projects in the Midwest that received Rapid Response Research grants. Here are a few of them:

  • Safely returning to using reusable equipment, including some PPE, is the focus of an award to Andrea Hicks, an assistant professor of civil and environmental engineering at the University of Wisconsin–Madison. You can read more about her work here.
  • Producing masks that capture and neutralize viral pathogens by adapting a decade of work developing a proprietary composite nanofiber material for water filtration is the focus of collaborators David Cwiertny, a professor of civil and environmental engineering and director of the Center for Health Effects of Environmental Contamination at the University of Iowa, and Nosang Myung, the Keating Crawford Endowed Professor in Chemical and Biomolecular Engineering at Notre Dame. Cwiertny received an award for this research project and Myung also received an award. You can read more about their work here and also here.
  • Developing smart face masks embedded with battery-free sensors to assess proper fit and monitor health is the focus of the award received by Northwestern’s Josiah Hester, an assistant professor of electrical and computer engineering. You can read about his work here.
  • Developing a new self-sanitizing medical face mask that deactivates viruses on contact earned an award for Northwestern materials science professor Jiaxing Huang. You can read about his work here.
  • Exploring coating the surface of PPE with copper and zinc oxide nanoparticles to limit the spread of viral particles is the subject of an award for Robert DeLong, an associate professor in the Nanotechnology Innovation Center at Kansas State.

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other projects we should include here, or to participate in any of our community-led Priority Areas.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, the University of Michigan, the University of Minnesota, Iowa State University, Indiana University, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the NSF Big Data Hubs community.

MBDH Learning Innovation Fellows program – first cohort projects

The Midwest Big Data Innovation Hub Learning Innovation Fellows Program, housed at the University of Michigan School for Environment and Sustainability, enables teams to form and work toward a better understanding of the intersections of the Hub’s “Cyberinfrastructure and Data Sharing” and “Data Science Education and Workforce Development” themes.

Our fellows work with faculty and teaching staff to create innovative interactive data analysis activities that can nest within sustainability science case studies. They design, prototype, and pilot these features in classrooms within the MBDH network. The program leverages talent and resources from two existing, open-source science learning environments. Gala is a community-based, responsively designed sustainability science learning environment. Quantitative Undergraduate Biology Education and Synthesis (QUBESHub, or Qu) is a virtual center for faculty development and open educational resource sharing that has had long-term support from NSF, formalizing and professionalizing open educational resources.

Through a series of virtual “Networkshops,” we connect undergraduate data science majors, graduate/professional students, faculty, and professionals. We can thus be inclusive, incorporating into classrooms problem-driven, data-rich material that speaks to lived infrastructural and environmental challenges from a range of communities across our region, and beyond. The team includes the following:


Rebecca Hardin (PI) is an anthropologist and Associate Professor at the University of Michigan School for Environment and Sustainability (UMSEAS), where she leads collaborations on the open-source, open-access learning platform Gala and a research group on Digital Justice. Rebecca also coordinates the Environmental Justice Field of Specialization and related Certificate program at UMSEAS.

Ann E. Russell (Co-PI) is an ecosystems ecologist, with special expertise in the biogeochemistry of tropical ecosystems. She is an Associate Adjunct Professor in the Department of Natural Resource Ecology and Management at Iowa State University, and PI of the NSF Research Collaborative network ALIVE: Authentic Learning in Virtual Environments.

M. Drew Lamar (Co-PI) is a mathematician and Associate Professor of Biology at William & Mary. His teaching and research are highly interdisciplinary in nature, using techniques and concepts from mathematics, statistics, biology, and computational sciences. Drew is Co-PI and Director of Cyberinfrastructure for the Quantitative Undergraduate Biology Education and Synthesis (QUBES) virtual center, with an interest and passion in open-source software development, quantitative biology education, and development of education gateways.

Ed Waisanen (Program Manager) is Program and Platform Lead for Gala. He has a master’s degree in Natural Resources and Environment from the University of Michigan, with a focus in Environmental Informatics and a background in multimedia production. Ed is focused on developing tools and communities that emphasize curation, open exchange, and narrative approaches to deepen learning.


Data Learning for Restoration Ecology

Kyra Hull (Fellow) is a native of Grand Rapids, Michigan, and a first-year graduate student at Grand Valley State University, studying Biostatistics. Kyra is working on the following case about forest restoration, which is bilingual (Spanish and English versions):

Karen Holl (Faculty Advisor) is a Professor of Environmental Studies at the University of California, Santa Cruz. Her research focuses on understanding how local and landscape-scale processes affect ecosystem recovery from human disturbance and using this information to restore damaged ecosystems. She advises numerous public and private agencies on land management and restoration; recently, she has been working to improve the outcomes of the many large-scale tree-growing campaigns.

Data Learning to Address Groundwater Contamination

Saba Ibraheem (Fellow) is a second-year Health Informatics student at the University of Michigan, focusing on data analytics and research in health care. Saba is working on the following case, which is bilingual (English and French versions):

Rita Loch-Caruso (Faculty Advisor) is a toxicologist in the Department of Environmental Health Sciences at the University of Michigan School of Public Health, with a research focus in female reproductive toxicology and, in particular, mechanisms of toxicity related to adverse pregnancy outcomes such as premature birth.

Alan Burton (Faculty Advisor) is a Professor at the School for Environment and Sustainability and the Department of Earth and Environmental Sciences at the University of Michigan. His research focuses on sediment and stormwater contaminants and understanding contaminant bioavailability processes, effects, and ecological risk at multiple trophic levels. He is also a specialist in ranking stressor importance in human-dominated watersheds and coastal areas.

Data Learning in Livestock Ecologies

Daniel Iddrisu (Fellow) is a second-year student in the Masters in International and Regional Studies program, with a specialization in Africa, at the University of Michigan. He earned a BA degree in Integrated Community Development from the University for Development Studies, Tamale, Ghana. His research focuses on health, development, gender, and environmental health. The case he is working on takes place on the Greek island of Naxos, but covers skills for modeling and analyzing human/livestock interactions more broadly:

Johannes Foufopoulos (Faculty Advisor) is an Associate Professor at University of Michigan’s School for Environment and Sustainability, who focuses his lab research on fundamental conservation biology questions and on issues related to the ecology and evolution of infectious diseases. Major research projects examine how habitat fragmentation, invasive organisms, and global climate change result in species extinction.

Data Learning on Safari

Rahul Agrawal Bejarano (Fellow) has a background in computer science and is currently working on a master’s degree at the University of Michigan School for Environment and Sustainability, with a concentration in Sustainable Systems. Rahul uses data from a diverse range of sources to shed light on today’s environmental challenges and develop innovative solutions, and is working on identifying climate-related vulnerabilities in our supply chains. He is working on this case, about the interactions of various wildlife species in the Serengeti:

Charles Willis (Faculty Advisor) is a Teaching Assistant Professor, Biology Teaching and Learning at the University of Minnesota. He is currently interested in the research and development of pedagogy practices for non-major biology students. In particular, he is focused on studying student-student and instructor-student feedback in online spaces. His research is also concerned with understanding how changing environments shape plant diversity on both evolutionary and ecological time scales. Currently, he is focused on using historical specimen data to study how historic climate change (over the past century) has impacted plant phenology and diversity across North America.

Jeffrey A. Klemens (Faculty Advisor) is an Assistant Professor of Biology at Thomas Jefferson University, where he serves as program director for the undergraduate biology curriculum. His current research activities are focused on the use of agent-based models to describe habitat use by organisms in the urban environment and the role of active learning in science education, particularly the use of systems thinking and other modeling techniques to improve student understanding of complex phenomena.

Data Learning in Detroit’s Eastern Market

Ghalia Ezzedine (Fellow) is a second-year master’s student studying Health Informatics. She is interested in leveraging data and digital tools to improve population health. In her free time, she likes to try new recipes, work out, and occasionally jump off a bridge or airplane. She chose this case study because of her interest in nutrition, and the shift in foods available at this iconic marketplace:

Josh Newell (Faculty Advisor) is an Associate Professor in the School for Environment and Sustainability at the University of Michigan. He is a broadly trained human-environment geographer, whose research focuses on questions related to urban sustainability, resource consumption, and environmental and social justice. His research approach is often multiscalar and integrative and, in addition to theory and method found in geography and urban planning, he draws upon principles and tools of industrial ecology and spatial analysis.

Profile: Crystal Lu

Nitrogen reduction in the Upper Mississippi River Basin

By Katie Naum

As extreme climate events become more frequent, some of their impact is visible—like the derecho that tore through Iowa in August 2020, leaving a wake of destruction in its path. Other impacts—including nutrient pollution in water systems—are less understood. In what ways will climate change affect the world around us? How can we use data science to better understand and adapt to the impact of climate extremes? 

Chaoqun (Crystal) Lu

Chaoqun (Crystal) Lu is a quantitative ecosystem ecologist and assistant professor at Iowa State University, and a collaborator of the Midwest Big Data Innovation Hub. Her work focuses on water quality modeling, including the impact of extreme climate events and human activities on nutrient pollution. Her recent NSF CAREER award is titled “Understanding the dynamics and predictability of land-to-aquatic nitrogen loading under climate extremes by combining deep learning with process-based modeling.” The project will bridge the gaps between science and practice, sharing the most current knowledge of Earth system modeling with the public and making the complex concept of watershed management more concrete for the next generation of scientists, land managers, policy makers, and voters.

I spoke with Lu recently via Zoom to learn more about her work with water quality data. The following conversation has been edited and condensed for clarity.

Why is it important to study water quality here and now?

In the United States, nearly 60% of coastal rivers and bays have been degraded by nutrient pollution. Here in the Midwest, people have invested a lot of money and effort over the years to reduce nitrogen pollution. At the same time, climate-driven variations may far outweigh the effects of these nitrogen reduction practices. Increasing summer humidity, more frequent heavy rainfalls, and extreme floods have become a new normal in the central United States over the past few decades. There are a lot of unknowns about how extreme climate events have affected nitrogen leaching from soil and nitrogen loading through tiles, streams and rivers. Lots of data exist, though! 

Policymakers need science-based management suggestions. As a researcher, I would like to benchmark my model with long-term measurements of water quality, and scale up from site-specific measurements to a broader region such as the Upper Mississippi River Basin. If we can figure out how to reduce nitrogen pollution here in the Midwest, the solution we come up with will be very likely to be effective elsewhere. 

Can you tell readers more about the focus of your work, including your recent NSF CAREER award? (Congrats!)

I’m engaged in water quality modeling projects—studying, for example, the impact of nitrogen reduction practices on water quality. Our research team uses mathematical models to represent the physical processes involved in connected systems—the flow of water, the amount of nutrients used by plants or lost to runoff. We also quantify how climate change, land uses, and human management practices could affect nitrogen loading, and assess the effectiveness of nitrogen reduction practices in cleaning water.

The focus of this CAREER award is on how extreme climate events may affect nitrogen loading. My team wants to see how sensitive nitrogen leaching and loading are to events like these, which are increasing in the Midwest. We’re integrating machine learning approaches with a traditional process-based hydroecological model, using a large volume of water quality monitoring data from watersheds of various sizes in the upper Mississippi–Ohio river basin. I want the key processes represented by traditional process-based models to be kept for water quality prediction, and at the same time improve the models’ outputs with “big data” and machine learning. Our integrated model uses data on water quality, weather, land cover, and human management practices to better understand whether and where there are nitrogen pollution hotspots in the region.

What are some of the challenges in working with water data? What are the insights you hope to gain from your research?

One important challenge is just the enormous amount of variation in the data. If you look at a time series for hydrological flow, you see huge variation in the relationship between flow and nitrogen concentration. The challenge we have is to quantify how varied and why. Why do some small watersheds have larger variations than others? Why are some regions more sensitive to climate than others? Is this pattern we’re seeing caused by a specific event, or the legacy of many such events over time? We want to get the whole picture on nitrogen dynamics, from vegetation to soil to water to rivers, from small to large watersheds, at daily time steps, using modeling to recreate such processes.

In our work under this award, we’re planning to include more small watersheds and high frequency data sets. I’m looking forward to new insights from such data analysis. There is so much data over the past few decades to work with, and the technology of water quality monitoring has really improved.

How does deep learning contribute to watershed management?

Deep learning has been transformative for hydrological science and earth system science, yet few studies have used it to digest the big data of water quality monitoring. Meanwhile, high-frequency water quality monitoring data are increasingly available, especially in smaller watersheds and at shorter time scales. This brings new opportunities to test the relationship between flow and nitrogen concentration in response to climate extreme events. All of this motivates me.

Do you consider yourself a data scientist as well as an ecologist? 

I consider myself an ecosystem ecologist, with data science skills. The questions I want to answer are mostly ecological questions. Sustainability science, biogeochemical cycles, climate variability, natural and human drivers: these are all ecology questions. I say this even though I received training in ecosystem modeling and geospatial analysis for many years, but I consider these tools, the same way I consider machine learning a tool. I always keep my eyes open for tools that can help answer the ecological questions I care about. I tell my students this too: even if their degree or job title says ‘ecosystem modeler,’ I always hope they will step back and see the big picture.

How might interested stakeholders learn more or get involved?

We’ll be developing a project webpage where we will release research findings, future publications, and other relevant materials. Our results will be presented and disseminated to interested stakeholders through our collaborating institutions—not only to academic investigators, but also to the general public, because they are the people who actually make decisions on managing the land and improving the environment. 

This is a very multidisciplinary project, and others may have different ways of thinking about and analyzing the problem that we haven’t considered. We would love to hear from other researchers interested in analyzing the problem from another angle. We are also working actively to seek collaborators and more grants to leverage this project, putting available data sources online to allow easy access.

What do you love most about your research?

Being a modeler is a very precious role. Through multi-scale modeling, we try to connect a lot of different people—field scientists, computational experts, engineers, economists, stakeholders, and policy makers—who can work together to understand and build a more sustainable world for us to live in. This provides a lot of opportunity to collaborate with people in different fields. As a quantitative ecosystem ecologist and ecosystem modeler, I can serve as a bridge between field scientists, extrapolating their findings, and decision makers, who want to see and understand ecological outcomes. The work is really useful and applicable in real life. I enjoy the endless possibilities and the feeling that my research is useful and applicable for our world.

Katie Naum writes on science & technology, climate change, and culture. Follow her @naumstrosity and read more at

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, the University of Michigan, the University of Minnesota, Iowa State University, Indiana University, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the NSF Big Data Hubs community.

Midwest water researchers explore COVID-19 in wastewater

This story is part of a series on coronavirus research in the Midwest region

Researchers in the Midwest are looking in a surprising place for clues about the COVID-19 pandemic: wastewater.

Because so many people who are infected with COVID-19 are asymptomatic, scientists are interested in measuring the prevalence of the SARS-CoV-2 coronavirus in wastewater as a way to understand the population-level spread of the virus in communities. In-person testing can be problematic for a variety of reasons, so researchers are interested in alternatives.

Minnesota Public Radio interviewed one research group that is exploring new ways to track coronavirus spread without directly testing people. “We’ve decided that one of the easiest ways to do that would be to noninvasively kind of scan the population for the presence of the virus,” University of Minnesota professor Glenn Simmons Jr. said. “And one easy way of doing that would be to look at the wastewater.”

Simmons, along with his collaborator Richard Melvin at UMN Duluth, is testing samples collected from wastewater treatment facilities for the presence of genetic material from the SARS-CoV-2 virus. Other researchers in the Midwest are working on similar sample collection and data analysis, and are developing new tools and resources.

One resource under development is a publicly accessible, web-based Wastewater Pathogen Tracking Dashboard (WPTD). Dr. Rachel Spurbeck, research scientist at the non-profit Battelle Memorial Institute in Columbus, Ohio, leads the creation of this project.

“The WPTD program is tracking SARS-CoV-2 and other viral pathogens found in the wastewater of four different locations in Toledo, Ohio over time and comparing the sequencing results to the public health and demographic data for these sites,” Spurbeck said. “This comparison will be used to generate risk models for COVID-19 spread in the community as well as other viruses present. We will also be identifying mutations in SARS-CoV-2 which will not only tell us that the virus is in the communities being studied, but also if there are any differences in the virus that could enable identification of how the virus is affecting the population and where the virus came from geographically.”

The data collected will be entered into the Wastewater Pathogen Tracking Dashboard for use by local public health officials to aid in identifying where contact tracing will be most useful. The project is funded by the National Science Foundation (NSF).

Since March 2020, the NSF has made hundreds of new awards focused on COVID-19 research to help address the pandemic. The NSF and the four regional Big Data Innovation Hubs collaborated on the creation of the COVID Information Commons resource to bring together information on these projects. Researchers can use the site to help find tools and resources, and to develop collaborations with other researchers.

Other wastewater tracking projects in the Midwest include two led by Kyle Bibby, Associate Professor of Engineering at the University of Notre Dame in Indiana. Bibby is leading an effort to develop methods to monitor for the presence of SARS-CoV-2 in wastewater and to connect these measurements to epidemiology models. Bibby also leads a project to create a national Research Coordination Network (RCN) focused on wastewater surveillance, in collaboration with partners from Howard University, Stanford University, Arizona State University, and the Water Research Foundation.

At the national level, the U.S. Centers for Disease Control and Prevention (CDC) has announced the development of a National Wastewater Surveillance System (NWSS) that collects data from local, state, tribal, and territorial health departments to supplement the efforts above.

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other projects we should include here, or to participate in any of our community-led Priority Areas.


Introducing the COVID Information Commons

The Midwest Big Data Innovation Hub collaborated with the other three regional Big Data Innovation Hubs and the National Science Foundation (NSF) to launch the COVID Information Commons (CIC).

Funded by NSF COVID Rapid Response Research Award #2028999, the CIC is an open website to facilitate knowledge sharing and collaboration across various coronavirus research efforts, especially focusing on NSF-funded COVID Rapid Response Research (RAPID) projects.

The CIC serves as a resource for researchers, students, and decision-makers from academia, government, not-for-profits, and industry to identify collaboration opportunities and accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

WATCH: The recording of our launch and demo webinar is available at as well as on YouTube.

LEARN MORE: Slides from the webinar are available at, below the July 15 launch + demo video. While you’re there, you can explore the live site!

JOIN THE COMMUNITY: The CIC Slack community is a space for discussion and collaboration among PIs and other stakeholders engaged in COVID research.

We will be announcing further CIC events to showcase lightning talks from 40+ PI volunteers over the next few months. If you are interested in hearing more and did not opt-in at registration for future email updates, you may sign up here.

If you have any questions, please email us at

Midwest Big Data Innovation Hub announces leadership changes

As its second year of new funding begins, there is new leadership at the Midwest Big Data Hub (MBDH), with a swap in principal investigators and the appointment of a new executive director. Catherine Blake, a co-principal investigator (PI) on the project, has moved into the PI role, while William (Bill) Gropp transitions to co-PI duties. Long-time Hub staff member John MacMullen was named executive director in January.

Catherine Blake

Blake is an associate professor in the School of Information Sciences (iSchool) at the University of Illinois at Urbana-Champaign, with an affiliate appointment in the Department of Computer Science. At the iSchool, she serves as associate director of the Center for Informatics Research in Science and Scholarship (CIRSS) and director of the graduate programs in information management and bioinformatics. Gropp is director and chief scientist of the National Center for Supercomputing Applications (NCSA) and the Thomas M. Siebel Chair in the Department of Computer Science at Illinois. Prior to joining the MBDH, MacMullen was a faculty member in the iSchool.

“I’m excited and honored to step into the role of principal investigator for the Midwest Big Data Hub,” said Blake. “The community developed during the first phase has made the MBDH well positioned to leverage the rapidly growing data and information collections and technologies in Phase 2 that focus on opportunities, interests, and resources that are unique to the Midwest.”

The MBDH, co-led by the NCSA and the iSchool, serves a twelve-state region that encompasses Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin. It is part of the National Science Foundation’s regional Big Data Innovation Hub (BD Hubs) program that comprises offices in the Midwest, West, South, and the Northeast. The Hub was initially funded in 2015; its second phase started in summer 2019 and will run until 2023. The goal of the MBDH awards, which will total over $4 million for both phases, is to catalyze data science efforts around important priority areas in the Midwest.

“This month we’re starting our second year of the new phase of the Hub with the launch of our Community Development and Engagement funding program,” said MacMullen. “We look forward to continuing to develop a vibrant and diverse data science community in the Midwest that includes the range of academic institutions in the region, and grows participation from nonprofits, government agencies, and industry partners.”

Priority areas for MBDH currently include advanced materials and manufacturing; water quality; big data in health; digital agriculture; and smart, connected, and resilient communities. In addition, MBDH leads cross-cutting initiatives to broaden the participation in data science education, develop cyberinfrastructure for research data management, and address cybersecurity issues around big data. MBDH engages with the BD Hubs Data Sharing and Cyberinfrastructure Working Group, the Open Storage Network, and other initiatives that foster access to research data under FAIR (findable, accessible, interoperable, reusable) principles. By leading initiatives in data science education and workforce development, the MBDH aims to increase data science capacity within the region, such as by growing a network of predominantly undergraduate institutions and minority-serving institutions.

“The MBDH is building on the momentum of its first phase by growing the stakeholder community in the Midwest,” said Gropp, who began as PI of the Hub in 2017. “At the same time, we’re actively participating in the evolution of the national data science ecosystem. I look forward to continuing to develop long-term sustainability for the Hub’s activities through strategic projects such as the COVID Information Commons collaboration between the Hubs and NSF, launching in July 2020.”

Follow MBDH on Twitter: @MWBigDataHub

The Midwest Big Data Innovation Hub was initially funded under NSF award #1550320. The current Phase 2 award is #1916613.

Midwest Big Data Hub successfully transitions to second phase with new NSF award

The National Science Foundation (NSF) this month announced the second phase of funding for the regional Big Data Innovation Hub (BD Hubs) program. Under the planned four-year, $4 million award, the Midwest Big Data Hub will continue to be led from the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The Hub’s priority focus areas will be co-led by five partner institutions in the region: Indiana University, Iowa State University, the University of Michigan, the University of Minnesota – Twin Cities, and the University of North Dakota.

First funded in 2015, the four regional BD Hubs were designed by NSF to follow U.S. Census Regions, with offices in the Midwest (led by Illinois), West (UC Berkeley), South (Georgia Tech and UNC Chapel Hill) and the Northeast (Columbia University). The Midwest Hub serves a 12-state region that encompasses Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin.

“Developing innovative, effective solutions to grand challenges requires linking scientists and engineers with local communities,” said Jim Kurose, Assistant Director for Computer and Information Science and Engineering at the National Science Foundation, which funded these awards. “The Big Data Hubs provide the glue to achieve those links, bringing together teams of data science researchers with cities, municipalities and anchor institutions.”

“The Midwest Big Data Hub has built a strong network of partners and a diverse community of stakeholders in the region,” said Bill Gropp, Principal Investigator for the award. “The Hub is well positioned to continue its record of fostering innovative partnerships and providing valued services to our stakeholders in its next phase. Our partner institutions are leaders in the region, and each brings unique strengths to the priority areas they lead.”

The Midwest Hub’s priority areas currently include:

  • Advanced Materials and Manufacturing – Led by the University of Illinois, this area focuses on next-generation materials research in a manufacturing context, and complements the 2016 NSF Big Data Spoke awards on integrative materials design (iMaD) to Northwestern University, the University of Chicago, the University of Illinois, University of Wisconsin – Madison, and the University of Michigan, as well as leveraging existing partnerships with the Materials Data Facility, the nanoMFG node at UIUC, and the Center for Hierarchical Materials Design (CHiMaD) at Northwestern University, all supported by NSF.
  • Water Quality – Led by a new Phase 2 partner, the University of Minnesota – Twin Cities, this area complements the existing water cyberinfrastructure focus of the MBDH through the NSF Big Data Spoke awards made in 2018 to Iowa State University, the University of Illinois, and the University of Iowa.
  • Big Data in Health – The University of Michigan will continue to lead this area, with contributions from Indiana University, building on prior work in Phase 1 as well as the Spoke awards for the Advanced Computational Neuroscience Network (ACNN).
  • Digital Agriculture – Iowa State University will lead this area, with continuing contributions from the University of North Dakota, the University of Nebraska, the University of Illinois, and other partners, including from the 2016 Spoke award for Unmanned Aircraft Systems, Plant Sciences and Education (UASPSE), to continue to build a vibrant stakeholder community engaged with transdisciplinary issues around data for agriculture, food production and plant and animal science.
  • Smart, Connected, and Resilient Communities – Led by Indiana University with contributions from Iowa State University, the University of Michigan, and the University of Illinois, this area continues to build a network and connect resources at the intersection between research and data-driven community decision-making.  

“By catalyzing partnerships that integrate academic researchers into the fabric of communities across the U.S., we can accelerate and deepen the impact of basic research on a range of societal issues, from water management to efficient transportation systems,” said Beth Plale, one of the National Science Foundation program directors managing the Big Data Hubs awards.

The Midwest Hub also leads cross-cutting initiatives for broadening participation in data science education, cyberinfrastructure for research data management, and cybersecurity issues around big data. MBDH participates in the BD Hubs Data Sharing and Cyberinfrastructure Working Group, the Open Storage Network, and other initiatives that foster access to research data under FAIR (findable, accessible, interoperable, reusable) principles. By leading initiatives in data science education and workforce development, the MBDH aims to increase data science capacity within the region, in part through a growing network of Predominantly Undergraduate Institutions and Minority Serving Institutions.

The Midwest Big Data Hub was initially funded under NSF award #1550320. The Phase 2 award is #1916613.

Explore the Hub at

Learn more about the BD Hubs ecosystem at

The MBDH project office is housed at the National Center for Supercomputing Applications (NCSA), which provides computing, data, networking, and visualization resources and expertise that help scientists and engineers across the country better understand and improve our world. NCSA is an interdisciplinary hub and is engaged in research and education collaborations with colleagues and students across the campus of the University of Illinois at Urbana-Champaign.



Midwest Big Data Hub co-leads local events for 4th Annual Global Women in Data Science Conference

The Midwest Big Data Hub co-led local participation in the 4th annual Global Women in Data Science (WiDS) Conference, with sponsorship from the National Center for Supercomputing Applications (NCSA) and the University of Illinois. The event was free and open to all. The WiDS Conference, hosted on March 4th at 150 locations around the world, seeks to unite and connect women working in data science fields.

“We were very excited to co-sponsor this with NCSA, and support this inaugural Illinois event for Stanford’s Global Women in Data Science Day,” said Melissa Cragin, Executive Director of the Midwest Big Data Hub. “Partnering with others on events such as the Illinois WiDS allows us to best use our human resources and experts network to broaden participation in data science and Big Data research and education. I was honored to participate and have the opportunity to moderate such a terrific panel of accomplished leaders, who shared their perspectives on data science, data-enabled research, and opportunities for women in this space.”

Faculty panel moderated by MBDH Executive Director Melissa Cragin

The WiDS local events, hosted this year at NCSA, featured a variety of speakers from diverse backgrounds presenting sessions on opportunities for women in data science, technical vision talks, and the variety of data science and technology careers available in the Midwest.

“I always enjoy telling my story about how I got started working in big data research,” said Ruby Mendenhall, Illinois Professor of Sociology and African American Studies and NCSA faculty affiliate. “My story also demonstrates the importance of doing outreach to groups that are not traditionally represented in data science such as African American Studies.”

As part of her 2017-2018 NCSA Faculty Fellowship, Mendenhall and NCSA research programmer Kiel Gilleade completed a pilot study called the Chicago Stress Study, which examines how exposure to nearby gun crimes affected African American mothers living in Englewood, Chicago. Mendenhall and Gilleade developed a mobile health study that used wearable biosensors to document 12 women’s lived experiences for one month last fall. As part of their research, Mendenhall, Gilleade, and their team were able to create an exhibit based on the study data they collected in order to bring the unheard, day-to-day stories of these mothers to life.

Panel discussion moderated by iSchool Professor Catherine Blake

Professor Donna Cox, Director of NCSA’s Advanced Visualization Lab, was a panelist at this year’s local conference, and praised the insights of the other speakers while emphasizing the importance of the larger WiDS conference. “It was valuable to hear other panelists,” said Cox. “The future of Women in Data Science should include raising awareness about important issues emerging in data science, especially socially-relevant issues. We need more women actively involved in the ethics of data science.”

Alice Delage, Associate Project Manager for NCSA and Program Coordinator for the MBDH, said, “Hosting WiDS Urbana-Champaign at Illinois was an opportunity to highlight the campus expertise around data science led by women.” Delage, who co-chairs the local Women@NCSA group, said, “Data science and technologies are increasingly impacting our lives and society, and it is imperative that women and minorities be part of these transformations. We wanted to showcase the groundbreaking work being done in that area by Illinois female data scientists and to inspire more women and underrepresented communities to engage in the field.”

There are also opportunities to expand the event next year by better incorporating student work in the program, Delage said, or running a datathon, for example. Some of this year’s participants have already volunteered to help with next year’s event.

A full list of this year’s speakers at the WiDS Conference at NCSA is here. For more information about the global WiDS conference and ways to get involved, please visit

The MBDH is one of four regional Big Data Innovation Hubs with support from the National Science Foundation (award # 1550320), and works to build capacity and skills in the use of data science methods and resources in the 12-state U.S. Midwest Census region. Learn more about the Hub at

Thanks to NCSA Public Affairs for contributing to an earlier draft of this post.

BD Hubs profiled in SIGNAL magazine

The NSF-funded Big Data Innovation Hubs were highlighted in a recent article in SIGNAL magazine, a publication of the AFCEA (Armed Forces Communications and Electronics Association). The Executive Directors of the Midwest and Northeast Big Data Hubs, Melissa Cragin and René Baston, were quoted extensively from interviews that covered the wide-ranging communities and activities of the Hubs. Here is an excerpt from the article:

“[W]hile we’re called the Big Data Innovation Hubs, we’re very focused on building capacity in data science, building expertise, access to data-related services and networks related to all things data science,” said Cragin.

That means making available “to all kinds of communities” access to data-related skills, services, tools and opportunities, Cragin states. By developing public/private partnerships and working with groups to leverage these resources, the hubs can help coordinate solutions to “shared grand challenges,” she notes. The hub also is endeavouring to extend data science research and education to predominantly undergraduate institutions—including minority-serving institutions—to help add data skills for the developing workforce, she states.

The regional aspect allows each hub to identify priority areas or “spokes” that they are pursuing. For the Midwest, issues relating to water quality; digital agriculture and unmanned aerial systems; and food, energy and water, among others, play a major role.

Read the full article here.

Guest post – Diverse programs from ISU address sustainable cities challenges

By Iowa State University’s Sustainable Cities team

Researchers with the Sustainable Cities team at Iowa State University recognize the difficulty that public officials face in transforming vast amounts of climate and energy research into contextualized public policy. In attempting to address this critical issue, the team’s mission goes beyond the creation of new climate analysis tools to also investigate new methods for integrating communities into the discourse of data creation and energy conservation. To accomplish this agenda, our team engages in various research avenues that range from the creation of new spatial-data tools to enabling community youth activism. Here are just a few highlights of the team’s most recent achievements:

Sustainable Cities’ team leader Ulrike Passe, associate professor of architecture, presented our hybrid physics data modeling framework at the National Science Foundation-sponsored Research Coordination Networking (RCN) workshop held at Carnegie Mellon University on May 17, 2018. The presentation, which caps one of the major branches of the Sustainable Cities initiatives, integrated our recently developed thermo-physical data simulator with our research into human energy-use behavior to show how a more holistic neighborhood energy model could be constructed. This same model was presented by graduate research assistant Himanshu Sharma at the fifth High Performance Buildings Conference on July 9, 2018, at Purdue University.

image from Krejci et al. (2016)

The Community Growers Program, a public-engagement initiative started in March 2017, has become another core pillar of the Sustainable Cities group’s research. Over the course of eight weeks, researchers worked with 22 leadership-minded youth in the Baker Chapter of the Boys and Girls Club at Hiatt Middle School in Des Moines, Iowa, to create a community garden based on a methodology of spatial, socio-technical storytelling. Through this process, the youth participants learned more about their community through access to geographic information system (GIS) and spatial mapping tools. Associate English professor Linda Shenk, our community engagement lead, and Mallory Riesberg, a collaborator with the Baker Chapter of the Boys and Girls Club, presented this methodology in a presentation titled “Fostering the Next Generation of Big Data Scientists and Sustainable City Planners” at The Growing Sustainable Communities Conference in Dubuque, Iowa, on Oct. 4, 2017. Team members Shenk, Passe, and Alenka Poplin, assistant professor of community and regional planning, later published this work in the 35th issue of the Journal of Interaction Design and Architectures, in an article titled “Engaging Youth with Pervasive Technologies for Resilient Communities.”

Poplin, an established researcher in the field of geo-spatial mapping, also leads a research group that seeks to understand how to better develop feedback loops through innovative user interfaces. An inquiry into mapping places of emotional power was highlighted in a 2017 paper entry to the second edition of Kartographische Nachrichten on Empirical Cartography Journal, titled “Mapping Expressed Emotions: Empirical Experiments on Power Places.” More recently, Poplin and her research team have begun testing an energy survey game they have developed called E-Footprints. The framework of this game includes the extraction of user-performance data to measure and analyze what learning opportunities may help guide more environmentally efficient decision making. This feedback is then fed back into learning mini-games throughout the game, such that the user gets more “energy savvy” as they play. This project begins field-testing in November 2018.

With a diverse, multifaceted research team of nearly 50 members, the Sustainable Cities group continues to advance the capabilities of communities and cities to think sustainably about a better future.


Image reference:

Krejci, C. C., Passe, U., Dorneich, M. C., & Peters, N. (2016), “A Hybrid Simulation Model for Urban Weatherization Programs”, Proceedings of the 2016 Winter Simulation Conference, Arlington, VA, December 11–14. T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds. (pdf)


Read more about the MBDH’s Smart, Connected, and Resilient Communities initiatives.

Guest post – Data Science Education at Two-Year Colleges

By Matt Fall

Executive Director, Center for Data Science, Lansing Community College

Recently, the American Statistical Association (ASA), with support from the National Science Foundation (NSF), hosted a two-day summit in Washington, D.C., to discuss outcomes and curricula for data science programs at two-year colleges. The Two-Year College Data Science Summit (TYCDSS) was intended to help spur the growth of data science programs at these institutions and included representatives from two- and four-year institutions, government, and industry.

Sallie Keller (Virginia Tech) plenary talk (photo: Nicholas Horton)

The summit included several plenary talks discussing the role of two-year colleges in addressing the need for data scientists as well as a brief presentation from a graduate of a community college data science program. The majority of the summit, however, was devoted to a series of working sessions where the participants discussed ideal outcomes and competencies for three categories of students:

  • Category 1: students intending to complete an Associate’s degree and begin working
  • Category 2: students intending to earn an Associate’s degree and transfer to a 4-year program
  • Category 3: students seeking a certificate

The working discussions provided an opportunity for the summit participants to discuss what was expected and feasible for a student from each category to complete. The discussions were captured by a designated writing group and there will be a forthcoming write-up summarizing the recommendations of the summit participants with guidelines for two-year college data science programs.

This summit was particularly timely for my colleagues at Lansing Community College (LCC) as we have recently begun development of a data science program. Prior to the summit, participants were provided access to a list of resources that included relevant research, reports from related workshops, and sample syllabi. Of particular interest to us, as we design the layout of our program, were the Park City Math Institute’s Curriculum Guidelines for Undergraduate Programs in Data Science (2016) [PDF], the Oceans of Data Profile of the Data Practitioner (2016), and the Oceans of Data workshop report on Building Global Interest in Data Literacy (2016). The resources provided, candid discussions with other two-year colleges regarding their programs, and the discussions about realistic competency expectations were also of interest and informative to our program design.

The intent of the TYCDSS directly supports the MBDH’s priority area of interest in data science education and workforce development. Two-year colleges make higher education accessible to many students who could not or would not otherwise pursue an advanced degree. An increasing number of these schools are offering certificate and Associate’s degree programs in data science and analytics to support growing workforce demand. Growth in these types of programs should naturally lead to an increase in data competency, enrollment in university programs, and larger hiring pools for data science-based careers.


Guest post – URSSI: Conceptualizing a US Research Software Sustainability Institute

First URSSI workshop attendees (Credit: Mike Hucka)

Contributed by Daniel S. Katz, Jeff Carver, Sandra Gesing, Karthik Ram, and Nic Weber


The NSF-funded conceptualization of a US Research Software Sustainability Institute (URSSI) is making the case for and planning a possible institute to improve science and engineering research by supporting the development and sustainability of research software in the US.

Research software is essential to progress in the sciences, engineering, humanities, and all other fields. In many fields, research software is produced within academia, by academics who range in experience and status from students and postdocs to staff members and faculty. Although much research software is developed in academia, important components are also developed in national laboratories and industry. Wherever research software is created and maintained, it can be open source (most likely in academia and national laboratories) or commercial/closed source (most likely in industry, although industry also produces and contributes to open source).

The open source movement has created a tremendous variety of software, including software used for research and software produced in academia. This plethora of solutions is not easy for researchers to find and use out-of-the-box. Standards and a platform for categorizing software for communities are lacking, which often leads to novel developments rather than reuse of solutions. Three primary classes of concern are pervasive across research software in all research disciplines and have stymied research software from achieving maximum impact:

  • Functioning of the individual and team: issues such as training and education, ensuring appropriate credit for software development, enabling publication pathways for research software including novel methods beyond “classical” academic publications, fostering satisfactory and rewarding career paths for people who develop and maintain software, increasing the participation of underrepresented groups in software engineering, and creating and sustaining pipelines of diverse developers.
  • Functioning of the research software: supporting sustainability of the software; growing community, evolving governance, and developing relationships between organizations, both academic and industrial; fostering both testing and reproducibility, supporting new models and developments (for example, agile web frameworks, software as a service), and supporting contributions of transient contributors (for example, students).
  • Functioning of the research field itself: growing communities around research software and disparate user requirements, avoiding siloed developments, cataloging extant and necessary software, disseminating new developments, and training researchers in the usage of software.

The goal of this conceptualization project is to create a roadmap for a URSSI to minimize or at least decrease these types of concerns. To do this, the two aims of the URSSI conceptualization are to:

  1. Bring the research software community together to determine how to address the issues about which we have already learned. In some cases, there are already subcommunities working together on a specific problem, including those that we are part of, but those subcommunities might not be working with the larger community. This leads to a risk of developing solutions that solve one issue but don’t reduce (or might even deepen) other concerns.
  2. Identify additional issues URSSI should address, identify communities for whom these issues are relevant, determine how we should address the issues in coordination with the communities, and determine how to prioritize all the issues in URSSI.

We are not working in a vacuum, but with other like-minded projects. In addition to Better Scientific Software (BSSw) and activities around research facilitators (ACI-REF) in the US, there are two ongoing institutes in science gateways (SGCI) and molecular sciences (MolSSI); a recently completed conceptualization in high energy physics (S2I2-HEP); two other conceptualization projects now underway in geospatial software and fluid dynamics; and a large number of software development and maintenance projects. In the UK, the Software Sustainability Institute (SSI), which has been in operation since 2010, is an inspiration and a potential model for our work.

Given these existing activities, part of our challenge is to define how we will work with these other groups. For example, we might decide that they perform an activity so well that we should point to it, such as the SSI’s software guides. Or we might decide to either duplicate or enhance an activity they do to expand its impact, such as working with the SGCI to offer incubator services to a wider community than just gateway developers. Or we might decide to collaborate with one or more groups, such as on policy campaigns aimed at providing better career paths for research software developers in universities.

We have held one workshop and are planning three more, in addition to a community survey we plan to have out soon, and a set of ethnographic studies of specific projects. We are communicating through our website, a series of newsletters, and a community discussion site.

URSSI welcomes members of the research software community to join us, both to help us determine how to proceed and to directly contribute. Please sign up for the URSSI mailing list, contribute to our discussions, and potentially publish a guest blog post on the URSSI blog on a topic around software sustainability.

Welcome to the new MBDH Community Blog


Today we are launching a new MBDH Community Blog, which is intended to extend information sharing around events and projects, as well as expand our channels for Community conversation.

We plan to run 1-2 posts per month, and we are now seeking submissions from the MBDH Community – including the Spokes and our other collaborative projects – that describe your contributions and developments in the broader data ecosystem. Of interest are short reports and highlights from data-related meetings, events, or project outcomes, inclusive of the role and impact of the MBDH for these efforts.

We welcome contributions from the Social Sciences and Humanities, including short contributions that address data and algorithmic ethics, or coming changes for work, daily life, and public engagement in U.S. data policy.

We encourage submissions from practitioner and NGO perspectives, as well as those from academia, industry, or government. We will provide additional guidelines shortly. If you are interested in submitting a Blog post, please send your contact information and the subject area to:

Our first guest post is by Daniel Katz, Assistant Director for Scientific Software and Applications at the National Center for Supercomputing Applications (NCSA). Check out his post on the US Research Software Sustainability Institute (URSSI) project.

Finally, I’ll note a couple of activities where we are currently seeking input and engagement:

Add your voice to our Midwest Big Data Hub evaluation

  • To create a robust strategic plan for the Midwest Hub.
  • To plan toward long-term sustainability, especially financial sustainability, for the Midwest Hub.
  • Provide your input here:

Participate in our election of five (5) At-large representatives for the MBDH Steering Committee:

As always, please contact us with any ideas or questions.
Thank you for your continued support!

All the best,
Melissa Cragin
Executive Director, Midwest Big Data Hub

Midwest Big Data Summer School 2018

Midwest Big Data Summer School reveals how big data can advance research efforts

By Paula Van Brocklin, Office of the Vice President for Research, Iowa State University

The Midwest Big Data Summer School, held May 14-17 at Iowa State University, helped nearly 140 academic and industry researchers, graduate students and post-docs from nine states broaden their understanding of big data and its ability to advance their research interests. Iowa State has organized and hosted the event since 2016.

“The summer school seeks to bridge the gap between scientists and engineers using data science technology by introducing them to data science techniques and vocabulary,” said Hridesh Rajan, lead organizer of the Midwest Big Data Summer School and professor of computer science at Iowa State. “The idea is to help these individuals better communicate and leverage their data-science needs.”

The curriculum

The school’s first three days introduced attendees to a range of big data topics, including data acquisition, data preprocessing, exploratory data analysis, descriptive data analysis, data analysis tools and techniques, visualization and communication, ethical issues in data science, reproducibility and repeatability, and understanding domain/context.

On the final day, participants selected one of four tracks, which focused on a sub-area of big data analysis. The tracks were:

  • Foundations of Data Science
  • Software Analytics
  • Digital Agriculture
  • Big Data Applications

Several individuals at Iowa State were instrumental in developing and organizing the tracks’ curricula. Click here for a list of those involved.


Keynote presenters at this year’s summer school were:

  • Chid Apte, director, Mathematical Sciences and Blockchain Solutions, IBM Research
  • Tom Schenk, chief data officer, City of Chicago
  • Jacek Czerwonka, principal software engineer, Microsoft Research
  • Will Snipes, principal scientist, ABB Research

A complete list of speakers, including their bios, is available here.

Data science evolving quickly

The field of big data, also referred to as data science, is relatively new yet advancing quickly. For this reason, organizers encourage researchers and scientists to learn as much as they can through resources like the Midwest Big Data Summer School.

“Our aim is for early career researchers and professionals – both in academia and industry – to get a taste of what it’s about, what the state of the art is and how they can start thinking about using data science in their own domains,” said Chinmay Hegde, assistant professor of electrical and computer engineering at Iowa State and a co-organizer of the summer school.

Many thanks

Rajan recognizes the summer school would not be possible without the help of many.

“We are especially thankful for the Midwest Big Data Hub, the National Science Foundation, the Office of the Vice President for Research, Iowa State’s College of Liberal Arts and Sciences, and the departments of computer science and statistics for providing both funding and personnel support for this event.”

Next year
Plans are in the works for the 2019 Midwest Big Data Summer School, though no dates have been set. Rajan said more application-specific tracks may be added to next year’s curriculum. Watch the Midwest Big Data Summer School website for more details in the spring of 2019.


Reposted from Iowa State University’s Research News blog. View the original post here.

Big Data Hubs partner with NSF and JHU on new nationwide data storage network

The Midwest Big Data Hub and the three other regional Big Data Innovation Hubs are partnering with the National Science Foundation and Johns Hopkins University on development of a new nationwide research data network called the Open Storage Network. Partners include Alex Szalay, lead PI (Johns Hopkins), Ian Foster (University of Chicago), the National Data Service (NDS), and five supercomputing centers within the Big Data Hubs’ regions.

The official NSF press release is available here.

The Johns Hopkins story is here.

A story from NCSA with more details from Melissa Cragin, MBDH Executive Director and award PI, and NDS Executive Director Christine Kirkpatrick is here.

Links to partners:

Innovating in the Big Data Ecosystem: Public-Private Partnerships for a Data-enabled World

Solving complex data challenges requires innovative cross-border, multi-sector partnerships

(This article first appeared in the Spring/Summer 2018 issue of Current magazine, published by MBDH partner Council of the Great Lakes Region. There is a PDF version here. View the full issue here.)

by Melissa Cragin, Ph.D.
Executive Director, Midwest Big Data Hub

Complex data challenges facing the Great Lakes region in the era of big data transcend industries, applications, and borders. While data is increasingly borderless, borders and barriers still present substantial problems to industry, academic, and government initiatives that are dependent on data policy and governance processes that structure access and use. These challenges require innovative cross-border, multi-sector partnerships that can leverage the benefits of shared high performance computing resources and cyberinfrastructure services. Read More

MBDH partners on US Ignite Reverse Pitch challenge

Part of the Hub’s focus on Smart, Connected, and Resilient Communities

UIUC collaborators and mentors meet with HackIllinois teams on US Ignite Challenge

The University of Illinois at Urbana-Champaign (UIUC) was awarded a $20,000 grant from US Ignite to host a Smart Gigabit Communities Reverse Pitch Challenge. The MBDH, along with other local partners (see below), contributed towards matching the grant, bringing to $40,000 the total resources available to support the development of smart gigabit applications for the benefit of the local community. Read More