
Building an accessible agricultural data community with the National Agricultural Producers Data Cooperative

By Raleigh Butler

Romaine lettuce crop grown on a city farm in Moscow. Photo by Petr Magera/Unsplash

Entities around the world gather data focused on various aspects of agriculture. Unfortunately, this information is not always accessible or easily available for those who need it. The National Agricultural Producers Data Cooperative (NAPDC) project recognizes that agriculture is a keystone of society and a critical piece of national solutions to climate-related challenges. The NAPDC, with support from the United States Department of Agriculture (USDA), aims to enable agricultural producers to benefit from the massive amounts of data generated by members of their community. As the NAPDC site states, the goal of the project is to create a “blueprint” for a national data framework where agricultural entities “can store and share data . . . to maximize their production and profitability.”

With enough available data and methods to extract relevant information, national agricultural systems can become more efficient and profitable. The framework being developed by the NAPDC will include data from many types of agricultural contexts and agricultural institutions, first and foremost the producers that drive agricultural productivity. Making the system diverse yet robust while safeguarding farmer privacy will result in a more reliable set of data for the entire agricultural community.

The NAPDC project emphasizes providing resources to community partners through webinars and seed grants in order to “identify needs and opportunities as well as challenges in physical infrastructure, education and human resources, and critical use cases” that will determine the success of a future data framework. The project recognizes that a secure framework is necessary to protect privacy and governance information; these aspects will be carefully considered. The project also recognizes the importance of land-grant institutions and agricultural extension in the successful deployment of any framework.

The NAPDC project has a seed grant program to support development of community activities, with a deadline of June 1, 2022. It will be granting 4–6 awards; complete guidelines are listed on the project site. The grants will not be limited to principal investigators at universities; rather, any institution eligible for USDA funding may apply. As stated on the website, “individuals willing and qualified to lead representation for a national or regional agroecosystem are encouraged to apply.”

“The work of the NAPDC aligns well with the Digital Agriculture community of the Midwest Big Data Innovation Hub,” said MBDH Executive Director John MacMullen. “We anticipate integrating findings from our Community Data Needs Assessment (Community DNA) activities, which are helping to understand the data needs of stakeholders across the food supply chain, with the work of the NAPDC. We also look forward to partnering with the NAPDC team on our agricultural data work with the IEEE Standards Association and other partners.”

Jennifer Clarke, lead PI of the NAPDC project and faculty at the University of Nebraska–Lincoln, hopes the project serves as an initial step towards a national framework. “This project represents the willingness of the USDA to listen to agricultural producers and support the data needs of producer communities,” said Dr. Clarke. “This project provides producers and stakeholders with a vehicle for communicating their challenges related to data, and provides educators and researchers with a vehicle for proposing solutions to these challenges.”

The NAPDC will host an All-Hands Meeting in the spring of 2023 at the University of Nebraska–Lincoln that will highlight the work of the NAPDC and discussions of specific areas for future USDA investment. Interested members of the community can sign up for the project listserv through the project website to receive updates about this meeting as well as project information.

Get involved

Do you have an agricultural data success story or case study to share from your organization? Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.

The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Iowa State University, Indiana University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

I-GUIDE: Increasing Sustainability by Harnessing Data

By Raleigh Butler

Gravity dam in Marion County, Oregon. Photo by Dan Meyers/Unsplash

Sustainability is not just achieved through solar panels and windmills. Of course these help, but one organization is working to tackle sustainability on a larger scale: I-GUIDE is a collaborative environment for sharing and using geospatial data. It is community-oriented and works to address sustainability challenges.

“I-GUIDE” stands for “Institute for Geospatial Understanding through an Integrative Discovery Environment.” This project is funded by the National Science Foundation (NSF) under the Harnessing the Data Revolution program. Awarded in 2021, the institute is led by PI Shaowen Wang, head of the Department of Geography and Geographic Information Science at the University of Illinois. The institute has partners from across the country, including MBDH collaborators such as EarthCube, CUAHSI, the University of Minnesota, Columbia University, and the Discovery Partners Institute.

As the I-GUIDE site states, “most challenging sustainability and resilience problems today require expertise from multiple domains and geospatial data science.” I-GUIDE acts as a central access point where qualified entities can reach many types of data. For example, I-GUIDE allows participating entities to access the data stored in HydroShare, a system from CUAHSI, the Consortium of Universities for the Advancement of Hydrologic Science, Inc. The HydroShare infrastructure can be used to share data as well as to analyze and visualize those data. I-GUIDE also brings together other related programs, which broadens knowledge of sustainability and of the data that support it. Data are currently being added to I-GUIDE in the fields of water, geospace, geography, and the atmosphere.

“The institutional collaborations facilitated by this project will enable the I-GUIDE team as well as the broader community to explore a wide range of interdisciplinary science questions that leverage an interconnected network of software and cloud infrastructure,” said Dr. Anthony Castronova, Senior Research Hydrologist at CUAHSI. “These types of institutional connections are critical to support water science research around pressing environmental issues that require modern software, data, and modeling approaches.”

Environmental issues often present themselves in one way (e.g., a drought) when the problem at hand is much larger than the assumed cause (a lack of rainfall). As the climate changes, droughts and other environmental changes can become increasingly harmful to current ecosystems. HydroShare cultivates collaboration in water-focused areas such as drought conditions, water quality, temperature, and soil moisture. These data act as the first step to help promote sustainability and resilience.

I-GUIDE holds regular webinars. The first in the series, held on March 23, 2022, explored the need for geospatial education when sustainability is growing more important every day. Led by Eric Shook from the University of Minnesota, the webinar emphasized the need for building diverse communities of instructors and learners to build best practices for cyberinfrastructure (CI) literacy, and lower the barriers for learners new to CI.

“The Midwest Big Data Innovation Hub is pleased to be a partner on the I-GUIDE project,” said MBDH Executive Director John MacMullen. “This is a diverse and talented team that will have important impacts on key areas of focus for the MBDH, including water data, CI workforce development, and data-enabled resilient communities.”

“MBDH is a great example of how our I-GUIDE Partners are organizations and institutions that share common goals and objectives,” said George Percival, co-lead of I-GUIDE’s Engagement and Partnership Team. “The I-GUIDE Partnership Program provides the pathway for Partners to contribute to and gain from the I-GUIDE activities based on mutually beneficial agreements. As the MBDH objective ‘to build and cultivate communities around data’ is highly aligned with I-GUIDE, it is anticipated that the MBDH and I-GUIDE partnership will benefit both activities.”

If you’re interested in getting involved with I-GUIDE, please take a look at their News & Events page. The site often lists such events as webinars and symposiums. The I-GUIDE team held its first All-Hands Meeting in May 2022.

Get Involved

Activities to build the community of Midwest researchers and practitioners in the Smart & Resilient Communities priority area of the Midwest Big Data Innovation Hub are continuing throughout 2022. Contact the Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.


Exploring Nature Through Imageomics with Professor Tanya Berger-Wolf

By Erica Joo and Qining Wang

We recently spoke with Professor Tanya Berger-Wolf, a pioneer who is leading a team to establish the new field of imageomics. She is a computational ecologist and the director and co-founder of the nonprofit organization “Wild Me.” Berger-Wolf is also the Director of the Translational Data Analytics Institute (TDAI) and a Professor of Computer Science and Engineering, Electrical and Computer Engineering, as well as Evolution, Ecology, and Organismal Biology, at The Ohio State University.

Tanya Berger-Wolf

Observation is fundamental to any biological research. The development of optics technology, such as the inventions of the microscope and the telescope, allowed biologists to observe the world at different scales, from animals living in jungles of millions of acres to DNA in animal cells of several micrometers.

However, as Prof. Berger-Wolf pointed out, those inventions only serve to “augment our ability to look” or “look at more things more carefully.” We are still making observations and searching for patterns with our own eyes, from which arises the caveat: We are not so good at finding patterns when things appear to be random, or when patterns are rare, sparse, subtle, or complex. We can’t answer, for example, whether the stripe patterns of mother zebras are similar to their babies’. The patterns appear to be too similar and too random at the same time to our eyes because human brains did not evolve to “take [the stripe patterns] holistically and quantify them in any meaningful way.”

And that’s where imageomics comes in. Imageomics takes its cue from genomics, a field in which researchers understand the biology of an organism or a species through its genetic information. In a similar vein, imageomics aims to understand nature through biological information extracted from images.

Computers are the perfect information extractors, because they “perceive” the world differently. Computers can quantify images down to pixels and find patterns that humans do not, or cannot, comprehend. Berger-Wolf pointed out that imageomics, as a “whole new field of science,” allows scientists to answer biological questions that weren’t answerable before because it provides scientists with a new way of observing nature.

The complementary vision of computers is especially prominent in the studies of biological traits, according to Berger-Wolf. Biological traits are the interplay between genes and the environment. They can be physical characteristics such as “beak colors, stripe patterns, fin curvatures, the curves of the belly or the back.” They can also be behavioral characteristics such as possums playing dead or pollen feeding in birds. Being able to observe traits “is the foundation of our understanding of how these traits are inherited and the understanding of genetics,” insights into animal behavior, and ecological and evolutionary theories.

In order for biologists to propose new evolutionary hypotheses to explain biological traits, it is crucial to “make these traits computable.” Starting from a project funded by the National Science Foundation, Berger-Wolf founded Wild Me. This nonprofit organization has an ongoing initiative, Wildbook, that collects images containing animals from numerous sources, including camera traps, drones, and even tourists’ social media posts on YouTube, Instagram, and Flickr.

Those source images serve as a starting point for a branch of research in imageomics, which will allow researchers to develop open software and artificial intelligence for the research community. Those tools would allow biologists to discern biological traits that are too similar or too subtle for their eyes, such as animal coat patterns or species that look alike yet are genomically different. Computer vision would allow scientists to find out whether traits are inheritable or shared by multiple species. Based on those new insights, biologists could then formulate new evolutionary hypotheses and start asking even more interesting questions, to which only imageomics can provide the answers.

Berger-Wolf jokes that she has “multiple research personality,” with a passion for bringing her diverse backgrounds together. By helping to found the new Imageomics Institute, her interests were able to converge. Participating in both worlds—natural and technical—allows her to see “the better way” of working and increasing effectiveness.

She commented that starting conversations between fields increases “mutual respect and understanding of each other’s questions and where we can come together.” Berger-Wolf sums up her career by describing her work as “creating tools that expand our ability to look at more things more carefully and even be able to ask questions that people have never been able to ask before.”

Berger-Wolf is currently working on several projects. One looks at animal coat patterns and correlates them with genetics and heritability, with the aim of explaining why some traits are inheritable and others are not. Imageomics makes a deeper understanding possible here, since humans cannot attend to every detail. In another project, she is working on species-level traits of butterflies that mimic other species. Computer algorithms can identify what is similar and different in their appearances, down to the small details. Computers can extract complex information, and people can start asking different questions using information normally beyond the scope of human perception.

Berger-Wolf’s recent award for the new Imageomics Institute under the NSF Harnessing the Data Revolution program is extending this work and bringing it to a wider audience. The images to be used as sources come from existing research projects, citizen scientists, organizations like iNaturalist, eBird, and Wild Me, as well as the digitization of the natural history museum collections through the iDigBio project.

There are various opportunities for students at any level and researchers from all over the world to participate in the field of imageomics. Berger-Wolf emphasized that the goal is to have people understand what imageomics is and how it’s significant so that it can be accessible to all.

“It’s not just an opportunity to advance science, but also to engage people in science,” she explains. Her team is made up of researchers and students who share the goal of building a community around imageomics. More direct community engagement, outreach events, and conferences are great ways to inform people about imageomics and how people can change the way traits are seen.

“We have incredible privilege to do science. To spend time answering scientific questions that are interesting to us while the public is paying us to do so. It’s important to tell the science to the public, communicate why, and what science brings to the world.”

Get Involved

New community-building activities facilitated by the Midwest Big Data Innovation Hub are continuing throughout 2022. Contact the Hub if you’re interested in participating, or are aware of other people or projects we should profile here. The MBDH has a variety of ways to get involved with our community and activities.


Profile: Crystal Lu

Nitrogen reduction in the Upper Mississippi River Basin

By Katie Naum

As extreme climate events become more frequent, some of their impact is visible—like the derecho that tore through Iowa in August 2020, leaving a wake of destruction in its path. Other impacts—including nutrient pollution in water systems—are less understood. In what ways will climate change affect the world around us? How can we use data science to better understand and adapt to the impact of climate extremes? 

Chaoqun (Crystal) Lu

Chaoqun (Crystal) Lu is a quantitative ecosystem ecologist and assistant professor at Iowa State University, and a collaborator of the Midwest Big Data Innovation Hub. Her work focuses on water quality modeling, including the impact of extreme climate events and human activities on nutrient pollution. Her recent NSF CAREER award is titled “Understanding the dynamics and predictability of land-to-aquatic nitrogen loading under climate extremes by combining deep learning with process-based modeling.” The project will bridge the gaps between science and practice, sharing the most current knowledge of Earth system modeling with the public and making the complex concept of watershed management more concrete for the next generation of scientists, land managers, policy makers, and voters.

I spoke with Lu recently via Zoom to learn more about her work with water quality data. The following conversation has been edited and condensed for clarity.

Why is it important to study water quality here and now?

In the United States, nearly 60% of coastal rivers and bays have been degraded by nutrient pollution. Here in the Midwest, people have invested a lot of money and effort over the years to reduce nitrogen pollution. At the same time, climate-driven variations may far outweigh the effects of these nitrogen reduction practices. Increasing summer humidity, more frequent heavy rainfalls, and extreme floods have become a new normal in the central United States over the past few decades. There are a lot of unknowns about how extreme climate events have affected nitrogen leaching from soil and nitrogen loading through tiles, streams and rivers. Lots of data exist, though! 

Policymakers need science-based management suggestions. As a researcher, I would like to benchmark my model with long-term measurements of water quality, and scale up from site-specific measurements to a broader region such as the Upper Mississippi River Basin. If we can figure out how to reduce nitrogen pollution here in the Midwest, the solution we come up with will be very likely to be effective elsewhere. 

Can you tell readers more about the focus of your work, including your recent NSF CAREER award? (Congrats!)

I’m engaged in water quality modeling projects—studying, for example, the impact of nitrogen reduction practices on water quality. Our research team uses mathematical models to represent the physical processes involved in connected systems—the flow of water, the amount of nutrients used by plants or lost to runoff. We also quantify how climate change, land uses, and human management practices could affect nitrogen loading, and assess the effectiveness of nitrogen reduction practices in cleaning water.

The focus of this CAREER award is on how extreme climate events may affect nitrogen loading. My team wants to see how sensitive nitrogen leaching and loading are to events like these, which are increasing in the Midwest. We’re integrating machine learning approaches with a traditional process-based hydroecological model, using a large volume of water quality monitoring data from watersheds of various sizes in the Upper Mississippi–Ohio river basin. I want the key processes represented by traditional process-based models to be kept for water quality prediction, and at the same time improve the models’ outputs with “big data” and machine learning. Our integrated model uses data on water quality, weather, land cover, and human management practices to better understand whether and where there are nitrogen pollution hotspots in the region.

What are some of the challenges in working with water data? What are the insights you hope to gain from your research?

One important challenge is just the enormous amount of variation in the data. If you look at a time series for hydrological flow, you see huge variation in the relationship between flow and nitrogen concentration. The challenge we have is to quantify how varied and why. Why do some small watersheds have larger variations than others? Why are some regions more sensitive to climate than others? Is this pattern we’re seeing caused by a specific event, or the legacy of many such events over time? We want to get the whole picture on nitrogen dynamics, from vegetation to soil to water to rivers, from small to large watersheds, at daily time steps, using modeling to recreate such processes.

In our work under this award, we’re planning to include more small watersheds and high frequency data sets. I’m looking forward to new insights from such data analysis. There is so much data over the past few decades to work with, and the technology of water quality monitoring has really improved.

How does deep learning contribute to watershed management?

Deep learning has been transformative for hydrological science and earth system science, yet few studies have used it to digest the big data of water quality monitoring. Meanwhile, high-frequency water quality monitoring data are increasingly available, especially in smaller watersheds and at shorter time scales. This brings new opportunities to test the relationship between flow and nitrogen concentration in response to climate extreme events. All of this motivates me.

Do you consider yourself a data scientist as well as an ecologist? 

I consider myself an ecosystem ecologist with data science skills. The questions I want to answer are mostly ecological questions. Sustainability science, biogeochemical cycles, climate variability, natural and human drivers: these are all ecology questions. I say this even though I received training in ecosystem modeling and geospatial analysis for many years, but I consider these tools, the same way I consider machine learning a tool. I always keep my eyes open for tools that can help answer the ecological questions I care about. I tell my students this too: even if their degree or job title says ‘ecosystem modeler,’ I always hope they will step back and see the big picture.

How might interested stakeholders learn more or get involved?

We’ll be developing a project webpage where we will release research findings, future publications, and other relevant materials. Our results will be presented and disseminated to interested stakeholders through our collaborating institutions—not only to academic investigators, but also to the general public, because they are the people who actually make decisions on managing the land and improving the environment. 

This is a very multidisciplinary project, and others may have different ways of thinking about and analyzing the problem that we haven’t considered. We would love to hear from other researchers interested in analyzing the problem from another angle. We are also working actively to seek collaborators and more grants to leverage this project, putting available data sources online to allow easy access.

What do you love most about your research?

Being a modeler is a very precious role. Through multi-scale modeling, we try to connect a lot of different people—field scientists, computational experts, engineers, economists, stakeholders, and policy makers—who can work together to understand and build a more sustainable world for us to live in. This provides a lot of opportunity to collaborate with people in different fields. As a quantitative ecosystem ecologist and ecosystem modeler, I can serve as a bridge between field scientists, extrapolating their findings, and decision makers, who want to see and understand ecological outcomes. The work is really useful and applicable in real life. I enjoy the endless possibilities and the feeling that my research is useful and applicable for our world.

Katie Naum writes on science & technology, climate change, and culture. Follow her @naumstrosity.

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas.


Introducing the COVID Information Commons

The Midwest Big Data Innovation Hub collaborated with the other three regional Big Data Innovation Hubs and the National Science Foundation (NSF) to launch the COVID Information Commons (CIC).

Funded by NSF COVID Rapid Response Research Award #2028999, the CIC is an open website to facilitate knowledge sharing and collaboration across various coronavirus research efforts, especially focusing on NSF-funded COVID Rapid Response Research (RAPID) projects.

The CIC serves as a resource for researchers, students, and decision-makers from academia, government, not-for-profits, and industry to identify collaboration opportunities and accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

WATCH: The recording of our launch and demo webinar is available on the CIC site as well as on YouTube.

LEARN MORE: Slides from the webinar are available on the CIC site, below the July 15 launch + demo video. While you’re there, you can explore the live site!

JOIN THE COMMUNITY: The CIC Slack community is a space for discussion and collaboration among PIs and other stakeholders engaged in COVID research.

We will be announcing further CIC events to showcase lightning talks from 40+ PI volunteers over the next few months. If you are interested in hearing more and did not opt in at registration for future email updates, you may sign up here.


Midwest Big Data Hub successfully transitions to second phase with new NSF award

The National Science Foundation (NSF) this month announced the second phase of funding for the regional Big Data Innovation Hub (BD Hubs) program. Under the planned four-year, $4 million award, the Midwest Big Data Hub will continue to be led from the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The Hub’s priority focus areas will be co-led by five partner institutions in the region: Indiana University, Iowa State University, the University of Michigan, the University of Minnesota – Twin Cities, and the University of North Dakota.

First funded in 2015, the four regional BD Hubs were designed by NSF to follow U.S. Census Regions, with offices in the Midwest (led by Illinois), West (UC Berkeley), South (Georgia Tech and UNC Chapel Hill) and the Northeast (Columbia University). The Midwest Hub serves a 12-state region that encompasses Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin.

“Developing innovative, effective solutions to grand challenges requires linking scientists and engineers with local communities,” said Jim Kurose, Assistant Director for Computer and Information Science and Engineering at the National Science Foundation, which funded these awards. “The Big Data Hubs provide the glue to achieve those links, bringing together teams of data science researchers with cities, municipalities and anchor institutions.”

“The Midwest Big Data Hub has built a strong network of partners and a diverse community of stakeholders in the region,” said Bill Gropp, Principal Investigator for the award. “The Hub is well positioned to continue its record of fostering innovative partnerships and providing valued services to our stakeholders in its next phase. Our partner institutions are leaders in the region, and each brings unique strengths to the priority areas they lead.”

The Midwest Hub’s priority areas currently include:

  • Advanced Materials and Manufacturing – Led by the University of Illinois, this area focuses on next-generation materials research in a manufacturing context, and complements the 2016 NSF Big Data Spoke awards on integrative materials design (iMaD) to Northwestern University, the University of Chicago, the University of Illinois, the University of Wisconsin – Madison, and the University of Michigan. It also leverages existing partnerships with the Materials Data Facility, the nanoMFG node at UIUC, and the Center for Hierarchical Materials Design (CHiMaD) at Northwestern University, all supported by NSF.
  • Water Quality – Led by a new Phase 2 partner, the University of Minnesota – Twin Cities, this area complements the existing water cyberinfrastructure focus of the MBDH through the NSF Big Data Spoke awards made in 2018 to Iowa State University, the University of Illinois, and the University of Iowa.
  • Big Data in Health – The University of Michigan will continue to lead this area, with contributions from Indiana University, building on prior work in Phase 1 as well as the Spoke awards for the Advanced Computational Neuroscience Network (ACNN).
  • Digital Agriculture – Iowa State University will lead this area, with continuing contributions from the University of North Dakota, the University of Nebraska, the University of Illinois, and other partners, including from the 2016 Spoke award for Unmanned Aircraft Systems, Plant Sciences and Education (UASPSE), to continue to build a vibrant stakeholder community engaged with transdisciplinary issues around data for agriculture, food production and plant and animal science.
  • Smart, Connected, and Resilient Communities – Led by Indiana University with contributions from Iowa State University, the University of Michigan, and the University of Illinois, this area continues to build a network and connect resources at the intersection between research and data-driven community decision-making.  

“By catalyzing partnerships that integrate academic researchers into the fabric of communities across the U.S., we can accelerate and deepen the impact of basic research on a range of societal issues, from water management to efficient transportation systems,” said Beth Plale, one of the National Science Foundation program directors managing the Big Data Hubs awards.

The Midwest Hub also leads cross-cutting initiatives for broadening participation in data science education, cyberinfrastructure for research data management, and cybersecurity issues around big data. MBDH participates in the BD Hubs Data Sharing and Cyberinfrastructure Working Group, the Open Storage Network, and other initiatives that foster access to research data under FAIR (findable, accessible, interoperable, reusable) principles. By leading initiatives in data science education and workforce development, the MBDH aims to increase data science capacity within the region, in part through a growing network of Predominantly Undergraduate Institutions and Minority Serving Institutions.

The Midwest Big Data Hub was initially funded under NSF award # 1550320. The Phase 2 award is # 1916613.

Explore the Hub at

Learn more about the BD Hubs ecosystem at

The MBDH project office is housed at the National Center for Supercomputing Applications (NCSA), which provides computing, data, networking, and visualization resources and expertise that help scientists and engineers across the country better understand and improve our world. NCSA is an interdisciplinary hub and is engaged in research and education collaborations with colleagues and students across the campus of the University of Illinois at Urbana-Champaign.

For interview requests, general questions, copyright permission and B-roll inquiries contact:

National Science Foundation (NSF) media contact:


Midwest Big Data Hub co-leads local events for 4th Annual Global Women in Data Science Conference

The Midwest Big Data Hub co-led local participation in the 4th annual Global Women in Data Science (WiDS) Conference, with sponsorship from the National Center for Supercomputing Applications (NCSA) and the University of Illinois. The event was free and open to all. The WiDS Conference, hosted on March 4th at 150 locations around the world, seeks to unite and connect women working in data science fields.

“We were very excited to co-sponsor this with NCSA, and support this inaugural Illinois event for Stanford’s Global Women in Data Science Day,” said Melissa Cragin, Executive Director of the Midwest Big Data Hub. “Partnering with others on events such as the Illinois WiDS allows us to best use our human resources and experts network to broaden participation in data science and Big Data research and education. I was honored to participate and have the opportunity to moderate such a terrific panel of accomplished leaders, who shared their perspectives on data science, data-enabled research, and opportunities for women in this space.”

Faculty panel moderated by MBDH Executive Director Melissa Cragin

The WiDS local events, hosted this year at NCSA, featured a variety of speakers from diverse backgrounds presenting sessions on opportunities for women in data science, technical vision talks, and the variety of data science and technology careers available in the Midwest.

“I always enjoy telling my story about how I got started working in big data research,” said Ruby Mendenhall, Illinois Professor of Sociology and African-American Studies and NCSA faculty affiliate. “My story also demonstrates the importance of doing outreach to groups that are not traditionally represented in data science such as African American Studies.”

As part of her 2017-2018 NCSA Faculty Fellowship, Mendenhall and NCSA research programmer Kiel Gilleade completed a pilot study, the Chicago Stress Study, that examined how exposure to nearby gun crimes affected African American mothers living in Englewood, Chicago. Mendenhall and Gilleade developed a mobile health study that used wearable biosensors to document 12 women’s lived experiences for one month last fall. Using the data they collected, Mendenhall, Gilleade, and their team created an exhibit that brings the unheard, day-to-day stories of these mothers to life.

Panel discussion moderated by iSchool Professor Catherine Blake

Professor Donna Cox, Director of NCSA’s Advanced Visualization Lab, was a panelist at this year’s local conference, and praised the insights of the other speakers while emphasizing the importance of the larger WiDS conference. “It was valuable to hear other panelists,” said Cox. “The future of Women in Data Science should include raising awareness about important issues emerging in data science, especially socially-relevant issues. We need more women actively involved in the ethics of data science.”

Alice Delage, Associate Project Manager for NCSA and Program Coordinator for the MBDH, said, “Hosting WiDS Urbana-Champaign at Illinois was an opportunity to highlight the campus expertise around data science led by women.” Delage, who co-chairs the local Women@NCSA group, said, “Data science and technologies are increasingly impacting our lives and society, and it is imperative that women and minorities be part of these transformations. We wanted to showcase the groundbreaking work being done in that area by Illinois female data scientists and to inspire more women and underrepresented communities to engage in the field.”

Delage said there are also opportunities to expand the event next year, for example by better incorporating student work into the program or by running a datathon. Some of this year’s participants have already volunteered to help with next year’s event.

A full list of this year’s speakers at the WiDS Conference at NCSA is here. For more information about the global WiDS conference and ways to get involved, please visit

The MBDH is one of four regional Big Data Innovation Hubs with support from the National Science Foundation (award # 1550320), and works to build capacity and skills in the use of data science methods and resources in the 12-state U.S. Midwest Census region. Learn more about the Hub at

Thanks to NCSA Public Affairs for contributing to an earlier draft of this post.

Big Data Hubs partner with NSF and JHU on new nationwide data storage network

The Midwest Big Data Hub and the three other regional Big Data Innovation Hubs are partnering with the National Science Foundation and Johns Hopkins University on the development of a new nationwide research data network called the Open Storage Network. Partners include lead PI Alex Szalay (Johns Hopkins), Ian Foster (University of Chicago), the National Data Service (NDS), and five supercomputing centers within the Big Data Hubs’ regions.

The official NSF press release is available here.

The Johns Hopkins story is here.

A story from NCSA with more details from Melissa Cragin, MBDH Executive Director and award PI, and NDS Executive Director Christine Kirkpatrick is here.

Links to partners:

Innovating in the Big Data Ecosystem: Public-Private Partnerships for a Data-enabled World

Solving complex data challenges requires innovative cross-border, multi-sector partnerships

(This article first appeared in the Spring/Summer 2018 issue of Current magazine, published by MBDH partner Council of the Great Lakes Region. There is a PDF version here. View the full issue here.)

by Melissa Cragin, Ph.D.
Executive Director, Midwest Big Data Hub

Complex data challenges facing the Great Lakes region in the era of big data transcend industries, applications, and borders. While data is increasingly borderless, borders and barriers still present substantial problems to industry, academic, and government initiatives that are dependent on data policy and governance processes that structure access and use. These challenges require innovative cross-border, multi-sector partnerships that can leverage the benefits of shared high performance computing resources and cyberinfrastructure services.

MBDH partners on US Ignite Reverse Pitch challenge

Part of the Hub’s focus on Smart, Connected, and Resilient Communities

UIUC collaborators and mentors meet with HackIllinois teams on US Ignite Challenge

The University of Illinois at Urbana-Champaign (UIUC) was awarded a $20,000 grant from US Ignite to host a Smart Gigabit Communities Reverse Pitch Challenge. The MBDH, along with other local partners (see below), contributed towards matching the grant, bringing to $40,000 the total resources available to support the development of smart gigabit applications for the benefit of the local community.