Silos to (Eco)Systems: Open Referral’s Approach to the Resource Directory Data Commons

By Francie Fink

This story is part of a series on partnerships developed by the Midwest Big Data Innovation Hub with institutions across the Midwest through the Community Development and Engagement (CDE) Program.

Finding help for health and social services shouldn’t be as difficult as the challenges people already face. Yet, often these services are buried in dead-end mazes of incomplete or outdated information.  Imagine finding help for housing or healthcare as easily as Googling directions to a coffee shop. That’s the vision behind Open Referral’s latest project: a participatory tool kit for building community-resource directories that work—and that truly serve the public good.

The Open Referral Initiative has a clear goal: transform access to health, human, and social services by developing data-sharing standards that make community-resource information a public good. If we think back to Econ 101, resource information as a public good means it’s accessible to everyone, and one person or organization’s use of it doesn’t reduce its usefulness for others; however, nobody is inclined to pay for it. This creates a dilemma over who will supply and maintain the data. So, although there are many tactical questions pertaining to the technical details of any given resource directory, Open Referral is focusing on strategic questions of institutional design: ways in which communities can agree on a shared approach to supplying resource-directory data as an open, infrastructural resource that is accessible by all.

Greg Bloom is a visiting scholar with the Ostrom Workshop at Indiana University, and also leads the Open Referral Initiative. Bloom’s inspiration for Open Referral actually stems from firsthand experience with maintaining resource directories. As a professional, Bloom repeatedly encountered deep-rooted inefficiencies in how communities and their primary-service providers produced and shared information about services. His inquiries eventually led him to explore this “problem of the commons.”

The Ostrom Workshop, named for Elinor Ostrom, draws from the Nobel laureate’s groundbreaking research on how communities can successfully manage shared resources, often called “commons,” without centralized management. Through group-governance structures with clear frameworks, decentralized groups can sustainably manage common resources. Thinking about that problem and applying it to data, it’s easy to see how, without effective community governance, data can become unreliable, siloed, and redundant.

This problem is particularly tricky in smaller communities, but the problem is really a systemic one, as Bloom writes about in a December 2023 blog post that introduced a white paper sponsored by the Ostrom Workshop. Bloom chalks the resource-directory problem up to a lack of incentives for human-service providers to collaborate and share data—a phenomenon he calls the “resource directory anti-commons.”

As a practitioner, not an academic, Bloom began searching for practical solutions to this problem over a decade ago. Although academic literature on the “digital commons” exists, it is largely siloed in academia and university research. In other words, it’s not available—or at least not available in an accessible format—to communities themselves that can benefit from findings.  Bloom envisioned a common language—simple icons and terminology—for service providers and data intermediaries to apply data-governance frameworks in solving the digital-commons problem.

Bloom began engaging with the Ostrom Workshop in Bloomington in 2014 and five years later was invited to be a visiting scholar. Here was his opportunity to refine his vision of resource-directory information as a public good in line with Ostrom’s original theory of commons governance.

By 2023, Open Referral was prepared to engage communities with a practical tool kit. In partnership with the Midwest Big Data Innovation Hub’s Community Development and Engagement Program, Open Referral developed a participatory workbook and facilitated a pilot workshop to help communities eliminate redundancy and inefficiency in data collection. The intended users are information managers and their partners—those who do the work of data sharing and governance.

“Sometimes resources are more effectively shared as commons than privately managed—and this works best when the people who use those resources are also involved in the governance of the resources,” said Bloom. “That blending of user and producer is called co-production.”

The co-production approach paid off. Open Referral worked with a partner in Minnesota, Stratis Health, to facilitate a two-day pilot workshop in October 2024. Dozens of participants representing public, nonprofit, and for-profit organizations across the healthcare and human-service sectors worked with the Open Referral team to “draw” their own data-sharing plans. Partner organizations were challenged to answer core questions about data sharing, including: Which agencies are involved with the same types of data? What services do they provide? Where are they located? When and how are they accessed?

Pictured Left: Workshop participants with Greg Bloom (photo credit: Open Referral)

Pictured Right: Workshop participants take part in a system mapping exercise (photo credit: Open Referral)

Senka Hadzic is a Program Manager with Stratis Health, working on social determinants of health (SDOH) and opioid-overdose-prevention initiatives. Hadzic works with others at Stratis on a Shared Directory Infrastructure strategy, part of the organization’s larger initiative to improve social-needs referrals. 

“We first engaged with Greg Bloom [of] Open Referral in 2023 as a national subject matter expert on resource directories,” said Hadzic. “He helped convene a multisector work group of resource directories, health plans, health systems, and community organizations to craft an initial set of recommendations on how we as a state move toward the comprehensive and universally accessible resource directory infrastructure. That set of recommendations proposed a central coordinating role, data utility, or this shared infrastructure. The workshop held in 2024 was proposed to further flesh out what this could look like.”

Open Referral offers participants some shared terminology to describe the problem that they all experience in different ways. There are, of course, the actors that exist in any data-sharing framework: help seekers, service providers, data stewards, and referral providers. There are also key concepts in data governance: databases, service records, and resource data utilities. But perhaps most importantly, the Open Referral workbook invites participants to explore various institutional-design frameworks, ultimately choosing which best fits their own unique challenges and circumstances.

Pictured: Status quo of ineffective and siloed resource-directory ecosystem. Source: Open Referral Community Resource Data Infrastructure Design Workbook

Pictured: Vision of a healthy resource-directory ecosystem. Source: Open Referral Community Resource Data Infrastructure Design Workbook

The frameworks in the workbook include data registers, resource data utilities, and resource data collaboratives. Data registers are official lists that are routinely monitored and verified for accuracy. A resource data utility is a designated provider of universally accessible, open data services within a specific domain that is often maintained for the good of the general public. Finally, a resource data collaborative is a group of organizations maintaining and collecting data together, resulting in higher quality and lower cost than if they had each maintained a database independently.

Communities often benefit from more than one of these frameworks, with registers embedded in a resource data collaborative, for example. At the workshop, participants from similar service areas first drew their current data ecosystems. Then, they identified siloed areas and redrew an improved future state.

Each data-sharing ecosystem might look a little different; for instance, a group of individuals from organizations involved with food assistance (a directory of food assistance, 211, and a food pantry) mapped out the landscape of local food-assistance providers in which a data utility maintains open-access data. Another group of community-based organizations (CBOs) relied more heavily on an open-data register. The exercise identified overlapping data collection in some cases and, in others, it even highlighted gaps in data.

“The tool kit is designed to empower groups of information intermediaries, particularly those doing the work of data management, along with their community partners who stand to mutually benefit from their work,” Bloom explained. “We want to provide clear guidance for designing collaborative conversations, asking productive questions about data governance and applying first principles without needing to learn all the jargon.”

Feedback from the Minnesota workshop was critical for the Open Referral team to iterate and improve on their workbook. By and large, the feedback from participants was positive. Said one participant, “[There were] productive and insightful conversations. [We discussed] creative solutions from different industry perspectives.” Participants in the Minnesota workshop consistently reported that the event enhanced their understanding of the resource-directory landscape in Minnesota and deepened their awareness of the challenges involved in managing resource data.

“While there are still outstanding questions that need to be addressed among the workshop participants, I am very pleased that we came out with specific outputs that we can act upon,” said Hadzic, of Stratis Health. “I also appreciated Greg’s approach to the design of this workshop by not only obtaining participant feedback before the workshop, but also taking into consideration the 2023 recommendations and our feedback as a convener on the needs of this community. His expertise and facilitation and the workshop provided us with the structure and outputs that we need in order to advance data utility in Minnesota.”

Bloom and his team are incorporating feedback to improve the tool kit, adding clearer prompts, expanding iconography, and refining examples of information flows. The workshop findings are outlined in a recent Open Referral blog post, and the workbook will be available online soon. Open Referral plans to bring this framework to more communities nationwide, addressing inefficiencies that impede service delivery.

Learn More/Get Involved:

Contact Greg Bloom if you are interested in hosting an Open Referral workshop in your own community.

Contact the Midwest Big Data Innovation Hub to learn more about other Community Development and Engagement projects we’ve supported. The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Indiana University, Iowa State University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

Building an Open-Source Community for Hydrological Research and Education

By Francie Fink

Water is one of our most vital resources, yet managing it has long been hindered by technological barriers. Those barriers are part of what makes hydrology, the study of water, a complex field. Local decision-makers need real-time insights for urban planning, flood response, and water-quality monitoring. But what happens when that data is locked behind paywalls or buried within separate research groups?

Imagine a hurricane slams into a city—think Hurricane Harvey in 2017, when 70% of Harris County, Texas was under at least 1.5 feet of water. Emergency teams scramble to get families to safety. In the aftermath, they need to assess flood damage. But without centralized, accessible data, response efforts are delayed and residents are left waiting for warnings that come too late. The consequences of inaccessible water data—whether in flood or drought—can be severe.

Though water is abundant, it’s not always easily accessible. Water supplies aren’t always resilient. Running water can carry harmful chemicals across long distances and threaten the health of vulnerable communities. Flood-related disasters have increased over 130% in the past 25 years, and the frequency and intensity of droughts have, ironically, followed suit, up 29% over the past quarter century.

Hydrology is also a field built on high-quality data. For example, remote sensors and river gauges track water levels. Or maybe you’ve seen how satellite imagery can almost instantaneously tell us about the turbidity, temperature, and rate of sedimentation of a body of moving water. But while data collection has evolved, access and analysis have lagged behind. Researchers and water-management professionals still face high barriers to creating predictive models and management strategies—specialized software, costly licenses, and fragmented datasets.

When a meteorologist talks about a flood warning, you don’t necessarily think about the complex system of models, sensors, and data streams behind that forecast. The raw data, like the water level of a given lake during different times of year or the flow rate of a tributary, is one part of hydrological analysis. The software and tools to integrate this data, in its many forms, are another part of the puzzle.

For too long, specialized and expensive software has made hydrological research a field with high barriers to entry. Enter HydroSuite, an open-source, browser-based tool that levels the playing field. No pricey installations, no exclusive paywalls, just a powerful, web-based platform where anyone with an internet connection can access, analyze, and visualize complex hydrological data.

Dr. Ibrahim Demir is the Michael A. Fitts Presidential Chair in Environmental Informatics and Artificial Intelligence at Tulane University and focuses on hydroinformatics research. Created by Demir and his hydroinformatics team at the University of Iowa when he was an Associate Professor with the College of Engineering, HydroSuite isn’t just another piece of software. It’s a research tool, a computation library, and a community-supported hydrological tool builder.

The platform allows easy access to real-time data, computational tools, and a collaborative space where researchers, planners, and students can model water systems, test scenarios, and share insights. A researcher might use HydroSuite as a sandbox for modeling water-management scenarios. Or, a planner might predict the impact of a particularly intense storm on a local river. They could input historical data, model and predict, and communicate their findings.

Recognizing that software alone isn’t enough, Demir partnered with the Midwest Big Data Innovation Hub (MBDH) in 2024 to supercharge HydroSuite’s impact. Partnering with the Cooperative Institute for Research to Operations in Hydrology (CIROH), Clemson University, and the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), the team organized several workshops expanding HydroSuite’s capabilities, training new users, and fostering a growing community of hydrology innovators.

The rollout wasn’t just about building better technology—it was about getting it into the hands of those who need it. The University of Iowa team analyzed existing hydrology tools, integrated new data sources, and adapted HydroSuite to fill critical gaps. But awareness was key. So, they got to work: developing interactive tutorials on GitHub, hosting workshops, and participating in a hackathon to push HydroSuite’s capabilities to the limit.

Beginning with a review of gaps in existing hydrological analysis technologies, the team integrated new data sources into existing HydroSuite libraries. Next, the team set out to make HydroSuite known to its target users. Demir, graduate student Carlos Erazo Ramirez, and his team built out educational materials to accompany the enhanced HydroSuite library: interactive step-by-step tutorials on specific tools such as HydroLang, HydroCompute, and HydroRTC, available to all on GitHub, along with explanations of each library of resources. They then engaged communities of users.

In May 2024, Demir and graduate student Carlos Erazo Ramirez introduced HydroSuite at the annual Developers Conference hosted by CIROH in Salt Lake City. Their workshop, attended by 35 academics and professionals, showcased HydroSuite’s real-world applications, its open-source nature, and its potential to transform water management.

Pictured: Graduate student Carlos Erazo Ramirez presents at the CIROH conference on HydroSuite

Building on that momentum, the team helped with an even bigger event: WaterSoftHack, a summer 2024 cybertraining program for hydrologists led by Clemson University. This two-week virtual training drew 130 undergraduate and graduate students, postgrads, and researchers from around the world, immersing them in HydroSuite’s analytical tools. Student researchers from Florida State, Arizona State, Illinois State, Colorado School of Mines, Purdue, and numerous other universities joined online. Participants explored key components of the HydroSuite ecosystem, then put their skills to the test in a hackathon. The same students used HydroSuite tools to identify innovative solutions to complex hydrological challenges with real-world impact.

Pictured: Demir prepares for a presentation to researchers on the use cases for HydroSuite

“HydroSuite has been a game changer for hydrology students and researchers,” said Demir. “By providing access to powerful, open-source tools, we’ve opened up new opportunities for students to engage in cutting-edge research and tackle real-world challenges. The feedback from students has been overwhelmingly positive—they’ve told us that HydroSuite has not only enhanced their technical skills but also inspired them to pursue innovative solutions in water management.”

By the end of the program, HydroSuite had evolved into a polished, user-friendly platform, backed by a robust network of researchers and practitioners. Today, civil engineering and hydrology departments nationwide have access to HydroSuite, helping students and professionals alike conduct advanced data analysis—all within a web browser.

The work doesn’t stop there. Erazo Ramirez and Demir, along with researcher Yusuf Sermet, published a peer-reviewed paper in April 2024 on HydroCompute. The team has ambitious plans for 2025 with new funding from the NSF Pathways to Enable Open-Source Ecosystems (POSE) program: build an open-source ecosystem and community around HydroSuite; expand the library with support from the community, incorporating user feedback; and create new educational resources.

As we are able to gather more and better data about our watersheds and water systems, we also are faced with the need to process that data. HydroSuite makes “big data” more tangible. It’s enabled a more seamless roadmap from research to environmental protection and policy design. But most importantly, it’s inspired a community of data sharing and transparency.

Learn More/Get Involved

Visit the HydroSuite site for a description of its contents, an introduction to the open-source software, references, and more. Learn more about other Community Development and Engagement projects we’ve supported.

In Iowa: Small Towns, Big Data Potential

By Francie Fink

Across the country, cities and towns are transforming the way they govern by prioritizing data literacy and evidence-based decision-making. With improved protocols in place to collect and analyze data, cities can turn numbers into action for tangible results: faster emergency-response times, reduced environmental waste, and more-effective solutions to traffic. But it’s not just about efficiency; it’s about trust. By using data to set clear goals and being transparent about their progress, cities are strengthening the bond between residents and the services they rely on every day.

In rural Iowa, local leaders face a challenge that echoes across the nation: they’re stuck relying on broad state or federal datasets to design community events, services, and interventions. Without access to detailed local data, smaller towns and cities are left guessing how to best serve their residents—a frustrating gap that limits impact and innovation.

For instance, a local food bank might rely on state-level data to estimate the number of households experiencing food insecurity. But the state data may not reflect the unique challenges of that community, such as recent layoffs at a local factory or the needs of seasonal workers. As a result, that food bank might understock some foods, with real consequences for residents seeking assistance.

Iowa State University (ISU) Extension and Outreach has worked for many years to address these gaps in data and to equip those who need data with the skills to use it. Christopher J. Seeger is a Morrill Professor of Landscape Architecture and Extension Specialist in Geospatial Technology at ISU. In his years with ISU Extension, Seeger has worked extensively with demographic data and geospatial design, becoming an expert in cataloging large amounts of information.

In his current capacity, Seeger oversees three programs that address data literacy. One, the Community Indicators Program, works to share Census and other state data in visually compelling products for use by students, community development specialists, local decision-makers and the public throughout the state. Another, a geographic information system (GIS) training program, aims to upskill students, local leaders, and teachers in easy-to-use mapping and data-analysis tools. And finally, the Data Science for Public Good Program brings Iowa students together to develop solutions for challenges faced by local and state governments, focusing on significant social issues that resonate on a global scale today.

Seeger and others at ISU Extension began thinking about how the format of the GIS training program might extend to more general community-data-analysis needs in late 2023. They wanted to equip citizens and leaders with simple tools to collect the right data and the methods to share it back with their communities. To accomplish this, they partnered with the Midwest Big Data Innovation Hub (MBDH) through its Community Development and Engagement Program to host a three-part workshop series, geared towards nonprofits, county staff, extension staff, and department leaders in a four-county region near Dubuque, Iowa.

The workshop sessions were carefully planned and intentionally designed to lay the groundwork for a curriculum for years to come. “The big task at hand for us was thinking through the best modes of delivery and selecting the right tools to foster data literacy,” Seeger explained. In the end, the Extension team was very pleased with the tool kit they had put together: explainers on Tableau Public, Google Sheets, GitHub Pages, and GeoJSON.

Seeger and two other Extension specialists were able to work closely with participants. The workshop had 30 signups altogether, though six individuals completed the coursework in full. Seeger says that, in the future, the team may consider a fully online program to accommodate folks traveling long distances to learn. The smaller group format had its definite benefits, though, the team came to discover.

“We were able to understand from the participants their own needs when it comes to data collection and analysis,” said Seeger. Colleague and data analyst Bailey Hanson elaborated, “While facilitating the workshops, I learned that participants often have excellent ideas for what they want to achieve, and with a little training and hands-on guidance, they can bring those ideas to life. This experience reinforced the importance of combining training in technology and data methodology with practical support to empower participants to achieve their goals.”

One participant was a volunteer with their local school system, interested in creating a dashboard that tracks student-truancy metrics. They hoped to better demonstrate trends in student absences to their school committee during meetings, in order to develop a plan of action. Another participant wanted to better track attendance and scheduling for their local 4-H chapter.

The first online session was an introduction to data visualization, and served as a way to assess the data skills that participants had coming into the series. Participants learned about which tools existed for different purposes. For instance, you might use GeoJSON to map the outline of a parade route for your city. Also at this session, the Extension team discussed with participants how to best construct Internet searches to find the data they need. Seeger and his team even touched on artificial intelligence (AI) and how it might be used with integrity.
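As a rough illustration of the parade-route example, a route can be written as a single GeoJSON LineString feature. The Python sketch below is only that, a sketch: the coordinates, the route name, and the function name are hypothetical placeholders rather than material from the workshop.

```python
import json

# Hypothetical route: (longitude, latitude) pairs in the order the parade travels.
# GeoJSON stores positions as [longitude, latitude], not [latitude, longitude].
route_points = [
    (-90.6650, 42.5000),
    (-90.6650, 42.5030),
    (-90.6610, 42.5030),
    (-90.6610, 42.5060),
]


def build_route_feature(points, name):
    """Wrap a list of (lon, lat) points in a GeoJSON FeatureCollection."""
    feature = {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lon, lat in points],
        },
    }
    return {"type": "FeatureCollection", "features": [feature]}


parade = build_route_feature(route_points, "Example parade route")

# Serialize to text; saving this text to a .geojson file lets mapping tools read it.
geojson_text = json.dumps(parade, indent=2)
```

Saving `geojson_text` to a file with a `.geojson` extension produces something that many web mapping tools can display as a line on a map.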

The second and most extensive session was held fully on site. At this stage of the workshop series, Extension specialists walked participants through the web version of Tableau Public. Throughout the session, the team worked through four examples relevant to the participants. For example, they demonstrated how a city’s tree database file could be analyzed to query attributes of different tree species, and even extended to create a map of city-managed trees.
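The workshop did this kind of analysis in Tableau Public, but the same idea can be sketched in a few lines of Python for readers who prefer code. Everything in the example is an assumption made for illustration: the column names, the four sample rows, and the helper functions are hypothetical and do not come from any real city inventory.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for a city's tree inventory file.
SAMPLE_CSV = """tree_id,species,diameter_in,latitude,longitude
1,Bur Oak,24,42.5001,-90.6651
2,Silver Maple,18,42.5010,-90.6640
3,Bur Oak,30,42.5022,-90.6633
4,Hackberry,12,42.5031,-90.6620
"""


def load_trees(csv_text):
    """Parse the CSV text into a list of dicts with numeric fields converted."""
    trees = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        trees.append({
            "species": row["species"],
            "diameter_in": float(row["diameter_in"]),
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
        })
    return trees


def summarize_by_species(trees):
    """Return the tree count and mean trunk diameter for each species."""
    diameters = defaultdict(list)
    for tree in trees:
        diameters[tree["species"]].append(tree["diameter_in"])
    return {
        species: {"count": len(values), "mean_diameter_in": sum(values) / len(values)}
        for species, values in diameters.items()
    }


def map_points(trees, species):
    """Return (latitude, longitude) pairs for one species, ready to plot on a map."""
    return [(t["latitude"], t["longitude"]) for t in trees if t["species"] == species]


trees = load_trees(SAMPLE_CSV)
summary = summarize_by_species(trees)
oak_points = map_points(trees, "Bur Oak")
```

Here `summary` answers the attribute question (how many trees of each species, and how large), while `oak_points` is the kind of coordinate list a mapping tool would need to draw one species on a city map.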

Pictured: Participants learn how to analyze geospatial data in the second of three data-literacy workshops.

In the final session, participants presented what they had learned and worked on to date. By the end of the three-part workshop, almost every participant had created a working dashboard in Tableau Public. The participants were energized by their progress throughout the sessions. They were amazed at how much data is actually out there for use and impressed by how many free and open-source tools are available to create visually appealing summaries of data. “Helping the participants identify the resources for continued learning was another thing that we prioritized,” said Seeger.

Pictured: Example visualization, created in Tableau Public, outlining changes in K–12 school enrollment across Iowa from 2013 to 2023.

The success of the workshop series was evident, and as it came to an end, the Extension team reflected on how to keep the momentum going. For one, they wanted to offer more courses on data literacy: accessing data using R, an introduction to GIS, and a more advanced version of the Tableau training. The Extension team has already been asked to deliver the workshop in another county. ISU Extension isn’t just teaching technical skills—they’re fostering a culture of evidence-based problem-solving that can shape smarter, more-responsive communities for years to come.

Learn More/Get Involved:

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to learn more about other Community Development and Engagement projects we’ve supported.

Rethinking Waste in St. Louis: Embracing a Circular Economy Through Data

By Francie Fink

We’ve all experienced it: that moment when a shiny new product catches our eye, and we justify the upgrade with a quick, “Well, I’ll need a replacement sooner or later . . .” Before we know it, the old item is forgotten—tossed in a drawer or discarded altogether.

This “single-use” mindset isn’t limited to phones. Every day, restaurants discard uneaten food, fast fashion fuels massive textile waste, and holiday shopping arrives wrapped in mounds of plastic. These examples represent the “linear economy”—a global system based on extracting resources, creating products, and discarding them after short-term use. It’s a cycle that drives waste, inefficiency, and environmental harm.

St. Louis has long played a pivotal role in America’s economy and history. Positioned at the confluence of the Mississippi and Missouri Rivers, it was a 19th-century trade hub and an industrial leader. As the host of the 1904 World’s Fair, St. Louis showcased groundbreaking innovations in urban design, manufacturing, and science, including sustainable landscaping techniques. Its Forest Park remains a triumph in green space preservation, spanning 1,371 acres with 30,000 trees.

Yet, St. Louis also grapples with a history of environmental injustice. Communities of color have disproportionately faced the consequences of industrial pollution, lead poisoning, and elevated asthma rates. Illegal dumping remains prevalent in neighborhoods already lacking stable access to food, compounding challenges like food insecurity and ineffective waste management.

In response to these challenges, organizations in St. Louis are redefining the city’s relationship with waste and resources. CircularSTL, a coalition led by 10 Billion Strong, earthday365, Race to Zero Waste, and other environmental leaders, is championing the concept of the circular economy—a system designed to reduce waste by keeping materials and products in use for as long as possible. Unlike the “take-make-waste” cycle, the circular economy promotes principles like repair, reuse, remanufacture, and recycle.

Globally, less than 10% of the economy is circular. But in St. Louis, CircularSTL is working to change that. Partnering with local organizations like Washington University’s Office of Sustainability, Cortex Innovation District, and many others, 10 Billion Strong is uniting independent efforts under a shared framework to tackle waste and build a more sustainable city.

10 Billion Strong, known for its global sustainability projects in 101 countries, has empowered over 25,000 people and diverted more than 45 tons of plastic waste. The nonprofit’s CEO, Patrick Arnold, co-founded CircularSTL along with three other St. Louis-based environmental leaders in an effort to effect change close to home. The team’s idea was to use data to better characterize not just the problem of waste generation and management in St. Louis, but also the city’s ongoing solutions.

To achieve this, CircularSTL partnered with the Midwest Big Data Innovation Hub (MBDH) through its Community Development and Engagement Program to create a circular-city dashboard. This tool uses key indicators—such as waste management efficiency, energy use, and food systems—to assess the region’s progress toward a circular economy. By visualizing the problem and highlighting ongoing solutions, the tool empowers policymakers, businesses, and residents to take informed action. 

The concepts of food waste and resource recycling differ significantly, making comparisons challenging. For example, tracking the volume of textiles recycled is quite different from measuring the weight of plastic diverted from landfills. Recognizing the need for an organized approach to address these complexities, project leaders prioritized gathering and interpreting St. Louis-specific data. They collaborated with over 50 stakeholders from local government, nonprofits, businesses, and academic partners to refine circular-economy indices and develop metrics tailored to the city. Two community meetings further identified gaps in data collection and potential sources to fill them.

The team was intentional about their approach. “Our methodology is to focus the data on solutions, not just the scope of the problem,” said Patrick Arnold, Founder and CEO at 10 Billion Strong. “With problems like food waste, we view it as a numerator and denominator challenge. The denominator represents the waste and the numerator represents the solutions. We know the denominator is huge, so we’re focusing on the numerator.”

In September 2024, CircularSTL hosted a symposium to share key findings and highlight progress. Questions like “Where is food waste being properly managed?” and “Where is e-waste being recycled?” were explored by speakers, panelists, and presenters. This progress was largely driven by the emphasis on data collection and analysis. For some organizations, this “culture of data” was a new but invigorating shift.

Pictured: CircularSTL partners present data and policy recommendations to community stakeholders.

“Taking part in the St. Louis Regional Circular Index program introduced me to a wide array of initiatives, from innovative startups to established businesses, that are all working towards sustainable resource use,” said project partner Chris Oestereich, Circular Economy Connector at Linear to Circular. “I joined the project to learn and contribute to a shared vision for a more circular and sustainable future for the region and beyond and I’m really happy that I did. I met so many passionate people and they left me excited about our shared trajectory.” 

Teresa Bradley, CEO of national nonprofit Race to Zero Waste, echoed that sentiment: “Thanks to the grant, and due to our outreach efforts, I learned there are many businesses in the St. Louis region contributing towards the circular economy. It was such a special day at the CircularSTL Symposium to learn best practices, network, and learn from each other.” 

The symposium was an incubator for new ideas. One presenting startup, Sustain-a-Plate, featured their software, which uses AI technology to streamline the inventory process for grocers. The technology takes the guesswork out of inventory management, sending alerts when products are nearing expiration and suggesting timely discounts or promotions to keep shelves fresh and waste low. It’s a shining example of how smart tracking can make a big impact.

A dedicated data science team is now finalizing the development of a community dashboard, with a preliminary version nearing release. Building the dashboard has required an extensive data-collection effort across numerous community organizations, making it a case study in collaborative data sharing. The dashboard will showcase and track the metrics agreed upon in community engagement meetings.

Participating organizations will share, among other data points, pounds of food waste prevented, pounds of composted material, and pounds of inorganic material diverted from landfills. The project has tracked nearly 20,000 tons of materials diverted from area landfills over the past year and expects this number to grow significantly as additional community members submit their data. Partner organizations will update these figures annually, offering a dynamic snapshot of progress. CircularSTL hopes the public-facing tool will inspire both organizations and individuals to embrace the principles of a circular economy.
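As a rough illustration of how partner-reported figures like these could be rolled up for a dashboard, the Python sketch below sums pounds by category and converts the total to tons. The partner names, category labels, and numbers are hypothetical, and the 2,000-pound short ton is an assumed conversion; none of it comes from CircularSTL's actual data or tooling.

```python
from collections import defaultdict

POUNDS_PER_SHORT_TON = 2000  # assumption: figures are reported in U.S. short tons

# Hypothetical partner submissions: (organization, category, pounds)
reports = [
    ("Partner A", "food_waste_prevented", 12_000),
    ("Partner B", "composted", 30_000),
    ("Partner A", "composted", 8_000),
    ("Partner C", "inorganic_diverted", 50_000),
]


def totals_by_category(rows):
    """Sum reported pounds for each metric category."""
    totals = defaultdict(float)
    for _organization, category, pounds in rows:
        totals[category] += pounds
    return dict(totals)


def pounds_to_tons(pounds):
    """Convert pounds to short tons."""
    return pounds / POUNDS_PER_SHORT_TON


category_totals = totals_by_category(reports)
total_tons = pounds_to_tons(sum(category_totals.values()))

for category, pounds in sorted(category_totals.items()):
    print(f"{category}: {pounds:,.0f} lb")
print(f"Total: {total_tons:.1f} tons")
```

Keeping every submission in one unit (pounds) and converting only at the reporting step is one way to make unlike materials, such as textiles and plastics, comparable on a single dashboard.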

The work of CircularSTL is a testament to St. Louis’ potential to transform from a linear to a circular economy, leveraging its history of innovation and community collaboration. As the city pioneers this shift, it’s not just building a more sustainable and equitable future for its residents—it’s setting an example for communities worldwide. With continued collaboration, data-driven insights, and a commitment to rethinking waste, St. Louis has the opportunity to prove that a circular economy is not just an aspiration but an achievable reality.

Learn More/Get Involved: 

Learn more about CircularSTL by visiting the initiative’s website. To learn more about the lead organization, 10 Billion Strong, browse their impact and areas of focus here. If you are interested in learning more about the circular economy, the Environmental Protection Agency (EPA) provides basic information about the concept here.

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to learn more about other Community Development and Engagement projects we’ve supported. The Midwest Big Data Innovation Hub is an NSF-funded partnership of the University of Illinois at Urbana-Champaign, Indiana University, Iowa State University, the University of Michigan, the University of Minnesota, and the University of North Dakota, and is focused on developing collaborations in the 12-state Midwest region. Learn more about the national NSF Big Data Hubs community.

In Memoriam: Val Pentchev

Val Pentchev portrait

The Midwest Big Data Innovation Hub team is saddened to announce the passing of our longtime colleague Valentin (Val) Pentchev on December 31, 2021.

Val was most recently the PI on the MBDH partner award to Indiana University, which leads the Smart & Resilient Communities priority area. Val had a long, valued history with the MBDH, beginning in the first phase of the project when NSF initially funded the national network of four Regional Big Data Innovation Hubs.

After participating in the community in the early days of the Hub, Val was elected to the MBDH Steering Committee for the 2018–2020 term.

“Val was especially generous with his time, and was committed to the success of the Hub. In addition to regular participation at Steering Committee meetings, he was always willing to join new activities to help the MBDH to grow and mature. We partnered to develop a session at Indy Big Data that was aimed across Industry, Government, and Academia. Val’s leadership and engagement with the organizers got us into the program where we delivered a comprehensive and well-received presentation. His kindness and collaboration will be greatly missed.” —Melissa Cragin, MBDH Executive Director in phase 1 of the Hub
Val Pentchev leading 2019 MBDH All Hands Meeting panel

A regular presence at the annual MBDH All-Hands Meetings, Val often served as a reviewer of student research poster submissions.

At our 2019 All-Hands Meeting in Chicago, the last in-person event sponsored by the Hub prior to the pandemic, Val co-organized and moderated one of the spotlight panels, “The ‘Smart’ Challenge: Delivering on Data-Enabled Decision-Making for Governments and Communities,” with panelists Amy Glasscock, Meera Raja, Ruby Mendenhall, and Charlie Catlett. At that meeting, Val also led a related breakout discussion with other interested participants.

2019 MBDH All Hands Meeting, with Alice Delage
“Val was always an energetic and friendly presence at our MBDH meetings, and just simply a wonderful person to be around. He was faithfully involved with the Hub since the very beginning and contributed to this community in countless precious ways over the years. His loss is not only an absolute tragedy for all the many important projects he worked on, but also for all the people who worked beside him and loved him.” —Alice Delage, Program Manager and Community Liaison for the MBDH in phase 1

In 2019, NSF awarded the BD Hubs an additional four years of support to continue regional and national data science community development. During this second phase, the MBDH continued to grow its work on the Smart & Resilient Communities theme. Val became a co-PI on the Indiana University team, and later became PI in 2021.

Val served on the Hub-wide leadership team throughout 2021, and contributed to our discussions about strategy, partnerships, and long-term sustainability.

“I worked with Val from 2015 to 2021. Val was a wonderful human being. A positive coworker with contagious enthusiasm and energy that directly influenced me and others at the time at Indiana University. I have fond memories of Val and I will take the time to remember what Val has taught me over the years, primarily: passion for work and new projects and compassion for coworkers and human beings.” —Franco Pestilli, past PI of the Indiana University MBDH award

At Indiana, Val also led the Collaborative Archive & Data Research Environment (CADRE) project, of which the MBDH is a partner, and helped bring members of the academic library and research data management communities to the Hub.

2019 MBDH All Hands Meeting
“Val was a tremendous colleague. His positive attitude, passion, and commitment to his work made him stand out. He had a way of seeing the big picture and his enthusiasm was contagious. He was a remarkable human being and it was a privilege to know him.” —Lourdes Gonzalez, MBDH Site Coordinator at Indiana University

Val represented the Hub at the Midwest AI Day in-person cross-sector conference in Indiana in August 2021, bringing the MBDH story to attendees from industry, government, and academia.

In October 2021, Val co-organized and participated in a panel discussion at the online MBDH Regional Community Meeting, with a focus on community building across the Smart & Resilient Communities and Data Science for Social Good spaces, with panelists Kimberly Zarecor (Iowa State), Tayo Fabusui (University of Michigan), and moderator Anita Say Chan (UIUC). In 2022, we had planned to continue this work with Val co-leading and helping to establish new partnerships in the region.

“The MBDH will continue to build on the legacy of work that Val helped create,” said John MacMullen, MBDH Executive Director. “His goal with the Hub was to broaden the impact of data science in addressing societal challenges. Due to his dedicated engagement, we are ready to accelerate our data needs assessment and community development efforts in the Data Science for Social Good and Smart & Resilient Communities spaces across the region in 2022.”

National Science Foundation (NSF) Awards $2 Million for COVID Information Commons Extension for Pandemic Recovery (CIC-E)

New York, NY – October 5, 2021

The COVID Information Commons (CIC) project, a program led by the Northeast Big Data Innovation Hub in the Data Science Institute at Columbia University, in collaboration with the Midwest Big Data Innovation Hub, the South Big Data Innovation Hub, and the West Big Data Innovation Hub, received additional funding from the National Science Foundation (NSF) to support the COVID Information Commons Extension for Pandemic Recovery (CIC-E) proposal (NSF #2139391). This new grant will provide an additional $2 million in funding to the COVID Information Commons project through September 30, 2025.

The COVID Information Commons (CIC) was established in May 2020 via an NSF COVID Rapid Response Research (RAPID) award (NSF #2028999) to facilitate information sharing and collaboration across NSF-funded COVID research efforts. The initial focus was on compiling publicly available information from COVID-related RAPID projects funded by the various NSF Directorates in order to create an easily searchable corpus. In addition to the publicly available information, the CIC also collected self-reported information from the project leaders via a voluntary survey. A CIC research webinar series was created, featuring talks by researchers from the NSF-funded COVID RAPID research projects. The CIC Extension will extend this initial CIC effort to include all projects funded by NSF related to COVID-19 including the pandemic recovery phase. In addition, it will seek to include publicly available information on COVID-related efforts beyond those funded by the NSF.

The initial CIC effort clearly demonstrated the benefits of bringing together information about a diverse set of COVID-related projects into a single place, thereby enabling interested users to efficiently search for information and discover linkages among diverse efforts. This helped foster the creation of a CIC community of researchers and students, and helped catalyze local and global collaborations. The CIC Extension will carry forward this idea to include projects in the pandemic recovery phase, and will additionally incorporate contemporary ways of interacting with the information such as via search and discovery of linked information using semantic search methods, and the use of domain ontologies and knowledge graph mechanisms.

Broad impact is central to the idea of the COVID Information Commons, which pulls together publicly available information along with voluntary self-reported information on NSF-funded COVID-related research projects in order to enable search and discovery of information and collaborations among individual efforts. The CIC has demonstrated early successes in creating such collaborations among researchers from diverse scientific disciplines and from different parts of the country, and around the world, drawn together by their common interest in studying the COVID pandemic. By extending the CIC effort to the pandemic recovery phase, the CIC Extension will reach an even larger and more diverse community of COVID researchers and facilitate networking among researchers engaged in COVID-related research. The CIC Extension will also build upon and expand the successful CIC research webinar series and undergraduate engagement programs initiated in the initial phase of this effort. COVID researchers funded by NSF and NIH, including those newly funded through the American Rescue Plan of 2021 (ARP), will be invited to join the open CIC community and participate in collaborative webinars and events to increase researcher collaboration and accelerate COVID-19 recovery. Visit us at https://covidinfocommons.net to learn more and join the CIC community.



The Northeast Big Data Innovation Hub

The mission of the Northeast Big Data Innovation Hub is to build and strengthen partnerships across industry, academia, nonprofits, and government to address societal and scientific challenges, spur economic development, and accelerate innovation in the national big data ecosystem.

The Northeast Hub is a community convener, collaboration hub, and catalyst for data science innovation in the Northeast Region. The Hub amplifies successes of the community and shares credit across the community to encourage collaboration and mutual success in data science endeavors.

The goals of the Northeast Hub are to: build collaborations to address real-world challenges through translational data science approaches; foster innovation and scale endeavors that reflect regional interests and align with national priorities related to data science; support and promote representative community engagement/impact across all Hub activities; and increase data science capacity and talent, emphasizing underserved communities. Visit us at https://nebigdatahub.org/about to learn more.

The COVID Information Commons

The COVID Information Commons (CIC) is an open website to facilitate knowledge sharing and collaboration across various COVID research efforts, initiated by the NSF Convergence Accelerator. The initial focus of the CIC website was on NSF-funded COVID Rapid Response Research (RAPID) projects. The CIC serves as a resource for researchers, students and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other’s research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

The CIC community is a dynamic, collaborative community of over 1,500 researchers, practitioners and students working on COVID-19 research and insights to enable pandemic recovery and mitigation. The entire CIC community is invited to monthly CIC PI lightning talk webinars, which have attracted over 835 participants from the CIC launch webinar in July 2020 through September 2021. The monthly CIC webinars have featured 78 PI lightning talks which are individually available on demand on the CIC website on the “Meet the Researchers” page. The full recordings of all monthly webinars are also available in the CIC Video Library on the CIC website. COVID researchers find research collaborators by participating in the live webinars and by watching recordings through the CIC portal. The addition of more researchers, research, publications, datasets, and metadata will further accelerate and increase collaboration on COVID research, through the CIC-E funded by NSF.

Upcoming COVID Information Commons Events

Every month, the CIC brings together a group of researchers studying wide-ranging aspects of the current pandemic, to share their research and answer questions from our community. Attend this event to learn more about their ongoing efforts in the fight against COVID-19, including opportunities for collaboration. Register here for your unique Zoom link and calendar information.
 



Media Contacts

Florence Hudson
Executive Director, Northeast Big Data Innovation Hub
Email: florence.hudson@2417@columbia.edu

Lauren Close
Operations & Communications Manager, Northeast Big Data Innovation Hub
Email: lc3460@columbia.edu

Sign up for the COVID Information Commons newsletter to receive future updates, including event notifications and program announcements.

Integrating Regional Water Quality Data with the Upper Mississippi Information System (UMIS) Project

By KJ Naum

Photo of the Mississippi River near Fort Snelling & Minnehaha, Minnesota
Photo by Mathew Benoit on Unsplash

As the Mississippi River flows from its source in northern Minnesota to its mouth on the Louisiana coast, its waters cross the boundaries of ten states, picking up a lot along the way. This includes nutrients such as nitrogen and phosphorus, which contribute to “dead zones” where the river drains into the Gulf of Mexico. Dead zones occur when too much nutrient pollution causes algae to grow excessively. When the algae die, the decaying cells consume oxygen, depriving other life forms of the oxygen they need to survive. This condition, known as hypoxia, can lead to the devastation of entire ecosystems if left unchecked.

There’s not a lot of mystery about what causes nutrient pollution. Widespread agricultural practices in the Midwest’s Corn Belt encourage the plentiful use of nutrient-based fertilizer, so much so that much of it washes away even before the crops can use it. But trying to understand how it’s happening remains a challenge. The data on the river is as free-flowing as the water itself—and often just as slippery.

“Lots of people are doing water quality monitoring, and there are maybe hundreds or thousands of water quality parameters that can be tracked,” says Chris Jones. Jones is a research engineer at the University of Iowa, who works with the Upper Mississippi Information System (UMIS), an online platform that aims to make this deluge of data more accessible and manageable. Jones also works on the Iowa Water Quality Information System (IWQIS), an ongoing effort that informs this newer project. IWQIS makes real-time water quality data from within the state of Iowa available to researchers and the general public. However, the UMIS team is thinking bigger than that. Jones notes, “Watershed boundaries are different from political boundaries. We have to think within their context if we’re going to improve water quality, and so our vision was to bring the IWQIS concept to a larger geographical area.” The Upper Mississippi Information System aims to do exactly that. A team of researchers at the University of Iowa, Iowa State University, and the University of Illinois at Urbana-Champaign are working together on building the UMIS platform and wrangling the data for public consumption. The online platform provides one-stop access to independently managed data streams—both real-time and historical.

The initial site is live, and Jones characterizes it as about halfway complete. The biggest task for the team is to acquire still more data through building partnerships with other organizations. “We’re mainly focused on nutrients like nitrogen and phosphorus right now, but some other data will likely be available,” Jones says. “We had to start somewhere. This is a good place to start because it’s what many people are most interested in.”

Despite the widespread interest, combating nutrient pollution in the Midwest is an uphill battle. Unlike states in other U.S. watersheds such as the Chesapeake Bay, the states of the Mississippi basin have chosen not to regulate nutrient reduction, thanks to a powerful agricultural lobby that is opposed to such mandates. Instead, the state governments each try to promote and incentivize more widespread adoption of practices that reduce nutrient flow.

Jones, however, is skeptical that meaningful change can happen without collaboration. “The states will have to work in concert in order to have any meaningful impact on solving hypoxia,” he says. “That means giving scientists access to a lot of data. Having access to sound scientific data is critical for making policy.”

Individuals and organizations that are interested in the UMIS project can sign up to be a data partner or beta user via the UMIS website, or contact the team via email. Jones and the team are hopeful that UMIS will help drive change at the scale that is needed. “Nutrient pollution is one of the wicked problems, along with climate change, but we know there are solutions out there,” he says. “Solving this is a sociological and economic issue. Hopefully, UMIS can be a tool for policymakers to do just that.”


Get involved

Contact the Midwest Big Data Innovation Hub to suggest other projects we should highlight on this blog, or to participate in any of our community-led Priority Areas.


Profile: Crystal Lu

Nitrogen reduction in the Upper Mississippi River Basin

By Katie Naum

As extreme climate events become more frequent, some of their impact is visible—like the derecho that tore through Iowa in August 2020, leaving destruction in its wake. Other impacts—including nutrient pollution in water systems—are less understood. In what ways will climate change affect the world around us? How can we use data science to better understand and adapt to the impact of climate extremes?

Chaoqun (Crystal) Lu portrait
Chaoqun (Crystal) Lu

Chaoqun (Crystal) Lu is a quantitative ecosystem ecologist and assistant professor at Iowa State University, and a collaborator of the Midwest Big Data Innovation Hub. Her work focuses on water quality modeling, including the impact of extreme climate events and human activities on nutrient pollution. Her recent NSF CAREER award is titled “Understanding the dynamics and predictability of land-to-aquatic nitrogen loading under climate extremes by combining deep learning with process-based modeling.” The project will bridge the gaps between science and practice, sharing the most current knowledge of Earth system modeling with the public and making the complex concept of watershed management more concrete for the next generation of scientists, land managers, policy makers, and voters.

I spoke with Lu recently via Zoom to learn more about her work with water quality data. The following conversation has been edited and condensed for clarity.

Why is it important to study water quality here and now?

In the United States, nearly 60% of coastal rivers and bays have been degraded by nutrient pollution. Here in the Midwest, people have invested a lot of money and effort over the years to reduce nitrogen pollution. At the same time, climate-driven variations may far outweigh the effects of these nitrogen reduction practices. Increasing summer humidity, more frequent heavy rainfalls, and extreme floods have become a new normal in the central United States over the past few decades. There are a lot of unknowns about how extreme climate events have affected nitrogen leaching from soil and nitrogen loading through tiles, streams and rivers. Lots of data exist, though! 

Policymakers need science-based management suggestions. As a researcher, I would like to benchmark my model with long-term measurements of water quality, and scale up from site-specific measurements to a broader region such as the Upper Mississippi River Basin. If we can figure out how to reduce nitrogen pollution here in the Midwest, the solution we come up with will be very likely to be effective elsewhere. 

Can you tell readers more about the focus of your work, including your recent NSF CAREER award? (Congrats!)

I’m engaged in water quality modeling projects—studying, for example, the impact of nitrogen reduction practices on water quality. Our research team uses mathematical models to represent the physical processes involved in connected systems—the flow of water, the amount of nutrients used by plants or lost to runoff. We also quantify how climate change, land uses, and human management practices could affect nitrogen loading, and assess the effectiveness of nitrogen reduction practices in cleaning water.

The focus of this CAREER award is on how extreme climate events may affect nitrogen loading. My team wants to see how sensitive nitrogen leaching and loading are to events like these, which are increasing in the Midwest. We’re integrating machine learning approaches with a traditional process-based hydroecological model, using a large volume of water quality monitoring data from watersheds of various sizes in the upper Mississippi–Ohio river basin. I want the key processes represented by traditional process-based models to be kept for water quality prediction, and at the same time improve the models’ outputs with “big data” and machine learning. Our integrated model uses data on water quality, weather, land cover, and human management practices to better understand whether and where there are nitrogen pollution hotspots in the region.

What are some of the challenges in working with water data? What are the insights you hope to gain from your research?

One important challenge is just the enormous amount of variation in the data. If you look at a time series for hydrological flow, you see huge variation in the relationship between flow and nitrogen concentration. The challenge we have is to quantify how varied and why. Why do some small watersheds have larger variations than others? Why are some regions more sensitive to climate than others? Is this pattern we’re seeing caused by a specific event, or the legacy of many such events over time? We want to get the whole picture on nitrogen dynamics, from vegetation to soil to water to rivers, from small to large watersheds, at daily time steps, using modeling to recreate such processes.

In our work under this award, we’re planning to include more small watersheds and high frequency data sets. I’m looking forward to new insights from such data analysis. There is so much data over the past few decades to work with, and the technology of water quality monitoring has really improved.
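For readers curious how a flow-concentration relationship can be quantified, one common summary in hydrology is a power law, C = a·Q^b, fit by linear regression on log-transformed values. The Python sketch below is only an illustration of that general idea, not Lu's model; the function name and the sample values are hypothetical, and the data are synthetic rather than real measurements.

```python
import math


def fit_power_law(flows, concentrations):
    """Least-squares fit of log(C) = log(a) + b * log(Q); returns (a, b)."""
    xs = [math.log(q) for q in flows]
    ys = [math.log(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the ordinary least-squares line in log-log space
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from C = 2 * Q**0.5 (not real monitoring data)
flows = [1.0, 4.0, 9.0, 16.0, 25.0]
concentrations = [2.0 * q ** 0.5 for q in flows]

a, b = fit_power_law(flows, concentrations)
print(f"a = {a:.3f}, b = {b:.3f}")
```

In this framing, an exponent b above zero means concentration rises with flow, while b below zero suggests dilution at high flow. Comparing fitted exponents across watersheds or time periods is one simple way to start describing the kind of variation discussed above.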

How does deep learning contribute to watershed management?

Deep learning has been transformative for hydrological science and earth system science, yet few studies have used it to digest the big data of water quality monitoring. Meanwhile, high-frequency water quality monitoring data are increasingly available, especially in smaller watersheds and at shorter time scales. This brings new opportunities to test the relationship between flow and nitrogen concentration in response to climate extreme events. All of this motivates me.

Do you consider yourself a data scientist as well as an ecologist? 

I consider myself an ecosystem ecologist, with data science skills. The questions I want to answer are mostly ecological questions. Sustainability science, biogeochemical cycles, climate variability, natural and human drivers—these are all ecology questions. I say this even though I received training in ecosystem modeling and geospatial analysis for many years—but I consider these tools, the same way I consider machine learning a tool. I always keep my eyes open for tools that can help answer the ecological questions I care about. I tell my students this too: even if their degree or job title says ‘ecosystem modeler,’ I always hope they will step back and see the big picture.

How might interested stakeholders learn more or get involved?

We’ll be developing a project webpage where we will release research findings, future publications, and other relevant materials. Our results will be presented and disseminated to interested stakeholders through our collaborating institutions—not only to academic investigators, but also to the general public, because they are the people who actually make decisions on managing the land and improving the environment. 

This is a very multidisciplinary project, and others may have different ways of thinking about and analyzing the problem that we haven’t considered. We would love to hear from other researchers interested in analyzing the problem from another angle. We are also working actively to seek collaborators and more grants to leverage this project, putting available data sources online to allow easy access.

What do you love most about your research?

Being a modeler is a very precious role. Through multi-scale modeling, we try to connect a lot of different people—field scientists, computational experts, engineers, economists, stakeholders, and policy makers—who can work together to understand and build a more sustainable world for us to live in. This provides a lot of opportunity to collaborate with people in different fields. As a quantitative ecosystem ecologist and ecosystem modeler, I can serve as a bridge between field scientists, extrapolating their findings, and decision makers, who want to see and understand ecological outcomes. The work is really useful and applicable in real life. I enjoy the endless possibilities and the feeling that my research is useful and applicable for our world.


Katie Naum writes on science & technology, climate change, and culture. Follow her @naumstrosity and read more at katienaum.com.


Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other people or projects we should profile here, or to participate in any of our community-led Priority Areas.


Midwest water researchers explore COVID-19 in wastewater

This story is part of a series on coronavirus research in the Midwest region

Researchers in the Midwest are looking in a surprising place for clues about the COVID-19 pandemic: wastewater.

Because so many people who are infected with COVID-19 are asymptomatic, scientists are interested in measuring the prevalence of the SARS-CoV-2 coronavirus in wastewater as a way to understand the population-level spread of the virus in communities. In-person testing can be problematic for a variety of reasons, so researchers are interested in alternatives.

Minnesota Public Radio interviewed one research group that is exploring new ways to track coronavirus spread without directly testing people. “We’ve decided that one of the easiest ways to do that would be to noninvasively kind of scan the population for the presence of the virus,” University of Minnesota professor Glenn Simmons Jr. said. “And one easy way of doing that would be to look at the wastewater.”

Simmons, along with his collaborator Richard Melvin at UMN Duluth, is testing samples collected from wastewater treatment facilities for the presence of genetic material from the SARS-CoV-2 virus. Other researchers in the Midwest are working on similar sample collection and data analysis, and are developing new tools and resources.

One resource under development is a publicly accessible, web-based Wastewater Pathogen Tracking Dashboard (WPTD). Dr. Rachel Spurbeck, research scientist at the non-profit Battelle Memorial Institute in Columbus, Ohio, leads the creation of this project.

“The WPTD program is tracking SARS-CoV-2 and other viral pathogens found in the wastewater of four different locations in Toledo, Ohio over time and comparing the sequencing results to the public health and demographic data for these sites,” Spurbeck said. “This comparison will be used to generate risk models for COVID-19 spread in the community as well as other viruses present. We will also be identifying mutations in SARS-CoV-2 which will not only tell us that the virus is in the communities being studied, but also if there are any differences in the virus that could enable identification of how the virus is affecting the population and where the virus came from geographically.”

The data collected will be entered into the Wastewater Pathogen Tracking Dashboard for use by local public health officials to aid in identifying where contact tracing will be most useful. The project is funded by the National Science Foundation (NSF).

Since March 2020, the NSF has made hundreds of new awards focused on COVID-19 research to help address the pandemic. The NSF and the four regional Big Data Innovation Hubs collaborated on the creation of the COVID Information Commons resource to bring together information on these projects. Researchers can use the site to help find tools and resources, and to develop collaborations with other researchers.

Other wastewater tracking projects in the Midwest include two led by Kyle Bibby, Associate Professor of Engineering at the University of Notre Dame in Indiana. Bibby is leading an effort to develop methods to monitor for the presence of SARS-CoV-2 in wastewater and to connect these measurements to epidemiology models. Bibby also leads a project to create a national Research Coordination Network (RCN) focused on wastewater surveillance, in collaboration with partners from Howard University, Stanford University, Arizona State University, and the Water Research Foundation.

At the national level, the U.S. Centers for Disease Control and Prevention (CDC) has announced the development of a National Wastewater Surveillance System (NWSS) that collects data from local, state, tribal, and territorial health departments to supplement the efforts above.

Get involved

Contact the Midwest Big Data Innovation Hub if you’re aware of other projects we should include here, or to participate in any of our community-led Priority Areas.

Introducing the COVID Information Commons

The Midwest Big Data Innovation Hub collaborated with the other three regional Big Data Innovation Hubs and the National Science Foundation (NSF) to launch the COVID Information Commons (CIC).

Funded by NSF COVID Rapid Response Research Award #2028999, the CIC is an open website to facilitate knowledge sharing and collaboration across various coronavirus research efforts, especially focusing on NSF-funded COVID Rapid Response Research (RAPID) projects.

The CIC serves as a resource for researchers, students, and decision-makers from academia, government, not-for-profits, and industry to identify collaboration opportunities and accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

WATCH: The recording of our launch and demo webinar is available at covidinfocommons.net as well as on YouTube.

LEARN MORE: Slides from the webinar are available at covidinfocommons.net, below the July 15 launch + demo video. While you’re there, you can explore the live site!

JOIN THE COMMUNITY: The CIC Slack community is a space for discussion and collaboration among PIs and other stakeholders engaged in COVID research.

We will be announcing further CIC events to showcase lightning talks from 40+ PI volunteers over the next few months. If you are interested in hearing more and did not opt-in at registration for future email updates, you may sign up here.

If you have any questions, please email us at info@covidinfocommons.net

Midwest Big Data Innovation Hub announces leadership changes

As its second year of new funding begins, there is new leadership at the Midwest Big Data Hub (MBDH), with a swap in principal investigators and the appointment of a new executive director. Catherine Blake, a co-principal investigator (PI) on the project, has moved into the PI role, while William (Bill) Gropp transitions to co-PI duties. Long-time Hub staff member John MacMullen was named executive director in January.

Catherine Blake

Blake is an associate professor in the School of Information Sciences (iSchool) at the University of Illinois at Urbana-Champaign, with an affiliate appointment in the Department of Computer Science. At the iSchool, she serves as associate director of the Center for Informatics Research in Science and Scholarship (CIRSS) and director of the graduate programs in information management and bioinformatics. Gropp is director and chief scientist of the National Center for Supercomputing Applications (NCSA) and the Thomas M. Siebel Chair in the Department of Computer Science at Illinois. Prior to joining the MBDH, MacMullen was a faculty member in the iSchool.

“I’m excited and honored to step into the role of principal investigator for the Midwest Big Data Hub,” said Blake. “The community developed during the first phase has made the MBDH well positioned to leverage the rapidly growing data and information collections and technologies in Phase 2 that focus on opportunities, interests, and resources that are unique to the Midwest.”

The MBDH, co-led by the NCSA and the iSchool, serves a twelve-state region that encompasses Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin. It is part of the National Science Foundation’s regional Big Data Innovation Hub (BD Hubs) program that comprises offices in the Midwest, West, South, and the Northeast. The Hub was initially funded in 2015; its second phase started in summer 2019 and will run until 2023. The goal of the MBDH awards, which will total over $4 million for both phases, is to catalyze data science efforts around important priority areas in the Midwest.

“This month we’re starting our second year of the new phase of the Hub with the launch of our Community Development and Engagement funding program,” said MacMullen. “We look forward to continuing to develop a vibrant and diverse data science community in the Midwest that includes the range of academic institutions in the region, and grows participation from nonprofits, government agencies, and industry partners.”

Priority areas for MBDH currently include advanced materials and manufacturing; water quality; big data in health; digital agriculture; and smart, connected, and resilient communities. In addition, MBDH leads cross-cutting initiatives to broaden the participation in data science education, develop cyberinfrastructure for research data management, and address cybersecurity issues around big data. MBDH engages with the BD Hubs Data Sharing and Cyberinfrastructure Working Group, the Open Storage Network, and other initiatives that foster access to research data under FAIR (findable, accessible, interoperable, reusable) principles. By leading initiatives in data science education and workforce development, the MBDH aims to increase data science capacity within the region, such as by growing a network of predominantly undergraduate institutions and minority-serving institutions.

“The MBDH is building on the momentum of its first phase by growing the stakeholder community in the Midwest,” said Gropp, who began as PI of the Hub in 2017. “At the same time, we’re actively participating in the evolution of the national data science ecosystem. I look forward to continuing to develop long-term sustainability for the Hub’s activities through strategic projects such as the COVID Information Commons collaboration between the Hubs and NSF, launching in July 2020.”

Follow MBDH on Twitter: @MWBigDataHub

The Midwest Big Data Innovation Hub was initially funded under NSF award #1550320. The current Phase 2 award is #1916613.

Midwest Big Data Hub successfully transitions to second phase with new NSF award

The National Science Foundation (NSF) this month announced the second phase of funding for the regional Big Data Innovation Hub (BD Hubs) program. Under the planned four-year, $4 million award, the Midwest Big Data Hub will continue to be led from the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The Hub’s priority focus areas will be co-led by five partner institutions in the region: Indiana University, Iowa State University, the University of Michigan, the University of Minnesota – Twin Cities, and the University of North Dakota.

First funded in 2015, the four regional BD Hubs were designed by NSF to follow U.S. Census Regions, with offices in the Midwest (led by Illinois), West (UC Berkeley), South (Georgia Tech and UNC Chapel Hill) and the Northeast (Columbia University). The Midwest Hub serves a 12-state region that encompasses Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin.

“Developing innovative, effective solutions to grand challenges requires linking scientists and engineers with local communities,” said Jim Kurose, Assistant Director for Computer and Information Science and Engineering at the National Science Foundation, which funded these awards. “The Big Data Hubs provide the glue to achieve those links, bringing together teams of data science researchers with cities, municipalities and anchor institutions.”

“The Midwest Big Data Hub has built a strong network of partners and a diverse community of stakeholders in the region,” said Bill Gropp, Principal Investigator for the award. “The Hub is well positioned to continue its record of fostering innovative partnerships and providing valued services to our stakeholders in its next phase. Our partner institutions are leaders in the region, and each brings unique strengths to the priority areas they lead.”

The Midwest Hub’s priority areas currently include:

  • Advanced Materials and Manufacturing – Led by the University of Illinois, this area focuses on next-generation materials research in a manufacturing context, and complements the 2016 NSF Big Data Spoke awards on integrative materials design (iMaD) to Northwestern University, the University of Chicago, the University of Illinois, University of Wisconsin – Madison, and the University of Michigan, as well as leveraging existing partnerships with the Materials Data Facility, the nanoMFG node at UIUC, and the Center for Hierarchical Materials Design (CHiMaD) at Northwestern University, all supported by NSF.
  • Water Quality – Led by a new Phase 2 partner, the University of Minnesota – Twin Cities, this area complements the existing water cyberinfrastructure focus of the MBDH through the NSF Big Data Spoke awards made in 2018 to Iowa State University, the University of Illinois, and the University of Iowa.
  • Big Data in Health – The University of Michigan will continue to lead this area, with contributions from Indiana University, building on prior work in Phase 1 as well as the Spoke awards for the Advanced Computational Neuroscience Network (ACNN).
  • Digital Agriculture – Iowa State University will lead this area, with continuing contributions from the University of North Dakota, the University of Nebraska, the University of Illinois, and other partners, including from the 2016 Spoke award for Unmanned Aircraft Systems, Plant Sciences and Education (UASPSE), to continue to build a vibrant stakeholder community engaged with transdisciplinary issues around data for agriculture, food production and plant and animal science.
  • Smart, Connected, and Resilient Communities – Led by Indiana University with contributions from Iowa State University, the University of Michigan, and the University of Illinois, this area continues to build a network and connect resources at the intersection between research and data-driven community decision-making.  

“By catalyzing partnerships that integrate academic researchers into the fabric of communities across the U.S., we can accelerate and deepen the impact of basic research on a range of societal issues, from water management to efficient transportation systems,” said Beth Plale, one of the National Science Foundation program directors managing the Big Data Hubs awards.

The Midwest Hub also leads cross-cutting initiatives for broadening participation in data science education, cyberinfrastructure for research data management, and cybersecurity issues around big data. MBDH participates in the BD Hubs Data Sharing and Cyberinfrastructure Working Group, the Open Storage Network, and other initiatives that foster access to research data under FAIR (findable, accessible, interoperable, reusable) principles. By leading initiatives in data science education and workforce development, the MBDH aims to increase data science capacity within the region, in part through a growing network of Predominantly Undergraduate Institutions and Minority Serving Institutions.

The Midwest Big Data Hub was initially funded under NSF award # 1550320. The phase 2 award is # 1916613.

Explore the Hub at http://MidwestBigDataHub.org

Learn more about the BD Hubs ecosystem at http://BigDataHubs.org

The MBDH project office is housed at the National Center for Supercomputing Applications (NCSA), which provides computing, data, networking, and visualization resources and expertise that help scientists and engineers across the country better understand and improve our world. NCSA is an interdisciplinary hub and is engaged in research and education collaborations with colleagues and students across the campus of the University of Illinois at Urbana-Champaign.

For interview requests, general questions, copyright permission and B-roll inquiries contact: publicaffairs@ncsa.illinois.edu.

National Science Foundation (NSF) media contact: media@nsf.gov

Midwest Big Data Hub co-leads local events for 4th Annual Global Women in Data Science Conference

The Midwest Big Data Hub co-led local participation in the 4th annual Global Women in Data Science (WiDS) Conference, with sponsorship from the National Center for Supercomputing Applications (NCSA) and the University of Illinois. The event was free and open to all. The WiDS Conference, hosted on March 4th at 150 locations around the world, seeks to unite and connect women working in data science fields.

“We were very excited to co-sponsor this with NCSA, and support this inaugural Illinois event for Stanford’s Global Women in Data Science Day,” said Melissa Cragin, Executive Director of the Midwest Big Data Hub. “Partnering with others on events such as the Illinois WiDS allows us to best use our human resources and experts network to broaden participation in data science and Big Data research and education. I was honored to participate and have the opportunity to moderate such a terrific panel of accomplished leaders, who shared their perspectives on data science, data-enabled research, and opportunities for women in this space.”

Faculty panel moderated by MBDH Executive Director Melissa Cragin

The WiDS local events, hosted this year at NCSA, featured a variety of speakers from diverse backgrounds presenting sessions on opportunities for women in data science, technical vision talks, and the variety of data science and technology careers available in the Midwest.

“I always enjoy telling my story about how I got started working in big data research,” said Ruby Mendenhall, Illinois Professor of Sociology and African-American Studies and NCSA faculty affiliate. “My story also demonstrates the importance of doing outreach to groups that are not traditionally represented in data science such as African American Studies.”

As part of her 2017-2018 NCSA Faculty Fellowship, Mendenhall and NCSA research programmer Kiel Gilleade completed a pilot study, the Chicago Stress Study, that examined how exposure to nearby gun crimes affected African American mothers living in Englewood, Chicago. Mendenhall and Gilleade developed a mobile health study that used wearable biosensors to document 12 women’s lived experiences for one month last fall. Using the data they collected, Mendenhall, Gilleade, and their team created an exhibit that brings the unheard, day-to-day stories of these mothers to life.

Panel discussion moderated by iSchool Professor Catherine Blake

Professor Donna Cox, Director of NCSA’s Advanced Visualization Lab, was a panelist at this year’s local conference, and praised the insights of the other speakers while emphasizing the importance of the larger WiDS conference. “It was valuable to hear other panelists,” said Cox. “The future of Women in Data Science should include raising awareness about important issues emerging in data science, especially socially-relevant issues. We need more women actively involved in the ethics of data science.”

Alice Delage, Associate Project Manager for NCSA and Program Coordinator for the MBDH, said, “Hosting WiDS Urbana-Champaign at Illinois was an opportunity to highlight the campus expertise around data science led by women.” Delage, who co-chairs the local Women@NCSA group, said, “Data science and technologies are increasingly impacting our lives and society, and it is imperative that women and minorities be part of these transformations. We wanted to showcase the groundbreaking work being done in that area by Illinois female data scientists and to inspire more women and underrepresented communities to engage in the field.”

There are also opportunities to expand the event next year by better incorporating student work in the program, Delage said, or running a datathon, for example. Some of this year’s participants have already volunteered to help with next year’s event.

A full list of this year’s speakers at the WiDS Conference at NCSA is here. For more information about the global WiDS conference and ways to get involved, please visit https://www.widsconference.org.

The MBDH is one of four regional Big Data Innovation Hubs with support from the National Science Foundation (award # 1550320), and works to build capacity and skills in the use of data science methods and resources in the 12-state U.S. Midwest Census region. Learn more about the Hub at https://midwestbigdatahub.org.

Thanks to NCSA Public Affairs for contributing to an earlier draft of this post.

BD Hubs profiled in SIGNAL magazine

The NSF-funded Big Data Innovation Hubs were highlighted in a recent article in SIGNAL magazine, a publication of the AFCEA (Armed Forces Communications and Electronics Association). The Executive Directors of the Midwest and Northeast Big Data Hubs, Melissa Cragin and René Baston, were quoted extensively from interviews that covered the wide-ranging communities and activities of the Hubs. Here is an excerpt from the article:

“[W]hile we’re called the Big Data Innovation Hubs, we’re very focused on building capacity in data science, building expertise, access to data-related services and networks related to all things data science,” said Cragin.

That means making available “to all kinds of communities” access to data-related skills, services, tools and opportunities, Cragin states. By developing public/private partnerships and working with groups to leverage these resources, the hubs can help coordinate solutions to “shared grand challenges,” she notes. The hub also is endeavouring to extend data science research and education to predominantly undergraduate institutions—including minority-serving institutions—to help add data skills for the developing workforce, she states.

The regional aspect allows each hub to identify priority areas or “spokes” that they are pursuing. For the Midwest, issues relating to water quality; digital agriculture and unmanned aerial systems; and food, energy and water, among others, play a major role.

Read the full article here.

Guest post – Diverse programs from ISU address sustainable cities challenges

By Iowa State University’s Sustainable Cities team

Researchers with the Sustainable Cities team at Iowa State University recognize the difficulty that public officials face in transforming vast amounts of climate and energy research into contextualized public policy. In attempting to address this critical issue, the team’s mission goes beyond the creation of new climate analysis tools to also investigate new methods for integrating communities into the discourse of data creation and energy conservation. To accomplish this agenda, our team engages in various research avenues that range from the creation of new spatial-data tools to enabling community youth activism. Here are just a few highlights of the team’s most recent achievements:

Sustainable Cities’ team leader Ulrike Passe, associate professor of architecture, presented our hybrid physics data modeling framework at the National Science Foundation-sponsored Research Coordination Network (RCN) workshop held at Carnegie Mellon University on May 17, 2018. The presentation, which caps one of the major branches of the Sustainable Cities initiatives, showed how integrating our recently developed thermo-physical data simulator with our research into human energy-use behavior could yield a more holistic neighborhood energy model. This same model was presented by graduate research assistant Himanshu Sharma at the fifth High Performance Buildings Conference on July 9, 2018, at Purdue University.

image from Krejci et al. (2016)

The Community Growers Program, a public-engagement initiative started in March 2017, has become another core pillar of the Sustainable Cities group’s research. Over the course of eight weeks, researchers worked with 22 leadership-minded youth in the Baker Chapter of the Boys and Girls Club at Hiatt Middle School in Des Moines, Iowa, to create a community garden based on a methodology of spatial, socio-technical storytelling. Through this process, the youth participants were able to learn more about their community through access to geographic information system (GIS) and spatial mapping tools. Associate English professor Linda Shenk, our community engagement lead, and Mallory Riesberg, a collaborator with the Baker Chapter of the Boys and Girls Club, presented this methodology in a presentation titled “Fostering the Next Generation of Big Data Scientists and Sustainable City Planners” at The Growing Sustainable Communities Conference in Dubuque, Iowa, on Oct. 4, 2017. Team members Shenk, Passe, and Alenka Poplin, assistant professor of community and regional planning, later published this work in the 35th issue of the Journal of Interaction Design and Architectures, in an article titled “Engaging Youth with Pervasive Technologies for Resilient Communities.”

Poplin, an established researcher in the field of geo-spatial mapping, also leads a research group that seeks to understand how to better develop feedback loops through innovative user interfaces. An inquiry into mapping places of emotional power was highlighted in a 2017 paper, “Mapping Expressed Emotions: Empirical Experiments on Power Places,” published in the second edition of Kartographische Nachrichten on Empirical Cartography. More recently, Poplin and her research team have begun testing an energy survey game they have developed called E-Footprints. The game extracts user-performance data to measure and analyze which learning opportunities may help guide more environmentally efficient decision making. This feedback is then fed back into learning mini-games throughout the game, so that the user gets more “energy savvy” as they play. This project begins field-testing in November 2018.

With a diverse, multifaceted research team of nearly 50 members, the Sustainable Cities group continues to advance the capabilities of communities and cities to think sustainably about a better future.

 

Image reference:

Krejci, C. C., Passe, U., Dorneich, M. C., & Peters, N. (2016), “A Hybrid Simulation Model for Urban Weatherization Programs”, Proceedings of the 2016 Winter Simulation Conference, Arlington, VA, December 11–14. T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds. (pdf)

 

Read more about the MBDH’s Smart, Connected, and Resilient Communities initiatives.

Guest post – Data Science Education at Two-Year Colleges

By Matt Fall

Executive Director, Center for Data Science, Lansing Community College

Recently, the American Statistical Association (ASA), with support from the National Science Foundation (NSF), hosted a two-day summit in Washington, D.C., to discuss outcomes and curricula for data science programs at two-year colleges. The Two-Year College Data Science Summit (TYCDSS) was intended to help spur the growth of data science programs at these institutions and included representatives from two- and four-year institutions, government, and industry.

Sallie Keller (Virginia Tech) plenary talk (photo: Nicholas Horton)

The summit included several plenary talks discussing the role of two-year colleges in addressing the need for data scientists as well as a brief presentation from a graduate of a community college data science program. The majority of the summit, however, was devoted to a series of working sessions where the participants discussed ideal outcomes and competencies for three categories of students:

  • Category 1: students intending to complete an Associate’s degree and begin working
  • Category 2: students intending to earn an Associate’s degree and transfer to a 4-year program
  • Category 3: students seeking a certificate

The working discussions provided an opportunity for the summit participants to discuss what was expected and feasible for a student from each category to complete. The discussions were captured by a designated writing group and there will be a forthcoming write-up summarizing the recommendations of the summit participants with guidelines for two-year college data science programs.

This summit was particularly timely for my colleagues at Lansing Community College (LCC) as we have recently begun development of a data science program. Prior to the summit, participants were provided access to a list of resources that included relevant research, reports from related workshops, and sample syllabi. Of particular interest to us, as we design the layout of our program, were the Park City Math Institute’s Curriculum Guidelines for Undergraduate Programs in Data Science (2016) [PDF], the Oceans of Data Profile of the Data Practitioner (2016), and the Oceans of Data workshop report on Building Global Interest in Data Literacy (2016). The resources provided, candid discussions with other two-year colleges regarding their programs, and the discussions about realistic competency expectations were also of interest and informative to our program design.

The intent of the TYCDSS directly supports the MBDH’s priority area of interest in data science education and workforce development. Two-year colleges provide higher education accessibility to many students who could not or would not otherwise pursue an advanced degree. An increasing number of these schools are offering certificate and Associate’s degree programs in data science and analytics to support growing workforce demand. Growth in these types of programs should naturally lead to an increase in data competency, enrollment in university programs, and larger hiring pools for data science-based careers.

Guest post – URSSI: Conceptualizing a US Research Software Sustainability Institute

First URSSI workshop attendees (Credit: Mike Hucka)

Contributed by Daniel S. Katz, Jeff Carver, Sandra Gesing, Karthik Ram, and Nic Weber

 

The NSF-funded conceptualization of a US Research Software Sustainability Institute (URSSI) is making the case for and planning a possible institute to improve science and engineering research by supporting the development and sustainability of research software in the US.

Research software is essential to progress in the sciences, engineering, humanities, and all other fields. In many fields, research software is produced within academia, by academics who range in experience and status from students and postdocs to staff members and faculty. Although much research software is developed in academia, important components are also developed in national laboratories and industry. Wherever research software is created and maintained, it can be open source (most likely in academia and national laboratories) or commercial/closed source (most likely in industry, although industry also produces and contributes to open source).

The open source movement has created a tremendous variety of software, including software used for research and software produced in academia. This plethora of solutions is not easy for researchers to find and use out-of-the-box. Standards and a platform for categorizing software for communities are lacking, which often leads to novel developments rather than reuse of solutions. Three primary classes of concern are pervasive across research software in all research disciplines and have stymied research software from achieving maximum impact:

  • Functioning of the individual and team: issues such as training and education, ensuring appropriate credit for software development, enabling publication pathways for research software including novel methods beyond “classical” academic publications, fostering satisfactory and rewarding career paths for people who develop and maintain software, increasing the participation of underrepresented groups in software engineering, and creating and sustaining pipelines of diverse developers.
  • Functioning of the research software: supporting sustainability of the software; growing community, evolving governance, and developing relationships between organizations, both academic and industrial; fostering both testing and reproducibility; supporting new models and developments (for example, agile web frameworks, software as a service); and supporting contributions of transient contributors (for example, students).
  • Functioning of the research field itself: growing communities around research software and disparate user requirements, avoiding siloed developments, cataloging extant and necessary software, disseminating new developments, and training researchers in the usage of software.

The goal of this conceptualization project is to create a roadmap for a URSSI to minimize or at least decrease these types of concerns. To do this, the two aims of the URSSI conceptualization are to:

  1. Bring the research software community together to determine how to address the issues about which we have already learned. In some cases, there are already subcommunities working together on a specific problem, including those that we are part of, but those subcommunities might not be working with the larger community. This leads to a risk of developing solutions that solve one issue but don’t reduce (or might even deepen) other concerns.
  2. Identify additional issues URSSI should address, identify communities for whom these issues are relevant, determine how we should address the issues in coordination with the communities, and determine how to prioritize all the issues in URSSI.

We are not working in a vacuum, but with other like-minded projects. In addition to Better Scientific Software (BSSw) and activities around research facilitators (ACI-REF) in the US, there are two ongoing institutes in science gateways (SGCI) and molecular sciences (MolSSI); a recently completed conceptualization in high energy physics (S2I2-HEP); two other conceptualization projects now underway in geospatial software and fluid dynamics; and a large number of software development and maintenance projects. In the UK, the Software Sustainability Institute (SSI), which has been in operation since 2010, is an inspiration and a potential model for our work.

Given these existing activities, part of our challenge is to define how we will work with these other groups. For example, we might decide that they perform an activity so well that we should point to it, such as the SSI’s software guides. Or we might decide to either duplicate or enhance an activity they do to expand its impact, such as working with the SGCI to offer incubator services to a wider community than just gateway developers. Or we might decide to collaborate with one or more groups, such as on policy campaigns aimed at providing better career paths for research software developers in universities.

We have held one workshop and are planning three more, in addition to a community survey we plan to have out soon, and a set of ethnographic studies of specific projects. We are communicating through our website, a series of newsletters, and a community discussion site.

URSSI welcomes members of the research software community to join us, both to help us determine how to proceed and to directly contribute. Please sign up for the URSSI mailing list, contribute to our discussions, and potentially publish a guest blog post on the URSSI blog on a topic around software sustainability.

Welcome to the new MBDH Community Blog

Greetings!

Today we are launching a new MBDH Community Blog, which is intended to extend information sharing around events and projects, as well as expand our channels for Community conversation.

We plan to run 1-2 posts per month, and we are now seeking submissions from the MBDH Community – including the Spokes and our other collaborative projects – that describe your contributions and developments in the broader data ecosystem. Of interest are short reports and highlights from data-related meetings, events, or project outcomes, inclusive of the role and impact of the MBDH for these efforts.

We welcome contributions from the Social Sciences and Humanities, including short contributions that address data and algorithmic ethics, or coming changes for work, daily life, and public engagement in U.S. data policy.

We encourage submissions from practitioner and NGO perspectives, as well as those from academia, industry, or government. We will provide additional guidelines shortly. If you are interested in submitting a Blog post, please send your contact information and the subject area to: info@midwestbigdatahub.org

Our first guest post is by Daniel Katz, Assistant Director for Scientific Software and Applications at the National Center for Supercomputing Applications (NCSA). Check out his post on the US Research Software Sustainability Institute (URSSI) project.

Finally, I’ll note a couple of activities where we are currently seeking input and engagement:

Add your voice to our Midwest Big Data Hub evaluation

  • To create a robust strategic plan for the Midwest Hub.
  • To plan toward long-term sustainability, especially financial sustainability, for the Midwest Hub.
  • Provide your input here: https://www.surveymonkey.com/r/MBDHSurvey

Participate in our election of five (5) At-large representatives for the MBDH Steering Committee:  https://midwestbigdatahub.org/2018-steering-committee-at-large-nominees/

As always, please contact us with any ideas or questions.
Thank you for your continued support!

All the best,
Melissa Cragin
Executive Director, Midwest Big Data Hub

Midwest Big Data Summer School 2018

Midwest Big Data Summer School reveals how big data can advance research efforts

By Paula Van Brocklin, Office of the Vice President for Research, Iowa State University

The Midwest Big Data Summer School, held May 14-17 at Iowa State University, helped nearly 140 academic and industry researchers, graduate students and post-docs from nine states broaden their understanding of big data and its ability to advance their research interests. Iowa State has organized and hosted the event since 2016.

 
“The summer school seeks to bridge the gap between scientists and engineers using data science technology by introducing them to data science techniques and vocabulary,” said Hridesh Rajan, lead organizer of the Midwest Big Data Summer School and professor of computer science at Iowa State. “The idea is to help these individuals better communicate and leverage their data-science needs.”

The curriculum

The school’s first three days introduced attendees to a range of big data topics, including data acquisition, data preprocessing, exploratory data analysis, descriptive data analysis, data analysis tools and techniques, visualization and communication, ethical issues in data science, reproducibility and repeatability, and understanding domain/context.

On the final day, participants selected one of four tracks, which focused on a sub-area of big data analysis. The tracks were:

  • Foundations of Data Science
  • Software Analytics
  • Digital Agriculture
  • Big Data Applications

Several individuals at Iowa State were instrumental in developing and organizing the tracks’ curricula. Click here for a list of those involved.

Speakers

Keynote presenters at this year’s summer school were:

  • Chid Apte, director, Mathematical Sciences and Blockchain Solutions, IBM Research
  • Tom Schenk, chief data officer, City of Chicago
  • Jacek Czerwonka, principal software engineer, Microsoft Research
  • Will Snipes, principal scientist, ABB Research

A complete list of speakers, including their bios, is available here.

Data science evolving quickly

The field of big data, also referred to as data science, is relatively new yet advancing quickly. For this reason, organizers encourage researchers and scientists to learn as much as they can through resources like the Midwest Big Data Summer School.

“Our aim is for early career researchers and professionals – both in academia and industry – to get a taste of what it’s about, what the state of the art is and how they can start thinking about using data science in their own domains,” said Chinmay Hegde, assistant professor of electrical and computer engineering at Iowa State and a co-organizer of the summer school.

Many thanks

Rajan recognizes the summer school would not be possible without the help of many.

“We are especially thankful for the Midwest Big Data Hub, the National Science Foundation, the Office of the Vice President for Research, Iowa State’s College of Liberal Arts and Sciences, and the departments of computer science and statistics for providing both funding and personnel support for this event.”

Next year
Plans are in the works for the 2019 Midwest Big Data Summer School, though no dates have been set. Rajan said more application-specific tracks may be added to next year’s curriculum. Watch the Midwest Big Data Summer School website for more details in the spring of 2019.

——

Reposted from Iowa State University’s Research News blog. View the original post here.

Big Data Hubs partner with NSF and JHU on new nationwide data storage network

The Midwest Big Data Hub and the three other regional Big Data Innovation Hubs are partnering with the National Science Foundation and Johns Hopkins University on development of a new nationwide research data network called the Open Storage Network. Partners include Alex Szalay, lead PI (Johns Hopkins), Ian Foster (University of Chicago), the National Data Service (NDS), and five supercomputing centers within the Big Data Hubs’ regions.

The official NSF press release is available here.

The Johns Hopkins story is here.

A story from NCSA with more details from Melissa Cragin, MBDH Executive Director and award PI, and NDS Executive Director Christine Kirkpatrick is here.


Innovating in the Big Data Ecosystem: Public-Private Partnerships for a Data-enabled World

Solving complex data challenges requires innovative cross-border, multi-sector partnerships

(This article first appeared in the Spring/Summer 2018 issue of Current magazine, published by MBDH partner Council of the Great Lakes Region. There is a PDF version here. View the full issue here.)

by Melissa Cragin, Ph.D.
Executive Director, Midwest Big Data Hub

Complex data challenges facing the Great Lakes region in the era of big data transcend industries, applications, and borders. While data is increasingly borderless, borders and barriers still present substantial problems to industry, academic, and government initiatives that are dependent on data policy and governance processes that structure access and use. These challenges require innovative cross-border, multi-sector partnerships that can leverage the benefits of shared high performance computing resources and cyberinfrastructure services. Read More

MBDH partners on US Ignite Reverse Pitch challenge

part of Hub’s focus on Smart, Connected, and Resilient Communities

UIUC collaborators and mentors meet with HackIllinois teams on US Ignite Challenge

The University of Illinois at Urbana-Champaign (UIUC) was awarded a $20,000 grant from US Ignite to host a Smart Gigabit Communities Reverse Pitch Challenge. The MBDH, along with other local partners (see below), contributed towards matching the grant, bringing to $40,000 the total resources available to support the development of smart gigabit applications for the benefit of the local community. Read More

New Report on “Keeping Data Science Broad”

A new report on the “Keeping Data Science Broad: Negotiating the Digital and Data Divide Among Higher Education Institutions” initiative was released by the South Big Data Hub and collaborators, including the Midwest Hub. This initiative brought together the BD Hubs community and other stakeholders to explore pathways for keeping data science education broadly inclusive. Read More

MBDH 2017 All-Hands Meeting Recap

by Keith Hollenkamp

On October 1-2, we hosted the Midwest Big Data Hub All-Hands Meeting at the beautiful Kiewit University in Omaha, Nebraska. Over the course of the two-day event, researchers, academics, students, and others connected over shared interests and attended panels centered on this year’s theme: Data-Enabled Midwest Resilience.

Melissa Cragin, Executive Director of the MBDH, welcomes attendees to the All-Hands Meeting

Read More

An Introduction to Dr. William Gropp, MBDH PI

Dear MBDH Community,

I wanted to officially introduce myself as the new PI of the Midwest Big Data Hub. In a previous MBDH newsletter, it was announced that I would be taking over the role after the former PI, Dr. Ed Seidel, accepted the job of Vice President for Economic Development and Innovation for the University of Illinois System. It is with great enthusiasm that I join the MBDH community, and I’m eager to contribute to all the ways the Hub is expanding and enhancing the Midwest Big Data ecosystem.

Read More

Machine Learning: Farm-to-Table Workshop

by Keith Hollenkamp

In April, the MBDH teamed up with International Food Security at Illinois (IFSI) to host the Machine Learning: Farm-to-Table Workshop. The workshop brought together domain scientists to stimulate new data-driven R&D activity at the intersections of the Agriculture, Bioinformatics, Food-Energy-Water, and Food Security communities.

Read More