2018 All Hands posters – Midwest Big Data Hub

Student Posters:

Exploring Drivers for diabetes in the US: A data mining approach
Authors: Ganga Prasad Basyal, David Zeng and Omar El-Gayar, Dakota State University
Keywords: Diabetes; Data mining; Machine learning
Abstract (pdf) | Poster (pdf)

Transfer Learning in Medical Image Classification
Authors: James Boit, Rajesh Godasu, David Zeng, Dakota State University
Keywords: Transfer learning, deep learning, medical images
Abstract (pdf) | Poster (pdf)

Exercise Discrimination using Canonical Correlation Analysis
Authors: Pushkar Sathe, Bach Tran, Mohammad Ranjbar, Shivakumar Sastry, University of Akron
Keywords: Exercise Analysis, Wellness Management
Abstract (pdf) | Poster (pdf)

Automating Genetic Classification for Hemagglutinin and Neuraminidase genes from Influenza A Viruses through Machine Learning Methods
Authors: Michael Zeller, Tavis Anderson, Amy Vincent, Phillip Gauger, Iowa State University, Ames, Iowa
Keywords: influenza A virus, swine, machine-learning, classification
Abstract (pdf) | Poster (pdf)

Generating Product Traceability Trees for Harvesting from GPS Tracks
Authors: Yaguang Zhang, Andrew Balmos, Aaron Ault, Dennis Buckmaster, James Krogmeier, Purdue University
Keywords: GPS, precision agriculture, product traceability
Abstract (pdf) | Poster (jpg)

Taking Out the Guesswork: An Analytical Approach to Police Traffic Stop Contraband Searches
Authors: Tong Zhou, Anthony Bonifonte, Denison University
Keywords: Data analysis, Linear regression, Contraband Searches
Abstract (pdf) | Poster (pdf)

Invited Posters:

Urbana-Champaign Smart Gigabit Community
Authors: Chieh-Li Chin, University of Illinois at Urbana-Champaign
Keywords: smart community, digital inclusion, community engagement
Abstract: The University of Illinois has partnered with US Ignite to build a living smart gigabit applications testbed in Urbana-Champaign that serves as one of dozens of Smart Gigabit Communities in the nation. The goal of the project is to foster the development and deployment of next-generation innovative applications that leverage high-speed networks such as Urbana-Champaign Big Broadband (UC2B) and the rich human and technology resources in our community. This poster gives an overview of the strategy, approaches, and activities developed to build the inclusive smart community ecosystem, introduces the gigabit applications from our research community, and features our partnership with the US Ignite Smart Gigabit Communities Program, the local government, and the Midwest Big Data Hub.
Poster (pdf)

The Open Ag Technology and Systems Center: demonstrations, frameworks, and community for open source for agriculture
Authors: Dennis Buckmaster, Purdue University ABE; Jim Krogmeier, Purdue University ECE; Aaron Ault, Purdue University ECE
Keywords: open-source, real-time, data exchange
Abstract: The Open Agricultural Technologies and Systems Center (OATS) was recently launched and involves industry partners and researchers in real-time exchange (RtX) projects in an open source culture; it will build community and momentum for scalable, distributed agricultural data processing that is readily translated to practice. We find that data exchange among systems, people, and projects is the most critical component for achieving data-driven sustainability goals. Our aim is to foster an open source culture in agriculture in order to democratize innovation, reduce barriers to building upon previous efforts, build bigger markets, and streamline talent discovery and attraction. The three pillars of the OATS Center are education (online materials for beginner and advanced levels), research (in data engineering topics of sensing, communications, and computing applied to agriculture and food problems and opportunities), and community (industry associations, academic circles, GitHub, and center-sponsored conferences).
Poster (pdf)

An Integrated Big Data Framework for Water Quality Issues in the Upper Mississippi River Basin
Authors: I. Demir, W.F. Krajewski, C.S. Jones, K. Schilling, L.J. Weber (University of Iowa), J. S. Lee, R.E. Warner (University of Illinois at Urbana-Champaign), P.W. Gassman (Iowa State University)
Keywords: Upper Mississippi, Water Quality, Information System
Abstract: The NSF Midwest Big Data Hub Spokes project, “Collaborative: An Integrated Big Data Framework for Water Quality Issues in the Upper Mississippi River Basin,” will develop a cyberinfrastructure framework to support large-scale water-quality data integration, analyses, and visualization in the Upper Mississippi River Basin (UMRB) in real time using data-enabled information technologies. Seamless integration of existing real-time and ad-hoc water-quality data streams with continuous modeling in the context of relevant data resources is a major challenge in the big data domain. Such a project at the UMRB scale is only possible within the framework of a Big Data Hub and Spokes ecosystem because it requires significant: 1) expertise in collection of water-quality data from a wide range of academic, agency, and NGO sources across several states; 2) integration of data (of varying quality, format, specification, and duration) into a single user-friendly system; 3) input from partners and stakeholders to understand the great variety of ways in which the data may be best accessed and used; and 4) computing and storage resources.
Poster (pdf)

Spokes: MEDIUM: MIDWEST: Smart Big Data Pipeline for Aging Rural Bridge Transportation Infrastructure (SMARTI)
Authors: Robin A. Gandhi, Deepak Khazanchi, Brian Ricks (University of Nebraska-Omaha); Daniel Linzell, Chungwook Sim (University of Nebraska-Lincoln)
Keywords: Bridge Structural Health, Next-Generation Health Monitoring, Data Management, Decision Support Systems, Socio-Technical Impact
Abstract: America’s bridges received a C+ from the American Society of Civil Engineers (ASCE) in 2017. The SMARTI Spoke addresses the need for better rural bridge health management using big data technologies that improve transportation network performance and enhance safety. By combining bridge infrastructure datasets using a novel big data pipeline, our findings will empower decision makers, including by providing a better understanding of the socio-technical impacts associated with infrastructure decisions.
Poster (pdf)

The yt Project: Visualizing and Analyzing Volumetric Data Across Domains
Authors: Nathan Goldbaum, Matthew Turk, Sam Walkow (UIUC/NCSA)
Keywords: Community-Developed Software, Visualization, Data Analysis
Abstract: The yt Project aims to produce an integrated science environment based on the Python programming language for collaboratively asking and answering questions about volumetric data. Currently, yt is mostly used by astrophysicists who run and analyze simulations of astrophysical phenomena, but we are in the process of adding support for data from sources across the physical sciences. yt is designed to guide scientific inquiry (analysis, visualization, simulation) through physically-motivated understanding. It is released under the BSD license, developed completely in the open, and designed to present a library of loosely-coupled components that can be easily integrated with other Python tools.
Poster (pdf)

Data Curation Network: Methods of Education and Shared Expertise
Authors: Hannah Hadley
Abstract: The Data Curation Network conducted a study in 2016-2017 across six partner institutions, with results suggesting a need to define and apply data curation activities. The methods used included the refinement of a network model of shared expertise and specialized data curation workshops that will result in actionable data curation primers.
Poster (pdf)

Preparing the Public Sector Research Workforce to Impact Communities through Data Science
Authors: Libby Hemphill, Christopher Brooks, Lynette Hoelter, Clifford A. Lampe
Abstract: This poster describes a project to train undergraduate students, graduate students, and community stakeholders in collecting, extracting, cleaning, annotating, and analyzing data generated and used by government organizations to further enable data-based decision making. Working with the Midwest Big Data Hub (MBDH), we identify interested community partners who will guide the development of innovative and scalable instructional materials, and will suggest relevant data sources, for in-person and online training offered by faculty at the University of Michigan. Direct community involvement ensures authentic learning experiences centered on skills directly applicable to public-sector research.
Poster (pdf)

SCC-RCN: Developing an Informational Infrastructure for Building Smart Regional Foodsheds
Authors: Ayaz Hyder, Ohio State University
Keywords: Foodshed, Informatics, Network
Abstract: This RCN seeks to address food system failures, inequities and other challenges by characterizing the food system within two specific foodsheds: a 6-county Sacramento, California foodshed, and a 7-county Columbus, Ohio foodshed. This network will focus on creating an informatics framework for generating “smart and connected foodsheds” around these two cities, enabling them to link and convert multiple types of data from multiple sources into usable information through data analytics approaches.
Poster (pdf)

Community-Driven Data Engineering for Opioid and Substance Abuse in the Rural Midwest
Authors: Raghu Machiraju, Ayaz Hyder, Anish Arora, Courtney Lynch, Pamela Salsberry (Ohio State University), Simon Lin (Nationwide Children’s Hospital), Rayid Ghani (University of Chicago), Amit Sheth (Wright State University)
Poster (pdf)

Clowder: Open Source Data Management for Long Tail Data
Authors: Luigi Marini, Rob Kooper, Indira Gutierrez, Max Burnette, Sandeep Puthanveetil Satheesan, Bing Zhang, Mike Lambert, Todd Nicholson, Yan Zhao, Jong Lee, Kenton McHenry (National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign)
Keywords: Data Management, Data Curation, Big Data
Abstract: Clowder is a customizable and scalable data management system that can be installed in the cloud or on regular hardware; alternatively, users can partner with NCSA for a custom instance. Clowder is open source. Users and developers can contribute to the core software and create new metadata extractors and data visualizations.
Poster (pdf)

Open Storage Network: national data storage cyberinfrastructure for the 21st century
Authors: Santiago Nuñez-Corrales, Alex Szalay, Alainna White, Michael Norman, Christine Kirkpatrick, Melissa Cragin, Kenton McHenry, Stanley Ahalt, Lea Shanley, John Goodhue, Derek Simmel
Keywords: cyberinfrastructure; data driven science; national data storage
Abstract: The Open Storage Network (OSN) is an NSF-funded pilot of a distributed storage infrastructure that leverages existing high-speed links across the US, bringing together a team that will build a community to govern, test, and evaluate this prototype network. Such a system must fully utilize the network, be robust and fault tolerant, and support various patterns for large-scale scientific analyses. The OSN will enable science-driven collaborations across universities, establishing it now as scalable and flexible, and enable the creation of robust data science software so that unique data resources can be combined regularly and easily.
Poster (pdf)

A Data Analysis of Champaign’s Infrastructure
Authors: Ajaita Saini, Mackenzie Kirkham, Rose Nowak (University of Illinois at Urbana-Champaign)
Keywords: Civic-Tech, Sustainability, Infrastructure
Abstract: Our team is developing an autonomous system that monitors and detects weaknesses in the town of Champaign’s infrastructure. Our focus is on sustainable solutions that can help detect flooding and other potential hazards to public health, and use sensor and public city data to create a visualization of infrastructure and health issues in the town for residents and city officials to observe. Not only do we intend for our system to be used to save thousands of dollars in damage repairs, but also to build empathy within our community and allow residents to help one another.
Poster (pdf)

TRIPODS+X:EDU: Investigations of Student Difficulties in Data Science Instruction – Year 0
Authors: Karl R. B. Schmitt (Valparaiso University), Katherine Kinnaird (Smith College), Bjorn Sandstede (Brown University), Ruth Wertz (Valparaiso University)
Abstract: Web-browsing histories, online newspapers, streaming music, and stock prices all show that we live in an age of data. Extracting meaning from data is necessary in many fields to comprehend the information flow. This need has fueled rapid growth in data science education aiming to serve the next generation of policy makers, data science researchers, and global citizens. Initially, teaching practices have been drawn from data science’s parent disciplines (e.g., computer science and mathematics). This grant begins the process of investigating data science education as its own field of research. It aims to identify preconceptions students may have when they first enter a data science classroom, and what other courses from related programs are shaping their preconceptions. This poster will detail the grant’s mixed-method educational investigation designed to collect data and documentation of conceptual misunderstandings and difficulties in data science. This investigation will (1) identify classes in a variety of disciplines currently teaching the critical topics identified in the National Academies of Sciences, Engineering, and Medicine (NASEM) Report: Data Science for Undergraduates: Opportunities and Options; (2) work with instructors of those courses to gather evidence of student thinking (especially misconceptions) surrounding those topics; and (3) survey early career data science practitioners to assess those misconceptions that persist to employment. During this educational investigation, we will gather student work as those students are first engaging with data science concepts as well as the teaching materials used in those courses. Additional research methodologies include student interviews and surveys. Item (3), working directly with data science practitioners, will assist in identifying cutting-edge technical topics and oversights by instructors that might be otherwise missed, based on current workforce demands.
Poster (pdf)

Correlative analysis of metal organic framework structures through manifold learning of Hirshfeld surfaces
Authors: Xiaozhou Shen, Tianmu Zhang, Scott Broderick, Krishna Rajan (University at Buffalo)
Keywords: Manifold Learning, Metal Organic Framework, Hirshfeld Surfaces
Abstract: Thousands of Metal Organic Frameworks (MOFs) have been experimentally synthesized since their first discovery. As synthesizing and testing a large number of MOFs is not feasible in practice, high-throughput computational screening of MOF databases can help expedite the experimental efforts. However, a typical MOF database is high-dimensional and sparse, which poses the challenge of extracting the key features and trends that could guide the discovery process. To address this challenge, we develop a library of MOF fingerprints based on their geometric and chemical bonding interactions. Such fingerprints are computationally ready to be analyzed with various machine learning methods. This feasibility has been demonstrated with the application of manifold learning to map the connectivity and extent of similarity between diverse MOF structures in terms of their surface areas. By examining nearest neighbor connections, we discovered structural and chemical correlations among MOF structures that would not have been discernible otherwise. Examples of the types of information that can be uncovered using this approach are given.
Poster (png)

CADRE: Collaborative Archive & Data Research Environment
Authors: Xiaoran Yan, Valentin Pentchev, Patricia L. Mabry, Jamie Wittenberg, Robert Van Rennes, Matthew Hutchinson, Benjamin Serrette
Keywords: Data platform, community building, reproducibility
Abstract: The proposed Collaborative Archive & Data Research Environment (CADRE) project aims to provide sustainable, scalable, and standardized data and analytic services for open and licensed big bibliometric data. By joining forces with our partners in both academia and industry and using state-of-the-art big data and cloud technologies, CADRE will provide an efficient integrated data solution for researchers from different disciplines.
Poster (pdf)

The Social Media Macroscope (socialmediamacroscope.org)
Authors: Joseph Yun (University of Illinois at Urbana-Champaign)
Keywords: social media analytics; data science; science gateway
Abstract: The Social Media Macroscope is a science gateway with the goal of making social media data, analytics, and visualization tools accessible to researchers and students of all levels of expertise. The SMM provides a single point of access to a suite of intuitive web interfaces for performing social media data collection, analysis, and visualization via open-source and commercial tools.
Poster (pdf)