Listed in order of presentation
Wednesday, October 27, 2021 – 3:00–4:00 p.m. CT / 4:00–5:00 p.m. ET
Heather Johnston, University of Michigan
Public Opinions on COVID-19 Vaccines from Social Media: A Machine Learning Study (Slides)
Understanding the discussion surrounding COVID-19 vaccines is vital in a time of vaccine hesitancy and public misinformation. As the pandemic continues and variants emerge, it is useful to understand how the conversation has evolved. Online discussion on Twitter provides substantial data about public opinion surrounding COVID-19 vaccines.
To understand the themes of these tweets, we combine state-of-the-art computational social science, statistics, and natural language processing. Specifically, we propose a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) classifier to categorize COVID-19 vaccine-related tweets into seven topic classes: getting vaccinated & gratefulness, production & politics, research, side effects, skepticism, others, and noise. The training and development set for this model contains 4,685 human-labeled tweets. The BERT model achieved 78.9% accuracy on the development set with a macro-F1 of 0.706, and 78.3% accuracy on the test set with a macro-F1 of 0.715.
We further applied our BERT model to 869,229 publicly available tweets from Covaxxy posted between January 2021 and August 2021. We found that the three most prevalent topics are getting vaccinated and gratefulness (26.7%), production and politics (19.8%), and skepticism (19.1%). The proportion of tweets about production and politics decreases from 27.8% to 10.0% over the eight months, while the proportion of tweets indicating vaccine skepticism increases from 15.1% to 30.7%. Our approach can be used for further monitoring of public discussion on COVID-19 vaccines to help governments encourage vaccination and respond appropriately to the concerns and interests of the general public.
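As a rough illustration of this kind of classifier, a minimal fine-tuning setup with the Hugging Face transformers library might look like the sketch below; the checkpoint, label names, and inference details are assumptions, not the authors' exact configuration.

```python
# Illustrative sketch: BERT for 7-way tweet topic classification.
# Checkpoint, label names, and hyperparameters are assumptions, not the
# authors' actual setup. In practice the model would first be fine-tuned
# on the 4,685 human-labeled tweets (e.g., with transformers.Trainer).
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

LABELS = ["vaccinated_gratefulness", "production_politics", "research",
          "side_effects", "skepticism", "others", "noise"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

def classify(tweets):
    """Return the predicted topic label for each tweet."""
    batch = tokenizer(tweets, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [LABELS[i] for i in logits.argmax(dim=-1).tolist()]

print(classify(["Got my second dose today, so grateful!"]))
```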
Sushanth Sreenivasa Babu, University of Illinois at Urbana-Champaign
Bangalore Vaccination Drive – Slot Opening Insights (Slides)
The Indian Government opened its vaccination drive to people older than 18 years on the 1st of May. Given the huge shortage of vaccines and the vast number of people trying to get a slot on the CoWin portal, the experience has been far from ideal. In response, people have used their technological prowess to build bots that send alerts on Twitter and Telegram as soon as hospitals open up slots. But while thousands of people receive these alerts, fewer than 5% actually succeed in booking a slot, leaving the rest frustrated given the time they have spent trying.
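For illustration, a slot-alert bot of the kind described typically polls CoWin's public API and pushes a notification when capacity appears. The sketch below assumes the publicly documented findByPin endpoint and its response fields; treat the exact details, and any alerting hook, as assumptions rather than a specific bot's implementation.

```python
# Hedged sketch of a CoWin slot-alert bot: poll the public API and flag
# newly opened slots. Endpoint and response fields follow CoWin's public
# API documentation; exact details are assumptions.
import time
import requests

URL = "https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/findByPin"

def check_slots(pincode: str, date: str):
    """Return sessions with open capacity for a PIN code on a DD-MM-YYYY date."""
    resp = requests.get(URL, params={"pincode": pincode, "date": date},
                        headers={"User-Agent": "slot-alert-demo"})
    resp.raise_for_status()
    return [s for s in resp.json().get("sessions", [])
            if s.get("available_capacity", 0) > 0]

while True:
    for session in check_slots("560001", "28-10-2021"):
        print("Open slot:", session["name"], session["available_capacity"])
        # A real bot would push this to Twitter/Telegram instead of printing.
    time.sleep(60)  # poll once a minute to respect rate limits
```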
Jongwon Park, University of Illinois at Urbana-Champaign
Continual BERT: Data-driven summarization of scientific literature (Slides)
The scientific community continues to publish an overwhelming amount of new COVID-19 research on a daily basis, leaving much of the literature with little to no attention. To help the community navigate this rapidly growing body of COVID-19 literature, we propose a novel BERT architecture that provides brief yet original summaries of lengthy papers. The model continually learns from new data in an online fashion while minimizing catastrophic forgetting, fitting the needs of the community. Benchmarks and manual examination of its performance show that the model provides sound summaries of new scientific literature.
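Since the proposed architecture itself is not reproduced here, the sketch below only illustrates the general idea of online summarizer updates, using a BART summarizer and a simple L2 penalty toward previous weights as a stand-in for catastrophic-forgetting mitigation; every model and hyperparameter choice is an assumption.

```python
# Generic sketch of online summarizer updates with an L2 anchor toward the
# previous weights as a stand-in for forgetting mitigation. This is NOT the
# proposed Continual BERT architecture; all choices here are illustrative.
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
anchor = [p.detach().clone() for p in model.parameters()]  # weights before update
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def online_update(article: str, reference_summary: str, lam: float = 0.01):
    """One online step: fit the new example while staying near old weights."""
    inputs = tokenizer(article, truncation=True, max_length=512,
                       return_tensors="pt")
    labels = tokenizer(reference_summary, truncation=True, max_length=128,
                       return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss
    loss = loss + lam * sum(((p - a) ** 2).sum()
                            for p, a in zip(model.parameters(), anchor))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```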
Yicheng Yang, Iowa State University
General-Purpose Open-Source Program for Ultra Incomplete Data-Oriented Parallel Fractional Hot Deck Imputation (UP-FHDI) (Slides)
Parallel fractional hot-deck imputation (P-FHDI) is a general-purpose, assumption-free tool for handling item nonresponse in big incomplete data, combining the theory of FHDI with parallel computing. FHDI cures multivariate missing data by filling each missing unit with multiple observed values (hence, hot deck) without resorting to distributional assumptions. P-FHDI can tackle big incomplete data with millions of instances (big-n) or 10,000 variables (big-p). However, ultra incomplete data (i.e., concurrently big-n and big-p), with vast numbers of instances and high dimensionality, has posed challenges to P-FHDI due to excessive memory requirements and execution time. We developed the ultra data-oriented P-FHDI (named UP-FHDI) to cure such ultra incomplete data. In addition to the parallel jackknife method, this work enables computationally efficient ultra data-oriented variance estimation using parallel linearization techniques. Results confirm that UP-FHDI can handle an ultra dataset with one million instances and 10,000 variables. This paper illustrates the special parallel algorithms of UP-FHDI and confirms their positive impact on subsequent deep learning performance.
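As a toy illustration of the hot-deck idea underlying (UP-)FHDI (filling a missing cell with values observed in a similar donor record), consider the sketch below; real FHDI assigns multiple donors with fractional weights and parallelizes the work, both of which are omitted here.

```python
# Toy illustration of the hot-deck idea behind (UP-)FHDI: fill each missing
# cell with values observed in a similar "donor" row. A simplification:
# FHDI itself uses multiple donors with fractional weights.
import numpy as np

def hot_deck_impute(X: np.ndarray) -> np.ndarray:
    """Impute NaNs row by row using the nearest fully observed donor."""
    X = X.copy()
    donors = X[~np.isnan(X).any(axis=1)]  # fully observed rows
    for row in X:
        miss = np.isnan(row)
        if not miss.any():
            continue
        # Nearest donor measured on the columns this row actually observed
        dists = ((donors[:, ~miss] - row[~miss]) ** 2).sum(axis=1)
        row[miss] = donors[np.argmin(dists), miss]  # copy donor's values
    return X

X = np.array([[1.0, 2.0, 3.0], [1.1, np.nan, 2.9], [5.0, 6.0, 7.0]])
print(hot_deck_impute(X))
```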
Samantha Walkow, University of Illinois at Urbana-Champaign
Describing Scientific Workflows with yt (Slides)
Working with scientific data and open source software requires understanding a myriad of tools and best practices. This includes paradigms like imperative programming, constructs like for loops, and nuances of syntax like dot notation. Once users have mastered those, they must learn the internal functionality of individual packages, which are often domain specific.
Interdisciplinary work adds an extra barrier on top of those mentioned above, as new ways of thinking and new tools are added to the workload, butting heads with the domain-focused tools available. While the discovery and reconfiguration of software tools can be an intersection of creativity and innovation, too often learning curves get in the way and slow research down, or cause the wheel to be reinvented over and over in each domain.
Scientific workflow description provides an alternative to the cognitive overhead of learning a new software package and new infrastructure. The description is encoded in a JSON schema, accessed by the user through a configuration file, and run using Python modules that attach the configuration file to the code that produces output. In this case, ‘the code’ is yt, an open source Python library designed for scientific analysis and visualization of volumetric data in the computational astrophysics domain. We use yt to demonstrate how domain-specific software can operate within a descriptive framework.
Through this framework, users can visualize and analyze scientific data without knowledge of a programming language or of the individual library. Future work will include support for multiple python libraries across several domains.
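For a flavor of the configuration-driven approach, the sketch below reads a small JSON description and translates it into yt calls; the schema keys and the sample dataset path are invented for illustration and are not the project's actual schema.

```python
# Hedged sketch of the configuration-file idea: a small JSON description is
# read and translated into yt calls. Schema keys and the dataset path are
# illustrative assumptions, not the project's real schema.
import json
import yt

config = json.loads("""
{
  "dataset": "IsolatedGalaxy/galaxy0030/galaxy0030",
  "plots": [{"type": "SlicePlot", "axis": "z", "field": ["gas", "density"]}]
}
""")

ds = yt.load(config["dataset"])
for spec in config["plots"]:
    if spec["type"] == "SlicePlot":
        plot = yt.SlicePlot(ds, spec["axis"], tuple(spec["field"]))
        plot.save()  # writes an image without the user writing yt code
```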
Emmanuel Akintunde, University of Nebraska–Lincoln
A Singular Value Decomposition-Based Novelty Index for Damage Detection in Full-Scale, In-Situ Bridges (Slides)
The Smart Big Data Pipelines for Aging Rural Bridge Transportation Infrastructure (SMARTI) project, currently supported by the NSF, is developing technologies and tools to help identify and assess damage in full-scale, in-service bridges using few physical sensors. As part of the SMARTI project, this work focused on developing and validating a Singular Value Decomposition (SVD)-based novelty index as a damage detection tool. The tool was developed and validated using data from a series of controlled tests on a full-scale bridge mock-up subjected to three levels of cumulative damage. Measured data was cleansed, saved as a snapshot matrix, and analyzed using SVD to extract damage-sensitive features. The novelty detection framework then used the Proper Orthogonal Modes computed from the SVD of the snapshot matrix as its input. Results demonstrated that SVD-based novelty indices could detect all levels of induced damage, including the initial damage level, a crash-induced barrier damage that was not visible during a visual assessment after the incident. The detection tool was further validated using data from two in-situ rural, steel multi-beam, single-span bridges scheduled for replacement. The bridges were tested with a truck of known weight, both in their original condition and after damage was created by flame-cutting chosen beam bottom flanges and webs at mid- and quarter-spans. Measured data was evaluated in the same way as in the mock-up experiments, and results demonstrated that, using the SVD novelty index approach, induced damage could be accurately identified by sensors in proximity to it.
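A minimal sketch of an SVD-based novelty index (projecting new sensor snapshots onto the Proper Orthogonal Modes of baseline data and scoring the unexplained residual) might look like the following; the data layout, mode count, and any damage threshold are assumptions.

```python
# Illustrative SVD-based novelty index: project new response snapshots onto
# the Proper Orthogonal Modes (POMs) of baseline (healthy) data and use the
# normalized residual as the index. Thresholding is left unspecified.
import numpy as np

def fit_pom(baseline: np.ndarray, r: int) -> np.ndarray:
    """Left singular vectors (POMs) of the baseline snapshot matrix."""
    U, _, _ = np.linalg.svd(baseline, full_matrices=False)
    return U[:, :r]

def novelty_index(U: np.ndarray, snapshot: np.ndarray) -> float:
    """Norm of the part of a snapshot the baseline modes cannot explain."""
    residual = snapshot - U @ (U.T @ snapshot)
    return float(np.linalg.norm(residual) / np.linalg.norm(snapshot))

baseline = np.random.randn(64, 200)           # sensors x time, healthy state
U = fit_pom(baseline, r=5)
print(novelty_index(U, np.random.randn(64)))  # large values suggest damage
```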
Pavan Kumar Ghantasala, Purdue University
Customer Life Time Value Prediction: A Case of Mobile Gaming Company (Slides)
A large mobile gaming company wants to increase engagement among its player customers and plans to target its marketing efforts toward customers likely to churn. To that end, I estimated the probability that a player remains active after 30 days versus becoming inactive, using the BG/NBD model of Fader, Hardie, and Lee to calculate these customer probabilities.
Using survival analysis, fitting geometric and beta distributions to customers' duration cycles, we estimate the likely lifetime value of players, which helps the company optimize promotions tailored and targeted to customers. This results in an increased return on investment for the company's marketing efforts.
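As an illustration, the open-source lifetimes package implements the BG/NBD model; a minimal sketch of estimating the probability that each player is still active might look like this. The data, column construction, and 30-day horizon are illustrative, not the presenter's actual pipeline.

```python
# Hedged sketch using the `lifetimes` implementation of the BG/NBD model.
# The toy data and the 30-day observation window are assumptions.
import pandas as pd
from lifetimes import BetaGeoFitter

# frequency = repeat play sessions; recency and T measured in days
summary = pd.DataFrame({
    "frequency": [5, 0, 12, 3],
    "recency":   [20.0, 0.0, 28.0, 10.0],
    "T":         [30.0, 30.0, 30.0, 30.0],
})

bgf = BetaGeoFitter(penalizer_coef=0.01)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Probability each player is still "alive" (active) given their history
summary["p_alive"] = bgf.conditional_probability_alive(
    summary["frequency"], summary["recency"], summary["T"])
print(summary)
```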
Thursday, October 28, 2021 – 1:00–1:45 p.m. CT / 2:00–2:45 p.m. ET
Akshay Kale, University of Nebraska at Omaha
Building interpretable methods to analyze bridge health, deterioration, and maintenance (Slides)
The United States has over 600,000 bridges, approximately 9.1% of which are in poor condition and require immediate attention. Rehabilitating them is estimated to cost over $80 billion and take over 80 years. Given such limited resources, efficient strategies for bridge repair and maintenance are needed. Our research focuses on extracting insights from over 18 million bridge inspection records from 1992-2020. Using machine learning and time-series analysis, we develop interpretable methods to measure bridge health and to identify the influential factors and patterns that lead to bridge deterioration and maintenance.
Ji Young Lee, University of Nebraska–Lincoln
Deep learning vision-based inspection for concrete bridge deficiencies (Slides)
Cracking, spalling, and delamination are the typical deficiencies that human inspectors monitor in concrete bridge elements. Inspection reports often record the location of these deficiencies or the area of delamination. If a detailed inspection is required, crack maps may be generated and additional measurements taken at multiple locations as needed. To assist the inspection process, our research group is training a deep learning model, Mask R-CNN, to support the health monitoring of aging concrete bridges by detecting crack locations in images. We applied transfer learning with a benchmark dataset to compensate for data insufficiency, then fine-tuned the model on data collected by the research group with a UAV. The collected images, including deficiencies from the bridge deck and pier, were carefully reviewed and labeled with pixel-level crack location information. The trained model is then evaluated on a test-case bridge located in Lincoln, Nebraska, to construct crack maps for further analysis, such as measuring relative quantities (e.g., crack widths). Many current practices for assessing bridge health rely exclusively on qualitative and subjective data provided through human inspections, which is challenging with limited resources. Our image-based data pipeline serves as a reference for how vision-based data analytics can provide useful information for bridge inspections. In future work, an image-based deterioration model will be used to link the temporal and spatial changes observed in the datasets.
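A hedged sketch of the transfer-learning step, using torchvision's COCO-pretrained Mask R-CNN and swapping its heads for a background-plus-crack task, is shown below; the two-class label set is an assumption, and the training loop and UAV data loading are omitted.

```python
# Hedged sketch of transfer learning with torchvision's Mask R-CNN: start
# from COCO-pretrained weights and replace the heads for a 2-class task
# (background + crack, an assumed label set).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + crack
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-classification head
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask-prediction head (pixel-level crack masks)
in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
# The model would then be fine-tuned on the UAV-collected, pixel-labeled images.
```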
Ashley Ramsey, University of Nebraska at Omaha
Explaining AI Outputs with a Feature Heatmap
Presenting the outputs of an artificial intelligence system in an understandable and comprehensive way is a major challenge in the field of AI. This area of research is called explainable artificial intelligence (XAI), and it is critical to building trust in AI systems and the data they produce. This research showcases an interactive feature heatmap to aid bridge engineers' understanding of an AI that evaluates which features of a bridge and its environment make it more likely to receive repairs. The dataset is complex to analyze without post hoc changes to make it more comprehensible; the feature heatmap allows bridge engineers to interact with the data in a deliberate, meaningful way. Users can customize the heatmap by imposing specific restrictions on the data, including which states are displayed, which types of repairs are displayed, which features are displayed, the range of the color scale, and how the data is sorted. Together, these restrictions give bridge engineers a highly customizable way to interact with the dataset so they can draw conclusions for their own specific data questions.
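As a rough, static sketch of the filter-then-render idea (not the actual interactive tool), a version with pandas and seaborn could look like this; the file and column names are invented for illustration.

```python
# Minimal sketch of the restrict-then-heatmap idea with pandas + seaborn.
# The CSV file and columns (state, repair_type, feature, importance) are
# hypothetical; the real tool is an interactive dashboard.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("bridge_features.csv")  # hypothetical export of the dataset

def feature_heatmap(df, states=None, repair_types=None, features=None,
                    vmin=None, vmax=None, sort_by=None):
    """Render a heatmap of feature importance after user-chosen restrictions."""
    view = df
    if states is not None:
        view = view[view["state"].isin(states)]
    if repair_types is not None:
        view = view[view["repair_type"].isin(repair_types)]
    pivot = view.pivot_table(index="repair_type", columns="feature",
                             values="importance", aggfunc="mean")
    if features is not None:
        pivot = pivot[features]
    if sort_by is not None:
        pivot = pivot.sort_values(by=sort_by)
    sns.heatmap(pivot, vmin=vmin, vmax=vmax, cmap="viridis")
    plt.show()

feature_heatmap(df, states=["NE", "IA"], vmin=0.0, vmax=1.0)
```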
Anoop Mishra, University of Nebraska at Omaha
CaNet (Cracks Attention Network): Crack Detection Using Weak Labels
This paper targets localizing cracks from image-level labels using the CBAM attention module. In past research, complex algorithms such as Mask R-CNN, feature pyramid networks, and Fast R-CNN were used to localize and mask cracks. Those detection and segmentation tasks are labor-intensive and costly in time and annotation effort, since pixel-level labels such as masks are required to localize the cracks. Hence, this paper proposes CaNet, a crack attention network that uses image-level (weak) labels to localize cracks. CaNet uses the Grad-CAM approach to segment the cracks. Its output can be used in bridge health inspection, where crack detection is an essential factor in analyzing bridge health.
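For a flavor of the weak-label localization step, the sketch below computes a plain Grad-CAM heatmap from an image classifier; the backbone and layer choice are illustrative assumptions, and CaNet's CBAM attention is omitted.

```python
# Hedged Grad-CAM sketch: localize evidence for a class (e.g. "crack") from
# a classifier trained only on image-level labels. Backbone and layer are
# illustrative; CaNet's actual architecture (with CBAM) is not shown.
import torch
import torchvision

model = torchvision.models.resnet18(weights="DEFAULT").eval()
feats, grads = {}, {}

layer = model.layer4  # last conv block: coarse spatial feature map
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return a small spatial heatmap of evidence for `class_idx`."""
    score = model(image.unsqueeze(0))[0, class_idx]
    model.zero_grad()
    score.backward()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pool the gradients
    cam = torch.relu((weights * feats["a"]).sum(dim=1))  # weighted feature sum
    return cam[0] / (cam.max() + 1e-8)

heatmap = grad_cam(torch.randn(3, 224, 224), class_idx=0)
print(heatmap.shape)  # 7x7 map; upsampled to image size in practice
```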
Prasad Chetti, University of Nebraska at Omaha
A Population Analysis Approach to Identify Significant Parameters of Highway Bridges (Slides)
Infrastructure development plays a vital role in the growth and advancement of any country. The American Society of Civil Engineers assigned a C+ grade to highway bridges in the United States of America in 2017. The safety and performance of bridges depend on many parameters, including a bridge's geographic location, material, and average daily traffic. The National Bridge Inventory of the USA maintains data on more than 600,000 bridges with more than 100 parameters each. The deterioration patterns and rates of various bridge elements, including substructures, vary with these parameters, even among bridges of the same age. Identifying the critical parameters that differentiate deterioration rates is essential both for constructing new bridges and for efficiently allocating funds to existing ones. We introduce correlation networks and population analysis to model bridges as graphs, cluster bridges with similar deterioration patterns, compare communities of bridges with one another, and identify the significantly enriched parameters of bridges. A case study was conducted on 1,136 same-aged bridges from the National Bridge Inventory. A correlation network of the bridges was created, and eight candidate communities were extracted using the Markov clustering algorithm. A hypergeometric test was applied to these communities to compare them and identify significantly enriched parameters. Preliminary results show that, of the eight communities detected, five have at least one significant parameter. Bridge substructures in the Southeast region perform better than substructures in the Northeast region. Further, substructures in the Northeast region made of wood or timber deteriorate faster than substructures in the High Plains region made of steel. The five candidate communities are further divided into two groups, high performance and low performance, based on the average performance of all bridges in the significant candidate communities. The results are supported by the existing literature, and the groups are further validated by factor analysis.
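As an illustration of the enrichment step, a hypergeometric over-representation test with SciPy might look like the following; the counts are invented for demonstration and do not come from the study.

```python
# Sketch of the enrichment test: is a parameter value (e.g. "timber
# substructure") over-represented in one community relative to the full
# bridge population? Uses SciPy's hypergeometric distribution.
from scipy.stats import hypergeom

def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeom(N in population, K with trait, n drawn)."""
    return float(hypergeom.sf(k - 1, N, K, n))

# Invented numbers: 1,136 bridges total, 200 with timber substructures,
# a community of 150 bridges of which 60 have timber substructures.
p = enrichment_pvalue(N=1136, K=200, n=150, k=60)
print(f"enrichment p-value = {p:.3g}")  # small p => significantly enriched
```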