By Matt Fall
Executive Director, Center for Data Science, Lansing Community College
Recently, the American Statistical Association (ASA), with support from the National Science Foundation (NSF), hosted a two-day summit in Washington, D.C., to discuss outcomes and curricula for data science programs at two-year colleges. The Two-Year College Data Science Summit (TYCDSS) was intended to help spur the growth of data science programs at these institutions and included representatives from two- and four-year institutions, government, and industry.
The summit included several plenary talks discussing the role of two-year colleges in addressing the need for data scientists as well as a brief presentation from a graduate of a community college data science program. The majority of the summit, however, was devoted to a series of working sessions where the participants discussed ideal outcomes and competencies for three categories of students:
- Category 1: students intending to complete an Associate’s degree and begin working
- Category 2: students intending to earn an Associate’s degree and transfer to a four-year program
- Category 3: students seeking a certificate
The working sessions gave summit participants an opportunity to discuss what is expected of, and feasible for, a student in each category. A designated writing group captured the discussions, and a forthcoming write-up will summarize the participants' recommendations along with guidelines for two-year college data science programs.
This summit was particularly timely for my colleagues and me at Lansing Community College (LCC), as we have recently begun developing a data science program. Before the summit, participants received a list of resources that included relevant research, reports from related workshops, and sample syllabi. Of particular interest to us, as we design the layout of our program, were the Park City Math Institute’s Curriculum Guidelines for Undergraduate Programs in Data Science (2016), the Oceans of Data Profile of the Data Practitioner (2016), and the Oceans of Data workshop report on Building Global Interest in Data Literacy (2016). Beyond these resources, candid discussions with other two-year colleges about their programs and conversations about realistic competency expectations also informed our program design.
The intent of the TYCDSS directly supports the MBDH’s priority area of data science education and workforce development. Two-year colleges make higher education accessible to many students who could not or would not otherwise pursue a postsecondary degree. An increasing number of these schools offer certificate and Associate’s degree programs in data science and analytics to meet growing workforce demand. Growth in these programs should lead to greater data competency, higher enrollment in university programs, and larger hiring pools for data science-based careers.
Additional resources:
- American Mathematical Association of Two-Year Colleges’ Data Science Resources Page
- National Academies of Sciences, Engineering, and Medicine: Data Science for Undergraduates: Opportunities and Options
First URSSI workshop attendees (Credit: Mike Hucka)
The NSF-funded conceptualization of a US Research Software Sustainability Institute (URSSI) is making the case for and planning a possible institute to improve science and engineering research by supporting the development and sustainability of research software in the US.
Research software is essential to progress in the sciences, engineering, humanities, and all other fields. In many fields, research software is produced within academia, by academics who range in experience and status from students and postdocs to staff members and faculty. Although much research software is developed in academia, important components are also developed in national laboratories and industry. Wherever research software is created and maintained, it can be open source (most likely in academia and national laboratories) or commercial/closed source (most likely in industry, although industry also produces and contributes to open source).
The open source movement has created a tremendous variety of software, including software used for research and software produced in academia. However, researchers find it hard to locate this plethora of solutions and use them out of the box. Communities lack standards and a platform for categorizing software, which often leads researchers to develop new solutions rather than reuse existing ones. Three primary classes of concern are pervasive across research software in all disciplines and have kept research software from achieving its maximum impact:
- Functioning of the individual and team: issues such as training and education, ensuring appropriate credit for software development, enabling publication pathways for research software including novel methods beyond “classical” academic publications, fostering satisfactory and rewarding career paths for people who develop and maintain software, increasing the participation of underrepresented groups in software engineering, and creating and sustaining pipelines of diverse developers.
- Functioning of the research software: supporting the sustainability of the software; growing communities, evolving governance, and developing relationships between organizations, both academic and industrial; fostering both testing and reproducibility; supporting new models and developments (for example, agile web frameworks and software as a service); and supporting contributions from transient contributors (for example, students).
- Functioning of the research field itself: growing communities around research software and disparate user requirements, avoiding siloed developments, cataloging extant and necessary software, disseminating new developments, and training researchers in the usage of software.
The goal of this conceptualization project is to create a roadmap for a URSSI to minimize or at least decrease these types of concerns. To do this, the two aims of the URSSI conceptualization are to:
- Bring the research software community together to determine how to address the issues about which we have already learned. In some cases, there are already subcommunities working together on a specific problem, including those that we are part of, but those subcommunities might not be working with the larger community. This leads to a risk of developing solutions that solve one issue but don’t reduce (or might even deepen) other concerns.
- Identify additional issues URSSI should address, identify communities for whom these issues are relevant, determine how we should address the issues in coordination with the communities, and determine how to prioritize all the issues in URSSI.
We are not working in a vacuum, but with other like-minded projects. In addition to Better Scientific Software (BSSw) and activities around research facilitators (ACI-REF) in the US, there are two ongoing institutes in science gateways (SGCI) and molecular sciences (MolSSI); a recently completed conceptualization in high energy physics (S2I2-HEP); two other conceptualization projects now underway in geospatial software and fluid dynamics; and a large number of software development and maintenance projects. In the UK, the Software Sustainability Institute (SSI), which has been in operation since 2010, is an inspiration and a potential model for our work.
Given these existing activities, part of our challenge is to define how we will work with these other groups. For example, we might decide that they perform an activity so well that we should point to it, such as the SSI’s software guides. Or we might decide to either duplicate or enhance an activity they do to expand its impact, such as working with the SGCI to offer incubator services to a wider community than just gateway developers. Or we might decide to collaborate with one or more groups, such as on policy campaigns aimed at providing better career paths for research software developers in universities.
We have held one workshop and are planning three more. We also plan to release a community survey soon and to conduct a set of ethnographic studies of specific projects. We are communicating through our website, a series of newsletters, and a community discussion site.
URSSI welcomes members of the research software community to join us, both to help us determine how to proceed and to contribute directly. Please sign up for the URSSI mailing list, contribute to our discussions, and consider writing a guest post on software sustainability for the URSSI blog.
Today we are launching a new MBDH Community Blog, which is intended to extend information sharing around events and projects, as well as expand our channels for Community conversation.
We plan to run one to two posts per month, and we are now seeking submissions from the MBDH Community, including the Spokes and our other collaborative projects, that describe your contributions and developments in the broader data ecosystem. Of interest are short reports and highlights from data-related meetings, events, or project outcomes, including the role and impact of the MBDH in these efforts.
We welcome contributions from the Social Sciences and Humanities, including short contributions that address data and algorithmic ethics, or coming changes to work, daily life, and public engagement in U.S. data policy.
We encourage submissions from practitioner and NGO perspectives, as well as those from academia, industry, or government. We will provide additional guidelines shortly. If you are interested in submitting a Blog post, please send your contact information and the subject area to: firstname.lastname@example.org
Our first guest post is by Daniel Katz, Assistant Director for Scientific Software and Applications at the National Center for Supercomputing Applications (NCSA). Check out his post on the US Research Software Sustainability Institute (URSSI) project.
Finally, I’ll note a couple of activities where we are currently seeking input and engagement:
Add your voice to our Midwest Big Data Hub evaluation, which will help us:
- Create a robust strategic plan for the Midwest Hub.
- Plan toward long-term sustainability, especially financial sustainability, for the Midwest Hub.
Provide your input here: https://www.surveymonkey.com/r/MBDHSurvey
Participate in our election of five at-large representatives to the MBDH Steering Committee: http://midwestbigdatahub.org/2018-steering-committee-at-large-nominees/
As always, please contact us with any ideas or questions.
Thank you for your continued support!
All the best,
Executive Director, Midwest Big Data Hub
Midwest Big Data Summer School reveals how big data can advance research efforts
By Paula Van Brocklin, Office of the Vice President for Research, Iowa State University
“The summer school seeks to bridge the gap between scientists and engineers using data science technology by introducing them to data science techniques and vocabulary,” said Hridesh Rajan, lead organizer of the Midwest Big Data Summer School and professor of computer science at Iowa State. “The idea is to help these individuals better communicate and leverage their data-science needs.”
The school’s first three days introduced attendees to a range of big data topics, including data acquisition, data preprocessing, exploratory data analysis, descriptive data analysis, data analysis tools and techniques, visualization and communication, ethical issues in data science, reproducibility and repeatability, and understanding domain/context.
On the final day, participants selected one of four tracks, each focused on a sub-area of big data analysis. The tracks were:
- Foundations of Data Science
- Software Analytics
- Digital Agriculture
- Big Data Applications
Several individuals at Iowa State were instrumental in developing and organizing the tracks’ curricula. Click here for a list of those involved.
Keynote presenters at this year’s summer school were:
- Chid Apte, director, Mathematical Sciences and Blockchain Solutions, IBM Research
- Tom Schenk, chief data officer, City of Chicago
- Jacek Czerwonka, principal software engineer, Microsoft Research
- Will Snipes, principal scientist, ABB Research
A complete list of speakers, including their bios, is available here.
Data science evolving quickly
The field of big data, also referred to as data science, is relatively new yet advancing quickly. For this reason, organizers encourage researchers and scientists to learn as much as they can through resources like the Midwest Big Data Summer School.
“Our aim is for early career researchers and professionals – both in academia and industry – to get a taste of what it’s about, what the state of the art is and how they can start thinking about using data science in their own domains,” said Chinmay Hegde, assistant professor of electrical and computer engineering at Iowa State and a co-organizer of the summer school.
Rajan acknowledges that the summer school would not have been possible without the help of many people.
“We are especially thankful for the Midwest Big Data Hub, the National Science Foundation, the Office of the Vice President for Research, Iowa State’s College of Liberal Arts and Sciences, and the departments of computer science and statistics for providing both funding and personnel support for this event.”
Plans are in the works for the 2019 Midwest Big Data Summer School, though no dates have been set. Rajan said more application-specific tracks may be added to next year’s curriculum. Watch the Midwest Big Data Summer School website for more details in the spring of 2019.
Reposted from Iowa State University’s Research News blog. View the original post here.
The Midwest Big Data Hub and the three other regional Big Data Innovation Hubs are partnering with the National Science Foundation and Johns Hopkins University on development of a new nationwide research data network called the Open Storage Network. Partners include Alex Szalay, lead PI (Johns Hopkins), Ian Foster (University of Chicago), the National Data Service (NDS), and five supercomputing centers within the Big Data Hubs’ regions.
The official NSF press release is available here.
The Johns Hopkins story is here.
Links to partners:
- National Data Service (NDS)
- National Center for Supercomputing Applications (NCSA)
- San Diego Supercomputer Center (SDSC)
- Renaissance Computing Institute (RENCI)
- Massachusetts Green High Performance Computing Center (MGHPCC)
- Pittsburgh Supercomputing Center
Solving complex data challenges requires innovative cross-border, multi-sector partnerships
By Melissa Cragin, Ph.D.
Executive Director, Midwest Big Data Hub
Complex data challenges facing the Great Lakes region in the era of big data transcend industries, applications, and borders. While data is increasingly borderless, borders and barriers still present substantial problems to industry, academic, and government initiatives that are dependent on data policy and governance processes that structure access and use. These challenges require innovative cross-border, multi-sector partnerships that can leverage the benefits of shared high performance computing resources and cyberinfrastructure services.
Part of the Hub’s engagement on connected communities
Melissa Cragin, Executive Director of the Midwest Big Data Hub, and Alice Delage, Program Manager and Community Liaison, were invited by Hub partner US Ignite to attend the Smart Cities Connect Conference & Expo and co-located US Ignite Application Summit in Kansas City on March 26-29.
The Midwest Big Data Hub is pleased to announce a new strategic partnership with the bi-national Council of the Great Lakes Region. Hub affiliates will also be participating in the Council’s Great Lakes Economic Forum in May 2018. The details of the partnership are below (PDF version here).
UIUC collaborators and mentors meet with HackIllinois teams on US Ignite Challenge
The University of Illinois at Urbana-Champaign (UIUC) was awarded a $20,000 grant from US Ignite to host a Smart Gigabit Communities Reverse Pitch Challenge. The MBDH, along with other local partners, contributed toward matching the grant, bringing the total resources available to support the development of smart gigabit applications for the benefit of the local community to $40,000.
On October 3, 2017, at a meeting co-located with the annual Midwest Big Data Hub All-Hands meeting in Omaha, 15 experts from several disciplines gathered to discuss community-oriented research topics, data and methods, and community engagement. They shared ideas and suggestions for launching a program focused on Smart & Resilient Communities.