A guide to using the online tools
In undergraduate classrooms, traditional STEM training rarely requires in-depth knowledge of quantitative analysis, especially “big data”; these skills are often developed much later. But understanding data science — broadly speaking math, statistics, and coding — is increasingly fundamental for STEM scientists at all levels. It provides a valuable and transferable skill that can be applied across many different industries. What resources and techniques are available to help students master these vital skills?
Learning to code can be daunting; first attempts can be frustrating, and learners need repetition and feedback to gain confidence. Learners often progress at very different rates, depending on their previous experience and confidence. Online or blended learning, which combines self-paced workbooks with guidance from an instructor, can therefore be the ideal environment for teaching data science. Fortunately, there are many platforms and tools that enable successful blended learning. I present some of them here, along with tips for successful data science teaching.
Spark interest in data science
Live analysis demonstrations, held as early in the course as possible, help students connect with the material and allow instructors to highlight the most important aspects of their class. Data visualizations are a fun and engaging way to show what learners can accomplish with coding skills. Including intentional errors, along with your own process for checking and debugging them, is a useful addition here.
Going through common mistakes can humanize you as an instructor and demonstrate that it is normal to write code that doesn’t produce the expected results. You can show your techniques for understanding errors and communicate that debugging takes time and patience.
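A live demonstration with an intentional error might look something like the following minimal sketch. It is written in Python purely for illustration (the same idea works in R), and the data values and function names are hypothetical rather than taken from any particular course.

```python
# Hypothetical measurements as they might arrive from a CSV file:
# every value is a string, and one missing value is recorded as "NA".
raw_heights = ["1.62", "1.75", "NA", "1.80"]


def mean_height_buggy(values):
    """First attempt: fails because the values are strings, not numbers."""
    return sum(values) / len(values)


def mean_height_fixed(values):
    """Second attempt: convert to float and skip the missing-value marker."""
    numbers = [float(v) for v in values if v != "NA"]
    return sum(numbers) / len(numbers)


try:
    mean_height_buggy(raw_heights)
except TypeError as err:
    # Read the message with the class: which types is Python complaining about?
    print(f"TypeError: {err}")

# After inspecting the data types, the corrected version runs cleanly.
print(round(mean_height_fixed(raw_heights), 2))
```

Running the failing version first, reading the error message aloud, and only then inspecting the data is one way to model the debugging process for the class.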
A simple narrative is a handy tool for building engagement. As with all hands-on courses, authentic problems work best, and care should be taken to develop projects that feel real to students and fit well with the rest of their curriculum.
Learn to code
Data visualizations are very useful as a starting point for learners to work with data. Wickham & Grolemund begin their book, R for Data Science, with a chapter on visualization. This approach allows learners to create something they can share immediately.
Getting students to write code early and often, with frequent visual feedback, is key to building engagement and confidence. Frequent low-stakes quizzes and tasks help to maintain that engagement and build trust.
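As one illustration of low-stakes tasks with instant feedback, the sketch below checks a learner's function against a few test cases and reports on each one. It is a minimal Python example under stated assumptions: `check_exercise`, `student_range`, and the exercise itself are hypothetical names invented here, not part of any specific teaching package.

```python
def check_exercise(student_function, test_cases):
    """Run a learner's function on each case and return one feedback line per case."""
    feedback = []
    for args, expected in test_cases:
        try:
            result = student_function(*args)
        except Exception as err:
            # Report the error as feedback instead of stopping the whole check.
            feedback.append(f"{args}: raised {type(err).__name__}: {err}")
            continue
        if result == expected:
            feedback.append(f"{args}: correct")
        else:
            feedback.append(f"{args}: expected {expected!r}, got {result!r}")
    return feedback


# Hypothetical exercise: "write a function that returns the range of a list".
def student_range(values):
    return max(values) - min(values)


# Each case pairs a tuple of arguments with the expected answer.
# The last case probes an edge the learner may not have considered.
cases = [(([1, 5, 3],), 4), (([2, 2],), 0), (([],), None)]
for line in check_exercise(student_range, cases):
    print(line)
```

Here the empty-list case produces an error message rather than a crash, which gives the learner a concrete prompt to think about edge cases without any penalty.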
Create effective online learning environments
Barriers to entry include checking system requirements, downloading and installing required software, and installing related packages. It is easy to assume that students will be intuitively able to use computer software, but this is often not the case.
Interactive tutorials allow learners to run their analyses immediately and receive instant feedback without having to navigate a new coding environment. The Binder Project is a software project for packaging and sharing reproducible interactive environments. A Binder repository can contain all the dependencies, tools, and data needed to launch interactive sessions. Access is through a web browser, so a session can be started regardless of a student’s local computing resources. This configuration was used successfully to run many big data courses during the lockdowns of the first waves of the pandemic.
The community around the R coding language, in particular, has developed many tools for education and interactivity. There is a dizzying array of resources and interactive worksheets for new learners. The learnr package makes it easy to create interactive tutorials that host videos, text, images, quizzes and code snippets. Students can download these files to run on local computers, or instructors can deploy them with Binder or as a Shiny application (Shiny is an R package that makes it easy to create interactive web apps directly from R).
Eventually, students should be introduced to a standard graphical user interface (GUI) or command-line interface. Cloud services such as Amazon Web Services, Google Cloud, or RStudio Cloud can host programming environments and simplify the deployment of files and dependencies while removing local system limitations. RStudio Cloud has an educational license and allows rapid deployment of data projects with pre-installed dependencies. Instructors can also remotely access any student project to track progress or troubleshoot issues.
Building and maintaining these courses has a cost: cloud services limit the available hours and memory before a subscription is required. Although free to use, Binder environments are limited in how much data they can store and how much processing power they can apply.
Organization of learning resources
Finally, it is essential to have a well-organized “hub” that reminds learners where to find relevant resources, access course materials, and contact their instructors. You may have access to a proprietary learning system such as Blackboard, and this may be the preferred option, especially if learners are familiar with this environment. However, GitHub Pages and GitHub Classroom are viable alternatives, as are course websites built with bookdown or blogdown, which are R packages that build websites directly from R. All of these offer the ability to integrate interactive tutorials and other HTML widgets.
Learning data science can be challenging, but basic pedagogical principles apply: remove barriers to access, encourage student autonomy, and foster trust. Discussions around responsible data science should be encouraged, for example about the concerns raised over the iris dataset. Now is a great time for us to refresh our approach to teaching data science, using these tools to improve pedagogy for online and blended learning.
Philip Leftwich is a Lecturer in Genetics and Data Science at the University of East Anglia.
If you found this interesting and would like to receive advice and insights from academics and university staff straight to your inbox each week, sign up for THE Campus newsletter.