Data Analysis in the Digital Ocean

My last post introduced the idea of the Digital Ocean. We have reached a point where we can digitally gather information from student activities while they learn, rather than relying on more artificial measures administered at separate times to make inferences about students.

What would be the implications of this for data and data analysis?

Currently, standardized assessments give us small, periodic samples of data. We gather data about millions of students, but the actual sample from each one is tiny. How many questions about any particular math skill are on an end-of-year standardized state assessment? Not many. This scarcity of data requires us to make large inferential stretches about what students can and cannot do. It has led to an entire science of how to make inferences from small samples.

However, if we imagine ubiquitous data collection from student learning activities, we end up with entirely different data challenges. What if we have a record of every student’s activity in solving math problems in a simulated environment three days a week for an entire school year? And what if we have not just their answers, but also information showing how they solved each problem? Now we have much less need to make inferential stretches about the students’ knowledge, skills, and abilities. Instead, we are faced with the problem of how to interpret this vast amount of data. We need to identify detailed mechanisms and rules for drawing conclusions from the collected actions, and ways to communicate these results meaningfully.

In our work in the Cisco Networking Academy, we are just beginning to take steps toward this vision. For example, we have an online performance-based assessment built on a simulation tool called Packet Tracer. (Note that this is still given as an assessment, but the simulation tool is also embedded in the curriculum, and the next challenge is to gather data from that activity too.) From this performance assessment, we have both the final network configuration that students submit as their answer and all of the commands they used to configure the devices on the network to reach that solution. On one hand, we can score various pieces of the final configuration. On the other hand, we want to provide feedback about the process used to get to that response.

This has led us to work on identifying meaningful patterns in log files that can be summarized and reported back in an automated fashion. Some of this feedback might be in relatively simple terms. For example, we might find that the average number of commands used to complete the problem was 92 and a given student used 158. We can then tell that student that their solution was perhaps not the most efficient one. We can also dive deeper and start tagging commands by purpose, looking at how students alternate between configuring devices and verifying that configuration. This allows us to provide feedback about whether students might wish to test their solutions as they go, rather than waiting until the end of the problem.
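To make the two kinds of feedback described above concrete, here is a minimal Python sketch. It is not the actual Networking Academy tooling: the function names, the 1.25 tolerance, and the rule that a command starting with `show`, `ping`, or `traceroute` counts as verification are all assumptions made for illustration, and a real log format would likely need more careful parsing.

```python
from statistics import mean

# Assumption: a command whose first word is one of these counts as verification.
VERIFY_WORDS = ("show", "ping", "traceroute")


def tag_command(command):
    """Tag a logged command as 'verify' or 'configure' by its first word."""
    words = command.strip().lower().split()
    if words and words[0] in VERIFY_WORDS:
        return "verify"
    return "configure"


def efficiency_feedback(student_count, cohort_counts, tolerance=1.25):
    """Return a message if the student used many more commands than average.

    `tolerance` is a hypothetical threshold: feedback is given only when the
    student's count exceeds the cohort average by more than 25 percent.
    """
    average = mean(cohort_counts)
    if student_count > tolerance * average:
        return (
            f"You used {student_count} commands; the average was {average:.0f}. "
            "Your solution may not be the most efficient one."
        )
    return None


def verifies_only_at_end(commands):
    """True if no verification command appears before the last configuration command."""
    tags = [tag_command(c) for c in commands]
    config_positions = [i for i, tag in enumerate(tags) if tag == "configure"]
    if not config_positions:
        return False  # nothing was configured, so there is no pattern to report
    last_config = config_positions[-1]
    return "verify" not in tags[:last_config]


if __name__ == "__main__":
    # Hypothetical command log for one student.
    log = [
        "interface fa0/0",
        "ip address 10.0.0.1 255.255.255.0",
        "no shutdown",
        "show ip interface brief",
    ]
    print(efficiency_feedback(158, [92, 92]))
    if verifies_only_at_end(log):
        print("Consider testing your configuration as you go.")
```

Keeping the tagging rule in one small function means the categories can be refined later without touching the feedback logic.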

The point is that once we start looking at activities of any complexity, a single activity can be deconstructed into a huge number of actions, each of which could potentially provide feedback to instructors and students. The task of making sense of this and providing automated feedback (which is the only way this can scale) presents challenges in data mining, evidence identification, and evidence accumulation that are far different from those faced under previous assessment paradigms.

