Data Infrastructures for Learning Analytics

Building for Consistency, Scalability, and Reproducibility.

An ECTEL25 Workshop.

Important Note: If you want to discuss your architecture, you can submit a short paper describing it. The paper should contain a short example of an experimental setup, a description of the components you use and the purpose each fulfills, and an explanation of how you store your data in the long term. It should comprise 1-2 pages of text, one overview diagram, and references, formatted with the CEUR template.

As our team implements Learning Analytics experiments, a question you may be familiar with comes up regularly: which technology do we choose? No single answer fits every case. Still, our experience is that requirements in Learning Analytics research usually overlap enough that it makes sense to standardize, especially on the data management layer.

We always want to generate data sets that are easily accessible to our researchers. We always want to access our data well after we take down the application instances where the data was collected. We always want our data to be in a well-known format that is widely supported by the tools we like. We want total control over learner data, since we consider it sensitive. And we want to store the data generated by our experimental models and algorithms right beside the data from which those outcomes were computed, especially when it was shown to learners as part of the experiment. So wouldn’t it be awesome if there were a standard way to do all this? Better yet, a standard way that also helps us with anonymization and the creation of public data sets?

Finding a Solution That Fits Your Needs

Today, there are many open source solutions to these problems. Often, you will need several of them and have to stick them together with your own custom glue code. Confronted with this, we set out on a research journey to figure out which open source solutions exist to extract, transform and load data into a shared, unified store. In our workshop at ECTEL25, we are keen to share this knowledge with you, so you can build or improve your own LA infrastructure. To give you an overview, we will walk through the categories of components in the open source solution space, showing the most popular components and highlighting our favorite ones. We’ll show you how to combine them into a solid foundation for LA experimentation and production scenarios, and we want to discuss the interpersonal challenges of such a standardization process.

Finally, we also want to help you, the participant, to improve your own infrastructure. Together with the other participants, let’s discuss your approach to LA data infrastructure! If you want to discuss your architecture, you can submit a short paper describing it. The paper should contain a short example of an experimental setup, a description of the components you use and the purpose each fulfills, and an explanation of how you store your data in the long term. It should comprise 1-2 pages of text, one overview diagram, and references, formatted with the CEUR template. You can then present your LA infrastructure and discuss it with others in groups.

Organizational Details

DEADLINE for Your Proposals: September 10, 23:59 AoE

Send your proposal by email to: menzel[A T]studiumdigitale.uni-frankfurt.de

Date/Time: September 15, 09:00 to 12:30.

Location: Room 2

Time            Activity
09:00 to 10:30  Input Presentation: Navigating the current Open Source Solution Space to Data Infrastructure and the Problems We Faced at studiumdigitale and the DIPF
10:30 to 11:00  Break
10:45 to 12:00  Workshop Attendee Presentations and Group Work: Improving your Infrastructure
12:00 to 12:30  Feedback
12:30           Lunch