The goal of this project in the TEEL Lab is to develop a portable and interoperable online learning ecosystem that enables effective and efficient learning by leveraging social interactions between students as a substantial learning resource. In addition to large-scale software development, the lab conducts studies of student learning and evaluates innovative approaches for incorporating social learning as a driver for developing cognitive skills and motivation through reflection, interaction, and cohort building.
The Data Engineer is responsible for designing and building the infrastructure to describe, collect, and exchange learning activity data and Technology Enhanced Learning (TEL) tool usage data from various sources to enable research experiments. To support these research-driven endeavors, one of the team's goals is to use evidence-based learning science to create effective online learning. This approach requires ongoing data collection and analysis of the learning process.
You will be responsible for the requirements gathering, design, and implementation of a data pipeline to support learning research. Such a pipeline comprises data ingestion, data persistence technologies (a data lake, data warehouse, etc.), ETL, and analytics tools that aggregate and consume various data streams (Spark, Hadoop, etc.). You will also design and build a log collection, storage, and analytics solution to support online education platforms that collect course-based learning logs from both Learning Management System (LMS) microservices and external cloud platform logs. The framework will provide function-specific interactive services for data querying and visualization, report generation, and notifications.
Design and adopt a logging standard (e.g., IMS Caliper) to model learning activity events with relevant context from various sources that help facilitate learning.
Establish a common vocabulary for describing learning interactions (including social interactions between students across a variety of channels, beginning with, but not limited to, text-based interactions) to promote data interoperability and sharing.
Collect heterogeneous logs from various sources and move them into a data lake. The logs include but are not limited to:
Submission attempts and grading outcomes evaluated by an auto-grading service.
Student enrollment and demographic data.
Student pageview logs on the LMS platform.
Discussion forum contributions.
Interaction data from blogs (including editing-behavior logs).
Cloud resource usage logs on cloud platforms such as Azure.
Logs from other microservices.
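As a concrete illustration of what the common event model adopted above might look like, the sketch below maps a raw auto-grader record onto an IMS Caliper-style GradeEvent. The raw record format, IDs, and helper function are hypothetical; an actual implementation would follow the vocabulary of the adopted standard.

```python
import json

# Hypothetical raw record emitted by an auto-grading service.
raw = {
    "student": "s123",
    "assignment": "hw2",
    "score": 87,
    "max_score": 100,
    "submitted_at": "2020-01-15T10:32:00Z",
}

def to_caliper_like_event(rec):
    """Map a raw grading record onto a Caliper-style GradeEvent shape.
    The structure mirrors IMS Caliper 1.1 JSON-LD, but the IDs and
    exact field values here are illustrative, not normative."""
    return {
        "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
        "type": "GradeEvent",
        "action": "Graded",
        "actor": {"id": "urn:example:autograder", "type": "SoftwareApplication"},
        "object": {
            "id": f"urn:example:attempt:{rec['student']}:{rec['assignment']}",
            "type": "Attempt",
        },
        "generated": {
            "type": "Score",
            "scoreGiven": rec["score"],
            "maxScore": rec["max_score"],
        },
        "eventTime": rec["submitted_at"],
    }

event = to_caliper_like_event(raw)
print(json.dumps(event, indent=2))
```

Once every source (forum, blogs, pageviews, cloud logs) is mapped into one event shape like this, downstream analytics can query all of them uniformly.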
Perform Extract, Transform, and Load (ETL) to transform various logs into the common logging standard and store them in a data lake using a fully managed ETL service.
Define the data schema and store structural and operational metadata using a metadata repository.
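The extract-transform-load flow described above can be sketched end to end in miniature. In this illustration the raw pageview line format, the target field names, and the in-memory "lake" (a stand-in for date-partitioned storage) are all hypothetical; a production pipeline would use the adopted schema and a managed service.

```python
import json
from collections import defaultdict

# Hypothetical raw LMS pageview lines: "timestamp<TAB>user<TAB>url".
raw_lines = [
    "2020-01-15T10:00:00Z\ts123\t/course/101/forum",
    "2020-01-15T10:05:00Z\ts456\t/course/101/assignment/hw2",
    "2020-01-16T09:30:00Z\ts123\t/course/101/blog",
]

def extract(lines):
    """Parse raw tab-separated lines into records."""
    for line in lines:
        ts, user, url = line.split("\t")
        yield {"timestamp": ts, "user": user, "url": url}

def transform(record):
    """Map a record onto a shared vocabulary (field names illustrative)."""
    return {
        "eventTime": record["timestamp"],
        "actor": record["user"],
        "action": "Viewed",
        "object": record["url"],
        "source": "lms-pageviews",
    }

def load(events):
    """Stand-in for a data lake: JSON-lines buckets partitioned by date."""
    partitions = defaultdict(list)
    for ev in events:
        partitions[ev["eventTime"][:10]].append(json.dumps(ev))
    return partitions

lake = load(transform(r) for r in extract(raw_lines))
print(sorted(lake))  # → ['2020-01-15', '2020-01-16']
```

The date partitioning mirrors the structural metadata a metadata repository would record, so analytics jobs can locate and prune data by partition.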
Analyze and visualize the transformed logs to answer research questions, which may require work to:
Annotate textual data.
Apply and extend text mining tools.
Validate, through statistical significance testing, which behaviors and content correlate with performance and consistently produce the desired learning outcomes.
Compare the effectiveness of different content or interaction types through A/B testing.
Establish predictive measures to support early intervention systems.
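For instance, an A/B comparison of two content variants can often be reduced to a two-proportion z-test on pass rates. A minimal sketch follows; the outcome counts are invented for illustration.

```python
from math import sqrt, erfc

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in pass rates between two
    cohorts (e.g., an A/B test of two content variants).
    Returns the z statistic and the p-value."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical counts: variant A 120/200 passed, variant B 90/200.
z, p = two_proportion_z_test(120, 200, 90, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these invented counts the difference is significant at the 5% level; in practice, the same test would run over pass/fail events aggregated from the data lake.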
Embed data hooks into the LMS platform and the TEL tools to feed the data to the new data pipeline.
A Bachelor's degree or higher in computer science or a related field.
3+ years of professional experience, with at least 1 year spent building, deploying, and troubleshooting logging pipelines for production systems using the ELK stack (Elasticsearch, Logstash, and Kibana) and related technologies (Fluentd, Kafka, etc.).
At least one (1) year of experience applying basic text mining or NLP tools.
At least one (1) year of experience with commercial cloud services, including Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
Demonstrated skills in agile development for scalable ETL pipelines.
Experience with Test Driven Development.
Experience with RESTful web services.
Experience with RESTful API specifications and the associated toolset (OpenAPI, Swagger).
Experience working in an agile development environment.
Experience building LMS and/or TEL tools and embedding data hooks to predict student performance and evaluate the efficacy of educational experiments to improve student learning.
Experience with quantitative and/or qualitative research methods and educational data mining, especially in the area of Discourse Analytics.
Exposure to logging standards such as IMS Caliper and xAPI to produce learning analytics.
Familiarity with CI/CD tools (Jenkins, Travis CI), containerized microservices (Docker, Kubernetes), serverless, and infrastructure automation tools (Terraform).
Familiarity with data lakes and data warehousing.
How we work in the TEEL Lab:
Learner-centered decision making.
Fast-paced research-based environment.
Ability to work independently, take ownership of tasks and deliver high-quality work.
Effective collaboration within a team environment.
Effective project and time management skills.
Ability to respond to urgent requests for deployed services.
Ability to communicate with engineers, researchers, students, and CSP partners.
Internal Number: 2012019
About Carnegie Mellon University
Carnegie Mellon (www.cmu.edu) is a private, internationally ranked research university with programs in areas ranging from science, technology and business, to public policy, the humanities and the arts. More than 12,000 students in the university’s seven schools and colleges benefit from a small student-to-faculty ratio and an education characterized by its focus on creating and implementing solutions for real problems, interdisciplinary collaboration and innovation. A global university, Carnegie Mellon’s main campus in the United States is in Pittsburgh, Pa. It has campuses in California’s Silicon Valley and Qatar, and programs in Africa, Asia, Australia, Europe and Mexico.