Master Thesis: Topic-Aware Semantic Segmentation and Learning Objective Induction

Dated 2026-04-21

On March 12, 2026, Thomas Plangger successfully defended his master’s thesis.

In his thesis, he developed an end-to-end pipeline that transforms unstructured educational PDFs into structured learning content. The system uses large language models to identify semantic boundaries in the text and split it into coherent chunks. These chunks are then grouped into higher-level Learning Objectives, which form the basis for automatically generated lessons and quiz prototypes.
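The segmentation step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `segment` and `toy_boundary` are hypothetical, and the LLM call is replaced by a trivial stand-in rule so that the example runs on its own. It only shows the general idea of asking a boundary detector, for each pair of adjacent sentences, whether a new topic starts.

```python
from typing import Callable, List


def segment(
    sentences: List[str],
    is_boundary: Callable[[str, str], bool],
) -> List[List[str]]:
    """Split sentences into chunks wherever is_boundary(prev, next) is True."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for sentence in sentences:
        # Close the running chunk when the detector reports a topic change.
        if current and is_boundary(current[-1], sentence):
            chunks.append(current)
            current = []
        current.append(sentence)
    if current:
        chunks.append(current)
    return chunks


def toy_boundary(prev: str, nxt: str) -> bool:
    """Hypothetical stand-in for an LLM judgment (assumption, not from the thesis).

    A real system would ask a language model whether `nxt` starts a new topic;
    here a sentence starting with "Next," is simply treated as a topic change.
    """
    return nxt.startswith("Next,")


sentences = [
    "Photosynthesis converts light into chemical energy.",
    "It takes place in the chloroplasts.",
    "Next, we turn to cellular respiration.",
    "Respiration releases the stored energy.",
]
chunks = segment(sentences, toy_boundary)
```

In this sketch the detector is passed in as a function, so the toy rule could be swapped for a model-backed one without changing the chunking loop.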

To evaluate the approach, a synthetic benchmark dataset was created, enabling systematic and reproducible testing of segmentation quality. The results show that LLM-based methods can reliably detect meaningful topic transitions and outperform heuristic baselines, although challenges such as over-segmentation in longer documents remain.
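The announcement does not say which metrics the thesis uses. As an illustration only, one simple way to score segmentation against a benchmark with known topic boundaries is boundary-level precision, recall, and F1 with exact matching. The function name `boundary_scores` and the example positions below are assumptions made for this sketch.

```python
from typing import Dict, Set


def boundary_scores(gold: Set[int], predicted: Set[int]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over boundary positions.

    A boundary position is the index of the sentence that starts a new chunk.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: the document has topic changes at sentences 3 and 7,
# while the segmenter also splits at 5 and 9 (over-segmentation).
scores = boundary_scores(gold={3, 7}, predicted={3, 5, 7, 9})
```

Under these assumptions, over-segmentation shows up as high recall combined with low precision: every true boundary is found, but half of the predicted boundaries are spurious.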

The work highlights the potential of LLMs to make existing educational resources more accessible and reusable for digital learning environments.

Full Master Thesis in Repository

The slides from his presentation: 2026-MasterThesisPresentationPlangger