Advanced Text Analysis Using Natural Language Processing (10SMSTS-603)
Description
This online course is designed for beginners who are curious about how to analyze and make sense of large amounts of text data, especially in the field of health research. No previous knowledge of text analysis is required; an interest in learning new ways to work with data is enough.
In this course, students will explore how to uncover hidden topics in text in a data-driven fashion (for example, finding themes in health articles) using a technique called topic modeling. They will also learn how to create simple tools, called classifiers, that can automatically sort text into different categories. The course will cover some basic ideas from natural language processing (NLP), the technology behind applications such as chatbots and search engines. Throughout the course, students will work with real examples from health research, but they are also welcome to bring their own data.
The course consists of four online sessions, supplemented by project-based work between sessions. It combines short lectures, group discussions, and hands-on, large-scale text analysis. We will work with open-source language models from the Hugging Face AI platform (https://huggingface.co/), using the BERTopic Python library for topic modeling and the Transformers Python library for classifier training. Throughout the course, students will work on a group project that they will present in the final session.
Target group
MA, PhD
ECTS Credits
1 ECTS
Course catalogue
You can find more information about the module here.