Teaching doctoral researchers to collect digital data — and why it matters

The Economic and Social Research Council's (ESRC) 2022 Postgraduate Training and Development Guidelines identified clear skill gaps among social science doctoral students — particularly in digital skills, data management, and analysing large and complex data. The guidelines call for "innovation in core training content and delivery" and encourage Doctoral Training Partnerships to source specialist training from external providers. In February 2026, Braw Data delivered a one-day computational methods course for the Scottish Graduate School of Social Science (SGSSS), teaching PhD students how to collect data from the web using Python and R.

The course, "Collecting Digital Data: The Role of Web-scraping and APIs", covered four topics, each taught as a lecture paired with a practical: how the web works, what APIs are, the API landscape, and LLMs as coding assistants. No prior programming experience was required.

Figure 1 shows a slide from the final session, on using large language models (LLMs) as coding assistants. The topic is not yet covered in most methods curricula, but it is increasingly central to how researchers write and debug code.

Figure 1. Lecture slide from the session on LLMs as coding assistants

Figure 2 shows part of the hands-on API practical, where students made their first programmatic request to the UK Police API using Python. In a few lines of code, students retrieved structured data on every police force in England and Wales, a data collection task that would take hours by hand.

Figure 2. Jupyter notebook showing students' first API call to the UK Police API
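
For readers who have not seen an API call before, the request in Figure 2 might look roughly like the sketch below. This is an illustration rather than the notebook used on the course: it assumes the public forces listing at https://data.police.uk/api/forces, which is expected to return a JSON list of records with "id" and "name" fields, and it uses only the Python standard library. The function names parse_forces and fetch_forces, and the sample record, are made up for this example.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the public "forces" listing of the UK Police API.
FORCES_URL = "https://data.police.uk/api/forces"


def parse_forces(payload: str) -> dict:
    """Turn the JSON text of a forces listing into {force_id: force_name}."""
    records = json.loads(payload)
    return {record["id"]: record["name"] for record in records}


def fetch_forces(url: str = FORCES_URL, timeout: float = 10.0) -> dict:
    """Request the forces listing over HTTP and parse the response.

    Hypothetical helper: nothing in this block calls it, so running the
    block does not need a network connection.
    """
    request = Request(url, headers={"User-Agent": "teaching-example"})
    with urlopen(request, timeout=timeout) as response:
        payload = response.read().decode("utf-8")
    return parse_forces(payload)


# Offline sample in the shape the endpoint is assumed to return.
# The record is invented for illustration and is not real API output.
sample = '[{"id": "example-force", "name": "Example Police"}]'
forces = parse_forces(sample)
```

With a network connection, calling fetch_forces() should return the live listing in the same dictionary form. Keeping the parsing step separate from the request makes it possible to check the logic against a saved response before contacting the server.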

The ESRC is right that doctoral training must keep pace with how research is actually done. Web scraping, APIs, and LLM-assisted coding are not niche skills — they are becoming core to social science research practice. Short, intensive courses delivered by specialist providers offer one practical route to closing the digital skills gap, without requiring Doctoral Training Partnerships to build all capacity in-house.

Explore the open-access course materials on GitHub: https://github.com/SGSSSonline/collecting-digital-data
