Automated Coding of Communication Data with ChatGPT using a Hierarchical Coding Framework

Cui, Wenju; Hao, Jiangang; Jiang, Yang; Kyllonen, Patrick Charles; Kerzabi, Emily

doi:10.3389/feduc.2026.1764154

ORIGINAL RESEARCH article

Front. Educ.

Sec. Assessment, Testing and Applied Measurement

Wenju Cui ¹

Jiangang Hao ²

Yang Jiang ²

Patrick Charles Kyllonen ²

Emily Kerzabi ²

1. ETS, Princeton, United States
2. Educational Testing Service, Princeton, United States

The final, formatted version of the article will be published soon.

Abstract

Coding communication data is essential for assessing 21st-century skills such as collaboration and communication, but large-scale human coding is labor-intensive. Large language models (LLMs) such as ChatGPT offer a scalable alternative, yet their accuracy depends on both the complexity of the coding framework and the prompting strategy. Using a communication coding framework with five main categories and seventeen subcategories, we compared two prompting strategies: a hierarchical strategy that first assigns main categories and then codes subcategories, and a direct strategy that codes subcategories in a single step. Coding accuracy was evaluated against human coding using Cohen's kappa and mixed-effects logistic regression. Both strategies achieved agreement comparable to human–human reliability (overall κ ≈ 0.57–0.59). However, direct prompting consistently outperformed hierarchical prompting, yielding an approximately 18% increase in the odds of agreement with human coding. Hierarchical prompting was more susceptible to error propagation when main categories were misclassified, whereas direct prompting produced more stable subcategory coding. These results provide guidance for using LLMs to code communication data under complex coding frameworks.
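For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of how Cohen's kappa can be computed for two sets of codes assigned to the same utterances. It is an illustration only, not the authors' analysis code: the function name `cohens_kappa` and the example labels are hypothetical, and the article's actual data and category names are not reproduced here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same, non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items on which both raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each code, the product of the two raters'
    # marginal proportions, summed over all codes.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[code] * count_b[code] for code in count_a) / (n * n)

    # Degenerate case: both raters used one identical code for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for four utterances (labels are placeholders).
human_codes = ["A", "A", "B", "B"]
model_codes = ["A", "A", "B", "A"]
kappa = cohens_kappa(human_codes, model_codes)
```

In this toy example the raters agree on three of four items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5. Kappa corrects raw agreement for the agreement expected by chance, which matters when some codes are much more frequent than others.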

Keywords

automated coding, ChatGPT, communication, hierarchical framework, prompt engineering

Received

09 December 2025

Accepted

20 February 2026

© 2026 Cui, Hao, Jiang, Kyllonen and Kerzabi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jiangang Hao

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
