
Gavin Brooks
Sessions
Workshop Simple NLP for Learner Corpus Analysis: A Hands-On Workshop
In applied linguistics, particularly within ESL and EFL contexts, the ability to develop and analyse learner corpora has become increasingly important (Deshors & Gries, 2020). Learner corpora, structured collections of texts produced by language learners, can provide invaluable insights into language development and inform pedagogical strategies. Integrating these corpora with natural language processing (NLP) techniques, such as part-of-speech tagging and lemmatisation, enhances the automated calculation of lexical richness measures, thereby offering more nuanced assessments of EFL written (Spring & Johnson, 2022) and spoken (Kyle, 2021) proficiency. Moreover, NLP can help develop targeted pedagogical materials (Granger et al., 2007). This hands-on workshop introduces participants to foundational Python programming and provides practical experience with NLP libraries, including the Natural Language Toolkit (NLTK) and spaCy. The workshop will cover installing NLTK and spaCy and using them to process textual data through steps such as tokenisation and part-of-speech tagging. No prior programming knowledge is required; the workshop guides participants through a series of tasks within a shared Jupyter Notebook. We will use a Google Colab notebook developed specifically to provide participants with step-by-step instructions for writing simple Python code and for importing and analysing text in a practical, hands-on learning environment.
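As a rough illustration of the kind of lexical richness measure mentioned above, the sketch below computes a type-token ratio (distinct word types divided by total tokens) for a short sample. It is not taken from the workshop notebook: the regex tokenizer, the function names, and the sample sentence are all assumptions made for illustration, and it uses only the Python standard library so that it runs without installing anything. In practice, a tokenizer from NLTK or spaCy would likely replace the regex.

```python
import re

def tokenize(text):
    """Lowercase the text and extract word tokens.

    A crude regex stand-in (assumption) for a library tokenizer.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(tokens):
    """Return distinct word types divided by total tokens (0.0 if empty)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical learner sentence used only for demonstration.
sample = "The cat sat on the mat. The cat was happy."
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)

print(tokens)
print(f"{len(tokens)} tokens, {len(set(tokens))} types, TTR = {ttr:.2f}")
```

Because the type-token ratio tends to fall as texts get longer, it is best compared across samples of similar length.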


Presentation Speed Reading in the Digital Age: Advice for Teachers
Reading speed is a crucial yet often overlooked component of reading fluency (Tran, 2012). Research shows that learners can significantly increase their reading speed in as few as 20 sessions of 10 minutes each (Chang, 2010; Chung & Nation, 2006). While most studies focus on printed texts, little attention has been given to the impact of reading medium (i.e., paper vs. digital devices) on fluency development. This presentation explores the pedagogical implications of medium choice in speed reading instruction and offers practical advice for implementing a digital speed reading program in the EFL classroom. It reports on a study in which 68 university students in Japan read short stories using either paper copies or ESL Speed Readings, a free mobile application. After six weeks, students switched formats and reflected on their experiences in questionnaires. Findings suggest that digital reading enhanced fluency development and offered practical classroom advantages, with both students and teachers expressing positive attitudes toward the medium. Based on these insights, a framework for implementing an effective speed reading program will be outlined. Attendees will receive practical guidance, including access to free speed reading software and a learner management system, to help integrate digital speed reading into their teaching contexts.


Presentation Streamlining Learner Corpus Development with LLMs and NLP
While recent studies have explored how learner corpora can help teachers develop materials that meet learners’ needs (Brezina et al., 2022), there remains a need for targeted learner corpora that provide insights into specific groups of learners (Götz & Granger, 2024). However, cleaning a corpus for analysis is time-consuming (Brezina et al., 2019). This presentation demonstrates a large language model (LLM)-powered corpus cleaning workflow that addresses this problem by combining LLMs with Natural Language Processing (NLP) tools such as spaCy and Stanza. The workflow identifies spelling errors, classifies words (e.g., proper nouns, technical terms, foreign words), and applies structured markup. API-based access to modern LLMs such as Claude or ChatGPT allows these models to assist with the systematic analysis and cleaning of a corpus. Using a subset of texts from our existing learner corpus, along with a cleaned and annotated gold-standard version of these texts, we will show the workflow in action and illustrate how LLMs facilitate preprocessing and structuring learner corpora. The results suggest that this method enhances efficiency and consistency, allowing researchers to focus on linguistic analysis rather than data cleaning.
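To make the structured markup step concrete, here is a minimal sketch of wrapping classified words in XML-style tags. It is an assumption-laden illustration, not the presenters' workflow: the `labels` dictionary stands in for classifications that an LLM or NLP tagger might return, and the category names, the `mark_up` function, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical classifications (assumption): in a real workflow these labels
# would come from an LLM or an NLP tagger, not a hand-written dictionary.
labels = {
    "tokyo": "proper_noun",
    "becuase": "spelling_error",
    "sushi": "foreign_word",
}

def mark_up(text, labels):
    """Wrap each labelled word in an XML-style tag named after its category."""
    def wrap(match):
        word = match.group(0)
        category = labels.get(word.lower())
        if category is None:
            return word  # unlabelled words pass through unchanged
        return f"<{category}>{word}</{category}>"
    # Visit every alphabetic word in the text and tag it if it has a label.
    return re.sub(r"[A-Za-z]+", wrap, text)

# Hypothetical learner sentence used only for demonstration.
learner_text = "I like Tokyo becuase the sushi is good."
marked = mark_up(learner_text, labels)
print(marked)
```

Keeping the original spelling inside the tag, rather than silently correcting it, leaves the learner's text recoverable for later error analysis.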
