THESIS 

 
LEXVET-ESP: DEVELOPING A NEW VOCABULARY TEST AND THE INTEGRATION 

OF VR FOR SPECIALIZED VOCABULARY ACQUISITION 

 
Submitted by 

Paula Izquierdo García 

Department of Languages, Literatures and Cultures 

 
In partial fulfillment of the requirements 

For the Degree of Master of Arts 

Colorado State University 

Fort Collins, Colorado 

Summer 2025 

 
Master’s Committee:  

Advisor: Alyssia Miller De Rutté 
 
Shannon Zeller  
Lily Edwards-Callaway 
 

Copyright by Paula Izquierdo García 2025 

All Rights Reserved 

 

ABSTRACT 

 
LEXVET-ESP: DEVELOPING A NEW VOCABULARY TEST AND THE INTEGRATION 

OF VR FOR SPECIALIZED VOCABULARY ACQUISITION 

 
The number of Spanish speakers in the U.S. continues to increase, which leads to a 

growing population of individuals who do not speak English as their primary language, making 

access to various services, including veterinary care, difficult. To address this issue, Languages 

for Specific Purposes (LSP) courses have been developed to train (future) professionals to speak 

in their client’s preferred language. Alongside the rising popularity of LSP courses, technological 

advancements have also been implemented in language education. Virtual reality (VR) platforms 

and artificial intelligence (AI) are two examples of relevant tools in second language learning as 

they allow for the integration of immersive and interactive experiences. This thesis aimed to 

combine LSP, particularly in Spanish for Veterinary Medicine, and technology to understand the 

potential effects on language learners’ vocabulary acquisition and retention. The thesis was 

divided into two projects. The first project focused on developing an assessment tool to test 

learners’ receptive vocabulary knowledge, defined as words that learners can recognize even if 

they cannot yet define or use them in different contexts. This assessment builds upon previous 

studies, which created validated and reliable instruments for measuring vocabulary and 

proficiency levels. A specialized vocabulary test for veterinary Spanish (known as the

LexVet-Esp) was developed and validated. The second project implemented a VR/AI platform as part of

a Spanish for Veterinary Medicine course and used the LexVet-Esp to assess the impact of the 

technology on vocabulary acquisition and retention and whether explicit or implicit vocabulary 


exercises contributed to language development. To examine this, students completed the 

vocabulary test four times. The first test, administered during week nine of a sixteen-week 

semester, served as the pre-test because participants had not yet interacted with the VR/AI platform.

Students took the test again in week twelve after completing three weeks of explicit vocabulary 

exercises on the platform and again in week fifteen after completing implicit vocabulary 

activities. Finally, a fourth test was conducted three months after the end of the course to 

determine long-term vocabulary retention. The results did not show significant differences across 

the tests, which may have been influenced by the short length of the study (six weeks) and the 

small sample size (n=15). Despite the lack of statistically significant differences across the tests, 

results indicated varying complexities related to the vocabulary acquisition and retention 

processes. Pedagogical implications and future research opportunities are discussed. 

 

TABLE OF CONTENTS 

 
ABSTRACT

CHAPTER 1: INTRODUCTION

1.1 Chapter Overview

CHAPTER 2: LITERATURE REVIEW

2.1 Language for Specific Purposes

2.1.1 Spanish for Specific Purposes

2.1.1.1 Spanish for Doctor of Veterinary Medicine Students

2.1.2 Language Needs Analysis

2.1.3 Task-Based Language Teaching

2.1.3.1 LNAs, TBLT, and LSP

2.2 Technology in Language Education

2.2.1 VR and Language Education

2.2.1.1 Grounded Cognition and Vocabulary Acquisition

2.2.2 AI and Language Education

2.2.3 VR and AI in LSP

2.2.4 VR and AI in the Veterinary Field

2.3 Vocabulary Acquisition in Language Learning

2.3.1 Vocabulary Assessments

2.4 Research Questions

CHAPTER 3: THE DESIGN AND VALIDATION OF THE LEXVET-ESP TEST

3.1 Introduction to Project 1

3.2 Methodology

3.2.1 Materials

3.2.2 Procedure and Participants

3.2.3 Data Analysis

3.3 Results

3.3.1 Point-Biserial Correlation

3.3.2 Item Response Theory

3.3.3 Comparisons Across Proficiency Levels

3.4 Discussion

3.4.1 Comparison of the LexVet-Esp with LexITA and Lextale-Esp

3.4.2 Addressing the Limitations of Point-Biserial Correlations with IRT

3.4.3 Differentiation of Proficiency Levels

3.4.4 Limitations

CHAPTER 4: MEASURING VOCABULARY ACQUISITION WITH LEXVET-ESP IN A VIRTUAL REALITY ENVIRONMENT

4.1 Introduction to Project

4.2 Methodology

4.2.1 Participants and Context

4.2.2 Materials

4.2.3 Procedure

4.2.4 Data Analysis

4.3 Results

4.3.1 Assessing Performance Consistency and Retention Over Time

4.4 Discussion

4.4.1 Impact of VR and AI on Vocabulary Acquisition

4.4.2 Implication for Educational Practice

4.4.3 Limitations and Future Research

CHAPTER 5: DISCUSSION AND CONCLUSION

5.1 Validation of LexVet-Esp

5.2 Considerations for VR, AI, and Vocabulary Retention

5.3 Future Research

REFERENCES

APPENDICES

Appendix A: Complete list of 180 terms used to develop the LexVet-Esp

Appendix B: Part 1, Instructions, and Example Included in the LexVet-Esp



CHAPTER 1: INTRODUCTION 

 
The world is becoming increasingly globalized, leading to a rise in international 

migration. This phenomenon often results in people relocating to countries where they do not 

speak the primary language. This is the case in the United States (U.S.), where

immigrants often find employment in a variety of sectors in which language barriers can

significantly impact their daily lives and work experiences. An examination of the U.S.

occupational data shows that 36.2% of immigrants work in management,

business, science, and arts occupations, while 21.3% work in the service sector (U.S. Census Bureau,

2024c). Regarding the educational level of this population,

only 14.9% have a graduate or professional degree, in contrast to the 25.6% who do not

hold a high school diploma or equivalent (U.S. Census Bureau, 2024c).

Language barriers can be especially pronounced in service-oriented professions, where 

clear communication is essential for successful interactions. One critical area where 

communication gaps manifest is in healthcare settings, particularly in patient-doctor interactions. 

Language differences can lead to inaccurate patient assessments, misunderstandings about 

treatment plans, and ultimately, poorer health outcomes. Similarly, language-related challenges 

extend to other professions that require direct engagement with clients, including veterinary 

medicine. In recognition of these challenges, there is a growing emphasis on addressing language 

barriers across professional fields with increasing focus in language education on teaching 

Languages for Specific Purposes (LSP). This field of study aims to equip professionals with the 

language and intercultural skills necessary for their workplace, helping to ensure effective

communication with diverse client populations. In the context of this study, the focus is on


teaching Spanish for Veterinary Medicine, in which the goal is to help veterinary students in the

U.S. establish better rapport with their Spanish-speaking clients as

the Hispanic population in the U.S. continues to grow.

To better understand this study’s importance, it is essential to present current statistics 

surrounding immigration within the U.S. According to data provided by the U.S. Census Bureau

(2024b), the number of foreign-born people has increased to over 45.3 million, which equals 

13.7% of the total U.S. population. A significant proportion of this demographic is concentrated 

in California, New Jersey, New York, and Florida. According to the U.S. Census Bureau (2024a),

approximately 65.2 million people in the U.S. (19.5% of the population) identify as Hispanic. As 

shown in the American Community Survey five-year estimates, the U.S. population is projected 

to be 27% Hispanic/Latino by 2060 (U.S. Census Bureau, 2023). Spanish is the second most

spoken language in the U.S., with 22.5% of the population communicating in a language other 

than English and 13.7% using Spanish as their primary language (U.S. Census Bureau, 2024a). 

As Spanish continues to grow in usage, the need for Spanish-proficient professionals across 

industries, including veterinary medicine, is becoming increasingly apparent. 

As the Hispanic population has grown, so has pet ownership. Veterinary

professionals frequently engage with Hispanic clients, many of whom may prefer to 

communicate in Spanish when discussing the health and care of their animals. As reported by 

Larkin (2024), there has been a significant increase in the number of pets within households 

when comparing data from 1996 to 2024. The growing number of pet owners reinforces the need 

for accessible veterinary care that accommodates linguistic diversity. According to Brown 

(2023), 62% of the U.S. adult population owns a pet, and 66% of Hispanic individuals report 


owning at least one pet. The canine population alone in 2024 was 89.7 million in the U.S., 

underscoring the broad presence of pets in American households. 

Although pet ownership is high, the cost

of veterinary care for dogs has dropped. For example, the average cost for routine checkups or

preventive care, which represented 80% of visits in 2024, was $147 per visit compared to $190 in

previous years (Larkin, 2024). This trend, combined with the growing Hispanic population, 

highlights the increasing accessibility of veterinary care. However, as more Spanish-speaking pet 

owners seek routine services, the need for improved communication between veterinary 

professionals and their clients becomes even more essential. Despite this growing demand, only 

10% of veterinary professionals report proficiency in Spanish, although it is unknown how 

proficiency was assessed to arrive at this statistic (Hopkins, 2023). A range of solutions to 

increase the number of Spanish-speaking veterinary professionals has been proposed, including

the development of specialized courses or certificate programs designed to teach Spanish to 

veterinary students across the U.S. (Hopkins, 2023). These initiatives aim to equip veterinary 

students with the language skills necessary to communicate effectively with Spanish-speaking 

clients, ultimately improving the quality of care provided. 

In addition to specialized courses and certificate programs, advancements in technology, 

including virtual reality (VR) and artificial intelligence (AI), are being used to provide language 

learners with immersive and interactive learning experiences in real-world environments. VR allows

learners to practice a foreign language in authentic, real-world scenarios, which enhances 

engagement and motivation (Chen et al., 2022; Özgün & Sadik, 2023). Similarly, AI can 

facilitate training in the language and give students real-time feedback on writing and 

pronunciation (Macinska & Vinkler, 2021; Wetherbee, 2023). These technological innovations 


have the potential to help learners improve their linguistic skills (Sun, 2023) and, in turn, to

make language learning more accessible and effective.

Therefore, the purpose of this thesis was to integrate a VR and AI platform into a Spanish for 

Veterinary Medicine course to investigate the effects on second language acquisition with a 

focus on vocabulary learning and retention.  

1.1  Chapter Overview 

The remaining chapters in this thesis present the theoretical framework, methods, and 

practical applications of the study. Chapter 2 provides a comprehensive review of previous 

literature relevant to the study. It explores LSP, with a focus on Spanish for veterinary 

professionals, alongside methodologies such as the Language Needs Analysis and Task-Based 

Language Teaching. Additionally, the chapter examines the role of technology in language 

education, discussing VR, AI, and their applications within the LSP and veterinary fields. The 

chapter continues with a review of vocabulary retention in language learning and assessment 

methods used to measure vocabulary knowledge. It then concludes with the research questions 

that guide the thesis. 

Chapter 3 presents Project 1, which focuses on the creation and validation of a 

vocabulary assessment tool for Spanish for Veterinary Medicine, known as the LexVet-Esp. The 

methodology section outlines materials, procedures, and data analysis techniques, including 

point-biserial correlations and Item Response Theory, used to evaluate the test’s reliability and validity.

The results present the discriminative power of test items, comparisons across proficiency levels, 

and how the LexVet-Esp aligns with existing assessments in other languages and areas.  


Chapter 4 details Project 2, which examines vocabulary retention in a VR/AI-enhanced 

learning environment using the LexVet-Esp as an assessment tool. The methodology describes 

participant recruitment, study context, materials, and procedures for integrating the VR/AI 

technology into a Spanish for Veterinary Medicine course. The results assess student 

performance and retention across multiple test points and the effects of explicit vs. implicit 

vocabulary instruction.  

Chapter 5 synthesizes findings from both projects, offering a more extensive discussion 

on the thesis as a whole. It revisits LexVet-Esp’s validation and discriminative power, explores 

trends in VR-assisted vocabulary retention, and highlights methodological limitations. The 

chapter concludes by summarizing the study’s contributions and suggesting areas for future 

exploration, particularly regarding long-term vocabulary retention and the integration of learning 

technologies in LSP instruction.  



CHAPTER 2: LITERATURE REVIEW 

 
This chapter provides an analysis of the most relevant literature and theoretical 

frameworks that are related to this study. It is organized into four main sections, starting with an 

overview of LSP before focusing on the subfield of Spanish for Specific Purposes (SSP). Then, a 

summary of two connected areas is presented. The first is an explanation of a Language Needs 

Analysis (LNA), which is an LSP research methodology, and the second is an overview of Task-

Based Language Teaching (TBLT), a language teaching methodology associated with the results 

of an LNA. The second main section centers on the implementation of VR and AI in language 

education with emphasis on the use of these technologies in LSP. The third main section is 

focused on the area of vocabulary acquisition and retention in the field of Second Language 

Acquisition (SLA). The fourth and final section presents the research questions associated with 

this project. 

2.1  Language for Specific Purposes  

To understand what LSP is, specialty languages must first be defined. Gómez de

Enterría (2009) defined specialty languages as the languages used in the fields of science,

technology, and the professions. These languages are used to transmit specialized knowledge, 

and Gómez de Enterría (2009) stated that all specialty languages can share common linguistic 

and functional characteristics. The field of LSP emerged as a way to address the language 

training of professionals who would use specialty language in their careers. Swales (2000) stated 

that the beginnings of LSP can be traced back to 1964 when a relationship between linguistic 

analysis and educational materials was first established. Initially, there was little emphasis on 

language educators needing expert knowledge of the specialized fields, and instead, LSP 


practitioners were skilled at conducting basic descriptions of target discourses. As the LSP field 

grew, there was a need for a more refined approach to research and teaching LSP, and the focus 

shifted from basic language skills to discourse and genre analysis during the 1980s. Nowadays, 

LSP maintains a unique relationship with other branches of applied linguistics, with close

connections to discourse analysis and pragmatics as well as to business and technical

communication, translator training, language assessment, and communicative language teaching

(Swales, 2000). LSP’s status as a discipline and profession varies globally, with limited presence

in U.S. graduate programs due to its separation from language acquisition, language teaching 

methodology, psycholinguistics, and sociolinguistics (Swales, 2000).  

In the context of the U.S., LSP research faces several challenges, including the need for 

interdisciplinary integration, the difficulty of collecting authentic and relevant data, and the 

development of effective assessment tools for specific professional skills. Collaboration between 

language departments and field-specific programs has become important as “land grant 

institutions…working in cohort with language departments are particularly well suited to 

spearhead domain-specific language programs that are methodologically sound and are based on 

established best practices within the LSP field” (Zeller & Velázquez-Castillo, 2018, p. 295). 

Adapting to new technologies and incorporating them into LSP curricula also presents ongoing

challenges. Other challenges include the variability of LSP’s status across

countries and the recognition of LSP as a discipline (Lafford,

2012). Additionally, there is often a lack of trained LSP practitioners and researchers with 

expertise in SLA. 

Sánchez-López et al. (2017) conducted research in the U.S. in the context of higher 

education to determine the main areas of LSP that are relevant for research amongst LSP 


scholars. Through interviews with researchers, Modern Language Association (MLA) members, 

and department chairs from various institutions in the U.S., Sánchez-López et al. (2017) found 

the areas of greatest interest for LSP research were business, culture, translation, academic 

purposes, service learning/community engagement, and health. These findings highlight the 

diverse professional domains in which language proficiency is essential, reinforcing the growing 

demand for specialized language instruction tailored to specific career fields. 

Due to the increasing interest in applying language study to professional contexts, these 

types of courses appeal to many students (Pastor Cesteros, 2013). Students who enroll in LSP 

classes typically have different motivations than general language learners, and LSP

courses have lexical, grammatical, and textual properties that also differ from those of general

language instruction. To align with professional objectives, research suggests that LSP courses 

integrate meaningful material, authentic texts, and goal-oriented activities designed for real-

world application (El Arbaoui, 2024; Nepravishta & Roseni, 2014). In response to this need, 

educators have developed specialized courses targeting specific fields, such as Chinese or Korean

for Business, Spanish for Nurses, English for Home Care, Mandarin for Tourism, and Legal Arabic,

among others (Helms et al., 2023; Trace et al., 2015).   

2.1.1  Spanish for Specific Purposes  

There has been an increasing interest in learning Spanish for Specific Purposes with the 

most popular areas being science and technology, law, medicine, and tourism (Pastor Cesteros, 

2013). The growth of the field of SSP is characterized by an emphasis on global context and 

collaboration. Significant contributions from Europe and Latin America highlight the importance 

of communication, connections, and collaboration among SSP scholars and practitioners 

(Lafford, 2022). Yu et al. (2020) stated that interdisciplinary work is indispensable in order to 


have authentic and relevant LSP courses because, through collaboration, LSP instructors can become

more familiar with the specialty area, and the content expert can become more aware of the

importance that language and culture have in their discipline. This interdisciplinary

collaboration can occur at different points of the process, such as during course design or the 

delivery of the course, and it can be an intra-institutional or extra-institutional partnership (Yu et 

al., 2020).  

SSP has become an integral part of Spanish curricula in universities, particularly in the 

U.S., where there is a focus on developing students’ communicative competence to address the 

needs of marginalized Spanish-speaking communities (Lafford, 2024). The integration of SSP in 

language curricula aims to provide students with the practical language skills necessary for 

specific professional contexts, enhancing their employability and effectiveness in diverse fields. 

In the U.S., the fields that are most researched are related to Science, Technology, Engineering, 

and Math (STEM), healthcare, business, education, and agriculture (Helms et al., 2023; Lafford, 

2024; Miller De Rutté et al., 2024; Pérez, 2021; Salazar et al., 2024; Zeller et al., 2016).  

Institutions are developing specialized SSP curricula and conducting extensive research 

on curriculum design and effective teaching methodologies (Pérez, 2021). Zeller and Velázquez-

Castillo (2018) provided a deeper contextualization of the need for the development of SSP

programs. In their study, they reviewed how various institutions have addressed SSP

offerings and described a cohort of undergraduate students eager to

participate in a non-traditional SSP area, Spanish for Animal Health and Care. The

authors delineated the steps undertaken to establish an undergraduate certificate program in this 

area. They conducted an LNA, identified tasks (e.g., illness treatment and care, health histories,

and preventative care), gathered additional information about these tasks (i.e.,


language functions and samples), and created a scaffolded sequence for the identified tasks to 

facilitate the development of a curriculum. Additionally, the study articulated the critical 

components that must be considered during the creation of SSP curricula. These components 

included identifying linguistic characteristics, grammatical forms and structures, lexical 

attributes, and sociocultural factors that influence professional communication (Zeller & 

Velázquez-Castillo, 2018). 

Despite the advancements in SSP, challenges, such as the need for qualified instructors, 

integration of technology, and accurate assessment tools, persist (Czerkawski & Berti, 2020). 

SSP practitioners find themselves in a challenging position, caught between the demands of the 

language and the extensive diversity of scientific disciplines they are expected to address. They 

are tasked with teaching a specialized language, even though they often lack familiarity with the 

scientific fields and/or the linguistic tools required to transmit information related to those fields, 

which leaves them struggling to bridge the gap between two domains they might not fully 

command (Dodigovic, 1993). Because of the great diversity of disciplines, in most cases, each 

individual instructor has developed their own approach to teaching.  

To address these challenges, there has been a call for stronger connections between LSP 

scholars and practitioners. Professional development in SSP is supported through conferences 

and publications with notable organizations, such as the Asociación Europea de Lenguas para 

Fines Específicos (AELFE) and the Congreso Internacional de Español para Fines Específicos 

(CIEFE), playing key roles in advancing the field (Lafford, 2022). These efforts highlight the 

dynamic nature of SSP and the ongoing commitment to its advancement through research, 

education, and professional collaboration.  



2.1.1.1 Spanish for Doctor of Veterinary Medicine Students 

“An awareness of Hispanic culture and language is becoming increasingly necessary 

within the US borders, especially for those who work directly with a Spanish-speaking 

workforce” (Zeller & Velázquez-Castillo, 2018, p. 290), and as previously mentioned, the area of 

Spanish for Animal Health and Care is growing at the undergraduate level. Similarly, the SSP 

field has begun to train future veterinary professionals at the graduate level in the Spanish 

language that is needed to communicate with Spanish-speaking clients. One initial attempt was 

carried out by Graves (2014, as cited in Zeller & Velázquez-Castillo, 2018) in which a bilingual 

workshop on cow reproduction was offered; however, this type of workshop did not focus on 

communication. Instead, training focused on providing translation of terms and phrases and did 

not focus on grammatical patterns or pragmatics of the language. Moreover, this training 

presented only one point of view, the English-speaking perspective, and did not consider 

speakers from other perspectives, such as the Spanish-speaking community. The lack of the 

Spanish speaker’s voice reinforces the privilege of dominant cultures while disadvantaging 

minority groups, and so, cultural competence must also be considered in these course offerings 

as it is only then that importance will be given to examining and addressing biases (Zeller et al., 

2023). 

In another study, Landau et al. (2015) sent a questionnaire to students at all veterinary 

schools in the U.S. to understand students’ perceptions of the use and need for Spanish in the 

veterinary field. Students were asked about their experience in and ability with Spanish as well 

as their preparedness to communicate in medical terms. Results from this study indicated

that there was a gap between students’ general and medical Spanish proficiency, which resulted 

in many of them feeling unprepared to give medical information to Spanish-speaking clients. 


Moreover, some students who self-identified as informal interpreters did not consider

themselves fluent in Spanish, which highlights a potential discrepancy between perceived and 

actual language skills. On the other hand, the need for Spanish in the veterinary field was not 

apparent for many of the students surveyed, and the ones who used Spanish in this setting did not 

stop to think about the challenges and difficulties that non-English speakers experience (Landau 

et al., 2015). As a result, Landau et al. (2015) stated that “there is room to improve professional 

communication competencies and diversity/multicultural awareness as identified by the North 

American Veterinary Medical Education Consortium” (p. 330).  

Nowadays, there are U.S. institutions that provide Spanish language courses within their 

Doctor of Veterinary Medicine (DVM) curriculum. For example, Texas A&M University offers 

a course titled “Medical Spanish,” which is aimed primarily at teaching foundational Spanish

skills to facilitate engagement with Spanish-speaking clientele (Zeller et al., 2023). Another 

institution is the University of California, Davis, which has integrated Spanish language courses 

into the DVM offerings in collaboration with the Spanish and Portuguese department, and 

Purdue University offers Spanish language instruction during lunch breaks to DVM students 

(Zeller et al., 2023). An additional initiative within the field of Animal Sciences, although not 

directly affiliated with DVM offerings, is the collaborative program established between Texas 

Tech University, North Carolina State University, and Tarleton State University (Salazar et al., 

2024). This program consists of three courses in Spanish administered across three consecutive 

semesters. The sequence of courses starts at a basic level, concentrating on vocabulary and 

grammatical structures, and advances to more complex tasks at an intermediate level in the 

second course. The final course covers discussions and simulations that reflect real-world 

scenarios related to agriculture and animal care in Spanish.  



Another study concerned the Spanish for Veterinarians Language Program (SVLP) at a

large western U.S. university (Forehand et al., 2023). Before developing the program, the 

researchers conducted a survey to determine if an eventual certificate in Spanish at the DVM 

level was of interest to the student body. A total of 791 students from across the U.S. responded 

to the survey, 275 of whom were enrolled at the researchers’ institution. From this survey, it was 

found that there was great motivation for this type of coursework and that students were even 

willing to pay for classes out of pocket. Participants mentioned that their motivation to take these 

courses stemmed from a desire to improve client relations, increase confidence, and

understand or potentially improve the animal’s health. The SVLP was designed based on an 

extensive LNA and using TBLT, both of which will be described in the next section, to focus on 

specific communicative tasks in Spanish so that students could develop the relevant language 

that they would need to use in their future careers (Zeller et al., 2023).   

2.1.2  Language Needs Analysis  

Chambers (1980) originally explained that “needs analysis should be concerned with the 

establishment of communicative needs and their realizations, resulting from an analysis of the 

communication in the target situation” (as cited in Basturkmen, 2010, p. 18). Basturkmen (2010)

revised Chambers’ ideas 30 years later and stated that 

needs analysis in [English for Specific Purposes; ESP] refers to a course development 

process. In this process the language and skills that the learners will use in their target 

professional or vocational workplace or in their study areas are identified and considered 

in relation to the present state of knowledge of the learners, their perceptions of their 

needs and the practical possibilities and constraints of the teaching context (p. 19).



There are several advantages to doing this type of analysis, and one of those advantages 

is the fact that LNAs employ multiple sources of information, incorporating insights from 

students, professionals, and the target community that directly engages with the specialized

language in practice (Malicka et al., 2019). One example of an LNA in the animal sciences field 

is work done by Zeller and Velázquez-Castillo (2018), who conducted a comprehensive needs 

analysis centered on the linguistic demands of the Spanish-speaking workforce in livestock 

environments. Their study involved observations of livestock work routines, interviews with 

practitioners and clients, and discussions with veterinarians and a farm manager, ensuring that 

the analysis reflected the real-world communication needs of professionals and the communities 

they serve. These analyses resulted in the development of a program for future professionals 

working with livestock farms and establishments. Additionally, the analysis of LNA data puts 

language tasks at the center. According to Candlin (1987), a task is “one of a set of 

differentiated, sequentiable, problem-posing activities involving learners and teachers” (as cited 

in Robinson, 2011, p. 6). Van den Branden (2006) adds that a task is “an activity in which a 

person engages to attain an objective, and which necessitates the use of language” (p. 4). 

Through the integration of perspectives from the target language community, a comprehensive 

and deep understanding arises regarding the specific types of language tasks learners must 

develop in order to meet the target language community members’ needs (Long, 2005). 

Therefore, it is through the LNA process that it will be possible to “relate instructional goals, 

processes, and practices to real-life performance outside the classroom” (Malicka et al., 2019, p. 

79). 

Another example of a comprehensive task-based LNA related to Spanish for 

Veterinarians at the DVM level involved observations at veterinary clinics in Colorado, Texas, 



and Colombia, along with other data collection methods, including interviews with veterinary 

professionals and their Spanish-speaking clients (Zeller et al., forthcoming). This research aimed 

to improve access to veterinary care for Spanish-speaking pet owners by identifying the clients’ 

communication needs and developing a curriculum tailored to these needs. The findings 

informed the creation of a Spanish language program that equips veterinarians with the necessary 

Spanish language skills to engage effectively with their clients. This program consists of five 

courses with a total of nine credits. Four of these courses are centered on the veterinary wellness

appointment and include taking the health history, relaying diagnostics, and discussing the

treatment plan. All courses are conducted in Spanish. The fifth course, which is one credit, is 

focused on cultural awareness and access to care, and it is delivered in English. The analysis 

from the LNA also indicated the entrance proficiency level required for the program, which is a 

minimum of novice high on the American Council on the Teaching of Foreign Languages 

(ACTFL) proficiency scale. The proficiency level increases with each course so that 

veterinarians can build trust and develop stronger relationships with their clients, thereby 

improving the quality of care provided to the animals. 

2.1.3  Task-Based Language Teaching  

TBLT is the teaching methodology associated with the LNA research methodology. Ellis 

(2003) stated that “TBLT is an approach based on interactive and communicative tasks aiming at 

involving learners in meaningful communication and interaction enabling them to acquire 

linguistic structures as a result of engaging in authentic use” (as cited in Khatib & Dehghankar, 

2018, p. 5). TBLT was developed out of the Communicative Language Teaching approach 

(Motlagh et al., 2014), which emphasizes interaction and communication as the primary goals of 

learning a new language (Qasserras, 2023). The core concept of the TBLT approach centers on 



the task itself (Richards & Rodgers, 2001). El Arbaoui (2024) stated that the use of tasks 

encourages “effective and comprehensive language exposure and usage” (p. 253).  

TBLT has been widely studied in SLA, and research has shown that tasks aid in several 

SLA processes, such as understanding input and producing output, as tasks provide students with 

opportunities to engage in meaningful interaction (Long, 1985; Robinson, 2011). Through task-

based activities, learners are encouraged to produce language output, aligning with the Output 

Hypothesis (Swain, 1985), which suggests that producing language helps learners notice gaps in 

their knowledge and improve their language skills. Additionally, tasks facilitate a focus on form 

by drawing learners’ attention to specific linguistic forms while, at the same time, maintaining a 

focus on meaning (Ellis, 2003). This approach promotes negotiation of meaning, allowing 

learners to clarify misunderstandings and enhance their comprehension and production skills 

(Skehan, 1998).  

TBLT considers numerous philosophical positions and empirical traditions from different 

fields, including education, applied linguistics, and psychology. Important considerations of this 

methodology relate to experiential learning, student-centeredness, and a process-oriented 

approach to syllabus creation (Nunan, 2014). These three characteristics mean that there is a 

focus on how learners acquire language skills and the strategies they use rather than just the final 

product or specific language items that students should master. It also proposes that when 

language is used to accomplish meaningful tasks, it significantly enhances the learning process. 

Furthermore, this approach highlights that language becomes more beneficial to learning when it 

holds personal significance or relevance for the learner (Motlagh et al., 2014). These principles 

underscore the importance of practical, purposeful language use in educational settings. 



2.1.3.1  LNAs, TBLT, and LSP    

LSP courses should be meticulously designed using evidence-based methodologies, such 

as the LNA, to address the language needs required by professionals in a certain domain, which 

also stresses the importance of adopting TBLT to replicate the real-world tasks that will be 

carried out outside of the classroom (Naudi, 2023). This approach allows learners to consistently 

engage with the language in both oral and written forms through activities focused on completing 

the task in full, often leaving aside grammatical perfection (Cédric, 2021; Naudi, 2023).

Investigations conducted by Hattani (2020), Georgy (2023), and Nazari (2020) outline both 

advantages and disadvantages associated with the incorporation of TBLT into LSP, which are

discussed next. 

Among the advantages of integrating TBLT into LSP courses are the facilitation of a

significant contextual framework for learners as well as the promotion of a student-centered 

classroom environment. This approach enhances student engagement and contributes to a more 

stimulating and less repetitive classroom atmosphere (Hattani, 2020). Most of these instructional 

sessions are conducted in the target language so that students are immersed in the L2 during the 

class (Nazari, 2020). Due to the student-centered nature of the class, many activities are 

completed through collaborative engagement in pairs or groups to foster both cooperative and 

collaborative learning (Hattani, 2020). Research has shown that students find the TBLT nature of 

their LSP courses to be more engaging than general language courses, and they report an 

increased sense of self-efficacy when using their L2 (Hattani, 2020). Similarly, students 

articulate that this pedagogical approach allows them to identify more targeted learning 

objectives, providing them with better clarity regarding the skills and knowledge they are 

expected to develop during these classes (El Arbaoui, 2024). Finally, students expressed a desire 



for instructors to adapt and implement this instructional methodology in other disciplines, 

including those not directly associated with language acquisition (El Arbaoui, 2024). 

Some disadvantages of incorporating TBLT into LSP include the time investment

required for lesson planning and finding authentic materials, which may not be widely available 

or adapted for use by language learners (Georgy, 2023). Additionally, instructors must be aware 

of their role as facilitators of information while simultaneously monitoring and providing 

feedback without disrupting the tasks or obstructing communication (Georgy, 2023; Hattani, 

2020). However, instructors have noted an increase in students’ motivation and self-esteem when 

employing this pedagogical approach, as it enables students to notice the relevance of their learning

in real-world contexts (Hattani, 2020).

2.2  Technology in Language Education 

In this next section, the role of technology in language education at large and in LSP is

discussed. Technological instruments in the classroom should act as collaborators rather than 

obstacles to allow for the successful integration of supplementary resources and activities, and 

there are many possibilities to do so. As noted by Chacón Medina (2007), “the greatest potential 

of new information and communication technologies (ICTs) is derived from the capabilities of 

manipulation, storage, and distribution of information in an easy, fast, and accessible way for all 

people” (p. 25). ICTs can increase access to knowledge while optimizing educational processes. 

This characteristic makes technology indispensable in education, and the incorporation of 

technology into L2 teaching is becoming progressively common to help promote immersion in 

authentic contexts. 



Technological advancements have brought a variety of tools and platforms that make 

language learning more engaging and effective (Mayer, 2014). Some of the most popular include 

multimedia tools, mobile apps, digital games, augmented reality (AR), and VR (Blyth, 2018; 

Godwin-Jones, 2011; Ibáñez & Delgado-Kloos, 2018; Pachler et al., 2010; Reinders & Wattana, 

2015). Multimedia tools, such as videos and audio recordings, help learners improve their 

listening and speaking skills by exposing them to authentic language use (Pachler et al., 2010). 

Multimedia tools also help improve language comprehension and retention as they offer a variety 

of input types, such as audio, visual, and textual (Mayer, 2009). Apps like Duolingo and Babbel 

use smartphones and tablets to offer flexible, on-the-go learning opportunities, especially for 

learners with busy schedules or limited access to traditional classrooms (Kukulska-Hulme & 

Shield, 2008; Reinders & Hubbard, 2013). Mobile technologies use adaptive learning systems 

that personalize instruction to meet individual learners’ needs while offering tailored feedback 

and practice, which boost learning efficiency. These apps also provide interactive exercises and 

gamified experiences that keep learners motivated (Godwin-Jones, 2011). Digital games can 

create immersive environments for practicing language through problem-solving and storytelling 

(Reinders & Wattana, 2015), which increases learner engagement and motivation and leads to 

higher participation rates and greater persistence in language study (Sykes & Reinhardt, 2012). 

AR offers another example of this idea, as it overlays digital information onto the real

world to help learners engage with vocabulary and phrases in context (Ibáñez & Delgado-Kloos, 

2018), while VR provides immersive virtual environments where learners can practice language 

skills (Blyth, 2018). AR and VR deliver contextualized practice, which helps students transfer 

language skills to authentic communicative environments (Holden & Sykes, 2011). Additionally, 

digital games and VR can support collaborative learning by allowing learners to interact and 



communicate in virtual spaces to enhance language practice and foster social interaction 

(Thorne, 2008).  

2.2.1  VR and Language Education  

VR is one of the main technologies in this study, and this next section will focus on VR 

use in language education. VR is defined as “a simulation of a three-dimensional virtual 

environment, generated by a computer, in which the individual can engage with the 

aforementioned environment” (Peixoto et al., 2021, p. 48952). VR systems can be divided into 

three distinct categories: non-immersive, semi-immersive, and immersive. The principal 

differentiation among these categories is based on the degree of immersion experienced by the 

user, as well as the cost associated with these categories. As the level of immersion increases, 

there is a corresponding need for specialized equipment and advanced technological 

infrastructure to enable a more authentic and interactive environment (Peixoto et al., 2021), and 

all three levels of immersion have appeared in language teaching (Lin & Lan, 2015).  

Immersive VR settings can replicate real-world situations, enabling students to develop 

language competencies within authentic contexts that are otherwise challenging to recreate in 

conventional classrooms. This type of VR can enhance speaking and listening skills as learners 

can practice conversations in a safe space, which helps them build both confidence and 

proficiency (Lin & Lan, 2015). In the world of immersive VR, new concepts have emerged, like 

the idea of the metaverse. According to Anacona Ortiz et al. (2019), metaverses are characterized 

as “virtual worlds that allow users to let their imagination run free” (p. 62). Within educational 

frameworks, metaverses are described as “real-time immersive 3D simulated environments 

whose ecosystem is well-suited to incorporate audiovisual notifications, resulting in impressive 

configurations for formative or pedagogical spaces” (Barráez-Herrera, 2022, p. 16). The 



transition towards immersive digital learning environments emphasizes the potential of VR 

technologies in augmenting student engagement and promoting authentic learning experiences. 

The effectiveness of VR in the field of L2 learning has been a topic of investigation with 

a focus on the outcomes of the implementation of these tools. VR-assisted language learning 

(VRALL) provides “sensory-rich environments, allowing [learners] to experience telepresence 

(i.e., the feeling of ‘being there’ in the target language country)” (Kaplan-Rakowski & 

Wojdynski, 2018, p. 124). This immersive modality possesses the capacity to augment learner 

motivation and engagement, particularly in situations where exposure to authentic linguistic 

environments is constrained. One study that investigated the application of VR in the field of 

language learning found that VR facilitates learning experiences while eliminating geographical 

barriers for L2 learners (Kaplan-Rakowski & Wojdynski, 2018). This study also found that 82% 

of participants wanted to keep studying a language after using VR. Similarly, a systematic 

review of immersive VR applications in higher education found improved educational outcomes 

due to the integration of authentic tasks, real-time feedback mechanisms, and interactive 

simulations (Radianti et al., 2020).  

In terms of the effects of VR on L2 vocabulary acquisition, there are relatively few 

studies despite the growing interest in the field (Agurto-Cabrera & Guevara-Vizcaíno, 2023; 

Moreno Martínez & Galván Malagón, 2020; Valero Franco & Berns, 2023). However, Legault et 

al. (2019) found that immersive VR can be an effective tool for L2 vocabulary acquisition as VR 

scenarios promote contextualized vocabulary use and can improve memory retention through 

multisensory experiences. Additionally, providing these tools to learners allows them to use their 

motor skills while learning an L2, which is again achieved because of the replicas of real-world 

scenarios where learners can engage with vocabulary in meaningful ways (Legault et al., 2019). 



Other research investigated the advantages and limitations of using VR in L2 education. 

Advantages included having an authentic, real-life environment and the ability to engage with 

multiple senses leading to higher motivation, retention, and achievement (Klimova, 2021). VR 

supports diverse learning styles, fosters creativity, and boosts self-confidence while reducing 

anxiety and encouraging active participation and the development of learner autonomy. 

However, there are some limitations to consider, such as the high cost of software and teachers’

and students’ potentially limited technological skills (Klimova, 2021). The success of VR applications

relies on carefully designed content that follows cognitive and instructional design principles 

(Mayer, 2014). The challenge for educators and programmers, then, lies in the creation of VR 

experiences that integrate interactivity, immersion, and educational efficacy while ensuring 

accessibility and cost-effectiveness to promote wider adoption. 

2.2.1.1  Grounded Cognition and Vocabulary Acquisition  

VR provides not only an immersive environment for language learners but also 

sensorimotor experiences. This combination can be explained by the framework of grounded 

cognition, which emphasizes the important role of sensorimotor experiences in the language 

acquisition process while highlighting the importance of contextualized and experiential 

approaches in vocabulary acquisition. Barsalou (2008) suggested that cognitive functions, 

including language comprehension, are fundamentally intertwined with the brain’s modal 

systems dedicated to perception, action, and introspection. This notion indicates that lexical 

items are not simply abstract representations. Instead, they are deeply rooted in sensorimotor 

experiences, reinforcing the importance of contextualized and experiential learning approaches 

in vocabulary acquisition. As individuals interact with their environment, the sensorimotor states 

linked to these experiences are encoded by memory systems for future representational use 



(Barsalou, 2023). Therefore, grounded learning methodologies emphasize the importance of 

contextual and experiential vocabulary acquisition, where lexical items are learned through 

active participation in authentic real-world contexts rather than through passive retention. The 

multimodal characteristics of this learning approach ensure that vocabulary components are 

associated with rich, interrelated representations that include visual, auditory, tactile, and 

emotional dimensions and can also be closely connected to cultural factors, as vocabulary 

acquisition is influenced by culturally specific situated activities and linguistic expressions 

(Barsalou, 2023). This theory supports the integration of VR in L2 education.  

2.2.2  AI and Language Education  

In addition to VR, AI is another technological tool that can be used in education. AI is 

defined “as the branch of computer science dedicated to creating systems capable of performing 

tasks that typically require human intelligence” (Macinska & Vinkler, 2021, p. 4). Generative AI 

(GenAI) is characterized by its ability to generate new content that has not existed previously. As 

such, GenAI can create output that may appear original, and one example of this is chatbots that 

can mimic human conversations (Macinska & Vinkler, 2021). 

Chatbots are software applications that simulate human conversation by asking and 

answering questions via text or audio. They are not new as they have existed since the 1960s, 

and they have been integrated into different messaging apps for years (Son et al., 2023). In the 

context of L2 learning, chatbots have been programmed to engage in conversations with the 

learner to perform different tasks. Research has demonstrated that the use of chatbots increases 

the output of the learner and also enhances their speaking performance (Son et al., 2023). 



AI is revolutionizing language education by offering new ways to develop different 

linguistic skills. In their systematic review, Zhu and Wang (2025) highlighted different ways in

which AI tools can be used in the process of language learning, including AI-powered writing 

evaluation systems that provide feedback on written work as well as other tools like ChatGPT. 

AI-driven avatars are also helping learners improve their pronunciation and speaking abilities. 

AI-enhanced language learning experiences have the potential to address multiple dimensions of 

language learning and proficiency development; however, GenAI was only released in an open 

access format in 2022, and much more research is needed to understand its full potential. 

2.2.3  VR and AI in LSP  

As this study is focused on vocabulary acquisition in LSP and integrates VR and AI, this 

next section presents relevant, albeit limited, research in this area. Li et al. (2022) studied the 

effects of an immersive VR experience on vocabulary acquisition in English for Geography. 

Through the interaction with multimodal and contextualized scenarios, such as the hydrologic 

cycle, researchers found that VR can improve incidental vocabulary learning and learner 

engagement as the participants were able to activate memory and comprehension due to the 

immersive environments (Li et al., 2022). Additionally, these environments supported cognitive, 

behavioral, and social participation, which are essential for mastering specialized terminologies 

(Li et al., 2022). Similarly, Park (2022) studied vocabulary learning in a VR-enhanced real-world 

environment called the Digital Kitchen. This study combined TBLT with multimodality to 

engage learners’ senses (i.e., touch, smell, and taste), which are often absent in traditional or 

virtual settings. Results showed that learners in the Digital Kitchen achieved higher scores in 

vocabulary retention compared to those in traditional classroom settings. Park (2022) highlighted 



the importance of integrating physical and sensory experiences into language learning to enhance 

memory and language acquisition.  

Miller De Rutté (2024a, 2024b) investigated the role of VR in medical Spanish 

instruction. Undergraduate students enrolled in a medical Spanish course engaged in immersive 

VR simulations designed to replicate real-world professional interactions, allowing them to 

perform medical tasks in Spanish. Miller De Rutté (2024b) found increases in students’ 

motivation levels after using VR as students expressed a heightened ability to visualize

themselves successfully performing professional tasks. Additionally, students emphasized 

specific skill development, particularly in conversation, listening, pronunciation, and critical 

thinking (Miller De Rutté, 2024a). The immersive nature of VR was found to deepen students’ 

self-reported levels of engagement and focus while amplifying course topics related to cultural 

awareness, pragmatics, and applied medical vocabulary. Participants expressed that the 

simulations created a more authentic and interactive learning environment, reinforcing their 

ability to connect language skills with professional application (Miller De Rutté, 2024a). 

2.2.4  VR and AI in the Veterinary Field 

Similar to the use of technology in language education, VR and AI have also been

implemented in the veterinary field as they offer new solutions to improve education, 

diagnostics, and administrative efficiency (Appleby & Basran, 2022; Sobkowich, 2025). 

Specifically, VR enables veterinary students and professionals to practice surgical techniques 

and diagnostic procedures in a controlled, risk-free environment (Xu et al., 2023). These 

immersive simulations not only improve skill acquisition but also build confidence, preparing 

practitioners for real-life scenarios (Appleby & Basran, 2022; Sobkowich, 2025). VR is also used 

in anatomy education, where students can explore 3D models of pets’ anatomy derived from real



cases, and this approach has been shown to improve interactive learning while linking 

anatomical studies to clinical relevance (Aghapour & Bockstahler, 2022). Regarding the use of 

AI in veterinary medicine, Sobkowich (2025) discussed the variety of ways in which AI is being 

used in the field. Examples include automating workflow for scheduling appointments, 

predicting surgery complications, using adaptive testing systems, or extracting data from patient 

records. In the education field, Sobkowich (2025) states that AI is being used with platforms like 

Khanmigo to provide adaptive and on-demand support for students, while intelligent learning 

management systems adjust lesson pacing and sequencing based on individual progress. AI-driven

chatbots are also being used to recreate clinical scenarios or practice client

communication. However, despite these technological advancements, the application of VR and 

AI combined specifically for communication in veterinary contexts remains largely unexplored.  

2.3  Vocabulary Acquisition in Language Learning 

Vocabulary acquisition serves a fundamental function in the process of L2 learning

with a direct influence on communicative competence and language proficiency. Vocabulary is 

part of the foundation for developing the four language skills (i.e., listening, reading, writing, and 

speaking). There are complex learning processes involved in vocabulary acquisition and 

retention, and retention processes are interconnected with the acquisition and transfer phases of 

the learning process in addition to memory and cognitive processes (Houston, 2001 as cited in 

Sanatullova-Allison, 2014). Sanatullova-Allison, (2014) explained that 

once an item is perceived, it enters primary memory (PM) with short-term storage. 

Rehearsal is necessary for the item to remain in PM and, if rehearsal is long enough, the 

item may enter secondary memory (SM), which is long-term storage […] Long-term 



memory is made of declarative and procedural knowledge: the former is the knowledge 

about facts and the latter is the knowledge about how to perform tasks (p. 2).  

Different language learning and teaching principles have been implemented to promote 

L2 vocabulary acquisition and memory retention (Nation, 2022). For example, spaced repetition 

learning, where review intervals are lengthened after successful recall and shortened after 

mistakes, has been found to significantly improve long-term retention compared to studying in 

one continuous session (Sanatullova-Allison, 2014). Other research has shown that engaging in 

active recall, defined as a learning technique that involves actively retrieving information from 

memory rather than passively reviewing it (Pham et al., 2016), significantly improves memory 

retention, promotes deeper cognitive engagement, and leads to better learning outcomes than

strategies such as re-reading or passive review (Karpicke & Roediger III, 2008). Similarly, in 

terms of instructional techniques to teach vocabulary, research has found that intentional 

vocabulary teaching generally results in long-term retention as well as proficiency gains in the 

L2 (Schmitt & Schmitt, 2020). Other research has found that learners who actively participated 

in tasks outperformed those who did not engage in tasks (Joe, 1998, as cited in Sanatullova-

Allison, 2014), which emphasizes the importance of task-based teaching approaches. Word

frequency, how words are presented, what students do with the words, and what language is used 

when explaining vocabulary are other key considerations in instruction (Borawski, 2019). 

Presenting vocabulary in terms of frequency based on a spoken L2 corpus and using those words 

in a small and morphological word list has been found to improve the acquisition and retention 

of L2 vocabulary (Borawski, 2019). 
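
To illustrate the scheduling logic behind spaced repetition described above, the following is a minimal Python sketch and not the procedure of any study cited here. The names `Card` and `update_interval`, the doubling and halving factors, and the one-day minimum are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """A vocabulary item with its current review interval (hypothetical structure)."""
    word: str
    interval_days: float = 1.0


def update_interval(card, recalled, grow=2.0, shrink=0.5, min_days=1.0):
    """Lengthen the interval after a successful recall; shorten it after a mistake.

    The grow/shrink factors and the minimum interval are illustrative assumptions.
    """
    if recalled:
        card.interval_days *= grow
    else:
        # Never schedule a review sooner than the minimum interval.
        card.interval_days = max(min_days, card.interval_days * shrink)
    return card.interval_days
```

In this sketch, two successful recalls move a word from a one-day to a four-day interval, and a subsequent mistake brings it back to two days.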

Therefore, vocabulary knowledge is a construct that includes form, meaning, and use, and 

knowing a word is not only recognizing or defining a word but using it in the correct context. 



Mastering all of these aspects enables a speaker to use the word with native-like fluency and 

accuracy (Schmitt & Meara, 1997). Previous research indicates four key aspects inherent to 

vocabulary knowledge, which include vocabulary size (the number of words known), 

collocational competence (the learner’s ability to recognize and use word combinations that 

naturally occur together), receptive vocabulary knowledge (the ability to recognize words), and 

productive vocabulary knowledge (the ability to use words appropriately) (Masrai, 2023; Nation, 

2022). Apart from these core dimensions, vocabulary knowledge is also influenced by depth of 

word understanding, including semantic associations and morphological awareness (McBride-Chang

et al., 2008; Samaraweera, 2025). The relationship between these elements ensures that 

learners not only recognize and produce words but also interpret subtle meanings and contextual 

relevance. This process enhances their ability to communicate effectively in the target language.  

2.3.1  Vocabulary Assessments  

Evaluating L2 vocabulary comprehension is an essential element in language proficiency 

assessment. Over time, scholars have developed a variety of instruments to evaluate both the 

breadth and depth of learners’ vocabulary knowledge, with different tools measuring different 

vocabulary aspects. These instruments include traditional word-list exams as well as more 

advanced assessments based on contextual vocabulary use that evaluate the learner’s capability 

to identify and use words within specific contexts. Some of the most common types of 

vocabulary tests are receptive vocabulary tests, productive vocabulary tests, and computer-

adaptive testing.  

Initial research in vocabulary assessment focused on measuring receptive vocabulary 

through breadth, which refers to the number of words learners know, with common tools like the 

Vocabulary Levels Test (Nation, 1990) and the Vocabulary Size Test (Nation & Beglar, 2007). However, it was later recognized 


that vocabulary depth, or how well learners understand word meanings, associations, and usage, 

is also important to assess (Pignot-Shahov, 2012; Sun et al., 2023). Productive vocabulary tests 

assess the learner’s ability to actively use words in different contexts, such as translation 

(Fitzpatrick, 2007). One example of this type of test is the Productive Levels Test (Laufer & 

Nation, 1999), which evaluates vocabulary knowledge based on word-frequency bands. For this 

test, participants reorganize letters to create a word, and there is a possible total of eight words 

across five levels (Fitzpatrick, 2007). Computer-adaptive testing, such as the Computer Adaptive 

Test of Size and Strength (CATSS), assesses both breadth and depth of vocabulary knowledge 

by measuring active recall, passive recall, active recognition, and passive recognition while 

adapting dynamically to the learner’s proficiency level (Pignot-Shahov, 2012). However, the 

most widely used method is the receptive vocabulary test, which measures learners’ ability to 

recognize words and is the type of test used in this study. This type of test does not require the 

learner to produce language, which makes it accessible to a wide range of proficiency levels 

(Amenta et al., 2020).  

As mentioned, a common receptive vocabulary assessment is the Vocabulary Levels Test 

(Nation, 1990). It encompasses various levels of difficulty, from basic to more advanced. The 

assessment requires learners to match words with their corresponding meanings, and each level 

presents a distinct set of words that align with frequency bands in the English language (e.g., the 

1,000 most frequently encountered words, the 2,000 most frequently encountered words, etc.). 

The Vocabulary Size Test, on the other hand, was created by Nation & Beglar (2007) to assess 

learners’ vocabulary size. This test consists of a series of word recognition tasks designed to 

evaluate the number of words a learner can comprehend. The items are sourced from various 


frequency bands in the English language, and the test provides an estimation of the total number 

of words with which a learner is familiar across the different frequency bands. 

Another widely known assessment is the LexTALE test, introduced by Lemhöfer & 

Broersma (2012), which evaluates the receptive vocabulary size of advanced English learners. 

The LexTALE uses a lexical decision task format, requiring learners to decide whether the 

presented words are real English words or invented terms. The assessment is designed to be both 

quick and reliable and incorporates a scoring system that determines a learner’s vocabulary size 

based on their performance. The LexTALE can differentiate between varying levels of language 

proficiency and is a validated measure. Building upon the success of the LexTALE, other 

researchers have created similar instruments for other languages. Izura et al. (2014) developed 

the Lextale-Esp, a test adapted for the Spanish language. The test consists of 60 items (40 real 

words and 20 non-words) and can differentiate across Spanish language proficiency levels, 

including at the higher ends of the proficiency scale. Amenta et al. (2020) developed the LexITA 

to measure receptive vocabulary size in Italian, and it has been validated as an effective assessment 

for evaluating vocabulary knowledge across a spectrum of proficiency levels, as have its equivalents 

in Spanish and English.  

These three tools, the LexTALE, Lextale-Esp, and LexITA, are distinguished from others 

by their efficiency and accuracy. They are structured for quick administration, generally 

requiring five to ten minutes to complete, while also enabling objective scoring through a simple 

yes/no format. This study built on the strengths of these existing assessments, as it followed their 

methodology. However, it further adapted their work to create a specialized vocabulary test to 

address the needs of veterinary Spanish learners.  


2.4 Research Questions  

The purpose of this study was to understand the effects of the use of VR and AI on 

Spanish vocabulary acquisition and retention in a course on Spanish for Veterinarians. The 

following research questions (RQs) guided the study: 

RQ1: Does the use of a VR and AI platform in the Spanish for Veterinarians classroom 

affect vocabulary acquisition in L2 learners? 

RQ2: Does the use of explicit versus implicit vocabulary learning activities in a VR and 

AI environment influence learners’ vocabulary acquisition and retention? 


CHAPTER 3: THE DESIGN AND VALIDATION OF THE LEXVET-ESP TEST 

 
3.1  Introduction to Project 1 

The overarching goal of this thesis was to understand technology’s influence on 

vocabulary acquisition and retention in students enrolled in courses on Spanish for Veterinarians. 

However, there is no validated vocabulary assessment to measure vocabulary acquisition in this 

area. Therefore, the main objective of this first part of the research was the creation of a 

receptive vocabulary acquisition test to serve as a data collection measure in the subsequent 

chapter. The development and validation of the LexVet-Esp test, as this new test is called, followed 

the procedures of previous research (Amenta et al., 2020; Izura et al., 2014; 

Lemhöfer & Broersma, 2012), which are described in the following subsections. The specific 

research questions guiding this first project were as follows: 

RQ1: How accurately does the LexVet-Esp predict Spanish vocabulary knowledge and 

proficiency? 

RQ2: How well does the LexVet-Esp differentiate between native and non-native 

Spanish speakers? 

3.2  Methodology 

3.2.1  Materials  

Ninety words were extracted from a spoken corpus on Spanish for Veterinarians (Zeller 

et al., forthcoming). The corpus was based on veterinary medicine observations from four 

different locations (Colorado, Texas, Mexico, and Colombia) and included observations in both 

Spanish and English. For this project, only the Spanish interactions were analyzed. These 


interactions occurred between different personnel in a veterinary clinic (e.g., veterinary 

technician, veterinarian, etc.) and the Spanish-speaking client. The selection of the 90-word 

items was based on frequency distributions as described in previous studies (Amenta et al., 2020; 

Izura et al., 2014; Lemhöfer & Broersma, 2012). Words ranged from very high-frequency 

words (e.g., perro/dog or doctor/doctor), which are anticipated to be familiar to most 

Spanish speakers, to very low-frequency words (e.g., férula/splint or sarro/tartar), which are 

likely to be recognized exclusively by highly proficient primary language speakers.  

The characteristics of the word items, including their distribution, length, and frequency 

per million words, are presented in Table 1. Overall, 26 words had a frequency of less than one 

occurrence per million words, 23 had a frequency of one to five occurrences per million, 14 had 

a frequency of 6 to 10 occurrences per million, 17 had a frequency of 11 to 20 occurrences per 

million, eight had a frequency of 21 to 100 per million, and two words (comer/to eat and 

mirar/to look) had a frequency greater than 100 per million. Most of the words were nouns 

(n=50), and there were equal numbers of verbs (n=20) and adjectives (n=20). 

Table 1 

Characteristics of the word items included in the initial test with 90 words 

Distribution    Length (letters)    Frequency per million
min             3                   22.22
1st quartile    6                   111.11
median          8                   144.44
3rd quartile    9.75                88.88
max             15                  33.33
mean            8.13                144.44
SD              2.58                677.77

 
Next, a compilation of 90 non-words was generated. These non-words were pseudowords that 

followed patterns similar to those of real words. For example, verbs in Spanish end in -ar, -er, or -ir, 

and non-word verbs followed that same pattern (e.g., real word: comer; non-word: riñonar). 

ChatGPT 3.5 was used to generate the non-word items after it was given the authentic word list and 

was prompted to create 90 non-words, including nouns, verbs, and adjectives, similar to the real 

word list. It was also prompted to use different parameters, such as including suffixes analogous 

to various grammatical categories (e.g., adding -mente for an adverb) or verbs with Spanish 

characteristics (e.g., ending in -ar, -er, or -ir). ChatGPT 3.5 generated non-words across parts of speech, 

including nouns (toráciz, cachorreo, parasíticoz), adjectives (pulgario, abdominado, 

anestesiático), and verbs (autolimitantear, alérgizer, brinquir). 

After the creation of these 90 non-word items, they were reviewed to verify that the 

words did not exist. To do this, two dictionaries were consulted – the Real Academia Española 

(2014) and the Real Academia Nacional de Medicina (2012). Non-words were searched in both 

dictionaries to verify that they did not exist in either of them. If a non-word did exist, a new non-word 

was created to replace it. Only ten of the items that ChatGPT 3.5 generated as non-words 

were, in fact, real words. They included ótico (otic), lipémico 

(lipemic), intratorácico (intrathoracic), cremar (to cremate), sondeo (probing), remisivo 

(remissive), venenoso (poisonous), blandir (brandish), and esterilizante (sterilizing). Table 2 

shows the distribution, length, and frequency per million words of the selected non-words. Note 


that the frequency per million words is listed as N/A because these are non-words, so 

frequency per million words cannot be calculated. 

Table 2 

Characteristics of the non-word items included in the initial test with 90 non-words 

Distribution    Length (letters)    Frequency per million
min             3                   N/A
1st quartile    6                   N/A
median          8                   N/A
3rd quartile    9.75                N/A
max             15                  N/A
mean            8.13                N/A
SD              2.58                N/A

 
3.2.2   Procedure and Participants  

To determine which real words and non-words were to be included in the vocabulary 

assessment, the list of all words was distributed to students at different proficiency levels in the 

form of a written paper test. The test (Appendix A) was administered in person so that the 

participants could neither access the internet nor consult a dictionary or online 

translator to verify whether a word existed or not. In the first part of the test, participants 

answered a short questionnaire where they indicated the level of the class they were taking, their 

age, race, and what their primary language was. Then, participants were given a written 

explanation of the test with an example of four common words (sí - yes; sacapuntas - pencil 


sharpener; bien - good; casa - house) so that they could better understand how the test functioned 

(Amenta et al., 2020). The instructions for this first part of the test were provided in English. 

This test was given to students enrolled at a large western university in the U.S. 

Participants were divided into two main groups. The first group consisted of primary Spanish 

speakers (n=23), and the second group comprised both heritage Spanish speakers and Spanish L2 

speakers (n=180). The second group was further divided according to the level of the class in 

which the participants were enrolled, following previous research. There were 49 participants at 

the 100-level, 35 at the 200-level, 63 at the 300-level, 30 at the 400-level, and three at the 500-

level. Participants were then given the 180 items (90 real words and 90 non-words) that were 

randomly ordered, and they were asked to identify the real words on the list. All participants 

received the same order of real words and non-words. Participants were not given a specific time 

limit to complete the test; however, no participant took more than 15 minutes to complete it.  

3.2.3  Data Analysis 

The first part of the assessment collected demographic data. This data was analyzed using 

descriptive statistics and was used to separate the participants into two main groups (native 

speakers vs. heritage and L2 speakers) as well as across the different proficiency levels. The 

vocabulary test was then analyzed using a series of statistical tests to determine which 60 real 

words and 30 non-words were to be included in the final version. First, following previous 

research (Lemhöfer & Broersma, 2012; Brysbaert, 2013), the tests were scored using the 

following formula: 

$$\text{Score} = N_{\text{yes to words}} - 2 \times N_{\text{yes to non-words}}$$


This formula was used as it penalized participants for random guessing behavior, meaning that if 

a participant answered arbitrarily (i.e., affirmatively responding to half of the real words and half 

of the non-words), they would obtain a score close to zero. This formula also allows for 

participants to earn a negative score if they incorrectly categorized non-words as real words. 
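
As an illustration only, the scoring rule can be written as a short Python sketch. The function and variable names are hypothetical and were not part of the study's analysis, which was carried out in R. The example items (perro, férula, riñonar, brinquir) are taken from the word and non-word lists described in this chapter.

```python
def lexical_decision_score(yes_to_words, yes_to_nonwords):
    """Score = (yes responses to real words) - 2 * (yes responses to non-words)."""
    return yes_to_words - 2 * yes_to_nonwords


def score_from_responses(responses, real_words):
    """Score one participant.

    responses:  dict mapping each test item to True (marked as a real word)
                or False (left unmarked).
    real_words: set containing the items that are real words.
    """
    yes_to_words = sum(
        1 for item, said_yes in responses.items() if said_yes and item in real_words
    )
    yes_to_nonwords = sum(
        1 for item, said_yes in responses.items() if said_yes and item not in real_words
    )
    return lexical_decision_score(yes_to_words, yes_to_nonwords)


# Hypothetical participant: one real word accepted, one non-word accepted.
example_real_words = {"perro", "férula"}
example_responses = {
    "perro": True,      # real word, accepted
    "férula": False,    # real word, rejected
    "riñonar": True,    # non-word, wrongly accepted (penalized twice)
    "brinquir": False,  # non-word, rejected
}
print(score_from_responses(example_responses, example_real_words))
```

With a 2:1 ratio of real words to non-words, as in a test of 60 real words and 30 non-words, accepting half of each item type gives 30 - 2 * 15 = 0, which is the guessing correction the formula is meant to provide.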

Then, point-biserial correlations were calculated, and an Item Response Theory (IRT) analysis was 

conducted. Point-biserial correlation coefficients range from -1.0 to +1.0. For the purposes of this study, 

a positive correlation for an item indicated that participants who performed well on the overall test 

also tended to answer that item correctly. A negative correlation, however, implied that the 

item was not successfully measuring the intended outcome, as it suggested that proficient 

participants were performing less favorably on that item than their lower-proficiency 

counterparts. The IRT analysis was conducted to determine if items on the 

assessment were accurately measuring the proficiency of the participants. This type of analysis 

provided information related to both the difficulty level of the item and its discriminatory power. 

Discriminatory power refers to the steepness of the item response curve, transitioning from “not 

known” at the lower end of the ability spectrum to “known” at the upper end.  
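
The idea of discriminatory power as curve steepness can be illustrated with a two-parameter logistic (2PL) item response function, a standard IRT model. The sketch below is a minimal illustration under that assumption; the function names and the parameter values are hypothetical and are not taken from the study's output.

```python
import math


def item_response_probability(ability, difficulty, discrimination):
    """2PL item response function: probability that an item is 'known'."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def slope_at_midpoint(discrimination):
    """Slope of the 2PL curve at ability == difficulty, where P = 0.5.

    The derivative of the logistic curve is a * P * (1 - P), so the slope
    at the midpoint is a / 4.
    """
    return discrimination / 4.0


# Two hypothetical items with the same difficulty but different discrimination.
steep_item = {"difficulty": 0.0, "discrimination": 2.0}
flat_item = {"difficulty": 0.0, "discrimination": 0.5}

for ability in (-2.0, 0.0, 2.0):
    p_steep = item_response_probability(ability, **steep_item)
    p_flat = item_response_probability(ability, **flat_item)
    print(f"ability={ability:+.1f}  steep item={p_steep:.3f}  flat item={p_flat:.3f}")
```

Both hypothetical items are answered correctly with probability 0.5 at the midpoint, but the steeper item separates low-ability from high-ability participants much more sharply on either side of it.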

Once the most effective test items were selected, test performance across groups of 

participants was compared using an Analysis of Variance (ANOVA) to determine if the test 

could discriminate across proficiency levels. The R package ltm (Rizopoulos, 2006) was utilized 

for all analyses in this study. 

3.3  Results 

Participants’ (n=203) ages ranged from 18 to 70, with a mean age of 21.65. See Table 3 for 

mean, standard deviation (SD), and age range by class level. The majority of participants 

self-identified as White (n = 142; 69.95%), followed by Hispanic/Latino (n = 44; 21.67%), 

other (n = 11; 5.42%), and Black/African American (n = 6; 2.96%). When asked about their 

primary language, the majority replied that English was their L1 (n = 178; 87.68%), while a 

smaller portion indicated that Spanish was their first language (n = 25; 12.32%). 

Table 3 

Distribution of age according to the level of the class 

Group              Mean     SD      Range
100                21.51    7.53    47 (65-18)
200                18.94    8.50    52 (70-18)
300                20.25    1.38    7 (25-18)
400                18.74    9.75    51 (70-20)
500                25.5     3.62    8 (30-22)
Native speakers    29.8     8.81    22 (43-21)

 
3.3.1  Point-biserial Correlation 

To select the items to be included for the final test, the quality of both the real words and 

non-words was analyzed by calculating the point-biserial correlation between the responses to 

each item and the cumulative scores of the participants. All items had a positive correlation 

(ranging from r = 0.08 for the word hez/feces to r = 0.68 for the word glándula/gland). Because all 

correlations were positive, no items could be eliminated on this basis, and an IRT analysis was 

conducted next.   
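
For readers unfamiliar with the statistic, the point-biserial correlation between one item and the total scores can be computed as in the following minimal Python sketch. The function name and the four-participant data set are hypothetical; the study's own calculations were run in R, and whether an item's own response is included in the total score is a design choice that this sketch does not settle.

```python
import math


def point_biserial(item_responses, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores.

    item_responses: list of 0 (incorrect) or 1 (correct), one per participant.
    total_scores:   list of total scores, in the same participant order.
    """
    n = len(item_responses)
    if n == 0 or n != len(total_scores):
        raise ValueError("inputs must be non-empty and of equal length")

    correct = [s for r, s in zip(item_responses, total_scores) if r == 1]
    incorrect = [s for r, s in zip(item_responses, total_scores) if r == 0]
    if not correct or not incorrect:
        raise ValueError("item must have both correct and incorrect responses")

    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores.
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in total_scores) / n)
    if sd_all == 0:
        raise ValueError("total scores must vary")

    p = len(correct) / n   # proportion answering the item correctly
    q = 1.0 - p            # proportion answering it incorrectly
    mean_correct = sum(correct) / len(correct)
    mean_incorrect = sum(incorrect) / len(incorrect)
    return (mean_correct - mean_incorrect) / sd_all * math.sqrt(p * q)


# Hypothetical data: higher scorers answered the item correctly.
r_positive = point_biserial([0, 0, 1, 1], [1, 2, 3, 4])
# Same scores, but lower scorers answered the item correctly.
r_negative = point_biserial([1, 1, 0, 0], [1, 2, 3, 4])
print(round(r_positive, 3), round(r_negative, 3))
```

A positive value corresponds to an item that stronger participants tend to get right; a negative value would flag an item that behaves in the opposite direction.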


3.3.2  Item Response Theory 

The results from the IRT analysis produced charts with difficulty of the word item along 

the x-axis and probability of correctly answering the item along the y-axis. This information then 

allowed for the selection of items that spanned a broad spectrum of difficulty using 

discriminatory power, which was measured by the steepness of the response curve at its 

midpoint. A steeper curve indicated greater discriminatory power of the item. Figures 1 and 2 

show the IRT charts for real words and non-words. 

Figure 1 

IRT of real words  

 
Figure 2 

IRT of non-words  

 
Based on the findings from the IRT analysis, a selection of 60 real words and 30 non-

words of varying difficulty levels with strong discriminatory power was made. This selection 

process involved ranking the items based on their difficulty levels and choosing those with 

optimal discriminatory power at approximately 1/30th of the total range covered by the items 

(Amenta et al., 2020; Izura et al., 2014). This process occurs so that the most difficult words can 

be excluded to ensure better discrimination between participants at the highest proficiency levels. 


Table 4 presents a selection of sample words, where the intercept represents the IRT item 

difficulty, and z1 indicates the discriminative power of each word relative to the others.  

Table 4 

Discriminative Power and IRT Scores for Selected Items 

Item                      Intercept       z1
Cremación (cremation)     1.338778108     1.6326639
Irritar (to irritate)     0.508833260     1.1035564
Remisar (non-word)        -4.156332499    0.1710680
Enroncharar (non-word)    -4.156332499    0.1710680
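
One possible reading of the selection procedure described above is sketched below, purely as an illustration. It assumes that the difficulty range is divided into equal-width intervals and that the most discriminating item is kept from each interval. The function name, the item tuples, and the parameter values are hypothetical and do not reproduce the study's IRT output or its exact selection rule.

```python
def select_items(items, n_intervals):
    """Keep the most discriminating item within each difficulty interval.

    items:       list of (name, difficulty, discrimination) tuples.
    n_intervals: number of equal-width intervals spanning the difficulty range.
    Returns the selected items ordered from easiest to hardest interval.
    """
    if not items or n_intervals < 1:
        raise ValueError("need at least one item and one interval")

    lowest = min(difficulty for _, difficulty, _ in items)
    highest = max(difficulty for _, difficulty, _ in items)
    # Fall back to a width of 1.0 if all items share the same difficulty.
    width = (highest - lowest) / n_intervals or 1.0

    best_per_interval = {}
    for name, difficulty, discrimination in items:
        # The hardest item falls exactly on the upper edge; clamp it into the last interval.
        index = min(int((difficulty - lowest) / width), n_intervals - 1)
        current = best_per_interval.get(index)
        if current is None or discrimination > current[2]:
            best_per_interval[index] = (name, difficulty, discrimination)

    return [best_per_interval[i] for i in sorted(best_per_interval)]


# Hypothetical item parameters (name, difficulty, discrimination).
candidate_items = [
    ("w1", -2.0, 0.8),
    ("w2", -1.9, 1.5),
    ("w3", 0.0, 1.2),
    ("w4", 1.9, 0.4),
    ("w5", 2.0, 0.9),
]
selected = select_items(candidate_items, n_intervals=3)
print([name for name, _, _ in selected])
```

Under this reading, items that sit close together in difficulty compete with one another, so the retained set spreads across the whole difficulty range while favoring steeper response curves.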

 
Based on these results, the final word list was determined, and descriptive statistics can be seen 

in Table 5. The table presents the distribution of word and non-word lengths. Word length 

ranged from three to 14 letters, with a median of nine, while non-word length spanned from 

seven to 15 letters, with a median of 11. The frequency per million varied across words, 

averaging 16,821.35, whereas non-words do not have frequency values assigned. 

Table 5 

Descriptive statistics of the final words 

                Words                                        Non-words
Distribution    Length (letters)    Frequency per million    Length (letters)    Frequency per million
min             3                   2,320.19                 7                   N/A
1st quartile    7                   2,320.19                 9                   N/A
median          9                   8,120.65                 11                  N/A
3rd quartile    11                  23,201.86                12                  N/A
max             14                  102,088.17               15                  N/A
mean            8.98                16,821.35                10.74               N/A
SD              2.94                21,472.48                2.53                N/A

 
3.3.3  Comparisons across proficiency levels 

The formula for scoring the final test as described previously was used to calculate 

participants’ scores. Possible scores ranged from -180 (if a participant selected all non-words as 

real and all real words as non-words) to 90 (if a participant accurately identified all real words 

and refrained from selecting any non-words as real). Table 6 shows the groups of participants by 

level along with their mean score, standard deviation, and the range of test scores.  

Table 6 

Test scores by proficiency group 

Group              Mean score    SD       Range
100                -5.59         16.52    63 (-44 – 19)
200                1.97          20.66    114 (-85 – 29)
300                15.28         17.59    96 (-30 – 66)
400                18.06         18.36    71 (-25 – 46)
500                47.67         25.10    54 (18 – 72)
Native speakers    66.20         11.63    28 (52 – 80)


A one-way ANOVA was conducted to determine the difference in mean scores across the 

six proficiency groups. Overall, there was a statistically significant difference among the groups 

(F(5, 197) = 25.852, p < 0.001). Tukey post hoc analyses revealed evidence of a significant 

difference between all levels except 100 vs. 200 (p = 0.4089), 300 vs. 400 (p = 0.9759) and 500 

vs. native speakers (p = 0.5451).  
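
The logic of the one-way ANOVA can be sketched in a few lines of Python. This is a minimal illustration with hypothetical toy data, not a reanalysis of the study's scores; the function name is an assumption, and the study's analysis was run in R.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    if k < 2 or any(len(group) == 0 for group in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more scores than groups")

    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        sum((score - mean) ** 2 for score in group)
        for group, mean in zip(groups, group_means)
    )
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three small groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova(toy_groups)
print(f_value, df_between, df_within)
```

The degrees of freedom follow directly from the design: six groups and 203 participants give 6 - 1 = 5 and 203 - 6 = 197, matching the F(5, 197) reported above.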

3.4  Discussion  

The objective of the initial investigation was to create and validate an instrument to test 

the acquisition of receptive vocabulary relevant to Spanish for Veterinarians. To do so, test items 

capable of effectively differentiating across proficiency levels were identified statistically. 

Unlike the LexITA (Amenta et al., 2020) and the 

Lextale-Esp (Izura et al., 2014), which measure general vocabulary, this instrument focuses on 

technical terms for veterinary professionals. This section discusses the effectiveness of the 

LexVet-Esp in assessing specialized vocabulary knowledge by comparing its results to existing 

assessments, evaluating the strengths and limitations of the statistical analyses, and determining 

its ability to differentiate proficiency levels among participants. Finally, limitations and future 

directions are presented. 

3.4.1  Comparison of the LexVet-Esp with LexITA and Lextale-Esp  

While the LexITA (Amenta et al., 2020) and Lextale-Esp (Izura et al., 2014) evaluate 

general receptive vocabulary proficiency, the LexVet-Esp has been specifically developed to 

assess field-specific vocabulary knowledge within the area of Spanish for Veterinarians.  

Although all three assessments use a combination of real words and non-words to measure 

vocabulary recognition, the LexVet-Esp prioritizes technical terminology in veterinary 

consultations, which ensures its relevance for professionals in the field. To prioritize this 


specialized terminology, the selection of real word items was guided by a spoken corpus of 

veterinary interactions in Spanish. 

The methodology for the selection of non-words was likewise informed by the 

approaches used in the LexITA and Lextale-Esp but instead employed artificial intelligence 

alongside manual validation to ensure that the non-words resembled real words in form and 

structure while lacking semantic value (Amenta et al., 2020; Izura et al., 2014). This step is 

critical in mitigating guessing behavior, as participants are forced to distinguish between real words 

and non-words, which is a fundamental aspect of assessing receptive vocabulary (Ha, 2021; 

Masrai, 2023). The incorporation of non-words ensured that participants were required to possess 

actual lexical knowledge to distinguish real words from non-words, thereby enhancing the 

reliability of the assessment. 

3.4.2  Addressing the Limitations of Point-Biserial Correlations with IRT 

All test items displayed positive point-biserial correlations, indicating that participants 

who performed well overall were more likely to correctly identify individual items. However, 

while this confirmed a basic relationship between item difficulty and participant ability, the 

analysis did not provide additional information to differentiate item performance across varying 

proficiency levels. This limitation aligns with prior observations in the LexITA (Amenta et al., 

2020) and Lextale-Esp (Izura et al., 2014), where point-biserial analysis was useful for initial 

evaluation but insufficient for deeper item validation. Given the need for a more precise 

assessment of individual items, IRT was employed to provide a more comprehensive analysis of 

item difficulty and discrimination across proficiency levels. The IRT analysis demonstrated the 

assessment’s capacity to distinguish across learners of varying proficiency levels, with both 

simpler and more challenging items effectively differentiating individuals with lower proficiency 


from those with higher proficiency. The broad variety of word difficulties incorporated into the 

final assessment was necessary to guarantee that the evaluation was sensitive to the full spectrum 

of language proficiency levels. 

3.4.3  Differentiation of Proficiency Levels 

The LexVet-Esp scores demonstrated significant differences across proficiency levels. 

Higher-proficiency groups, particularly those at the 500-level and native speakers, tended to 

perform better in correctly identifying vocabulary items. This finding partially supports the 

hypothesis that vocabulary acquisition is progressive (Nation, 2022), with learners expanding 

their lexicon to include increasingly complex and less commonly used terminology as they 

advance in their studies (Borawski, 2019). Lower-proficiency learners showed limitations in 

identifying specialized terminology. This trend aligns with the Lextale-Esp findings, where 

participants at lower proficiency levels struggled to identify infrequent vocabulary items (Izura et 

al., 2014). The data in the current study reflect similar patterns, reinforcing the notion that 

specialized vocabulary knowledge is not solely a product of exposure but also an indicator of 

overall linguistic proficiency (Masrai, 2023). Additionally, and consistent with previous studies 

(Amenta et al., 2020; Izura et al., 2014), participants who frequently selected non-words tended 

to score lower overall. This result aligns with prior research suggesting that non-word 

recognition can serve as a general indicator of language proficiency (Roche & Harrington, 2013). 

Overall, the results indicate the test’s ability to differentiate learners ranging from novices to 

advanced speakers. This finding underscores the instrument’s potential for assessing specialized 

language acquisition with a level of precision comparable to established vocabulary assessments. 


3.4.4 Limitations  

Firstly, while the sample size was considered sufficient for analyses, the reliance on a 

single data collection site limits the extent to which the findings can be generalized. With the 

incorporation of participants from various geographical areas and/or backgrounds, especially 

those with different levels of exposure to the Spanish language, the validity of the results would 

be significantly enhanced. This consideration is especially relevant when reflecting on previous 

assessments, such as LexITA and Lextale-Esp, which featured participants from multiple 

universities and diverse linguistic backgrounds, strengthening the credibility of their conclusions 

(Amenta et al., 2020; Izura et al., 2014). Additionally, the small number of participants at the 500 

level (n=3) represents a clear limitation, as it limits insight into higher-proficiency learners. 

Another limitation is that the proficiency level was measured by course enrollment level, which 

may not be the most accurate measure of language proficiency. Future research should 

incorporate a standardized proficiency test to determine the level of participants.  

 
CHAPTER 4: MEASURING VOCABULARY ACQUISITION WITH THE LEXVET-ESP IN A 
VIRTUAL REALITY ENVIRONMENT 

 
4.1  Introduction to Project 2 

The principal objective of Project 2 was to implement the LexVet-Esp in a Spanish for 

Veterinary Medicine course to test participants’ vocabulary acquisition and retention before and 

after using a platform that combined VR and AI. The project also aimed to compare the results of 

two distinct instructional modalities for vocabulary acquisition: explicit versus implicit 

vocabulary exercises. Explicit vocabulary instruction involves direct explanation of word 

meanings, uses, and forms, followed by focused practice. In contrast, implicit instruction 

integrates vocabulary exposure within wider communicative activities, avoiding direct emphasis 

on the words. By contrasting these approaches, the project attempted to gain insights into the 

roles of explicit and implicit instruction in vocabulary acquisition in a VR/AI-enhanced 

educational platform. Information regarding the platform and activities will be detailed in the 

upcoming sections. The specific research questions guiding this project were as follows: 

1. What is the impact of a VR/AI platform on the acquisition of receptive vocabulary in a 

specialized language course on Spanish for Veterinary Medicine? 

2. How does the use of explicit versus implicit vocabulary activities impact students’ 

vocabulary learning in a VR/AI-based context in a course on Spanish for Veterinary 

Medicine? 

3. How effective is VR/AI-based vocabulary practice for long-term retention of specialized 

vocabulary in Spanish for Veterinary Medicine? 


4.2  Methodology 

4.2.1  Participants and Context  

Participants in this project were students enrolled in the Spanish for Veterinary Medicine 

certificate in the DVM program at a large, western university in the U.S. Out of all the students 

taking the certificate, fifteen completed all phases of this project and were included in the study. 

Participation in this study was optional and did not impact students’ final grades.  

Participants were enrolled in the second two-credit course of the elective certificate. The 

course met weekly for a duration of 50 minutes, and students were expected to complete two 

hours of work outside of class time. The course followed a flipped classroom model in which students 

were required to complete a series of online preparatory activities prior to each face-to-face 

class. These preparatory materials included tasks designed to develop reading comprehension, 

listening proficiency, vocabulary enhancement, and speaking skills and to familiarize students 

with essential thematic content and vocabulary pertinent to the in-person class. The face-to-face 

sessions prioritized active, communicative use of the language. During class sessions, students 

engaged in structured speaking exercises, discussions, and task-oriented learning activities that 

reinforced the material addressed in the preparatory phase.  

4.2.2  Materials  

A non-immersive VR/AI platform, MeTabi, was used for this project. MeTabi is an 

online language learning platform dedicated to enhancing students’ linguistic proficiency within 

specified professional contexts. It does not require headsets or spatial interaction, only the use of 

a computer or phone. Currently, it is available in over ten languages and uses Language Coaches, 

or characters that use AI, to help create a scaffolded learning process for students. This 


framework permits the development of a variety of activities that focus on four essential skills: 

oral comprehension, written comprehension, oral production, and written production. As MeTabi 

is an AI-enhanced but screen-based system, learners benefit from immediate feedback generated 

by the AI system but do not experience real-time sensory immersion. For the purposes of this 

study, a virtual veterinary clinic was built in the MeTabi environment (Figures 3 and 4), and 

MeTabi was implemented into the Spanish for Veterinary Medicine course during the second 

half of the semester. Students used MeTabi to complete their required out-of-class work. 

Figure 3 

One of the consultation rooms in MeTabi 

 
Figure 4 

The clinic’s laboratory with different diagnostic tools  

 
From week nine to week twelve, students completed explicit vocabulary activities. 

Examples can be seen in Figures 5-9. Figure 5 shows an activity in which participants watched a 

video about colors, textures, pet behaviors, etc., in which images appeared alongside a list of 

words describing those images. At the end of the video, the participants completed a 

drag-and-drop activity in which they moved each word into the correct category. Figure 6 shows a picture 

matching activity, and Figure 7 is a crossword puzzle. Figures 8 and 9 represent two parts of the 

same activity. For this activity, participants were presented with 18 flashcards, and each 

contained a picture and the corresponding word in Spanish. The first part of the activity required 

participants to translate the words into English, and in the second part, they were asked to 

complete a dialogue between the client and the doctor using the words from the flashcards. 


Figure 5  

Drag and drop activity 

 
Figure 6  

Matching activity 

 
Figure 7 

Crossword puzzle 

 
Figure 8  

Part 1 of the flashcard activity  

 
Figure 9  

Part 2 of the flashcard activity 

 
During the remaining three weeks, students completed implicit vocabulary exercises. Vocabulary 

was introduced through listening comprehension activities, oral activities with the Language 

Coaches, and pronunciation practice. No words were highlighted or explained to the students; 

instead, the focus was on how words were used in context. An example of this type of 

activity can be seen in Figure 10. In this example, participants watched a video, after which they 

were asked to create five questions that they would ask their client in order to gather additional 

information about their pet. Another type of activity involved an AI Language Coach, as seen in  

Figure 11. For this activity, the participants were given a prompt that guided them to simulate a 

conversation in which they needed to use specific vocabulary that appeared in previous activities, 

even though those activities focused on grammar or listening comprehension. In another 

activity, shown in Figure 12, participants listened to three different clients, took notes 

on each pet’s health history, and then recommended a diagnostic procedure. This entire 


activity required participants to utilize vocabulary seen in prior activities without explicit 

instruction. 

Figure 10  

Video with listening comprehension and writing questions 

 
Figure 11 

Language Coach activity  

 
Figure 12 

Listening and speaking activity 

 
4.2.3  Procedure  

The LexVet-Esp was administered to evaluate students’ vocabulary acquisition and was 

made up of two parts. The first part included a short questionnaire in which students indicated their 

age, race/ethnicity, and L1. This was followed by an explanation of the test with an 

example using common words so that students could better understand how the test functioned 

(Appendix B). The instructions for this part of the test were provided in English. Then, 

participants completed the LexVet-Esp by identifying real words from the list of 90 total words, 

60 of which were real words and 30 were non-