Academic Vocabulary in Psychology Research Articles: A Corpus-based Study*
Ismail Xodabande¹ & Nasrin Xodabande²
¹Kharazmi University, Tehran, Iran; ²Independent Researcher, Tehran, Iran
Contact: ismail.kh.tefl@gmail.com, nasrin2966@gmail.com
* This is a refereed article.

Received: 10 December, 2019. Accepted: 3 April, 2020.

This is an open-access article distributed under the terms of a CC BY-NC-SA 4.0 license
Abstract: The current corpus-based study investigated the lexical profile of psychology research articles based on the General Service List (GSL) (West, 1953) and the Academic Word List (AWL) (Coxhead, 2000). To this end, a corpus of 8,500 psychology research articles with around 74 million words was analyzed. The results showed that the AWL accounted for 13.12% of the tokens in the corpus. Further computer analysis revealed that 472 of the 570 word families in the AWL are used frequently in psychology research articles. The study also identified 693 word types outside the GSL and the AWL that occurred frequently in the corpus and accounted for 6.1% of the tokens. Finally, 1,537 high-frequency AWL and non-GSL/AWL word types (rather than word families) provided around 17.91% coverage of the corpus, while the 570 highest-ranking word types in this list accounted for about 13.44%, which is higher than the combined coverage of all 570 AWL word families (about 3,000 types). Based on these findings, the study concluded that although the AWL is a valuable pedagogical resource for teaching academic vocabulary, more restricted, discipline-specific word lists are needed to cater to the needs of students in different subject areas. The study also discusses the pedagogical implications of these findings.

Keywords: academic vocabulary, corpus linguistics, research articles, English for Academic Purposes (EAP), psychology


Introduction

Identifying and categorizing academic and discipline-specific vocabulary is important to a variety of stakeholders in English for Academic Purposes (EAP) programs. According to Coxhead and Nation (2001), academic vocabulary consists of items that occur with reasonably high frequency across various academic genres but with much lower frequency in other text types. Learning academic vocabulary has been described as a major challenge for first-year undergraduates (Li & Pemberton, 1994), and knowledge of academic vocabulary is essential for reading academic texts and for writing successfully in different subject areas (Corson, 1997). As a result, over the past years teachers and researchers have sought to develop vocabulary lists that serve the needs of language learners (Farrell, 1990; Xue & Nation, 1984). Since its creation, the Academic Word List (AWL) has been employed extensively in EAP programs, materials development, and vocabulary tests (Coxhead, 2011). According to Coxhead and Nation (2001), the pedagogical value of the AWL as a teaching instrument lies in the fact that, when combined with the General Service List (GSL) (West, 1953), it covers about 90% of the words in most academic texts. Coxhead (2011) also claims that the AWL has great potential for helping instructors and students set vocabulary learning goals by focusing on the most useful vocabulary items in EAP programs. The list, which contains 570 word families, was developed from a corpus of 3.5 million words of academic textbooks and journals selected from arts, commerce, law, and science (Coxhead, 2000).

Despite its widespread use and acceptance as a benchmark for materials development in EAP (Huntley, 2006; Schmitt & Schmitt, 2005; Wells, 2007), a number of studies have questioned whether a common core approach to identifying academic vocabulary can satisfy the needs of diverse groups of learners in different English for Specific Purposes (ESP) courses (Chen & Ge, 2007; Durrant, 2017; Hyland & Tse, 2007). In this regard, it has been strongly argued that knowledge of the specific vocabulary of a field is closely tied to the content knowledge of that discipline (Hyland, 2002, 2006; Woodward-Kron, 2008). A serious criticism leveled against the AWL is that it is too general: it offers language learners some vocabulary items they do not need and limits their exposure to items they do need (Chen & Ge, 2007; Hyland & Tse, 2007; Paquot, 2007). To address these shortcomings, a number of discipline-specific vocabulary lists have been developed (Green & Lambert, 2018; Hajiyeva, 2015; Hsu, 2013; Konstantakis, 2007; Lei & Liu, 2016; Tangpijaikul, 2014; Wang et al., 2008; Ward, 2009). Nevertheless, the AWL has been acknowledged as a valuable resource for learners and instructors (Eldridge, 2008). Alongside the GSL, it has been used as the base list for identifying and categorizing specialized vocabulary in a number of disciplines (e.g., Chen & Ge, 2007; Csomay & Prades, 2018; Dang & Webb, 2014; Khani & Tazik, 2013; Li & Qian, 2010; Martínez et al., 2009; Mozaffari & Moini, 2014; Valipouri & Nassaji, 2013; Vongpumivitch et al., 2009; Yang, 2015).

Because graduate students and researchers in most EFL contexts need to read published research and publish their own research in international English-language journals (Martínez et al., 2009; Valipouri & Nassaji, 2013), there remains a need to investigate the vocabulary learning needs of students in different subject areas. According to Coxhead (2018), however, while the increased demand for STEM (Science, Technology, Engineering and Mathematics) education for international students has drawn considerable research attention to these fields, the humanities have not been as thoroughly researched in university vocabulary studies, and many subject areas, including biology, chemistry, and psychology, have received scant attention. As no previous study has investigated the presence and coverage of the AWL in psychology research articles, the current study aims to fill this gap in the literature. Moreover, because recent developments in corpus linguistics software have made it possible to analyze much larger corpora in vocabulary studies, this study investigates a very large corpus of psychology research articles (74 million words) to provide a detailed understanding of their lexical profile.

Review of Related Literature: AWL across Disciplines

A number of studies have investigated the coverage of the AWL in different text types across academic disciplines (see Table 1 for a chronological summary of some related studies). This section summarizes their findings.

Study | Type of corpus | Size (words) | AWL coverage
Chen and Ge (2007) | Medical research articles | 190,425 | 10.07%
Hyland and Tse (2007) | Professional and learner texts across a variety of genres from sciences, engineering, and social sciences | 3,292,600 | 10.60%
Konstantakis (2007) | Business English course books | 600,000 | 4.66%
Martínez et al. (2009) | Agriculture research articles | 826,416 | 9.06%
Vongpumivitch et al. (2009) | Applied linguistics research papers | 1,500,000 | 11.17%
Li and Qian (2010) | Hong Kong Financial Services Corpus | 6,279,702 | 10.46%
Khani and Tazik (2013) | Applied linguistics research articles | 1,553,450 | 11.96%
Valipouri and Nassaji (2013) | Chemistry research articles | 4,000,000 | 9.96%
Mozaffari and Moini (2014) | Education research articles | 1,710,989 | 4.94%
Shabani and Tazik (2014) | ESP and Asian EFL Journal research articles | 320,310 | 14.89%
Hajiyeva (2015) | Subject-specific university textbooks for English majors | 508,802 | 6.50%
Tongpoon-Patanasorn (2018) | A sub-corpus of the Khon Kaen University Business English (KKU BE) Corpus | 10,093,425 | 10.52%

Table 1: A summary of some recent studies investigating the AWL in different text types

Studies of the AWL in different contexts have reported widely varying coverage. Two studies, for example, found that the AWL accounted for less than 5% of their corpora (Konstantakis, 2007; Mozaffari & Moini, 2014). By contrast, Shabani and Tazik (2014), who investigated AWL items in 80 research articles (320,310 running words) selected from two Asian EFL and ESP journals, reported that the AWL covered about 14.89% of their corpus. Konstantakis (2007) in particular showed that GSL and AWL words together provided 90% coverage of a 600,000-word corpus of business English course books, with the AWL accounting for only 4.66% of this coverage. By establishing a Business Word List, the same study identified high-frequency items in the corpus that provided an additional 2.79% coverage. It should be noted, however, that because these studies analyzed relatively small corpora, their results may be biased: corpus size is crucial for the occurrence of some lexical items (Sinclair, 2005). Corpus size matters especially in studies of specialized and academic vocabulary, because, unlike high-frequency vocabulary, these items occur much less often, even in specialized domains, and may be under-represented in a small sample.

In another study, Hyland and Tse (2007) explored the distribution of AWL word families in a multi-genre, multi-discipline corpus of 3.3 million words that was compiled on principled criteria and balanced across disciplines. Making a strong case against a common core approach to identifying and classifying academic vocabulary, the study concluded that “although the AWL covers 10.6% of the corpus, individual lexical items on the list often occur and behave in different ways across disciplines in terms of range, frequency, collocation, and meaning” (p. 235). The authors also argued that, despite its merits and considerable coverage of academic texts in different genres, the AWL “might not be as general as it was intended to be” (p. 235), so more restricted, discipline-based word lists are needed. In a more recent study reaching similar conclusions, Hajiyeva (2015) analyzed a 508,802-word corpus of subject-specific university textbooks for the frequency, distribution, and coverage of AWL word families and British National Corpus (BNC) frequency-based word families. AWL word families made up a very small proportion of the words in that corpus (6.5%), further supporting the claim made by Hyland and Tse (2007).

Furthermore, Li and Qian (2010) investigated the presence of AWL items in the Hong Kong Financial Services Corpus (HKFSC) and reported that the GSL and the AWL together covered about 83.09% of the tokens in the financial texts they analyzed. The 570 AWL word families covered around 10.46% of the 6,279,702 running words in this corpus. In another study of a multimillion-word corpus of finance texts, Tongpoon-Patanasorn (2018) found that AWL items covered about 10.52% of the finance sub-corpus (10,093,425 words) of the Khon Kaen University Business English (KKU BE) Corpus. A number of other studies have investigated the coverage of AWL items in research articles (Chen & Ge, 2007; Khani & Tazik, 2013; Martínez et al., 2009; Valipouri & Nassaji, 2013; Vongpumivitch et al., 2009). In one of the early studies in this category, Chen and Ge (2007) found that 292 of the 570 AWL word families were used frequently in medical research articles written in English, and that AWL words accounted for around 10.07% of their 190,425-word corpus. They also found that 111 AWL word families were used infrequently and 99 were never used in medical research articles. Furthermore, the high-frequency AWL items in medical research articles did not follow the frequency ordering of the sub-lists compiled by Coxhead (2000). An analysis of AWL items across the five sections of medical research articles (abstract, introduction, materials and methods, results, and discussion) showed that these items were dispersed throughout the articles and served varying rhetorical functions in different sections. These findings are in line with other studies concluding that the AWL is far from a complete academic vocabulary list for a wide range of subject areas and fields of study (Hajiyeva, 2015; Hyland & Tse, 2007).

In another study combining quantitative and qualitative analysis, Martínez et al. (2009) investigated academic vocabulary in agriculture research articles. They reported that the GSL and the AWL together accounted for about 76.59% of their 826,416-word corpus, with the AWL representing around 9.06% of the tokens. Moreover, 37.50% of the 3,107 AWL types did not occur at all in the corpus of agriculture research articles. Qualitative analysis also revealed that some AWL words had technical meanings in agriculture research articles. Similar to Chen and Ge (2007), Martínez et al. (2009) found that the use of AWL items varied considerably across the sections of research articles, with the lowest and the highest numbers of AWL types occurring in the results and the discussion sections, respectively. Most studies of the AWL in various corpora are quantitative (and hence limited), and the study by Martínez et al. (2009) in particular demonstrated that qualitative analysis is essential for understanding how AWL items behave in a given field.

Vongpumivitch et al. (2009) investigated the coverage of the AWL in applied linguistics research articles and reported that AWL items accounted for 11.17% of their corpus; however, the study did not report the combined coverage of GSL and AWL items. In another study of applied linguistics research articles, using a corpus of similar size, Khani and Tazik (2013) found that AWL items accounted for around 11.96% of all tokens and that the cumulative coverage of the GSL and the AWL reached 88%. This is higher than the 86.1% coverage reported by Coxhead (2000) and considerably higher than the 76.59% reported by Martínez et al. (2009). Valipouri and Nassaji (2013) investigated the frequency and distribution of the AWL in a corpus of 1,185 chemistry research articles containing four million words. They found that 327 of the 570 AWL word families occurred frequently in the corpus and that AWL items accounted for about 9.96% of all tokens. Moreover, non-GSL/AWL items accounted for about 24.57% of all tokens, which means that the two lists together covered approximately 75% of the tokens in the 4-million-word corpus.

Taken together, these studies indicate that the AWL covers around 10% of most academic texts (Coxhead, 2000; Coxhead & Byrd, 2007), although its coverage varies considerably across disciplines. There therefore remains a need to investigate the vocabulary profile of academic texts in less studied disciplines. Because psychology has been neglected in vocabulary studies, the current study aims to fill this gap by investigating the distribution and frequency of AWL (and non-GSL/AWL) items in psychology research articles.

The Study

Coxhead and Nation (2001) divided English vocabulary into four categories: (1) high-frequency or general service vocabulary, (2) academic vocabulary, (3) technical vocabulary, and (4) low-frequency vocabulary. Nation and Waring (1997) argued that beginning learners of English should focus on the 2,000 most frequent word families of English in the GSL, which, in their various forms, make up the majority of words in spoken and written language. For students in English for Academic Purposes (EAP) programs, a major source of difficulty is academic vocabulary (Li & Pemberton, 1994). According to Farrell (1990), academic or semi-technical vocabulary falls somewhere between technical and general words and can be viewed as “formal, context-independent words with a high frequency and/or wide range of occurrence across scientific disciplines, not usually found in basic general English courses; words with high frequency across scientific disciplines” (p. 11).

The current study aimed to develop an academic word list for psychology students. To this end, it investigated the lexical profile of psychology research articles based on the GSL (West, 1953) and the AWL (Coxhead, 2000). The following research questions were addressed:

  1. What is the coverage of the AWL in a corpus of psychology research articles?
  2. Which AWL items occur most frequently in psychology research articles?
  3. Which lexical items occur frequently in psychology research articles but are not included in the GSL or the AWL?

The Corpus

The current study adopts the criteria proposed by Sinclair (2005) in terms of size, balance, and representativeness. First, AntCorGen (Anthony, 2019), a freeware tool for creating discipline-specific corpora, was used to create a corpus of 20,000 psychology research articles containing around 143,000,000 words. This very large corpus was representative of the experimental research article genre (Swales, 1990) in psychology and contained articles from all sub-areas of the discipline, including cognitive, developmental, and social psychology. To create a more manageable corpus for further analysis, every research article was assigned a number, and 8,500 of the 20,000 articles (approximately 74,000,000 running words) were selected at random. These articles were then randomly divided into 20 sub-corpora, each containing 425 research articles and around 3,700,000 running words. For the purposes of the current study, all sections of the psychology research articles, including abstracts, main text (introduction, materials and methods, results, discussion, and conclusion), references, and appendices, were collected and analyzed.
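The sampling step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' actual procedure: the function name `sample_and_split` is hypothetical, the fixed seed is an assumption added only to make the split reproducible, and article numbers stand in for the real article files.

```python
import random


def sample_and_split(article_ids, n_sample=8500, n_groups=20, seed=2019):
    """Randomly draw n_sample articles and divide them into n_groups equal sub-corpora.

    The seed is an illustrative assumption; the study does not report one.
    """
    if n_sample % n_groups:
        raise ValueError("n_sample must divide evenly into n_groups")
    rng = random.Random(seed)
    # Sampling without replacement; the returned order is already random,
    # so consecutive slices form randomly composed sub-corpora.
    sample = rng.sample(article_ids, n_sample)
    size = n_sample // n_groups  # 8,500 / 20 = 425 articles per sub-corpus
    return [sample[i * size:(i + 1) * size] for i in range(n_groups)]


# Number every article in the 20,000-article corpus (1 to 20,000).
article_ids = list(range(1, 20001))
sub_corpora = sample_and_split(article_ids)
print(len(sub_corpora), len(sub_corpora[0]))  # 20 425
```

Because `random.sample` draws without replacement, no article can end up in two sub-corpora.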

Software for Analysis

The computer software used for lexical profiling of psychology research articles in this study was AntWordProfiler (Anthony, 2014), which is a freeware tool available for analyzing the vocabulary level and the complexity of texts. The GSL and the AWL are the default word lists that come with AntWordProfiler. The software compares the texts loaded into the program against a set of vocabulary level lists and generates vocabulary statistics and complete frequency information about the corpus.
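To make the idea of lexical profiling concrete, the following Python sketch assigns each token to the first level list that contains it and reports the percentage of tokens covered by each list. It is a simplified illustration under stated assumptions, not AntWordProfiler's implementation: the regular-expression tokenizer and the word lists are hypothetical toy subsets, whereas the real GSL and AWL contain full word families.

```python
import re
from collections import Counter


def profile(text, level_lists):
    """Return the percentage of tokens covered by each level list.

    level_lists maps a list name to a set of word types; lists are checked
    in insertion order, and unmatched tokens are counted as Non-GSL/AWL.
    """
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer (assumption)
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        for name, words in level_lists.items():
            if token in words:
                counts[name] += 1
                break
        else:  # no list matched this token
            counts["Non-GSL/AWL"] += 1
    return {name: 100 * n / len(tokens) for name, n in counts.items()}


# Toy subsets for illustration only, not the real lists.
toy_lists = {
    "GSL": {"the", "rated", "more"},
    "AWL": {"participants", "visual", "accurately"},
}
print(profile("The participants rated the visual stimuli more accurately", toy_lists))
# {'GSL': 50.0, 'AWL': 37.5, 'Non-GSL/AWL': 12.5}
```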

Data Analysis

For the purpose of this study, the frequency and distribution of word families and types in the corpus were analyzed against the GSL and the AWL. The output of AntWordProfiler was then used to identify frequently used general service and academic vocabulary, as well as frequently used non-GSL/AWL items, in psychology research articles. Following Coxhead (2000), three criteria were used for this identification: range, frequency, and specialized occurrence. For range, only AWL (and non-GSL/AWL) words that occurred in all 20 sub-corpora were included in the list of the most frequent items in psychology research articles. For frequency, word forms and types had to occur at least 28.5 times per million words (2,100 times in the entire corpus and at least 105 times in each of the 20 sub-corpora) to be included in the list of high-frequency items. For specialized occurrence, the selected items had to fall outside the most frequently occurring word families of English as listed in the GSL (West, 1953).
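The three selection criteria can be expressed as a single filter. The Python sketch below is illustrative only: the function name `meets_criteria`, the toy GSL set, and the per-sub-corpus counts are hypothetical, while the thresholds are the ones stated above. Note that requiring 105 occurrences in each of the 20 sub-corpora already implies 105 × 20 = 2,100 occurrences overall, which over about 74 million words is roughly 28.4 per million, close to the stated threshold.

```python
N_SUBCORPORA = 20


def meets_criteria(word, sub_counts, gsl_types, min_per_sub=105, min_total=2100):
    """Check one word against the range, frequency and specialized-occurrence criteria.

    sub_counts holds the word's frequency in each of the 20 sub-corpora.
    """
    # Range: the word must occur in every one of the 20 sub-corpora.
    if len(sub_counts) != N_SUBCORPORA or min(sub_counts) == 0:
        return False
    # Frequency: at least 105 occurrences per sub-corpus and 2,100 overall.
    if min(sub_counts) < min_per_sub or sum(sub_counts) < min_total:
        return False
    # Specialized occurrence: the word must lie outside the GSL.
    return word not in gsl_types


toy_gsl = {"the", "show"}  # toy subset, not the real GSL
print(meets_criteria("stimuli", [4000] * 20, toy_gsl))      # True
print(meets_criteria("the", [90000] * 20, toy_gsl))         # False: in the GSL
print(meets_criteria("rare", [105] * 19 + [0], toy_gsl))    # False: missing from one sub-corpus
```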

A major decision in developing word lists is the unit of counting: tokens, types, lemmas, or families. A common approach in corpus studies is to use word families, defined as a base word plus its inflected forms and transparent derivations (Bauer & Nation, 1993). For example, anticipate, anticipated, anticipates, anticipating, anticipation, anticipations, anticipatory, and unanticipated all belong to a single word family with anticipate as the headword. The underlying assumption is that knowledge of the base word facilitates understanding of its derived forms (Coxhead, 2000; Xue & Nation, 1984). This view has recently been challenged, however, and several studies have questioned the usefulness of word families as a counting unit and used lemmas instead (Brezina & Gablasova, 2015; Gardner & Davies, 2014; Lei & Liu, 2016). A particular concern for learners of English as a foreign language (EFL) is that headword-based lists assume that, for less proficient learners, knowing one family member contributes to knowledge of all other members of the family, an assumption that is misleading (Ward, 2009). Moreover, the AWL headwords expand to around 3,000 word types, which makes them even more difficult for EFL learners to learn out of context. With these considerations in mind, Coxhead’s (2018) word family was used as the unit of analysis for measuring the coverage of GSL and AWL items in psychology research articles, whereas high-frequency word types (single word forms) were used to create a more restricted and pedagogically useful Psychology Academic Word List (see Appendix).
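The difference between the two units of counting can be illustrated with a short Python sketch. The family membership comes from the anticipate example in the text; the type frequencies are hypothetical, and the helper `family_counts` is an illustrative name, not part of any tool used in the study.

```python
from collections import Counter

# The anticipate family from the example in the text: every member maps to its headword.
ANTICIPATE_FAMILY = ["anticipate", "anticipated", "anticipates", "anticipating",
                     "anticipation", "anticipations", "anticipatory", "unanticipated"]
FAMILY_OF = {member: "anticipate" for member in ANTICIPATE_FAMILY}


def family_counts(type_counts, family_of):
    """Collapse type frequencies into family frequencies.

    Types not found in family_of are treated as families of their own.
    """
    totals = Counter()
    for word_type, n in type_counts.items():
        totals[family_of.get(word_type, word_type)] += n
    return totals


# Hypothetical type frequencies, for illustration only.
types = Counter({"anticipated": 30, "anticipation": 12, "unanticipated": 3})
families = family_counts(types, FAMILY_OF)
print(len(types), "types ->", len(families), "family:", dict(families))
# 3 types -> 1 family: {'anticipate': 45}
```

A family-based count credits all 45 occurrences to a single headword, whereas a type-based list keeps the three forms apart, so each form has to earn its place on the list by its own frequency.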

Finally, the study addresses validity through the principled compilation of a corpus of psychology research articles in terms of size, balance, and representativeness (Sinclair, 2005), and addresses reliability by analyzing the data computationally, which is far more accurate and faster than manual analysis. Moreover, most similar studies of the lexical profiles of different corpora have used the Range software (Coxhead, 2000), which was developed nearly two decades ago and has not been updated since. AntWordProfiler (Anthony, 2014) is currently the best software available for lexical profiling of texts, offering a better analysis of data and some additional useful features for researchers (for more information see: https://www.laurenceanthony.net/software/antwordprofiler).

Results and Discussion

The current study focused on (1) profiling the frequency and coverage of the GSL and the AWL in psychology research articles, (2) identifying the most useful high-frequency academic words for the psychology discipline, and (3) identifying frequently occurring words in psychology research articles that are not included in the GSL or the AWL. The following subsections present and discuss the results for each of these goals.

Coverage of the GSL and the AWL in the Corpus

Table 2 shows the overall lexical profile of the corpus of psychology research articles analyzed in this study. GSL word families accounted for about 72.08% of the 74,016,481 tokens in the corpus: the first 1,000 most frequent word families of English in this list accounted for 66.14%, while the second 1,000 word families covered only 5.94%. AWL word families accounted for 9,708,661 tokens, or 13.12% of the corpus, and together with the GSL, the cumulative coverage of the two word lists reached 85.2%. Almost all AWL word families (569 of 570; Table 2) were used in psychology research articles written in English. Finally, non-GSL/AWL items constituted 10,957,327 tokens, or 14.8% of the corpus.

Word list | Tokens | Token % | Cumulative token % | Types | Groups
1st GSL | 48,953,298 | 66.14 | 66.14 | 3,982 | 998
2nd GSL | 4,397,195 | 5.94 | 72.08 | 3,371 | 985
AWL | 9,708,661 | 13.12 | 85.20 | 2,942 | 569
Non-GSL/AWL | 10,957,327 | 14.80 | 100.00 | 142,818 | 142,818
TOTAL | 74,016,481 | | | |

Table 2: Coverage of GSL and AWL in the larger psychology research articles corpus
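Because the four rows of Table 2 partition the corpus, the percentages can be checked directly from the token counts. The following Python sketch is only a consistency check on the published figures, not part of the original analysis; the names `TABLE_2_TOKENS` and `coverage_rows` are illustrative.

```python
# Token counts from Table 2, in the order the lists are applied.
TABLE_2_TOKENS = {
    "1st GSL": 48_953_298,
    "2nd GSL": 4_397_195,
    "AWL": 9_708_661,
    "Non-GSL/AWL": 10_957_327,
}


def coverage_rows(token_counts):
    """Return (list name, token %, cumulative token %) for each word list."""
    total = sum(token_counts.values())
    rows, cumulative = [], 0.0
    for name, n in token_counts.items():
        share = 100 * n / total
        cumulative += share
        rows.append((name, share, cumulative))
    return rows


print(sum(TABLE_2_TOKENS.values()))  # 74016481
for name, share, cumulative in coverage_rows(TABLE_2_TOKENS):
    print(f"{name:12} {share:6.2f} {cumulative:7.2f}")
```

Run as is, it reproduces the token and cumulative token percentages of Table 2 to two decimal places (66.14, 5.94, 13.12, and 14.80; cumulatively 66.14, 72.08, 85.20, and 100.00).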

A comparison with previous studies indicates that the coverage of AWL items in psychology research articles is higher than in research articles from several other disciplines. For example, the AWL coverage of 13.12% in this study is higher than the 11.17% reported by Vongpumivitch et al. (2009) and the 11.96% reported by Khani and Tazik (2013) for applied linguistics research articles. It is also considerably higher than the AWL coverage of 10.07% for medical research articles (Chen & Ge, 2007), 9.06% for agriculture research articles (Martínez et al., 2009), and 9.96% for chemistry research articles (Valipouri & Nassaji, 2013). One explanation for this higher coverage may be that almost all AWL word families were used in the psychology research articles analyzed in this study, whereas in the study by Martínez et al. (2009), for example, 37.50% of the AWL items did not occur at all in the corpus of agriculture research articles. The cumulative coverage of GSL and AWL items in psychology research articles also differs from that reported in previous studies of agriculture, applied linguistics, and chemistry research articles. Both lists together accounted for 85.2% of all tokens in the current corpus, which is considerably higher than the 76.59% for agriculture research articles (Martínez et al., 2009) and the 75% for chemistry research articles (Valipouri & Nassaji, 2013). This level of coverage is, however, lower than the 88% reported for applied linguistics research articles (Khani & Tazik, 2013).

Frequently Used AWL Items in the Corpus

Of the 570 AWL word families, 472 met the criteria set for this study for frequently used items in psychology research articles. These 472 word families accounted for about 12.98% of all tokens in the corpus, which means that the remaining 98 AWL word families covered only 0.14% of the tokens. Table 3 displays the 50 most frequent AWL word families in the corpus, which together accounted for about 5.71% of all tokens. The 100 most frequent AWL word families accounted for 8% of the tokens in the corpus, a substantial share for less than a fifth of the list. At the level of word types, 842 AWL word types occurred frequently in the corpus (see the Appendix for the full list), accounting for 8,767,820 tokens, or around 11.84% of the corpus.

Rank | Headword | Frequency | AWL sub-list | Rank | Headword | Frequency | AWL sub-list
1 | participate | 323,527 | 2 | 26 | consist | 73,256 | 1
2 | significant | 192,933 | 1 | 27 | affect | 70,210 | 2
3 | analyse | 176,985 | 1 | 28 | identify | 66,359 | 1
4 | task | 174,505 | 3 | 29 | method | 65,630 | 1
5 | respond | 167,710 | 3 | 30 | predict | 65,045 | 4
6 | vary | 153,543 | 1 | 31 | estimate | 58,143 | 1
7 | data | 139,776 | 1 | 32 | range | 56,455 | 2
8 | individual | 129,132 | 1 | 33 | statistic | 54,529 | 4
9 | process | 104,675 | 1 | 34 | error | 53,771 | 4
10 | visual | 103,851 | 8 | 35 | select | 51,368 | 2
11 | factor | 102,717 | 1 | 36 | image | 50,341 | 5
12 | indicate | 96,124 | 1 | 37 | outcome | 49,216 | 3
13 | item | 94,806 | 2 | 38 | evident | 47,623 | 1
14 | research | 93,842 | 1 | 39 | bias | 46,472 | 8
15 | perceive | 90,681 | 2 | 40 | hypothesis | 46,266 | 4
16 | function | 83,572 | 1 | 41 | category | 46,243 | 2
17 | interact | 83,139 | 3 | 42 | evaluate | 45,506 | 2
18 | positive | 82,258 | 2 | 43 | investigate | 45,149 | 4
19 | journal | 80,149 | 2 | 44 | accurate | 44,752 | 6
20 | assess | 80,089 | 1 | 45 | context | 44,416 | 1
21 | negate | 79,016 | 3 | 46 | contrast | 43,565 | 4
22 | target | 77,922 | 5 | 47 | distribute | 41,704 | 1
23 | specific | 75,638 | 1 | 48 | stress | 41,511 | 4
24 | previous | 74,505 | 2 | 49 | structure | 41,215 | 1
25 | similar | 73,474 | 1 | 50 | proceed | 40,057 | 1

Table 3: The 50 most frequent AWL families in psychology research articles

As shown in Table 3, 22 of the 50 most frequent AWL families in psychology research articles belong to Coxhead’s (2000) first sub-list, 11 to the second, five to the third, and seven to the fourth. Some AWL word families that occurred very frequently in psychology research articles belong to later sub-lists in Coxhead (2000), for example visual and bias (sub-list 8) and accurate (sub-list 6). A comparison with the results reported by Hyland and Tse (2007) shows that five of the ten most frequent AWL headwords in their corpus also appear among the ten most frequent AWL items in psychology research articles: significant, analyze, vary, data, and process, which seem to be common to most academic discourse. Significant, analyze, and data were also among the ten most frequent AWL items in agriculture research articles (Martínez et al., 2009), and 14 of the top 50 AWL families in that study are shared with psychology research articles. There were fewer shared items with chemistry research articles: only 12 of the top 50 AWL headwords identified by Valipouri and Nassaji (2013) are also among the top 50 in psychology research articles. These findings further support the claims of previous studies regarding the impracticality of a common core word list for a variety of disciplines and fields of study (Hyland & Tse, 2007; Martínez et al., 2009), and underscore the need for more restricted, needs-based word lists for different groups of learners. Nonetheless, the AWL has considerable pedagogical value for teaching academic vocabulary in psychology, as it provided substantial coverage of the research articles analyzed in this study.
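The sub-list counts for the top 50 families can be re-derived from Table 3 with a few lines of Python. The list below simply transcribes the AWL sub-list column of Table 3 in rank order; it is a check on the table, not a tool used in the study, and the name `SUBLISTS_TOP_50` is illustrative.

```python
from collections import Counter

# AWL sub-list of each of the 50 families in Table 3, in rank order.
SUBLISTS_TOP_50 = [
    2, 1, 1, 3, 3, 1, 1, 1, 1, 8,   # ranks 1-10
    1, 1, 2, 1, 2, 1, 3, 2, 2, 1,   # ranks 11-20
    3, 5, 1, 2, 1, 1, 2, 1, 1, 4,   # ranks 21-30
    1, 2, 4, 4, 2, 5, 3, 1, 8, 4,   # ranks 31-40
    2, 2, 4, 6, 1, 4, 1, 4, 1, 1,   # ranks 41-50
]

counts = Counter(SUBLISTS_TOP_50)
for sublist in sorted(counts):
    print(f"Sub-list {sublist}: {counts[sublist]} families")
```

It reports 22, 11, 5, 7, 2, 1, and 2 families for sub-lists 1, 2, 3, 4, 5, 6, and 8 respectively; sub-list 5 is represented by target and image.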

Frequently Used non-GSL/AWL Items in Psychology Research Articles

The corpus analysis revealed that 693 word types outside the GSL and the AWL occurred frequently in psychology research articles and met the criteria set for the current study. These 693 types accounted for 4,492,608 tokens, a cumulative coverage of around 6.1% of the corpus. Table 4 shows the frequency information for the 20 most frequent non-GSL/AWL word types in the corpus. The 10 most frequent types (stimuli, non, scores, patients, stimulus, cognitive, emotional, score, correlation, and emotion) together occurred 605,674 times in the corpus and accounted for about 0.82% of all tokens.

Rank  Word type      Frequency    Rank  Word type       Frequency
1     stimuli        82,151       11    symptoms        33,864
2     non            76,065       12    clinical        28,764
3     scores         69,488       13    baseline        27,794
4     patients       68,432       14    personality     27,590
5     stimulus       63,731       15    spatial         27,465
6     cognitive      63,225       16    temporal        26,893
7     emotional      58,478       17    emotions        26,703
8     score          53,233       18    ratings         26,627
9     correlation    35,530       19    questionnaire   26,276
10    emotion        35,341       20    auditory        25,727

Table 4: The 20 most frequent non-GSL/AWL word types in psychology research articles
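The coverage percentages in this section are token counts divided by corpus size. The sketch below reproduces that arithmetic for the ten most frequent types in Table 4. It assumes a corpus of exactly 74,000,000 tokens, which only approximates the "around 74 million words" reported, so the results are rounded. On that assumption, the 4,492,608 tokens of all 693 non-GSL/AWL types come to roughly 6.1%, the figure given in the abstract.

```python
# Frequencies of the ten most frequent non-GSL/AWL types (Table 4, ranks 1-10)
TOP_10 = {
    "stimuli": 82_151, "non": 76_065, "scores": 69_488, "patients": 68_432,
    "stimulus": 63_731, "cognitive": 63_225, "emotional": 58_478,
    "score": 53_233, "correlation": 35_530, "emotion": 35_341,
}

# Assumption: corpus size approximated as 74,000,000 tokens
CORPUS_TOKENS = 74_000_000

def coverage_percent(tokens, corpus_tokens=CORPUS_TOKENS):
    """Percentage of all running words (tokens) accounted for by a set of items."""
    return 100 * tokens / corpus_tokens

top10_tokens = sum(TOP_10.values())
print(top10_tokens)                              # 605674
print(round(coverage_percent(top10_tokens), 2))  # 0.82
print(round(coverage_percent(4_492_608), 1))     # 6.1 (all 693 non-GSL/AWL types)
```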

Further analysis of the non-GSL/AWL items in the output data also revealed a considerable number of non-word items (e.g., signific, correl, …), which were probably caused by the way the AntCorGen software (Anthony, 2019) generated the corpus (i.e., by collecting research articles from the PLOS database and converting them into text files). These items, DOI numbers, and other non-word strings accounted for about 2.58% of the corpus. Finally, when the non-GSL/AWL items were checked against BNC-COCA lists 31 and 32, which cover abbreviations and proper nouns, around 4% of all tokens in the corpus were found to fall into these categories.
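One way to separate such extraction debris from genuine words is a simple heuristic: DOI-like or letter-free strings are treated as debris, and an unrecognized string that is the beginning of a known word (signific from significant) is treated as a likely truncation. This sketch is not the procedure used in the study. The `KNOWN` set is a hypothetical stand-in for a real reference vocabulary (for example, the GSL, the AWL, and the BNC-COCA lists combined).

```python
import re

# Assumption: DOI-like strings look like "10.<4-9 digit registrant>/<suffix>"
DOI_LIKE = re.compile(r"10\.\d{4,9}/\S+")

def classify_type(word, known_words):
    """Give an off-list word type a rough label (hypothetical heuristic).

    'debris'   : DOI-like strings or strings containing no letters
    'word'     : found in the reference vocabulary
    'fragment' : unknown, but the beginning of a known word (likely truncated)
    'unknown'  : anything else, left for manual inspection
    """
    w = word.lower()
    if DOI_LIKE.fullmatch(w) or not any(ch.isalpha() for ch in w):
        return "debris"
    if w in known_words:
        return "word"
    if any(known.startswith(w) for known in known_words):
        return "fragment"
    return "unknown"

# Hypothetical reference vocabulary; in practice this would be far larger
KNOWN = {"significant", "correlation", "stimuli", "score", "scores"}

for item in ["signific", "correl", "stimuli", "10.1234/abc.5678", "2019", "fmri"]:
    print(item, classify_type(item, KNOWN))
```

With a large reference vocabulary, very short strings would match the prefix test too easily, so a minimum length for the "fragment" label would be a sensible addition.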

Implications for Teaching Vocabulary

The findings of this study have several implications for teaching vocabulary to psychology students. First, the coverage of AWL items in psychology research articles is considerable: the AWL accounted for about 13.12% of the tokens in the corpus, and 472 of its 570 word families occurred frequently. The AWL should therefore be regarded as a valuable pedagogical resource for EAP students in psychology, with considerable potential to support them in reading and, probably, writing psychology research articles. However, this study also found that some AWL items (98 families) are used very infrequently in psychology research articles, together accounting for only 0.14% of the corpus. This means that although materials based on the AWL (e.g., Huntley, 2006; Schmitt & Schmitt, 2005; Wells, 2007) can help psychology students a great deal, a better approach is to focus on the items most relevant to the discipline of psychology. The findings of this study can help teachers in EAP programs select the appropriate word types from the AWL and focus their teaching on those items according to students’ needs.
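Selecting the AWL families that matter for a given discipline can be framed as splitting a family-frequency table at a cut-off. In the sketch below, `split_by_frequency`, `min_freq`, and the counts are hypothetical toy values chosen for illustration; the study's own selection criteria are defined elsewhere in the article and are not reproduced here.

```python
def split_by_frequency(family_freqs, min_freq):
    """Split {family: frequency} into (frequent, infrequent) headword lists.

    `min_freq` is a hypothetical cut-off. Frequent families are returned most
    frequent first (ties alphabetically); infrequent ones alphabetically.
    """
    frequent = sorted(
        (fam for fam, n in family_freqs.items() if n >= min_freq),
        key=lambda fam: (-family_freqs[fam], fam),
    )
    infrequent = sorted(fam for fam, n in family_freqs.items() if n < min_freq)
    return frequent, infrequent

# Toy counts for illustration only (not frequencies from the study's corpus)
toy_counts = {"significant": 900, "data": 500, "levy": 3, "method": 450}
frequent, infrequent = split_by_frequency(toy_counts, min_freq=10)
print(frequent)    # ['significant', 'data', 'method']
print(infrequent)  # ['levy']
```

A teacher could then build materials from the `frequent` list in rank order, leaving the `infrequent` families for later or optional study.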

Second, this study revealed that some non-GSL/AWL items occur with high frequency in psychology research articles but are not included in either list. For international and non-native English-speaking psychology students, these words, which are highly relevant to the discipline but less frequent in everyday English, pose a learning burden that teachers need to consider (Ward, 2009; Yang, 2015). Because these items occur so often in psychology research articles, teachers and students would gain considerable value from investing time in mastering them. Moreover, the results of this study further support the need for more restricted, discipline-specific vocabulary lists that serve the needs of specific groups of students. The list provided in the Appendix includes 1,537 word types that occurred frequently in the 74-million-word corpus of psychology research articles used in this study. These word types accounted for about 17.91% of all tokens in the corpus, which means that slightly more than one in every six running words in psychology research articles is a member of this list.
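Because the Appendix list is ranked by frequency, its coverage can be read off a cumulative curve (in the study, the first 570 types give 13.44% and all 1,537 give 17.91%). The sketch below computes such a curve from a list of type frequencies and converts a coverage figure into "one in every N running words"; the function names and the toy frequencies are illustrative.

```python
from itertools import accumulate

def cumulative_coverage(type_freqs, corpus_tokens):
    """Cumulative % coverage after adding the 1st, 2nd, ... most frequent types."""
    ranked = sorted(type_freqs, reverse=True)
    return [100 * total / corpus_tokens for total in accumulate(ranked)]

def one_in_n(coverage_percent):
    """Express a coverage percentage as 'one in every N running words'."""
    return 100 / coverage_percent

# Toy example: three types in a 100-token text
print(cumulative_coverage([10, 30, 20], corpus_tokens=100))  # [30.0, 50.0, 60.0]

# The full Appendix list covers 17.91% of the corpus
print(round(one_in_n(17.91), 1))  # 5.6
```

Since N is about 5.6, just under 6, the list covers slightly more than one in every six running words. With the real frequency list, the 570th value of the curve would correspond to the 13.44% figure.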

Furthermore, our analysis revealed that the 570 highest-ranking AWL and non-GSL/AWL word types from this list provided 13.44% coverage of the corpus, which is higher than the coverage of the 570 AWL word families combined. This finding is noteworthy because learning these items, even in isolation or through list learning, is less challenging for students than learning the 570 AWL word families, which expand into around 3,000 word types. Finally, when combined with the GSL, these 1,537 word types provided about 90% coverage of the corpus, and when proper nouns and abbreviations were added, the overall coverage reached around 94%. The list provided in the Appendix can therefore be regarded as an academic word list for the discipline of psychology, with considerable pedagogical value in helping psychology students and their EAP instructors set vocabulary learning goals aligned with their disciplinary needs.
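Coverage figures for combined lists (the GSL plus the psychology list, then proper nouns and abbreviations) should count each token once, even when its form appears in more than one list. The following is a minimal sketch, assuming every list has already been expanded into a set of lower-cased word forms. The sentence and the three small sets are toy stand-ins, not the real lists.

```python
def combined_coverage(tokens, *word_lists):
    """Percentage of tokens whose lower-cased form is in at least one word list.

    Each token is counted once, so forms shared by several lists are not
    double-counted.
    """
    if not tokens:
        return 0.0
    covered = set().union(*word_lists)
    hits = sum(1 for token in tokens if token.lower() in covered)
    return 100 * hits / len(tokens)

# Toy data: a ten-token sentence and three tiny stand-in lists
tokens = "Smith showed the stimuli to the participants at baseline twice".split()
toy_general = {"showed", "the", "to", "at", "twice"}         # stands in for the GSL
toy_psych = {"the", "stimuli", "participants", "baseline"}  # stands in for the Appendix list
toy_proper = {"smith"}                                      # proper nouns/abbreviations

print(combined_coverage(tokens, toy_general))                         # 60.0
print(combined_coverage(tokens, toy_general, toy_psych))              # 90.0
print(combined_coverage(tokens, toy_general, toy_psych, toy_proper))  # 100.0
```

The toy psychology set alone covers 50%. Simply adding that to the 60% from the general set would give 110%, because "the" belongs to both sets. Taking the union of the sets gives the correct 90%.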

According to Webb and Nation (2017), certain conditions are needed for vocabulary learning to take place, including meaningful repetition and significant encounters with target words. Besides published materials based on the AWL (e.g., Huntley, 2006; Schmitt & Schmitt, 2005; Wells, 2007), recent developments in information and communication technology (ICT) provide students and teachers with more tools and opportunities for learning and teaching vocabulary. Li and Qian (2010) recommend the AWL Highlighter and the AWL Gapmaker as two applications for learning the AWL. With the growing importance of mobile technologies in foreign language learning and teaching, and the numerous affordances they provide (Godwin-Jones, 2017; Reinhardt, 2018), there are further possibilities for integrating them into language learning programs. AWL Builder Multilingual, a free application developed by EFL Technologies for Android devices (available in the Google Play Store), is one example of the tools available for teaching academic vocabulary. The application allows learners to select specific target words from the 570 AWL word families to study, and it uses intelligent flashcard technology to help them learn and review the selected items, with definitions provided in simple English. It also keeps detailed records of learning progress, which can be emailed to teachers. Using this application to master the AWL items that occur frequently in psychology research articles can therefore be of considerable help to psychology students.
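The application's "intelligent flashcard technology" is not described in detail, so the sketch below uses the well-known Leitner box system as a stand-in. It shows how spaced review can schedule the repeated encounters with target words that Webb and Nation (2017) describe. This is not the algorithm used by AWL Builder Multilingual; the interval lengths and the `Card`, `review`, and `due_cards` names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical review intervals (in days) for Leitner boxes 0-4
INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Card:
    word: str
    box: int = 0   # current Leitner box (index into INTERVALS)
    due: int = 0   # day on which the card should next be reviewed

def review(card: Card, correct: bool, today: int) -> None:
    """Promote a recalled card one box (capped at the last); send a missed card to box 0."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if correct else 0
    card.due = today + INTERVALS[card.box]

def due_cards(deck, today):
    """Cards whose next review day has arrived."""
    return [card for card in deck if card.due <= today]

deck = [Card("analyze"), Card("data"), Card("bias")]
review(deck[0], correct=True, today=0)   # moves to box 1, due on day 2
review(deck[1], correct=False, today=0)  # stays in box 0, due on day 1
print([card.word for card in due_cards(deck, today=1)])  # ['data', 'bias']
```

Words that are recalled correctly reappear at growing intervals, while missed words return quickly. This concentrates repetition on the items a learner has not yet mastered.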

Conclusion

The current study investigated the frequency and coverage of AWL items in psychology research articles using a corpus of 74 million words. The findings indicated that AWL items accounted for 13.12% of all tokens in the corpus. Further analysis identified the AWL and non-GSL/AWL items used frequently in psychology research articles: 472 AWL word families and 693 word types outside the GSL and the AWL. The Appendix lists 1,537 word types with their frequency information in the corpus, which together provide a cumulative coverage of 17.91% of psychology research articles. While acknowledging the value of the AWL (Coxhead, 2000) as a pedagogical resource for EAP programs, the findings of this study provide further support for creating more discipline-specific word lists for various fields of study (Hyland & Tse, 2007), since the 570 highest-ranking word types (rather than families) in the new list provided higher coverage of the corpus than the 570 AWL word families.

The current study had some limitations. First, the corpus was compiled with AntCorGen (Anthony, 2019), which collects only open-access articles from the PLOS database. To compensate for this limitation, a very large corpus containing articles written by both native and non-native English speakers was created, and articles were then randomly selected from it, based on principled criteria, to compile a second corpus for further analysis. Second, this study was quantitative in nature, and the behavior of the AWL and other frequently used items in research articles was not examined qualitatively. Although the study provides a general picture of the lexical profile of psychology research articles, its findings offer no insight into how the AWL and high-frequency non-GSL/AWL items are used in psychology to perform rhetorical functions. Finally, following previous studies (Chen & Ge, 2007; Hajiyeva, 2015; Lei & Liu, 2016; Martínez et al., 2009; Muñoz, 2015; Shabani & Tazik, 2014; Valipouri & Nassaji, 2013), the GSL and the AWL were used as the base lists for analyzing psychology research articles. Although these lists provided considerable coverage of the corpus and the AWL is still a benchmark for most published EAP materials, there remains a need to investigate the coverage of newly developed word lists such as the New General Service Lists (Brezina & Gablasova, 2015; Browne et al., 2013b) and the New Academic Word List (NAWL) (Browne et al., 2013a) across various academic genres. Future studies could also combine quantitative and qualitative methods to provide a better picture of vocabulary use in specific genres and to develop more pedagogically sound approaches to teaching academic and disciplinary vocabulary to EAP and graduate students.

 

References

Anthony, L. (2014). AntWordProfiler (Version 1.4.1) [Computer software]. Waseda University. https://www.laurenceanthony.net/software

Anthony, L. (2019). AntCorGen (Version 1.1.2) [Computer software]. Waseda University. https://www.laurenceanthony.net/software

Bauer, L., & Nation, P. (1993). Word families. International Journal of Lexicography, 6(4), 253–279. https://doi.org/10.1093/ijl/6.4.253

Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the New General Service List. Applied Linguistics, 36(1), 1–22. https://doi.org/10.1093/applin/amt018

Browne, C., Culligan, B., & Phillips, J. (2013a). New Academic Word List (NAWL). http://www.newgeneralservicelist.org/nawl-new-academic-word-list

Browne, C., Culligan, B., & Phillips, J. (2013b). New General Service List (NGSL). http://www.newgeneralservicelist.org

Chen, Q., & Ge, G.-C. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles (RAs). English for Specific Purposes, 26(4), 502–514. https://doi.org/10.1016/j.esp.2007.04.003

Corson, D. (1997). The learning and use of academic English words. Language Learning, 47(4), 671–718. https://doi.org/10.1111/0023-8333.00025

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–236. https://doi.org/10.2307/3587951

Coxhead, A. (2011). The Academic Word List 10 years on: Research and teaching implications. TESOL Quarterly, 45(2), 355–362. https://doi.org/10.5054/tq.2011.254528

Coxhead, A. (2018). Vocabulary and English for specific purposes research. Routledge. https://doi.org/10.4324/9781315146478

Coxhead, A., & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar of academic prose. Journal of Second Language Writing, 16(3), 129–147. https://doi.org/10.1016/J.JSLW.2007.07.002

Coxhead, A., & Nation, P. (2001). The specialised vocabulary of English for academic purposes. In J. Flowerdew & M. Peacock (Eds.), Research perspectives on English for academic purposes (pp. 252–267). Cambridge University Press.

Csomay, E., & Prades, A. (2018). Academic vocabulary in ESL student papers: A corpus-based study. Journal of English for Academic Purposes, 33, 100–118. https://doi.org/10.1016/j.jeap.2018.02.003

Dang, T. N. Y., & Webb, S. (2014). The lexical profile of academic spoken English. English for Specific Purposes, 33, 66–76. https://doi.org/10.1016/j.esp.2013.08.001

Durrant, P. (2017). Lexical bundles and disciplinary variation in university students’ writing: Mapping the territories. Applied Linguistics, 38(2), 165–193. https://doi.org/10.1093/applin/amv011

Eldridge, J. (2008). “No, there isn’t an ‘academic vocabulary,’ but…”: A reader responds to K. Hyland and P. Tse’s “Is there an ‘academic vocabulary’?” TESOL Quarterly, 42(1), 109–113. https://doi.org/10.1002/j.1545-7249.2008.tb00210.x

Farrell, P. (1990). Vocabulary in ESP - A lexical analysis of the English of electronics and a study of semi-technical vocabulary (CLCS Occasional Paper No. 25) (ED332551). Trinity College.

Gardner, D., & Davies, M. (2013). A new academic vocabulary list. Applied Linguistics, 35(3), 305–327. https://doi.org/10.1093/applin/amt015

Godwin-Jones, R. (2017). Smartphones and language learning. Language Learning & Technology, 21(2), 3–17. http://hdl.handle.net/10125/44607

Green, C., & Lambert, J. (2018). Advancing disciplinary literacy through English for academic purposes: Discipline-specific wordlists, collocations and word families for eight secondary subjects. Journal of English for Academic Purposes, 35, 105–115. https://doi.org/10.1016/j.jeap.2018.07.004

Hajiyeva, K. (2015). A corpus-based lexical analysis of subject-specific university textbooks for English majors. Ampersand, 2, 136–144. https://doi.org/10.1016/j.amper.2015.10.001

Hsu, W. (2013). Bridging the vocabulary gap for EFL medical undergraduates: The establishment of a medical word list. Language Teaching Research, 17(4), 454–484. https://doi.org/10.1177/1362168813494121

Huntley, H. (2006). Essential academic vocabulary: Mastering the complete Academic Word List. Houghton Mifflin Company.

Hyland, K. (2002). Specificity revisited: How far should we go now? English for Specific Purposes, 21(4), 385–395. https://doi.org/10.1016/S0889-4906(01)00028-X

Hyland, K. (2006). English for Academic Purposes: An advanced resource book. Routledge.

Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41(2), 235–253. https://doi.org/10.1002/j.1545-7249.2007.tb00058.x

Khani, R., & Tazik, K. (2013). Towards the development of an academic word list for applied linguistics research articles. RELC Journal, 44(2), 209–232. https://doi.org/10.1177/0033688213488432

Konstantakis, N. (2007). Creating a business word list for teaching business English. Estudios de Lingüística Inglesa Aplicada (ELIA), 7, 79–102. http://revistas.uned.es/index.php/ELIA/article/download/18091/15242

Lei, L., & Liu, D. (2016). A new medical academic word list: A corpus-based study with enhanced methodology. Journal of English for Academic Purposes, 22, 42–53. https://doi.org/10.1016/j.jeap.2016.01.008

Li, S.-L., & Pemberton, R. (1994). An investigation of students’ knowledge of academic and subtechnical vocabulary. Proceedings Joint Seminar on Corpus Linguistics and Lexicology, Guangzhou and Hong Kong, pp. 183–196.

Li, Y., & Qian, D. D. (2010). Profiling the Academic Word List (AWL) in a financial corpus. System, 38(3), 402–411. https://doi.org/10.1016/j.system.2010.06.015

Martínez, I. A., Beck, S. C., & Panza, C. B. (2009). Academic vocabulary in agriculture research articles: A corpus-based study. English for Specific Purposes, 28(3), 183–198. https://doi.org/10.1016/j.esp.2009.04.003

Mozaffari, A., & Moini, R. (2014). Academic words in education research articles: A corpus study. Procedia - Social and Behavioral Sciences, 98, 1290–1296. https://doi.org/10.1016/j.sbspro.2014.03.545

Muñoz, V. L. (2015). The vocabulary of agriculture semi-popularization articles in English: A corpus-based study. English for Specific Purposes, 39, 26–44. https://doi.org/10.1016/j.esp.2015.04.001

Nation, P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition, and pedagogy (pp. 6–19). Cambridge University Press.

Paquot, M. (2007). Towards a productively oriented academic word list. In J. Walinski, K. Kredens, & S. Gozdz-Roszkowski (Eds.), Practical applications in language and computers 2005 (pp. 127–140). Peter Lang.

Reinhardt, J. (2018). Social media in second and foreign language teaching and learning: Blogs, wikis, and social networking. Language Teaching, 52(1), 1–39. https://doi.org/10.1017/S0261444818000356

Schmitt, D., & Schmitt, N. (2005). Focus on vocabulary: Mastering the academic word list. Longman.

Shabani, M. B., & Tazik, K. (2014). Coxhead’s AWL across ESP and Asian EFL Journal Research Articles (RAs): A corpus-based lexical study. Procedia - Social and Behavioral Sciences, 98, 1722–1728. https://doi.org/10.1016/j.sbspro.2014.03.599

Sinclair, J. (2005). Corpus and Text – Basic Principles. In M. Wynne (Ed.), Developing Linguistic Corpora: a Guide to Good Practice. Oxbow Books.

Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge University Press.

Tangpijaikul, M. (2014). Preparing business vocabulary for the ESP classroom. RELC Journal, 45(1), 51–65. https://doi.org/10.1177/0033688214522641

Tongpoon-Patanasorn, A. (2018). Developing a frequent technical words list for finance: A hybrid approach. English for Specific Purposes, 51, 45–54. https://doi.org/10.1016/j.esp.2018.03.002

Valipouri, L., & Nassaji, H. (2013). A corpus-based study of academic vocabulary in chemistry research articles. Journal of English for Academic Purposes, 12(4), 248–263. https://doi.org/10.1016/j.jeap.2013.07.001

Vongpumivitch, V., Huang, J.-Y., & Chang, Y.-C. (2009). Frequency analysis of the words in the Academic Word List (AWL) and non-AWL content words in applied linguistics research papers. English for Specific Purposes, 28(1), 33–41. https://doi.org/10.1016/j.esp.2008.08.003

Wang, J., Liang, S.-I, & Ge, G.-C. (2008). Establishment of a medical academic word list. English for Specific Purposes, 27(4), 442–458. https://doi.org/10.1016/j.esp.2008.05.003

Ward, J. (2009). A basic engineering English word list for less proficient foundation engineering undergraduates. English for Specific Purposes, 28(3), 170–182. https://doi.org/10.1016/j.esp.2009.04.001

Webb, S., & Nation, P. (2017). How vocabulary is learned. Oxford University Press.

Wells, L. (2007). Vocabulary mastery 1: Using and learning the Academic Word List. University of Michigan Press.

West, M. (1953). A general service list of English words. Longmans, Green.

Woodward-Kron, R. (2008). More than just jargon – the nature and role of specialist language in learning disciplinary knowledge. Journal of English for Academic Purposes, 7(4), 234–249. https://doi.org/10.1016/j.jeap.2008.10.004

Xue, G., & Nation, I. S. P. (1984). A university word list. Language Learning and Communication, 3, 215–229.

Yang, M.-N. (2015). A nursing academic word list. English for Specific Purposes, 37(1), 27–38. https://doi.org/10.1016/j.esp.2014.05.003  




MEXTESOL Journal, vol. 44, no. 3, 2020, is published every four months by the Asociación Mexicana de Maestros de Inglés, MEXTESOL, A.C., Versalles 15, Int. 301, Col. Juárez, Delegación Cuauhtémoc, C.P. 06600 Mexico, D.F., Mexico, Tel. (55) 55 66 87 49, mextesoljournal@gmail.com. Editor-in-Chief: Jo Ann Miller Jabbusch. Exclusive rights are reserved (No. 04-2015-092112295900-203, ISSN: 2395-9908), both granted by the Instituto Nacional de Derecho del Autor. JoAnn Miller, Asociación Mexicana de Maestros de Inglés, MEXTESOL, A.C., Versalles 15, Int. 301, Col. Juárez, Delegación Cuauhtémoc, C.P. 06600 Mexico, D.F., Mexico, is responsible for the most recent update of this issue. Date of last modification: 31/08/2015. The opinions expressed by the authors do not necessarily reflect those of the publication. Total or partial reproduction of the texts published here is authorized provided that the complete reference, including the URL of the publication, is cited.

License

MEXTESOL Journal applies the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license to everything we publish.