The Impact of Using Corpus-Based Approach on Self-Correction in Grade 10 EFL Students’ Writing
1. Introduction
Many EFL learners make errors in writing because of the differences between their native language and the target language. These errors need to be addressed. Teachers’ methods and techniques may prevent errors from occurring, facilitate students’ language acquisition, and enhance their English competence. Online corpora are one such tool. Advances in technology and in corpus linguistics have made corpora readily available to language teachers and learners. For example, corpus concordance programs give learners access to a huge amount of authentic language and create learning opportunities that were not available before (Aston, 2001). Through inductive learning, learners can observe grammar and vocabulary usage in concordance lines and work out the underlying rules. Moreover, learners can use corpora to test the rules they have learned and to classify the concordance data according to those rules and patterns.
According to Bennet (2010), a corpus is “a principled collection of authentic texts stored electronically” (p. 2). McEnery and Hardie (2012) also defined a corpus as a large “body” of texts that are stored electronically. An analyst can work with a corpus by loading it into corpus software and then applying methods such as frequency lists or concordance lists to show results. Corpora are used for statistical analysis, hypothesis testing, checking occurrences, or validating linguistic rules within a specific language. In linguistics, a corpus is defined as a “collection of linguistic data (usually contained in a computer database) used for research, scholarship and teaching” (Nordquist, 2020).
Johns (1991) proposed the data-driven learning (DDL) approach as one way to address this problem, for it is based on discovery-oriented learning. Qoura, Hassan and Mostafa (2018) claimed that through the corpus-based data-driven learning approach, students are able to access corpus examples and concordance lines, analyze several examples, and engage in autonomous learning. Flowerdew (2010) showed that corpus examples are useful in helping students recognize sentence patterns and correct lexicogrammatical errors. With the corpus-based approach, educators and students can engage with a corpus, that is, a vast collection of authentic language examples from everyday life. The examples originate from reliable sources such as books, papers, and conversations. Students are thus shown how native speakers use authentic language. By examining authentic language patterns, students can improve their command of grammar, vocabulary, and collocations. Additionally, students can assess their own writing or speaking by comparing it with a corpus. They can see how native speakers or skilled writers use these structures appropriately, which makes it easier to spot errors in collocations or article usage (Abdel-Haq & Ali, 2017). As a result, students gain learning autonomy: they can identify and correct their own errors because they are more conscious of their weaknesses and know how to fix them based on real language use. Therefore, the purpose of this study is to examine whether the use of a corpus-based approach can help students improve their writing performance, specifically their ability to correct lexicogrammatical errors.
Statement of the Problem
Many students struggle with lexicogrammatical errors in their writing because they fail to notice them and because their native language influences their use of the target language. This is clearly shown in their writing, especially in the use of collocations, word meaning and choice, lexical confusion, use of articles, and subject-verb agreement. As a result, students are unable to correct the errors, find the reasons behind them, or avoid them, because learning the relevant rules and usage is a challenging task. This leads to poor essays.
Based on actual data and observations from numerous tasks on student writing, errors in the following categories have been recognized in this study:
a) Adjective + Noun and Verb + Noun collocations
Because direct translation from their native language produces uncommon combinations, many students struggle with collocations. Students may write “take a decision” rather than “make a decision”, “do a mistake” instead of “make a mistake”, or “fast divorce” rather than “quick divorce”.
b) Word Meanings:
Students frequently confuse words that sound alike or have similar meanings, which causes them to choose words incorrectly. This is common with easily confused pairs such as “affect” and “effect”.
c) Use of Articles:
Students may find English articles confusing, particularly those whose native languages either have no articles at all or have different rules. Incorrect use of “a/an” and the omission or unnecessary addition of “the” are frequent problems.
Incorrect: The time is important.
Correct: Time is important.
d) Subject-Verb Agreement:
When sentences are lengthy or the subject and verb are separated by other words, students often make mistakes in subject-verb agreement. The problem is greater for speakers of languages in which singular and plural subjects have no influence on verb forms.
In essays produced by non-native speakers, subject-verb agreement errors are noticed in longer or more complicated phrases or sentences.
Incorrect: The communication between the parents disappear.
Correct: The communication between the parents disappears.
e) Use of Prepositions:
Students frequently make mistakes when using prepositions, since prepositional usage varies greatly between languages and is often idiomatic in English. Literal translation often results in mistakes such as “depend of” instead of “depend on”.
f) Lexical Confusion:
Students who are unfamiliar with the formality of English vocabulary may use words incorrectly or misinterpret words with similar spellings or meanings, which is known as lexical confusion. Lexical confusion frequently involves phrasal verbs, idiomatic phrases, or words that students come across in formal or academic writing.
Incorrect: They were too much tired.
Correct: They were extremely tired.
Therefore, there is a need to apply a corpus-based approach, which helps students overcome this problem and enables them to identify and correct their own errors.
2. Literature Review
2.1. Importance of Corpora in Error Correction in EFL Academic Writing
Research has shown that corpus concordancing data are extremely important for understanding the collocational meanings of lexicogrammatical structures (Sinclair, 2004; Liu, 2010; Liu & Jiang, 2009). Since DDL was promoted in language teaching and learning, some studies have been conducted on how students use corpora as reference materials to resolve linguistic issues in writing.
Some researchers concentrate primarily on the lexical errors made by students. Yoon and Hirvela (2004) conducted a study of 8 intermediate students in an ESL writing class and 15 students at a US university. These students were asked to use Collins COBUILD as a corpus tool to help with their writing. The findings showed the effectiveness of using a corpus in English language writing. Moreover, several studies have shown the importance of using corpus examples to help learners with lexicogrammatical errors and enhance their performance (Coxhead & Byrd, 2007; Flowerdew, 2010; Tribble, 2009). When EFL learners are not yet able to use a corpus, they should be given sufficient training and guidance to benefit from corpora in writing, and an easy-to-use corpus should be chosen. Tono, Satake and Miura (2014) examined the types of errors that were appropriate for correction by using corpora. They concentrated on three types of errors while students used a corpus in revising essays in English as a foreign language. The results showed a large difference in accuracy rate when students consulted the corpus: omission and addition errors were easily identified, while misformation errors had low accuracy. Tsai (2019) explored the effectiveness of inductive learning, in which learners observe grammar and vocabulary usage in concordance lines. In deductive learning, the learners use corpora to check the rules and apply the grammatical pattern. Through these activities, learners are motivated and their discovery is enhanced, especially in noticing lexicogrammatical usage. Moreover, O’Sullivan and Chambers (2006) conducted research with 14 undergraduate students majoring in French to show the effects of using a corpus in correcting errors and to gather the learners’ evaluation of the process of consulting a corpus. Of the 166 changes the students made by consulting a corpus, 122 (73%) were correct. Therefore, corpus consultation proved to be useful in reducing native language interference and correcting all types of errors.
Several studies have shown the importance of using concordances in teaching writing and in creating a positive classroom environment. When students master lexical and grammatical accuracy, their confidence increases, and the quality of their writing may improve (Yoon, 2008). Gilquin (2021) investigated students’ attitudes after adopting a construction grammar approach to examine the effectiveness of data-driven learning, which involved concordance-based activities. After a pre-test and several post-tests were administered to the high-intermediate learners who participated in the study, results showed a significant improvement in scores across the post-tests. The students’ attitudes were evaluated through a questionnaire, which suggested the usefulness of concordance activities in creating a positive and interesting classroom environment. Another study, conducted by Jantarabang and Tachom (2021), examined the effectiveness of a corpus-based approach on the grammatical development of students’ narrative paragraph writing. The study used a mixed-method design with 34 grade 11 students as participants. The results of learner diaries, semi-structured interviews, and pre- and post-writing assessments demonstrated that students’ use of corpus websites improved their writing abilities and reduced the number of grammatical errors in their writing. Yang, Harn and Hwang (2019) investigated ways in which students can improve their writing. In text rewriting, the experimental group (N = 15), which used the bilingual concordance, performed better than the control group (N = 17). Additionally, the concordance gave students real-world examples that reduced their errors and made them more conscious of linguistic errors, whereas the control group simply used particular examples from an online dictionary, which had minimal impact on their writing.
These studies emphasize how crucial data-driven corpus instruction based on concordance analysis is for addressing errors in students’ papers. Taken together, the findings of these earlier investigations indicate how well computer-based concordances work for error correction.
On a pedagogical level, these studies show the effectiveness of using online corpora in error treatment. Teachers can use this tool to provide indirect feedback that enables students to identify their errors and correct them through a corpus-based approach. Moreover, the previous studies encouraged an active learning approach in which students become aware of their errors in writing and become the center of the learning process.
2.2. Implementation of Learner’s Corpus in ESL/EFL Classrooms
Many researchers have emphasized the important function of computer-based programs and their efficacy in raising students’ productivity in domains related to language development. Users work with the concordance lines provided by the computer application. To see how a word is used in context, they first enter the target word; the application then returns every instance of the word they searched for. Figure 1 shows an example of how a learner might enter the word “their” as a search term.
Figure 1. Sketch Engine concordancer when searching for the word “their”.
Key word in context (KWIC) refers to the type of concordance lines that will be used in this study; they are shown in Figure 2. The search term, which appears in the middle of each line, indicates the point at which reading should start. Students are expected to search for patterns in either or both directions. Reminding students that the examples of a word’s usage come from different texts may help them understand that they are not expected to read the lines as a coherent piece of writing or speech. They observe, search for similar instances, look for errors, and then generalize a rule.
Figure 2. Screenshot of the concordance lines for “their” in Sketch Engine.
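The KWIC layout described above can be illustrated with a short sketch. It is not how Sketch Engine works internally; it is a minimal Python illustration, assuming a plain-text string as the "corpus". The function name `kwic`, the `width` parameter, and the sample sentence are all hypothetical and introduced only for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword sits in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical sample text, invented for illustration only.
sample = (
    "The students revised their essays. Their teacher underlined the errors, "
    "and the learners checked their corrections against the corpus."
)

for line in kwic(sample, "their"):
    print(line)
```

Because the search term is aligned in one column, a reader can scan the words immediately to its left and right, which is the pattern-spotting step students are asked to perform.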
This study attempts to answer the following research questions:
1) How does the use of concordances affect students’ ability to correct lexicogrammatical errors?
2) How does the use of concordance lines improve students’ writing performance?
3. Research Method
3.1. Research Design
An experimental research approach is used to assess the effectiveness of the treatment and monitor the academic development of the participating students. This suits the nature of the research, which seeks the effect of the independent variable (the corpus-based approach through concordances) on the dependent variable (error correction). According to Ary, Jacob, Irvin and Walker (2018), this design is the most effective one for comparing participant groups and assessing the extent of change caused by treatments or interventions. One experimental group, which used corpora and concordances, was chosen for this study, and a comparison was made before and after corpus use.
3.2. Participants
The participants are 25 students with an average age of 16, who form the experimental group. Participants are selected from the school lists as they are available in the private sector. They are all pre-university students (grade 10) at a private school in Beirut. The participants study English as a foreign language and barely use it outside the classroom. Moreover, the students come from the same socio-economic status and share common characteristics. They have sufficient computer literacy and attend four 45-minute sessions of English per week.
Grade 10 students tend to make common lexical and grammatical errors. A corpus-based approach can improve students’ grammar, vocabulary, and language skills and enable them to write more advanced texts. This is especially important for grade 10 students as it prepares them for higher levels and more challenging writing assignments.
3.3. Instruments
Pre- and post-tests are used to conduct the study, and their results are analyzed and discussed to answer the research questions. All the participants in the experimental group are given a pre-test and a post-test. These tests gather information about students’ lexicogrammatical errors before and after the experiment. The tests were reviewed by the researcher and another experienced EFL teacher to validate them. After experienced teachers had confirmed the content validity, the researcher used the tests to measure the number and type of errors among grade 10 students at the private school.
Additionally, three writing activities with errors were given to students to check the effectiveness of applying corpus-based approach through concordances in grade 10 classes.
Moreover, the corpus used in this study is the Brown corpus (available in Sketch Engine), which is a free online corpus. This corpus allows learners to access different types of lexicogrammatical usage information, including usage patterns and distribution. Compared with other online corpora, it is easy to access and offers multiple corpus examples. It is used to provide students with example sentences for correcting the errors in the three exercises given by the researcher.
3.4. Procedure
3.4.1. Data Collection Procedure
In the beginning, students were introduced to corpora and their functions and shown how to access different types of lexicogrammatical usage using the following link: https://app.sketchengine.eu/. Students were trained over a period of eight weeks on the grammatical use of articles, prepositions, and subject-verb agreement. Lexical searches covered commonly confused words, such as “its” used instead of “it is”, “there” instead of “their”, “think” instead of “thing”, and “sea” instead of “see”, as well as collocations and word meanings, especially connotations.
As writing practice, students were asked to write three distinct essays as assignments during the first semester of the school year. The teacher provided guided feedback by underlining the lexicogrammatical errors in the essays. Students were then asked to correct the underlined errors using concordance examples. Learners consulted the corpus to correct the errors in each original essay and then highlighted the changes they had made by using the corpus.
After submitting their second draft on time, students wrote a specific report on the words they searched, the concordance they used and the words they explored. Finally, the teacher gave her feedback and evaluation on the students’ corrections and gave some suggestions to improve their work and apply rules to their work.
To compare the number of errors in the two drafts, the teacher recorded the number of errors when students submitted their essays and again when they handed in their revised essays. For further analysis, the number of corrections students made using corpora and the number of right corrections were also recorded. The teacher calculated the total number of errors in the students’ essays to track the changes in number.
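The two counts recorded here (corrections made and right corrections) are enough to compute an accuracy rate. The sketch below is a minimal illustration, assuming the rate is simply right corrections divided by corrections made; the function name `accuracy_rate` is hypothetical. The example figures are the ones cited earlier for O’Sullivan and Chambers (2006), not data from this study.

```python
def accuracy_rate(correct_changes, total_changes):
    """Percentage of corpus-based changes that were correct."""
    if total_changes == 0:
        raise ValueError("no changes were made, so the rate is undefined")
    return 100 * correct_changes / total_changes

# Figures cited in the literature review for O'Sullivan and Chambers (2006):
# 122 correct changes out of 166 changes made.
rate = accuracy_rate(122, 166)
print(f"{rate:.1f}%")
```

Keeping the two counts separate, rather than recording only the final percentage, makes it possible to recompute the rate per task or per error type later.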
3.4.2. Data Analysis Procedure
The scores from the three essays written by the sampled students were examined descriptively and statistically to see whether the experimental group’s performance improved. Descriptive statistics (means, standard deviations, frequencies, and percentages) are produced for all research variables. The data collected by this research will be analyzed by using the Statistical Package for the Social Sciences (SPSS) version. Results and their interpretations are presented in tabular form, referring to each part included in the pre-test and the post-test. The issue discussed in the literature review, the importance of the corpus-based approach to error correction, is examined carefully in the analysis of the test results, and the data extracted from the tests are related to the research questions.
3.4.3. Validity of the Instruments
Kimberlin and Winterstein (2008) state that the internal and external validity of a pretest-posttest experimental design must be examined to reduce measurement error.
Regarding content validity, the instruments employed assess knowledge of the content areas that they are supposed to measure. As a result, the instruments are designed to assess the level of development of the participating students’ lexicogrammatical skills formed after the experiment. More specifically, the content validity of pre and post-tests is crucial to ensure that the assessments accurately measure the intended construct or knowledge. Students’ essays were examined by three experts (teachers of English language) who have expertise in the content domain being assessed. The topic is thematic and the essay type matches with what they have been already taught in class.
3.4.4. Reliability of the Instruments
The pre-post-tests are piloted for the following reasons:
1) to assess change or progress that occurs in participants over the course of an intervention,
2) to establish the baseline knowledge of participants. This assists in understanding the beginning point and allows for comparison with the results following the intervention,
3) to evaluate an intervention’s effectiveness,
4) to determine learning gains, and
5) to identify specific areas of weakness or gaps in knowledge, allowing the researcher to tailor interventions or educational programs to address those specific needs effectively.
Therefore, pre-post-tests help evaluate changes, assess effectiveness, tailor interventions, and enhance the reliability of study results.
4. Research Findings
The study was carried out to investigate the use of corpora and their efficacy at a private school in Lebanon. The results showed a statistically significant difference in the experimental group’s ability to self-correct before and after using concordances.
To figure out how the number of errors changed in the process, the teacher had to measure the number of errors in each of the three essays before and after using corpora. After collecting the data, an analysis was made using Excel to measure these factors: the effectiveness of using corpus-assisted correction in writing, the total number of errors and accuracy rate in correction using corpora.
4.1. Effectiveness of Using Corpus in Error Correction
Based on the average number of errors corrected, the students’ overall performance with and without corpus use was compared (Table 1). Students performed worse without corpus use, with a mean score of 7.3378, and better with corpus use, with a mean score of 13.3514.
Table 1. Results of the total number of errors of pre-test and post-test for experimental group.
| Error Type | N | No Corpus Use: Mean | No Corpus Use: SD | With Corpus Use: Mean | With Corpus Use: SD |
|---|---|---|---|---|---|
| Total | 25 | 7.3378 | 1.5552 | 13.3514 | 1.0089 |
| Collocations (verb + noun, adjective + noun) | 25 | 2.3919 | 1.6122 | 4.9189 | 1.3870 |
| Word Meanings | 25 | 0.5405 | 0.6602 | 1.5676 | 0.7743 |
| Use of Articles | 25 | 0.7297 | 0.6813 | 1.4730 | 0.6118 |
| Subject-Verb Agreement | 25 | 1.2162 | 1.0900 | 2.4324 | 1.0812 |
| Use of Prepositions | 25 | 2.4595 | 1.2822 | 2.9595 | 1.2714 |
| Lexical Confusion | 25 | 2.7956 | 1.1984 | 2.9675 | 1.1856 |
Table 1 shows that, for every type of lexicogrammatical error, correction with corpus use is more effective than correction without it. For collocations, the mean score without the corpus is 2.3919 (SD = 1.6122); with the corpus, the mean increased to 4.9189 (SD = 1.3870). The mean scores for the other types of lexicogrammatical errors also increased.
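To make the size of each increase explicit, the gain per error type can be computed directly from the means in Table 1. The sketch below is only an illustration: it assumes the means are average numbers of corrected errors, as the phrase "average of the number of error correction" suggests, and the names `table1` and `mean_gain` are hypothetical.

```python
# Means per error type, copied from Table 1:
# (mean without corpus use, mean with corpus use).
table1 = {
    "Collocations": (2.3919, 4.9189),
    "Word Meanings": (0.5405, 1.5676),
    "Use of Articles": (0.7297, 1.4730),
    "Subject-Verb Agreement": (1.2162, 2.4324),
    "Use of Prepositions": (2.4595, 2.9595),
    "Lexical Confusion": (2.7956, 2.9675),
}

def mean_gain(without_corpus, with_corpus):
    """Difference between the two means, rounded to four decimals like the table."""
    return round(with_corpus - without_corpus, 4)

gains = {error_type: mean_gain(*means) for error_type, means in table1.items()}

# List the error types from largest to smallest gain.
for error_type, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{error_type}: +{gain:.4f}")
```

Ranked this way, collocations show the largest gain and lexical confusion the smallest, which matches the emphasis the text places on collocations.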
4.2. Error Correction Rate of Improvement in the Three Error Correction Tasks
As seen in Figure 3, the ability of participants to correct errors increased as more tasks were completed. The findings show that corpora can be valuable tools for error correction. This supports the studies that show learners as being capable of making the correct correction based on concordance evidence (Gaskell & Cobb, 2004; Gilmore, 2009).
Figure 3. Error correction score in the three error correction assignments.
It was found that corpora help learners make more precise corrections after extended practice. At first, the experimental group may have been unfamiliar with using corpora for error correction, and the long lists of corpus examples may have made it difficult to infer the correct language patterns. As shown in Figure 3, the accuracy rate was lowest in the first assignment (76%); after more practice with corpus use, the experimental group reached a higher accuracy rate in the second assignment (87%), and in the third assignment the rate increased to 91%. The results confirm that online corpora are useful for making the right corrections.
4.3. Comparison of the Number of Errors in Writing during Concordance Intervention
Figure 4 shows a steady decrease in the number of lexicogrammatical errors across the three writing assignments. By focusing on lexicogrammatical errors, participants could pay more attention to them, and after completing the writing task, they could spend more time checking their work. Students scored 68% on the first task before using the corpus and 84% after using it, a gain of 16 percentage points. In the second and third assignments, the improvement was larger than in the first: in the second assignment, students’ essays scored 68% before corpus use and 90% after; in the third, they scored 68% before and, on average, 96% after. This indicates that corpus examples are useful tools for reducing lexicogrammatical errors. Frankenberg-Garcia (2012) found that the best way to understand language is through definitions and examples that contain context clues rather than giving examples that show collocation and colligation only.
Figure 4. Comparison in total number of errors in experimental pre-test vs post-test group across the three error correction tasks.
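The before and after scores reported for the three tasks can be compared in two ways: as an absolute change in percentage points or as a change relative to the starting score. The sketch below shows both, using only the scores given in the text; the function names `point_gain` and `relative_gain` are hypothetical.

```python
# Scores (in percent) before and after corpus use, as reported for the three tasks.
scores = {
    "Task 1": (68, 84),
    "Task 2": (68, 90),
    "Task 3": (68, 96),
}

def point_gain(before, after):
    """Absolute change in percentage points."""
    return after - before

def relative_gain(before, after):
    """Change relative to the starting score, in percent."""
    return 100 * (after - before) / before

for task, (before, after) in scores.items():
    print(f"{task}: +{point_gain(before, after)} points, "
          f"+{relative_gain(before, after):.1f}% relative to the first draft")
```

The distinction matters when reporting: a rise from 68% to 84% is 16 percentage points, which is a larger figure when expressed relative to the starting score.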
The time the participants spent reviewing corpus data related to their own errors provided them with a clearer understanding of these errors and a better grasp of proper word or phrase use. They tried to avoid making the same mistakes in the tasks that followed.
4.3.1. Validity Test
To support the validity of the research, all means of error correction performance show an increase, with a narrow spread of the data around the mean, as the standard deviations are small. Moreover, the total number of errors declines and the correction rate increases (see Table 1, Figure 3 and Figure 4). The small standard deviations (SD) mean that most individual scores lie close to the mean, and therefore the mean represents most individual scores.
4.3.2. Null Hypothesis Test
Ferreira and Patino (2015) define the P-value as “the probability of observing the given value of the test statistic, or greater, under the null hypothesis” (p. 485). The null hypothesis in this case is H0: corpora are not effective in reducing the number of lexicogrammatical errors, and the alternative hypothesis is Ha: corpora are effective in reducing the number of lexicogrammatical errors. Based on the statistical results of testing H1, H2, and H3, the p-value for the regression analysis performed was < 0.001, indicating a significant effect of the independent variables on the dependent variables. This means that, if the null hypothesis were true, the probability of observing results at least as extreme as these would be less than 0.1%. Therefore, we reject the null hypothesis H0 and accept the alternative hypothesis Ha. The null hypothesis test thus supports the hypotheses studied in this research (H1, H2, and H3).
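For readers who want to see how a before/after comparison of one group turns into a test statistic, the sketch below computes a paired-samples t statistic. This is a swapped-in illustration, not the study's method: the study reports a regression analysis, whereas a paired t-test is shown here only because it is a common and simple choice for a single group measured twice. The function name `paired_t` and the scores are hypothetical; the scores are invented and are not the study's data.

```python
import math
import statistics

def paired_t(before, after):
    """Return the paired-samples t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("each student needs one score in each condition")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample SD (n - 1 in the denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical scores for five students, invented purely for illustration;
# these are NOT the study's data.
before = [6, 8, 7, 9, 7]
after = [12, 13, 14, 14, 13]

t_stat, df = paired_t(before, after)
print(f"t({df}) = {t_stat:.2f}")
```

The p-value is then read from the t distribution with the given degrees of freedom, a step that statistical packages perform automatically. A large t value corresponds to a small p-value, that is, to results that would be very unlikely if the null hypothesis were true.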
The results of this research encourage teachers and students to use corpora to correct lexicogrammatical errors in writing. Therefore, the general results of this study have reflected this need, and that was supported by the comparison done.
5. Discussion
This experimental study aimed to evaluate the efficacy of the corpus-based approach in enhancing error correction among grade 10 students in Beirut, Lebanon. The researcher administered pre- and post-assessments to an experimental group trained in the corpus data-driven learning strategy. Moreover, students were given three writing assignments to check the progress in the period of using concordances. Quantitative analysis revealed that the experimental group exhibited statistically significant pre-post test score gains after training.
Table 1 shows a significant improvement in error correction following the introduction of corpus use. In the pre-test (without corpus use), the total mean score was 7.3378 (SD = 1.5552); in the post-test (with corpus use), it increased to 13.3514 (SD = 1.0089), indicating that students corrected more errors across several categories. In particular, there was a noticeable improvement in collocations, such as verb-noun and adjective-noun pairings; the mean went from 2.3919 to 4.9189, showing that after exposure to the corpus, collocations were used more accurately. Subject-verb agreement and the use of articles also showed notable improvements, indicating that corpus-based learning improves grammatical accuracy in these domains.
The improvement in error correction percentages across the tasks after the concordance intervention is shown in Figure 3. The evident upward trend suggests that the concordance-based intervention had a positive effect on participants’ ability to correct errors over time. With an error correction score of 76 out of 100, Task 1 shows the lowest performance, reflecting early difficulties with the concordance tool. By Task 2, the participants’ error correction score had increased to 87/100, indicating a significant improvement in their error correction skills. With a score of 91/100, Task 3 showed the best performance, suggesting that by this time participants had become more proficient at using the concordance tool to find and fix errors. This progressive improvement implies that the intervention becomes more effective as learners gain practice and familiarity with the tool, highlighting its potential as a valuable resource in language learning and error correction strategies.
In Figure 4, performance improved in every task once the corpus was used. In Task 1, the score rose from 68% in the pre-test to 84% in the post-test. Similarly, in Task 2, the pre-test score was 68%, whereas the post-test score was 90%. With 96% in the post-test against a steady 68% in the pre-test, Task 3 exhibits the largest difference. The smaller gain in Task 1 suggests that students first had difficulty using the concordance lines.
6. Conclusion
This study examined the effect of using online corpora as tools to treat lexicogrammatical errors in grade 10 students’ essays. Using them enabled students to correct their errors in writing by referring to concordancers’ examples and to deduce the grammar rule related to each error type. Students were able to realize and avoid errors; therefore, their writing skill was enhanced. Moreover, the teachers’ guidance and training helped increase students’ awareness of these errors.
Through this study, online corpora have proved their effectiveness as valid teaching tools that increase students’ knowledge and performance. The results show a decrease in the number of errors after using corpus and an improvement in their use of lexicogrammar.
After using online corpora in the classroom, students became more motivated to discover their errors in writing and more confident about their ability to correct them. I recommend that teachers and learners use corpora because they facilitate the process of learning and help develop students’ writing skills. More studies should be conducted on the use of corpora for other learning purposes, and further studies can examine whether corpora can help improve the content of learners’ writing. However, more learners and a larger sample from different regions should be included to give clearer confirmation of the findings.
Acknowledgements
This work was supported by a grant from Scientific Research. The author is grateful for their support.