What is Corpus Linguistics?

Article Details
  • Written By: Marlene de Wilde
  • Edited By: Nancy Fann-Im
  • Last Modified Date: 13 December 2019
  • Copyright Protected:
    Conjecture Corporation

Corpus linguistics is the study of language using real-life examples. It is not a branch of linguistics but a methodology or approach. Corpus, the Latin word for "body," refers to the body of natural texts, and the approach involves discovering patterns of language use through analysis of the corpus. Corpus linguistics is experiencing a comeback, as computer programs have revolutionized the approach.

A parental diary of a child's speech during early language acquisition is a simple example of a corpus that can then be studied to learn language patterns. Foreign language teaching in the first half of the 20th century often used corpora of the target language to compile vocabulary lists for students. The eminent linguist Noam Chomsky did not consider corpora a valid tool, as he believed that linguistic competence, a speaker's internalized knowledge of the language, mattered more than performance data. Early corpus linguistics was largely based on the assumption that a natural language contains a limited number of sentences and that those sentences can be collected and evaluated.


After falling out of favor in the 1960s and 1970s, corpus linguistics is experiencing a revival thanks to the computer. The software linguists most commonly use is the concordance program, or concordancer, which lists every occurrence of a search term together with its surrounding context. Searching for patterns in a corpus of millions of words would take a human being far too long and produce error-prone results, whereas a computer can search and retrieve the information in seconds. It can calculate frequencies, sort data and exploit corpora in ways that were impossible in the past.
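
To make the idea concrete, here is a minimal sketch of the two basic operations mentioned above: counting word frequencies and producing keyword-in-context lines. It is not the code of any real concordance program. The function names (tokenize, word_frequencies, concordance), the simple tokenizing rule and the two-sentence sample corpus are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs in the corpus."""
    return Counter(tokens)


def concordance(tokens, keyword, width=3):
    """Return one keyword-in-context line per hit, with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical toy corpus; a real corpus would hold millions of words.
corpus = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(corpus)
freq = word_frequencies(tokens)

print(freq["the"])  # 4
for line in concordance(tokens, "sat", width=2):
    print(line)
# the cat [sat] on the
# the dog [sat] on the
```

Real concordancers add much more, such as sorting hits by the words to the left or right of the keyword, but the core search is the same kind of scan over the tokens.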

Corpus-based analysis can look into how register affects language; patterns of language use, such as how males and females make different use of tag questions; the extent to which language patterns are used; and the factors that affect the variability of language use. Teaching can benefit from corpus linguistics in the design of the syllabus, the development of the materials used, and the type of activities used in the classroom. Students could benefit from the approach by being able to determine more clearly the different uses and meanings of common words, the differences inherent in written and spoken language, and phrases and collocations they could make use of. The body of data that is the corpus is constantly updated and is the product of real-life social interactions. Thus, the corpora are naturalistic data that can be easily accessed, and the findings can be generalized.
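
As a rough illustration of how collocations might be spotted, the sketch below counts adjacent word pairs and reports the most frequent one. Raw pair counts are only a first approximation of collocation strength, and pairs are counted across sentence boundaries for simplicity. The function name bigram_counts and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in the text, ignoring punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample; a learner might use output like this to notice "strong tea".
sample = "Strong tea and strong coffee, but heavy rain; strong tea again."
pairs = bigram_counts(sample)

print(pairs.most_common(1))  # [(('strong', 'tea'), 2)]
```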


Discuss this Article

Post 3

@croydon - I'd be more worried about what would happen if people decide to deliberately manipulate their child's language development as an experiment.

There was a famous experiment where a researcher wanted to know if children learn to laugh while they are being tickled, because their parents laugh while doing it, so he decided to tickle his children without laughing and see if they would still learn.

That's relatively mild and, in theory, wouldn't have lasting effects on the child's development. But if you start messing with language development, that's another story.

Post 2

@pleonasm - There have already been attempts at this, including some where a researcher has attempted to completely record their child's language development. The problem, as I see it, is that there is just too much information. I don't know how you would even go about processing it.

They wouldn't just be interested in which words a baby learns first and when, but also in how the baby learns them, which means recording everything that is said to the baby or within its hearing.

And gestures would also have to be recorded, along with voice tone, inflection and so forth. If a word is spoken by a family member, does it have more weight than if it is spoken by a stranger? Do accents make a difference? These are questions that would add multiple dimensions to an already vast amount of data.

I don't think we're that close to being able to deal with that level of information yet.

Post 1

It'll be interesting to see what we learn from this kind of data as our methods for collecting and processing it become more and more advanced. With so many technological devices in homes these days, I could see a time when people just routinely record most of their child's early life and linguistics professors would be able to use that information to chart language learning patterns.
