Corpus Linguistics: Diachronic Analysis on Sundanese Lexemes “WIFE”

DIACHRONIC PERSPECTIVE ON THE SUNDANESE LEXEMES ‘WIFE’:

A CORPUS-BASED ANALYSIS[*]

 

Susi Yuliawati

Program Sastra Inggris Fakultas Ilmu Budaya

Universitas Padjadjaran

susi.y@unpad.ac.id

 

Abstract

 

Sundanese dictionaries list several lexemes for WIFE, such as BOJO, GARWA, ISTRI and PAMAJIKAN. All these words have the same referent, namely a married woman in relation to her spouse, and the dictionaries describe their lexical meanings in quite similar ways. Do they nevertheless differ in frequency and contexts of usage? Based on samples of real language use in a Sundanese magazine (Manglé), the paper investigates the use of the Sundanese lexemes WIFE over forty-seven years (1966-2013). Using the methods of corpus linguistics and a diachronic perspective, the Sundanese lexemes WIFE are analysed in terms of word frequency, collocation and semantic preference, and are compared across two periods: the New Order (1966-1998) and the Reformation (1999-2013). The research aims to reveal evidence of the actual usage of the Sundanese lexemes WIFE, which may have changed over time.

Keywords: collocation, corpus linguistics, semantic preference, Sundanese lexemes WIFE, word frequency.

 

  1. Introduction

 

Investigating the meanings of words has always fascinated those concerned with the nature of human language. This is perhaps not only because many of the ideas proposed so far are not completely satisfactory, but also because word meanings tend to be dynamic. Different methods for studying word meaning therefore continue to be developed. One of them is corpus linguistics, which allows us to investigate word meanings in large collections of real language use. The method employs computers to store and process corpora of millions of running words, and it relies on both quantitative and qualitative analysis (Biber in Baker, 2010). As a result, it provides a new look at the theoretical foundations of the empirical analysis of meaning in language (Hanks, 2013).

One of the most influential figures in corpus linguistics is John Sinclair. Two interrelated notions are central to his work: that language is all about creating meaning, and that language tends to be phraseological. According to Cheng (2012: 101-102), the first notion is most likely self-evident, while the second is based on extensive evidence from real language use showing that meaning is not created by isolated words but by the co-selection of words. Phraseology thus refers to repeated patterns of associated words, and the tendency for words to be associated results from speakers or writers co-selecting words to create meaning.

In Sundanese, one of the local languages of Indonesia, spoken by most people in West Java, a married woman is denoted by several words such as bojo, garwa, istri, and pamajikan. Three Sundanese dictionaries, none of which is corpus-based (Danadibrata, 2009; Lembaga Basa dan Sastra Sunda, 1969; Satjadibrata, 2005), describe these words in quite similar ways. All of the words refer to a married woman, although istri can also mean female; the difference among them lies in speech level[†]. The words bojo, garwa and istri are used in the high level frame (Lemes), while pamajikan is used in the low level frame (Kasar). Based on Sinclair’s notion that meaning creation is phrase-based and Hanks’s statement that the meaning of words is dynamic, the present study investigates diachronically the meaning of the Sundanese words denoting wife (bojo, garwa, istri, and pamajikan) using samples of actual language use taken from the Sundanese magazine Manglé.

 

  2. Method

For the purpose of the study, a corpus is constructed from samples of Manglé magazine issued from 1966 to 2013. The corpus is divided into two periods: the New Order corpus (1966-1998), containing 1,592,384 words from 59 editions, and the Reformation corpus (1999-2013), containing 891,641 words from 29 editions. One or two editions are taken at random from each year of publication. Almost all of the editions are available only in print, so they have to be converted to electronic plain text format (.txt) to be processed by the corpus software, WordSmith Tools 6.0.

The research discusses three main points based on the following research questions:

  1. How often does each of the words occur in the corpora of the New Order and Reformation periods?
  2. What are the significant collocates of each word in the New Order and Reformation periods?
  3. What is the semantic preference of each word in the New Order and Reformation periods?

Using WordSmith Tools, the present writer applies two corpus procedures in the analysis: frequency analysis (obtaining the frequency of every word in the corpus of each period) and collocation analysis (examining the statistically significant co-occurring words for each of the four words in both periods).

The analysis draws on two important theoretical concepts in corpus linguistics: collocation and semantic preference. According to McEnery and Hardie (2012: 123), collocation refers to “…a co-occurrence pattern that exists between two items that frequently occur in proximity to one another – but not necessarily adjacently or, indeed, in any fixed order.” In this sense, collocation is a good guide to meaning because the co-occurrence of words generates context-specific meaning. Cheng (2012: 77) states that if words collocate, they are co-selected by speakers or writers rather than co-occurring by chance. Following Sinclair’s basic methodological approach (1991), the collocates (word-forms or lemmas) of a node word (the word-form or lemma being investigated) are sought within a span of five words to the left and right of the node. Furthermore, the strength of a collocation can be demonstrated by a statistical test, such as MI, t-score, z-score or log-likelihood.
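The span-based search described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in WordSmith Tools; the function name `window_collocates` and the token sequence are hypothetical, and the tokens are an invented string of word-forms rather than a sample from the Manglé corpus.

```python
from collections import Counter

def window_collocates(tokens, node, span=5):
    """Count word-forms found within `span` tokens to the left or right of each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]       # up to `span` tokens before the node
        right = tokens[i + 1:i + 1 + span]      # up to `span` tokens after the node
        counts.update(left + right)
    return counts

# Invented token sequence for illustration only (not corpus data).
tokens = "ceuk pamajikan kuring anak teh hayang balik ka imah".split()
collocates = window_collocates(tokens, "pamajikan", span=5)
print(collocates.most_common())
```

In this toy sequence the last two tokens fall outside the five-word span and are therefore not counted as co-occurring with the node.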

Another key concept in corpus linguistics used in this research is semantic preference. Semantic preference, which builds on collocation analysis, is the relation between a lemma and a set of semantically related words (Stubbs, 2002: 65). For example, in the British National Corpus, the word rising frequently co-occurs with words for “work and money”, such as incomes, prices, wages, earnings and unemployment (Baker et al., 2006). Semantic preference thus extends the concept of collocation from individual words to lexical sets belonging to a semantic category. To categorise the collocates, this research uses the USAS (UCREL Semantic Analysis System), a semantic analysis scheme that groups together word senses related, at some level of generality, to the same mental concept. The USAS distinguishes 21 major discourse fields (Archer et al., 2002), which are shown in the following table:

A general and abstract terms | B the body and the individual | C arts and crafts | E emotion
F food and farming | G government and public | H architecture, housing and the home | I money and commerce in industry
K entertainment, sports and games | L life and living things | M movement, location, travel and transport | N numbers and measurement
O substances, materials, objects and equipment | P education | Q language and communication | S social actions, states and processes
T time | W world and environment | X psychological actions, states and processes | Y science and technology
Z names and grammar

 

Table 1. The 21 major discourse fields in the USAS

 

  3. Results and Discussion
    • Frequencies of Sundanese Lexemes WIFE

 

Based on the occurrences of each word in each period, pamajikan is the most frequent word and garwa the least frequent in both the New Order and the Reformation periods, with istri and bojo ranking second and third respectively. As illustrated in Figure 2, the frequency ranking of the words denoting wife in Manglé magazine has clearly remained stable from the New Order to the Reformation period. This also means that in Manglé magazine published from 1966 to 2013, married women tend to be denoted by the word pamajikan rather than istri, bojo or garwa. According to the dictionaries discussed above, pamajikan is the word denoting wife that is used in the low level frame. It can therefore be concluded that married women in Manglé magazine are talked about in general conversational or colloquial speech, as referents of lower status than the speakers/hearers, by higher status speakers, or among intimate participants (Anderson, 2009).

 

Word | New Order (words per million) | Reformation (words per million)
pamajikan | 522 | 346
istri | 262 | 199
bojo | 69 | 53
garwa | 40 | 42

 

Figure 2. The frequencies of Sundanese lexemes WIFE in the period of New Order and Reformation

 

If the frequencies of each word denoting wife are compared between the New Order and the Reformation periods, however, they show differences. The frequencies of pamajikan and istri have declined over the forty-seven years. As can be seen in Figure 3, from the New Order period to the Reformation period the share of pamajikan in all running words has decreased by about 0.02 percentage points, while that of istri has decreased by about 0.01 percentage points. The occurrences of bojo and garwa, by contrast, remain roughly constant. Overall, this may indicate that the magazine talked less about married women in the Reformation period than it did in the New Order period.
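The size of these changes can be checked from the normalised frequencies reported above. The following Python sketch converts the words-per-million figures into percentages of all running words and prints the difference between the two periods; the names `PER_MILLION`, `to_percent` and `change_in_points` are hypothetical and serve only this illustration.

```python
# Words-per-million figures as reported for (New Order, Reformation).
PER_MILLION = {
    "pamajikan": (522, 346),
    "istri": (262, 199),
    "bojo": (69, 53),
    "garwa": (40, 42),
}

def to_percent(per_million_value):
    """Convert a frequency per million words into a percentage of all running words."""
    return per_million_value / 1_000_000 * 100

def change_in_points(word):
    """Reformation share minus New Order share, in percentage points."""
    new_order, reformation = PER_MILLION[word]
    return to_percent(reformation) - to_percent(new_order)

for word in PER_MILLION:
    print(f"{word}: {change_in_points(word):+.4f} percentage points")
```

Under this reading, pamajikan falls by roughly 0.018 and istri by roughly 0.006 percentage points, which round to the figures given in the text, while bojo and garwa change by less than 0.002 percentage points.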

 

Figure 3. The percentage of the occurrences of Sundanese lexemes WIFE in the period of New Order and Reformation

  • Significant Collocates of Sundanese Lexemes WIFE

 

To measure the strength of collocation, a statistical test is needed. In this research, the present writer uses MI in combination with frequency. Following Hardy and Colombini (2011), Davies (2008) and Church and Hanks (1990), a collocate within the span of five words to the left and right is treated as significant if its MI score is 3.00 or higher and it co-occurs with the node five or more times.
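As an illustration of these thresholds, the sketch below uses a common formulation of MI in the sense of Church and Hanks (1990): the base-2 logarithm of the observed co-occurrence frequency divided by the frequency expected if the two words were independent. This is an assumption for illustration; the exact computation in WordSmith Tools may differ, for instance in how the span is taken into account. The function names and the input figures are hypothetical and are not results from the Manglé corpus.

```python
import math

def mutual_information(cooc, f_node, f_coll, corpus_size):
    """MI = log2(observed / expected), with expected = f_node * f_coll / corpus_size."""
    expected = f_node * f_coll / corpus_size
    return math.log2(cooc / expected)

def is_significant(cooc, f_node, f_coll, corpus_size, mi_min=3.0, freq_min=5):
    """Apply the two thresholds used in this research: MI >= 3.00 and at least five co-occurrences."""
    if cooc < freq_min:
        return False
    return mutual_information(cooc, f_node, f_coll, corpus_size) >= mi_min

# Hypothetical figures: a node occurring 100 times and a collocate occurring
# 200 times in a corpus of 1,000,000 words, co-occurring 8 times.
print(round(mutual_information(8, 100, 200, 1_000_000), 2))
print(is_significant(8, 100, 200, 1_000_000))
```

Combining MI with a minimum frequency guards against the known tendency of MI to give high scores to rare word pairs.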

Based on the statistical test, pamajikan and istri have more significant collocates in the New Order period than in the Reformation period, whereas bojo and garwa have only a few significant collocates in both periods. When the numbers of significant collocates of each word in both periods are compared, pamajikan has the most significant collocates and garwa the fewest (see Table 4).

 

Table 4. The significant collocates of Sundanese lexemes WIFE in the period of New Order and Reformation (MI ≥ 3.0; freq. ≥ 5)

Word | Collocates in the New Order | Collocates in the Reformation
pamajikan | kuring, anak, boga, salaki, cek, ceuk, imah, hayang, kumaha, dua, lamun, batur, balik, geulis, ceurik, ulah, terus, hate, kamar, nyaho, milu, ninggalkeun, budak, kacida, sanggeus, indit, nyampeurkeun, sarua, maneh, peuting, sieun, lalaki, cul, hariweusweus, suhunan, saha, siga, silaing, indung, tara, dapur, gigireun, beda, nyarita, awewe, antara, norojol, adi beuteung, kahayang, kawas, nanya, nempo, mitoha, solat, nyanghareupan, ngomong, sugan, ngajuru, ngaleos, gawe, inget, ras, komo, indungna, kakara, jol, kasampak, maranehna, hirup, pedah, lembur, imahna, bangun, bi, beurang, tepi | kuring, anak, boga, salaki, ceuk, indung, kudu, kahayang, manehna, bakal, kang, pedah, peuting, awewe, cenah, mitoha
istri | pameget, juragan, binangkit, hiji, alo, ceuk, uwa, putra, kagungan, kudu, persatuan, Pasundan, kedah, prajurit, sepuh, kanggo, tos, Sunda, gaduh, dua, tiasa, jenengan, saderek, ua, dalem, lalaki, sasauran, putrana, dapur, emang, kang, kirang, geulis, anak, abdi, wanita, upami | juragan, uyut, binangkit, pameget, para, putra, putri, geulis, hiji, ibu, kagungan, kang, saya
bojo | pun, anak, abdi, kuring | pun, kuring, ceuk
garwa | putra, saderek | pangeran, bupati

 

By examining the top ten significant collocates of each word in each period, I identify whether they reflect the norms of speech levels described in the Sundanese dictionaries above. Several of the lexemes WIFE share significant collocates that are interesting to discuss in this respect. In the New Order period (see Table 5), one of the significant collocates of pamajikan is the low level pronoun KURING ‘I/my/me’, which also co-occurs with bojo. However, bojo co-occurs not only with KURING but also, strongly, with the high level pronoun ABDI ‘I/my/me’. In addition, bojo co-occurs with the noun ANAK ‘child’, which is also a collocate of pamajikan. By contrast, the word denoting child that co-occurs with istri and garwa is not ANAK ‘child’ but PUTRA ‘son’ (a high level word). Further evidence is that pamajikan co-occurs with BOGA ‘have’, a word of the low level frame, whereas istri co-occurs with KAGUNGAN ‘have’, a word of the high level frame. Based on this analysis, it can be concluded that bojo co-occurs with words of both the low level frame (KURING and ANAK) and the high level frame (ABDI), while pamajikan consistently co-occurs with words of the low level frame (KURING, ANAK and BOGA), and istri and garwa consistently co-occur with words of the high level frame (PUTRA and KAGUNGAN).

 

Table 5. The top ten significant collocates of Sundanese lexemes WIFE in the New Order period

No. | PAMAJIKAN | ISTRI | BOJO | GARWA
1 | KURING ‘I/my/me’ | PAMEGET ‘male’ | PUN ‘my’ | PUTRA ‘son’
2 | ANAK ‘child’ | JURAGAN ‘lady/lord’ | ANAK ‘child’ | SADEREK ‘you/your’
3 | BOGA ‘have’ | BINANGKIT ‘smart’ | ABDI ‘I/my/me’ | -
4 | SALAKI ‘husband’ | HIJI ‘one’ | KURING ‘I/my/me’ | -
5 | CEK ‘said’ | ALO ‘nephew/niece’ | - | -
6 | CEUK ‘said’ | CEUK ‘said’ | - | -
7 | IMAH ‘house’ | UWA ‘aunt’ | - | -
8 | HAYANG ‘want’ | PUTRA ‘son’ | - | -
9 | KUMAHA ‘how’, ‘depend’ | KAGUNGAN ‘have’ | - | -
10 | DUA ‘two’ | KUDU ‘must’ | - | -

 

If we compare this evidence with the existing descriptions in the Sundanese dictionaries, it can be concluded that the pattern of usage of the lexemes WIFE is similar to what the dictionaries describe: the words bojo, garwa and istri are used in the high level frame (Lemes), while pamajikan is used in the low level frame (Kasar). However, there is a slight difference in the usage of the word bojo. According to the dictionaries, the word is used only in the high level frame (Lemes). Based on the samples taken from Manglé magazine, however, the word is used not only in the high level frame but also tends to be used in the low level frame (Kasar), because it co-occurs with low level words such as KURING and ANAK. Moreover, in the Reformation period (see Table 6) the only pronoun with which bojo co-occurs is the low level pronoun KURING.

Other significant collocates that are interesting to discuss are SALAKI ‘husband’, which co-occurs with pamajikan, and PAMEGET ‘male’, which co-occurs with istri. According to the dictionaries, the word pamajikan refers only to married women, while istri refers not only to wife but also to female. The evidence of co-occurrence, which shows that pamajikan indeed co-occurs with the word denoting husband (SALAKI) while istri co-occurs with the word denoting male (PAMEGET), is in accordance with the dictionary descriptions. This can be seen in both the New Order and the Reformation periods (see Tables 5 and 6). The last piece of evidence concerns the relation between word usage and men’s social status. In the Reformation period, only the word garwa co-occurs with words reflecting the high social status of men: PANGERAN ‘lord’ and BUPATI ‘regent’. This indicates that the word garwa is used to denote the wife of a high status man.

Table 6. The top ten significant collocates of Sundanese lexemes WIFE in the Reformation Period

No. | PAMAJIKAN | ISTRI | BOJO | GARWA
1 | KURING ‘I/my/me’ | JURAGAN ‘lady/lord’ | PUN ‘my’ | PANGERAN ‘lord’
2 | ANAK ‘child’ | UYUT ‘grandparent’ | KURING ‘I/my/me’ | BUPATI ‘regent’
3 | BOGA ‘have’ | BINANGKIT ‘smart’ | CEUK ‘said’ | -
4 | SALAKI ‘husband’ | PAMEGET ‘male’ | - | -
5 | CEUK ‘said’ | PARA ‘many’ | - | -
6 | URANG ‘I/my/we/our’ | PUTRA ‘son’ | - | -
7 | INDUNG ‘mother’ | PUTRI ‘daughter’ | - | -
8 | KUDU ‘must’ | GEULIS ‘beautiful’ | - | -
9 | KAHAYANG ‘desire’ | HIJI ‘one’ | - | -
10 | MANEHNA ‘his/hers’ | IBU ‘Madam’ | - | -

 

  • Semantic Preferences of Sundanese Lexemes WIFE

 

Based on the collocation analysis, the semantic preference of each word denoting wife in both periods can be demonstrated through the semantic fields of its collocates. In this research, the semantic fields are identified from the sense relations among the significant collocates and grouped according to the categories of discourse fields in the USAS. First, Table 7 shows that the number of discourse fields associated with the word pamajikan has declined: from fourteen semantic categories in the New Order period to only six in the Reformation period. In the New Order period, pamajikan tends to co-occur with words relating to general and abstract terms; the body and the individual; emotion; architecture, housing and the home; money and commerce in industry; life and living things; movement, location, travel and transportation; number and measurement; substances, materials, objects and equipment; language and communication; social actions, states and processes; time; psychological actions, states and processes; and names and grammar. In the Reformation period, however, the word tends to co-occur only with words relating to general and abstract terms; language and communication; social actions, states and processes; time; psychological actions, states and processes; and names and grammar.

Table 7. The Semantic Categories of PAMAJIKAN in the Period of New Order and Reformation

 

Discourse Field | New Order | Reformation
General and abstract terms | boga, kumaha, kacida, sarua, siga, beda, kawas, sugan, bangun | boga
The body and the individual | ngajuru | -
Emotion | ceurik, sieun | -
Architecture, housing and the home | imah, imahna, suhunan, kamar, dapur | -
Money and commerce in industry | gawe | -
Life and living things | hirup | -
Movement, location, travel and transportation | balik, nyampeurkeun, norojol, jol, kasampak, tepi, ninggalkeun, indit, cul, ngaleos, milu, lembur, nyanghareupan | -
Number and measurement | dua | -
Substances, materials, objects and equipment | geulis | -
Language and communication | ceuk, cek, hariweusweus, nyarita, nanya, ngomong | ceuk, cenah
Social actions, states and processes | salaki, lalaki, awewe, anak, budak, indung, indungna, mitoha, adi beuteung, bi, ulah, solat | salaki, anak, awewe, indung, mitoha, kudu, kang
Time | peuting, beurang, terus, tara, kakara | bakal, peuting
Psychological actions, states and processes | hayang, kahayang, lamun, sugan, hate, nyaho, nempo, inget, ras | kahayang
Names and grammar | kuring, maneh, maraneh, silaing, batur, kumaha, tepi, sanggeus, komo, barang, saha, pedah | kuring, manehna, teu, pedah

 

Second, as illustrated in Table 8, the word istri also has more semantic categories in the New Order period than in the Reformation period. In the first period, the word frequently co-occurs with words relating to general and abstract terms; architecture, housing and the home; money and commerce in industry; movement, location, travel and transportation; number and measurement; substances, materials, objects and equipment; language and communication; social actions, states and processes; psychological actions, states and processes; and names and grammar. In the second period, istri tends to co-occur with words relating to general and abstract terms; number and measurement; substances, materials, objects and equipment; social actions, states and processes; psychological actions, states and processes; and names and grammar.

 

Table 8. The Semantic Categories of ISTRI in the Period of New Order and Reformation

 

Discourse Field | New Order | Reformation
General and abstract terms | kagungan, gaduh, tiasa, kirang | kagungan
Architecture, housing and the home | dapur | -
Money and commerce in industry | prajurit | -
Movement, location, travel and transportation | Pasundan, Sunda | -
Number and measurement | hiji, dua | hiji, para
Substances, materials, objects and equipment | geulis | geulis
Language and communication | ceuk, sasauran | -
Social actions, states and processes | juragan, pameget, putra, putrana, alo, uwa, emang, saderek, ua, lalaki, kang, anak, sepuh, wanita, dalem, kudu, persatuan, kedah | juragan, pameget, putra, putri, uyut, ibu, kang
Psychological actions, states and processes | binangkit, upami | binangkit
Names and grammar | saderek, jenengan, abdi | saya

 

Third, the semantic preference of the word bojo in the New Order period and the Reformation period, as can be seen in Table 9, is quite similar. In each period the word has only two semantic categories, one of which is names and grammar. The other category differs: in the New Order period bojo co-occurs with a word relating to social actions, states and processes, while in the Reformation period it co-occurs with a word relating to language and communication.

 

Table 9. The Semantic Categories of BOJO in the Period of New Order and Reformation

 

Discourse Field | New Order | Reformation
Language and communication | - | ceuk
Social actions, states and processes | anak | -
Names and grammar | pun, abdi, kuring | pun, kuring

 

Lastly, the number of semantic categories in the semantic preference of garwa is the same as that of bojo. Both words have only two semantic categories in each period, and the difference lies in the types of categories. In the New Order period, the semantic preference of garwa falls into two categories: names and grammar, and social actions, states and processes. In the Reformation period, however, the categories are money and commerce in industry, and social actions, states and processes.

 

Table 10. The Semantic Categories of GARWA in the Period of New Order and Reformation

 

Discourse Field | New Order | Reformation
Money and commerce in industry | - | bupati
Social actions, states and processes | putra | pangeran
Names and grammar | saderek | -
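The grouping step behind Tables 9 and 10 can be sketched in Python as follows. The mapping `FIELD_OF` simply restates the manual category assignments shown in those two tables; it is not an automatic USAS tagger, and the names `FIELD_OF`, `COLLOCATES` and `semantic_preference` are hypothetical.

```python
from collections import defaultdict

# Manual assignment of collocates to USAS discourse fields, as in Tables 9 and 10.
FIELD_OF = {
    "anak": "Social actions, states and processes",
    "putra": "Social actions, states and processes",
    "pangeran": "Social actions, states and processes",
    "pun": "Names and grammar",
    "abdi": "Names and grammar",
    "kuring": "Names and grammar",
    "saderek": "Names and grammar",
    "ceuk": "Language and communication",
    "bupati": "Money and commerce in industry",
}

# Significant collocates of bojo and garwa in each period, as in Table 4.
COLLOCATES = {
    ("bojo", "New Order"): ["pun", "anak", "abdi", "kuring"],
    ("bojo", "Reformation"): ["pun", "kuring", "ceuk"],
    ("garwa", "New Order"): ["putra", "saderek"],
    ("garwa", "Reformation"): ["pangeran", "bupati"],
}

def semantic_preference(collocates):
    """Group a list of collocates by discourse field."""
    grouped = defaultdict(list)
    for collocate in collocates:
        grouped[FIELD_OF[collocate]].append(collocate)
    return dict(grouped)

for (word, period), collocates in COLLOCATES.items():
    fields = semantic_preference(collocates)
    print(word, period, len(fields), sorted(fields))
```

Counting the keys of each grouping gives the number of semantic categories per word and period, which is the figure compared across the two periods in this section.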

 

 

  4. Conclusion

 

Based on the analysis of the frequencies of the Sundanese lexemes WIFE (pamajikan, istri, bojo and garwa), it can be concluded that the words are used somewhat differently over time. In both periods, the frequency ranking of the lexemes denoting wife is the same: the most frequent word is pamajikan and the least frequent is garwa, while istri and bojo rank second and third respectively. If we relate this to the speech levels, there is a strong tendency for married women in Manglé magazine, who are most frequently denoted by pamajikan, to be talked about in general conversational or colloquial speech, as referents of lower status than the speakers/hearers, by higher status speakers, or among intimate participants. At the same time, there is a tendency for the usage of pamajikan and istri to decrease diachronically, while the usage of bojo and garwa remains largely unchanged. Overall, this may indicate that the magazine talked less about married women in the Reformation period than it did in the New Order period.

With regard to the significant collocates of each word, the analysis demonstrates that the first two words (pamajikan and istri) have more significant collocates in the New Order period than in the Reformation period, whereas the numbers of significant collocates of the last two words (bojo and garwa) are almost the same in both periods. Some of the significant collocates in both periods also reflect that each word is used at a different speech level, quite similar to the existing descriptions in the dictionaries. For example, the word pamajikan is used in the low level frame (Kasar), while istri, bojo, and garwa are used in the high level frame (Lemes).

The last conclusion is based on the semantic preference analysis. The interesting point to highlight is that, in both the New Order period and the Reformation period, the significant collocates of the words denoting wife in the magazine never relate to arts and crafts; food and farming; government and public; entertainment, sports and games; education; world and environment; or science and technology.

 

Bibliography

Anderson, Edmund A. 2009. Speech Levels: The Case of Sundanese. Pragmatics 3:2, 107-136. International Pragmatics Association.

Archer et al. 2002. Introduction to the USAS Category System.

Baker, P. 2010. Will Ms ever be as frequent as Mr? A corpus-based comparison of gendered terms across four diachronic corpora of British English. Equinox Publishing.

Baker, P., Hardie, A. & McEnery, T. 2006. A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

Cheng, Winnie. 2012. Exploring Corpus Linguistics. Language in Action. London & New York: Routledge.

Church, K. W. & Hanks, P. 1990. “Word association norms, Mutual Information, and lexicography”. Computational Linguistics, 16 (1), 22–29.

 

Danadibrata, R. 2009. Kamus Basa Sunda. Bandung: Kiblat.

 

Davies, M. 2008-: online. The Corpus of Contemporary American English (COCA): 425+ million words, 1990-present. Available at http://www.americancorpus.org.

 

Hardy, Donald E. & Colombini, C. B. 2011. A genre, collocational and constructional analysis of RISK. International Journal of Corpus Linguistics 16:4, 462–485. John Benjamins Publishing Company.

 

Hanks, Patrick. 2013. Lexical Analysis: Norms and Exploitations. The MIT Press.

Lembaga Basa dan Sastra Sunda (LBSS). 1969. Kamus Umum Basa Sunda.

McEnery, T. & Hardie, A. 2012. Corpus Linguistics. Cambridge: Cambridge University Press.

Satjadibrata, R. 2005. Kamus Basa Sunda. Bandung: Kiblat.

Sinclair, J. 1987. Looking Up: An Account of the COBUILD Project in Lexical Computing. London: Collins.

Sinclair, J. 2003. Reading Concordances. London: Pearson Education Limited.

Sinclair, J. 2004. Trust the Text. London & New York: Routledge.

Stubbs, M. 2002. Words and Phrases: Corpus Studies of Lexical Semantics. Blackwell Publishing.

 

 

 

 

 

[*] Presented at the Seminar Internasional Hari Kelahiran Bahasa Indonesia 2 Mei 1926 “Penelitian di Bidang Leksikologi, Leksikografi, Peristilahan, Etimologi, dan Toponimi”, held by the Laboratorium Leksikologi dan Leksikografi, Fakultas Ilmu Budaya, Universitas Indonesia.

[†] Speech level in Sundanese is one aspect of politeness, ranging from Lemes ‘refined’ (in accordance with customary law) to Kasar ‘lacking refinement’. It is the social relationship between interlocutors that influences the selection of speech level (Anderson, 2009).
