The Technological Monk: linguistics

Saturday, October 3, 2009

Of languages, computer and human

I noticed an interesting difference between humans and computers the other day. I was conversing with a fellow linguist about language and technology and I sort of stumbled upon it.

The progress of language is the same as the progress of computers, but their sources are reversed.

In the days of the early computers, we had vacuum tubes, and punch cards for data storage. To work, one used a terminal (one of many) connected to an always-running central computer. Almost everything was done in RAM. Then along came something wonderful: magnetic storage. Now you could store data much more efficiently, to be accessed and modified later. As with all things, however, there were a few obvious problems: capacity and price. Price went down over time, and personal computers became more and more common.

Capacity was the other issue. To save as much space as possible (i.e., to maximize available capacity) and to keep the price down, interfaces didn't progress much for a while. Text-based command-line input did wonders, but many people found it hard to adapt to. There was a significant learning curve. However, these systems were efficient enough that they still exist and are in use today. And the more predictable and formulaic the commands became, the easier things got.

Then along came the GUI: the graphical user interface. As storage capacity increased and prices fell further, it became much more feasible to run an interface that was easier to use, at the expense of being less direct or efficient. This trade-off was important because it opened computers not just to those familiar with them, but to those who had never touched one before. It gave access to "outsiders," those who weren't members of the computer-based community. The learning curve dropped. Progress, however, became even more dependent on increasing storage capacity.

On a related note, as computers become dated, those who are more "tech-savvy" often turn to more efficient operating systems to run on their older hardware. Linux is a favorite for many. By cutting down on bloat, you can run new software on these older machines and still keep them usable. I myself turned an older computer into a server without any GUI. Without the "bloat" of a GUI, it remains very useful and usable for many things. And I won't be as affected by software deprecation as I would be if I had left an old operating system running that wasn't updated anymore.

You can see how the progress of personal computers was based on simplifying the user interface so that more people could use them easily, even if they weren't great with the technology. Accessibility came at the cost of dependence on storage. As storage became available in higher and higher capacities, process efficiency became less important for the average user (think of today's average user, not the 1980s' average user).

The progress of language also works towards increasing accessibility.

Long ago, language was a very difficult thing to grasp. This seems counter-intuitive because language is so fundamental. It becomes easier to see when you look at how many people were bi-, tri-, or multi-lingual, because there were far fewer of them than there are now. There are many reasons for this, such as the fact that the world was far less globalized than it is now. You can see this easily in the biases of the Western world and of early Sanskritic society in India. The Greeks took such pride in their language that they deemed anyone who could not speak it uncivilized. This attitude carried through time into the concept of "The White Man's Burden." Of course, that particular example is not based solely on language. Still, I've always found that language and culture are tied together so intimately that they almost always go together. This is especially true when analyzing one's cultural identity.

When we look to the east, however, we find more multi-lingualism. The silk routes running from the Middle East, through India, to China are a large part of the reason. Being multi-lingual has many advantages, especially in business. Multi-lingualism was also important because the Middle East, India, and China all retained many subcultures, each with its own language. India today has over 20 official languages. China also has many "dialects." Most Westerners only know of Cantonese and Mandarin, but there are many more.

Back to the point: multi-lingualism was also difficult because languages had different characteristics from those of today. The most widely spoken language family is Indo-European. These languages were originally highly inflected, and word order mattered less. This means that words had many, many different forms depending on their use in a sentence. Verbs had many more conjugations than we usually see today, and their associated nominal forms also required lots of rearranging. Latin, Greek, and Sanskrit are primary examples here, with lots of word forms and usages. The benefit was that it was easier (in many ways) to convey meaning. On the whole, one could convey more precise information in fewer words, though by no means always. This was because the word endings carried so much of the meaning.

[Important and notable exceptions to this rule are languages in the Sino-Tibetan family, such as the Chinese languages, and other such languages that were not directly or immediately related to or in contact with Indo-European languages.]

Now, growing up speaking these languages was one thing, and learning them was quite another. Unless you grew up in a place that spoke more than one language, you wouldn't necessarily learn another language until you were an adult, and learning languages becomes significantly harder after your teens. As a result, speaking more than one language fluently was rarer than it is today (more so in the West, as I stated before).

When you look at the word inventories of these languages, you may notice that each verb root has many, many different variations, which may or may not be formulaic. You could consider these highly inflected languages to be more "storage" based than the Chinese languages or today's languages. Languages diversified over time. They also began to focus less on nominal cases, simplified their verb tenses and conjugation groups, and relied more on word order. As a result, you learned more distinct words and fewer eccentric morphological endings. Even with fewer word forms overall, you could still convey meaning, although each sentence needed more words. That is less efficient, but much easier to learn for those who weren't so great with languages. Multi-lingualism just got a whole lot easier. That last statement is an oversimplification, because this was slow, steady change over many centuries. Still, you can understand how and why things changed.
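To make the "storage" versus "process" contrast concrete, here is a toy Python sketch. It compares the present tense of Latin amare ("to love") with its English equivalent. The names LATIN_AMARE, PRONOUNS, and english_love are hypothetical. English word order is reduced to a single pronoun-plus-verb rule, so this only illustrates how many forms must be memorized. It is not a model of how either language is actually learned.

```python
# Toy contrast between a "storage" paradigm and a "process" rule.
# LATIN_AMARE, PRONOUNS, and english_love are hypothetical names.

# "Storage": Latin present tense of amare ("to love"); every
# person/number slot is its own memorized word.
LATIN_AMARE = {
    ("1", "sg"): "amo", ("2", "sg"): "amas", ("3", "sg"): "amat",
    ("1", "pl"): "amamus", ("2", "pl"): "amatis", ("3", "pl"): "amant",
}

# English moves the person/number information into a separate pronoun.
PRONOUNS = {
    ("1", "sg"): "I", ("2", "sg"): "you", ("3", "sg"): "he",
    ("1", "pl"): "we", ("2", "pl"): "you", ("3", "pl"): "they",
}


def english_love(person: str, number: str) -> str:
    """'Process': one small rule (add -s in the 3rd singular) plus word order."""
    verb = "loves" if (person, number) == ("3", "sg") else "love"
    return f"{PRONOUNS[(person, number)]} {verb}"


if __name__ == "__main__":
    latin_forms = set(LATIN_AMARE.values())
    english_forms = {english_love(p, n).split()[1] for p, n in PRONOUNS}
    print(len(latin_forms), "stored Latin forms vs", len(english_forms), "English verb forms")
    for slot, latin in LATIN_AMARE.items():
        print(f"{latin:<7} (1 word)  {english_love(*slot)} (2 words)")
```

Latin stores six distinct verb forms and needs one word per slot. English gets by with two verb forms but spends an extra word on every sentence, which is the trade-off described above.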

Orthography is important, but in my opinion it didn't matter as much to the average person until the Arab empire rose to its height. This was the era of copying and preserving that led up to Gutenberg's invention of the printing press. Before that, writing was important, but not so integral to learning a language, especially a second, third, or fourth one.

Another benefit of more formulaic language is that it frees up time to think more abstractly. Language becomes less of a pure inventory, so we're free to remember more. It also requires less attention, because we can always add more words later to resolve ambiguity. This was always true, but it is much more apparent now, or at least so I've noticed. Either way, we're freer to multitask.

So here we can see a definite change towards accessibility at the (relatively slight) cost of efficiency. However, you may have noticed, as I have, a few important differences between this and the progress of computers.

Computers moved from always-on, process-based centralized systems towards individual computers, which became more accessible thanks to ever-increasing storage. Language moved away from pure "storage" towards more formulaic usage. In a way, it became more "process-friendly," especially once orthography enters the mix. Knowledge can be stored and accessed later, but the process of learning (especially learning how to read) becomes more important.

The thing about the older languages is that in their earliest forms (Mycenaean Greek, Vedic Sanskrit, and Old Latin), many of these inflections weren't so standardized. There were more exceptions to the rules, and these tended to decrease over time. Command-line systems followed a similar path, and their predictability eventually became nigh-universal. Their later contexts are similar, too. As Latin, Greek, and Sanskrit became liturgical languages, they were standardized, much as Linux commands were standardized alongside the rising use of Windows and Mac OS. Latin and Greek also came to be used specifically for scientific naming, just as Unix and Linux are arguably the defaults for high-end servers.

As technology tries to break the limits of today's magnetic storage, we should take some time to think about our language. In light of today's post, I challenge you to take the time to read, write, and speak more efficiently, even at the cost of speed and time. I guarantee that if you do this for a while, you will gain something from the simple act of moving a little more slowly, along the lines of "Ungeek to Live."

Monday, April 27, 2009

Linguistic similarities in Arabic and Sanskrit vowels

I don't know if this belongs here, or in A Modern Hindu's Perspective, but since it's not directly religious in any way, and provides very interesting notes linguistically, and since linguistics has a major impact on modern technology (by way of voice recognition, sound analysis, and linguistic interpretation by machines), I figured it wouldn't hurt to throw it this way.

NOTE: Here, I use my slightly altered version of ITRANS for sanskrit (saMskRta) transliteration, and the Buckwalter transliteration for arabic (Eraby). For the purposes of this post, my alterations to ITRANS are negligible. I will also describe the appearance of the ta$kyl (the diacritic marks). That way, people who are familiar with arabic phonetics and script can follow along without worrying about the transcription. Also, while I've taken linguistics classes and studied both languages with native or polished speakers (one more than the other), I am in no way a linguist. I do my best to be as accurate as possible and give helpful links, but feel free to study both languages and compare for yourself.

Also, if you have no idea what I just said, don't panic! Just read on and you can ignore that jazz.
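If you want to see the Buckwalter spellings used in this post in Arabic script, here is a minimal Python sketch. It assumes the standard Buckwalter letter-to-Unicode correspondences. It covers only the symbols that appear in this post, not the full table. The names BUCKWALTER and to_arabic are hypothetical.

```python
# Partial Buckwalter -> Unicode table: only symbols used in this post.
BUCKWALTER = {
    "'": "\u0621",  # hamza (standalone)
    ">": "\u0623",  # alif with hamza above
    "<": "\u0625",  # alif with hamza below
    "A": "\u0627",  # alif
    "b": "\u0628",  # ba
    "p": "\u0629",  # ta marbuta
    "t": "\u062a",  # ta
    "H": "\u062d",  # Ha (pharyngeal h)
    "r": "\u0631",  # ra
    "z": "\u0632",  # zay
    "s": "\u0633",  # sin
    "$": "\u0634",  # shin
    "D": "\u0636",  # Dad
    "E": "\u0639",  # ain
    "f": "\u0641",  # fa
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "m": "\u0645",  # mim
    "h": "\u0647",  # ha
    "w": "\u0648",  # waw
    "y": "\u064a",  # ya
    "a": "\u064e",  # fatha (short a)
    "u": "\u064f",  # damma (short u)
    "i": "\u0650",  # kasra (short i)
    "~": "\u0651",  # shadda (doubling)
}


def to_arabic(text: str) -> str:
    """Convert a Buckwalter string, rejecting symbols outside this partial table."""
    try:
        return "".join(BUCKWALTER[ch] for ch in text)
    except KeyError as err:
        raise ValueError(f"symbol {err.args[0]!r} not in this partial table") from None


if __name__ == "__main__":
    for word in ("yA", "waw", "Alif", "kasra", "Dam~a", "fatHa", "hamzap", "ta$kyl", "Eraby"):
        print(f"{word:<7} {to_arabic(word)}")
```

A real converter would also need the remaining letters and the tanween marks. This sketch deliberately fails loudly on anything it does not know rather than guessing.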

Something interesting I noticed today was how arabic's vowels and sanskrit's vowels are similar. In arabic, you essentially have three tiers of vowels, dictated by length (traditionally, in terms of beats).

You have the long vowels (in order of strength): yA (/y/), waw (/w/), and Alif (/A/). These are pronounced like "ee" as in 'beet,' "oo" as in 'boom,' and "a" as in 'hat,' in standard American English. They are held for two beats.

You have the short vowels (same order as long vowels), the kasra, Dam~a, and fatHa. There is one phonetic difference here: the kasra is often pronounced like "i" as in 'bit,' but they do still correspond to the longer vowels. These short vowels are held for one beat.

Then you have the hamzap (hamza). It represents the glottal stop, which most English speakers will recognize as the break marked by the hyphen in 'uh-oh.' It's that short abruptness you cause when you close your throat. In arabic, the hamzap carries vowel quality. When it appears at the beginning of a word, it is written in one of three ways:

- an Alif with the hamzap under it, for the yA equivalent;
- an Alif with the hamzap and Dam~a over it, for the waw equivalent;
- an Alif with the hamzap alone over it, for the Alif equivalent.

For English speakers, think of that teenage apathetic "I'm not interested" sounding "eh," except with the aforementioned vowel sounds and shorter.

These hamzap forms are held for half a beat.

Now let's get to the sanskrit representation.

You have the long vowels (in a comparative, not traditional, order): /ii/, /uu/, and /aa/. /ii/ is pronounced just like arabic yA, and /uu/ just like arabic waw. /aa/ is NOT pronounced like Alif, however. Alif is more fronted (remember, "a" as in 'hat'), but /aa/ is a little farther back. Think "a" as in 'far,' or the first "o" in 'October.' The long vowels are also held for two beats.

You have the short vowels (in the same comparative order): /i/, /u/, and /a/. In sanskrit, /i/ and /u/ have the same quality as /ii/ and /uu/ but are just one beat long. This changes in modern Indian languages, where the short versions end up sounding like "i" as in 'bit' and "u" as in 'put.' There are also two schools of thought on how /a/ is pronounced. In one, it's pronounced just like /aa/ but held for one beat. In the second, more predominant school, /a/ is pronounced like "u" as in 'bun' or "o" as in 'done.' Here, too, it is held for one beat.

Then, you have two semivowels, /ya/ and /va/. /ya/ is a palatal semivowel and is associated with /i/ and /ii/ in sanskrit's system of sandhi (which documents phonetic assimilation). In vedic sanskrit, /va/ was pronounced like an English "w," but came to be pronounced like the English "v." However, it still remains the labial semivowel, related to /u/ and /uu/.

Let's say you have a sanskrit word, /karmaNi/ "actions." Then you have another word after it, /eva/ "only." Put them together in a phrase and you get /karmaNi eva/. However, in sanskrit you must apply sandhi, and the /i/ changes to the semivowel /ya/. You end up with one word, /karmaNyeva/, which still means "only actions."

It's a little easier to say. If you said the two words in casual speech (read: quickly and not in a metrically significant way), you'd end up with this anyway. To borrow the Wikipedia article's example, think of the phrase "don't be" in English. Said casually and quickly, it comes out as "dome be." This happens in a great many languages. Sanskrit doesn't force uncomfortable articulation when you speak "properly" or formally. Instead, it accepts and documents these changes, and then "forces" you to apply them when you speak "properly" or formally.
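The glide rule in the /karmaNi eva/ example, which traditional grammar calls yaN sandhi, can be sketched in a few lines of Python. This is a minimal sketch that assumes plain ITRANS-style strings. It handles only a final i/ii or u/uu followed by a different vowel, and it leaves every other kind of sandhi alone. The function name yan_sandhi is hypothetical.

```python
# Final vowels handled here, longest first so "ii" is not mistaken for "i".
GLIDES = (("ii", "y"), ("uu", "v"), ("i", "y"), ("u", "v"))


def yan_sandhi(first: str, second: str) -> str:
    """Turn a final i/ii or u/uu into y or v before a different vowel.

    Input is ITRANS-style. Any case outside this one rule is
    returned unchanged as two separate words.
    """
    if not second or second[0] not in "aeiou":
        return f"{first} {second}"          # second word must start with a vowel
    if first.endswith(("ai", "au")):
        return f"{first} {second}"          # diphthongs follow other sandhi rules
    for final, glide in GLIDES:
        if first.endswith(final):
            if second[0] == final[0]:
                break                       # i+i / u+u lengthen instead; not handled
            return first[: -len(final)] + glide + second
    return f"{first} {second}"


if __name__ == "__main__":
    print(yan_sandhi("karmaNi", "eva"))
    print(yan_sandhi("madhu", "ari"))
```

The second call shows the u to v side of the same rule: /madhu/ + /ari/ becomes /madhvari/, a common textbook example.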

The semivowel conversion in /karmaNyeva/ shows that /ya/ (and the "y" in English, for that matter) is just a broadening and truncation of the vowel /i/, which can then be followed by another vowel. Sanskrit doesn't like consecutive vowels, unlike greek and latin (mostly greek), and this is how it deals with them. Getting back to the point, this makes the semivowels /ya/ and /va/ behave like very short, half-beat vowels in their own way.

It's fun to see similar structures.
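To see the parallel side by side, here is a small Python table of the three tiers as this post describes them. Long vowels get two beats and short vowels get one. The half-beat tier holds the hamzap forms in arabic and the semivowels /ya/ and /va/ in sanskrit. This is a sketch under stated assumptions. The beat values are the traditional counts given above, not measured durations. Sanskrit has no half-beat /a/ counterpart in this scheme. The names Slot, TIERS, and counterparts are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Slot(NamedTuple):
    tier: str
    beats: float
    arabic: str              # name or Buckwalter label used in this post
    sanskrit: Optional[str]  # ITRANS; None where the post names no counterpart


# Traditional beat counts from this post, not phonetic measurements.
# Half-beat arabic entries are Buckwalter: "<" is Alif with hamza below,
# ">" is Alif with hamza above, ">u" adds a Dam~a. Alif and /aa/ share a
# slot even though Alif is pronounced farther forward than /aa/.
TIERS = [
    ("long", 2.0, {"i": ("yA", "ii"), "u": ("waw", "uu"), "a": ("Alif", "aa")}),
    ("short", 1.0, {"i": ("kasra", "i"), "u": ("Dam~a", "u"), "a": ("fatHa", "a")}),
    ("half", 0.5, {"i": ("<", "ya"), "u": (">u", "va"), "a": (">", None)}),
]


def counterparts(quality: str) -> List[Slot]:
    """Every form sharing one vowel quality, from the longest tier down."""
    return [Slot(tier, beats, *forms[quality]) for tier, beats, forms in TIERS]


if __name__ == "__main__":
    for quality in ("i", "u", "a"):
        for slot in counterparts(quality):
            print(f"{quality}  {slot.tier:<5} {slot.beats:>3} beats  "
                  f"arabic={slot.arabic:<6} sanskrit={slot.sanskrit or '-'}")
```

Grouping by vowel quality rather than by language makes the shared long/short/half structure visible at a glance.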

Now, in arabic, you have two diphthongs, /ay/ and /aw/. These are a fatHa (the one-beat counterpart of Alif) followed by yA and waw, respectively.

Sanskrit has four diphthongs (merged from a few more): /e/ and /ai/, and /o/ and /au/. Of these, /ai/ and /au/ are the relevant ones here. Scholars think they may originally have been pronounced as /aa/+/ii/ and /aa/+/uu/. In modern pronunciation, however, they are pronounced as /a/+/i/ and /a/+/u/, and perhaps this was already true in classical sanskrit. Each of these two diphthongs is two beats long and is classified as a "long" vowel in sanskrit. Nice parallel structure, eh?

I don't know how familiar you guys are with linguistics and such, but this is pretty interesting to me because arabic is an Afro-Asiatic language, while sanskrit is an Indo-European language. The two languages are quite distant in linguistic genealogy. The features mentioned here are also old in both languages, which suggests that no later borrowing of structure or phonemes occurred. It's more likely that each language retained these features independently. Also, while I used sanskrit here, please note that these vowel structures are in use in modern Indian languages in general, though sandhi has declined in favor of consecutive vowels.

You may also be interested in how sanskrit and avestan are related.
