Chinese characters WILL last

Discuss the Chinese language.
Alex_rcpilot
Posts: 14
Joined: Sat Oct 28, 2006 3:25 pm

Postby Alex_rcpilot » Sun Oct 29, 2006 9:48 am

A language is what it is simply because of the way it is.



Some foreign friends don't have an adequate understanding of written Chinese, and some of them claim that Chinese characters are destined to die out. So I realized this topic should be brought up, to make it known to all that Chinese characters will flourish.



Frankly, I'm NOT a language major. I'm 23 and I just graduated with a bachelor's degree in mechanical engineering. However, my primary interest is building electronic systems, including those with LCD user interfaces. Sometimes I need to embed a Chinese input method and a character display function into my system. So you know, [color=red]I HAVE TO BUILD A CHINESE CHARACTER SUPPORT SYSTEM FROM SCRATCH[/color]. What you are worried about is not a problem at all. On the contrary, Chinese text takes up less space in a digital system, and less time to type. I'll tell you why.



We all know that our screens are made up of luminescent materials that form tiny pixels, and that characters are a special type of image. Alphabetic letters are images too, just relatively simpler ones. Either way, they are displayed the same way: we draw them on the screen so that they look like the character.



Take the letter "A" and the character "国" (guo, country) for example.

Let's draw them both 16 pixels high.



A:

00000000 = 0x00

00000000 = 0x00

00010000 = 0x10

00111000 = 0x38

01101100 = 0x6C

11000110 = 0xC6

11000110 = 0xC6

11111110 = 0xFE

11000110 = 0xC6

11000110 = 0xC6

11000110 = 0xC6

11000110 = 0xC6

00000000 = 0x00

00000000 = 0x00

00000000 = 0x00

00000000 = 0x00



国:

0000000000000100 = 0x00, 0x04

0111111111111110 = 0x7F, 0xFE

0100000000100100 = 0x40, 0x24

0101111111110100 = 0x5F, 0xF4

0100000100000100 = 0x41, 0x04

0100000100000100 = 0x41, 0x04

0100000101000100 = 0x41, 0x44

0100111111100100 = 0x4F, 0xE4

0100000100000100 = 0x41, 0x04

0100000101000100 = 0x41, 0x44

0100000100100100 = 0x41, 0x24

0100000100000100 = 0x41, 0x04

0101111111110100 = 0x5F, 0xF4

0100000000000100 = 0x40, 0x04

0111111111111100 = 0x7F, 0xFC

0100000000000100 = 0x40, 0x04



A digital 1 stands for a visible pixel on the screen, and a digital 0 stands for the background. That's how we draw images on the screen. Each digit takes up one bit in memory, and 8 bits make up a byte, so we can also write the dots in hexadecimal format. That's what those magic 0x.. numbers are. So the letter A takes up 16 bytes (16 rows of 1 byte) when displayed, and the character 国 takes up 32 bytes (16 rows of 2 bytes).
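Here is a minimal Python sketch of the idea, using the hex values from the two listings above. The names GLYPH_A, GLYPH_GUO and render are made up for illustration; on a real LCD you would set pixels instead of printing text.

```python
# Rows of the 8x16 "A" glyph, copied from the listing above (one byte per row).
GLYPH_A = bytes([0x00, 0x00, 0x10, 0x38, 0x6C, 0xC6, 0xC6, 0xFE,
                 0xC6, 0xC6, 0xC6, 0xC6, 0x00, 0x00, 0x00, 0x00])

# Rows of the 16x16 "国" glyph, copied from the listing above (two bytes per row).
GLYPH_GUO = bytes([0x00, 0x04, 0x7F, 0xFE, 0x40, 0x24, 0x5F, 0xF4,
                   0x41, 0x04, 0x41, 0x04, 0x41, 0x44, 0x4F, 0xE4,
                   0x41, 0x04, 0x41, 0x44, 0x41, 0x24, 0x41, 0x04,
                   0x5F, 0xF4, 0x40, 0x04, 0x7F, 0xFC, 0x40, 0x04])


def render(glyph, width):
    """Turn packed glyph bytes into text rows: '#' for a 1 bit, '.' for a 0 bit."""
    bytes_per_row = width // 8
    rows = []
    for start in range(0, len(glyph), bytes_per_row):
        # Expand every byte of this row into its 8 binary digits.
        bits = "".join(f"{b:08b}" for b in glyph[start:start + bytes_per_row])
        rows.append(bits.replace("1", "#").replace("0", "."))
    return rows


print("\n".join(render(GLYPH_A, 8)))
print("\n".join(render(GLYPH_GUO, 16)))
print(len(GLYPH_A), "bytes for A,", len(GLYPH_GUO), "bytes for 国")
```

Each glyph is nothing more than its rows packed into bytes, which is why the size depends only on the pixel grid and not on how complicated the character looks.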



Where do the images come from? They come from what we call a library, which is just a file filled with sequential bytes. There are 128 ASCII codes, each displayed the same way as illustrated above for the letter "A". So an ASCII library takes exactly 128*16=2048 bytes, or 2KB. There are 8192 most frequently used Chinese characters according to the national standard (GBxxxx). These characters are usually arranged together to form a standard library. Given that a single 16*16 Chinese character takes up 32 bytes, the total space 8192 characters consume is exactly 8192*32=262144 bytes. Because 262144/1024=256, we can say it takes 256KB to store such a library. How much space have you guys got in your computer? 40GB? 120GB? 400GB? Or 2TB? I've got 320GB here, so I've got a hell of a lot of space, enough to save over 1.3 million copies of this library. So, you see, a 256KB library is no big deal nowadays.
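The arithmetic above can be checked in a few lines of Python. This is only a sketch of the numbers in this post; the variable names are made up, and I'm assuming 1 GB means 1024**3 bytes.

```python
ASCII_GLYPH_BYTES = 16   # 8x16 pixels, 1 bit per pixel
HANZI_GLYPH_BYTES = 32   # 16x16 pixels, 1 bit per pixel

ascii_library = 128 * ASCII_GLYPH_BYTES    # bytes for all 128 ASCII codes
hanzi_library = 8192 * HANZI_GLYPH_BYTES   # bytes for 8192 Chinese characters

# Assumption: a "320GB" disk holds 320 * 1024**3 bytes.
disk = 320 * 1024**3
copies = disk // hanzi_library

print(ascii_library)            # 2048
print(hanzi_library)            # 262144
print(hanzi_library // 1024)    # 256 (KB)
print(copies)                   # 1310720
```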



What does the computer do when we input Chinese characters? There are different types of IMEs, or input methods as we call them. I'm not going to dig into the algorithms they implement. I'm going to tell you how characters are fetched from this library with a certain index. The 8192 characters are stored inside this library in a fixed order. GB defines an index table for retrieving these characters by giving each of them a unique address code. In Chinese it is called "区位输入法" (the zone-position input method), which means indexing by zone code and serial code. This index determines the order in which the characters appear in the library.



When we type Chinese characters, they appear as complicated images on the screen, but what's stored inside is much simpler. Let's create a *.txt file and type something into it. For example, input the Chinese characters "中国" (China) without any space or line break. Save the file and check its properties. What's its size? 4 bytes, right? How come two complex Chinese characters take only four bytes? I'll tell you how this is done inside the computer. (And actually the four specific bytes will be 0xD6,0xD0,0xB9,0xFA if you're curious enough to look into the file with WinHEX or UltraEdit.)
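If you don't have a hex editor at hand, a quick Python check shows the same four bytes. One assumption here: I'm using Python's built-in "gb2312" codec as the GB encoding the text file would be saved in.

```python
text = "中国"

# Assumption: the file is saved in a GB encoding; Python's "gb2312" codec is used here.
raw = text.encode("gb2312")

print(len(raw))                              # 4
print(" ".join(f"0x{b:02X}" for b in raw))   # 0xD6 0xD0 0xB9 0xFA
```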



Take 国 for example again. Like I said, it is stored as 0xB9,0xFA inside the machine, which we call the "internal code". Its zone code is 0xB9-0xA0=0x19 = 25 (decimal), and its serial code is 0xFA-0xA0=0x5A = 90 (decimal). 0xA0, or 160 (decimal), is the threshold the computer uses to decide whether a stored byte belongs to a Chinese character or to an ASCII letter. When it gets 0xB9, it compares the number with 0xA0 and finds that it's greater than 0xA0, so the computer does the math below:

absolute address of first byte of character in library = 32*[(first byte-161)*94+(second byte-161)]

-- first byte and second byte are the two bytes of the internal code (0xB9=185 and 0xFA=250 for 国). Subtracting 161 is the same as taking the zone code or serial code minus 1.

-- 32 means each character takes 32 bytes.

-- 94 means each zone of the library holds 94 characters

-- second byte-161 is the position of this character within its zone of the library, or what we call the OFFSET

-- (first byte-161)*94+OFFSET is the sequence number of the character in the whole library, counting from 0. For 国, it is (185-161)*94+(250-161) = 24*94+89 = 2345, which means 2345 characters come before it in the library. Multiply it by 32, and we get the address 75040. This is where we start to pick up 32 sequential bytes. And they will be 0x00,0x04,0x7F,0xFE......0x40,0x04 just as illustrated above.
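The lookup can be sketched in Python like this. It works directly on the two stored bytes (0xB9 and 0xFA for 国). The names glyph_offset and read_glyph are made up for illustration, and the library is assumed to be one flat sequence of 32-byte glyphs in zone order.

```python
GLYPH_BYTES = 32       # one 16x16 glyph, 1 bit per pixel
CELLS_PER_ZONE = 94    # characters in each zone
FIRST_CODE = 0xA1      # 161: byte value of zone 1 / position 1


def glyph_offset(first, second):
    """Byte address of a character's glyph, given the two bytes of its internal code."""
    index = (first - FIRST_CODE) * CELLS_PER_ZONE + (second - FIRST_CODE)
    return GLYPH_BYTES * index


def read_glyph(library, first, second):
    """Pick up the 32 sequential bytes of the glyph from the library data."""
    start = glyph_offset(first, second)
    return library[start:start + GLYPH_BYTES]


print(0xB9 - 0xA0, 0xFA - 0xA0)    # 25 90  (zone code, serial code)
print(glyph_offset(0xB9, 0xFA))    # 75040
```

With a real library file you would read it once (for example with open(path, "rb").read(), where path is wherever your font file lives) and pass the bytes to read_glyph.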



On the other hand, an ASCII code takes one byte in a *.txt file. This byte is its address in the ASCII table, e.g., A is 0x41. The computer looks it up in the ASCII library in much the same way as it does a Chinese character, but the calculation is much simpler. Since the table starts at code 0, 0x41 is simply multiplied by 16 (instead of 32, because the image takes 16 bytes), which gives address 1040.
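Putting the two lookups together, here is a rough Python sketch of how a display routine could walk through a mixed byte string. The function name glyph_lookups is made up. It assumes the ASCII library starts at code 0x00 (so the glyph for code c starts at c*16) and that every byte greater than 0xA0 is the first of a two-byte Chinese character.

```python
def glyph_lookups(data):
    """Yield (library name, byte offset) for each character in a mixed ASCII/Chinese byte string."""
    i = 0
    while i < len(data):
        first = data[i]
        if first > 0xA0:
            # Chinese character: two bytes, 32-byte glyph.
            if i + 1 >= len(data):
                raise ValueError("text ends in the middle of a Chinese character")
            second = data[i + 1]
            yield ("hanzi", 32 * ((first - 0xA1) * 94 + (second - 0xA1)))
            i += 2
        else:
            # ASCII code: one byte, 16-byte glyph.
            yield ("ascii", 16 * first)
            i += 1


for entry in glyph_lookups(b"A\xb9\xfa"):
    print(entry)
# ('ascii', 1040)
# ('hanzi', 75040)
```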



So we've come all the way to this understanding: a library of 8192 Chinese characters at 16*16 pixels takes up 256KB of memory inside a digital device, and our texts are stored inside the computer at two bytes per Chinese character and one byte per ASCII code. Most importantly, ONLY ONE LIBRARY IS SUFFICIENT IN A DIGITAL DEVICE TO DISPLAY ALL THE CHARACTERS IT SUPPORTS!! I don't think anyone would mind sparing 256KB of extra space to support 8192 Chinese characters. Someone said that at most U.N. conferences, printed documents with the same content come in different thicknesses, and the one in Chinese is always the thinnest. Written Chinese is brief. I've seen many English articles translated into Chinese at less than half the original length. So I say, even though one Chinese character takes twice the space an English letter does, a Chinese digital document takes no more space than an English one, or at least not much more. Consequently, space isn't an issue that will cause Chinese characters to die out. Neither is input efficiency. My cousin used to work as a typist, and she could type an article faster than someone could recite it. So, my point is, Chinese characters will last. Won't you agree?

jane
Posts: 3
Joined: Wed Aug 15, 2007 9:23 am

Postby jane » Thu Aug 16, 2007 3:38 am


Alex_rcpilot
Posts: 14
Joined: Sat Oct 28, 2006 3:25 pm

Postby Alex_rcpilot » Fri Aug 17, 2007 5:35 pm


Tom Higgins
Posts: 38
Joined: Mon Dec 10, 2007 8:43 am
Location: Shanghai

Postby Tom Higgins » Mon Dec 10, 2007 11:05 am

I agree with you Alex. Chinese characters will last.



Granted, a Chinese character typically requires more storage space on disk than a letter in an alphabet-based language, but storage space for text is hardly ever an issue, considering that image and multimedia files require MUCH more space.



Also a look-up table can be constructed to connect codes of much shorter length with the actual representation table of a character, so the saying that Chinese characters are destined to disappear cannot be further from the truth.



Plus, there are 1.3 billion people using Chinese characters, even though they're pronounced differently in different dialects. How much effort and time it would EVER take to make Chinese characters obsolete is simply beyond my ken.

jaz
Posts: 10
Joined: Mon Jul 05, 2010 5:20 am
Location: shanghai
Contact:

Re: Chinese characters WILL last

Postby jaz » Tue Jul 06, 2010 9:07 am

Are we talking about technology, or about characters as part of a language?



They will never die out. They may adapt slightly, but they won't die out. They are part of the culture and history of China, and it would be unthinkable to change them. However, even now the language is changing: more and more English words are becoming part of the younger generation's Mandarin.

Yeleixingfeng
Posts: 110
Joined: Thu Mar 17, 2011 12:50 am

Re: Chinese characters WILL last

Postby Yeleixingfeng » Wed Mar 30, 2011 7:18 pm

Wow. I am impressed. You seldom see an engineer so enthusiastic about promoting Chinese culture.



Gogogo Hanzi!



By the way, if the 白話運動 (the vernacular movement, the change from Classical Chinese grammar to modern Chinese grammar) had never happened, far fewer characters would be needed to convey the same meaning. Hence, even more space would be saved, if that really is the main concern/obstacle for Chinese characters. Haha ^^

