Monday, September 30, 2013

A statistical analysis of the ancient 反切 (fanqie) method

The uproar and outcry in HK over the standardization of the pronunciation of Chinese characters in Cantonese led me to this analysis.

The system was designed a very long time ago, at least 100 years and possibly 1,000 years back. Think about China over that time: the kings, emperors and rulers, the vast territory, the many different races and cultures, and the influences from outside. To take one and only one standard book and use it, or force its use, in one province or city is insanely stupid.

I am basing this on the probability of correctness, and I try not to use the word "incorrect".

The system, designed such a long time ago, uses two characters to describe a third character. In computer science terms, this is a simple binary tree structure: the sound of one character is determined by the sounds of two other characters. The consonant of the first character and the vowel of the second character together give the sound of the third character. The tone of the third character is determined in a similar way. The first character determines the high or low pitch and the second character determines which tone (平上去入: level, rising, departing, entering), so the total number of tones the system can determine is 2x4=8. But Cantonese has 9 tones in total, and this is one of the failures of the system design: 中入 (the middle entering tone) is missing.
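Here is a rough sketch of that rule in Python. It is only an illustration under my own assumptions: it uses modern Jyutping spellings as a stand-in for the sounds, the names `split_syllable` and `fanqie` are made up for this example, and the real 反切 books were of course not written against Jyutping. The tone table has exactly 8 entries, so the middle entering tone (written as tone 3 on a syllable ending in p, t or k) can never come out of it.

```python
# Jyutping initials; two-letter ones are listed first so they match before "g", "k", "n".
INITIALS = ("ng", "gw", "kw", "b", "p", "m", "f", "d", "t", "n",
            "l", "g", "k", "h", "w", "z", "c", "s", "j")

# (register from 1st character, tone class from 2nd character) -> Jyutping tone number.
# 2 registers x 4 classes = 8 results; the middle entering tone is not reachable.
TONE_TABLE = {
    ("high", "level"): 1, ("high", "rising"): 2, ("high", "departing"): 3,
    ("low", "level"): 4, ("low", "rising"): 5, ("low", "departing"): 6,
    ("high", "entering"): 1, ("low", "entering"): 6,
}


def split_syllable(syllable):
    """Split a Jyutping syllable such as 'hung4' into (initial, final, tone)."""
    tone = int(syllable[-1])
    body = syllable[:-1]
    if body in ("m", "ng"):  # syllabic nasals have no initial
        return "", body, tone
    for initial in INITIALS:
        if body.startswith(initial) and len(body) > len(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # syllable that starts with a vowel


def fanqie(first, second):
    """Combine the consonant and register of `first` with the vowel and tone class of `second`."""
    initial, _, tone1 = split_syllable(first)
    _, final, tone2 = split_syllable(second)
    register = "high" if tone1 in (1, 2, 3) else "low"
    if final[-1] in "ptk":
        tone_class = "entering"
    elif tone2 in (1, 4):
        tone_class = "level"
    elif tone2 in (2, 5):
        tone_class = "rising"
    else:
        tone_class = "departing"
    return f"{initial}{final}{TONE_TABLE[(register, tone_class)]}"


# 東 is described as 德紅切: 德 (dak1) + 紅 (hung4)
print(fanqie("dak1", "hung4"))
```

With the classic example 東 = 德紅切, the sketch takes "d" and the high register from 德 and "ung" plus the level tone from 紅.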

Apart from this design fault, the accuracy can be analysed as follows:

Take the probability that the sound of one character has changed and say it is 50%, or 0.5. Because people moved, emperors changed, and cultures interacted and changed each other, the original sound may or may not be the same today, so I assume it has been kept in its original form with a probability of 0.5. If a character's sound is not changed, I mark it as 1. If it is changed, I mark it as 0 (incorrect).

Let's look at this simple logic table:
A   B   C  (C is the character described with the 反切 method)
0   0   0  (changed)
0   1   0  (inaccurately pronounced because one character changed)
1   0   0  (same as the above)
1   1   1  (not changed, as long as neither of the two 反切 characters has changed)

This is simple AND logic. Treating the two characters as independent, the probability that the character described by 反切 is accurate is simply 0.5x0.5, which is 0.25 or a quarter (1/4, that is 1/2 x 1/2).

This is one node of the binary tree. If characters described this way are then used to describe another character, the correctness goes down to 0.25x0.25, which is 0.0625. Of course some characters have kept their sound correctly over those thousand years, but most characters should get a probability of something like 0.0000001 of being unchanged.
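The arithmetic can be checked with a few lines of Python. This is just a sketch of my reasoning: the function names are made up for the example, and it assumes every base character keeps its sound with the same probability (0.5) and that the characters change independently of each other.

```python
from itertools import product


def and_table_probability():
    """Enumerate the A/B logic table and return the share of rows where C stays correct."""
    rows = list(product((0, 1), repeat=2))  # (A, B) = 00, 01, 10, 11
    correct = [a & b for a, b in rows]      # C = A AND B
    return sum(correct) / len(rows)


def p_unchanged(depth, p_base=0.5):
    """Probability that a character `depth` levels down the 反切 tree is still correct.

    depth 0 is a base character; each level needs BOTH of its describing
    characters to be correct, so the probability is squared per level.
    """
    p = p_base
    for _ in range(depth):
        p = p * p
    return p


print(and_table_probability())  # one node of the tree
for depth in range(5):
    print(depth, p_unchanged(depth))
```

Each extra level squares the probability, so it drops very quickly as the chain of descriptions gets longer.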

We should see that using this system and claiming the authority to standardize with it is really STUPID!

If those people were educated, they should know that a good and sound system, like logic and mathematics, needs AXIOMS and THEOREMS. We should keep a solid base or foundation: some characters (the base ones) should be taught first, and the probability of changing them should be 0. Then the sounds of other characters will be described through these base characters. With the Western influence nowadays, and since Western writing is phonetic, we should be able to keep Cantonese going for another thousand years without much change.

Tuesday, September 3, 2013

Final Design of Cantonese Input Method

This should be my final design out of all the Cantonese input methods that I have done so far. This version should be the easiest and fastest, and it should be portable to any other device. I have just modified it so you can choose the layout in Chinese/Cantonese or English:

If you are thinking of entering "初", which is spelled "co1" in the Linguistic Society of Hong Kong's LSHK Transcription System, you click the main vowel "o" first. You then get a list of all the sounds with the vowel "o" and all the different consonants, such as "zo", "to", "mo", "go" etc., in all their various tones as well. By choosing one of them, you get the list of characters that have the same consonant, vowel and tone. The last click is to select the character you want, in this case "初". This system takes only 3 or 4 clicks. I think this is awesome!
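The lookup behind those clicks can be sketched like this in Python. It is not the actual code of my input method, only an illustration: the tiny `SAMPLE` list is hand-picked for the example and the function names are made up. The idea is a two-level index, vowel first, then the full sound with its tone, then the characters.

```python
from collections import defaultdict

# Hand-picked sample only: (character, consonant, vowel, tone) in LSHK spelling.
SAMPLE = [
    ("初", "c", "o", 1),
    ("楚", "c", "o", 2),
    ("錯", "c", "o", 3),
    ("多", "d", "o", 1),
    ("歌", "g", "o", 1),
    ("哥", "g", "o", 1),
    ("左", "z", "o", 2),
]


def build_index(entries):
    """Build vowel -> full sound (e.g. 'co1') -> list of characters."""
    index = defaultdict(lambda: defaultdict(list))
    for char, consonant, vowel, tone in entries:
        index[vowel][f"{consonant}{vowel}{tone}"].append(char)
    return index


def sounds_for(index, vowel):
    """Click 1: pick a vowel, get every sound that uses it."""
    return sorted(index[vowel])


def characters_for(index, vowel, sound):
    """Click 2: pick a sound, get the characters to choose from."""
    return index[vowel][sound]


index = build_index(SAMPLE)
print(sounds_for(index, "o"))             # list shown after clicking "o"
print(characters_for(index, "o", "co1"))  # list shown after choosing "co1"
```

The last click just picks one character out of the final list, so the number of clicks stays small no matter how big the character table is.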

You can test it here:

Have fun! Enjoy!