
SMC/AtomicChilluIsUnacceptable

From FSCI Wiki

Revision as of 12:07, 28 January 2008

[Working Draft]

The draft version of Unicode 5.1.0 (http://www.unicode.org/versions/Unicode5.1.0/#Significant_Character_Additions) proposes new code points for the chillu characters. We, Swathanthra Malayalam Computing, a developer community working on the localization, development and standardization of Malayalam computing software, have the following comments on the draft version of Unicode 5.1.0.

Atomic chillus are unacceptable because they destroy the link between a chillu and its base character.

1. The examples used to justify a semantic difference between words that differ only by a ZWJ are either absent from dictionaries, grammatically wrong, or meaningless without proper context.

a) വന്‍യവനിക/വന്യവനിക (vanYavanika/vanyavanika), കണ്‍വലയം/കണ്വലയം (kanvalayam/kanualayam): contrived examples not found in any dictionary.

b) ആ മനുഷ്യന്‍ കൊടുക്കുന്നു (that man is giving)

ആ മനുഷ്യനു് കൊടുക്കുന്നു (giving to the man)

As per Malayalam linguistic rules, these sentences are incomplete. They are correct only when written in the following structure:

Structure:

ആ മനുഷ്യന്‍ <to whom & what he gives> കൊടുക്കുന്നു
ആ മനുഷ്യനു് <who is giving & what is being given> കൊടുക്കുന്നു.

Example:

ആ മനുഷ്യന്‍ (man) പൂച്ചക്ക് (to cat) പാല്‍ (milk) കൊടുക്കുന്നു (That man is giving milk to the cat)
ആ മനുഷ്യനു് (to man) പൂച്ച (cat) പാല്‍ (milk) കൊടുക്കുന്നു. (The cat is giving milk to that man) :-)

Here, the fundamental problem lies in Unicode's practice of treating only representational forms without checking linguistic correctness. Unlike Latin, most Indic languages have collation rules that are based on linguistics. If this is not taken into account, the standard will become a playground for people with vested interests.

2. All these arguments were once considered and rejected by the UTC, and the only new argument in support of atomic chillus is the issue of missing domain names in IDN. The examples given in 1) cannot be considered real, as they are contrived just to make a case for atomic chillus. Even if they were real, the situation would be similar to case folding in Latin (you cannot register PenIsland.com and PenisLand.com as two separate sites). How can an already rejected proposal be accepted when the new argument in its support is not only unproven but also creates a lot of new chaos and security problems?

3. This will create a dual encoding and make URL spoofing very easy. The following two addresses look the same but have different Punycode:

http://റാല്മിനോവ്.blogspot.com (using the chillu joiner sequence)
http://റാൽമിനോവ്.blogspot.com (using the atomic chillu)

The existing chillu encoding with joiners is the best solution, because every combination of joiners and non-joiners gives exactly the same Punycode.
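The Punycode argument can be made concrete with a minimal Python sketch. It assumes only what IDNA2003 nameprep specifies for joiners, which is that U+200C ZWNJ and U+200D ZWJ are mapped to nothing (RFC 3454, table B.1). It deliberately skips the rest of nameprep, including case mapping, NFKC, and the prohibited-character and bidi checks. The helper `ace_label` is our own illustrative name and is not a library API.

```python
import stringprep

# Simplified model of IDNA2003 ToASCII for a single label: only the
# nameprep "map to nothing" step (RFC 3454, table B.1) and Punycode.
def ace_label(label: str) -> str:
    mapped = "".join(c for c in label if not stringprep.in_table_b1(c))
    return "xn--" + mapped.encode("punycode").decode("ascii")

# The blog name from the example above, spelled four ways.
STEM = "\u0D31\u0D3E"                                 # റാ
TAIL = "\u0D2E\u0D3F\u0D28\u0D4B\u0D35\u0D4D"         # മിനോവ്
LA_VIRAMA = "\u0D32\u0D4D"                            # ല്
variants = {
    "plain":  STEM + LA_VIRAMA + TAIL,                # no joiner
    "zwj":    STEM + LA_VIRAMA + "\u200D" + TAIL,     # chillu via ZWJ
    "zwnj":   STEM + LA_VIRAMA + "\u200C" + TAIL,     # virama + ZWNJ
    "atomic": STEM + "\u0D7D" + TAIL,                 # U+0D7D CHILLU L
}

for name, label in variants.items():
    print(f"{name:7} {ace_label(label)}")
```

When this runs, the three joiner variants should print the same `xn--` label. The atomic chillu should print a different one. As a result, two names that look alike can be registered separately.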

4. Since the joiner sequences have to be supported for backward compatibility, atomic chillus add unnecessary complexity to every text-processing application (sorting, searching). The joiner sequences also make atomic chillus redundant and useless.
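The extra work described in point 4 can be sketched in Python. Normalization does not relate the two encodings, so a search or sort routine has to carry its own folding table. The table below reflects our understanding of the consonant + virama + ZWJ sequences used for chillus before Unicode 5.1. `ATOMIC_TO_SEQUENCE`, `search_key` and `contains` are hypothetical names chosen for illustration.

```python
import unicodedata

VIRAMA, ZWJ = "\u0D4D", "\u200D"

# Hypothetical folding table (our assumption of the pre-5.1 practice):
# each atomic chillu mapped to its consonant + virama + ZWJ sequence.
ATOMIC_TO_SEQUENCE = {
    "\u0D7A": "\u0D23" + VIRAMA + ZWJ,  # CHILLU NN <- NNA
    "\u0D7B": "\u0D28" + VIRAMA + ZWJ,  # CHILLU N  <- NA
    "\u0D7C": "\u0D30" + VIRAMA + ZWJ,  # CHILLU RR <- RA
    "\u0D7D": "\u0D32" + VIRAMA + ZWJ,  # CHILLU L  <- LA
    "\u0D7E": "\u0D33" + VIRAMA + ZWJ,  # CHILLU LL <- LLA
    "\u0D7F": "\u0D15" + VIRAMA + ZWJ,  # CHILLU K  <- KA
}

def search_key(text: str) -> str:
    """Return a matching key in which both chillu encodings are identical.

    NFC is applied first, but it leaves atomic chillus untouched, so the
    folding table has to be applied by hand afterwards.
    """
    text = unicodedata.normalize("NFC", text)
    for atomic, sequence in ATOMIC_TO_SEQUENCE.items():
        text = text.replace(atomic, sequence)
    return text

def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats the two chillu encodings as equal."""
    return search_key(needle) in search_key(haystack)

# മനുഷ്യൻ ("man") in both encodings.
atomic = "\u0D2E\u0D28\u0D41\u0D37\u0D4D\u0D2F\u0D7B"
legacy = "\u0D2E\u0D28\u0D41\u0D37\u0D4D\u0D2F\u0D28\u0D4D\u200D"
print(atomic == legacy)                          # False
print(search_key(atomic) == search_key(legacy))  # True
```

Every sorting or searching application would need an equivalent of this table and would have to keep it in sync with the standard.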

5. As per the Unicode Stability Policy (http://unicode.org/policies/stability_policy.html), section 'Named Character Sequence Stability', the existing chillu sequences have to be supported. In that case, to process text containing both atomic chillus and existing chillu sequences, there should be a canonical equivalence between the new characters and the old sequences. Unicode 5.1 neither provides nor mentions such an equivalence. This breaks existing applications and data and violates the Unicode Stability Policy.
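The missing equivalence can be observed with Python's standard `unicodedata` module. This is only a sketch. The module reflects whatever Unicode version the interpreter ships with, and any recent Python includes the 5.1 chillus. CHILLU L is used here as one example, and the same check applies to the other atomic chillus.

```python
import unicodedata

ATOMIC_CHILLU_L = "\u0D7D"               # added in Unicode 5.1
LEGACY_CHILLU_L = "\u0D32\u0D4D\u200D"   # LA + VIRAMA + ZWJ

print(unicodedata.name(ATOMIC_CHILLU_L))                 # MALAYALAM LETTER CHILLU L
print(repr(unicodedata.decomposition(ATOMIC_CHILLU_L)))  # '' (no decomposition)

# If the two were canonically equivalent, NFC/NFD would make them equal.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = (unicodedata.normalize(form, ATOMIC_CHILLU_L)
            == unicodedata.normalize(form, LEGACY_CHILLU_L))
    print(form, same)  # False for every form
```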

6. Even after atomic chillus are made part of the standard, many words still cannot be written without joiners, which only increases the chaos. For example: കൊയ്‌രാള (koirala), സദ്‌വാരം (sadvaram). Atomic chillus therefore do not solve the problem of ZWJ/ZWNJ ignorability mentioned in the proposal to encode chillus. They are a partial, incomplete solution.

7. Using a virama with chillus is linguistically incorrect. The function of the virama is to produce a vowel-less form, so it cannot be used with a chillu or a pure consonant, because these are already vowel-less forms of the underlying consonants.

8. It is well known that no consensus was reached in the discussion on the indic@unicode.org mailing list, and the problem is still controversial. Moreover, even though the new changes will have a major impact on language technology, linguists and language experts in Malayalam are not at all aware of the facts. We would like to start a process of explaining the pros and cons to language experts and gathering their opinion on this matter. Any haste in adding new code points will therefore have bad consequences.

9. The document http://www.rachanamalayalam.org/docs/ChilluEncodingIsWrong.pdf has already questioned the new code points, and there has been no satisfactory reply from the proponents of atomic chillus.

We strongly oppose including these characters in the standard. They not only fail to solve all the problems with joiners but also create many new problems, and the need to provide backward compatibility will produce more chaos in the encoding of chillus.

Swathanthra Malayalam Computing

http://www.smc.org.in