Tamil Script Code for Information Interchange: Difference between revisions
{{Contains special characters|Indic}}
'''Tamil Script Code for Information Interchange''' ('''TSCII''') is a coding scheme for representing the [[Tamil script]]. The lower 128 codepoints are plain [[American Standard Code for Information Interchange|ASCII]]; the upper 128 codepoints are TSCII-specific. After years of use on the Internet by private agreement only, it was registered with the [[Internet Assigned Numbers Authority|IANA]] in 2007.<ref>{{Cite web | url=https://www.iana.org/assignments/charset-reg/TSCII | format=TXT | title=Character set name: TSCII (TAMIL SCRIPT CODE FOR INFORMATION INTERCHANGE) | website=www.iana.org | publisher=[[IANA]]}}</ref>

'''TSCII''' encodes the characters in visual (written) order, paralleling the use of the Tamil typewriter. [[Unicode]], by contrast, encodes Tamil in logical order, following [[ISCII]]; for [[Thai alphabet|Thai]] it instead adopted the visual order encoding grandfathered by [[Thai Industrial Standard 620-2533|TIS-620]].
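
The difference between visual and logical order can be illustrated with a minimal Python sketch. It assumes a small, hand-picked subset of the code chart in this article (the names <code>SUBSET</code>, <code>PREFIX_SIGNS</code> and <code>tscii_to_logical</code> are hypothetical) and only moves the vowel signs that are written to the left of a consonant; it is not a complete converter and does not compose two-part vowel signs.

<syntaxhighlight lang="python">
# Illustrative subset of the TSCII chart; a real converter needs the full table.
SUBSET = {
    0xA1: "\u0BBE",  # TAMIL VOWEL SIGN AA (written after the consonant)
    0xA6: "\u0BC6",  # TAMIL VOWEL SIGN E  (written before the consonant)
    0xA7: "\u0BC7",  # TAMIL VOWEL SIGN EE (written before the consonant)
    0xA8: "\u0BC8",  # TAMIL VOWEL SIGN AI (written before the consonant)
    0xB8: "\u0B95",  # TAMIL LETTER KA
    0xC1: "\u0BAE",  # TAMIL LETTER MA
}

# Vowel signs that precede the consonant in visual (TSCII) order.
PREFIX_SIGNS = {0xA6, 0xA7, 0xA8}


def tscii_to_logical(data: bytes) -> str:
    """Convert bytes from the subset above to Unicode in logical order."""
    out = []
    pending = ""  # prefix vowel sign waiting for the character that follows it
    for b in data:
        # Bytes below 0x80 are plain ASCII; others must be in SUBSET (KeyError otherwise).
        ch = chr(b) if b < 0x80 else SUBSET[b]
        if b in PREFIX_SIGNS:
            out.append(pending)  # flush an earlier sign that never got a consonant
            pending = ch
        else:
            # Logical order: the following character first, then the held vowel sign.
            out.append(ch + pending)
            pending = ""
    return "".join(out) + pending


if __name__ == "__main__":
    # Visual order A6 B8 (sign E, then KA) becomes KA followed by sign E.
    print([f"U+{ord(c):04X}" for c in tscii_to_logical(bytes([0xA6, 0xB8]))])
</syntaxhighlight>

In this sketch a sign with no following consonant is emitted unchanged, which is a simplification chosen for brevity rather than documented TSCII behaviour.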
==History==

The need for a common encoding for Tamil was felt by members of various mailing-list-based forums in the mid-1990s, as multiple custom-coded fonts were prevalent in those forums. While some of the commercial encodings were more popular than others, they were not accepted by the wider community because of conflicting commercial interests. Although Unicode was accepted by most as the future standard, most desktop systems at that time could not yet handle Unicode for Tamil, and an interim 8-bit encoding was required.

A separate mailing list for discussing such encodings (webmasters@tamil.net) was created in 1997, starting with an email written by [[Kuppuswamy Kalyanasundaram|Dr. K. Kalyanasundaram]] to the popular Tamil author [[Sujatha (writer)|Sujatha]], who headed the committee for standardization of the Tamil keyboard.<ref>{{Cite web|url=http://www.infitt.org/tscii/archives/msg00001.html|title = A proposal for font encoding scheme for tamil}}</ref> This forum quickly attracted enthusiastic participants from across the globe, including several prominent Tamil scholars.{{NPOV inline|date=August 2024}} Archives of these discussions are maintained by [[INFITT]].<ref>{{Cite web|url=http://www.infitt.org/tscii/archives/maillist.html|title = Tamil Discussion at webmasters@tamil.net}}</ref>

After TSCII was published, most members of the webmasters@tamil.net mailing list became part of INFITT, a wider initiative to bring standardization and continued development to various areas of Tamil computing.

==Codepage layout==
{|{{chset-table-header1|TSCII}}
|-
|{{chset-left1|8x}}
|{{chset-cell1|U+0BE6 TAMIL DIGIT ZERO|௦|fn={{efn|U+0BE6 TAMIL DIGIT ZERO, which was added with Unicode version 4.1 in March 2005}}}}
|{{chset-cell1|U+0BE7 TAMIL DIGIT ONE|௧}}
|{{chset-cell1|U+0BB8 TAMIL LETTER SA, U+0BCD TAMIL SIGN VIRAMA, U+0BB0 TAMIL LETTER RA, U+0BC0 TAMIL VOWEL SIGN II|ஸ்ரீ}}
|{{chset-cell1|U+0B9C TAMIL LETTER JA|ஜ}}
|{{chset-cell1|U+0BB7 TAMIL LETTER SSA|ஷ}}
|{{chset-cell1|U+0BB8 TAMIL LETTER SA|ஸ}}
|{{chset-cell1|U+0BB9 TAMIL LETTER HA|ஹ}}
|{{chset-cell1|U+0B95 TAMIL LETTER KA, U+0BCD TAMIL SIGN VIRAMA, U+0BB7 TAMIL LETTER SSA|க்ஷ}}
|{{chset-cell1|U+0B9C TAMIL LETTER JA, U+0BCD TAMIL SIGN VIRAMA|ஜ்}}
|{{chset-cell1|U+0BB7 TAMIL LETTER SSA, U+0BCD TAMIL SIGN VIRAMA|ஷ்}}
|{{chset-cell1|U+0BB8 TAMIL LETTER SA, U+0BCD TAMIL SIGN VIRAMA|ஸ்}}
|{{chset-cell1|U+0BB9 TAMIL LETTER HA, U+0BCD TAMIL SIGN VIRAMA|ஹ்}}
|{{chset-cell1|U+0B95 TAMIL LETTER KA, U+0BCD TAMIL SIGN VIRAMA, U+0BB7 TAMIL LETTER SSA, U+0BCD TAMIL SIGN VIRAMA|க்ஷ்}}
|{{chset-cell1|U+0BE8 TAMIL DIGIT TWO|௨}}
|{{chset-cell1|U+0BE9 TAMIL DIGIT THREE|௩}}
|{{chset-cell1|U+0BEA TAMIL DIGIT FOUR|௪}}
|-
|{{chset-left1|9x}}
|{{chset-cell1|U+0BEB TAMIL DIGIT FIVE|௫}}
|{{chset-cell1|U+2018 LEFT SINGLE QUOTATION MARK|[[Quotation mark|‘]]}}
|{{chset-cell1|U+2019 RIGHT SINGLE QUOTATION MARK|[[Apostrophe|’]]}}
|{{chset-cell1|U+201C LEFT DOUBLE QUOTATION MARK|[[Quotation mark|“]]}}
|{{chset-cell1|U+201D RIGHT DOUBLE QUOTATION MARK|[[Quotation mark|”]]}}
|{{chset-cell1|U+0BEC TAMIL DIGIT SIX|௬}}
|{{chset-cell1|U+0BED TAMIL DIGIT SEVEN|௭}}
|{{chset-cell1|U+0BEE TAMIL DIGIT EIGHT|௮}}
|{{chset-cell1|U+0BEF TAMIL DIGIT NINE|௯}}
|{{chset-cell1|U+0B99 TAMIL LETTER NGA, U+0BC1 TAMIL VOWEL SIGN U|ஙு}}
|{{chset-cell1|U+0B9E TAMIL LETTER NYA, U+0BC1 TAMIL VOWEL SIGN U|ஞு}}
|{{chset-cell1|U+0B99 TAMIL LETTER NGA, U+0BC2 TAMIL VOWEL SIGN UU|ஙூ}}
|{{chset-cell1|U+0B9E TAMIL LETTER NYA, U+0BC2 TAMIL VOWEL SIGN UU|ஞூ}}
|{{chset-cell1|U+0BF0 TAMIL NUMBER TEN|௰}}
|{{chset-cell1|U+0BF1 TAMIL NUMBER ONE HUNDRED|௱}}
|{{chset-cell1|U+0BF2 TAMIL NUMBER ONE THOUSAND|௲}}
|-
|{{chset-left1|Ax}}
|{{chset-ctrl1|U+00A0 NO-BREAK SPACE|[[Non-breaking space|NBSP]]}}
|{{chset-cell1|U+0BBE TAMIL VOWEL SIGN AA|ா}}
|{{chset-cell1|U+0BBF TAMIL VOWEL SIGN I|ி}}
|{{chset-cell1|U+0BC0 TAMIL VOWEL SIGN II|ீ}}
|{{chset-cell1|U+0BC1 TAMIL VOWEL SIGN U|ு}}
|{{chset-cell1|U+0BC2 TAMIL VOWEL SIGN UU|ூ}}
|{{chset-cell1|U+0BC6 TAMIL VOWEL SIGN E|ெ}}
|{{chset-cell1|U+0BC7 TAMIL VOWEL SIGN EE|ே}}
|{{chset-cell1|U+0BC8 TAMIL VOWEL SIGN AI|ை}}
|{{chset-cell1|U+00A9 COPYRIGHT SIGN|[[Copyright symbol|©]]}}
|{{chset-cell1|U+0BD7 TAMIL AU LENGTH MARK|ௗ}}
|{{chset-cell1|U+0B85 TAMIL LETTER A|அ}}
|{{chset-cell1|U+0B86 TAMIL LETTER AA|ஆ}}
|{{chset-cell1|||style=background:#DDD}}
|{{chset-cell1|U+0B88 TAMIL LETTER II|ஈ}}
|{{chset-cell1|U+0B89 TAMIL LETTER U|உ}}
|-
|{{chset-left1|Bx}}
|{{chset-cell1|U+0B8A TAMIL LETTER UU|ஊ}}
|{{chset-cell1|U+0B8E TAMIL LETTER E|எ}}
|{{chset-cell1|U+0B8F TAMIL LETTER EE|ஏ}}
|{{chset-cell1|U+0B90 TAMIL LETTER AI|ஐ}}
|{{chset-cell1|U+0B92 TAMIL LETTER O|ஒ}}
|{{chset-cell1|U+0B93 TAMIL LETTER OO|ஓ}}
|{{chset-cell1|U+0B94 TAMIL LETTER AU|ஔ}}
|{{chset-cell1|U+0B83 TAMIL SIGN VISARGA|ஃ}}
|{{chset-cell1|U+0B95 TAMIL LETTER KA|க}}
|{{chset-cell1|U+0B99 TAMIL LETTER NGA|ங}}
|{{chset-cell1|U+0B9A TAMIL LETTER CA|ச}}
|{{chset-cell1|U+0B9E TAMIL LETTER NYA|ஞ}}
|{{chset-cell1|U+0B9F TAMIL LETTER TTA|ட}}
|{{chset-cell1|U+0BA3 TAMIL LETTER NNA|ண}}
|{{chset-cell1|U+0BA4 TAMIL LETTER TA|த}}
|{{chset-cell1|U+0BA8 TAMIL LETTER NA|ந}}
|-
|{{chset-left1|Cx}}
|{{chset-cell1|U+0BAA TAMIL LETTER PA|ப}}
|{{chset-cell1|U+0BAE TAMIL LETTER MA|ம}}
|{{chset-cell1|U+0BAF TAMIL LETTER YA|ய}}
|{{chset-cell1|U+0BB0 TAMIL LETTER RA|ர}}
|{{chset-cell1|U+0BB2 TAMIL LETTER LA|ல}}
|{{chset-cell1|U+0BB5 TAMIL LETTER VA|வ}}
|{{chset-cell1|U+0BB4 TAMIL LETTER LLLA|ழ}}
|{{chset-cell1|U+0BB3 TAMIL LETTER LLA|ள}}
|{{chset-cell1|U+0BB1 TAMIL LETTER RRA|ற}}
|{{chset-cell1|U+0BA9 TAMIL LETTER NNNA|ன}}
|{{chset-cell1|U+0B9F TAMIL LETTER TTA, U+0BBF TAMIL VOWEL SIGN I|டி}}
|{{chset-cell1|U+0B9F TAMIL LETTER TTA, U+0BC0 TAMIL VOWEL SIGN II|டீ}}
|{{chset-cell1|U+0B95 TAMIL LETTER KA, U+0BC1 TAMIL VOWEL SIGN U|கு}}
|{{chset-cell1|U+0B9A TAMIL LETTER CA, U+0BC1 TAMIL VOWEL SIGN U|சு}}
|{{chset-cell1|U+0B9F TAMIL LETTER TTA, U+0BC1 TAMIL VOWEL SIGN U|டு}}
|{{chset-cell1|U+0BA3 TAMIL LETTER NNA, U+0BC1 TAMIL VOWEL SIGN U|ணு}}
|-
|{{chset-left1|Dx}}
|{{chset-cell1|U+0BA4 TAMIL LETTER TA, U+0BC1 TAMIL VOWEL SIGN U|து}}
|{{chset-cell1|U+0BA8 TAMIL LETTER NA, U+0BC1 TAMIL VOWEL SIGN U|நு}}
|{{chset-cell1|U+0BAA TAMIL LETTER PA, U+0BC1 TAMIL VOWEL SIGN U|பு}}
|{{chset-cell1|U+0BAE TAMIL LETTER MA, U+0BC1 TAMIL VOWEL SIGN U|மு}}
|{{chset-cell1|U+0BAF TAMIL LETTER YA, U+0BC1 TAMIL VOWEL SIGN U|யு}}
|{{chset-cell1|U+0BB0 TAMIL LETTER RA, U+0BC1 TAMIL VOWEL SIGN U|ரு}}
|{{chset-cell1|U+0BB2 TAMIL LETTER LA, U+0BC1 TAMIL VOWEL SIGN U|லு}}
|{{chset-cell1|U+0BB5 TAMIL LETTER VA, U+0BC1 TAMIL VOWEL SIGN U|வு}}
|{{chset-cell1|U+0BB4 TAMIL LETTER LLLA, U+0BC1 TAMIL VOWEL SIGN U|ழு}}
|{{chset-cell1|U+0BB3 TAMIL LETTER LLA, U+0BC1 TAMIL VOWEL SIGN U|ளு}}
|{{chset-cell1|U+0BB1 TAMIL LETTER RRA, U+0BC1 TAMIL VOWEL SIGN U|று}}
|{{chset-cell1|U+0BA9 TAMIL LETTER NNNA, U+0BC1 TAMIL VOWEL SIGN U|னு}}
|{{chset-cell1|U+0B95 TAMIL LETTER KA, U+0BC2 TAMIL VOWEL SIGN UU|கூ}}
|{{chset-cell1|U+0B9A TAMIL LETTER CA, U+0BC2 TAMIL VOWEL SIGN UU|சூ}}
|{{chset-cell1|U+0B9F TAMIL LETTER TTA, U+0BC2 TAMIL VOWEL SIGN UU|டூ}}
|{{chset-cell1|U+0BA3 TAMIL LETTER NNA, U+0BC2 TAMIL VOWEL SIGN UU|ணூ}}
|-
|{{chset-left1|Ex}}
|{{chset-cell1|U+0BA4 TAMIL LETTER TA, U+0BC2 TAMIL VOWEL SIGN UU|தூ}}
|{{chset-cell1|U+0BA8 TAMIL LETTER NA, U+0BC2 TAMIL VOWEL SIGN UU|நூ}}
|{{chset-cell1|U+0BAA TAMIL LETTER PA, U+0BC2 TAMIL VOWEL SIGN UU|பூ}}
|{{chset-cell1|U+0BAE TAMIL LETTER MA, U+0BC2 TAMIL VOWEL SIGN UU|மூ}}
|{{chset-cell1|U+0BAF TAMIL LETTER YA, U+0BC2 TAMIL VOWEL SIGN UU|யூ}}
|{{chset-cell1|U+0BB0 TAMIL LETTER RA, U+0BC2 TAMIL VOWEL SIGN UU|ரூ}}
|{{chset-cell1|U+0BB2 TAMIL LETTER LA, U+0BC2 TAMIL VOWEL SIGN UU|லூ}}
|{{chset-cell1|U+0BB5 TAMIL LETTER VA, U+0BC2 TAMIL VOWEL SIGN UU|வூ}}
|{{chset-cell1|U+0BB4 TAMIL LETTER LLLA, U+0BC2 TAMIL VOWEL SIGN UU|ழூ}}
|{{chset-cell1|U+0BB3 TAMIL LETTER LLA, U+0BC2 TAMIL VOWEL SIGN UU|ளூ}}
|{{chset-cell1|U+0BB1 TAMIL LETTER RRA, U+0BC2 TAMIL VOWEL SIGN UU|றூ}}
|{{chset-cell1|U+0BA9 TAMIL LETTER NNNA, U+0BC2 TAMIL VOWEL SIGN UU|னூ}}
|{{chset-cell1|U+0B95 TAMIL LETTER KA, U+0BCD TAMIL SIGN VIRAMA|க்}}
|{{chset-cell1|U+0B99 TAMIL LETTER NGA, U+0BCD TAMIL SIGN VIRAMA|ங்}}
|{{chset-cell1|U+0B9A TAMIL LETTER CA, U+0BCD TAMIL SIGN VIRAMA|ச்}}
|{{chset-cell1|U+0B9E TAMIL LETTER NYA, U+0BCD TAMIL SIGN VIRAMA|ஞ்}}
|-
|{{chset-left1|Fx}}
|{{chset-cell1|U+0B9F TAMIL LETTER TTA, U+0BCD TAMIL SIGN VIRAMA|ட்}}
|{{chset-cell1|U+0BA3 TAMIL LETTER NNA, U+0BCD TAMIL SIGN VIRAMA|ண்}}
|{{chset-cell1|U+0BA4 TAMIL LETTER TA, U+0BCD TAMIL SIGN VIRAMA|த்}}
|{{chset-cell1|U+0BA8 TAMIL LETTER NA, U+0BCD TAMIL SIGN VIRAMA|ந்}}
|{{chset-cell1|U+0BAA TAMIL LETTER PA, U+0BCD TAMIL SIGN VIRAMA|ப்}}
|{{chset-cell1|U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA|ம்}}
|{{chset-cell1|U+0BAF TAMIL LETTER YA, U+0BCD TAMIL SIGN VIRAMA|ய்}}
|{{chset-cell1|U+0BB0 TAMIL LETTER RA, U+0BCD TAMIL SIGN VIRAMA|ர்}}
|{{chset-cell1|U+0BB2 TAMIL LETTER LA, U+0BCD TAMIL SIGN VIRAMA|ல்}}
|{{chset-cell1|U+0BB5 TAMIL LETTER VA, U+0BCD TAMIL SIGN VIRAMA|வ்}}
|{{chset-cell1|U+0BB4 TAMIL LETTER LLLA, U+0BCD TAMIL SIGN VIRAMA|ழ்}}
|{{chset-cell1|U+0BB3 TAMIL LETTER LLA, U+0BCD TAMIL SIGN VIRAMA|ள்}}
|{{chset-cell1|U+0BB1 TAMIL LETTER RRA, U+0BCD TAMIL SIGN VIRAMA|ற்}}
|{{chset-cell1|U+0BA9 TAMIL LETTER NNNA, U+0BCD TAMIL SIGN VIRAMA|ன்}}
|{{chset-cell1|U+0B87 TAMIL LETTER I|இ}}
|{{chset-cell1|||style=background:#DDD}}
|}
{{notelist}}

In the table above, code 80 is U+0BE6 TAMIL DIGIT ZERO, which was accepted in Unicode version 4.1, and A0 is the NO-BREAK SPACE. The codes AD and FF are unassigned.

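How a single TSCII byte relates to Unicode can be sketched in Python. The snippet below assumes only a handful of entries copied from the chart above (the names <code>TSCII_HIGH</code>, <code>UNASSIGNED</code> and <code>describe_byte</code> are hypothetical) and shows the three cases a decoder has to distinguish: ASCII bytes, unassigned codes, and codes that expand to one or more Unicode code points.

<syntaxhighlight lang="python">
# Partial, illustrative mapping copied from the chart; not the full table.
TSCII_HIGH = {
    0x80: "\u0BE6",                    # TAMIL DIGIT ZERO
    0x82: "\u0BB8\u0BCD\u0BB0\u0BC0",  # one byte, four code points
    0x87: "\u0B95\u0BCD\u0BB7",        # KA + VIRAMA + SSA
    0xA0: "\u00A0",                    # NO-BREAK SPACE
    0xAB: "\u0B85",                    # TAMIL LETTER A
    0xB8: "\u0B95",                    # TAMIL LETTER KA
    0xCC: "\u0B95\u0BC1",              # KA + VOWEL SIGN U
    0xEC: "\u0B95\u0BCD",              # KA + VIRAMA
}

# Codes the chart leaves empty.
UNASSIGNED = {0xAD, 0xFF}


def describe_byte(b: int) -> list[str]:
    """Return the Unicode code points for one TSCII byte as 'U+XXXX' strings."""
    if not 0 <= b <= 0xFF:
        raise ValueError("TSCII is an 8-bit encoding")
    if b < 0x80:
        return [f"U+{b:04X}"]  # lower half is plain ASCII
    if b in UNASSIGNED:
        return []              # no mapping exists for AD and FF
    if b not in TSCII_HIGH:
        raise KeyError(f"0x{b:02X} is not part of this illustrative subset")
    return [f"U+{ord(ch):04X}" for ch in TSCII_HIGH[b]]


if __name__ == "__main__":
    for code in (0x41, 0x82, 0xCC, 0xAD):
        print(f"0x{code:02X} ->", describe_byte(code))
</syntaxhighlight>

Returning an empty list for unassigned codes is a choice made for this sketch; a real converter might instead raise an error or emit a replacement character.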
== Conversion Tools ==

Text encoded in UTF-8 can be converted to TSCII using GNU iconv as follows:

<syntaxhighlight lang="BASH">
$ iconv -f utf-8 -t tscii hello.utf8 > hello.tscii
</syntaxhighlight>

Conversion from TSCII to UTF-8 is done by swapping the encoding names given to the '''-f''' and '''-t''' flags.

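Both directions of the iconv conversion can be sketched from Python. This is a minimal example under stated assumptions: it presumes an <code>iconv</code> binary on the PATH whose build includes the TSCII converter, the file names are placeholders, and the helper names <code>iconv_argv</code> and <code>convert_file</code> are hypothetical.

<syntaxhighlight lang="python">
import shlex
import subprocess


def iconv_argv(from_enc: str, to_enc: str, src: str) -> list[str]:
    """Build the iconv argument list for converting src between two encodings."""
    return ["iconv", "-f", from_enc, "-t", to_enc, src]


def convert_file(from_enc: str, to_enc: str, src: str, dst: str) -> None:
    """Run iconv and write its output to dst (needs an iconv with TSCII support)."""
    with open(dst, "wb") as out:
        subprocess.run(iconv_argv(from_enc, to_enc, src), stdout=out, check=True)


if __name__ == "__main__":
    # Print the two equivalent shell commands without running them.
    print(shlex.join(iconv_argv("utf-8", "tscii", "hello.utf8")), "> hello.tscii")
    print(shlex.join(iconv_argv("tscii", "utf-8", "hello.tscii")), "> hello.utf8")
</syntaxhighlight>

Calling <code>convert_file("tscii", "utf-8", "hello.tscii", "hello.utf8")</code> would perform the reverse conversion; with <code>check=True</code>, a failure such as an unsupported encoding name surfaces as an exception.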
=== Visual Application ===

An open-source project, [https://github.com/ThaniThamizhAkarathiKalanjiyam/AnyTaFont2UTF8 AnyTaFont2UTF8], is maintained by the [[groups.yahoo.com/groups/isaiyini|Isaiyini Tamil Community]].

== See also ==

* [[TACE16]] (Tamil All Character Encoding)

== References ==

{{Reflist}}

== External links ==

* [http://www.tscii.org/ TSCII Start Page]
* [https://www.unicode.org/notes/tn15/ Unicode Technical Note #15 Text conversion From TSCII 1.7 to Unicode]
* [http://www.infitt.org/ INFITT (International Forum for Information Technology in Tamil)]
* [https://web.archive.org/web/20030401012750/http://tamilone.com/ TSCII to Unicode Online & Webpage Conversion]
* [http://padma.mozdev.org Padma – Mozilla extension for transforming TSCII to Unicode]

{{character encoding}}
{{CharacterEncoding-stub}}

Latest revision as of 20:48, 10 November 2024
The government of Tamil Nadu endorses its own TAB/TAM standards for 8-bit encoding, and other, older encoding schemes can still be found on the web.
The free etext collection at Project Madurai uses the TSCII encoding, but has already started to provide Unicode versions.