Roman2Unicode v1.0
Roman Script to Assamese Unicode (UTF-8) Script Converter
The Roman to Unicode converter was written primarily to make it convenient to
create Assamese Unicode text with correctly spelled words. The authors would
be grateful if users of this program sent any text they write as input for the
converter to utpal@tezu.ernet.in or navanathsaharia@gmail.com.
The converter was developed by Navanath Saharia and Iftikar Hussain under the
supervision of Dr. Utpal Sharma as part of research work on Assamese Language
Processing at the Natural Language Processing Lab, Tezpur University
(www.tezu.ernet.in\~nlp).
Input file format:
HIren bh\TtAcA^rjlE asm uptJkA b*TA ` lekhkr sh_re\shTh pur\skAr pAThkHe '
p_rtidin s#bAd guwAHATI , 31 Dice\mwr :2000 cnr ` asm uptJkA sAHitJ b*TA'r bAbe bishi\ST kbi HIren bh\TtAcA^rjk ni^rbAcit krA HEche |
asmr kAbJp_remI rAizr bAbe Azi ei shubh s#bAdTo sdrI kre ` uiliyAmchn megr shEXik nJAse ' |
aHA mA^rc mAHt guwAHATIt anu\shThit H'blgIyA ek bisheS anu\shThAnt bishi\ST
kbigrAkIk p_rsi\ddh guzrATI OpnJAsik - sAHitJik rghubIr cOdhArIye s\nmAnIy
` asm uptJkA sAHitJ b*TA' Anu\shThAnikbhAwe p_rdAn krib | ei b*TA HicApe
bh\TtAcA^rjlE ngd 1 lAkh TkA , ekhn p_rsh\stipt_r Aru p_rbIN shi\lpI shobhA b_r\Hmai
p_r\stut krA A^rHir eTA sudRhshJ soNAlI T_rphI p_rdAn krA H'b |
Output file format (UTF-8):
হীৰেন ভট্টাচাৰ্যলৈ অসম উপত্যকা বঁটা ` লেখকৰ শ্ৰেষ্ঠ পুৰস্কাৰ পাঠকহে'
প্ৰতিদিন সংবাদ গুৱাহাটী , ৩১ ডিচেম্বৰ :
২০০০ চনৰ ` অসম উপত্যকা সাহিত্য বঁটা'ৰ বাবে বিশিষ্ট কবি হীৰেন ভট্টাচাৰ্যক নিৰ্বাচিত কৰা হৈছে |
অসমৰ কাব্যপ্ৰেমী ৰাইজৰ বাবে আজি এই শুভ সংবাদটো সদৰী কৰে ` উইলিয়ামছন মেগৰ শৈক্ষিক ন্যাসে' |
অহা মাৰ্চ মাহত গুৱাহাটীত অনুষ্ঠিত হ'বলগীয়া এক বিশেষ অনুষ্ঠানত বিশিষ্ট কবিগৰাকীক প্ৰসিদ্ধ গুজৰাটীঔপন্যাসিক - সাহিত্যিক ৰঘুবীৰ চৌধাৰীয়ে সন্মানীয় ` অসম উপত্যকা সাহিত্য বঁটা' আনুষ্ঠানিকভাৱে প্ৰদান কৰিব |
এই বঁটা হিচাপে ভট্টাচাৰ্যলৈ নগদ ১ লাখ টকা , এখন প্ৰশস্তিপত্ৰ আৰু প্ৰবীণ শিল্পী শোভা ব্ৰহ্মই প্ৰস্তুত কৰা আৰ্হিৰএটা সুদৃশ্য সোণালী ট্ৰফী প্ৰদান কৰা হ'ব |
Key mappings:
----------------
A primary objective of the Roman2Unicode program is to make typing Unicode
Assamese text convenient. Roman letters were therefore chosen for the Assamese
letters in such a way that the pronunciation of an Assamese word can be easily
guessed from its Roman spelling. The following mapping is used:
Vowels/Operators :
a A
i I
u U
Rh
e E
o O
Note: `a' as an operator may be included in or omitted from a word.
e.g. kalam and klm are equivalent and both are acceptable;
asam is the same as asm, but not the same as sm or sam.
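
The note above can be illustrated with a small sketch. It is not taken from the converter's source: the class InherentA and the method canonical() are hypothetical names, and the rule it applies is an assumption read off the examples (a word-initial `a' is always kept, and an `a' directly before another vowel key is kept too, as the sample input b_r\Hmai suggests). It is written for a recent JDK.

```java
// Hypothetical helper, not part of Roman2Unicode: illustrates the optional `a'.
final class InherentA {
    // Single-character vowel keys from the mapping table.
    private static final String VOWEL_KEYS = "aAiIuUeEoO";

    // Returns the word with every optional 'a' removed, so that spellings
    // that differ only in those letters compare equal.
    static String canonical(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean beforeVowel = i + 1 < word.length()
                    && VOWEL_KEYS.indexOf(word.charAt(i + 1)) >= 0;
            // Assumption: only an 'a' that is not word-initial and is not
            // followed by another vowel key is treated as optional.
            if (c == 'a' && i > 0 && !beforeVowel) {
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonical("kalam")); // klm
        System.out.println(canonical("asam"));  // asm
        System.out.println(canonical("sam"));   // sm
    }
}
```

With this rule kalam and klm reduce to the same string, asam and asm reduce to the same string, and sam stays distinct from asm.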
Consonants :
k kh
g gh
nG
c ch
z jh
nY
T Th
D Dh
N
t th
d dh
n
p ph
b bh
m
j r
l w
sh S
s H
X R
rh y
_t # &
*
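
Because some keys are two characters long (kh, Th, nG, _t and so on), input has to be split by trying the longer key first. The following is a minimal sketch of such a longest-match split over the vowel and consonant keys listed above. It is not the converter's actual code: RomanKeyTokenizer and tokenize() are hypothetical names, and the operators and the backslash notation described in the following sections are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of Roman2Unicode.
final class RomanKeyTokenizer {
    // Vowel and consonant keys copied from the mapping tables above.
    private static final Set<String> KEYS = new HashSet<>(Arrays.asList(
            "a", "A", "i", "I", "u", "U", "Rh", "e", "E", "o", "O",
            "k", "kh", "g", "gh", "nG", "c", "ch", "z", "jh", "nY",
            "T", "Th", "D", "Dh", "N", "t", "th", "d", "dh", "n",
            "p", "ph", "b", "bh", "m", "j", "r", "l", "w",
            "sh", "S", "s", "H", "X", "R", "rh", "y", "_t", "#", "&", "*"));

    // Splits a word into keys, preferring a two-character key over a
    // one-character key at every position.
    static List<String> tokenize(String word) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            if (i + 2 <= word.length() && KEYS.contains(word.substring(i, i + 2))) {
                tokens.add(word.substring(i, i + 2));
                i += 2;
            } else if (KEYS.contains(word.substring(i, i + 1))) {
                tokens.add(word.substring(i, i + 1));
                i += 1;
            } else {
                throw new IllegalArgumentException(
                        "No key starts at position " + i + " in " + word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("shEXik")); // [sh, E, X, i, k]
        System.out.println(tokenize("lekhkr")); // [l, e, kh, k, r]
    }
}
```

Trying the two-character key first is what keeps sh from being read as s followed by an unknown h.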
Consonant Operators :
^r - ref, eg. ta^rka (or t^rk)
_r - ra-kaar, eg. p_raNAm
J - ja-kaar, eg. bhAgJa
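
In Unicode text, each of these three operators corresponds to a short sequence built around the hasanta sign (U+09CD). The sketch below shows one plausible way to expand them. It is an illustration rather than the converter's source: ConsonantOperators and expand() are hypothetical names, and the sequences are the standard Unicode encodings of ref, ra-kaar and ja-kaar in Assamese.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of Roman2Unicode.
final class ConsonantOperators {
    static final String HASANTA = "\u09CD"; // virama sign
    static final String RA = "\u09F0";      // Assamese letter ra
    static final String YA = "\u09AF";      // letter ya, used for ja-kaar

    private static final Map<String, String> EXPANSIONS = new HashMap<>();
    static {
        EXPANSIONS.put("^r", RA + HASANTA); // ref: emitted before the next consonant
        EXPANSIONS.put("_r", HASANTA + RA); // ra-kaar: emitted after its base consonant
        EXPANSIONS.put("J", HASANTA + YA);  // ja-kaar: emitted after its base consonant
    }

    // Returns the Unicode sequence for one operator key.
    static String expand(String operator) {
        String sequence = EXPANSIONS.get(operator);
        if (sequence == null) {
            throw new IllegalArgumentException("Unknown operator: " + operator);
        }
        return sequence;
    }

    public static void main(String[] args) {
        String ta = "\u09A4", ka = "\u0995", pa = "\u09AA", ga = "\u0997";
        System.out.println(ta + expand("^r") + ka); // the letters of t^rk
        System.out.println(pa + expand("_r"));      // the cluster written p_r
        System.out.println(ga + expand("J"));       // the cluster written gJ
    }
}
```

Note that ref is the only one of the three whose sequence comes before the consonant it attaches to, which matches the order in which it is typed (ta^rka).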
Juktakshars :
Juktakshars, or composite letters, are written as a backslash (i.e., \)
followed by the component letters, e.g. p_ra\stut, u\ttar, etc.
Note: If the second component letter of a juktakshar is ba, it should be
written as w to stay close to the actual pronunciation, e.g.
bi\SwabidJAlay, \swAdhIn
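
The juktakshar rule can be sketched as joining the component letters with the hasanta sign (U+09CD). The code below is only an illustration under stated assumptions: JuktaksharSketch and conjunct() are hypothetical names, only two-component juktakshars are handled, and the small key table covers just a few consonants whose letters can be read off the sample input and output above. The converter itself may well use a lookup table of complete juktakshars instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of Roman2Unicode.
final class JuktaksharSketch {
    private static final String HASANTA = "\u09CD"; // virama sign
    private static final String BA = "\u09AC";      // letter ba

    // A small subset of the consonant keys, for illustration only.
    private static final Map<String, String> CONSONANTS = new HashMap<>();
    static {
        CONSONANTS.put("k", "\u0995");
        CONSONANTS.put("t", "\u09A4");
        CONSONANTS.put("d", "\u09A6");
        CONSONANTS.put("dh", "\u09A7");
        CONSONANTS.put("n", "\u09A8");
        CONSONANTS.put("p", "\u09AA");
        CONSONANTS.put("m", "\u09AE");
        CONSONANTS.put("l", "\u09B2");
        CONSONANTS.put("s", "\u09B8");
    }

    // Builds a two-component juktakshar from its Roman keys.
    // A second component written as w stands for ba, as the note above says.
    static String conjunct(String first, String second) {
        String head = lookup(first);
        String tail = second.equals("w") ? BA : lookup(second);
        return head + HASANTA + tail;
    }

    private static String lookup(String key) {
        String letter = CONSONANTS.get(key);
        if (letter == null) {
            throw new IllegalArgumentException("Key not in this sketch: " + key);
        }
        return letter;
    }

    public static void main(String[] args) {
        System.out.println(conjunct("s", "t")); // the juktakshar typed as \st
        System.out.println(conjunct("t", "t")); // the juktakshar typed as \tt
        System.out.println(conjunct("s", "w")); // the juktakshar typed as \sw
    }
}
```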
Requirement:
-----------------
Java 1.5 or higher must be installed.
How to Execute:
--------------------
Download r2u.zip and unzip it in any directory, using the unzip command on
Linux, or WinRAR or WinZip on Windows.
unzip r2u.zip
Change to the unzipped directory, then type the following command to run the
converter.
java Roman2Unicode input_file_name output_file_name
This creates the file output_file_name in the working directory, encoded in
UTF-8. Open output_file_name with any Unicode-capable text editor, such as
Notepad++.
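
If the output looks wrong in an editor, it can help to check the file independently of fonts. The following sketch decodes a file as UTF-8 and prints its first few code points. Utf8OutputCheck and decode() are hypothetical names and not part of r2u.zip, and the code needs Java 8 or later, which is newer than the converter itself requires.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Roman2Unicode.
final class Utf8OutputCheck {
    // Decodes raw bytes as UTF-8 text.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the converter's output file, e.g. output_file_name.
        String text = decode(Files.readAllBytes(Paths.get(args[0])));
        // Print the first 20 code points in U+XXXX form, so the check does
        // not depend on the console being able to display Assamese.
        text.codePoints().limit(20)
                .forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Assamese letters should appear as code points in the range U+0980 to U+09FF.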
For queries or suggestions:
-----------------------------------
utpal@tezu.ernet.in
navanathsaharia@gmail.com