
January 9, 2018 | Author: Anonymous | Category: Arts & Humanities, Writing, Spelling



FIRE 2013 Presentation on: Transliterated Search using Syllabification Approach

By: Hardik Joshi(1), Apurva Bhatt(1), Honey Patel(2)
{hardikjjoshi,apurva.bhatt7,Honeypatel.39}@gmail.com
(1) Department of Computer Science, Gujarat University, Ahmedabad, India
(2) L.J. College of Engineering, Ahmedabad, India

@FIRE, 4th Dec 2013

Content
- Introduction
- Our Approach
- Syllabification
- Our Results
- Error And Analysis
- Conclusion

Introduction
- There is a need to provide local-language support in web-based applications, because many domains, such as e-commerce sites, require knowledge of English.
- The challenge in transliteration: for the word "राष्ट्रपति", romanizations such as "rashtrapati", "rashtrapathi", "raashtrapathy" and "raashtrpati" are all possible, and deciding which one is correct is itself an issue.
- Transliteration becomes harder in the presence of out-of-vocabulary (OOV) words and noisy words.
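The spelling-variation problem can be made concrete with a minimal sketch that folds common romanization variants onto one key. The folding rules (aa to a, ee to i, oo to u, th to t, word-final y to i) are hypothetical choices for illustration, not rules from the slides, and they deliberately ignore aspiration and schwa deletion.

```python
from typing import Dict, List

# Hypothetical folding rules, applied in order; illustrative only,
# not the rules used by the authors.
FOLD_RULES = [("aa", "a"), ("ee", "i"), ("oo", "u"), ("th", "t")]


def fold_variant(word: str) -> str:
    """Map a romanized Hindi word to a canonical spelling key."""
    key = word.lower()
    for old, new in FOLD_RULES:
        key = key.replace(old, new)
    # Treat a word-final 'y' as the vowel 'i' (e.g. "-pathy" vs "-pati").
    if key.endswith("y"):
        key = key[:-1] + "i"
    return key


def group_variants(words: List[str]) -> Dict[str, List[str]]:
    """Group spellings that fold to the same key, preserving input order."""
    groups: Dict[str, List[str]] = {}
    for w in words:
        groups.setdefault(fold_variant(w), []).append(w)
    return groups


print(group_variants(["rashtrapati", "rashtrapathi", "raashtrapathy", "raashtrpati"]))
```

The first three spellings collapse to "rashtrapati", while "raashtrpati" (which drops a vowel) stays in its own group: simple character rules cannot recover missing vowels, which is one reason noisy input is hard.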

Our Approach
- In both subtasks, transliteration was performed using a syllabification approach.
- In subtask 1, we performed morphological analysis of English words, and then used a corpus-based approach to identify frequently occurring Hindi words.
- In subtask 2, queries were formulated in two forms for separate run submissions: one containing both Roman and Devanagari script, and one containing Roman script only.
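The corpus-based step can be pictured as choosing, among candidate Devanagari spellings, the one seen most often in a Hindi corpus. This is a minimal sketch under that assumption; the candidate list and the toy corpus are invented for illustration, and the slides do not give the authors' actual scoring.

```python
from collections import Counter
from typing import Iterable, List


def build_counts(corpus_tokens: Iterable[str]) -> Counter:
    """Count how often each Hindi word form occurs in the corpus."""
    return Counter(corpus_tokens)


def most_probable(candidates: List[str], counts: Counter) -> str:
    """Return the candidate with the highest corpus frequency.

    Unseen candidates count as 0; on a tie, max() keeps the earliest candidate.
    """
    return max(candidates, key=lambda c: counts[c])


# Hypothetical toy corpus: "कब" occurs more often than the misspelling "काब".
counts = build_counts("वह कब आएगा तुम कब जाओगे काब".split())
print(most_probable(["काब", "कब"], counts))
```

Frequency alone cannot rescue a word whose correct form never appears among the candidates, so this step depends on the candidate generator upstream.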

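Syllabification Approach

Linguists observe that each language constrains the possible consonant and vowel sequences, and these constraints characterize not only the word structure of the language but also its syllable structure.
- The vowel at the center of a syllable is the nucleus.
- The consonant(s) at the beginning form the onset.
- The consonant(s) at the end form the coda.

[Figure: syllable structure tree; a syllable splits into onset and rhyme, and the rhyme splits into nucleus and coda]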

Syllable Structure Example
Word: sprint
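A minimal sketch of the onset/nucleus/coda split for a one-syllable word such as "sprint". It assumes the written vowels a, e, i, o, u stand in for vowel sounds, which is only an approximation (the error analysis later notes that "y" can also act as a vowel).

```python
from typing import Tuple

VOWELS = set("aeiou")  # assumption: orthographic vowels approximate vowel sounds


def split_syllable(syllable: str) -> Tuple[str, str, str]:
    """Split a single-syllable word into (onset, nucleus, coda)."""
    s = syllable.lower()
    first = next((i for i, ch in enumerate(s) if ch in VOWELS), None)
    if first is None:
        raise ValueError(f"no vowel nucleus in {syllable!r}")
    # Extend the nucleus over any run of adjacent vowels.
    last = first
    while last + 1 < len(s) and s[last + 1] in VOWELS:
        last += 1
    return s[:first], s[first:last + 1], s[last + 1:]


onset, nucleus, coda = split_syllable("sprint")
print(onset, nucleus, coda)  # spr i nt
print("rhyme:", nucleus + coda)  # rhyme: int
```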

Training Format
Source => Target
sudakar => स ु द ा क र
chhagan => छ ग ण
jitesh => ज ि त े श
narayan => न ा र ा य ण
shiv => श ि व
madhav => म ा ध व
mohammad => म ो ह म ् म द
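The slides do not say exactly how source names are cut into units before training. One plausible syllable-like segmentation, sketched here as an assumption, groups each run of consonants with the vowels that follow it and keeps any trailing consonants as a separate unit. The output is space-separated, which is the usual line format for training a phrase-based toolkit such as Moses.

```python
import re
from typing import List

# Assumed unit: (consonants*)(vowels+), or a run of consonants at the word end.
UNIT = re.compile(r"[^aeiou]*[aeiou]+|[^aeiou]+$")


def syllabify(word: str) -> List[str]:
    """Segment a romanized name into consonant-cluster + vowel units."""
    return UNIT.findall(word.lower())


def to_training_line(word: str) -> str:
    """Format a source name as one space-separated line of units."""
    return " ".join(syllabify(word))


for name in ["sudakar", "chhagan", "jitesh", "mohammad"]:
    print(name, "=>", to_training_line(name))
```

With this rule "sudakar" becomes "su da ka r" and "mohammad" becomes "mo ha mma d". The Devanagari targets above are split into characters, so source and target unit counts differ; the toolkit's alignment step would have to learn that mapping.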

Algorithm for Subtask 1
Step 1: Look up each word in an English dictionary.
Step 2: Perform spell-checking, stemming and morphological analysis for English; if there is no spelling error and a match is found, label the word as English (\E).
Step 3: If the word is not found in the dictionary, check it against an English corpus of US newspapers.
Step 4: If the word is found there, also check it against an English corpus of Indian newspapers.
Step 5: If the word is found in the US newspaper corpus but not in the Indian newspaper corpus, label it as English (\E).
Step 6: Steps 2 and 5 are applied in parallel to identify English words, which are labeled \E.
Step 7: The remaining words are treated as Hindi words, labeled \H, and transliterated.
Step 8: Use the Moses toolkit to transliterate these Roman-script words into Hindi (Devanagari).
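The labeling steps above can be sketched as a single pass over the words. This is a minimal sketch under several assumptions: the dictionary and the two newspaper corpora are plain word sets, `naive_stem` is a crude hypothetical stand-in for real stemming and morphological analysis, spell-checking is omitted, and the Moses call is replaced by a caller-supplied `transliterate` function.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def naive_stem(word: str) -> str:
    """Crude suffix stripper; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def label_words(
    words: Iterable[str],
    dictionary: Set[str],
    us_news: Set[str],
    indian_news: Set[str],
    transliterate: Callable[[str], str],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (word, label, transliteration) with label "E" or "H"."""
    labelled = []
    for word in words:
        w = word.lower()
        # Steps 1-2: dictionary match on the surface form or its stem.
        in_dictionary = w in dictionary or naive_stem(w) in dictionary
        # Steps 3-5: seen in US news but not in Indian news.
        us_only = w in us_news and w not in indian_news
        # Step 6: either test is enough to mark the word English.
        if in_dictionary or us_only:
            labelled.append((word, "E", None))
        else:
            # Steps 7-8: remaining words are treated as Hindi and transliterated.
            labelled.append((word, "H", transliterate(word)))
    return labelled


toy = label_words(
    ["songs", "mere", "weekend", "xerox"],
    dictionary={"song"},
    us_news={"weekend", "xerox"},
    indian_news={"xerox"},
    transliterate=lambda w: f"<{w}>",  # placeholder for a Moses call
)
print(toy)
```

Running the two tests as one `or` gives the same labels as applying them "in parallel" (Step 6). Note how "xerox", present in both newspaper corpora, falls through to the Hindi branch: the corpus test alone cannot separate English loanwords used in Indian writing.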

Results for Subtask 1

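Results for Subtask 2
Run 1 query (Devanagari and Roman script): "मर सापन न कक रानी काब आयगी ि mere sapnon ki rani kab aayegi tu"
Run 2 query (Roman script only): "mere sapnon ki rani kab aayegi tu"

| Metric  | Run-1  | Run-2  | Maximum Score | Median Score |
|---------|--------|--------|---------------|--------------|
| nDCG@5  | 0.5627 | 0.5262 | 0.8052        | 0.5620       |
| nDCG@10 | 0.5619 | 0.5232 | 0.8002        | 0.5608       |
| MAP     | 0.2546 | 0.2163 | 0.4236        | 0.2355       |
| MRR     | 0.5835 | 0.5730 | 0.8440        | 0.5884       |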
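For reference, a minimal sketch of how the four metrics in the table are commonly computed for one query, using graded relevance and a log2 rank discount for nDCG. Evaluation campaigns may use a different gain function (for example 2^rel - 1) or a tool such as trec_eval, so this is not claimed to be the exact FIRE 2013 scorer; the ranking and relevance judgements below are invented.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """DCG of the top-k results divided by the DCG of an ideal ordering."""
    gains = [qrels.get(doc, 0) for doc in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """Mean of precision@rank over the ranks of relevant documents."""
    relevant = {d for d, r in qrels.items() if r > 0}
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking: List[str], qrels: Dict[str, int]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0) > 0:
            return 1.0 / rank
    return 0.0


ranking = ["d3", "d1", "d2"]      # hypothetical system output for one query
qrels = {"d1": 2, "d2": 1}        # hypothetical graded judgements
print(ndcg_at_k(ranking, qrels, 3), average_precision(ranking, qrels), reciprocal_rank(ranking, qrels))
```

MAP and MRR in the table are the means of `average_precision` and `reciprocal_rank` over all test queries, and nDCG@5 / nDCG@10 are likewise averaged per query.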

Error And Analysis
Several problems in the transliteration decreased precision:
- Errors in the maatra (vowel sign): "sapnon" => "सापन न", "ki" => "की", "kab" => "काब", "main" => "ममन" and "mein" => "मीन", "na" => "न" and "ka" => "क".
- Multiple mappings of a letter, e.g. T => त or ट: "tera" => "टरा", "tum" => "तूम", "to" => "ट", "teri" => "टरर".
- Missing sounds (फ, ख, छ "chh", ksh): for "accha" we got "आक्का", and for "poochho" we got "पछ ू ट".
- Multiple transliterations of c and k.
- Vowels are not transliterated correctly, e.g. "lo" => "लॉ", "ho" => "ह र", "ko" => "कॉ".
- Spelling variations (shree, shri).
- Conjunct formation ("kya" => "कया").
- Missing vowels: "ak tr khan" (अक ु िर खान).
- "y" as a vowel: "anthony" and "Shyam".
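The multiple-mapping errors above arise because one Roman unit (for example one starting with T) can correspond to more than one Devanagari form, and the choices multiply across a word. A minimal sketch of that candidate expansion follows; the unit table is a hypothetical fragment, not the authors' model, and in practice each candidate would then be scored, for instance by corpus frequency.

```python
from itertools import product
from typing import Dict, List

# Hypothetical unit-to-Devanagari table showing the T ambiguity (त vs ट).
UNIT_MAP: Dict[str, List[str]] = {
    "te": ["टे", "ते"],
    "ra": ["रा", "र"],
    "tu": ["तु", "टु", "तू"],
    "m": ["म"],
}


def candidates(units: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Every Devanagari string obtainable by choosing one option per unit.

    Raises KeyError if a unit is missing from the table.
    """
    options = [table[u] for u in units]
    return ["".join(choice) for choice in product(*options)]


print(candidates(["te", "ra"], UNIT_MAP))
print(len(candidates(["tu", "m"], UNIT_MAP)))  # 3
```

The candidate count is the product of the per-unit option counts, so a few ambiguous units per word quickly produce many spellings, and a single wrong pick (such as ट for a dental T) lowers precision.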

Conclusion
We used the syllabification approach and selected the most probable term during transliteration. Word labeling assumed that each term belongs to either English or Hindi. We achieved high English recall because the labeling approach combined morphological analysis with a dictionary lookup. However, the syllabification model did not give high transliteration precision, which in turn lowered the precision metrics in the song lyrics retrieval task.
