Saturday, April 11, 2015

Statistical Analysis of Bhagavad Gita and Translation

Bhagavad Gita

I have restarted one of my favorite projects: developing a complete Sanskrit-to-Bengali translator for the Bhagavad Gita. I started working on this project twice in the last 2-3 years, but each time something else came up (work pressure, office nonsense, laziness and tamas, tensions, etc.) and the project got shelved midway. Also, since these were open-ended ideas rather than time-bound projects, there was no way of tracking progress.

This time I am taking a different approach. I read in the book "Influence" by Prof. Cialdini that writing down one's plan increases one's commitment level, which in turn increases the chances of success.

So here I go.

Objective:
To develop a Sanskrit-to-Bengali translator, using words from the Bhagavad Gita to build the initial dictionary

Features:
- Should be able to do samdhi analysis and detect tokens appropriately
- Should be able to distinguish between different meanings and cases (e.g. phale, which is nominative dual as well as locative singular)
- Should be able to detect incorrect spellings of common words and suggest the correct one
- Should translate to spoken Bengali to the extent possible
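The phale example suggests that one surface form has to map to several grammatical analyses. A minimal sketch of such a dictionary entry follows; the names `Analysis`, `LEXICON` and `analyses_for` are hypothetical, and the Bengali gloss is only an illustrative placeholder, not a settled design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Analysis:
    """One possible reading of a surface form (hypothetical structure)."""
    stem: str
    part_of_speech: str
    case: str
    number: str
    meaning: str  # target-language gloss for the stem

# Assumed toy lexicon: one ambiguous form with both readings named in the plan.
LEXICON = {
    "phale": [
        Analysis("phala", "noun", "nominative", "dual", "ফল"),
        Analysis("phala", "noun", "locative", "singular", "ফল"),
    ],
}

def analyses_for(token, lexicon=LEXICON):
    """Return every known reading of a token; an empty list means 'unknown word'."""
    return lexicon.get(token, [])

readings = analyses_for("phale")
```

Choosing between the readings would need context from the rest of the verse, which this sketch does not attempt.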

Phase 1:
Developing the dictionary using Bhagavad Gita Chapter 1 initially

1a - Do a statistical analysis of Bhagavad Gita in terms of token count
1b - Do samdhi splits and get those tokens also
1c - Do a count of tokens and arrange the tokens in descending order of frequency
1d - Add to dictionary with forms (part of speech, tense, case, number, person etc) and Bengali meaning
Threshold level: Only the most common words, those that together account for 80% of all token occurrences on a cumulative basis, shall be used.
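Steps 1a, 1c and the 80% threshold can be sketched roughly as below. This assumes the text has already been transliterated and samdhi-split into a flat list of tokens (step 1b is not covered here); the function names and the sample tokens are made up for illustration.

```python
from collections import Counter

def frequency_table(tokens):
    """Return (token, count) pairs in descending order of frequency."""
    return Counter(tokens).most_common()

def core_vocabulary(tokens, threshold=0.80):
    """Pick the most frequent tokens until they cover `threshold` of all occurrences."""
    table = frequency_table(tokens)
    total = sum(count for _, count in table)
    selected, covered = [], 0
    for token, count in table:
        # Stop once the words chosen so far reach the cumulative threshold.
        if total == 0 or covered / total >= threshold:
            break
        selected.append(token)
        covered += count
    return selected

# Placeholder token list, not real counts from the Gita.
sample_tokens = ["ca"] * 6 + ["eva"] * 3 + ["tatra"]
core = core_vocabulary(sample_tokens)
```

With the placeholder list, "ca" and "eva" together cover 9 of 10 occurrences, so "tatra" falls below the cut-off and would be left out of the first pass of the dictionary.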

Phase 2:
Apply the translator to Chapter 2 and test. Identify missing words and repeat 1a through 1d. Repeat for each chapter.
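Identifying the missing words for a new chapter could be as simple as counting the tokens that have no dictionary entry yet. A small sketch, assuming the chapter is already tokenized and the dictionary's known forms are available as a set (both names are hypothetical):

```python
from collections import Counter

def missing_words(tokens, known_words):
    """Count tokens that have no dictionary entry, most frequent first."""
    unknown = Counter(t for t in tokens if t not in known_words)
    return unknown.most_common()

# Placeholder data, not real tokens from Chapter 2.
chapter_tokens = ["ca", "eva", "ca", "tatra", "eva", "eva"]
todo = missing_words(chapter_tokens, known_words={"ca"})
```

The resulting list is already in descending order of frequency, so it can feed straight back into steps 1c and 1d.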

Note: The dictionary shall be extensible. If one were to replace Bengali words with appropriate Marathi words, theoretically the translation should work.
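One way to keep the dictionary extensible is to store the target-language words separately from the Sanskrit grammar, keyed by stem. The sketch below is only an assumption about how that might look; `GLOSSES` and `gloss` are hypothetical names and the two entries are illustrative.

```python
# Assumed layout: one gloss table per target language, keyed by Sanskrit stem.
GLOSSES = {
    "bengali": {"phala": "ফল"},
    "marathi": {"phala": "फळ"},
}

def gloss(stem, language, glosses=GLOSSES):
    """Return the target-language word for a stem, or None if it is missing."""
    return glosses.get(language, {}).get(stem)

bengali_word = gloss("phala", "bengali")
marathi_word = gloss("phala", "marathi")
```

Swapping the target language then means adding another gloss table, while the samdhi and case analysis stay untouched.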

Project Details
Time: 1 month per chapter
Start Date: April 2015
End Date: October 2016

May Lord Ganesha remove all obstacles to this endeavor!

Subhodeep Mukhopadhyay
