Showing posts with label FYP. Show all posts

May 20, 2011

Displaying Sinhala in JTextArea

When I first tried to display Sinhala characters in a JTextArea, I got the following result, where all the Sinhala letters were replaced by boxes.

Original text : තමා_PRP හයිටිය_NNPI අතහැර_VNF පලා_VNF ගියේ_VP කැමැත්තකින්

The reason for this is that the default font does not render Sinhala glyphs. It is not a problem with detecting the encoding. You can simply check the system default encoding from Java.
On my machine this reported UTF-8, which was OK for my requirement.
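
One way to do that check is sketched below. The original snippet is not shown here, so this is only an assumption about what such a check could look like; the class and method names (DefaultEncodingCheck, defaultEncoding) are made up for the example, while Charset.defaultCharset() is part of the standard library.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Name of the charset the JVM uses when none is given explicitly.
    static String defaultEncoding() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultEncoding());
    }
}
```

If this prints a Unicode-capable charset such as UTF-8, the boxes are more likely a font problem than an encoding problem.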

This meant that the problem was in choosing an appropriate font for the JTextArea.
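import java.awt.Font;
import java.io.File;
File fontFile = new File(fontpath); // load the font file
// createFont may throw FontFormatException or IOException and returns a 1-point font, so derive a readable size (14 is an arbitrary choice)
Font newFont = Font.createFont(Font.TRUETYPE_FONT, fontFile).deriveFont(14f);
txtOutput.setFont(newFont); // set the font on the JTextArea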

The above code sets a custom font on a JTextArea. Out of the many fonts I tried, only "dinamina.ttf" rendered properly. Even lklug did not give proper results (it rendered only the Sinhala characters; the English characters were not displayed at all). Following is the final output.

Download Sinhala fonts from here
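
Before wiring a downloaded font into the JTextArea, it can help to ask the font itself which characters it covers. The sketch below uses the standard Font.canDisplayUpTo method; the class name FontCoverage, the describe helper and the command-line font path are assumptions made for this example, not part of the original project.

```java
import java.awt.Font;
import java.io.File;

public class FontCoverage {
    // Turns the result of Font.canDisplayUpTo into a readable message.
    // canDisplayUpTo returns -1 when every character has a glyph,
    // otherwise the index of the first character the font cannot show.
    static String describe(String text, int firstMissing) {
        if (firstMissing == -1) {
            return "all characters can be displayed";
        }
        return "no glyph for '" + text.charAt(firstMissing) + "' at index " + firstMissing;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a .ttf file (hypothetical, supply your own)
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));
        // First Sinhala word of the sample text, written with Unicode escapes
        String sinhala = "\u0DAD\u0DB8\u0DCF";
        String english = "PRP";
        System.out.println("Sinhala: " + describe(sinhala, font.canDisplayUpTo(sinhala)));
        System.out.println("English: " + describe(english, font.canDisplayUpTo(english)));
    }
}
```

A font that reports a missing glyph for the English sample but not for the Sinhala one would explain mixed text where only one script shows up.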

Updates :

  • Found out that lklug contains only Sinhala glyphs
  • Bhashitha is another font that worked fine for me
March 31, 2011

    Mid point..Lessons learnt

    I will keep updating this post as and when I find new stuff or realize something that I should have done.

    1. Read as many research papers as possible..I have gone through 125+ so far..:(

    2. Don't try to group the research papers and put them in separate folders, because one paper may have to be put into several folders then. (If you try you will end up getting a headache ;) )

    3. Rename each research paper file to the title of the paper. (Windows might not let you have long names, but Linux does.., so go for Linux :P)

    4. Use Linux (so you can give the papers really long names, and for Dolphin, whose tabs are amazing..or you could install KDE on Windows)

    5. As mentioned above, put all research papers in one folder and number them. I numbered them as "01_RP (name)", where RP means research paper, BK book and AR article.

    6. Maintain an Excel sheet with the paper number, name of the paper and authors (useful when referencing), and tag each paper according to its content (for example, I used TOK for tokenization, ENC for encoding, etc.). It is important that you save this data, because in case you lose the files you can easily find them again with Google's help. And I assure you that the time spent on this will not be wasted.

    7. Another tip for skimming through the papers is to first read the abstract and see how relevant it is to your project. You can even colour code the Excel sheet depending on importance. For example, for me papers on Sinhala were the most important, so I colour coded them. But don't use more than 2-3 colours, because then it looks highly unprofessional and complex.

    8. Also check the references of the papers you skim, search for them, and keep going..that's how I ended up with 125 ;)

    9. So when you want to write about tokenization, check the papers with the TOK tag and read them.

    10. You don't have to read the whole paper to tag it. Just skim through the headings, and highlight important stuff as you go. Okular would be a good option for reviewing (annotating) PDFs.

    11. Maintain only a single document. Don't try to break the doc into the modules or cores of your research; it will create inconsistencies. Make sure you keep the navigator window open when you are writing.

    12. Remember to always keep backups. Keep an online backup as well. I have uploaded all my research papers to Google Docs. All 125 of them ;)

    13. Finish the doc as soon as you can. Leave at least 4-5 days to go through all the research papers again. Trust me, you will find loads more facts you have missed. It makes the doc complete, because you will be thorough with the subject by then and will be able to grasp facts more quickly.

    14. Don't get worried if you feel like it takes too long (given that you have enough time before the deadline), because that means you are getting it into your nerves.

    15. Always think about how you would actually do something; think "programming". Don't just put something in the doc, put exactly what you will be doing, and in detail. If you feel like programming something is too much work, go for an API. It is very important that you think at the programming level, and it will help you a lot later on. And I suppose anyone would have the knowledge and experience to think deep. Think of the trouble you had earlier. Think of where you got stuck earlier. Think of where you could get into trouble. After all it's the FYP, you should be able to do this ;)

    16. And finally, your output should be professional. This is where you should head back to Windows, because Microsoft Office is the best office package and to my knowledge no one has beaten it yet. This is where you must polish (format) your doc.

    17. When you do citations, don't try to apply the Harvard guidelines at once. Now that you have numbered your research papers, use numbered citations (e.g. [12]..). It is easy when documenting. If you have filled out the Excel sheet I mentioned, then citing with Harvard is not difficult afterwards.

    Mind you, this is what worked for me. It might not work for most, but have a go if you think it will be helpful.. :D
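
    The numbering and tagging scheme from tips 5, 6 and 9 can also be sketched in code, for anyone who would rather keep the index next to their programs than in a spreadsheet. This is only an illustration under my own assumptions: the class PaperIndex, its Entry type and the sample rows are hypothetical placeholders, not real papers and not the sheet used for the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PaperIndex {
    // One row of the sheet: number, kind (RP, BK or AR), title, authors, tags.
    static final class Entry {
        final int number;
        final String kind;
        final String title;
        final String authors;
        final Set<String> tags;

        Entry(int number, String kind, String title, String authors, String... tags) {
            this.number = number;
            this.kind = kind;
            this.title = title;
            this.authors = authors;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }

        // File name in the "01_RP (name)" style.
        String fileName() {
            return String.format("%02d_%s (%s)", number, kind, title);
        }
    }

    // All entries carrying the given tag, in their original order.
    static List<Entry> withTag(List<Entry> entries, String tag) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.tags.contains(tag)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder rows, not real papers.
        List<Entry> sheet = Arrays.asList(
            new Entry(1, "RP", "Sample tokenization paper", "A. Author", "TOK"),
            new Entry(2, "BK", "Sample encoding book", "B. Author", "ENC"),
            new Entry(3, "RP", "Sample mixed paper", "C. Author", "TOK", "ENC"));
        // Tip 9: pull up everything tagged TOK before writing about tokenization.
        for (Entry e : withTag(sheet, "TOK")) {
            System.out.println(e.fileName() + " [" + e.number + "]");
        }
    }
}
```

    Because a paper can carry several tags, it shows up under every topic it touches without being copied into several folders, and the number doubles as the citation key from tip 17.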

    March 29, 2011

    mid point

    A big milestone in my life is yet to come: the day I mark the official closure of my degree. This is only a sub-milestone. The mid point submission is next Tuesday (the same day we start the 2nd sem :( ). I was supposed to write about my FYP, actually update it, because I am not doing what I previously said I was doing ;) I am doing the POS tagger I had mentioned before. It is very interesting and better than all my previous ideas. I love it. :D

    I'll blog about all my experience as soon as I finish my submission. Kind of busy now. Consider this a preface till then.

    January 3, 2011


    the final domain...

    ah..just got to know that the PPF submission has been postponed to next week..yipee...:D

    Let's talk about this domain...My first idea was to create a translator. My supervisor explained to me that it is almost impossible to do such a huge thing in 8 months. I would have to identify my scope. A very limited scope. According to him, doing the translation (Sinhala to English) itself is a huge thing, and therefore I don't have to worry about making the scope wider. He said it's still very risky to do this in 8 months, and that it is really huge work for an FYP. He gave me the idea of developing a Sinhala spell checker instead. He said it would give me a fallback option.

    And then vacation started..and I was asked to research the domain. I researched and sent him a mail, but he did not reply. So I was blank on what to finalize. Finally, the day the vacation ended, he sent me a message asking to meet him the following day. During that time I made a post in the Sinhala Unicode group about my FYP. Dr. Ruvan Weerasinghe sent me a mail after seeing my post, mentioning some areas he thought I could focus on for the FYP, and one of them interested me. That's POS tagging. I researched a lot about it and made up my mind to do it. When I met my supervisor the following day, he told me to choose whichever I liked and that both were OK. But he said he thought I had better do the spell checker. I mentioned my new interest in POS tagging, and he said that it was not gonna work out. It's too wide, and if I limit it to a domain, that means I am being unfair to it..and.."Dr.Ruvan told you about it because that has been his research area for past 10 years" :P phew..that was a narrow escape..

    Translation has a small problem. It means I need to find a domain as well as a POS tagged word list. And if I could not get the word list from UCSC I would have to build it myself, which means that's 2 projects. :( I don't think I can handle it..So I am back to the spell checker. I think doing a browser plug-in would be worth more than one for an editor. But I am still working on it..good luck to me..:D
    Later :

    I decided to do my FYP on POS Tagger and the scope was achievable. I got help from LTRL and Dr. Ruvan. I finally finished the FYP and gained a 1st class :)

    fyp...ready.. am gonna write about my FYP progress. I am horrified at the thought that I need to submit the PPF tomorrow.. :( I still don't know what to do...

    I'll start from the beginning.

    I didn't want to continue with my 3rd year because I was blank on most of the stuff. But then again I changed my mind. And most of all, if I take a semester break APIIT will increase fees :P..don't have to pay them more than they deserve.. Then I started my mission FYP. For a long time I had wanted to do something related to databases. However, after some googling I had changed my course to data mining without even knowing. So I went through the projects in the library as well. I wanted to do a "Stock price predicting system". So I went to meet the lecturer who had been teaching databases to our batch throughout the degree program. He and also PM told me it's a highly uncertain domain. They asked me to choose a different domain. But since I am not creative, I couldn't find another domain..I gave up data mining.

    My next idea was an "offline signature verification system". PM said this was OK as long as it was different from what has already been done at APIIT. So I made it an ANN system. And that was stupid. PM said "you can't say it's NN. You have to research.Perhaps there are many other better ways to approach" I started to have a dilemma about whether to do it or not..and although I did research a lot about image processing, I felt that it wasn't really my thing.

    Then...the 3rd topic. I was helpless. I didn't want to mess with image processing or audio. I wanted something people would use. My to-be-supervisor gave me the idea of developing a speech to formula generator. It seemed OK. And I was told I could use the Java Speech API. I had an in-class exam at uni and had to go to that, so I missed an FYP session. Unfortunately, that day someone else got approval for it. My to-be-supervisor had mentioned it to him and asked him to check with me, and he didn't. So I lost that topic as well..

    And then....a report generator plugin for PHP in Eclipse ;), a text summarizer, an audio summarizer, an English to Sinhala subtitle generator...bang!!! I hit my final domain. Natural Language Processing in Sinhala..this redirected me to my to-be-supervisor. He was very interested and had even talked with PM and taken me as his supervisee even before I had finalised my topic..And I have to say.., I have not yet even finalised it..Let's talk about it in the next post..