\chapter{Concluding Remarks}
Before assessing the overall progress of this project and the English to Kinyarwanda translation model that I currently have, I think it is important to discuss some key decisions that I had to make while working on it.
\begin{itemize}
\item \textbf{Why English to Kinyarwanda?} Why not Kinyarwanda to English? I chose to work on an English to Kinyarwanda translation model, and not the other way around, because of need. Most Rwandans who would want to use a translator would most likely use it to translate content (books, articles, instruction manuals, etc.) from English or other popular languages into Kinyarwanda. I chose English as the source language because it is one of the three official languages of Rwanda, the others being French and Kinyarwanda. English is also the language used in education, from primary schools to universities\cite{Samuelson2010}. The direction of translation is further explained by the fact that Rwanda has been an oral society for most of its existence, so there are not many writings in Kinyarwanda that one might want to translate into another language\cite[p.~59]{adekunle2007culture}. Most of those that do exist are not digitized or freely available online, which makes using them, or simply accessing them, even more challenging. %Therefore, out of pragmatic reasons, I chose to work on the English to Kinyarwanda model because, if it was to be developed into a stable/decent application, it is likely to be used than its Kinyarwanda to English counterpart.
\clearpage
\item \textbf{Why the Bible and the Constitution?} I chose the Bible as my training data because it is the only large body of text I could find online that is available in both English and Kinyarwanda. I used the Rwandan Constitution for tuning because it is an official document with good-quality content in both Kinyarwanda and English. As a non-religious text, it also adds useful variety to the Biblical content that I had used in the training process.
\item \textbf{What is the main challenge that I faced?} The main challenge that I faced is data sparsity. Kinyarwanda is not a widely used language by global standards. Even though it is estimated that Kinyarwanda is spoken by around 20 million people, the majority of them live in Rwanda and its neighboring countries (Burundi, Uganda, Tanzania, and the Democratic Republic of the Congo), and the language is not used elsewhere in the world\cite{kayigema2010loanword}. That, together with the fact that Rwanda used to be an oral society and still is to some degree, makes Kinyarwanda data and Kinyarwanda software very hard to come by. According to Samuelson \etal, despite its massive use in Rwanda and neighboring regions, ``mass literacy in Kinyarwanda remains weak''\cite{Samuelson2010}. One consequence for my project is that Moses, the SMT system that I used, did not even have a Kinyarwanda \textit{tokenizer}. I had no choice but to use the default (English) one and hope for the best!
\item \textbf{Why start with statistical machine translation?} I initially chose SMT over rule-based machine translation (RBMT) because SMT systems are easier to work with and scale more easily than rule-based translation models\cite{och2005statistical}. As my end goal for this project is to turn it into an open-source system that other people can contribute to, scalability was very important in making this decision. I thought that implementing the translator with an SMT system would make it easier for people with different levels of technical expertise and Kinyarwanda proficiency to contribute to the project, whether by providing more training data (in the form of translated texts), correcting translations generated by the translator, or writing code for translation tools specialized for Kinyarwanda.
\item \textbf{Why Moses?} As I did not have any previous experience working with SMT systems, this was not an easy choice. I had to try the popular SMT systems and decoders and decide which one to use. I tried Joshua, Cdec, and Moses. Unfortunately, Joshua and Cdec ran into compilation issues on my computer, and despite multiple attempts, I could not get them to work past the tutorial stages found on their websites. That left me with only one choice: Moses. However, even though I had no alternative when I started using it, through the process of developing my English to Kinyarwanda translation model with Moses I have come to appreciate that it has a lot of documentation and supporting material that is easily accessible and freely available online.
\item \textbf{What is next?} As I have not been able to gather enough Kinyarwanda $|$ English translation data on my own, my plan is to make this project open-source by making all the source and data files publicly available on the project website\footnote{\protect\url{http://kinyarwanda.online/}}. That way, anyone working on a similar project can both access it and contribute to it. I will also continue to work on this project in my free time, and I intend to use all means at my disposal (social media, the project website, presentations, etc.) to ask more people to contribute to small-scale Kinyarwanda translation projects like this one and to large-scale translation initiatives such as \textit{Google Translate}.
%For now, my next goal is to try to get as much Kinyarwanda data as possible. As I already mentioned, it is not that difficult to get English data. The real challenge is getting Kinyarwanda data in right format (\texttt{.txt} file, line by line sentences, character count not too small or too large, etc.). However, what is even more difficult is getting Kinyarwanda data that has correct, human-proofread, line by line, equivalent English translation. Regarding this, my plan going forward is to set up a website that will crowd-source translations by generating English text line by line and asking users to translate it. Hopefully, I will get a decent amount of data this way. In addition to this, I intend to keep looking for and collecting English $|$ Kinyarwanda data and adding it to my training corpus, that way I hope to keep improving my model.
\end{itemize}
To sum up, having asked and answered these important questions, all I have to add is that despite all the challenges I have faced while working on this project, I am more determined now than ever. I have learned a lot by working on this project, and the experience in itself has been more rewarding than I ever could have hoped for when I embarked on this journey.
%and because I know that as long I keep working on this project, the quality of my translation model results will keep improving.