Author: Jack Robbins
This project uses a deterministic finite automaton (DFA) to recognize strings that are in a regular language of valid email addresses.
The Language
- $\Psi =$ {a, b, c, ..., z}, the set of all lowercase roman letters
- $\Pi =$ {.}
- $\Phi =$ {@}
In short, the alphabet for this language is all lowercase roman letters, the "." symbol, and the "@" symbol.
From here, we define the language $L$ in terms of the following sets of strings:
- $S_{1} = \Psi\Psi^\star$, which defines strings of lowercase roman letters of length at least 1
- $S_{2} = \Pi\Psi\Psi^\star$, which defines strings like $S_{1}$, except that they begin with a dot
- $S_{3} =$ {.gov}, one possible valid ending of email addresses in $L$
- $S_{4} =$ {.gr}, the other possible valid ending of email addresses in $L$
To make this more grounded, two examples of strings in the language $L$ are "abc@learner.gr" and "a.b.c.d@d.c.b.a.gov". Notice how every "." is followed by at least one letter, and there is only one "@" symbol. A string like "my@website@example.gov" would not be accepted: even though it ends properly in ".gov", the use of two "@" symbols makes it invalid. Strings like "abc@abc.com" are easier to identify as not being in the language, because strings in $L$ must end in either ".gov" or ".gr".
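
As a cross-check on the examples in the paragraph above, here is a minimal sketch that tests them against a regular expression. It assumes $L$ has the shape $S_{1}S_{2}^\star\Phi S_{1}S_{2}^\star(S_{3} \cup S_{4})$, which is one reading consistent with the examples and the building blocks, not a definition taken from the project. The names `matches_email_regex` and `check_examples` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Assumed shape of L: letters, then zero or more ".letters" groups, then "@",
// then the same shape again, ending in ".gov" or ".gr".
bool matches_email_regex(const std::string& s) {
    static const std::regex pattern(
        R"([a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)*\.(gov|gr))");
    // std::regex_match succeeds only if the whole string matches the pattern.
    return std::regex_match(s, pattern);
}

// Checks the four example strings discussed in the text.
void check_examples() {
    assert(matches_email_regex("abc@learner.gr"));
    assert(matches_email_regex("a.b.c.d@d.c.b.a.gov"));
    assert(!matches_email_regex("my@website@example.gov"));  // two "@" symbols
    assert(!matches_email_regex("abc@abc.com"));             // wrong ending
}
```

Calling `check_examples()` from a `main` function runs the four checks. A regular expression is used here only as an independent sanity check; the project itself recognizes the language with a DFA.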
By definition, a language is regular if there exists a deterministic finite automaton that accepts every string in the language and rejects every string not in the language. As we will soon see, there is a DFA $M$ that recognizes $L$, so $L$ is regular.
Formally, the DFA that recognizes the language $L$ is the 5-tuple $M = (Q, \Sigma, \delta, q_{1}, F)$, where:
- $Q =$ {$q_{1}, q_{2}, q_{3}, q_{4}, q_{5}, q_{6}, q_{7}, q_{8}, q_{9}, q_{10}$} is the set of all states in $M$
- $\Sigma = \Psi \cup \Pi \cup \Phi$ is the alphabet that $L$ is defined over, as described above
- $\delta: Q \times \Sigma \rightarrow Q$ is the transition function, which is defined as:

  | Q | $\Psi_{-g, r, o, v}$ | g | r | o | v | $\Pi$ | $\Phi$ |
  |---|---|---|---|---|---|---|---|
  | $q_{1}$ | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{10}$ | $q_{10}$ |
  | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{2}$ | $q_{1}$ | $q_{3}$ |
  | $q_{3}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{10}$ | $q_{10}$ |
  | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{5}$ | $q_{10}$ |
  | $q_{5}$ | $q_{4}$ | $q_{6}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{10}$ | $q_{10}$ |
  | $q_{6}$ | $q_{4}$ | $q_{4}$ | $q_{7}$ | $q_{8}$ | $q_{4}$ | $q_{5}$ | $q_{10}$ |
  | $q_{7}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{5}$ | $q_{10}$ |
  | $q_{8}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{9}$ | $q_{5}$ | $q_{10}$ |
  | $q_{9}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{4}$ | $q_{5}$ | $q_{10}$ |
  | $q_{10}$ | $q_{10}$ | $q_{10}$ | $q_{10}$ | $q_{10}$ | $q_{10}$ | $q_{10}$ | $q_{10}$ |

- $q_{1}$ is the starting state
- $F =$ {$q_{7}$, $q_{9}$} is the set of accepting states
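
The definition above can be sketched in C++ as a lookup table. This is a minimal sketch and not necessarily how dfa.cpp is written: the names `column_of`, `kDelta`, and `accepts` are hypothetical, and rejecting any character outside $\Sigma$ is an assumption, since $\delta$ is only defined over $\Sigma$.

```cpp
#include <cassert>
#include <string>

// Column index of a symbol, in the same order as the transition table:
// 0 = letters other than g, r, o, v; 1 = g; 2 = r; 3 = o; 4 = v; 5 = "."; 6 = "@".
// Returns -1 for characters that are not in the alphabet.
int column_of(char c) {
    switch (c) {
        case 'g': return 1;
        case 'r': return 2;
        case 'o': return 3;
        case 'v': return 4;
        case '.': return 5;
        case '@': return 6;
        default:  return (c >= 'a' && c <= 'z') ? 0 : -1;
    }
}

// kDelta[q - 1][column] is the next state from state q (states are numbered 1..10).
constexpr int kDelta[10][7] = {
    { 2,  2,  2,  2,  2, 10, 10},  // q1
    { 2,  2,  2,  2,  2,  1,  3},  // q2
    { 4,  4,  4,  4,  4, 10, 10},  // q3
    { 4,  4,  4,  4,  4,  5, 10},  // q4
    { 4,  6,  4,  4,  4, 10, 10},  // q5
    { 4,  4,  7,  8,  4,  5, 10},  // q6
    { 4,  4,  4,  4,  4,  5, 10},  // q7
    { 4,  4,  4,  4,  9,  5, 10},  // q8
    { 4,  4,  4,  4,  4,  5, 10},  // q9
    {10, 10, 10, 10, 10, 10, 10},  // q10
};

// Runs the DFA on the input and reports whether it ends in an accepting state.
bool accepts(const std::string& input) {
    int state = 1;  // q1 is the starting state
    for (char c : input) {
        const int col = column_of(c);
        if (col < 0) return false;  // assumption: symbols outside the alphabet are rejected
        state = kDelta[state - 1][col];
        assert(state >= 1 && state <= 10);  // the table only contains valid states
    }
    return state == 7 || state == 9;  // F = {q7, q9}
}
```

With this sketch, `accepts("abc@learner.gr")` returns true and `accepts("abc@abc.com")` returns false. Storing $\delta$ as a table keeps the code in one-to-one correspondence with the formal definition, so each row can be checked against the table by eye.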
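Note: once the DFA goes into state $q_{10}$, it can never leave, because every transition out of $q_{10}$ leads back to $q_{10}$. Since $q_{10}$ is not an accepting state, the string is rejected.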
Though useful for rigorously defining the DFA, the transition table can be hard to follow on its own, so a worked example helps.
Here is an example of how the DFA would process the string "a.bc@ex.gov":
- Begin in start state $q_{1}$
- Read symbol a $\in \Psi$, move to $q_{2}$
- Read symbol "." $\in \Pi$, move to $q_{1}$
- Read symbol b $\in \Psi$, move to $q_{2}$
- Read symbol c $\in \Psi$, move to $q_{2}$
- Read symbol "@" $\in \Phi$, move to $q_{3}$
- Read symbol e $\in \Psi$, move to $q_{4}$
- Read symbol x $\in \Psi$, move to $q_{4}$
- Read symbol "." $\in \Pi$, move to $q_{5}$
- Read symbol g, move to $q_{6}$
- Read symbol o, move to $q_{8}$
- Read symbol v, move to $q_{9}$
The entire string has been processed, and the DFA is in state $q_{9}$, which is an accepting state, so the string is accepted.
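
The walkthrough above can be reproduced with a short trace function. This is a minimal sketch under stated assumptions: the transition function is written as rules derived from the transition table rather than as a table, the names `next_state`, `trace`, and `check_walkthrough` are hypothetical, and sending characters outside the alphabet to $q_{10}$ is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One step of the DFA, written as rules instead of a table (states are 1..10).
// Assumption: any character outside the alphabet sends the DFA to q10.
int next_state(int q, char c) {
    const bool letter = (c >= 'a' && c <= 'z');
    if (q == 10 || (!letter && c != '.' && c != '@')) return 10;
    if (c == '@') return q == 2 ? 3 : 10;  // "@" is only valid right after a letter in the first part
    if (c == '.') {
        if (q == 2) return 1;                          // dot in the part before "@"
        return (q == 1 || q == 3 || q == 5) ? 10 : 5;  // a dot must follow a letter
    }
    // From here on, c is a lowercase letter.
    if (q == 1 || q == 2) return 2;
    if (q == 5) return c == 'g' ? 6 : 4;                   // possible start of ".gov" or ".gr"
    if (q == 6) return c == 'r' ? 7 : (c == 'o' ? 8 : 4);  // ".gr" complete, or ".go" so far
    if (q == 8) return c == 'v' ? 9 : 4;                   // ".gov" complete
    return 4;                                              // q3, q4, q7, q9
}

// Returns every state visited, beginning with the start state q1.
std::vector<int> trace(const std::string& input) {
    std::vector<int> visited{1};
    for (char c : input) visited.push_back(next_state(visited.back(), c));
    return visited;
}

// Confirms that the trace of "a.bc@ex.gov" matches the walkthrough in the text.
void check_walkthrough() {
    const std::vector<int> expected{1, 2, 1, 2, 2, 3, 4, 4, 5, 6, 8, 9};
    assert(trace("a.bc@ex.gov") == expected);
}
```

The last entry of the trace is the state the DFA halts in, so a string is accepted exactly when that entry is 7 or 9.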
To demonstrate the functionality of these concepts, the program dfa.cpp implements this DFA. To run it on the provided test cases, pass test_cases.txt to the compiled program on standard input: `./"Path-to-compiled-file" < test_cases.txt`. In case you want to see the output without running the program, the file tests_output.txt contains the output of all of these tests being run.