"Random-Greedy" is the short name for this project, aiming to generate and compare low-rate deletion correcting codes.
This short name is also the name of out main artifact, a simple algorithm that provides better deletion correcting
abilities, compared to other codes we checked. In the project, eight methodologies for generating codes (some of them
with meta-parameters) were created, and compared for their deletion correcting abilities (so no insertions nor
substitutions are allowed). The codes are low-rate, so each codeword has
This work is submitted as the final project in the course "Coding and Algorithms for Memories" (236379), at Taub Faculty
of Computer Science, Technion - Israel Institute of Technology. The project was written by Orel Adivi
(orel.adivi [at] cs.technion.ac.il)
and Daniel Noor (daniel.noor [at] cs.technion.ac.il)
, and under the supervision
of Daniella Bar-Lev and associate professor Eitan Yaakobi. The work was done in semester winter 2022-2023. The project
is released under MIT license.
In order to generate the codes and use them, an installation of
CPython 3.10
is required (the implementation is platform
independent, and was tested on both Windows and Linux).
The project uses NumPy
, Matplotlib
, overrides
, pylcs
, and tqdm
Python libraries, which can be installed using
Pip
package installer:
python -m pip install -r requirements.txt
Then, the codes can be imported, generated and used. For example, for generating RandomGreedyCode
of length 100 and
with 2 options for each selection point, and using it, the following Python
code can be used:
import numpy as np
from codes.RandomGreedyCode import RandomGreedyCode
my_code = RandomGreedyCode(length=100, options=2) # generate the code
print(f'Codeword for the value 0:\t{my_code.encode(24)}') # encode a value
print(f'Value for a codeword:\t{my_code.decode(np.zeros((100,)))}') # decode a codeword
print(f'Maximal number of deletions:\t{my_code.max_deletions()}') # calculate deletion-distance
Other codes can be used similarly, as they share the interface defined in the
Code
class.
In this project, eight code classes are available:
-
RandomCode
– this class generates code by selecting serially random codewords that were not previously chosen. -
GreedyCode
– this class generates code selecting codewords to be as different as possible, by flipping bits during a serial traversal of the previously - generated codewords.
-
LinSpaceCode
– this class generates code by repeating a pattern of alternating$0$ -s and$1$ -s with a length that increases linearly. -
LogSpaceCode
– this class generates code by repeating a pattern of alternatingpattern_false
-s ($0$ is the default) andpattern_true
-s ($1$ is the default) with a length that increases linearly. -
RepetitionCode
– This class generates code by taking all words of length$log(n)$ for$n$ the codeword length, and repeating each letter$n/log(n)$ times, thus generating codewords of length$n$ . -
VTRepetitionCode
– this class generates code by taking all words from$VT_0(m)$ for a given parameterm
and multiplying each letter$n/m$ times. -
VTRepetitionNaryCode
– this class generates code by taking all words from$VT_0(m, q)$ for given parameterm
and a baseq
, converting the words to binary and multiplying each letter$n/m$ times. -
RandomGreedyCode
– this class generates code serially by generating a set of codeword candidates of sizeoptions
, and then selecting the candidate with the largest distance from the previous codewords.
All these classes are derived from the class
Code
, which allows the selection of codeword
length in the construction (with the parameter length
) and provides the methods encode
(for encoding a value),
decode
(for decoding a codeword), and max_deletions (for calculating the maximal number of
deletions allowed).
In the utils
folder, additional classes and functions,
used in the implementation of the different codes, are provided:
LevenshteinDistance.py
– this file contains a function that computes Levenshtein deletion distance between an expected string and a given string.LongestCommonSubsequence.py
– contains a function that computes the length of the longest common subsequence of given two strings (the function uses a dynamic programming algorithm, and was replaced with the implementation inpylcs
).VTCode.py
– contains a VT code generator, based on an implementation from aprevious work
.
In order to compare the different code generation methodologies, and to tune meta-parameters for the relevant codes, we
wrote a Python
script that runs these experiments, and another Python
script that generates the figures from these
experiments. For running the two files, the following script can be executed:
python Experiments.py
python GenerateFigures.py
Running the experiments is expected to last several hours, so we also run this script in a
GitHub action. Additionally, the results are
available in the artifacts
folder. Furthermore,
for testing the utils
folder files, we wrote a test file
that can be executed using the following command:
python -m unittest Unittests.py
In the execution of the first two files, the following figures are generated:
- Figure 1 – this figure shows the connection between the codeword length and maximal number of fixable deletions, for different codes.
- Figure 2 – this figure shows the connection between the codeword length and maximal number of fixable deletions, for codes that can not generate the desired number of codewords.
- Figure 3 – this figure shows the
connection between the codeword length and maximal number of fixable deletions, for
VTRepetitionNaryCode
in different bases. - Figure 4 – this figure shows the
connection between the codeword length and maximal number of fixable deletions, for
RandomGreedyCode
with different number of options to choose from in each iteration.
The figures are described and discussed in detailed in the project report PDF file.
The project was designed in accordance with the object-oriented programming (OOP) principles. For documentation, a website was created and a SUPPORT.md file was written. The project was written using PyCharm Professional and was managed using GitHub.
In order to ensure the correctness of commits sent to the GitHub server, a continuous integration pipeline was set. These checks are run automatically for each pull request and each push. The following actions were set:
- Run experiments - this action runs all the experiments and generates the related figures.
- Run unittests - this action runs all
the unittests for the
utils
folder files. - Style check - this action performs
basic checks of the
Python
files for detecting syntax errors. - Compile report - this action compiles the LaTeX file to the PDF report file.
- Website - this action updates the
Random-Greedy website with the content of this
README.md
file. - Vulnerability check - this
action help managing the
Python
code and its dependencies. - Dependency review - this
action help managing the
Python
code dependencies for pushes. - Dependabot - this action helps update the versions of the dependencies.
For the relevant actions, the checks were run in all the supported Python version CPython 3.10
and on both Windows
(Windows Server 2022) and Linux (Ubuntu 20.04).
Please feel free to contact us with any questions you have about this project.