Genomes and object granularity II

Genomes
As I was finishing the previous part of these series on computational genomics I couldn’t help myself to engage in an actual implementation of a genetic algorithm. But how can one start? Well I tell you… by doing it

If you studied operational research the term genetic algorithm shouldn’t be a novelty at all. It started as a theory for solving optimisation problems based on an approach (= heuristic) that tries to simulate the process of natural selection. Today it is applied in many fields ranging from computational science to economics.

TL;DR? Here’s a code treat for you.

We need to go deeper

Melanie Mitchell makes it easy for anyone trying to grasp the basics of genetic algorithms:

  • Start with a randomly generated population of n l-bit chromosomes (candidate solutions to a problem)
  • Calculate the fitness f(x) of each chromosomes x in the population.
  • Repeat the following steps until n offspring have been created:
    • Select a pair of parent chromosomes from the current population, the probability of selection being an increasing function of fitness. Selection is done ” with replacement”, meaning that the same chromosome can be selected more than once to be come a parent.
    • With probability pc (the “crossover probability or “crossover rate”), cross over the pair at randomly chosen point (chosen with uniform probability) to form two offspring. If no crossover takes place, form two offspring that are exact copies of their parents.
    • Mutate the two offspring at each locus with probability pm (the mutation probability of mutation rate), and place the resulting chromosomes in the new population. If n is odd, one new population member can be discarded at random.
  • Replace the current population with the new population.
  • Back to step 2.

So a basic genetic algorithm seems to have 3 coupled traits: selection, crossover, mutation. Is this enough? Well.. yes to keep it simple of course. We still don’t have a problem to solve though, biological cells have been evolving for some time now (say almost 4 billion years!) and they are the true definition of problem solving, just imagine the smallest chemical entity confined in its tiny environment trying to protect itself from extinction. It adapted into huge complexity organisms by using “similar” techniques to spear it’s way through the world.

Real world problems

the_struggle_is_real

Having a problem to solve is obviously important as it will fill in the context gaps on what is “fitness” and “population” and all other entities involved in a genetic algorithm. I wanted to pick something hard, something meaningful, and oh boy did I? How about a NP-hard problem from the same family of protein folding? At least something understandable and easy to put in words. And voilá the Travelling Salesman Problem (TSP).

The TSP was formulated around 1930 and has been ever since very effective in provoking migraines and insomnia for mathematical optimisation enthusiasts. There should be a prescription saying it’s specially ill advised to think about it when conducting any type of heavy machinery.

But how to explain what a NP problem is? For lack of better explanation I now invite you dear reader to watch the following entertaining clip which showcases the optimisation/complexity zoo:

So we got our problem (yey), now what? Now we need to tell the computer how to doo eet! It’s been fruitful already for me to scratch the surface of genetics, why not keeping it going? I was hoping for months for a nice problem to learn Rust. Why? Because Rust makes bold promises for systems development, of the likes nobody has dared to make since the very foundation of the C party it grows from. However I’ll leave up to you to discover why many are excited about it, there are tons of blog posts about it.

Let the games begin!

Starting on coding with a new language while still unsure of the conceptual model was something that itched me immediately. By conceptual model I mean the application of the concept of genetic algorithm to an optimisation problem such as the TSP. As such before actually thinking about the overall source organisation I started by first understanding Rust development and playing around with it. I started laying down the objects required such as GA, population, city and tour. After spending a few (good) hours wrestling the compiler I started to realise that I was seeing the problem the other way around.

My problem is not the GA to solve the TSP, as far as the GA goes all it cares is that whatever uses it follows certain traits, GA is the interface not the object (Duh!). Worse even, I had a singleton graph – “Stop this madness” – I said to myself. But why did I do this? Well first I was just starting with Rust; second, when hacking something together there’s this rewarding feeling of an “aha moment” when you realise a way better way of doing it, and so it happened. This concept is loooong known, specially when hacking something around. (PS: find out how/why here)

My second attempt was way better, but not yet perfect I confess. You can find the result here.

But wait… there’s more!

If you opened the repo link already you might have noticed there is more than just Rust in the repo. There’s also NodeJS! Yep, I wanted to (learn) make my little command line app exposed to the glorious world wide web and I did! It’s a simple access layer which uses hypermedia’ish as the engine application state. I say “‘ish” as JSON is not exactly a recognised hypermedia format.

At some point I was also playing with MongoDB as a persistence layer but I settled for a simple caching mechanism with node-cache.

Looking back

This journey started with genomes and finished with REST api’s and solving (yet again!) a travelling salesman problem in Rust. All I have to say is that it has been fun. Surely it’s not the best piece of code out there but it features neat constructs which I invite you to take a look:

  • Rust object JSON serialization/deserialization
#[derive(RustcEncodable, RustcDecodable)]
struct Input {
	graph_type: String,
	graph: Graph,
	options: InputOptions,
}

#[derive(RustcEncodable, RustcDecodable)]
struct InputOptions {
	mutation_rate: f64,
	elitism: bool,
	population_size: usize,
	tournament_size: usize,
}

impl ToJson for Input {
    fn to_json(&self) -> Json {
        let mut d = BTreeMap::new();
        d.insert("graph_type".to_string(), 
                self.graph_type.to_json());
        d.insert("graph".to_string(), 
                self.graph.to_json());
        d.insert("options".to_string(), 
                self.options.to_json());
        Json::Object(d)
    }
}

impl ToJson for InputOptions {
    fn to_json(&self) -> Json {
        let mut d = BTreeMap::new();
        d.insert("mutation_rate".to_string(), 
                self.mutation_rate.to_json());
        d.insert("elitism".to_string(), 
                self.elitism.to_json());
        d.insert("population_size".to_string(), 
                self.population_size.to_json());
        d.insert("tournament_size".to_string(), 
                self.tournament_size.to_json());
        Json::Object(d)
    }
}

Rust/NodeJS communication

#[no_mangle]
pub extern "C" fn compute_adapter(input: *const c_char) -> *const c_char {
    let c_input = unsafe { CStr::from_ptr(input).to_bytes() };
    match str::from_utf8(c_input) {
        Ok(input) => compute(input),
        Err(e) => CString::new(e.to_string()).unwrap().as_ptr(),
    }
}

// Actual handling
fn compute(input: &str) -> *const c_char {
//...

Builder design pattern

impl TourBuilder {

    /// Constructor for an empty population
    pub fn new() -> TourBuilder {
        TourBuilder {
            tour: Vec::new(),
            fitness: 0.0,
	}
    }

    /// Constructor for an empty population with allocated capacity
    pub fn generate_empty_with_size(&mut self, tour_size: usize) 
        -> &mut TourBuilder {
        self.tour = Vec::with_capacity(tour_size);
        self
    }

    /// Constructor for generating a random tour
    pub fn generate_random_tour(&mut self, mut graph: Graph) 
        -> &mut TourBuilder {
        self.tour = graph.get_map();
        thread_rng().shuffle(&mut self.tour);
        self
    }

    /// Terminates construction and returns instance
    pub fn finalize(&self) -> Tour {
        Tour { 
        	tour: self.tour.clone(),
        	fitness: self.fitness,
        }
    }
}

REST HATEOAS

function api_root_get(req, res, next) {

    var api_root = new hal.Resource({
                    name: pjson.name,
		    version: pjson.version,
		    repository: pjson.repository,
		    cacheable: true
    }, '/'+pjson.name);

    api_root.link('compute', '/'+pjson.name+'/compute');

    res.send(api_root);
    return next();
}

Trait interfacing

/// Genetic algorithm interface definition
trait GA {
    fn tournament_selection(&mut self) -> Tour;
    fn crossover(&self, parent_1: Tour, parent_2: Tour) -> Tour;
    fn mutate(&self, tour: Tour) -> Tour;
}

/// Travelling salesman problem structure definition
pub struct TSP {
    routes: Population,
    cities: Graph,
    tournament_size: usize,
    mutation_rate: f64,
    _elitism: bool,
}

The project could be adapted to support other algorithms (Simulated annealing anyone?), more graph types or even adding weights/gains to each node in the graphs making it even more interesting.

Before I finish I’d just like to highlight a few points on Rust:

  • It’s not an easy language to master Though I spent a decent amount of time learning Rust I am still far from mastering every concept. Ownership is specially important!

  • It’s not easy to find documentation, but… Well I was happy enough to use the stable release version from May 2015 and many things have changed, many repos use old deprecated functions and there aren’t many great references to learn from, but the community is very good and helpful. Everyone really enjoys the language and provides productive feedback. It can only get better.

  • The classless (OO’ish) design tends to be clean Languages such as Go and Rust tend to focus on discipline through enforced compilation rules and making design to an interface and not an implementation. It is still possible to write spaguetti code but at least it’s clearer when you do so right from the start. It’s not so fun when you just want to hack something together though…

  • Modules… modules everywhere! If you pick up Rust, one of your first questions will be: “Ok, what’s the best way to organize my code?”. Guess what? It has rules that you must follow religiously. In one hand it kind of promotes more structured granularity and code organization. In the other… well… it’s bluntly annoying and it feels a bit like an overkill. You can’t really have your own preference on this matter, again… it doesn’t really promote hacking something together.

References:

1 – An Introduction to Genetic Algorithms (Complex Adaptive Systems)

2 – The evolution of cells by Wikipedia

3 – Travelling salesman problem by Wikipedia

4 – Rust language book

5 – The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition (2nd Edition)

Repo:

1 – TSP solved with genetic algorithm written in Rust

Matjaz Trcek
Matjaz Trcek
SRE @ Magnolia CMS

Working as an SRE in Magnolia CMS. In my free time I work on many side projects some of which are covered in this blog.