Digital Genetics

Here's a strange one, I've been working on a population simulator, and I had a thought regarding parent-to-child gene transfer. The genes are currently stored in boolean arrays (8 in size). When the parents make the child, I'm considering using a simple xor of the two parents in order to make the child's gene makeup. Is there anything obviously wrong (or right) with this?
Here's an illustration:
Parent 1 : 10011011
Parent 2 : 00101001
Child 1^2: 10110010

EDIT: 3 minutes of burying my brain in punnett squares have convinced me I'm a moron. I'm now looking into the logic behind ACTUAL gene inheritance. The genetic strands will be organized in 8 base pairs (so bool array[16] arranged in a 1,2,1,2 fashion) and punnett squares will determine the possibilities. A random number will then choose which possibility gets passed on. Any obvious flaws there?
Last edited on
That depends on what you are trying to accomplish? There must be dozens of ways to do it. One way is to swap the high and low bits of each, creating two children. You could then randomly kill one of them if you want only one child per parent pair. You could also use a function to genetically "rate" all the organisms in your sim and the pair the best with the best. (Used in some AI techniques). You could also have another sim using the same function, and but par the best with the worst and see where it takes you. These are just a few examples I can think of off the top of my head. Hope it helps.
closed account (N36fSL3A)
Well that means if 2 parents mate 2 times they'd have to identical children. Wouldn't you also use random numbers for this?
Fredbill30 makes a good point. You need some randomness, maybe use a larger array say 32. Then for each group of 8 randomly pick an operation (and, or, xor) to perform on that section.
Children should be similar to their parents
`xor' would be a really bad operation in that case.

To simulate mutations, have a probability (low) of bit flip.

Not sure if I understand your edit, ¿may post some code?
Why don't you use a genetic algorithm? Specifically you want to implement the crossover and mutation operators.

Crossover:
1. Take the first chromosome of the first parent
2. With a random chance (of, for example, 70%), generate a random position and swap everything from the current chromosome after the position with the corresponding chromosome in the other parent
3. Repeat for the next chromosome until all chromosomes have been processed

Pseudo-code
1
2
3
4
5
for i in [0, parent[0].chromosomes.count)
    if random_chance(0.7)
        position = random([0, parent[0].chromosomes[i].length))
        for j in [position, parent[0].chromosomes[i].length)
            swap(parent[0].chromosomes[i][j], parent[1].chromosomes[i][j]);


Mutation:
1. Take the first bit in the DNA of the child (from crossover)
2. With a random chance (of, for example, 0.1%), flip the bit
3. Repeat for the next bit until all bits have been processed

Pseudo-code
1
2
3
4
for i in [0, parent[0].chromosomes.count)
    for j in [0, parent[0],chromosomes[i].length)
        if random_chance(0.001)
            parent[0],chromosomes[i][j] = !(parent[0],chromosomes[i][j])


This produces totally new individuals derived from the parents with little chance of collision. It usually creates as many offspring as parents, but you could do it repeatedly to generate more offspring or just discard some if you only want one. You could have a random chance of twins (identical or otherwise) and such.
Last edited on
Here's my idea.
Assuming Parent 1 has : 11|00|01|10|11|10|01|10
And that Parent #2 has: 01|10|01|11|10|01|01|01
The first 2 digits of each strand are a base pair, 11 could be understood as DD (dominant-dominant) whereas 01 could mean dD (recessive-dominant).
Therefore, using a punnett square as follows:
1
2
3
 |1  1
0|10 10
1|11 11

One can determine that the child has a fifty-fifty chance of being dominant-dominant or dominant recessive. The coin flip would be decided by a random number generator.
I have no code for this yet but I'm working on it. I'm just wondering if there's a serious flaw (the your-entire-project-is-derpy kind).

EDIT: Just saw Chrisname's post, would the punnett squares take care of the crossover? and mutation is a fantastic idea, may I steal it?
Last edited on
sargon94 wrote:
would the punnett squares take care of the crossover

No, cross-over involves swapping, your way is just combining (also, no modern geneticist worth his salt uses Punnett squares for anything serious). Also, base pairs are not "dominant/recessive", they're A, C, G, or T/U (T for DNA and U for RNA). Alleles are either dominant or recessive depending on whether they produce a working protein or an inert one (genetic disorders usually result in the lack of a working protein and so are recessive (e.g. haemophilia is the lack of working FVIII protein) but can also result in the production of a toxic protein which makes them dominant; also recessive genes can be harmless, like the genes for blue eye colour which results from a lack of melanin production in the iris).

mutation is a fantastic idea, may I steal it?

I don't think the abstract concept known only as Biology will mind.

If you're interested, I just wrote half of a first draft (so it's really a 0.5th draft) of a genetic algorithm tutorial which I'm posting here. It'll be up before long, and eventually it will contain a bunch of crossover and mutation algorithms you can use (pseudocode implementations only because I want to leave them as exercises).
closed account (N36fSL3A)
Don't you guys think the percentage would be higher, maybe .5? I believe that's way to low of a change of mutation.

Hence I BELIEVE.
I don't think the abstract concept known only as Biology will mind.

Touche

And you're thinking no punnett squares eh? Fair enough, I'm familiar (passively) with the A, C, G, T system for DNA, the trouble is: I have no idea what implications AGTC or AGTT could have for example. I suppose I could make it up but if i'm just doing that the punnett square system seems easier. That being said, I am very interested in your tutorial, I'll keep an eye out for it.

How do you think I could go about chromosome swapping? From what I understand all of them have their own DNA inside them, so I'de be swapping only part of it out for the other. The change would remain small since the other chromosome is corresponding and therefore very similar. Does that sound about right? And do you know what ACGT/U could actually translate into in-game? (i.e. have scientists mapped our genetic code to this extent?)

@Fredbill30: the way I have it mind now, 23 chromosomes with say... 64 base pairs each makes 1472 bits, which with a .1% rate makes 1.5 flips per child (average). Seems reasonable.
I've seen genetic algorithm, which did use [url=http://en.wikipedia.org/wiki/Gray_code]Gray code[/url].


DNA has alphabet size 4 (the bases). Proteins have alphabet size 20 (the amino acids). Three bases in DNA make a [url=http://en.wikipedia.org/wiki/Codon]codon[/url]. Each codon is translated to some amino acid. A single point mutation in DNA may or may not change the amino acid.

Amino acid composition of a protein dictates its shape. Shape determines what kind of interactions one molecule has with other molecules.


DNA is double-stranded, so if you have C in one strand, then the other has G. Mutate C to T, and the other strand must mutate from G to A. Ratio of different base types in DNA has physical effects (on DNA) http://en.wikipedia.org/wiki/GC-content
@sargon94
Each triplet of bases (codon) is used to refer to an amino acid for protein synthesis. It's quite complicated. DNA is double stranded. To synthesise a protein, first the DNA is unzipped (the strands come apart at the start of the gene that describes the protein). One of the two strands is called a "template strand" and the other is a "coding strand": the coding strand actually describes the structure of the protein (the sequence of amino acids, represented by codons (triplets of nucleotides)) and the template strand is simply the opposite of the coding strand. The enzyme RNA polymerase reads the template strand (starting at a START codon) and creates a"messenger" RNA (mRNA) strand which is opposite to it (ending at a STOP codon), and therefore equivalent to the coding strand (it's not quite identical because it's RNA instead of DNA: RNA has to be used because DNA is too large to leave the nucleus of a cell; the main difference is that RNA uses a molecule called uracil (U) instead of thymine (T)). The mRNA strand leaves the nucleus of the cell and binds to something called a ribosome which reads the mRNA strand. For each triplet of bases on the mRNA, it finds a tRNA (transfer RNA) molecule which is complementary (has opposite nucleotides). The tRNA comes bound to an amino acid, for example, leucine binds with AAU and is therefore represented by UUA in mRNA (or TTU in DNA). The amino acid forms a peptide bond with the previous amino acid (if there is one), and then the tRNA of the previous amino acid unbinds. Eventually you have a chain of amino acids (a peptide chain or polypeptide, also known as primary structure). After the ribosome is finished, the chain folds up (called secondary structure in two dimensions or tertiary structure in three) forming a protein. Some proteins, such as collagen, are formed from multiple polypeptide chains.

NB: The terms "nucleotide" and "base" are used interchangeably; all nucleotides are bases although not all bases are nucleotides.

How do you think I could go about chromosome swapping? From what I understand all of them have their own DNA inside them, so I'de be swapping only part of it out for the other

In real life, chromosomes have a shape a bit like an X. One of the ways they cross over is by coiling their legs around each other. I don't know what happens next but they basically swap genes. In genetic algorithms it's a lot simpler. One way to do it is to pick a single point on the chromosome and swap every bit after that point with the other chromosome. You should probably do it that way.

23 chromosomes with say... 64 base pairs each

In biology, a chromosome has about 0.1-3.75 Mbp (mega (i.e. million) base pairs). At most, you could fit four base pairs in a byte (since they need two bits each) so that would be at least 0.25 * 4 Mbp * 23 = ~21.9 MiB for a human. So if you want to simulate lots of people you probably should simplify it a little if you want to run on something other than a supercomputer. Or you could use a simple bacteria instead, since some of them have a genome of only about 30 kiB.

Here's something interesting (maybe): since gametes are haploid cells, they have about 11 MiB of data and sex therefore represents a transfer of 11 MiB * 200,000,000 = ~2 petaBytes of data for the sake of 11 MiB, which is an efficiency of about 0.000 000 5%. That's like uploading your entire hard drive a million times just to send someone a single JPG image.
Ok, I've begun work programming and I'm leaning towards having 23 chromosomes, each with 16 base pairs. 12 to 14 of those pairs in each chromosome will be vital (i.e. excessive mutation to them will result in a dead/infertile result), which leaves plenty of pairs to mess with. Since my current birth/death ration is 4 to 1, i'll be adding predispositions to cancer/pneumonia/generic-virulent-diseases to keep the population from exploding. Large groups will attract infections and facilitate the spread of diseases (cesspools essentially). Other genes that I have in mind will be ADHD (They move faster and much more incoherently, this will be a 00 phenotype (recessive-recessive only), Deceptive (People without the deceptive gene will tend to gravitate towards them, this will be 11, 10, 01 (dominant allele)), and resistant (to certain infections/disease, they will come in various degrees and may be dominant or recessive).

I'll be replacing A,C,G,T/U with a simplified binary system.

Most infections will begin with two recessive alleles coding for it coming together, they will spread from the unlucky child.

That's what I'm going with so far. And I'll be using chromosome swapping like you suggested (thanks for that, I have good hope for this).
Rather than having all these rules about death/birth ratio and such, why not try to program it in such a way that if the population gets too big, it will become unsustainable and suffer huge losses like in real life? Your program will be much more interesting if it has emergent behaviours like these that weren't actually specifically programmed in but just occur due to simple rules.

A life simulator like this is a project I've wanted to do for a long time, but I can never decide on the architecture. I want to do one that simulates insects with simple brains (neural networks) and a genetic algorithm but I always start writing it and then scrap it because it's not perfect.
Rather than having all these rules about death/birth ratio and such, why not try to program it in such a way that if the population gets too big, it will become unsustainable and suffer huge losses like in real life?


I think I gave you the wrong idea. The birth/death ratio is not hardcoded, it is the result of generations of people being born, running around making genetic copies (by running into other people), then eventually dying. The entire process is random. The way they move about is random, and since hitting others is what creates copies, the amount of copies they create is also random. I'm currently trying to figure out a "natural" way create the population collapses. Any ideas? I'm currently thinking plague (historically, it's the most common).
Well, the most common things to keep populations in check are predation, limited food and disease. If you keep the amount of food available on the map constant, and include logic for organisms to starve, then that will make the population plateau. Also, if you have logic for diseases to spread with the probability of spreading disease higher when organisms are closer together (as is the case in real life), then the more organisms there are, the more disease will spread, since adding more organisms to an area of constant size will make them have to get closer together. And then of course you would probably want to implement death due to ageing (perhaps increase the probability of contracting disease and decrease the amount of time an organism can go without food). Lastly, you might also want to do random natural disasters that kill a large portion of the population whenever it approaches a certain maximum.
Last edited on
Hmm, I'm really liking the idea of constant and centralized food sources, I can have my guys wander aimlessly then bee-line for food when they get down to half a tank or something. I'm thinking 2 food groupings on opposite sides (regenerating at a constant rate). The opposite sides opens up the possibility for conflict later on (seeing as we are our own predator). I'm thinking as they age, they will still be able to eat the same amount but it will be converted to energy less efficiently, they will also move more slowly. Also, being down to 1/3 food cuts their available energy in half. Energy is used to reproduce, move, and resist disease or infection. Running out of food wouldn't necessarily kill them but they would be very vulnerable to everything (and very slow), until it hits a critical point and offs them. Not sure about the natural disasters yet, I need to see how efficient limited food keeps the population down.
Topic archived. No new replies allowed.