17th March 2009
IntroductionChaos Game Representation is a
graphical representation of a sequence. It is a Method of converting a long
one dimensional sequence (e.g. English text or Genetic sequence) into a
graphical form. Chaos game representation (CGR) is a novel holistic approach that
provides a visual image of a DNA sequence quite different from the traditional
linear arrangement of nucleotides.Chaos Game Representation (CGR) can
recognize patterns in the nucleotide sequences, obtained from databases, of a
class of genes using the techniques of fractal structures and by considering
DNA sequences as strings composed of four units, G, A, T and C. Such recognition
of patterns relies on visual identification It is an
application of nonrandom input to an iterated function system. The method was
first proposed for DNA sequence, now it is being used for any arbitrary
symbols. CGR exhibits striking pattern characteristics of the genome. Since the
patterns are unique to genomes this can be used to identify genome fragments or
to detect horizontal gene transfers. In addition, Quantitative analysis of CGR
patterns can be used for sequence comparison both Alignmentfree and alignmentbased sequence
comparisons are possible with CGR. chaosIt is a word derived from the greek typically refers to unpredictability. In the metaphysical sense, it is the opposite of law and order. Chaos refers to an apparent lack of order in a system that nevertheless obeys particular laws or rules. Mathematically chaos means an aperiodic deterministic behavior which is very sensitive to its initial conditions. A system is called chaotic if it is impossible to make accurate long term predictions about the behavior of the system.According to Ron Leadbetter, “Chaos is from the Greek word Khaos, meaning "gaping void". There are many explanations as to who or what Chaos is, but most theories state that it was the void from which all things developed into a distinctive entity, or in which they existed in a confused and amorphous shape before they were separated into genera. In other words, Chaos is or was "nothingness." .Important properties of a chaotic system are,
chaos theoryChaos theory is the collection of results, methods, and visualization techniques used to study dynamical systems. The name "chaos theory" comes from the fact that the systems that the theory describes are apparently disordered, but chaos theory is really about finding the underlying order in apparently random data. Systems  no matter how complex  rely upon an underlying order, and that very simple or small systems and events can cause very complex behaviors or events. Chaos theory attempts to explain how chaos is produced and how can it be understood and controlled. Chaos Theory is a name given to recent wideranging attempts to uncover the statistical regularity hidden in processes that otherwise appears random, such as turbulence in fluids, weather patterns, predatorprey cycles, the spread of disease, and even the onset of war. Systems described as "chaotic" are extremely susceptible or at risk to changes in initial conditions. As a result, small uncertainties in measurement are magnified over time, making chaotic systems predictable in principle but unpredictable in practice. One of the first researchers in chaos theory was a meteorologist, Edward Lorenz. He was using a set of equations to model the weather. When he solved the system numerically, he found that a particle moving subject to atmospheric forces has a very complicated trajectory or path. He also found that any such a trajectory always approaches the same general pattern of an attractor. This pattern is butterfly shaped and is now known as the Lorenz attractor. The conclusion of these observations is that a very small change in initial conditions can produce unpredictable and sometimes drastic results by triggering a series of increasingly significant events. In theory, the flutter of a butterfly's wings in Australia could, for example, produce a snow storm in the Northeastern, thousands of miles away. Thus, nonlinear dynamics , chaotic dynamical systems or simply ” chaos” deals with the structure of complex curves known as fractals, complex geometric shapes, which can be defined by simple algorithms. I.e., a fractal is the result of a mathematically calculated equation which can be transferred into an image to be shown. A dynamical system is a set of functions (rules or equations) that specify how variables change over time. A mathematical model of a human system where numerical quantities are used to represent the state of the system is an example of a dynamical system. Usually, in a dynamical system, the changes of dynamical quantities after a short time period can be determined by combining the numerical quantities in a formula. The change over longer time periods can be determined by iterating the formula. Chaos theory deals with the behavior of certain non linear dynamical systems that exhibit the phenomenon known as chaos, and this phenomenon is highly sensitive to the initial conditions. Chaos theory tries to find the order in the systems, what appears to be completely random data. Even though, Chaos may look “randomlike” it may occurs in deterministic systems with no randomness. The idea behind is that it is possible to get completely random results from normal equations. Thus, Chaos theory covers the reverse also. Chaos theory constructs: · Phase Space: A conceptualization (often pictorial or geometric) of the possible states a system might take. It conveys that, at best, we see only a portion of “reality” at one time. And that will be always that part, on which we choose to focus. · Strange Attractors and Basins of Attraction: Attractors are patterns in phase space to which values of variables settle into after transients die out. Strange attractors are focal points for the most challenging patterns generated by dynamical, chaotic systems. Basins of attraction are the areas containing those patterns within their boundaries. · Fractals: Fractal boundaries (or simply fractals) are mathematical representations of the irregular "lines" of demarcation between separate units. Pictures result from iteration of non linear equations, usually in a feed back loop, a set of points produced. Graphing these points produces these images Fractals convey that reality is rarely as clear/clean cut as we picture it. · Bifurcation/ Cascade: Bifurcation is a scientific way to say something splits in two—branches. If patterns bifurcate quickly enough, they can become complex very fast, leading to bifurcation cascade and chaos. · Unpredictability: Unpredictability is the inability to state with certainty the next state (or, for that matter, the previous state) of a system given knowledge of its present state. · Recursivity: It is selfreflexiveness, and selfrelectiveness, feeding information from one’s patterns back into the process of producing them. · Equilibrium: Equilibrium is the tendency or inertia of system not to change its patterns by staying near or returning to points of attraction. Patterns change significantly and most unpredictably in far from equilibrium in chaotic systems. · Selfaffinity: It denotes the tendency for phenomena to evidence recurring patterns. The affinity can be over size, time, different angles, or other ways more difficult to see or to grasp, for example, by the process that generates them or probabilistically. dynamical systemsThe dynamic
of any situation refers to how the situation changes over the course of time. A
dynamical system is a physical setting together with rules for
how the setting changes or evolves from one moment of time to the next.
Dynamical system is a system in which the situation changes over the course of
time. Physical settings along with a set of functions/rules that specifies how
variables changes over time in a dynamical system. The current state of a
dynamical system is specified by the current values of its variables. The system state at time t is
an instantaneous description of the system which is sufficient to predict the
future states of the system without recourse to states prior to t. The
evolution of the system state refers to a sequence or continuous trajectory
through the space of possible system states. E.g.: Pendulum, Mathematical model of human system, where numerical
quantities represent the state of system, some varies with time while others
remains fixed. the chaos game
playing the chaos game with rolls of red, green, blue, blue. Now continue in this fashion for a small number of rolls of the die. Five rolls are sufficient if you are playing the game “by hand'' or on a graphing calculator, and eight are sufficient if you are playing on a highresolution computer screen. (If you start with a point outside the triangle, you will need more of these initial rolls.) After a few initial rolls of the die, begin to record the track of these traveling points after each roll of the die. The goal of the chaos game is to roll the die many hundreds of times and predict what the resulting pattern of points will be. Most people who are unfamiliar with the game guess that the resulting image will be a random smear of points. Others predict that the points will eventually fill the entire triangle. Both guesses are quite natural, given the random nature of the chaos game. But both guesses are completely wrong. The resulting image is anything but a random smear; with probability one, the points form what mathematicians call the Sierpinski triangle . The Sierpinski triangle [14] In the Figure top triangle is in RED, lower left triangle in GREEN, and lower right triangle in BLUE. We have used color merely to indicate the proximity or closeness of the vertex with the given color. For example, the portion of the triangle closest to the green vertex is colored green, and so forth. The sequence of points generated by the chaos game is called the orbit of the seed. The process of repeating the rolls of the die and tracing the resulting orbit is called iteration. Iteration is important in many areas of mathematics. In fact, the branch of mathematics known as discrete dynamical systems theory is the study of such iterative processes. There are two remarkable facets of the chaos game. 1. The Sierpinski triangle is one of the most basic types of geometric images known as fractals. 2. The second is the fact that this figure results no matter what seed is used to begin the game With probability one, the orbit of any seed eventually fills out S. The words “with probability one'' are important here. Obviously, if we always roll ``red,'' the orbit will simply tend directly to the red vertex. Of course, we do not expect a fair die to yield the same two numbers at each roll. The following figures shows Sierpinski triangles in each iterations:
First recursion of Sierpinski triangle
Second recursion
of Sierpinski triangle
Third recursion of Sierpinski triangle
fourth recursion of Sierpinski triangle
CGR a brief historyChaos Game
Representation (CGR) was proposed as a scaleindependent representation for
genomic sequences by Jeffrey in 1990 (Jeffrey, H. J. 1990). In his method of representing gene structure using Chaos
game, H. Joel Jeffrey discovered some
characteristic patterns presented below: The CGR of Human Beta Globin exhibhits a characteristic pattern (almost empty area in the upper right quadrant (the gquadrant). A smaller copy of this scoop appears in the left (cquadrant), presenting a double scoop appearance. Also the CGR exhibhits a property of selfsimilarity, a concept very important in the study of fractals and chaotic dynamics. Also it is stated that the features found in CGR of Human Beta Globin are found in a number of other genetic sequences. CGR of Human Beta Globin Region on Chromosome 11 [1]
The results come out can be summarized as:
In his paper Nick Goldman confirms a
double scoop pattern in almost all vertebrate DNA sequences and in the CGR of
human Tcell lymphotropic virus (type III) genome and other human viruses is
entirely attributable to a scarcity of CG dinucleotides The biological reasons
for this scarcity are wellunderstood, being the selective disadvantage of CG
dinucleotides which are prone to methylation and subsequent Mutation. Indeed,
it must be doubtful whether morecomplex patterns of information in DNA
sequences will be visible in CGRs, since simpler patterns will tend to obscure
more complex ones. Therefore the possibility of using CGR
for revealing patterns in genes can be extended to identify a pattern for Exon
and Intron sequences. Later, three years after the original
proposition; it was demonstrated and interpreted the translation of CGR quadrant
frequencies into oligonucleotide frequencies. It was a good indication that
CGRs can be more useful than simple evaluation of nucleotide, dinucleotide and
trinucleotide frequencies’ (Goldman, 1993). So for several years, the
possibility that the CGR format can be used for representing the nucleotide
sequence as well as identifying the resulting sequence scheme had not been
fully explored. In 2001 Almeida proved that the distributions of positions in
the CGR plane are generalization of Markov chain probability tables that
accommodates noninteger orders (Almeida 2001). This brought CGR to the main
stream again. In CGR representation for promoter, quadrants are taken as A is at position (0,0) G at position (1,1) T at position (0,1) and C at (1,0) which is different way of usual representation, i.e., A at(0,0) G at (0,1) C at(1,0) and T at(1,1) . The reason for this variation is the AT rich and GC rich nature of the promoter region. The increase in their concentrations is represented as diagonals and can be easily observable. But in the standard case it is represented as bands and cannot be observed prominently.
CGR of nucleotide sequenceIn CGR the four nucleotides A, G, C, and T are assigned to the corners of a square as in the given figure
Nucleotide A is at position (0,0) G at position (1,0) T
at position (1,1) and C at (0,1). Any
arbitrary nucleotide is represented as one point in the square. The first point
plotted halfway between the centre of the square and the corner corresponding
to the first nucleotide of the sequence and successive points plotted halfway
between the previous point and the corner corresponding to the base of each
successive sequence site. Every point of a CGR is a representation of sequence up to
that position. For example in the CGR of
the sequence “ATTGCAGGCT” the sixth point represents the sequence
“ATTGCA”. Thus there is a on to one
correspondence between the subsequences and the points in the CGR. Since a base is always plotted in its
quadrant, any sequence will always be plotted somewhere in the quadrant of its
last base, and conversely any two points in the same quadrant must have the
same last base. important features of CGR

Bioinformatics >