1. Introduction
Genetic Programming (GP) [1] is an automatic programming approach applied in a wide range of application areas, such as circuit design, mathematical modeling, data mining, image analysis, regression analysis, natural disaster prediction, etc. [2] [3] [4] [5] [6] . Its fundamental design idea comes from a genetic algorithm [7] [8] , which is derived from such a rule as “survival of the fittest”, that is, evolving constantly populations of individuals, followed by the evaluations, and finally obtaining the desired solution with the best fitness value to some given problem. With the rapid development of computing technology, this search-based method has become one of the most important tools to deal with many optimization problems.
GP algorithm consists of 5 steps: a) constructing an initial population of individuals associated with approximate solutions to some given problem; b) evaluating individuals in the population for their fitness values and algorithmic termination condition. If the condition holds, go to step e); otherwise, execute the next step; c) generating a new generation of individuals on the basis of genetic operators and strategies; d) go to step b); e) regarding the individual with the best fitness value as the desired result. This is the common structure of various GP systems.
Up to now, there appear a great many GP variants, which include Grammatical Evolution (GE) [9] [10] [11] , Gene Expression Programming (GEP) [12] [13] [14] , Multi-Expression Programming (MEP) [3] [14] , Cartesian Genetic Programming (CGP) [3] [15] [16] , etc. Generally speaking, both their designs and implementations will be concerned with individual representations, genetic operators, evolutionary controls, evaluations, and the like. Consequently, novel GPs designed for some specific problems have the characteristics of structural consistency, but it will be very time-consuming and laborious to design and develop them from scratch.
In this paper, we intend to make a preliminary discussion on the rapid prototyping development of Genetic programming. To our knowledge, similar work related to it is relatively rare. Since many GP variants share the same structure, we manage to build a suitable bridge between them, so that the rapid prototyping development conception and target GP variants can be explored and tested on certain public GP platforms.
2. Basic Principle
Representation plays an important role in GP system. As Rothlauf put it in reference [17] without representations, no use of genetic and evolutionary algorithms is possible. Several other concepts related to genotype representations range from decoding, solution space or phenotypic space to individual evaluation. Search, control and other related operations are primarily applied in genotype space. After individuals have been obtained from some constrained rules, they are generally interpreted as an approximate solution in phenotype space by decoding method, and whether or not it is a good result depends on the fitness evaluation.
Consider GP1 and GP2 as two GP systems with the same solution space, regressively analyzed sample dataset S, and evaluation criteria E: SolutionSpace ´ {S} → Real, I1 and I2 as their individual spaces, D1: I1 → SolutionSpace and D2: I2 → SolutionSpace as their decoding methods respectively, then what is the relationship, mathematically speaking, between their search processes? Can they help each other to some extent?
In principle, we can establish the relation Equation (1) for these two search-based GP systems, where ε stands for the error term.
(1)
If having h: I1 → I2, then there is Equation (2):
(2)
(3)
Since the non-error term on the right side of Equation (2) can be formally obtained from the left side of Equation (2) by replacing its two occurrences of D1 with D2h, we can formally get Equation (3). This means the calculation of the right side of Equation (3) can be approximately computed in GP1 through substitution of D2h for D1 in GP1. Since GP1 embraces services many other GPs need, we call it the basic GP development platform.
It is worth noticing that D2 and h play a critical role in rapid prototype development of GP system. In Section 3, we will solve the selection problem of h, a mapping from individuals of the basic GP platform onto that of target GPs.
3. Proposed Approach
The first step of the proposed method is to choose πGrammatical Evolution (πGE) [18] as the basic GP platform. GE is a GP variant with variable length genotypes [10] [19] [20] . One of its advantages is that it can use a context-free grammar to describe phenotypes of individuals, therefore theoretically generating programs in an arbitrary programming language. However, GE is usually implemented in terms of leftmost derivations. For example, let δ = αAβ be a sentential form of some grammar G = (VN, VT, S, P) [21] [22] , where VN is a finite set of non-terminal symbols, VT on the contrary a finite set of terminals. Any derivation of sentential form in G will be derived from the start symbol S based on some production rule in P. So, the leftmost derivation of αγβ from αAβ with respect to A ::= γ in P is the substitution of γ for the leftmost non-terminal , say A, in δ. In order to realize this leftmost derivation, i.e. αAβ => αγβ, in GE whose genotype consists of codons, the production A ::= γ selected is determined by Equation (4), where the number of alternative productions for A is denoted as NumChoices(A).
The production rule selected
= (value of the current codon) mod NumChoices(A) (4)
To enhance the flexibility of derivations, a novel encoding method which make codons to encode both expanded non-terminal and production information to be used, i.e. codon = (v1, v2), achieves the desired efficacy. This improvement leads to the πGE. The algorithm structure is similar to that of GP, and the detail can be found in [18] . The major difference between πGE and canonical GE lies in the fact that both the expanded non-terminal, say X, and candidate production rule of X, selected for the derivations are determined by Equations (5) and (6). cNumNonterminals used below is the number of all non-terminals occurring in current sentential form. Table 1 demonstrates an example grammar and a derivation of a Boolean expression in the case of the grammar.
The nonterminal X expanded = v1 mod cNumNonterminals + 1 (5)
The candidate production of X = v2 mod NumChoices(X) (6)
What we should deal with in the second step of the proposed approach is to effectively delineate individuals of objective GP using a context-free grammar, and treat them as sentences of the corresponding language. Once πGE has been selected as the basic GP platform, as is done in this paper, there come a lot of
![]()
Table 1. Example derivation of a Boolean expression.
Notes: Nonterminal and production rule are determined by Equations (5) and (6).
mappings from πGE onto individual space of the target GP system. In the present paper, Formulas (5) and (6) are also used to define the mapping relation h discussed in Section 2.
Finally, as far as the third step is concerned, we will focus on the programming of individual decoder and evaluation component which play an essential part in all GP designs. After completing the previous work, the rapid GP prototyping development system without extra coding requirements will be successfully achieved. In addition, the decoding process may also encounter such problem as incomplete mapping of individuals, that is, the derived sentential form still contains some non-terminal symbols, therefore, it is not a valid expression. The measure taken here is to design and implement default mapping rules (see complete mapping used in [19] [20] ) compatible with description grammar of individuals for dealing with it. The following provides an overview of the rapid prototype development procedure:
a) Implementing a basic GP prototyping development platform like πGE;
b) Specifying target GP individuals using a context-free grammar;
c) Designing targeted individual decoders, evaluation components and default mapping rules in terms of sample dataset and results of steps a and b;
d) Running programs and analyzing the obtained results.
4. Experiment and Analysis
This section intends to conduct regression analysis on problems from the literature [10] [23] in the context of the above method. The observation dataset is sampled in the same way as that of [10] at 20 observation points of variable y like {−1, −0.9, −0.8, −0.76, −0.72, −0.68, −0.64, −0.4, −0.2, 0, 0.2, 0.4, 0.63, 0.72, 0.81, 0.90, 0.93, 0.96, 0.99, 1} in the range of [−1, 1].
(7)
(8)
(9)
To demonstrate that developing GP on rapid prototyping development system is of easiness and efficiency, we have designed and implemented two GP systems, which can be functionally categorized into the standard tree-based GP and the improved GEP (ImpGEP) [24] according to their decoding means. Having implemented πGE as the basic development platform, experiments with them on Formula (7) through Formula (9) can be carried out. The individual descriptions, default mapping rules and decoding algorithms used by these GPs are shown in Figures 1-4, respectively. Except that their decoding methods are different from each other and representations should be specified by some grammars, other components such as evolutionary controls, genetic operators implemented in πGE, the basic GP development platform, are shared among various objective GPs. Major parameters employed here are given in Table 2.
Experiment with GP on the regression problems
![]()
Table 2. Parameters used in these experiments.
Notes: MxLength = Max Length of individual; FixLength = Fixed Length individual; ProCro = Probability of Crossover.
![]()
Figure 1. Grammar and default mapping for individuals of tree-based GP.
![]()
Figure 2. Decoder of the improved GEP with its decoded expressions.
1) Brief comment on GP
Classical GP [1] is proposed by Koza for automatically programming computer programs to solve given problems. The major idea embedded in GP is to decode the evolved tree-based individuals into expressions on the basis of its five-step algorithm framework (see Section 1). On the one hand, individuals can
![]()
Figure 3. Grammar and default mapping for individuals of the improved GEP.
![]()
Figure 4. Decoder for individuals of the improved GEP.
be constantly generated from the algorithm, decoding mechanism within it is responsible for the transformation of them into expressions on the other. Since a context-free grammar can represent tree-based embedded relations between sub-expressions, we introduce it to specify both genotypes and phenotypes. In this case, the phenotypic space is the same as the language obtained. Naturally, the identity mapping is suitable for the decoder.
2) Experimental process
Having implemented the required πGE, our major work towards solving the above problems by GP includes 2 steps:
a) Designing a grammar as well as its corresponding default mapping rule as shown in Figure 1, so as to specify the individuals of concern;
b) Programming both the decoder and evaluation component so that phenotypes and fitness values can be decoded from individuals and well evaluated, respectively.
In this experiment, the decoder and evaluation mechanism are the identity mapping and the least square error principle, respectively.
Experiment with the Improved GEP on the regression problems
1) Brief comment on GEP
GEP [12] is a linearly represented GP variant with fixed-length individuals of terminal symbols and functions. Its decoding method as partly given in Figure 2 is impressive. It transforms individuals into expression trees (ET trees) at first, and followed by traversal operations on them to get the expected expressions [12] . However, for the sake of simplicity, the decoder M1 used here is designed according to the depth-first decoding mode.
Another feature is its structural constraint of individuals, that is to say, each individual is composed of a head and a tail that satisfy such length restriction as t = h * (n − 1) + 1, where t, h represent length of tail and head, and n is the maximum arity of functions involved. Considering that the decoder of canonical GEP can only find one possible solution in each run, we have made an improvement for figuring out many expressions from genotype reusing technique [24] on it so that the wining chance can be enhanced. For instance, the improved GEP tends to reuse genotype information through continuously applying original GEP decoding approach (say M1 in Figure 2) to every element (or word) of an individual as shown in Figure 2, thus obtaining many alternative solutions. The improved decoder M2 is also declared in Figure 4.
2) Experimental process
Having finished the implementation of πGE as basic development platform, we have the following problem solving process:
a) Designing both the grammar and the corresponding default mapping as given in Figure 3 for specifying the individuals of concern;
b) Coding both the decoder and evaluation component so as to decode and evaluate individuals and their fitness values, respectively.
In this experiment, evaluation criterion is the least square error. Given the following three functions, the decoder of the improved GEP, denoted M2 in Figure 2, can be realized by applying the function DecodeOfDepthFirstGEP firstly to different elements of an individual to get many candidate expressions, then evaluating and choosing the fittest expression as the desired solution. The implementation of M2 is shown in Figure 4.
DecodeOfDepthFirstGEP(Individual, cPos): A function to construct an expression from the element indexed by cPos in the individual.
GetFuncOrOpnd(Individual, pos): A function to get a element/word indexed by pos in the individual.
LeastSquareError(p, q, 'y', Points): A function to compute the least square error between functions p and q. 'y' stands for the variable of them. The variable Points is a list of the observation points of y.
Finally, running the decoders and evaluation components obtained above in the context of πGE, sample dataset and grammars given above will result in Figure 5 and Figure 6. The evaluation procedure used here is realized on the least square error principle. It follows from these results that the solutions obtained from each rapid developed method can gradually approach to the optimization
(a)
(b)
(c)
Figure 5. Solving f, g, h by GP and the improved GEP. (a) Solving f, g, h by GP; (b) Solving f, g, h by GEP; (c) Solving f by GP and GEP.
(a)
(b)
Figure 6. Solving g, h by GP and the improved GEP. (a) Solving g by GP and GEP; (b) Solving h by GP and GEP.
goals when applying that method to solve the regression problems of concern. Additionally, it can also be seen from Figure 6 that in the given environment of these experiments, the tested GP outperforms the GEP variant on g and h with respect to the fitness values, and from Figure 5(a) and Figure 5(b) that the regression analysis on g seems easier than on f and h. With the help of the proposed method, we can quickly establish a GP prototyping system and simulate it on πGE, the basic GP development platform. Besides, this approach also provides with a common environment for comparisons among many GPs.
Although the number of examples given here is small, we can get some important cognition and enlightenment from them. So far, there are many GP variants. The essential intention implied in the abovementioned method and effect is to construct and search the target semantic objects in the phenotypic space by changing both the search space and corresponding decoders. As such, this method not only helps designers to improve the efficiency of GP research and development, but also helps to use genotype space, an unpredictable kaleidoscope, to examine and understand semantic objects and domain knowledge.
5. Conclusion
We have made a preliminary investigation into rapid GP prototype development and applications in the present paper, and more systematical and in-depth issues like how to integrate formal structures and semantics into it will become our future work. The major advantage of using this approach to design and implement GP is that designers and implementers can ignore many implementation details and concentrate their energy on the essence of the problem, such as representation, decoding method and population evaluation. For representation, what we need to do is to define individuals by designing a context-free grammar. And decoder and evaluation procedure are the components to be programmed only provided mapping relation between the basic development platform and target GPs is determined. These GE-based task segmentation methods are of great theoretical and practical significance to solving complex practical problems and studying high-performance computing of genetic programming.
Acknowledgements
This work was supported partly by the National Natural Science Foundation of China under Grant No. 61977018.