基于遗传算法的软件测试技术

A Survey on Software Testing Techniques using Genetic Algorithm Chayanika Sharma1, Sangeeta Sabharwal2, Ritu Sibal3Department of computer Science and Information Technology, University of Delhi, Netaji Subhas Institute of Technology Azad Hind Fauz Marg, Dwarka, Sector -3, New Delhi - 110078, India AbstractThe overall aim of the software industry is to ensure delivery of high quality software to the end user. Toensure high quality software, it is required to test software. Testing ensures that software meets user specifications and requirements. However, the field of software testing has a number of underlying issues like effective generation of test cases, prioritisation of test cases etc which need to be tackled. These issues demand on effort, time and cost of the testing.Different techniques and methodologies have been proposed for taking care of these issues. Use of evolutionary algorithms for automatic test generation has been an area of interest for many researchers.Genetic Algorithm (GA) is one such form of evolutionary algorithms. In this research paper, we present a survey of GA approach for addressing the various issues encountered during software testing.Keywords: Software testing, Genetic Algorithm1. IntroductionTesting is primarily done on software as well as in web for testing client and server architecture. Software testing is one of the major and primary techniques for achieving high quality software. Software testing is done to detect presence of faults, which cause software failure. However, software testing is a time consuming and expensive task [29], [20], [28]. It consumes almost 50% of the software system development resources [3], [20]. Software testing can also be defined as process of verifying and validating software to ensure that software meets the technical as well as business requirements as expected [16].Verification is done to ensure that the software meets specification and is close to structural testing whereas validation is close to the functional testing and is doneby executing software under test (SUT) [18]. Broadly, testing techniques include functional (black box) and structural (white box) testing. Functional testing is basedon functional requirements whereas structural testing is done on code itself [13] [10] [24]. Gray box testing is hybrid of white box testing and black box testing [8].Testing can be done either manually or automatically by using testing tools. It is found that automated software testing is better than manual testing. However, very few test data generation tools are commercially available today [14]. Various techniques have been proposed for generating test data or test cases automatically. Recently, lot of work is being done for test cases generation using soft computing techniques like fuzzy logic, neural networks, GA, genetic programming and evolutionary computation providing keys to the problem areas of software testing.Evolutionary testing is an emerging methodology for automatically producing high quality test data [10]. GA is well known form of the evolutionary algorithms conceived by John Holland in United States during late sixties [6] [25]. In [21], evolutionary black box testing is also applied on embedded systems to test its functional and non-functional properties. GA has been applied in many optimization problems for generating test plans for functionality testing, feasible test cases and in many other areas [5] [15]. GA has also been used in model based test case generation [3] [23] [26] [27]. In object oriented unit testing as well as in the black box testing, GA is used for automatic generation of test cases [23], [10], [15]. Concerning testing of web applications, many tools, new techniques and methods have been developed to address issues like maintainability, testability, security, performance, correctness and reliability of web application [8]. Web applications are composed of web pages and components and interaction between them executes web servers, HTTP, browser (the client side) and networks. A web page is information viewed on the client side in a single browser window [16]. In [30], user session data of web application is used to generate test cases by applying GA.In this research paper, a survey of different software testing techniques where GA is efficiently used is presented. This paper is divided into 4 sections. Section 2 describes briefly the working of a GA. In section 3, applications of GA in different types of software testing is described. Section 4 concludes the paper and gives an overview of our future work.2. Genetic Algorithm: A Brief IntroductionIn the past, evolutionary algorithms have been applied in many real life problems. GA is one such evolutionaryalgorithm. GA has emerged as a practical, robust optimization technique and search method. A GA is a search algorithm that is inspired by the way nature evolves species using natural selection of the fittest individuals.The possible solutions to problem being solved are represented by a population of chromosomes. A chromosome is a string of binary digits and each digit that makes up a chromosome is called a gene. This initial population can be totally random or can be created manually using processes such as greedy algorithm. The pseudo code of a basic algorithm for GA is as follows [6]:-Initialize (population)Evaluate (population)While (stopping condition not satisfied){Selection (population)Crossover (population)Mutate (population)Evaluate (population)}GA uses three operators on its population which are described below:-Σ Selection: A selection scheme is applied to determine how individuals are chosen for mating based on their fitness. Fitness can be defined as a capability of an individual to survive and reproduce in an environment. Selection generates the new population from the old one, thus starting a new generation. Each chromosome is evaluated in present generation to determine its fitness value. This fitness value is used to select the better chromosomes from the populationfor the next generation.Σ Crossover or Recombination: After selection, the crossover operation is applied to the selected chromosomes. It involves swapping of genes or sequence of bits in the string between two individuals. This process is repeated with different parent individuals until the next generation has enough individuals. After crossover, the mutation operator is applied to a randomly selected subset of the population.Σ Mutation: Mutation alters chromosomes in small ways to introduce new good traits. It is applied to bring diversity in the population.3. Using Genetic Algorithm in SoftwareTestingIn this section we will discuss in detail the applications of GA in different areas of testing like test planning [5], minimization of test cases in regression testing [11], model based testing [3] [23] [26] [27] and web testing [30].3.1 Applications of GA in White Box TestingStructural testing can be done in the form of data flow testing or path testing. Path testing involves generating a set of paths that will cover every branch in the program and finding the set of test cases that will execute every path in this set of program path [16] [18]. In data flow testing, the focus is on the points at which variables receive values and the points at which these values are used [2]. Next, we will discuss briefly some of the research work regarding the applications of GA in white box testing.3.1.1 Data Flow TestingM.R. GirgisGirgis [7] has proposed a structural oriented automatic test data generation technique that uses a GA guided by the data flow dependencies in the program to fulfil the all-uses criterion. The program to be tested is converted into a Control Flow Graph (CFG) where each node represents a block in a program and the edges of the flow graph depict the control flow of the statements.Variables in a program under test are divided into 'cuses' and 'p-uses’ variables. c-uses variables are those which are used in computations or as a predicates in a program whereas p-uses variables are associated with edges of the flow graph. In order to fulfil the all-uses criteria, the def-clear path (a path containing no new definition of a current variable) from each definition of a variable to each use of that variable need to be determined. To find out the set of paths satisfying alluses criteria, it is necessary to determine def c-use(dcu) and def p-use(dpu) of a variable i.e. the def- clear paths to their c-use at node i and def-clear paths to their p-use at edge (i, j).Using the location of a variable defs and uses in a program under test, combined with the ‘Basic state reach algorithm’, the sets dcu(i)and dpu(i,j) are determined. From the ‘Basic state reach algorithm’ two sets reach (i) and avail (i) are determined where reach (i) is the set of all variable defs that “reach” node i and set avail (i) is the set of all “available” variable defs at node i i.e. The union of set of global defs at node i and the set of all defs that reach this node. dcu(i) : reach(i)«c -use(i) (1)dpu(i, j) : avail(i)« p - use(i, j)List of all dcu and dpu sets in the procedure calls satisfying the all-uses criterion are determined along with killing nodes (nodes containing other definition of a variable in a current path) that must not be included in the current path. In this approach, GA accepts instrumented version of the program under test, the list of def-use sets to be covered, the number of input variables, and the domain and the precision of each input variables as an input. A binary vector is used to represent a chromosome. The length of the input is determined by the domain and the precision. The domain is represented by Di = [ai, bi] where each variable in a program takes values from the range [ai, bi]. Each domain Di should be cut into equal size ranges if decimal places di is desired for a variable to achieve precision. If mi is an integer denoting the length of a chromosome or a string such that then a binary string denoting a variable of length mi fulfil the precision requirement. The mapping from the binary string i, into a real number from the range [ai, bi] is performed by the following formula:-Where, xi’ represents the decimal value of the binary string i.By applying di = 0, the above formula can be used to map binary string i into an integer number from the range [ai, bi]. Each chromosome represents a test case for a program which is represented by a binary string of specified length. Each chromosome is then represented by a decimal number by using (3) or (4).The fitness value eval (vi) for each chromosome vi (i =1...., pop_size) is calculated as follows:-A test case, vi is effective if its fitness value eval (vi) > 0. Each test case or chromosome is evaluated and the program is executed to record the def – use paths in the program that are covered by the test cases as its input.All the test cases are selected with effective eval (vi) or good fitness value. Selection is done by roulette wheel selection and proposed random selection method. The effective test cases then become parents of the new population. If none is effective then all the individuals are chosen as the parents.DiscussionThe test case generation by the proposed GA is more effective as compared to the random testing technique. The proposed selection method generates better results than the roulette wheel selection method. However, the proposed selection method has not been evaluated and compared with other selection methods like stochastic, uniform and tournament selection methods.3.1.2 Path TestingP.R Srivastava et al. In [20], P.R. Srivastava and Tai have presented a method for optimizing software testing efficiency by identifying the most critical path clusters in a program. The SUT is converted into a CFG. Weights are assigned to the edges of the CFG by applying 80-20 rule. 80 percentage of weight of incoming credit is given to loops and branches and the remaining 20 percentage of incoming credit is given to the edges in sequential path. The summation of weights along the edges comprising a path determines criticality of path. Higher the summation more critical is path and therefore must be tested before other paths. In this way by identifying most critical paths that must be tested first, testing efficiency is increased.Another test generation approach proposed by P.R Srivastava is based on path coverage testing [19]. The test data is generated for Resource Request algorithm using Ant Colony Optimization algorithm (ACO) and GA. Resource request algorithm is deadlock avoidance algorithm used for resource allocation by operating system to the processes in execution cycle [10]. The ACO algorithm is inspired from behaviour of real ants where ants find closest possible route to a food source or destination. The ants generate chemical substance called pheromones which helps ants to follow the path. The pheromone content increases as more ants follow the trail. The possible paths of CFG are generated having maximum number of nodes. Using ACO, optimized path ensuring safety sequence in resource request algorithm is generated covering all edges of CFG. Using GA, suitable test data set is generated which covers the need for each process. The backbone of genetic process is the fitness function which counts number of times a particular data enters and continues the resource request algorithm. Higher the value of count, higher is chances of avoiding a deadlock. The test data with higher values of count is taken and genetic crossover and mutation is applied to yield better results. Simultaneously, poor test data is removed each time.DiscussionThe experimental results shows that success rate of ACO are much better than GA. In weighted CFG approach [20], experiments were done on small examples and need to be done on larger commercial examples. Moreover, method can be further improved.Dr. Velur et al.The approach for test cases generation from directed graph has been proposed by Dr. Velur [29]. In this work, directed graph of intermediate states of system under test is created to exhibit the expected behaviour of system as shown in Fig 1.A directed graph is represented by G = [V, E], where V represents state or vertices and edges represents flow of control. Thereafter, a graph containing n nodes is represented by an incidence matrix of order n * n where an entry ‘1’ in the matrix represents edge between nodes and ‘0’ represents no edge or connection between them (see Fig 1). By using the nodes of graph as the base population, pair of nodes are generated which areselected as parents by applying the dual graph generation technique proposed as shown in Fig 2. The population is initialized by random selection of graphs of size 43 and 250 individuals are generated. The tournament selection method is used, where two individuals are chosen randomly and individual with the maximum fitness is chosen for crossover. Fitness is calculated by using the ‘Current maximum clique algorithm’ and Approximation algorithm’. Fitness is assigned by finding the clique of size 5 and the number of independent sets of size 5 in the population which comprise of number of graphs in the population. The graph with 0 fitness value indicates the clique of size 0 and no independent sets of size 5 in the graph. Nodes which are already visited are discarded and GA cycle continues till all the nodes of the graph are visited once.The graph is first converted into a binary string as shown in Fig 1. Next, the arcs of an original graph are converted into nodes as shown in Fig 2. For example, if an edge1 is an incoming to some node and the edge 2 is outgoing edge for the same node then an edge is created from edge 1 to edge 2 which acts as nodes in corresponding dual graph. The dual graph is then eulerized by duplicating the arcs for balancing the node polarities. As dual graph is traversed, all possible two links combination in dual graph for example bc, bf, bg ... are noted down. All the dual combinations are then encoded in 0 and 1 format as genetic population.DiscussionThis technique will be more suitable for network testing and system testing where predictive model based tests are not optimized to generate the outputs. The approach uses tournament selection method only. This approach has not been compared with new proposed approaches for generating test cases from the CFG. Moreover, the effectiveness of the fitness evaluation criteria has not been justified.Maha Alzabidi et al.Maha Alzabidi [14] has proposed automatic structural test case design using evolutionary testing. Software structural testing is done by taking path coverage for testing. For path testing, CFG is used in their work for representing a program where the nodes are the basic blocks and the edges between the nodes indicate the flow of the program. Meaningful paths are extracted from CFG and are selected as a target path for testing. Test cases are generated to trace the new path which leads to the target path. The test result is evaluated to determine that the testing objective criteria are satisfied by executing the selected path.The fitness function is named as a Shifted–Modified- Similarity (SMS) which is a modification to the hamming distance. The symmetric difference or hamming distance is calculated for cascading edges for target path and current path. Similarities are thennormalized and summed associated with a weighting factor. This value is used as an objective function to evaluate the individuals in the population.DiscussionThe approach improves the fitness function. Performance of different GA parameters is studied in this paper. Parameters have been applied on different test programs and results have shown that double crossover is more effective than single crossover applied on a test program. The approach is applied on the small programs but has not been evaluated on the complex programs involving loops, arrays and linked lists using different data types.Jose Carlos et al.The structural testing of object oriented programs requires traversing the complex control flow paths, resulting in the complex test cases generation which defines the elaborate state scenarios. Jose has proposed the methodology for evaluating the quality of both feasible and unfeasible test cases for structural oriented unit testing of object oriented java programs [10]. The test cases that are terminated with a call to a method and are completely executed are termed as feasible test cases whereas the test cases which abort prematurely are termed as unfeasible test cases. In this work, the test cases are represented as strongly typed genetic programming (STGP) individuals where each individual contains number of STGP trees equal to the number of arguments in the current method or method under test (MUT). The tree is traversed by depth first traversal algorithm which generates the sequence of method calls or scenarios.Traversing the trees by depth first traversal generates a linear sequence of computations or method call sequence (MCS) of MUT which is represented by a CFG as shown in Fig 3. The nodes in a CFG are assigned the weights and the fitness of test cases is computed. The fitness of feasible test case is computed on the basis of number of times a particular CFG node was exercised by the test cases of previous generations whereas unfeasible test cases are measured by the method calls that threw the exceptions or distance between the runtime exception indexes which results in prematurely termination of a test case.The CFG nodes weights are evaluated at the beginning of every generation according to formula proposed by Carlos:-Where the hitCni parameter is the “Hit count” representin。