Genetic Programming and a LLVM JIT for restricted Python AST expressions

A small intro on the rationale

So I'm working on a Symbolic Regression Machine written in C/C++ called Shine, which is intended to be a JIT for Genetic Programming libraries (like Pyevolve, for instance). The main rationale behind Shine is that today we have a lot of research on speeding up Genetic Programming using GPUs (the GPU fever!) or other special hardware; however, we don't have many papers about optimizing GP with the state-of-the-art compiler optimizations that we have in clang, gcc, etc.

The “hot spot”, the component that consumes most of the CPU resources in Genetic Programming today, is the evaluation of each individual in order to calculate the fitness of the program tree. This evaluation is often executed for each set of parameters of the “training” set. Suppose you want to do a symbolic regression of a single expression like the Pythagorean theorem and you have a linear space of parameters from 1.0 to 1000.0 with a step of 0.1: you have about 10,000 evaluations for each individual (program tree) of your population!
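To make that cost concrete, here is a minimal Python 3 sketch of evaluating a single individual over such a training set. It is not Shine or Pyevolve code: the helper names `linear_space` and `rmse_fitness`, and the way the two inputs are paired, are assumptions made only for illustration.

```python
import math


def linear_space(start, stop, step):
    """Inclusive linear space; built from an integer count to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def rmse_fitness(expression, training_set):
    """Compile the candidate once, then evaluate it on every training case."""
    code = compile(expression, "<individual>", "eval")
    env = {"__builtins__": {}, "sqrt": math.sqrt}
    squared_error = 0.0
    for a, b in training_set:
        predicted = eval(code, env, {"a": a, "b": b})
        target = math.sqrt(a * a + b * b)
        squared_error += (target - predicted) ** 2
    return math.sqrt(squared_error / len(training_set))


points = linear_space(1.0, 1000.0, 0.1)
# Hypothetical pairing of the two inputs: a walks up the space, b walks down.
training_set = list(zip(points, reversed(points)))

evaluations_per_individual = len(training_set)
perfect = rmse_fitness("sqrt(a*a + b*b)", training_set)
poor = rmse_fitness("a + b", training_set)
print(evaluations_per_individual, perfect, poor)
```

Every individual in every generation pays for the whole inner loop, which is why making a single evaluation cheaper matters so much.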

What Shine does is described in the image below:

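It takes the individual from the Genetic Programming engine and converts it to the LLVM Intermediate Representation (the LLVM assembly language). After that it runs the LLVM transformation passes (this is where the real power of modern compilers enters the GP context), and then the LLVM JIT converts the optimized LLVM IR into native code for the specified target (X86, PowerPC, etc.).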

You can see the Shine architecture below:

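This architecture brings a lot of flexibility to Genetic Programming. You can, for instance, write the functions that will later be used by your individuals in any language supported by LLVM: what matters to Shine is the LLVM IR, so you can use any language that LLVM supports and then use the IR it generates. You can mix code from C, C++, Ada, Fortran, D, etc. and use your functions as non-terminal nodes of your Genetic Programming trees.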

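Shine is still in early development. It looks like a simple idea, but I still have a lot of problems to solve, such as how to JIT the evaluation process itself instead of calling the JIT-compiled trees from Python through ctypes bindings.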

Doing Genetic Programming on the Python AST itself

During the development of Shine, an idea occurred to me: I could use a restricted Python Abstract Syntax Tree (AST) as the representation of individuals in a Genetic Programming engine. The main advantages of this are the flexibility and the possibility of reusing a lot of things. Of course, a shared library written in C/C++ would be useful for a lot of Genetic Programming engines that don't use Python, but since my spare time to work on this is becoming more and more rare, I started to rethink the approach and to use Python and the Python bindings for LLVM (LLVMPY). I found that it is pretty easy to JIT a restricted set of the Python AST to native code using LLVM, and this is what this post is going to show.

JIT'ing a restricted Python AST

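The most amazing parts of LLVM are obviously the number of transformation passes, the JIT and of course the ability to use the entire framework through a simple API (ok, not so simple sometimes). To simplify this example, I'm going to use an arbitrary restricted subset of the Python AST that supports only subtraction (-), addition (+), multiplication (*) and division (/).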

To understand the Python AST, you can use the Python parser, which converts source code into an AST:

    >>> import ast
    >>> astp = ast.parse("2*7")
    >>> ast.dump(astp)
    'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Mult(), right=Num(n=7)))])'

What the parser created is an Abstract Syntax Tree containing a BinOp (Binary Operation) with the number 2 as the left operand, the number 7 as the right operand and Multiplication (Mult) as the operation itself, which is very easy to understand. To create the LLVM IR, we are going to write a visitor that visits each node of the tree. To do that, we can subclass the Python NodeVisitor class from the ast module. What the NodeVisitor does is visit each node of the tree and call the method ‘visit_OPERATOR’ if it exists, where OPERATOR is the class name of the node. When the NodeVisitor visits a BinOp node, for example, it calls the method ‘visit_BinOp’, passing the BinOp node itself as the parameter.
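The dispatch rule can be seen in isolation with a small tracing visitor. This is a sketch for Python 3.8 or newer, where numeric literals are `ast.Constant` nodes instead of the `Num` nodes shown in this post; the class name `TraceVisitor` is made up for the example.

```python
import ast


class TraceVisitor(ast.NodeVisitor):
    """Records which visit_* methods the NodeVisitor dispatches to."""

    def __init__(self):
        self.trace = []

    def visit_BinOp(self, node):
        self.trace.append("BinOp:" + type(node.op).__name__)
        # Keep walking into the left and right operands.
        self.generic_visit(node)

    def visit_Constant(self, node):
        self.trace.append("Constant:%r" % node.value)

    def visit_Name(self, node):
        self.trace.append("Name:" + node.id)


tree = ast.parse("2 * 7 + x", mode="eval")
visitor = TraceVisitor()
visitor.visit(tree)
print(visitor.trace)
```

Nodes without a matching `visit_*` method fall through to `generic_visit`, which simply recurses into the children.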

The structure of the class for the JIT visitor will look like the code below:

    # Import the ast and the llvm Python bindings
    import ast
    from llvm import *
    from llvm.core import *
    from llvm.ee import *
    import llvm.passes as lp

    class AstJit(ast.NodeVisitor):
        def __init__(self):
            pass

What we need to do now is create an initialization method that keeps the last state of the JIT visitor. This is needed because we are going to JIT the content of the Python AST into a function, and the last instruction of that function needs to return the result of the last instruction visited by the JIT. We also need to receive a LLVM Module object in which our function will be created, as well as the closure type. For the sake of simplicity I'm not typing any object; I'm just assuming that all numbers in the expression are integers, so the closure type will be the LLVM integer type.

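    def __init__(self, module, parameters):
        self.last_state = None
        self.module = module
        # Parameters that will be created on the IR function
        self.parameters = parameters
        self.closure_type = Type.int()
        # An attribute to hold a link to the created function
        # so we can use it to JIT later
        self.func_obj = None
        self._create_builder()

    def _create_builder(self):
        # How many parameters of integer type
        params = [self.closure_type] * len(self.parameters)

        # The prototype of the function, returning an integer
        # and receiving the integer parameters
        ty_func = Type.function(self.closure_type, params)

        # Add the function to the module with the name 'func_ast_jit'
        self.func_obj = self.module.add_function(ty_func, 'func_ast_jit')

        # Name each argument of the function after the specified parameter
        for index, pname in enumerate(self.parameters):
            self.func_obj.args[index].name = pname

        # Create a basic block and the builder
        bb = self.func_obj.append_basic_block("entry")
        self.builder = Builder.new(bb)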

Now what we need to implement on our visitor are the ‘visit_OPERATOR’ methods for the BinOp node and for the Num and Name nodes. We will also implement the method that creates the return instruction, which returns the last state.

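    # A 'Name' is a node produced in the AST when you
    # access a variable; in '2+x+y', 'x' and 'y' are
    # the two Name nodes created in the AST for the expression.
    def visit_Name(self, node):
        # Is this variable a function parameter?
        index = self.parameters.index(node.id)
        self.last_state = self.func_obj.args[index]
        return self.last_state

    # Here we create a LLVM IR integer constant using the
    # Num node; in the expression '2+3' you'll have two
    # Num nodes, Num(n=2) and Num(n=3).
    def visit_Num(self, node):
        self.last_state = Constant.int(self.closure_type, node.n)
        return self.last_state

    # The visitor for the binary operation
    def visit_BinOp(self, node):
        # Get the operation and the left and right arguments
        lhs = self.visit(node.left)
        rhs = self.visit(node.right)
        op = node.op

        # Convert each operation (Sub, Add, Mult, Div) to its
        # LLVM IR integer instruction equivalent
        if isinstance(op, ast.Sub):
            op = self.builder.sub(lhs, rhs, 'sub_t')
        elif isinstance(op, ast.Add):
            op = self.builder.add(lhs, rhs, 'add_t')
        elif isinstance(op, ast.Mult):
            op = self.builder.mul(lhs, rhs, 'mul_t')
        elif isinstance(op, ast.Div):
            op = self.builder.sdiv(lhs, rhs, 'sdiv_t')

        self.last_state = op
        return self.last_state

    # Build the return (ret) statement with the last state
    def build_return(self):
        self.builder.ret(self.last_state)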
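When experimenting with a visitor like this, it can help to have a plain interpreter for the same restricted AST to compare results against. The following is only a sketch for Python 3.8+ and does not use LLVM at all; the names `AstInterpreter` and `trunc_div` are hypothetical. One detail worth noting is that LLVM's `sdiv` rounds toward zero, while Python's `//` rounds toward negative infinity, so the sketch implements truncating division explicitly.

```python
import ast


def trunc_div(lhs, rhs):
    """Integer division that rounds toward zero, like LLVM's sdiv."""
    quotient = abs(lhs) // abs(rhs)
    return quotient if (lhs < 0) == (rhs < 0) else -quotient


class AstInterpreter(ast.NodeVisitor):
    """Evaluates the restricted AST (+, -, *, / on integers) directly."""

    def __init__(self, arguments):
        self.arguments = arguments

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return self.arguments[node.id]

    def visit_Constant(self, node):
        if type(node.value) is not int:
            raise ValueError("only integer constants are supported")
        return node.value

    def visit_BinOp(self, node):
        lhs = self.visit(node.left)
        rhs = self.visit(node.right)
        if isinstance(node.op, ast.Sub):
            return lhs - rhs
        if isinstance(node.op, ast.Add):
            return lhs + rhs
        if isinstance(node.op, ast.Mult):
            return lhs * rhs
        if isinstance(node.op, ast.Div):
            return trunc_div(lhs, rhs)
        raise ValueError("unsupported operator: " + type(node.op).__name__)

    def generic_visit(self, node):
        # Anything outside the restricted subset is rejected.
        raise ValueError("unsupported node: " + type(node).__name__)


tree = ast.parse("(2+3*b+33*(10/2)+1+3/3+a)/2", mode="eval")
result = AstInterpreter({"a": 100, "b": 42}).visit(tree)
print(result)
```

Rejecting unknown nodes in `generic_visit` keeps the accepted language limited to the four operators, which is the same restriction the JIT visitor assumes.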

And that is it, our visitor is ready to convert a Python AST to LLVM IR assembly language. To run it, we'll first create a LLVM module and an expression:

    module = Module.new('ast_jit_module')

    # Note that I'm using two variables, 'a' and 'b'
    expr = "(2+3*b+33*(10/2)+1+3/3+a)/2"
    node = ast.parse(expr)
    print ast.dump(node)

Will output:

    Module(body=[Expr(value=BinOp(left=BinOp(left=BinOp(left=BinOp(left=BinOp(left=BinOp(left=Num(n=2), op=Add(), right=BinOp(left=Num(n=3), op=Mult(), right=Name(id='b', ctx=Load()))), op=Add(), right=BinOp(left=Num(n=33), op=Mult(), right=Num(n=2))), op=Add(), right=Num(n=1)), op=Add(), right=Num(n=3)), op=Add(), right=Name(id='a', ctx=Load())), op=Div(), right=Num(n=2)))])

Now we can finally run our visitor on the generated AST and check the LLVM IR output:

    visitor = AstJit(module, ['a', 'b'])
    visitor.visit(node)
    visitor.build_return()
    print module

Will output the LLVM IR:

    ; ModuleID = 'ast_jit_module'

    define i32 @func_ast_jit(i32 %a, i32 %b) {
    entry:
      %mul_t = mul i32 3, %b
      %add_t = add i32 2, %mul_t
      %add_t1 = add i32 %add_t, 165
      %add_t2 = add i32 %add_t1, 1
      %add_t3 = add i32 %add_t2, 1
      %add_t4 = add i32 %add_t3, %a
      %sdiv_t = sdiv i32 %add_t4, 2
      ret i32 %sdiv_t
    }
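The constants 165 and 1 in this listing correspond to 33*(10/2) and 3/3 in the source expression, so it appears that operations on two constants were already folded while the IR was being built. The same idea can be sketched on the Python AST itself with an `ast.NodeTransformer`. This is an illustration for Python 3.8+, not what LLVM does internally; `ConstantFolder` and `trunc_div` are names invented for the example.

```python
import ast


def trunc_div(lhs, rhs):
    """Integer division that rounds toward zero, like LLVM's sdiv."""
    quotient = abs(lhs) // abs(rhs)
    return quotient if (lhs < 0) == (rhs < 0) else -quotient


FOLDERS = {
    ast.Add: lambda lhs, rhs: lhs + rhs,
    ast.Sub: lambda lhs, rhs: lhs - rhs,
    ast.Mult: lambda lhs, rhs: lhs * rhs,
    ast.Div: trunc_div,
}


class ConstantFolder(ast.NodeTransformer):
    """Replaces a BinOp over two integer constants with its value."""

    def visit_BinOp(self, node):
        # Fold the children first, so nested constants collapse bottom-up.
        self.generic_visit(node)
        fold = FOLDERS.get(type(node.op))
        left, right = node.left, node.right
        foldable = (
            fold is not None
            and isinstance(left, ast.Constant)
            and isinstance(right, ast.Constant)
            and type(left.value) is int
            and type(right.value) is int
        )
        if not foldable:
            return node
        if isinstance(node.op, ast.Div) and right.value == 0:
            return node  # leave division by zero untouched
        folded_value = fold(left.value, right.value)
        return ast.copy_location(ast.Constant(value=folded_value), node)


tree = ast.parse("(2+3*b+33*(10/2)+1+3/3+a)/2", mode="eval")
folded = ConstantFolder().visit(tree)
constants = sorted(
    n.value for n in ast.walk(folded) if isinstance(n, ast.Constant)
)
print(constants)
```

Because the expression is left-associative, 2+3*b is not a constant, so the additions of 165, 1 and 1 stay separate at this stage, just as they do in the listing above.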

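Now is when the real fun begins: we want to run the LLVM optimization passes to optimize our code with the equivalent of the GCC -O2 optimization level. To do that we create a PassManagerBuilder and a PassManager. The PassManagerBuilder is the component that adds the passes to the PassManager; you can also manually add arbitrary transformations like dead code elimination, function inlining, etc.: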

    pmb = lp.PassManagerBuilder.new()
    # Optimization level
    pmb.opt_level = 2

    pm = lp.PassManager.new()
    pmb.populate(pm)

    # Run the passes on the module
    pm.run(module)
    print module

Will output:

    ; ModuleID = 'ast_jit_module'

    define i32 @func_ast_jit(i32 %a, i32 %b) nounwind readnone {
    entry:
      %mul_t = mul i32 %b, 3
      %add_t3 = add i32 %a, 169
      %add_t4 = add i32 %add_t3, %mul_t
      %sdiv_t = sdiv i32 %add_t4, 2
      ret i32 %sdiv_t
    }

And here we have the optimized LLVM IR of the Python AST expression. The next step is to JIT that IR into native code and then execute it with some parameters:

    ee = ExecutionEngine.new(module)
    arg_a = GenericValue.int(Type.int(), 100)
    arg_b = GenericValue.int(Type.int(), 42)

    retval = ee.run_function(visitor.func_obj, [arg_a, arg_b])
    print "Return: %d" % retval.as_int()

Will output:

    Return: 197
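As a sanity check, the two IR listings above can be transcribed by hand into ordinary Python 3 functions and compared. This is only a sketch: the function names are made up, Python integers do not wrap around like i32 values, and `trunc_div` stands in for `sdiv`.

```python
def trunc_div(lhs, rhs):
    """Integer division that rounds toward zero, like LLVM's sdiv."""
    quotient = abs(lhs) // abs(rhs)
    return quotient if (lhs < 0) == (rhs < 0) else -quotient


def func_unoptimized(a, b):
    # Line-by-line transcription of the first IR listing.
    mul_t = 3 * b
    add_t = 2 + mul_t
    add_t1 = add_t + 165
    add_t2 = add_t1 + 1
    add_t3 = add_t2 + 1
    add_t4 = add_t3 + a
    return trunc_div(add_t4, 2)


def func_optimized(a, b):
    # Line-by-line transcription of the optimized IR listing.
    mul_t = b * 3
    add_t3 = a + 169
    add_t4 = add_t3 + mul_t
    return trunc_div(add_t4, 2)


mismatches = [
    (a, b)
    for a in range(-50, 51)
    for b in range(-50, 51)
    if func_unoptimized(a, b) != func_optimized(a, b)
]
result = func_optimized(100, 42)
print(mismatches, result)
```

The four constant additions (2, 165, 1 and 1) were merged into the single constant 169, and for a = 100 and b = 42 both versions give (100 + 169 + 126) / 2 = 197 with truncating division.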

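And that's it: you have created an AST->LLVM IR converter, optimized the LLVM IR with the transformation passes and then converted it to native code using the LLVM execution engine. I hope you liked it =)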

Cite this article as: Christian S. Perone, "Genetic Programming and a LLVM JIT for restricted Python AST expressions," in Terra Incognita, 15/08/2012, //www.cpetem.com/2012/08/genetic-programming-and-a-llvm-jit-for-restricted-python-ast-expressions/

Invite: PyCon US 2011 - Genetic Programming in Python

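If you are going to PyCon US 2011, I would like to invite you to the talk "Genetic Programming in Python". The talk will be given by Eric Floehr on March 12, from 1:20 p.m. to 2:05 p.m.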

Here is the abstract:

Did you know you can create and evolve programs that find solutions to problems? This talk walks through how to use Genetic Algorithms and Genetic Programming as tools to discover solutions to hard problems, when to use GA/GP, setting up the GA/GP environment, and interpreting the results. Using pyevolve, we’ll walk through a real-world implementation creating a GP that predicts the weather.

(…)

Genetic Algorithms (GA) and Genetic Programming (GP) are methods used to search for and optimize solutions in large solution spaces. GA/GP use concepts borrowed from natural evolution, such as mutation, cross-over, selection, population, and fitness to generate solutions to problems. If done well, these solutions will become better as the GA/GP runs.

GA/GP has been used in problem domains as diverse as scheduling, database index optimization, circuit board layout, mirror and lens design, game strategies, and robotic walking and swimming. They can also be a lot of fun, and have been used to evolve aesthetically pleasing artwork, melodies, and approximating pictures or paintings using polygons.

GA/GP is fun to play with because often-times an unexpected solution will be created that will give new insight or knowledge. It might also present a novel solution to a problem, one that a human may never generate. Solutions may also be inscrutable, and determining why a solution works is interesting in itself.

Successful pyevolve multiprocessing speedup for Genetic Programming

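As we know, Genetic Programming usually requires intensive processing power for the fitness functions and the tree manipulations (in crossover operations), and this can be a huge problem when using a pure Python approach like Pyevolve. So, to overcome this situation, I've used the Python multiprocessing features to implement a parallel fitness evaluation approach in Pyevolve, and I was surprised by the super-linear speedup I got for a CPU-bound fitness function used to do the symbolic regression of the Pythagorean theorem: $c = \sqrt{a^2 + b^2}$. I've used the same seed for the GP, so it consumed nearly the same CPU resources in both test categories. Here are the results I obtained: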

[Figure: pyevolve_multiprocessing]

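The first fitness landscape I used had 2,500 points and the later one had 6,400 points. Here is the source code I used (you just need to turn on the multiprocessing option using the setMultiProcessing method, so Pyevolve will use multiprocessing when you have more than a single core; you can enable the logging feature to check what is going on behind the scenes):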

    from pyevolve import *
    import math

    rmse_accum = Util.ErrorAccumulator()

    def gp_add(a, b): return a+b
    def gp_sub(a, b): return a-b
    def gp_mul(a, b): return a*b
    def gp_sqrt(a): return math.sqrt(abs(a))

    def eval_func(chromosome):
        global rmse_accum
        rmse_accum.reset()
        code_comp = chromosome.getCompiledCode()

        for a in xrange(0, 80):
            for b in xrange(0, 80):
                evaluated = eval(code_comp)
                target = math.sqrt((a*a)+(b*b))
                rmse_accum += (target, evaluated)
        return rmse_accum.getRMSE()

    def main_run():
        genome = GTree.GTreeGP()
        genome.setParams(max_depth=4, method="ramped")
        genome.evaluator += eval_func
        genome.mutator.set(Mutators.GTreeGPMutatorSubtree)

        ga = GSimpleGA.GSimpleGA(genome, seed=666)
        ga.setParams(gp_terminals = ['a', 'b'], gp_function_prefix = "gp")

        ga.setMinimax(Consts.minimaxType["minimize"])
        ga.setGenerations(20)
        ga.setCrossoverRate(1.0)
        ga.setMutationRate(0.08)
        ga.setPopulationSize(800)
        ga.setMultiProcessing(True)

        ga(freq_stats=5)
        best = ga.bestIndividual()

    if __name__ == "__main__":
        main_run()

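As you can see, the population size was 800 individuals, with an 8% mutation rate and a 100% crossover rate, for a simple evolution of 20 generations. Of course you don't need so many points in the fitness landscape; I used 2,500+ points to create a CPU-intensive fitness function, otherwise the speedup can be less than 1.0 due to the communication overhead between the processes. For the first case (2,500-point fitness landscape) I got a 3.33x speedup, and for the last case (6,400-point fitness landscape) I got a 3.28x speedup. These tests were executed on a 2-core PC (Intel Core 2).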
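For readers less familiar with the terminology, speedup is the serial run time divided by the parallel run time, and parallel efficiency is the speedup divided by the number of cores; an efficiency above 1.0 is what "super-linear" means. A minimal Python 3 sketch of that arithmetic, using only the figures quoted above (the function names are made up for the example):

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial run."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(speedup_factor, cores):
    """Speedup per core; values above 1.0 indicate a super-linear speedup."""
    return speedup_factor / cores


CORES = 2
reported = {"2,500 points": 3.33, "6,400 points": 3.28}
efficiencies = {
    label: parallel_efficiency(factor, CORES)
    for label, factor in reported.items()
}
for label, efficiency in efficiencies.items():
    print("%s: efficiency %.3f" % (label, efficiency))
```

Both reported speedups are well above the core count of 2, so both efficiencies are above 1.0.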

Approximating the Pi number using Genetic Programming

[Figure: Pi]

As many people (or very few in real life, haha) know, today is Pi Approximation Day! So it's time to make a contribution to celebrate this funny day =)

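My contribution is to use Python and Pyevolve to approximate the Pi number using a Genetic Programming approach. I've created the functions gp_add (+), gp_sub (-), gp_div (/), gp_mul (*) and gp_sqrt (square root) to use as non-terminals of the GP. The fitness function is very simple too: it simply returns the absolute difference between the Python math.pi and the evaluated individual. I also used a population size of 1k individuals with a maximum tree depth of 8 and random integers as the random ephemeral constants. The best approximation I got while running the GP for about 8 minutes (40 generations) was 3.1416185511, which is correct to 3 decimal digits; you can improve it and let it run for more time to get better approximations.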

Here is the formula I got with the GP:

[Figure: tree_pi]

And here is the output of the script:

    Best (0): 3.1577998365
            Error: 0.0162071829
    Best (10): 3.1417973679
            Error: 0.0002047143
    Best (20): 3.1417973679
            Error: 0.0002047143
    Best (30): 3.1417973679
            Error: 0.0002047143
    Best (40): 3.1416185511
            Error: 0.0000258975

    - GenomeBase
            Score: 0.000026
            Fitness: 15751.020831

            Params: {'max_depth': 8, 'method': 'ramped'}

            Slot [Evaluator] (Count: 1)
            Slot [Initializator] (Count: 1)
                    Name: GTreeGPInitializator - Weight: 0.50
                    Doc: This initializator accepts the follow parameters:
                         *max_depth* The max depth of the tree
                         *method* The method, accepts "grow" or "full"
                         .. versionadded:: 0.6 The *GTreeGPInitializator* function.
            Slot [Mutator] (Count: 1)
                    Name: GTreeGPMutatorSubtree - Weight: 0.50
                    Doc: The mutator of GTreeGP, Subtree Mutator
                         .. versionadded:: 0.6 The *GTreeGPMutatorSubtree* function
            Slot [Crossover] (Count: 1)
                    Name: GTreeGPCrossoverSinglePoint - Weight: 0.50

    - GTree
            Height: 8
            Nodes: 21

    GTreeNodeBase [Childs=1] - [gp_sqrt]
      GTreeNodeBase [Childs=2] - [gp_div]
        GTreeNodeBase [Childs=2] - [gp_add]
          GTreeNodeBase [Childs=0] - [26]
          GTreeNodeBase [Childs=2] - [gp_div]
            GTreeNodeBase [Childs=2] - [gp_mul]
              GTreeNodeBase [Childs=2] - [gp_add]
                GTreeNodeBase [Childs=2] - [gp_sub]
                  GTreeNodeBase [Childs=0] - [34]
                  GTreeNodeBase [Childs=2] - [gp_sub]
                    GTreeNodeBase [Childs=0] - [44]
                    GTreeNodeBase [Childs=0] - [1]
                GTreeNodeBase [Childs=2] - [gp_mul]
                  GTreeNodeBase [Childs=0] - [49]
                  GTreeNodeBase [Childs=0] - [43]
              GTreeNodeBase [Childs=1] - [gp_sqrt]
                GTreeNodeBase [Childs=0] - [18]
            GTreeNodeBase [Childs=0] - [16]
        GTreeNodeBase [Childs=2] - [gp_add]
          GTreeNodeBase [Childs=0] - [24]
          GTreeNodeBase [Childs=0] - [35]

    - GTReeGP
            Expression: gp_sqrt(gp_div(gp_add(26, gp_div(gp_mul(gp_add(gp_sub(34, gp_sub(44, 1)), gp_mul(49, 43)), gp_sqrt(18)), 16)), gp_add(24, 35)))
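The expression printed at the end of this output can be evaluated directly to confirm the reported approximation. The sketch below is for Python 3, where `/` is already true division, and it simply re-declares the five gp_* functions described in the text; nothing else from Pyevolve is needed.

```python
import math


def gp_add(a, b): return a + b
def gp_sub(a, b): return a - b
def gp_div(a, b): return 1 if b == 0 else a / b
def gp_mul(a, b): return a * b
def gp_sqrt(a): return math.sqrt(abs(a))


# The best individual reported in the output above.
approximation = gp_sqrt(
    gp_div(
        gp_add(
            26,
            gp_div(
                gp_mul(
                    gp_add(gp_sub(34, gp_sub(44, 1)), gp_mul(49, 43)),
                    gp_sqrt(18),
                ),
                16,
            ),
        ),
        gp_add(24, 35),
    )
)
error = abs(math.pi - approximation)
print("%.10f %.10f" % (approximation, error))
```

Written as a formula, the individual is sqrt((26 + 2098 * sqrt(18) / 16) / 59), since 34 - (44 - 1) + 49 * 43 = 2098 and 24 + 35 = 59.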

And finally, here is the source code:

    from __future__ import division
    from pyevolve import *
    import math

    def gp_add(a, b): return a+b
    def gp_sub(a, b): return a-b
    def gp_div(a, b): return 1 if b==0 else a/b
    def gp_mul(a, b): return a*b
    def gp_sqrt(a): return math.sqrt(abs(a))

    def eval_func(chromosome):
        code_comp = chromosome.getCompiledCode()
        ret = eval(code_comp)
        return abs(math.pi - ret)

    def step_callback(engine):
        gen = engine.getCurrentGeneration()
        if gen % 10 == 0:
            best = engine.bestIndividual()
            best_pi = eval(best.getCompiledCode())
            print "Best (%d): %.10f" % (gen, best_pi)
            print "\tError: %.10f" % (abs(math.pi - best_pi))
        return False

    def main_run():
        genome = GTree.GTreeGP()
        genome.setParams(max_depth=8, method="ramped")
        genome.evaluator += eval_func

        ga = GSimpleGA.GSimpleGA(genome)
        ga.setParams(gp_terminals = ['ephemeral:random.randint(1, 50)'],
                     gp_function_prefix = "gp")

        ga.setMinimax(Consts.minimaxType["minimize"])
        ga.setGenerations(50000)
        ga.setCrossoverRate(1.0)
        ga.setMutationRate(0.09)
        ga.setPopulationSize(1000)
        ga.stepCallback.set(step_callback)

        ga.evolve()
        best = ga.bestIndividual()
        best.writeDotImage("tree_pi.png")

        print best

    if __name__ == "__main__":
        main_run()

If you are interested in why today is Pi Approximation Day, see some resources:

Little Cartoon

Some Background History

Some Pi Approximations

Genetic Programming and Flex layouts

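In order to show how flexible Genetic Programming in Pyevolve can be, I've made a simple example using Adobe Flex and Pyevolve. This example is just to show how some kind of Flex layout can be evolved; I haven't implemented the fitness function, so the example just creates a random Flex layout using MXML. So, here is the code of the Pyevolve example: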

    import random
    from pyevolve import *

    def gp_hbox(x, y): return "%s %s" % (x,y)
    def gp_vbox(x, y): return "%s %s" % (x,y)
    def gp_panel(x, y): return "%s %s" % (x,y)

    def eval_func(chromosome):
        code_comp = chromosome.getCompiledCode()
        for a in xrange(0, 5):
            for b in xrange(0, 5):
                evaluated = eval(code_comp)
        return random.randint(1,100)

    def main_run():
        genome = GTree.GTreeGP()
        genome.setParams(max_depth=5, method="ramped")
        genome.evaluator += eval_func

        ga = GSimpleGA.GSimpleGA(genome)

        button = repr("")
        label = repr("")
        text_input = repr("")

        ga.setParams(gp_terminals = [button, label, text_input],
                     gp_function_prefix = "gp")

        ga.setMinimax(Consts.minimaxType["minimize"])
        ga.evolve(freq_stats=5)

        print ga.bestIndividual()

    if __name__ == "__main__":
        main_run()

As you can see, I've created the layout tags HBox, VBox and Panel as functions of the GP, and Button, Label and TextInput as its terminals. The result is very funny: it's just a random layout, but you can use your imagination to create some nice and interesting fitness functions.
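To illustrate how nested function calls turn into nested markup, here is a small standalone Python 3 sketch. It assumes that each GP function wraps its two children in the corresponding MXML container tag; the exact tag strings, the attribute values and the `wrap` helper are illustrative assumptions, not the original listing.

```python
def wrap(tag, x, y):
    """Wrap two child fragments in an MXML container tag (illustrative)."""
    return "<mx:%s>%s %s</mx:%s>" % (tag, x, y, tag)


def gp_hbox(x, y): return wrap("HBox", x, y)
def gp_vbox(x, y): return wrap("VBox", x, y)
def gp_panel(x, y): return wrap("Panel", x, y)


# Terminals: leaf widgets, with assumed attribute values.
button = "<mx:Button label='Button'/>"
label = "<mx:Label text='Label'/>"
text_input = "<mx:TextInput/>"

# One hand-written individual; a GP engine would generate such trees randomly.
layout = gp_panel(gp_hbox(button, label), gp_vbox(text_input, button))
print(layout)
```

Because every function returns a well-formed fragment, any tree built from these functions and terminals also yields well-formed markup, which is what makes the representation convenient for crossover and mutation.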

Here is the SWF generated from a random individual of the population:

I hope you liked it =)