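Introduction
This is the first post of 2020, so happy new year to you all!
I’ve been a huge fan of LLVM since 11 years ago, when I started playing with it to JIT data structures such as AVL trees, then later to JIT restricted ASTs and to JIT native code from TensorFlow graphs. Since then, LLVM has evolved into one of the most important compiler framework ecosystems and is nowadays used by a lot of important open-source projects.
One cool project that I recently became aware of is Gandiva. Gandiva was developed by Dremio and later donated to Apache Arrow (kudos to the Dremio team for that). The main idea of Gandiva is that it provides a compiler that generates LLVM IR able to operate on batches of Apache Arrow data. Gandiva is written in C++ and comes with a lot of functions already implemented for building an expression tree that can be JIT’ed using LLVM. One nice feature of this design is that it can use LLVM to automatically optimize complex expressions, add native target platform vectorization such as AVX while operating on Arrow batches, and execute native code to evaluate the expressions.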
The image below gives an overview of Gandiva:

In this post I’ll build a very simple expression parser supporting a limited set of operations that I will use to filter a Pandas DataFrame.
Building a simple expression with Gandiva
In this section I’ll show how to create a simple expression manually using the tree builder from Gandiva.
Using the Gandiva Python bindings to JIT an expression
Before building our parser and expression builder, let’s manually build a simple expression with Gandiva. First, we will create a simple Pandas DataFrame with numbers from 0.0 to 9.0:
    import pandas as pd
    import pyarrow as pa
    import pyarrow.gandiva as gandiva

    # Create a simple Pandas DataFrame
    df = pd.DataFrame({"x": [1.0 * i for i in range(10)]})
    table = pa.Table.from_pandas(df)
    schema = pa.Schema.from_pandas(df)
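We converted the DataFrame to an Arrow Table. It is important to note that in this case it was a zero-copy operation: Arrow isn’t copying data from Pandas and duplicating the DataFrame. We then get the schema, which contains the column types and other metadata.
After that, we want to use Gandiva to build the following expression to filter the data:
    (x > 2.0) and (x < 6.0)
This expression will be built using nodes from Gandiva: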
    builder = gandiva.TreeExprBuilder()

    # Reference the column "x"
    node_x = builder.make_field(table.schema.field("x"))

    # Make two literals: 2.0 and 6.0
    two = builder.make_literal(2.0, pa.float64())
    six = builder.make_literal(6.0, pa.float64())

    # Create a function for "x > 2.0"
    gt_five_node = builder.make_function("greater_than", [node_x, two], pa.bool_())

    # Create a function for "x < 6.0"
    lt_ten_node = builder.make_function("less_than", [node_x, six], pa.bool_())

    # Create an "and" node, for "(x > 2.0) and (x < 6.0)"
    and_node = builder.make_and([gt_five_node, lt_ten_node])

    # Make the expression a condition and create a filter
    condition = builder.make_condition(and_node)
    filter_ = gandiva.make_filter(table.schema, condition)
This code now looks a little more complex but it is easy to understand. We are basically creating the nodes of a tree that will represent the expression we showed earlier. Here is a graphical representation of what it looks like:
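The same tree can also be written out in plain Python. The sketch below does not use Gandiva at all: the Field, Literal, Function and And classes and the evaluate() and render() helpers are hypothetical names invented for illustration. It only mirrors the shape of the tree and interprets it row by row, whereas Gandiva compiles the tree to native code.

```python
from dataclasses import dataclass
import operator

# Hypothetical node types for illustration only (NOT the Gandiva API).
@dataclass
class Field:
    name: str

@dataclass
class Literal:
    value: float

@dataclass
class Function:
    name: str
    args: list

@dataclass
class And:
    children: list

# Interpreted stand-ins for the two comparison functions used in the tree.
FUNCTIONS = {"greater_than": operator.gt, "less_than": operator.lt}

def evaluate(node, row):
    """Evaluate the tree for one row (a dict of column name -> value)."""
    if isinstance(node, Field):
        return row[node.name]
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Function):
        return FUNCTIONS[node.name](*(evaluate(a, row) for a in node.args))
    if isinstance(node, And):
        return all(evaluate(c, row) for c in node.children)
    raise TypeError(f"unsupported node: {node!r}")

def render(node, depth=0):
    """Return the tree as indented text, one node per line."""
    pad = "  " * depth
    if isinstance(node, Field):
        return f"{pad}field({node.name})"
    if isinstance(node, Literal):
        return f"{pad}literal({node.value})"
    if isinstance(node, Function):
        label, kids = node.name, node.args
    else:
        label, kids = "and", node.children
    return "\n".join([pad + label] + [render(k, depth + 1) for k in kids])

# (x > 2.0) and (x < 6.0)
x = Field("x")
tree = And([
    Function("greater_than", [x, Literal(2.0)]),
    Function("less_than", [x, Literal(6.0)]),
])

column = [1.0 * i for i in range(10)]
selected = [i for i, v in enumerate(column) if evaluate(tree, {"x": v})]

print(render(tree))
print(selected)  # [3, 4, 5]
```

The root is the "and" node, each child is a comparison function, and the leaves are the field and the literals, which is the same layout the Gandiva builder calls produce.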
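Inspecting the generated LLVM IR
Unfortunately, I haven’t found a way to dump the generated LLVM IR using Arrow’s Python bindings. However, we can use the C++ API to build the same tree and then look at the generated LLVM IR: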
    auto field_x = field("x", float32());
    auto schema = arrow::schema({field_x});

    auto node_x = TreeExprBuilder::MakeField(field_x);

    auto two = TreeExprBuilder::MakeLiteral((float_t)2.0);
    auto six = TreeExprBuilder::MakeLiteral((float_t)6.0);

    auto gt_five_node = TreeExprBuilder::MakeFunction("greater_than", {node_x, two}, arrow::boolean());
    auto lt_ten_node = TreeExprBuilder::MakeFunction("less_than", {node_x, six}, arrow::boolean());

    auto and_node = TreeExprBuilder::MakeAnd({gt_five_node, lt_ten_node});
    auto condition = TreeExprBuilder::MakeCondition(and_node);

    std::shared_ptr<Filter> filter;
    auto status = Filter::Make(schema, condition, TestConfiguration(), &filter);
The code above is the same as the Python code, but using the C++ Gandiva API (note that here the column is float32 rather than float64). Now that we have built the tree in C++, we can get the LLVM Module and dump the IR code for it. The generated IR is full of boilerplate code and the JIT’ed functions from the Gandiva registry, however the important parts are shown below:
    ; Function Attrs: alwaysinline norecurse nounwind readnone ssp uwtable
    define internal zeroext i1 @less_than_float32_float32(float, float) local_unnamed_addr #0 {
      %3 = fcmp olt float %0, %1
      ret i1 %3
    }

    ; Function Attrs: alwaysinline norecurse nounwind readnone ssp uwtable
    define internal zeroext i1 @greater_than_float32_float32(float, float) local_unnamed_addr #0 {
      %3 = fcmp ogt float %0, %1
      ret i1 %3
    }

    (...)
    %x = load float, float* %11
    %greater_than_float32_float32 = call i1 @greater_than_float32_float32(float %x, float 2.000000e+00)
    (...)
    %x11 = load float, float* %15
    %less_than_float32_float32 = call i1 @less_than_float32_float32(float %x11, float 6.000000e+00)
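As you can see, the IR contains calls to the functions less_than_float32_float32 and greater_than_float32_float32, which are the (in this case very simple) Gandiva functions that do float comparisons. Note how the functions are specialized by argument type, as the suffix of each function name shows.
What is quite interesting is that LLVM will apply all its optimizations to this code and generate efficient native code for the target platform, while Gandiva and LLVM take care of making sure that memory alignment is correct so that extensions such as AVX can be used for vectorization.
The IR I showed above isn’t the code that actually gets executed; the optimized version is. In the optimized version we can see that LLVM inlined the functions, as shown in this part of the optimized code: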
    %x.us = load float, float* %10, align 4
    %11 = fcmp ogt float %x.us, 2.000000e+00
    %12 = fcmp olt float %x.us, 6.000000e+00
    %not.or.cond = and i1 %12, %11
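You can see that the expression is much simpler after optimization, as LLVM applied its powerful optimizations and inlined a lot of Gandiva functions.
Building a Pandas filter expression JIT with Gandiva
Now we want to implement something similar to the Pandas DataFrame.query() function using Gandiva. The first problem we face is that we need to parse a string such as (x > 2.0) and (x < 6.0). After that, we have to build the Gandiva expression tree using the tree builder from Gandiva and then evaluate that expression on Arrow data.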
Now, instead of implementing a full parsing of the expression string, I’ll use the Python AST module to parse valid Python code and build an Abstract Syntax Tree (AST) of that expression, that I’ll be later using to emit the Gandiva/LLVM nodes.
The heavy work of parsing the string will be delegated to Python AST module and our work will be mostly walking on this tree and emitting the Gandiva nodes based on that syntax tree. The code for visiting the nodes of this Python AST tree and emitting Gandiva nodes is shown below:
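To see what the visitor will be walking over, here is a minimal sketch that uses only the standard library ast module to parse a query string and print an outline of the resulting tree. The outline() helper is a hypothetical name introduced for this example.

```python
import ast

query = "(x > 2.0) & (x < 6.0)"
module = ast.parse(query)

# The parsed query is a Module holding one Expr statement,
# whose value is a BinOp with a Compare node on each side.
expr = module.body[0]
binop = expr.value

print(type(binop).__name__)              # BinOp
print(type(binop.op).__name__)           # BitAnd
print(type(binop.left.ops[0]).__name__)  # Gt
print(binop.left.left.id)                # x

def outline(node, depth=0):
    """Return the class name of every node in the tree, indented by depth."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(module)))
```

Parentheses do not show up as nodes; they only influence how the tree is shaped. The class names printed here (BinOp, Compare, Name and so on) are the ones that ast.NodeVisitor uses to pick a visit_<ClassName> method.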
    import ast

    import pyarrow as pa
    import pyarrow.gandiva as gandiva


    class LLVMGandivaVisitor(ast.NodeVisitor):
        def __init__(self, df_table):
            self.table = df_table
            self.builder = gandiva.TreeExprBuilder()
            # Map each column name to a Gandiva field node
            self.columns = {f.name: self.builder.make_field(f)
                            for f in self.table.schema}
            # Python AST comparison operators -> Gandiva function names
            self.compare_ops = {
                "Gt": "greater_than",
                "Lt": "less_than",
            }
            # Python AST binary operators -> Gandiva boolean node builders
            self.bin_ops = {
                "BitAnd": self.builder.make_and,
                "BitOr": self.builder.make_or,
            }

        def visit_Module(self, node):
            return self.visit(node.body[0])

        def visit_BinOp(self, node):
            left = self.visit(node.left)
            right = self.visit(node.right)
            op_name = node.op.__class__.__name__
            gandiva_bin_op = self.bin_ops[op_name]
            return gandiva_bin_op([left, right])

        def visit_Compare(self, node):
            op = node.ops[0]
            op_name = op.__class__.__name__
            gandiva_comp_op = self.compare_ops[op_name]
            comparators = self.visit(node.comparators[0])
            left = self.visit(node.left)
            return self.builder.make_function(gandiva_comp_op,
                                              [left, comparators], pa.bool_())

        def visit_Num(self, node):
            # Numeric literals on Python < 3.8
            return self.builder.make_literal(node.n, pa.float64())

        def visit_Constant(self, node):
            # Numeric literals on Python >= 3.8, where ast.Num is deprecated
            return self.builder.make_literal(node.value, pa.float64())

        def visit_Expr(self, node):
            return self.visit(node.value)

        def visit_Name(self, node):
            return self.columns[node.id]

        def generic_visit(self, node):
            return node

        def evaluate_filter(self, llvm_mod):
            condition = self.builder.make_condition(llvm_mod)
            filter_ = gandiva.make_filter(self.table.schema, condition)
            result = filter_.evaluate(self.table.to_batches()[0],
                                      pa.default_memory_pool())
            arr = result.to_array()
            pd_result = arr.to_numpy()
            return pd_result

        @staticmethod
        def gandiva_query(df, query):
            df_table = pa.Table.from_pandas(df)
            llvm_gandiva_visitor = LLVMGandivaVisitor(df_table)
            mod_f = ast.parse(query)
            llvm_mod = llvm_gandiva_visitor.visit(mod_f)
            results = llvm_gandiva_visitor.evaluate_filter(llvm_mod)
            return results
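As you can see, the code is pretty straightforward, as I’m not supporting every possible Python expression but only a small subset. What we do in this class is basically a conversion of Python AST nodes such as Compare and BinOp (binary operations) into Gandiva nodes. I’m also changing the semantics of the & and | operators to represent AND and OR respectively, as in the Pandas query() function.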
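One consequence of reusing & and | is worth spelling out. The following minimal sketch, using only the standard library, shows how Python parses three variants of the same query; top_level() is a hypothetical helper name used only here.

```python
import ast

def top_level(query):
    """Class name of the root AST node of a single-expression query."""
    return type(ast.parse(query, mode="eval").body).__name__

print(top_level("(a > 2.0) & (a < 6.0)"))    # BinOp
print(top_level("(a > 2.0) and (a < 6.0)"))  # BoolOp
print(top_level("a > 2.0 & a < 6.0"))        # Compare

# Without parentheses, & binds tighter than the comparisons, so the
# query is read as the chained comparison a > (2.0 & a) < 6.0.
unparenthesized = ast.parse("a > 2.0 & a < 6.0", mode="eval").body
print(len(unparenthesized.ops))                         # 2
print(type(unparenthesized.comparators[0]).__name__)    # BinOp
```

The visitor above only handles BinOp and Compare, so the keyword form with "and" produces a BoolOp node that falls through to generic_visit, and the unparenthesized form is not the tree one would expect. Queries passed to this visitor should therefore use & or | with each comparison wrapped in parentheses.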
Register as a Pandas extension
The next step is to create a simple Pandas extension using the gandiva_query() method that we created:
    @pd.api.extensions.register_dataframe_accessor("gandiva")
    class GandivaAccessor:
        def __init__(self, pandas_obj):
            self.pandas_obj = pandas_obj

        def query(self, query):
            return LLVMGandivaVisitor.gandiva_query(self.pandas_obj, query)
And that is it, now we can use this extension to do things such as:
    df = pd.DataFrame({"a": [1.0 * i for i in range(nsize)]})
    results = df.gandiva.query("a > 10.0")
We have registered a Pandas extension called gandiva, which is now a first-class citizen of Pandas DataFrames.
Let’s now create a DataFrame with 50 million floats and use the new query() method to filter it:
    df = pd.DataFrame({"a": [1.0 * i for i in range(50000000)]})
    df.gandiva.query("a < 4.0")

    # This will output:
    # array([0, 1, 2, 3], dtype=uint32)
Note that the returned values are the indexes of the rows satisfying the condition, so this differs from the Pandas query(), which returns the data already filtered.
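Going from indexes back to filtered data is a simple gather. Here is a minimal sketch with plain Python lists standing in for the column and for the index array; select_indexes() and take() are hypothetical helpers, not Gandiva or Pandas functions. With a real DataFrame, positional indexing such as df.iloc[indexes] plays the role of take().

```python
def select_indexes(column, predicate):
    """Return the positions of the values that satisfy the predicate."""
    return [i for i, value in enumerate(column) if predicate(value)]

def take(column, indexes):
    """Gather the values found at the given positions."""
    return [column[i] for i in indexes]

column = [1.0 * i for i in range(10)]                  # stand-in for df["a"]
indexes = select_indexes(column, lambda a: a < 4.0)    # stand-in for the filter result
filtered = take(column, indexes)

print(indexes)   # [0, 1, 2, 3]
print(filtered)  # [0.0, 1.0, 2.0, 3.0]
```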
I did some benchmarks and found that Gandiva is usually faster than Pandas, however I’ll leave proper benchmarks for a next post on Gandiva, as this post was meant to show how you can use it to JIT expressions.
That’s it! I hope you liked the post as much as I enjoyed exploring Gandiva. It seems that we will probably see more and more tools coming up with Gandiva acceleration, especially for SQL parsing/projection/JITing. Gandiva is much more than what I just showed, but you can get started now to understand more of its architecture and how to build the expression trees.
- Christian S. Perone