diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023190638447.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023190638447.png
new file mode 100644
index 00000000..d2833aa3
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023190638447.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023190739105.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023190739105.png
new file mode 100644
index 00000000..4e4b5f37
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023190739105.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023191023520.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023191023520.png
new file mode 100644
index 00000000..cc428c3f
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023191023520.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023191418964.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023191418964.png
new file mode 100644
index 00000000..9334ee22
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023191418964.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023192202652.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023192202652.png
new file mode 100644
index 00000000..10bdc009
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023192202652.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023193003109.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023193003109.png
new file mode 100644
index 00000000..a54a6dd4
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241023193003109.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024003015103.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024003015103.png
new file mode 100644
index 00000000..426f0292
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024003015103.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024003306137.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024003306137.png
new file mode 100644
index 00000000..9dac6973
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024003306137.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024004057004.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024004057004.png
new file mode 100644
index 00000000..0b5ca8f7
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024004057004.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024004534988.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024004534988.png
new file mode 100644
index 00000000..97f6f42f
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024004534988.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024155638518.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024155638518.png
new file mode 100644
index 00000000..94c02067
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024155638518.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024155644457.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024155644457.png
new file mode 100644
index 00000000..94c02067
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024155644457.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024223736231.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024223736231.png
new file mode 100644
index 00000000..789c393d
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024223736231.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024223821441.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024223821441.png
new file mode 100644
index 00000000..893a12ae
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024223821441.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024224556795.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024224556795.png
new file mode 100644
index 00000000..9a321561
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024224556795.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024231404936.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024231404936.png
new file mode 100644
index 00000000..8ed0d3aa
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024231404936.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024233326719.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024233326719.png
new file mode 100644
index 00000000..0d53fe62
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241024233326719.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025000808785.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025000808785.png
new file mode 100644
index 00000000..441e171d
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025000808785.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025000830757.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025000830757.png
new file mode 100644
index 00000000..2aca45bb
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025000830757.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025001912800.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025001912800.png
new file mode 100644
index 00000000..a8436fc4
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025001912800.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025002541833.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025002541833.png
new file mode 100644
index 00000000..def86df8
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025002541833.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025002636784.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025002636784.png
new file mode 100644
index 00000000..a55e44e6
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025002636784.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025163555325.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025163555325.png
new file mode 100644
index 00000000..13991297
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025163555325.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025165407625.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025165407625.png
new file mode 100644
index 00000000..7caa4002
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025165407625.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025165929249.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025165929249.png
new file mode 100644
index 00000000..34f8b95d
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241025165929249.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241114221453686.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241114221453686.png
new file mode 100644
index 00000000..76e4ab03
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241114221453686.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241114221542442.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241114221542442.png
new file mode 100644
index 00000000..e1bd91ba
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241114221542442.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241120215102033.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241120215102033.png
new file mode 100644
index 00000000..7d30ba08
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241120215102033.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204212515783.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204212515783.png
new file mode 100644
index 00000000..115999dd
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204212515783.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204212755148.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204212755148.png
new file mode 100644
index 00000000..f8719e1a
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204212755148.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204213921661.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204213921661.png
new file mode 100644
index 00000000..fdada5ae
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204213921661.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204214306982.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204214306982.png
new file mode 100644
index 00000000..36c8f563
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204214306982.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204223014677.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204223014677.png
new file mode 100644
index 00000000..61922fdb
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204223014677.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204223209935.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204223209935.png
new file mode 100644
index 00000000..bdac8269
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204223209935.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204224230809.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204224230809.png
new file mode 100644
index 00000000..31b061ab
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204224230809.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204224314354.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204224314354.png
new file mode 100644
index 00000000..63a7a5e6
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204224314354.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204225741522.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204225741522.png
new file mode 100644
index 00000000..4d39126b
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241204225741522.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205210425837.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205210425837.png
new file mode 100644
index 00000000..8eaac3b4
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205210425837.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205211007895.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205211007895.png
new file mode 100644
index 00000000..822996b2
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205211007895.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205213546698.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205213546698.png
new file mode 100644
index 00000000..cfae7332
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205213546698.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205213844749.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205213844749.png
new file mode 100644
index 00000000..e01dc1b8
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205213844749.png differ
diff --git a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205214725649.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205214725649.png
new file mode 100644
index 00000000..ae81ca8d
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205214725649.png differ
diff --git
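a/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205214751624.png b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205214751624.png
new file mode 100644
index 00000000..48fc1d17
Binary files /dev/null and b/WeeklyReports/Hackathon_7th/08_zty-king/images/image-20241205214751624.png differ
diff --git "a/WeeklyReports/Hackathon_7th/08_zty-king/images/vpp\345\216\273\345\260\276.png" "b/WeeklyReports/Hackathon_7th/08_zty-king/images/vpp\345\216\273\345\260\276.png"
new file mode 100644
index 00000000..bd1ba9ed
Binary files /dev/null and "b/WeeklyReports/Hackathon_7th/08_zty-king/images/vpp\345\216\273\345\260\276.png" differ
diff --git "a/WeeklyReports/Hackathon_7th/08_zty-king/vpp\345\216\273\345\260\276\345\210\207\345\210\206\347\255\226\347\225\245_2024.12.5.md" "b/WeeklyReports/Hackathon_7th/08_zty-king/vpp\345\216\273\345\260\276\345\210\207\345\210\206\347\255\226\347\225\245_2024.12.5.md"
new file mode 100644
index 00000000..ef103bb2
--- /dev/null
+++ "b/WeeklyReports/Hackathon_7th/08_zty-king/vpp\345\216\273\345\260\276\345\210\207\345\210\206\347\255\226\347\225\245_2024.12.5.md"
@@ -0,0 +1,117 @@
+# VPP tail-trimming work
+
+## 1. Check whether a model layer count of (vpp_degree*pp_degree-1) is currently supported
+
+### 1.1 Trace the source of the error
+
+Run with vpp_degree=2, pp_degree=2, hidden_layer=7.
+
+![image-20241204212515783](images/image-20241204212515783.png)
+
+The run fails on an assert check. Tracing the relevant code:
+
+![image-20241204212755148](images/image-20241204212755148.png)
+
+The code checks that the number of segment struct names is an integer multiple of num_chunks. It also contains several definitions, such as the segmentation method and the segment struct names.
+
+## 2. Trace a run with vpp_degree=2, pp_degree=2, hidden_layer=8
+
+### 2.1 Trace seg_method
+
+Start by tracing seg_method.
+
+![image-20241204213921661](images/image-20241204213921661.png)
+
+![image-20241204214306982](images/image-20241204214306982.png)
+
+For VPP, the segmentation method defaults to llamaDecoderLayerAuto.
+
+### 2.2 Trace seg_struct_names
+
+![image-20241204223014677](images/image-20241204223014677.png)
+
+![image-20241204224230809](images/image-20241204224230809.png)
+
+![image-20241204224314354](images/image-20241204224314354.png)
+
+Reading `_extract_seg_method` shows that it uses a regular expression to match ops whose struct_name matches seg_method (case-insensitive), i.e. ops whose struct_name is llamaDecoderLayerAuto. `_get_seg_struct_names` then finds the start index of the forward ops by scanning all ops from front to back for the first llamaDecoderLayerAuto, and the end index by scanning from back to front for the first llamaDecoderLayerAuto. Printing the result shows that the number of names equals the number of hidden layers set for training. ![image-20241204225741522](images/image-20241204225741522.png)
+
+### 2.3 Trace complete_chunk_id, the core code of the vpp_reshard part
+
+#### step1:
+
+![image-20241205210425837](images/image-20241205210425837.png)
+
+Fetch some important values. sub_process_meshes holds the sub process meshes, which describe how the computation is distributed across devices.
+
+Fetch all operators, i.e. ops.
+
+
+
+#### step2,3:
+
+![image-20241205211007895](images/image-20241205211007895.png)
+
+If the user has not specified an assignment method, the VPP strategy is used. seg_pp_stages, seg_chunk_ids and seg_layer_num are computed following the VPP scheme, with this result:
+
+![image-20241205213546698](images/image-20241205213546698.png)
+
+Chunks must be interleaved across devices, so the chunk ids held by any one device are all distinct. In this scenario, for example, device 0 holds chunk 0 and chunk 1, and each chunk holds 2 layers. The logic that controls this is in later code and is not recorded in detail here.
+
+The first layer's start index is 0, and the last layer's end index is the index of the last op. For the other layers, the op list is traversed until the first op whose struct_name equals the current layer's name is found; in other words, the index of the first op of each layer is recorded, as follows:
+
+![image-20241205213844749](images/image-20241205213844749.png)
+
+#### step4:
+
+![image-20241205214725649](images/image-20241205214725649.png)
+
+![image-20241205214751624](images/image-20241205214751624.png)
+
+This step is the main logic of the VPP reshard. start_index and stop_index record the start and end op indices of all layers in the current chunk. Because seg_pp_stages is obtained by taking each chunk index modulo pp_degree, looping over num_chunks follows the VPP rule: chunks are placed in order of increasing pp_stage, wrapping around, so that the chunks with chunk id 0 are placed once across all devices, and then the chunks with chunk id 1 are placed once across all devices.
+
+## 3. Development plan
+
+
+
+### 3.1 Change the assert
+
+Accept either exact divisibility or `(len(seg_struct_names)+1) % num_chunks == 0`, i.e. allow one missing layer. In the one-missing-layer case, num_chunks must not equal `len(seg_struct_names)+1` (the chunk count must not equal the layer count plus one); otherwise removing one layer leaves one chunk empty, which is not supported for now.
+
+```
+assert (
+    (len(seg_struct_names) % num_chunks == 0) or ((len(seg_struct_names)+1) % num_chunks == 0 and (len(seg_struct_names)+1) // num_chunks != 1)
+)
+```
+
+### 3.2 Design for one missing layer
+
+![vpp去尾](images/vpp去尾.png)
+
+Start with the left-hand diagram, where the pipeline has one layer fewer, and analyze which values change and affect the result:
+
+```
+    sub_process_meshes = get_sub_process_mesh_by_program(dist_program)
+    pp_degree = pipeline_strategy.pp_degree
+    vpp_degree = pipeline_strategy.vpp_degree
+    seg_method = pipeline_strategy.vpp_seg_method
+    schedule_mode = pipeline_strategy.schedule_mode
+    num_chunks = pp_degree * vpp_degree
+    seg_pp_stages = [i % pp_degree for i in range(num_chunks)]
+    seg_chunk_ids = [i // pp_degree for i in range(num_chunks)]
+    seg_layer_num = len(seg_struct_names) // num_chunks
+```
+
+Look at the key parameters `seg_pp_stages`, `seg_chunk_ids` and `seg_layer_num`. `seg_pp_stages` stays [0,1,0,1] and `seg_chunk_ids` stays [0,0,1,1], but `seg_layer_num` changes from `2` to `1`. According to the diagram, we still want `seg_layer_num` to be `2`, with only the last chunk having a `seg_layer_num` of 1. So seg_layer_num is replaced here with a list that has one entry per chunk, as in the following code:
+
+```
+seg_layer_num = [0] * num_chunks  # number of layers in each chunk
+for j in range(len(seg_struct_names)):  # hand out layers chunk by chunk so every chunk gets one layer first; this also supports removing several layers, as long as every chunk keeps at least one
+    i = j % num_chunks
+    seg_layer_num[i] = seg_layer_num[i] + 1
+
+```
+
+Then replace every later use of seg_layer_num with seg_layer_num[seg_id].
+
+Checking against the right-hand case shows that it is supported as well.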
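
The proposal in sections 3.1 and 3.2 can be checked in isolation with a small standalone sketch. It combines the relaxed assert with the per-chunk layer count and runs without Paddle. The function names `split_layers_per_chunk` and `chunk_layer_ranges` are hypothetical and are not part of the patched code; the sketch also assumes that each chunk takes a contiguous run of layers, which is how start_index and stop_index are described in step4.

```python
def split_layers_per_chunk(num_layers, pp_degree, vpp_degree):
    """Hypothetical helper: per-chunk stage ids, chunk ids and layer counts."""
    num_chunks = pp_degree * vpp_degree

    # Relaxed check from section 3.1: either an exact multiple of num_chunks,
    # or exactly one layer short of a multiple while no chunk ends up empty.
    assert (num_layers % num_chunks == 0) or (
        (num_layers + 1) % num_chunks == 0
        and (num_layers + 1) // num_chunks != 1
    ), "unsupported layer count for this pp_degree/vpp_degree"

    seg_pp_stages = [i % pp_degree for i in range(num_chunks)]
    seg_chunk_ids = [i // pp_degree for i in range(num_chunks)]

    # Section 3.2: hand out layers one chunk at a time, so that only the
    # trailing chunk(s) can end up with fewer layers.
    seg_layer_num = [0] * num_chunks
    for j in range(num_layers):
        seg_layer_num[j % num_chunks] += 1
    return seg_pp_stages, seg_chunk_ids, seg_layer_num


def chunk_layer_ranges(seg_layer_num):
    """Hypothetical helper: half-open [start, stop) layer range of each chunk,
    assuming chunks take consecutive layers."""
    ranges, start = [], 0
    for count in seg_layer_num:
        ranges.append((start, start + count))
        start += count
    return ranges


stages, chunk_ids, layer_num = split_layers_per_chunk(7, pp_degree=2, vpp_degree=2)
print(stages)                         # [0, 1, 0, 1]
print(chunk_ids)                      # [0, 0, 1, 1]
print(layer_num)                      # [2, 2, 2, 1]
print(chunk_layer_ranges(layer_num))  # [(0, 2), (2, 4), (4, 6), (6, 7)]
```

With 7 layers only the last chunk is one layer short, which matches the left-hand diagram. With 3 layers and 4 chunks the assert fails, because one chunk would be empty.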