[NewComm] No.3 compatible upgrade for global_scatter op #57161
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Newly added: grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0910 15:22:31.892735 53684 init.cc:101] Before Parse: argc is 2, Init commandline: dummy --tryfromenv=enable_record_memory,gemm_use_half_precision_compute_type,dygraph_debug,enable_new_ir_in_executor_trace_run,enable_opt_get_features,gpugraph_enable_hbm_table_collision_stat,apply_pass_to_program,einsum_opt,init_allocated_mem,inner_op_parallelism,sync_nccl_allreduce,new_ir_apply_inplace_pass,prim_enabled,pe_profile_fname,cpu_deterministic,rocksdb_path,gpugraph_debug_gpu_memory,selected_gpus,local_exe_sub_scope_limit,fuse_parameter_groups_size,free_when_no_cache_hit,enable_gpu_memory_usage_log_mb,embedding_deterministic,benchmark,use_autotune,stride_kernel_blacklist,tracer_mkldnn_ops_off,cudnn_deterministic,fraction_of_gpu_memory_to_use,get_host_by_name_time,enable_exit_when_partial_worker,eager_delete_tensor_gb,set_to_1d,low_precision_op_list,check_nan_inf,fuse_parameter_memory_size,enable_all2all_use_fp16,use_stream_safe_cuda_allocator,communicator_send_queue_size,use_system_allocator,enable_gpu_memory_usage_log,use_pinned_memory,enable_tracker_all2all,dist_threadpool_size,new_executor_use_inplace,auto_growth_chunk_size_in_mb,use_virtual_memory_auto_growth,executor_log_deps_every_microseconds,multiple_of_cupti_buffer_size,tensor_operants_mode,fleet_executor_with_standalone,tracer_profile_fname,npu_storage_format,gpugraph_dedup_pull_push_mode,new_executor_static_build,enable_api_kernel_fallback,gpugraph_merge_grads_segment_size,max_inplace_grad_add,communicator_max_merge_var_num,gpugraph_hbm_table_load_factor,nccl_blocking_wait,allreduce_record_one_event,gpugraph_sparse_table_storage_mode,use_fast_math,cudnn_exhaustive_search_times,gpugraph_slot_feasign_max_num,conv2d_disable_cudnn,use_mkldnn,call_stack_level,paddle_num_threads,enable_new_ir_in_executor,add_dependency_for_communication_op,run_kp_kernel,host_trace_level,enable_new_ir_api,gpugraph_storage_mode,memory_fraction_of_eager_deletion,eager_delete_scope,rpc_send_thread_num,gpugraph_enable_segment_merge_grads,
graph_metapath_split_opt,graph_load_in_parallel,fraction_of_cpu_memory_to_use,new_executor_use_cuda_graph,check_kernel_launch,fast_eager_deletion_mode,sort_sum_gradient,jit_engine_type,communicator_is_sgd_optimizer,new_executor_sequential_run,enable_auto_detect_gpu_topo,initial_cpu_memory_in_mb,enable_cublas_tensor_op_math,gpugraph_enable_gpu_direct_access,check_nan_inf_level,cache_inference_while_scope,allocator_strategy,trt_ibuilder_cache,cublaslt_exhaustive_search_times,dynamic_static_unified_comm,search_cache_max_number,use_stride_kernel,convert_all_blocks,conv_workspace_size_limit,cudnn_exhaustive_search,new_executor_serial_run,tracer_mkldnn_ops_on,reallocate_gpu_memory_in_mb,enable_unused_var_check,cudnn_batchnorm_spatial_persistent,static_executor_perfstat_filepath,use_shm_cache,enable_auto_rdma_trans,reader_queue_speed_test_mode,new_executor_use_local_scope,initial_gpu_memory_in_mb,enable_sparse_inner_gather,gpu_allocator_retry_time,fraction_of_cuda_pinned_memory_to_use,gpugraph_load_node_list_into_hbm,graph_get_neighbor_id,free_idle_chunk,gpu_memory_limit_mb,new_executor_log_memory_stats,use_cuda_managed_memory,print_sub_graph_dir
I0910 15:22:31.892868 53684 init.cc:109] After Parse: argc is 2
I0910 15:22:32.643769 53684 tcp_store.cc:355] input timeout900, member timeout:900
I0910 15:22:32.643843 53684 tcp_utils.cc:181] The server starts to listen on IP_ANY:53296
I0910 15:22:32.643960 53684 tcp_utils.cc:130] Successfully connected to 127.0.0.1:53296
I0910 15:22:32.643970 53684 tcp_store.cc:404] TCPStore add.
I0910 15:22:32.644040 53723 tcp_store.cc:184] TCPStore: recv command: 0.
I0910 15:22:32.644105 53684 tcp_store.cc:371] _timeout:900
I0910 15:22:32.644120 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:32.644137 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:32.683939 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:32.683996 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:32.684005 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:32.724041 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:32.734146 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:32.734194 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:32.773941 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:32.774014 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:32.774022 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:32.814013 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:32.824121 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:32.824159 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:32.863989 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:32.864090 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:32.864097 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:32.904033 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:32.914140 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:32.914165 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:32.953970 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:32.954056 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:32.954063 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:32.994036 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.004145 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.004209 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.043974 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.044070 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.044076 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.083971 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.094053 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.094079 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.133942 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.134018 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.134027 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.173967 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.184048 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.184082 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.223942 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.224020 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.224027 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.264015 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.274125 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.274154 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.313941 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.314004 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.314013 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.353960 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.364042 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.364080 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.403939 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.404006 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.404016 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.443964 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.454047 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.454074 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.493942 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.494009 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.494015 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.534009 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.544124 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.544167 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.583936 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.584005 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.584012 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.623982 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.634066 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.634114 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.673944 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.674026 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.674033 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.713986 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.724081 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.724117 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.763937 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.764003 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.764011 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.803968 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.814050 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.814075 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.853940 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.854010 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.854018 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.893970 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.904058 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.904088 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:33.943964 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:33.944046 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:33.944053 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:33.984021 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:33.994132 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:33.994169 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.033968 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.034060 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.034066 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.074012 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.084126 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.084185 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.123939 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.124006 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.124013 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.163995 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.174078 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.174104 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.213976 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.214069 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.214076 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.253995 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.264106 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.264165 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.303941 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.304004 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.304005 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.343986 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.354069 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.354105 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.393939 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.394006 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.394013 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.433961 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.444041 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.444068 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.483968 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.484066 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.484071 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.523962 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.534047 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.534078 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.573936 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.574005 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.574014 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.613967 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.624056 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.624097 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.663947 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.664019 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.664026 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.703999 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.714102 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.714160 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.753978 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.754066 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.754067 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.793984 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.804093 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.804136 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.843941 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.844010 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.844018 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.884006 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.894115 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.894150 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:34.933954 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:34.934036 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:34.934043 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:34.974030 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:34.984156 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:34.984256 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.023998 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.024104 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.024111 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.064045 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.074146 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.074179 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.113996 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.114099 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.114105 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.153986 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.164081 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.164114 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.203940 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.204005 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.204012 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.244006 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.254091 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.254135 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.293967 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.294062 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.294067 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.333967 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.344059 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.344134 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.383939 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.384008 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.384016 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.423967 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.434057 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.434098 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.473938 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.474007 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.474014 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.514032 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.524118 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.524158 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.563936 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.564002 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.564009 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.603968 53684 tcp_store.cc:376] 1 worker ready, total 2, _timeout:900
I0910 15:22:35.614053 53684 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.614086 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.653949 53723 tcp_store.cc:161] TCPStore: wait reply (1) for key (/init/).
I0910 15:22:35.653988 53723 tcp_store.cc:184] TCPStore: recv command: 0.
I0910 15:22:35.654021 53723 tcp_store.cc:184] TCPStore: recv command: 1.
I0910 15:22:35.654021 53684 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.693997 53684 tcp_store.cc:376] 2 worker ready, total 2, _timeout:900
I0910 15:22:35.693998 53723 tcp_store.cc:184] TCPStore: recv command: 3.
I0910 15:22:35.694027 53684 tcp_store.cc:400] TCPStore initialized.
I0910 15:22:35.694645 53684 dynamic_loader.cc:227] Try to find library: libnccl.so from default system path.
I0910 15:22:35.697638 53684 tcp_store.cc:411] TCPStore set.
I0910 15:22:35.697674 53723 tcp_store.cc:184] TCPStore: recv command: 2.
I0910 15:22:35.736965 53723 tcp_store.cc:92] TCPStore: notify the socket: 127.0.0.1:56784 that key: /NCCLCommContext/0581a4357a3dee491a2f0503cb7d60b1b is ready.
I0910 15:22:35.737056 53723 tcp_store.cc:184] TCPStore: recv command: 1.
W0910 15:22:36.585840 53684 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.7, Runtime API Version: 11.7
I0910 15:22:36.586236 53684 dynamic_loader.cc:227] Try to find library: libcudnn.so from default system path.
W0910 15:22:36.591588 53684 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
I0910 15:22:39.954592 53684 dynamic_loader.cc:227] Try to find library: libcuda.so from default system path.
I0910 15:22:39.962185 53684 op_desc.cc:1096] CompileTime infer shape on split
I0910 15:22:39.964063 53684 op_desc.cc:1096] CompileTime infer shape on concat
I0910 15:22:39.964169 53684 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.964838 53684 op_desc.cc:1096] CompileTime infer shape on alltoall
I0910 15:22:39.965381 53684 op_desc.cc:1096] CompileTime infer shape on split
I0910 15:22:39.965898 53684 op_desc.cc:1096] CompileTime infer shape on concat
I0910 15:22:39.965929 53684 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.966588 53684 op_desc.cc:1096] CompileTime infer shape on global_scatter
I0910 15:22:39.967795 53684 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.967818 53684 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.967839 53684 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.968827 53684 program_interpreter.cc:136] New Executor is Running.
I0910 15:22:39.968842 53684 interpreter_util.cc:1110] Creating Variables
I0910 15:22:39.968868 53684 scope.cc:203] Create variable feed
I0910 15:22:39.968890 53684 interpreter_util.cc:1142] Create Variable feed global, which pointer is 0x55f740428010 type is 9
I0910 15:22:39.968911 53684 scope.cc:203] Create variable fetch
I0910 15:22:39.968926 53684 interpreter_util.cc:1142] Create Variable fetch global, which pointer is 0x55f73a90b520 type is 10
I0910 15:22:39.968942 53684 interpreter_util.cc:567] Static build: 0
I0910 15:22:39.968946 53684 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.968951 53684 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.968955 53684 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.971614 53684 op_desc.cc:1096] CompileTime infer shape on fetch_v2
I0910 15:22:39.972440 53684 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.972462 53684 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.972467 53684 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.976342 53684 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6733c01200), and remaining 0
I0910 15:22:39.976502 53684 feed_fetch_method.cc:39] SetFeedVariable name=feed index=1
I0910 15:22:39.976624 53684 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6733c01400), and remaining 0
I0910 15:22:39.978617 53684 feed_fetch_method.cc:39] SetFeedVariable name=feed index=0
I0910 15:22:39.978644 53684 interpreter_util.cc:1110] Creating Variables
I0910 15:22:39.978660 53684 scope.cc:203] Create variable alltoall_0.tmp_0
I0910 15:22:39.978667 53684 interpreter_util.cc:1147] Create Variable alltoall_0.tmp_0 locally, which pointer is 0x55f73a927ce0 type is 7
I0910 15:22:39.978679 53684 scope.cc:203] Create variable concat_0.tmp_0
I0910 15:22:39.978683 53684 interpreter_util.cc:1147] Create Variable concat_0.tmp_0 locally, which pointer is 0x55f73a9251b0 type is 7
I0910 15:22:39.978688 53684 scope.cc:203] Create variable concat_1.tmp_0
I0910 15:22:39.978691 53684 interpreter_util.cc:1147] Create Variable concat_1.tmp_0 locally, which pointer is 0x55f73d8679f0 type is 7
I0910 15:22:39.978695 53684 interpreter_util.cc:1142] Create Variable feed global, which pointer is 0x55f740428010 type is 9
I0910 15:22:39.978700 53684 interpreter_util.cc:1142] Create Variable fetch global, which pointer is 0x55f73a90b520 type is 10
I0910 15:22:39.978704 53684 scope.cc:203] Create variable global_scatter_0.tmp_0
I0910 15:22:39.978709 53684 interpreter_util.cc:1147] Create Variable global_scatter_0.tmp_0 locally, which pointer is 0x55f73a928d40 type is 7
I0910 15:22:39.978713 53684 scope.cc:203] Create variable local_expert_count
I0910 15:22:39.978716 53684 interpreter_util.cc:1147] Create Variable local_expert_count locally, which pointer is 0x55f74050e410 type is 7
I0910 15:22:39.978721 53684 scope.cc:203] Create variable local_input_buf
I0910 15:22:39.978724 53684 interpreter_util.cc:1147] Create Variable local_input_buf locally, which pointer is 0x55f73a2bd520 type is 7
I0910 15:22:39.978729 53684 scope.cc:203] Create variable split_0.tmp_0
I0910 15:22:39.978732 53684 interpreter_util.cc:1147] Create Variable split_0.tmp_0 locally, which pointer is 0x55f74050eed0 type is 7
I0910 15:22:39.978742 53684 scope.cc:203] Create variable split_0.tmp_1
I0910 15:22:39.978746 53684 interpreter_util.cc:1147] Create Variable split_0.tmp_1 locally, which pointer is 0x55f74050f370 type is 7
I0910 15:22:39.978750 53684 scope.cc:203] Create variable split_1.tmp_0
I0910 15:22:39.978753 53684 interpreter_util.cc:1147] Create Variable split_1.tmp_0 locally, which pointer is 0x55f74050f810 type is 7
I0910 15:22:39.978757 53684 scope.cc:203] Create variable split_1.tmp_1
I0910 15:22:39.978761 53684 interpreter_util.cc:1147] Create Variable split_1.tmp_1 locally, which pointer is 0x55f74050fd40 type is 7
I0910 15:22:39.978894 53684 interpreter_util.cc:567] Static build: 0
I0910 15:22:39.978899 53684 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.978909 53684 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.978914 53684 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.979454 53684 operator.cc:2207] op type:feed, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.979519 53684 interpreter_util.cc:792] feed : finally selected kernel_key: {data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.979707 53684 operator.cc:2207] op type:feed, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.979714 53684 interpreter_util.cc:792] feed : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.979879 53684 operator.cc:2207] op type:split, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.979894 53684 interpreter_util.cc:792] split : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.979997 53684 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6733c01600), and remaining 0
I0910 15:22:39.980037 53684 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6733c01800), and remaining 0
I0910 15:22:39.980255 53684 operator.cc:2207] op type:concat, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.980266 53684 interpreter_util.cc:792] concat : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.980280 53684 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.980350 53684 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6733c01a00), and remaining 0
I0910 15:22:39.980427 53684 operator.cc:2207] op type:alltoall, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.980440 53684 interpreter_util.cc:792] alltoall : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.980484 53684 alltoall_op.cu.cc:76] new comm_context_manager has rid 0
I0910 15:22:39.980567 53684 nccl_comm_context.cc:150] rank 0 send 2 to 0
I0910 15:22:39.980578 53684 nccl_comm_context.cc:170] rank 0 recv 2 from 0
I0910 15:22:39.980587 53684 nccl_comm_context.cc:150] rank 0 send 2 to 1
I0910 15:22:39.980592 53684 nccl_comm_context.cc:170] rank 0 recv 2 from 1
I0910 15:22:39.982601 53684 alltoall_op.cu.cc:113] new comm_context_manager has rid 0
I0910 15:22:39.982666 53684 operator.cc:2207] op type:split, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.982681 53684 interpreter_util.cc:792] split : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.982769 53684 operator.cc:2207] op type:concat, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.982777 53684 interpreter_util.cc:792] concat : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.982790 53684 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.982868 53684 operator.cc:2207] op type:global_scatter, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.982880 53684 interpreter_util.cc:792] global_scatter : finally selected kernel_key: {data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0910 15:22:39.982947 53684 tensor_util.cc:309] TensorCopySync 4 from Place(gpu:0) to Place(cpu)
I0910 15:22:39.983554 53684 tensor_util.cc:309] TensorCopySync 4 from Place(gpu:0) to Place(cpu)
I0910 15:22:39.983585 53684 global_scatter_op.cu.cc:112] new comm_context_manager has ring_id 0
rank1: grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0910 15:22:31.887579 53686 init.cc:101] Before Parse: argc is 2, Init commandline: dummy --tryfromenv=auto_growth_chunk_size_in_mb,gpugraph_merge_grads_segment_size,new_executor_serial_run,fast_eager_deletion_mode,new_executor_sequential_run,new_ir_apply_inplace_pass,dygraph_debug,enable_exit_when_partial_worker,gpu_memory_limit_mb,use_system_allocator,apply_pass_to_program,use_stride_kernel,fraction_of_gpu_memory_to_use,new_executor_use_local_scope,get_host_by_name_time,graph_load_in_parallel,enable_gpu_memory_usage_log,embedding_deterministic,tensor_operants_mode,gpugraph_enable_segment_merge_grads,use_virtual_memory_auto_growth,check_nan_inf,allreduce_record_one_event,communicator_send_queue_size,conv_workspace_size_limit,enable_new_ir_api,allocator_strategy,fuse_parameter_groups_size,tracer_mkldnn_ops_off,add_dependency_for_communication_op,einsum_opt,inner_op_parallelism,set_to_1d,initial_gpu_memory_in_mb,gpugraph_enable_hbm_table_collision_stat,jit_engine_type,gpugraph_load_node_list_into_hbm,sync_nccl_allreduce,fraction_of_cuda_pinned_memory_to_use,graph_metapath_split_opt,use_cuda_managed_memory,gpugraph_debug_gpu_memory,selected_gpus,communicator_is_sgd_optimizer,cublaslt_exhaustive_search_times,cudnn_deterministic,local_exe_sub_scope_limit,reallocate_gpu_memory_in_mb,fuse_parameter_memory_size,new_executor_log_memory_stats,free_idle_chunk,dynamic_static_unified_comm,convert_all_blocks,tracer_profile_fname,gpugraph_enable_gpu_direct_access,nccl_blocking_wait,cudnn_batchnorm_spatial_persistent,free_when_no_cache_hit,new_executor_use_inplace,new_executor_static_build,search_cache_max_number,print_sub_graph_dir,prim_enabled,fleet_executor_with_standalone,reader_queue_speed_test_mode,npu_storage_format,low_precision_op_list,benchmark,gpugraph_storage_mode,cudnn_exhaustive_search,pe_profile_fname,paddle_num_threads,max_inplace_grad_add,enable_opt_get_features,enable_new_ir_in_executor,enable_gpu_memory_usage_log_mb,enable_tracker_all2all,gpugraph_dedup_pull_push_mode,use_autotune,enable_record_memory,gemm_use_half_precision_compute_type,check_nan_inf_level,check_kernel_launch,use_stream_safe_cuda_allocator,enable_api_kernel_fallback,static_executor_perfstat_filepath,gpugraph_sparse_table_storage_mode,cache_inference_while_scope,initial_cpu_memory_in_mb,enable_new_ir_in_executor_trace_run,enable_all2all_use_fp16,rocksdb_path,executor_log_deps_every_microseconds,enable_cublas_tensor_op_math,gpugraph_hbm_table_load_factor,rpc_send_thread_num,enable_auto_rdma_trans,init_allocated_mem,memory_fraction_of_eager_deletion,use_pinned_memory,run_kp_kernel,multiple_of_cupti_buffer_size,fraction_of_cpu_memory_to_use,eager_delete_tensor_gb,enable_unused_var_check,graph_get_neighbor_id,use_fast_math,cpu_deterministic,trt_ibuilder_cache,sort_sum_gradient,stride_kernel_blacklist,cudnn_exhaustive_search_times,enable_sparse_inner_gather,call_stack_level,gpu_allocator_retry_time,tracer_mkldnn_ops_on,eager_delete_scope,use_mkldnn,communicator_max_merge_var_num,host_trace_level,conv2d_disable_cudnn,dist_threadpool_size,use_shm_cache,gpugraph_slot_feasign_max_num,new_executor_use_cuda_graph,enable_auto_detect_gpu_topo
I0910 15:22:31.887701 53686 init.cc:109] After Parse: argc is 2
I0910 15:22:32.642262 53686 tcp_store.cc:355] input timeout900, member timeout:900
I0910 15:22:32.642366 53686 tcp_utils.cc:107] Retry to connect to 127.0.0.1:53296 while the server is not yet listening.
I0910 15:22:35.642525 53686 tcp_utils.cc:130] Successfully connected to 127.0.0.1:53296
I0910 15:22:35.642560 53686 tcp_store.cc:404] TCPStore add.
I0910 15:22:35.654054 53686 tcp_store.cc:400] TCPStore initialized.
I0910 15:22:35.654320 53686 tcp_store.cc:425] TCPStore wait.
I0910 15:22:35.737076 53686 tcp_store.cc:419] TCPStore get.
I0910 15:22:35.737514 53686 dynamic_loader.cc:227] Try to find library: libnccl.so from default system path.
W0910 15:22:36.586409 53686 gpu_resources.cc:119] Please NOTE: device: 1, GPU Compute Capability: 8.0, Driver API Version: 11.7, Runtime API Version: 11.7
I0910 15:22:36.586889 53686 dynamic_loader.cc:227] Try to find library: libcudnn.so from default system path.
W0910 15:22:36.593832 53686 gpu_resources.cc:149] device: 1, cuDNN Version: 8.4.
I0910 15:22:39.878001 53686 dynamic_loader.cc:227] Try to find library: libcuda.so from default system path.
I0910 15:22:39.882344 53686 op_desc.cc:1096] CompileTime infer shape on split
I0910 15:22:39.883431 53686 op_desc.cc:1096] CompileTime infer shape on concat
I0910 15:22:39.883482 53686 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.884037 53686 op_desc.cc:1096] CompileTime infer shape on alltoall
I0910 15:22:39.884549 53686 op_desc.cc:1096] CompileTime infer shape on split
I0910 15:22:39.885048 53686 op_desc.cc:1096] CompileTime infer shape on concat
I0910 15:22:39.885066 53686 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.885666 53686 op_desc.cc:1096] CompileTime infer shape on global_scatter
I0910 15:22:39.886638 53686 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.886660 53686 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.886673 53686 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.887354 53686 program_interpreter.cc:136] New Executor is Running.
I0910 15:22:39.887367 53686 interpreter_util.cc:1110] Creating Variables
I0910 15:22:39.887387 53686 scope.cc:203] Create variable feed
I0910 15:22:39.887399 53686 interpreter_util.cc:1142] Create Variable feed global, which pointer is 0x55f127fb77c0 type is 9
I0910 15:22:39.887411 53686 scope.cc:203] Create variable fetch
I0910 15:22:39.887418 53686 interpreter_util.cc:1142] Create Variable fetch global, which pointer is 0x55f126f14240 type is 10
I0910 15:22:39.887432 53686 interpreter_util.cc:567] Static build: 0
I0910 15:22:39.887436 53686 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.887440 53686 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.887444 53686 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.889797 53686 op_desc.cc:1096] CompileTime infer shape on fetch_v2
I0910 15:22:39.890475 53686 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.890491 53686 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.890496 53686 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.894100 53686 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6767c01200), and remaining 0
I0910 15:22:39.894214 53686 feed_fetch_method.cc:39] SetFeedVariable name=feed index=1
I0910 15:22:39.894300 53686 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6767c01400), and remaining 0
I0910 15:22:39.896188 53686 feed_fetch_method.cc:39] SetFeedVariable name=feed index=0
I0910 15:22:39.896209 53686 interpreter_util.cc:1110] Creating Variables
I0910 15:22:39.896217 53686 scope.cc:203] Create variable alltoall_0.tmp_0
I0910 15:22:39.896224 53686 interpreter_util.cc:1147] Create Variable alltoall_0.tmp_0 locally, which pointer is 0x55f12821b560 type is 7
I0910 15:22:39.896232 53686 scope.cc:203] Create variable concat_0.tmp_0
I0910 15:22:39.896235 53686 interpreter_util.cc:1147] Create Variable concat_0.tmp_0 locally, which pointer is 0x55f128237720 type is 7
I0910 15:22:39.896240 53686 scope.cc:203] Create variable concat_1.tmp_0
I0910 15:22:39.896245 53686 interpreter_util.cc:1147] Create Variable concat_1.tmp_0 locally, which pointer is 0x55f128236a70 type is 7
I0910 15:22:39.896248 53686 interpreter_util.cc:1142] Create Variable feed global, which pointer is 0x55f127fb77c0 type is 9
I0910 15:22:39.896260 53686 interpreter_util.cc:1142] Create Variable fetch global, which pointer is 0x55f126f14240 type is 10
I0910 15:22:39.896263 53686 scope.cc:203] Create variable global_scatter_0.tmp_0
I0910 15:22:39.896267 53686 interpreter_util.cc:1147] Create Variable global_scatter_0.tmp_0 locally, which pointer is 0x55f126c82050 type is 7
I0910 15:22:39.896271 53686 scope.cc:203] Create variable local_expert_count
I0910 15:22:39.896274 53686 interpreter_util.cc:1147] Create Variable local_expert_count locally, which pointer is 0x55f126c82530 type is 7
I0910 15:22:39.896279 53686 scope.cc:203] Create variable local_input_buf
I0910 15:22:39.896282 53686 interpreter_util.cc:1147] Create Variable local_input_buf locally, which pointer is 0x55f128219fb0 type is 7
I0910 15:22:39.896287 53686 scope.cc:203] Create variable split_0.tmp_0
I0910 15:22:39.896291 53686 interpreter_util.cc:1147] Create Variable split_0.tmp_0 locally, which pointer is 0x55f126c82fc0 type is 7
I0910 15:22:39.896294 53686 scope.cc:203] Create variable split_0.tmp_1
I0910 15:22:39.896297 53686 interpreter_util.cc:1147] Create Variable split_0.tmp_1 locally, which pointer is 0x55f126c83460 type is 7
I0910 15:22:39.896302 53686 scope.cc:203] Create variable split_1.tmp_0
I0910 15:22:39.896306 53686 interpreter_util.cc:1147] Create Variable split_1.tmp_0 locally, which pointer is 0x55f126c83920 type is 7
I0910 15:22:39.896309 53686 scope.cc:203] Create variable split_1.tmp_1
I0910 15:22:39.896313 53686 interpreter_util.cc:1147] Create Variable split_1.tmp_1 locally, which pointer is 0x55f126c83de0 type is 7
I0910 15:22:39.896425 53686 interpreter_util.cc:567] Static build: 0
I0910 15:22:39.896430 53686 conditional_block_op_helper.cc:108] Found conditional_block op num: 0, conditional_block_grad op num: 0
I0910 15:22:39.896433 53686 while_op_helper.cc:154] Found while op num: 0, while grad op num: 0
I0910 15:22:39.896438 53686 recurrent_op_helper.cc:259] Found recurrent op num: 0, recurrent grad op num: 0
I0910 15:22:39.896730 53686 operator.cc:2207] op type:feed, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.896760 53686 interpreter_util.cc:792] feed : finally selected kernel_key: {data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.896885 53686 operator.cc:2207] op type:feed, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.896894 53686 interpreter_util.cc:792] feed : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.896967 53686 operator.cc:2207] op type:split, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.896975 53686 interpreter_util.cc:792] split : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.897045 53686 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6767c01600), and remaining 0
I0910 15:22:39.897071 53686 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6767c01800), and remaining 0
I0910 15:22:39.897215 53686 operator.cc:2207] op type:concat, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.897225 53686 interpreter_util.cc:792] concat : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.897238 53686 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.897284 53686 auto_growth_best_fit_allocator.cc:118] Not found and reallocate 256(0x7f6767c01a00), and remaining 0
I0910 15:22:39.897341 53686 operator.cc:2207] op type:alltoall, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.897356 53686 interpreter_util.cc:792] alltoall : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.897393 53686 alltoall_op.cu.cc:76] new comm_context_manager has rid 0
I0910 15:22:39.897465 53686 nccl_comm_context.cc:150] rank 1 send 2 to 0
I0910 15:22:39.897475 53686 nccl_comm_context.cc:170] rank 1 recv 2 from 0
I0910 15:22:39.897480 53686 nccl_comm_context.cc:150] rank 1 send 2 to 1
I0910 15:22:39.897486 53686 nccl_comm_context.cc:170] rank 1 recv 2 from 1
I0910 15:22:39.900171 53686 alltoall_op.cu.cc:113] new comm_context_manager has rid 0
I0910 15:22:39.900234 53686 operator.cc:2207] op type:split, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.900246 53686 interpreter_util.cc:792] split : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.900324 53686 operator.cc:2207] op type:concat, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.900334 53686 interpreter_util.cc:792] concat : finally selected kernel_key: {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.900346 53686 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: concat; inputs: X; attributes: axis; outputs: Out
I0910 15:22:39.900403 53686 operator.cc:2207] op type:global_scatter, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.900411 53686 interpreter_util.cc:792] global_scatter : finally selected kernel_key: {data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.900449 53686 tensor_util.cc:309] TensorCopySync 4 from Place(gpu:1) to Place(cpu)
I0910 15:22:39.983577 53686 tensor_util.cc:309] TensorCopySync 4 from Place(gpu:1) to Place(cpu)
I0910 15:22:39.983641 53686 global_scatter_op.cu.cc:112] new comm_context_manager has ring_id 0
I0910 15:22:39.983664 53686 nccl_comm_context.cc:150] rank 1 send 6 to 0
I0910 15:22:39.983671 53686 nccl_comm_context.cc:170] rank 1 recv 2 from 0
I0910 15:22:39.983698 53686 nccl_comm_context.cc:150] rank 1 send 6 to 0
I0910 15:22:39.983704 53686 nccl_comm_context.cc:170] rank 1 recv 2 from 0
I0910 15:22:39.983714 53686 nccl_comm_context.cc:150] rank 1 send 4 to 0
I0910 15:22:39.983721 53686 nccl_comm_context.cc:170] rank 1 recv 4 from 0
I0910 15:22:39.983729 53686 nccl_comm_context.cc:150] rank 1 send 6 to 0
I0910 15:22:39.983736 53686 nccl_comm_context.cc:170] rank 1 recv 6 from 0
I0910 15:22:39.983796 53686 operator.cc:2207] op type:fetch_v2, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(cpu)]; library_type[PLAIN]}
I0910 15:22:39.983821 53686 interpreter_util.cc:792] fetch_v2 : finally selected kernel_key: {data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(cpu)]; library_type[PLAIN]}
I0910 15:22:39.983840 53686 scope.cc:203] Create variable global_scatter_0.tmp_0_device_Place(gpu:1)_Place(cpu)
I0910 15:22:39.983848 53686 data_transfer.cc:398] Create Variable global_scatter_0.tmp_0_device_Place(gpu:1)_Place(cpu) locally, which pointer is 0x55f128232cf0Variable Type 7
I0910 15:22:39.983886 53686 data_transfer.cc:441] Insert memcpy_d2h with global_scatter_0.tmp_0(Place(gpu:1)) -> global_scatter_0.tmp_0_device_Place(gpu:1)_Place(cpu)(Place(cpu)).
I0910 15:22:39.983919 53686 infershape_utils.cc:547] BuildInferMetaContext: op kernel signature - Kernel Signature - name: memcpy_d2h; inputs: X; attributes: dst_place_type; outputs: Out
I0910 15:22:39.983956 53686 operator.cc:2207] op type:memcpy_d2h, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:1)]; library_type[PLAIN]}
I0910 15:22:39.983986 53686 tensor_utils.cc:57] TensorCopy 7, 2 from Place(gpu:1) to Place(cpu) |
I'll try to reproduce this locally first.
"has ring_id attr."));

stream = comm_ctx->GetStream();
nranks = comm_ctx->GetRank();
This should be nranks = comm_ctx->GetSize(); rank is the index of this GPU within comm_ctx, while nranks is the total number of GPUs in comm_ctx.
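To illustrate the distinction in the comment above, here is a minimal sketch. MockCommContext and PeersToVisit are hypothetical stand-ins written for this example, not Paddle's real comm context class; only the method names GetRank and GetSize come from the review. The point is that a loop over peers must be bounded by the group size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a communication context (NOT Paddle's real class).
class MockCommContext {
 public:
  MockCommContext(int rank, int size) : rank_(rank), size_(size) {
    assert(rank >= 0 && rank < size);
  }
  int GetRank() const { return rank_; }  // index of this process in the group
  int GetSize() const { return size_; }  // total number of processes in the group

 private:
  int rank_;
  int size_;
};

// Lists the peers a process would exchange data with in an all-to-all style
// loop. The bound is the group size; the caller's own rank is not a count.
std::vector<int> PeersToVisit(const MockCommContext& ctx) {
  const int nranks = ctx.GetSize();
  std::vector<int> peers;
  for (int peer = 0; peer < nranks; ++peer) {
    peers.push_back(peer);
  }
  return peers;
}
```

With two processes, bounding the loop by GetRank() instead would make rank 1 visit only peer 0 and rank 0 visit no peers at all, whereas GetSize() gives both ranks the full peer list.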
OK, thanks!
@@ -20,6 +20,9 @@ limitations under the License. */
#include "paddle/phi/core/distributed/comm_context_manager.h"
@@ -13,11 +13,13 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/operators/collective/global_scatter_op.h"
#include "paddle/phi/core/distributed/comm_context_manager.h"

#if defined(PADDLE_WITH_NCCL) || defined(PADDLE_WITH_RCCL)
#include "paddle/fluid/platform/collective_helper.h"
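Move the header files newly added in global_scatter_op.h here as well. The NCCL library is not compiled in every configuration, so they need to sit inside the #if defined(PADDLE_WITH_NCCL) || defined(PADDLE_WITH_RCCL) macro guard.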
Do you mean moving the #if defined(PADDLE_WITH_NCCL) || defined(PADDLE_WITH_RCCL) guard from the header file into the .cu.cc file?
The header file already has this guard (line 18); put the headers newly included in global_scatter_op.h inside it. That is probably why PR-CI-PY3 is failing.
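The header file already has this guard, so wouldn't adding it to the .cu.cc as well be redundant? I looked at some other ops: some put it in the header and some in the .cu.cc. Could we settle on one convention here?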
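Put them all in the .cu.cc, then. Be sure to use the conditional macro to decide whether to include them: in some configurations NCCL and its related code are not compiled, and an unguarded include will make CI fail.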
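The guard pattern discussed above can be sketched as follows. This is an illustration only: kHasCollectiveBackend and BackendStatus are hypothetical names invented for this example, and the Paddle include is shown as a comment so the snippet compiles without the Paddle source tree. Only the macro names and the header path come from the diff.

```cpp
#include <string>

// Backend-only includes live inside the guard, so builds without NCCL/RCCL
// never see them. In the real .cu.cc this is where a line such as
//   #include "paddle/phi/core/distributed/comm_context_manager.h"
// would go; it stays a comment here to keep the sketch self-contained.
#if defined(PADDLE_WITH_NCCL) || defined(PADDLE_WITH_RCCL)
constexpr bool kHasCollectiveBackend = true;   // hypothetical flag
#else
constexpr bool kHasCollectiveBackend = false;  // hypothetical flag
#endif

// Reports which branch of the guard was compiled (hypothetical helper).
std::string BackendStatus() {
  return kHasCollectiveBackend ? "collective backend compiled in"
                               : "collective backend not compiled";
}
```

Compiling with -DPADDLE_WITH_NCCL selects the first branch; without either macro the guarded region is skipped entirely, which is the situation the reviewer warns about for builds that do not compile NCCL.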
PTAL @GhostScreaming
@@ -103,34 +142,62 @@ struct GlobalScatterFunctor<phi::GPUContext, T> {
}

auto recv_ptr = 0;
auto send_buf = x->data<T>();
auto recv_buf = out->mutable_data<T>(out_dims, place);
out->mutable_data<T>(out_dims, place);
This can be removed; otherwise PR-CI-APPROVE will not pass.
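As I understand it, out needs GPU memory allocated here, so this line cannot be removed. Looking at the PR-CI-APPROVE error, could phi::DeviceContext::Alloc() be used instead?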
Yes, that works too.
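The phi::DeviceContext::Alloc() mentioned in PR-CI-APPROVAL is presumably meant for phi operators, but global_scatter still belongs to the old operator system, so mutable_data has to stay. Let's request an exemption to merge.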
LGTM
…e#57161) * [NewComm] No.3 compatiable upgrade for op * fix * fix
PR types
Others
PR changes
APIs
Description
compatible upgrade for global_scatter op #57102