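Distributed training currently works: checkpoints are saved, and they can be restored to resume training. When the number of user_id values is extremely large, the model parameters become very large and presumably have to be spread across several ps (parameter server) nodes. In that setting I want to export user_embedding. Is there an example I can refer to? Starting from examples/graphsage/run_graphsage.py, I modified the script for distributed training and that works, but calling model_estimator.infer() directly on top of it does not seem to work.
When I run infer, the code, commands, and logs are as follows.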
Main part of the code:
```python
tf_config = {
    'cluster': {'chief': chief_hosts, 'worker': worker_hosts, 'ps': ps_hosts},
    'task': {'type': job_name, 'index': task_index}
}
if job_name == 'worker' and task_index == 0:
    tf_config['task'] = {"index": 0, "type": "chief"}
...
...
model = ...  # the GraphSAGE model from the example
config = tf.estimator.RunConfig(log_step_count_steps=None)
model_estimator = NodeEstimator(model, params, config)

if flags_obj.run_mode == 'train':
    model_estimator.train_and_evaluate()
elif flags_obj.run_mode == 'evaluate':
    model_estimator.evaluate()
elif flags_obj.run_mode == 'infer':
    model_estimator.infer()
else:
    raise ValueError('Run mode not exist!')
```
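The `tf_config` dict above only reaches `tf.estimator.RunConfig` once it is serialized into the `TF_CONFIG` environment variable, and `RunConfig` reads that variable when it is constructed, so the export has to happen before `RunConfig(...)` is called. The snippet elides the lines in between, so it is not visible whether this is done. A minimal sketch of that step, assuming the `--ps_hosts`, `--worker_hosts`, and `--chief_hosts` flags are comma-separated `host:port` lists; the helper names `build_tf_config` and `export_tf_config` are hypothetical:

```python
import json
import os


def build_tf_config(job_name, task_index, ps_hosts, worker_hosts, chief_hosts):
    """Mirror the snippet's tf_config logic.

    Host arguments are assumed to be comma-separated "host:port" strings.
    """
    def split(hosts):
        return [h for h in hosts.split(",") if h]

    tf_config = {
        "cluster": {
            "chief": split(chief_hosts),
            "worker": split(worker_hosts),
            "ps": split(ps_hosts),
        },
        "task": {"type": job_name, "index": task_index},
    }
    # Same remapping as the original snippet: worker 0 acts as the chief.
    if job_name == "worker" and task_index == 0:
        tf_config["task"] = {"type": "chief", "index": 0}
    return tf_config


def export_tf_config(tf_config):
    """Publish the config in the environment variable RunConfig looks for."""
    # RunConfig parses TF_CONFIG in its constructor, so call this first.
    os.environ["TF_CONFIG"] = json.dumps(tf_config)
```

Keeping the dict construction separate from the environment write makes it easy to print or log the exact cluster layout each process believes it belongs to, which is often the first thing to compare across machines when one role fails at startup.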
Commands:
```shell
python run_graphsage_distribute_new.py --job_name 'start_euler' --shard_idx 0 --shard_num ${shard_num} --data_dir ${data_dir} --zk_addr ${zk_addr} --zk_path ${zk_path}
python run_graphsage_distribute_new.py --job_name 'start_euler' --shard_idx 1 --shard_num ${shard_num} --data_dir ${data_dir} --zk_addr ${zk_addr} --zk_path ${zk_path}
python run_graphsage_distribute_new.py --job_name 'ps' --shard_num ${shard_num} --task_index 0 --ps_hosts ${ps_hosts} --worker_hosts ${worker_hosts} --chief_hosts ${chief_hosts} --zk_addr ${zk_addr} --zk_path ${zk_path}  # ps
```
The problem already appears as soon as this ps process is started (currently there is only one ps node).
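Since the failure shows up as soon as the ps process starts in infer mode, one thing worth checking is how the ps role is handled outside of training. `tf.estimator.train_and_evaluate` starts a server for a `ps` task and blocks in `server.join()`, whereas `Estimator.predict` has no such branch, so a ps process that simply calls `model_estimator.infer()` acts as a client instead of hosting variables. Whether `NodeEstimator.train_and_evaluate()` and `NodeEstimator.infer()` wrap those two TensorFlow calls is an assumption here, not something confirmed from Euler's source. A sketch of a role-aware dispatch; `choose_action`, `dispatch`, and `serve_fn` are hypothetical names, with `serve_fn` standing for "start the ps server and join it":

```python
def choose_action(job_name, run_mode):
    """Return what this process should do: an estimator method name or "serve".

    Assumption (not verified against Euler's source): train_and_evaluate()
    already handles the ps role itself, like tf.estimator.train_and_evaluate,
    while evaluate() and infer() do not start a server for ps tasks.
    """
    if run_mode not in ("train", "evaluate", "infer"):
        raise ValueError("Run mode not exist!")
    if run_mode == "train":
        return "train_and_evaluate"
    if job_name == "ps":
        # Outside training, a ps task should only host variables.
        return "serve"
    return run_mode


def dispatch(model_estimator, job_name, run_mode, serve_fn):
    """Run the chosen action and return its name.

    In a TF 1.x script, serve_fn might look like (hypothetical wiring):
        server = tf.train.Server(tf.train.ClusterSpec(tf_config["cluster"]),
                                 job_name="ps", task_index=task_index)
        server.join()
    """
    action = choose_action(job_name, run_mode)
    if action == "serve":
        serve_fn()
    else:
        getattr(model_estimator, action)()
    return action
```

Separating the decision (`choose_action`) from the side effects keeps the role logic testable without a cluster; the actual server wiring stays in one small callback.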
Logs: (screenshot, not preserved)
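On the original question of exporting user_embedding: `tf.estimator.Estimator.predict()` returns a generator that, with its default `yield_single_examples=True`, yields one dict per example, so embeddings can be streamed to disk without holding all of them in memory. A minimal sketch of only the writing step, assuming the model's predictions dict carries the hypothetical keys `id` and `embedding`; it does not cover how `NodeEstimator.infer()` builds its input or where it writes results:

```python
def format_embedding_line(node_id, vector):
    """Format one node as "<id>\t<v1>,<v2>,...\n"."""
    if isinstance(node_id, bytes):  # string ids often come back as bytes
        node_id = node_id.decode("utf-8")
    values = ",".join(repr(float(x)) for x in vector)
    return "{}\t{}\n".format(node_id, values)


def write_embeddings(predictions, write, id_key="id", emb_key="embedding"):
    """Stream predictions to `write` (e.g. an open file's .write), one line each.

    `predictions` is any iterable of dicts, such as the generator returned by
    Estimator.predict(). The key names are hypothetical and must match the
    dict the model_fn returns as its predictions.
    """
    count = 0
    for pred in predictions:
        write(format_embedding_line(pred[id_key], pred[emb_key]))
        count += 1
    return count
```

Passing a `write` callable rather than a path keeps the function independent of where the output goes. With several workers, each one could write its own file (for example named after its task index, a hypothetical convention) and the files can be concatenated afterwards.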