[Question] Why execute_batch_tasks function was called twice in sirmordred.py ? #435

Closed
heming6666 opened this issue Mar 19, 2020 · 9 comments · Fixed by #436

Comments

@heming6666
Contributor

As mentioned in this issue, I went through the code of sirmordred.py and found that the execute_batch_tasks function is called twice with the same parameters, which means that all tasks are executed twice when the update attribute is set to False. Is there a specific reason for this?

if not self.conf['general']['update']:
    sleep_for = self.conf['sortinghat']['sleep_for'] if self.conf.get('sortinghat', None) else 1
    self.execute_batch_tasks(all_tasks_cls,
                             sleep_for,
                             self.conf['general']['min_update_delay'])
    self.execute_batch_tasks(all_tasks_cls,
                             sleep_for,
                             self.conf['general']['min_update_delay'])
    break
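
To make the effect concrete, here is a minimal, self-contained sketch of the pattern. It is not the actual SirMordred code: `run_without_update`, the inner stand-in for `execute_batch_tasks`, and the configuration values are all hypothetical and only count how often each task would run.

```python
from collections import Counter


def run_without_update(conf, task_names, duplicate_call):
    """Return how many times each task ran in the non-update branch."""
    runs = Counter()

    def execute_batch_tasks(names, sleep_for, min_update_delay):
        # Hypothetical stand-in for the real method: it only records executions.
        for name in names:
            runs[name] += 1

    while True:
        if not conf['general']['update']:
            sleep_for = conf['sortinghat']['sleep_for'] if conf.get('sortinghat', None) else 1
            execute_batch_tasks(task_names, sleep_for, conf['general']['min_update_delay'])
            if duplicate_call:
                # The second, identical call questioned in this issue.
                execute_batch_tasks(task_names, sleep_for, conf['general']['min_update_delay'])
            break
        raise NotImplementedError("update mode is outside the scope of this sketch")
    return runs


# Illustrative configuration and task names (assumed values, not a real setup.cfg).
conf = {'general': {'update': False, 'min_update_delay': 60}}
tasks = ['TaskRawDataCollection', 'TaskEnrich']

twice = run_without_update(conf, tasks, duplicate_call=True)
once = run_without_update(conf, tasks, duplicate_call=False)
```

In this sketch every task is recorded twice when the duplicate call is present and once when it is removed, which is the behaviour described above.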
@valeriocos
Member

Thank you @heming6666 ! I checked the history of that file, but it isn't clear why this call is duplicated. It was added here: 99e2b28.

Can you run two executions of sirmordred (the first with the two statements and the second with just one) and check whether the results are consistent?

In order to run mordred you have to execute the script at https://github.com/chaoss/grimoirelab-sirmordred/tree/master/bin with a setup.cfg as input:
sirmordred --config <foo>/setup.cfg
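
One possible way to check consistency is to compare the document count of each index after both runs. The sketch below assumes the counts have already been collected by hand (for example from Elasticsearch's `_cat/indices` output); the helper `diff_index_counts`, the index names, and the numbers are illustrative assumptions, not measured results.

```python
def diff_index_counts(run_a, run_b):
    """Return {index: (count_a, count_b)} for every index whose counts differ."""
    differences = {}
    for index in sorted(set(run_a) | set(run_b)):
        count_a, count_b = run_a.get(index), run_b.get(index)
        if count_a != count_b:
            differences[index] = (count_a, count_b)
    return differences


# Made-up counts for illustration only.
with_two_calls = {'git_test': 120, 'git_aoc_test-enriched': 340}
with_one_call = {'git_test': 120, 'git_aoc_test-enriched': 340}

differences = diff_index_counts(with_two_calls, with_one_call)
```

An empty result means both runs produced the same indexes with the same number of documents; an index missing from one run shows up with `None` as its count.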

@heming6666
Contributor Author

Ok! I will check that and update the result here.

@valeriocos
Member

Thanks!

@heming6666
Contributor Author

heming6666 commented Mar 19, 2020

It seems that everything still works fine after removing the duplicate call.

First, I ran mordred with the two statements and got the correct dashboard and logs, as shown below.

image

2020-03-20 00:00:16,219 - sirmordred.sirmordred - INFO - -----------------Going to call execute_batch_tasks function for the first time--------------------
2020-03-20 00:00:16,219 - sirmordred.sirmordred - DEBUG - backend_tasks = [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:00:16,219 - sirmordred.sirmordred - DEBUG - global_tasks = [<class 'sirmordred.task_projects.TaskProjects'>]
2020-03-20 00:00:16,220 - sirmordred.task_manager - DEBUG - [git] Task starts
2020-03-20 00:00:16,220 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:00:16,222 - sirmordred.task_manager - DEBUG - [github] Task starts
2020-03-20 00:00:16,222 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:00:16,223 - sirmordred.task_manager - DEBUG - [Global tasks] Task starts
2020-03-20 00:00:16,223 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_projects.TaskProjects'>]
2020-03-20 00:00:16,223 - sirmordred.task_manager - DEBUG - [Global tasks] Tasks will be executed in this order: [<sirmordred.task_projects.TaskProjects object at 0x7f4b3e21ac18>]
2020-03-20 00:00:16,223 - sirmordred.sirmordred - INFO - TaskProjects will be executed on Fri, 20 Mar 2020 00:01:56
2020-03-20 00:00:16,248 - sirmordred.task_manager - DEBUG - [git] Tasks will be executed in this order: [<sirmordred.task_collection.TaskRawDataCollection object at 0x7f4b634a9cc0>, <sirmordred.task_enrich.TaskEnrich object at 0x7f4b3e20f518>]
2020-03-20 00:00:16,248 - sirmordred.task_manager - DEBUG - [github] Tasks will be executed in this order: [<sirmordred.task_collection.TaskRawDataCollection object at 0x7f4b3e20ffd0>, <sirmordred.task_enrich.TaskEnrich object at 0x7f4b3e20fda0>]
2020-03-20 00:00:17,224 - sirmordred.task_manager - DEBUG - [Global tasks] Tasks started: <sirmordred.task_projects.TaskProjects object at 0x7f4b3e21ac18>
2020-03-20 00:00:17,224 - sirmordred.task_projects - INFO - Reading projects data from  /projects.json
2020-03-20 00:00:17,224 - sirmordred.task_manager - DEBUG - [Global tasks] Tasks finished: <sirmordred.task_projects.TaskProjects object at 0x7f4b3e21ac18>
2020-03-20 00:00:17,249 - sirmordred.task_manager - DEBUG - [github] Tasks started: <sirmordred.task_collection.TaskRawDataCollection object at 0x7f4b3e20ffd0>
2020-03-20 00:00:17,249 - sirmordred.task_collection - INFO - [github] collection phase starts
2020-03-20 00:00:17,249 - sirmordred.task_projects - DEBUG - List of repos for github: ['https://github.com/heming6666/grimoirelab-test'] (raw=True)
......
2020-03-20 00:00:31,252 - sirmordred.task_enrich - INFO - [git] studies phase end
2020-03-20 00:00:31,252 - sirmordred.task_enrich - INFO - [git] autorefresh for studies not active
2020-03-20 00:00:31,252 - sirmordred.task_manager - DEBUG - [git] Tasks finished: <sirmordred.task_enrich.TaskEnrich object at 0x7f4b3e20f518>
2020-03-20 00:00:31,252 - sirmordred.task_manager - DEBUG - [git] Task is exiting
2020-03-20 00:00:31,252 - sirmordred.sirmordred - DEBUG - [thread:main] No exceptions in threads queue. Let's continue ..
2020-03-20 00:00:31,252 - sirmordred.sirmordred - DEBUG - [thread:main] All threads (and their tasks) are finished
2020-03-20 00:00:31,253 - sirmordred.sirmordred - INFO - -----------------Going to call execute_batch_tasks function for the second time --------------------
2020-03-20 00:00:31,253 - sirmordred.sirmordred - DEBUG - backend_tasks = [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:00:31,253 - sirmordred.sirmordred - DEBUG - global_tasks = [<class 'sirmordred.task_projects.TaskProjects'>]
2020-03-20 00:00:31,253 - sirmordred.task_manager - DEBUG - [git] Task starts
2020-03-20 00:00:31,253 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:00:31,255 - sirmordred.task_manager - DEBUG - [github] Task starts
2020-03-20 00:00:31,255 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:00:31,256 - sirmordred.task_manager - DEBUG - [Global tasks] Task starts
2020-03-20 00:00:31,256 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_projects.TaskProjects'>]
2020-03-20 00:00:31,256 - sirmordred.task_manager - DEBUG - [Global tasks] Tasks will be executed in this order: [<sirmordred.task_projects.TaskProjects object at 0x7f4b3ce9a470>]
2020-03-20 00:00:31,256 - sirmordred.sirmordred - INFO - TaskProjects will be executed on Fri, 20 Mar 2020 00:02:11
2020-03-20 00:00:31,353 - sirmordred.task_manager - DEBUG - [github] Tasks will be executed in this order: [<sirmordred.task_collection.TaskRawDataCollection object at 0x7f4b3e21a668>, <sirmordred.task_enrich.TaskEnrich object at 0x7f4b3e21a278>]
2020-03-20 00:00:31,354 - sirmordred.task_manager - DEBUG - [git] Tasks will be executed in this order: [<sirmordred.task_collection.TaskRawDataCollection object at 0x7f4b4938de48>, <sirmordred.task_enrich.TaskEnrich object at 0x7f4b3e20f2e8>]
2020-03-20 00:00:32,257 - sirmordred.task_manager - DEBUG - [Global tasks] Tasks started: <sirmordred.task_projects.TaskProjects object at 0x7f4b3ce9a470>
2020-03-20 00:00:32,257 - sirmordred.task_projects - INFO - Reading projects data from  /projects.json
2020-03-20 00:00:32,258 - sirmordred.task_manager - DEBUG - [Global tasks] Tasks finished: <sirmordred.task_projects.TaskProjects object at 0x7f4b3ce9a470>
2020-03-20 00:00:32,355 - sirmordred.task_manager - DEBUG - [github] Tasks started: <sirmordred.task_collection.TaskRawDataCollection object at 0x7f4b3e21a668>
2020-03-20 00:00:32,355 - sirmordred.task_collection - INFO - [github] collection phase starts
2020-03-20 00:00:32,355 - sirmordred.task_projects - DEBUG - List of repos for github: ['https://github.com/heming6666/grimoirelab-test'] (raw=True)
......
2020-03-20 00:00:46,302 - sirmordred.task_manager - DEBUG - [git] Tasks finished: <sirmordred.task_enrich.TaskEnrich object at 0x7f4b3e20f2e8>
2020-03-20 00:00:46,306 - sirmordred.task_manager - DEBUG - [git] Task is exiting
2020-03-20 00:00:46,306 - sirmordred.sirmordred - DEBUG - [thread:main] No exceptions in threads queue. Let's continue ..
2020-03-20 00:00:46,307 - sirmordred.sirmordred - DEBUG - [thread:main] All threads (and their tasks) are finished
2020-03-20 00:00:46,307 - sirmordred.sirmordred - INFO - Finished SirMordred engine ...

Second, I ran it again with only one statement. Everything still worked fine, and the logs were similar to the previous ones.

2020-03-20 00:26:35,086 - sirmordred.sirmordred - INFO - ------------------Going to call execute_batch_tasks function for the first time--------------------
2020-03-20 00:26:35,086 - sirmordred.sirmordred - DEBUG - backend_tasks = [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:26:35,086 - sirmordred.sirmordred - DEBUG - global_tasks = [<class 'sirmordred.task_projects.TaskProjects'>]
2020-03-20 00:26:35,087 - sirmordred.task_manager - DEBUG - [git] Task starts
2020-03-20 00:26:35,087 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
2020-03-20 00:26:35,089 - sirmordred.task_manager - DEBUG - [github] Task starts
2020-03-20 00:26:35,089 - sirmordred.task_manager - DEBUG - [<class 'sirmordred.task_collection.TaskRawDataCollection'>, <class 'sirmordred.task_enrich.TaskEnrich'>]
......
2020-03-20 00:26:52,199 - grimoire_elk.enriched.enrich - INFO - [git] study onion end
2020-03-20 00:26:52,199 - sirmordred.task_enrich - INFO - [git] studies phase end
2020-03-20 00:26:52,199 - sirmordred.task_enrich - INFO - [git] autorefresh for studies not active
2020-03-20 00:26:52,199 - sirmordred.task_manager - DEBUG - [git] Tasks finished: <sirmordred.task_enrich.TaskEnrich object at 0x7f933a9e2f28>
2020-03-20 00:26:52,199 - sirmordred.task_manager - DEBUG - [git] Task is exiting
2020-03-20 00:26:52,200 - sirmordred.sirmordred - DEBUG - [thread:main] No exceptions in threads queue. Let's continue ..
2020-03-20 00:26:52,200 - sirmordred.sirmordred - DEBUG - [thread:main] All threads (and their tasks) are finished
2020-03-20 00:26:52,200 - sirmordred.sirmordred - INFO - Finished SirMordred engine ...

@valeriocos
Member

Sorry for the late reply @heming6666, and thank you for taking the time to look at this issue.
When you executed the second test (the one with just one statement), did you delete all raw and enriched indexes?
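
For reference, indexes can also be removed one by one through Elasticsearch's delete index API (`DELETE /<index>`) instead of wiping the whole installation. The sketch below only builds the requests and does not send them; the helper `build_delete_request` is hypothetical, and the base URL and index names are placeholders that need to match the local setup.

```python
import urllib.request


def build_delete_request(base_url, index):
    """Build (but do not send) an HTTP DELETE request for one index."""
    url = f"{base_url.rstrip('/')}/{index}"
    return urllib.request.Request(url, method='DELETE')


# Placeholder values: adjust the URL and the index names to the local setup.
base_url = 'http://127.0.0.1:9200'
index_names = ['git_test', 'git_aoc_test-enriched', 'git_onion_test-enriched']

delete_requests = [build_delete_request(base_url, name) for name in index_names]
```

Each request could then be sent with `urllib.request.urlopen(request)` once the index names have been double-checked, since the deletion cannot be undone.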

@heming6666
Contributor Author

Yes. I even removed all the Elasticsearch data with the following command:

rm -rf elasticsearch-6.1.4

Then I reinstalled Elasticsearch and restarted it. Here are the Elasticsearch logs corresponding to the second test:

[2020-03-20T00:25:17,782][INFO ][o.e.n.Node               ] [] initializing ...
[2020-03-20T00:25:17,936][INFO ][o.e.e.NodeEnvironment    ] [NADo0j8] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [17gb], net total_space [39.2gb], types [rootfs]
[2020-03-20T00:25:17,937][INFO ][o.e.e.NodeEnvironment    ] [NADo0j8] heap size [1015.6mb], compressed ordinary object pointers [true]
[2020-03-20T00:25:17,939][INFO ][o.e.n.Node               ] node name [NADo0j8] derived from node ID [NADo0j89QwCjv9ur3NV71g]; set [node.name] to override
[2020-03-20T00:25:17,940][INFO ][o.e.n.Node               ] version[6.1.4], pid[417], build[d838f2d/2018-03-14T08:28:22.470Z], OS[Linux/3.10.0-1062.12.1.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_242/25.242-b08]
[2020-03-20T00:25:17,940][INFO ][o.e.n.Node               ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/home/haiming/elasticsearch-6.1.4, -Des.path.conf=/home/haiming/elasticsearch-6.1.4/config]
[2020-03-20T00:25:19,743][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [aggs-matrix-stats]
[2020-03-20T00:25:19,744][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [analysis-common]
[2020-03-20T00:25:19,744][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [ingest-common]
[2020-03-20T00:25:19,744][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [lang-expression]
[2020-03-20T00:25:19,744][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [lang-mustache]
[2020-03-20T00:25:19,750][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [lang-painless]
[2020-03-20T00:25:19,750][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [mapper-extras]
[2020-03-20T00:25:19,750][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [parent-join]
[2020-03-20T00:25:19,750][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [percolator]
[2020-03-20T00:25:19,750][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [reindex]
[2020-03-20T00:25:19,751][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [repository-url]
[2020-03-20T00:25:19,751][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [transport-netty4]
[2020-03-20T00:25:19,751][INFO ][o.e.p.PluginsService     ] [NADo0j8] loaded module [tribe]
[2020-03-20T00:25:19,751][INFO ][o.e.p.PluginsService     ] [NADo0j8] no plugins loaded
[2020-03-20T00:25:22,675][INFO ][o.e.d.DiscoveryModule    ] [NADo0j8] using discovery type [zen]
[2020-03-20T00:25:23,602][INFO ][o.e.n.Node               ] initialized
[2020-03-20T00:25:23,603][INFO ][o.e.n.Node               ] [NADo0j8] starting ...
[2020-03-20T00:25:23,854][INFO ][o.e.t.TransportService   ] [NADo0j8] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2020-03-20T00:25:23,873][WARN ][o.e.b.BootstrapChecks    ] [NADo0j8] max file descriptors [65535] for elasticsearch process is too low, increase to at least [65536]
[2020-03-20T00:25:26,986][INFO ][o.e.c.s.MasterService    ] [NADo0j8] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {NADo0j8}{NADo0j89QwCjv9ur3NV71g}{c9rMD1pdR-eG1dSGbyc2Pw}{127.0.0.1}{127.0.0.1:9300}
[2020-03-20T00:25:26,998][INFO ][o.e.c.s.ClusterApplierService] [NADo0j8] new_master {NADo0j8}{NADo0j89QwCjv9ur3NV71g}{c9rMD1pdR-eG1dSGbyc2Pw}{127.0.0.1}{127.0.0.1:9300}, reason: apply cluster state (from master [master {NADo0j8}{NADo0j89QwCjv9ur3NV71g}{c9rMD1pdR-eG1dSGbyc2Pw}{127.0.0.1}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2020-03-20T00:25:27,043][INFO ][o.e.h.n.Netty4HttpServerTransport] [NADo0j8] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2020-03-20T00:25:27,043][INFO ][o.e.n.Node               ] [NADo0j8] started
[2020-03-20T00:25:27,094][INFO ][o.e.g.GatewayService     ] [NADo0j8] recovered [0] indices into cluster_state
......
[2020-03-20T00:26:51,049][INFO ][o.e.c.m.MetaDataMappingService] [NADo0j8] [git_test/g5hCxtXMRK2X1bFCKeNzHA] update_mapping [items]
[2020-03-20T00:26:51,188][INFO ][o.e.c.m.MetaDataCreateIndexService] [NADo0j8] [git_aoc_test-enriched] creating index, cause [api], templates [], shards [5]/[1], mappings [items]
[2020-03-20T00:26:51,651][INFO ][o.e.c.m.MetaDataCreateIndexService] [NADo0j8] [git_onion_test-enriched] creating index, cause [api], templates [], shards [5]/[1], mappings [item]

@valeriocos
Member

Thank you @heming6666! Can you submit a PR?

@heming6666
Contributor Author

Sure. I will send a PR to remove the duplicate call.
Thank you very much, @valeriocos!

@valeriocos
Member

Great, thanks @heming6666
