Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix subtle bug in mocking processor_agent in our tests
Some of the scheduler tests tried to prevent DAG processor processing DAGs from "tests/dags" directory by setting processor_agent to Mock object: ```python self.job_runner.processor_agent = mock.MagicMock() ``` This, in connection with scheduler job cleaning all the tables and approach similar to: ```python dag = self.dagbag.get_dag("test_retry_handling_job") dag_task1 = dag.get_task("test_retry_handling_op") dag.clear() dag.sync_to_db() ``` Allowed the test to run in isolated space where only one or few DAGS were present in the DB. This probably worked perfectly in the past, but after some changes in how DAGFileProcessor works this did not prevent DAGFileProcessor from running when _execute method in scheduler_job_runner has been executed, and standalone dag processor was not running, the processor_agent has been overwritten by a new DagFileProcessor in the `_execute` method of scheduler_job_runner. ```python if not self._standalone_dag_processor: self.processor_agent = DagFileProcessorAgent( dag_directory=Path(self.subdir), max_runs=self.num_times_parse_dags, processor_timeout=processor_timeout, dag_ids=[], pickle_dags=pickle_dags, async_mode=async_mode, ) ``` This led to a very subtle race condition which was more likely on machines with multiple cores/faster disk (so for example it led to apache#35204 which appeared on self-hosted (8 core) runners and did not appear on Public (2-core runners) or it could appear on an 8 core ARM Mac but not appear on 6 core Intel Mac (only on sqlite) If the DAGFileProcessor managed to start and spawn some parsing processes and grab the DB write access for sqlite and those processes managed to parse some of the DAG files from tests/dags/ folder, those DAGs could have polutted the DAGs in the DB - leading to undesired effects (for example with test hanging while the scheduler job run attempted to process an unwanted subdag and got deadlocked in case of apache#35204. The solution to that is to only set the processor_agent if not set already. This can only happen in unit tests when the `processor_agent` sets it to Mock object. For "production" the agent is only set once in the `_execute` methods so there is no risk involved in checking if it is not set already. Fixes: apache#35204
- Loading branch information