-
Hi @nkruskamp! Can you take a look at this guide that explains provisional nodes? I believe the …
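A minimal sketch of the pattern that guide describes, assuming a pytask version that ships `DirectoryNode` (the folder, pattern, and task name below are made up for illustration):

```python
from pathlib import Path
from typing import Annotated

from pytask import DirectoryNode

# A DirectoryNode is a provisional node: pytask resolves the matching
# files only at execution time, so a task can depend on a folder whose
# contents are unknown when the DAG is built.
def task_merge_files(
    paths: Annotated[
        list[Path], DirectoryNode(root_dir=Path("downloads"), pattern="*.csv")
    ],
    produces: Path = Path("build/merged.csv"),
) -> None:
    # Concatenate whatever CSVs exist in downloads/ at run time.
    produces.write_text("\n".join(p.read_text() for p in paths))
```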
-
Hi @tobiasraabe, this is great, and exactly what I needed to get the tasks running. Another situation I have to address is when a task has an unknown number of inputs that are not saved in the same folder but instead have to be specified individually. Using a function to build the dictionary of arguments for each repetition, each task run could have 1 to n input files with the exact paths listed. My initial test is to pass the input files as a list, together with an equal-length list of sheet names (or other needed variables), and that seems to work, but I wanted to see if there is a more pytask-idiomatic way to accomplish this? Thanks again for your help!
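Here is a rough sketch of what I mean, using pytask's repetition pattern (the spec-building helper, file names, and sheet names are hypothetical):

```python
from pathlib import Path

from pytask import task

# Hypothetical helper: returns one entry per task run, pairing 1..n input
# files with an equal-length list of sheet names.
def build_run_specs() -> dict[str, dict]:
    return {
        "run_a": {
            "paths": [Path("in/a1.xlsx"), Path("in/a2.xlsx")],
            "sheets": ["data", "data_extra"],
        },
        "run_b": {
            "paths": [Path("elsewhere/b.xlsx")],
            "sheets": ["data"],
        },
    }

for run_id, spec in build_run_specs().items():

    @task(id=run_id)
    def task_process(
        paths: list[Path] = spec["paths"],
        sheets: list[str] = spec["sheets"],
        produces: Path = Path(f"build/{run_id}.csv"),
    ) -> None:
        # Each Path passed as a default becomes a dependency of this
        # repetition, so pytask re-runs it when any input file changes.
        ...
```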
-
Hi All, I'm looking for some help on structuring my tasks correctly when I have a fixed or unknown number of input files.
In the past I have worked with dask using a set of parquet files on disk, or used duckdb to treat a folder of files as a database. Sometimes I will specify the number of partitions to create, and sometimes I let dask do the partitioning. I think this is separate from the dask + pytask option that does distributed computing of tasks.
So my question is: what is the best way in the pytask workflow to pass this "distributed" data to a task?
I've included a toy example (I'm not sure it would actually run, but hopefully it gets the point across) of a basic workflow that takes a collection of input CSVs, uses dask to read them, partitions them, and saves them to disk, followed by a function that does some sort of analysis and outputs results. For the hand-off between the first and second task, is this the best approach, or is there something else?
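For concreteness, here is an untested sketch of the pipeline I mean (the paths, partition count, and the use of `DirectoryNode` as a product are assumptions on my part):

```python
from pathlib import Path
from typing import Annotated

import dask.dataframe as dd
from pytask import DirectoryNode

PARTITION_DIR = Path("build/partitioned")

def task_partition(
    csv_files: Annotated[
        list[Path], DirectoryNode(root_dir=Path("data"), pattern="*.csv")
    ],
) -> Annotated[None, DirectoryNode(root_dir=PARTITION_DIR, pattern="*.parquet")]:
    # Read every input CSV, repartition, and write parquet files to disk.
    # The DirectoryNode product lets pytask track however many partition
    # files dask decides to write.
    df = dd.read_csv([str(p) for p in csv_files])
    df = df.repartition(npartitions=8)
    df.to_parquet(PARTITION_DIR)

def task_analyze(
    partitions: Annotated[
        list[Path], DirectoryNode(root_dir=PARTITION_DIR, pattern="*.parquet")
    ],
    produces: Path = Path("build/results.csv"),
) -> None:
    # Load the partitioned dataset back and compute a summary.
    df = dd.read_parquet(PARTITION_DIR)
    df.describe().compute().to_csv(produces)
```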