Insights and opportunities related to helping Kedro impact more users #2901

Closed
yetudada opened this issue Aug 7, 2023 · 1 comment

yetudada commented Aug 7, 2023

Introduction

What is this?

We have conducted extensive research to understand people's motivations for using or not using Kedro. We want to improve Kedro to provide more value to data scientists, data engineers, machine learning engineers, and other users. We've compiled the research insights and potential improvement ideas in this GitHub issue so that we can prioritise concepts that make Kedro an attractive option for users across roles and skill levels.

In part, this issue addresses: kedro-org/kedro-viz#1448

What's in the scope of this work?

Our next step is to conduct value testing to identify the most impactful concepts. This user-centred process should guide us toward three high-potential solutions that can meaningfully solve pain points based on evidence directly from Kedro users.

What terminology will I be using?

To focus our research, we will look at two representative user profiles that encompass vital segments of the data science community:

  • "Notebook-focussed users" primarily use notebooks for analysis and may be less familiar with IDE-based coding workflows; this could include many data analysts and data scientists.
  • "IDE-focussed users" are comfortable using IDEs for development, indicating intermediate to advanced software engineering skills; this may include some data scientists, machine-learning engineers, and data engineers.

It's helpful to define two existing ways of using Kedro, to ensure we have a shared understanding when discussing Kedro's architecture:

  • "Using Kedro as a framework" encapsulates using the framework (project template, session, context and CLI) and library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets).
  • "Using Kedro as a library" refers to using one or more of Kedro's library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets) in Python scripts or notebooks. Here, a user is leveraging Kedro modular components for their capabilities and is not using the framework.

What are some of our learnings?

IDE-focussed users want to adopt Kedro in an existing use case

IDE-focussed users will try to learn how to use Kedro by refactoring an existing use case into a project that uses Kedro. Their objective is to learn how to leverage Kedro as a framework or to adopt Kedro in stages by incorporating library components into their work.

IDE-focussed users want to incorporate Kedro in an existing project template

IDE-focussed users leverage internal project templates provided by CookieCutter, or tools that provide their own project templates, like Poetry, Hydra and DVC. This user group might bypass Kedro because of the high switching cost of adopting Kedro's project template or the challenges of integrating Kedro with those tools. We recommend that our users start from a Kedro project template or starter, but this may not be possible for them.

IDE-focussed users want to choose the features included in the project template generated by Kedro

IDE-focussed users have strong opinions about how their project template should be structured; there was a lot of variance on #208. Even when an IDE-focussed user has committed to the project template created by Kedro, they still want the flexibility to choose which features are enabled and visible in their personalised template.

IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it

Kedro is positioned as an all-or-nothing overhaul. Users will choose not to use Kedro when they are placed on a collaborative project and are the only ones who want to use it. Most of these perspectives are associated with adopting the framework.

Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories

Our project template has a lot of software engineering concepts embedded in it, some more necessary than others. It is reasonable to expect that a notebook-focussed user, unfamiliar with this paradigm from software engineering frameworks, would need help understanding what each directory and file does - either by using our documentation or speaking to an expert user of Kedro. This user group also needed help understanding the role of configuration, and some preferred writing their code in a single file, a notebook.

Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework

Users assume an all-or-nothing use of the Kedro framework and do not realise they can use the Data Catalog as a stand-alone item. Our documentation page for Kedro as a data registry is very unpopular, and we also do not talk about this functionality at all with our users.

IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework

Kedro's modular architecture provides opportunities to delight users by incrementally integrating specific components like the Data Catalog. For example, IDE-focussed users have used the Catalog to empower analysts on their teams. Additionally, users who found the framework restrictive or just wanted to use Kedro for data exploration have benefited from the Data Catalog.

IDE-focussed users work around our ConfigLoader's assumptions

IDE-focussed users run into errors because our ConfigLoader requires a conf directory, makes users place their configuration in conf/base and needs conf/local to be present. We expected our users to make ConfigLoaders without these assumptions, but we have yet to see evidence that they have done this. Our users choose to use other tools instead of our ConfigLoader or have workarounds for the errors that we create. We've assumed that users would always start from a Kedro project, and that's not always true.
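To make the layout assumption concrete, here is a minimal standard-library sketch that reports which of the directories described above (conf, conf/base and conf/local) are missing from a project that did not start from the Kedro template. The helper `missing_conf_dirs` is hypothetical and written only for this illustration; it is not part of Kedro and does not reproduce the ConfigLoader's own checks.

```python
import tempfile
from pathlib import Path


def missing_conf_dirs(project_root, conf_source="conf", environments=("base", "local")):
    """Return the expected configuration directories that are absent.

    Hypothetical helper: it only mirrors the layout described in the text
    (conf/, conf/base, conf/local), not Kedro's actual validation logic.
    """
    root = Path(project_root)
    conf_dir = root / conf_source
    expected = [conf_dir] + [conf_dir / env for env in environments]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


# Example: an existing project that has conf/base but never created conf/local.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "conf" / "base").mkdir(parents=True)
    print(missing_conf_dirs(root))
```

A check like this could run before any configuration is loaded, so that a user adopting Kedro in an existing project sees which directories are expected rather than an error raised later.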

How are we trying to address these insights?

We have compiled a table of adoption opportunities that consolidates past and future concepts and solutions with new ideas that build on these learnings. The table catalogues challenges identified through user research and outlines the rationale behind solutions we have prototyped or proposed to address each obstacle.

| What are some of our learnings? | User profile focus | What potential solutions or past approaches could address these learnings? | How does this concept help our users? | What are known limitations? |
| --- | --- | --- | --- | --- |
| IDE-focussed users want to adopt Kedro in an existing use case | IDE | kedro init (#2512) | kedro init assumes that IDE-focussed users want to adopt the framework. It only adds files to an existing project so that kedro recognises it as a project; these files are detailed in our architecture overview. | Users will still need to make significant changes to their code, e.g. create pure Python functions, create a Python package for src, take out hard-coded configuration values, remove I/O, figure out how to integrate tools e.g. MLflow or DVC, and more. |
| IDE-focussed users want to adopt Kedro in an existing use case | IDE | Use Kedro as a library (in part in #2819) | Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. | The ConfigLoader has framework assumptions and requires using the conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz. |
| IDE-focussed users want to incorporate Kedro in an existing project template | IDE | kedro init (#2512) | kedro init allows users to add the minimum files required for us to recognise that it's a Kedro project. | This design will not address integration between the tools, e.g. look at Databricks' MLOps stack and try to add files for Kedro to this. Nor will it provide flexibility for customising the project template created by Kedro (#2553). |
| IDE-focussed users want to incorporate Kedro in an existing project template | IDE | Starters | Starters enable merging project templates from different tools and help with integration. | Starters introduce a maintenance burden for users (#1961). |
| IDE-focussed users want to incorporate Kedro in an existing project template | IDE | Use Kedro as a library (in part in #2819) | Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. | The ConfigLoader has framework assumptions and requires using the conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz. |
| IDE-focussed users want to choose the features included in the project template generated by Kedro | IDE | Utility modules (#2388), also known as Kedro Incremental Starters (#2054) | For IDE-focussed users, this design assumes the user has adopted the framework and wants to limit the features (and therefore folders and files) we include in their project template. | This design does not create more users with this profile, because they are already using our framework, but helps with their user experience of Kedro. |
| IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it | IDE + notebook | Use Kedro as a library (in part in #2819) | This might help increase the adoption of Kedro as a library where users are still determining if they can adopt the framework for their collaborative work. Rather than all-or-nothing, users would leverage Kedro's library components rather than adopting the framework. | Users would not be able to use the CLI or Kedro-Viz. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Use Kedro as a library (in part in #2819) | This concept does not help with learning new software engineering concepts; our users would still need to do this. However, it does make it possible to avoid the IDE (a known challenge for this user profile). They also would not get overwhelmed by the project template. | However, users could not use the CLI or use Kedro-Viz. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Utility modules (#2388), also known as Kedro Incremental Starters (#2054) | This solution might make the project template more manageable to new users because it has fewer files and folders. Users opt-in for features to understand why specific files and folders get added to their projects. | |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Export nodes from a notebook | Very few people used this solution, even when they were aware of it. They did not find it troubling to copy code into their project template. | It's a user experience improvement feature and does aid the adoption of Kedro. This design also assumes that you know how Kedro comes together - awareness of nodes.py in the project template - and that you're not intimidated by the framework. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Make it possible to use Kedro-Viz without the framework (kedro-org/kedro-viz#1459); similar to Kedro-Light | This idea builds on using Kedro as a library, and all it additionally allows users to do is visualise their Kedro pipeline. | This idea inherits the benefits and downsides of using Kedro as a library. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Kedro-Jupyter plugin (Slide 20) | Exporting code from a notebook (kedro jupyter convert) into a Kedro project was part of this concept; it's essentially a "framework in the notebook". | Users did not use kedro jupyter convert. This plugin idea was one of the worst-rated ideas in our Kedro IDE exploratory concept tests because users wondered how to revert the code from the framework into the notebook and whether it would always work. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Refactor Jupyter notebooks into Kedro projects, using for example GenAI (#2820) | Gen AI would guide users as they learn how to convert their Jupyter notebooks into framework use cases. | The limitation of this idea is that we need to know how the notebook/s are structured. Users have created video walkthroughs on how to do this. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Pipeline Builder UI also known as Kedro Lab, Pipeline Builder GUI or even two-way communication between framework & viz | The concept would allow users to create nodes, pipelines and reuse code without exposure to the framework. A benefit is that they would get a Kedro framework project. | Users were unsure how this would compete with other tooling like Alteryx; it was one of the lowest-rated ideas in the Kedro IDE exploratory concept tests. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Standalone Data Catalog also known as Mini-Kedro | This solution was supposed to make it easy for people to use Kedro for EDA (Data Catalog and ConfigLoader) by generating a mini-project template with the conf directory and a notebook. It also thought about a journey into the full Kedro project template. | We don't have evidence to suggest this feature is well adopted; users solely leverage the Data Catalog (and even use alternative libraries for loading configuration), @Galileo-Galilel created a custom starter for his teams, and it does expect that users should have a partial understanding of the project template (conf). |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Kedro Plugins proposed by @noklam | Gives users the ability to use the DataCatalog and ConfigLoader as standalone tools. | Requires more information. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Supporting the reporting use case on Kedro-Viz also known as the Parameter Editor (Slide 18) | This design assumes there is an existing Kedro framework user, and the user of this feature might not be a Kedro framework user and they want to tweak parameters to get insights. | This design requires part of the team to know the framework still, but not everyone needs to know it. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Rename the directories under conf | Users would still have to know much about the framework, specifically the project template, so this only helps a little. | We discovered that base and local confused some users, but we never completed the rename because the results were inconclusive. @idanov considered allowing users to choose their own names (#770). |
| Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework | Notebook | Use Kedro as a library (in part in #2819) | This is possible; it needs to be promoted. | Users need to know they can use the Data Catalog without worry - the ConfigLoader is out of scope. |
| IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework | IDE + notebook | Move AbstractDataset to kedro-datasets (#2409) | IDE-focussed users know how to use the Data Catalog this way, but this user group wants the AbstractDataset to exist in kedro-datasets. | They don't want to import kedro or install dependencies related to kedro when leveraging this functionality (#1758). |
| IDE-focussed users work around our ConfigLoader's assumptions | IDE + notebook | Make it easier to use the ConfigLoader (#2819) | This would allow users to leverage the ConfigLoader in their work, especially with the DataCatalog or Parameters. | Once again, this promotes using Kedro as a library, which might translate into something other than framework adoption. But this should help notebook-focussed users with simple projects adopt some best-practice, and they will not need an IDE. |

@merelcht

I have moved this to the Kedro wiki, because this is not an issue we would take on as Sprint work as is: https://github.com/kedro-org/kedro/wiki/Insights-and-opportunities-related-to-helping-Kedro-impact-more-users

merelcht closed this as not planned on Mar 27, 2024
github-project-automation bot moved this from Backlog to Done in Kedro-Viz on Mar 27, 2024