fix(dswm): Fix typos again
MarcusElwin committed Aug 26, 2023
1 parent 55cb21f commit 7327213
Showing 3 changed files with 20 additions and 20 deletions.
2 changes: 1 addition & 1 deletion ds-with-mac/content/_index.md
@@ -1,6 +1,6 @@
---
title: Welcome to DS with Mac
subtitle: I'm a Data Scientist turned Product Manager, that works with ML / AI powered data products. On this website I will share my thoughs, learnings and inspirations. All opinions here are my own.
subtitle: I'm a Data Scientist turned Product Manager, that works with ML / AI powered data products. On this website I will share my thoughts, learnings and inspirations. All opinions here are my own.
seo_title: DS with Marc | A blog about data products and ML systems

primary_cta_page: "about"
2 changes: 1 addition & 1 deletion ds-with-mac/content/about/index.md
@@ -13,7 +13,7 @@ I'm a tech and people interested recovering data scientist turned product manage

## My Experience

I'm a Senior Data Scientist turned Product Manager, living in Stockholm, :flag-se: that have been working with Data Science, Machine Learning and ML Systems for the past 5+ years in a mix of companies and industries ranging from retail to fintech. NLP and LLM are some of my current focus areas as well as learning the ropes of *product management*.
I'm a Senior Data Scientist turned Product Manager, living in Stockholm, :flag-se: who has been working with Data Science, Machine Learning and ML Systems for the past 5+ years in a mix of companies and industries ranging from *retail* to *fintech*. NLP and LLMs are some of my current focus areas as well as learning the ropes of *product management*.

I also have experience from other types of ML use cases such as:
* Demand forecasting,
36 changes: 18 additions & 18 deletions ds-with-mac/content/posts/testing-ml/index.md
@@ -8,7 +8,7 @@ author: Marcus Elwin

draft: false
date: 2023-08-20T12:58:11+02:00
lastmod: 2023-08-24T19:21:11+02:00
lastmod: 2023-08-26T13:03:11+02:00
expiryDate:
publishDate:

@@ -48,23 +48,23 @@ Also check my [GitHub](https://github.com/MarcusElwin/demo-testing-ml) repositor
[^1]: The above quote is excerpted from Huyen, C. (2022). Designing machine learning systems.
{{< /notice >}}

The image below shows a high-level overview of what an *ML-system* is:
The image below shows a high-level overview of what a *ML-system* is:

![ML System](/ml-system.png "Overview of ML System adopted from Huyen, C. (2022). Designing machine learning systems.")

Starting with the *input* part of the system, one can see the following components:
* *ML System users*: these can be both external and internal, such as end-users, internal teams or other *ML systems*.
* *Business Requirements*: depending on the company this might be from a *product manager*, *business translator* or other internal *stakeholders* as e.g. marketing.
* *Business Requirements*: depending on the company this might be coming from a *product manager*, *business translator* or other internal *stakeholders*, e.g. marketing.
* *ML System developers*: different roles such as *ML Engineer*, *AI Engineer*, *Data Scientist*, *Data Engineer* or *Software Developer*.

As popularized by Google in their 2015 paper [Hidden Technical Debt in Machine Learning Systems](https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf) the actual ML algorithm is a quite small component of the entire system. You may have heard that data scientists tend to spend >= 20% of their time on actual modelling, and <= 80% of their time on other activities such as cleaning of data. This of course varies between different companies, but I have rarely worked at places where modelling has been 100% of my focus.
As popularized by Google in their 2015 paper [Hidden Technical Debt in Machine Learning Systems](https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf) the actual ML algorithm is a quite small component of the entire system. You may have heard that data scientists tend to spend >= **20%** of their time on actual modelling, and <= **80%** of their time on other activities such as cleaning of data. This of course varies between different companies, but I have rarely worked at places where modelling has been 100% of my focus.

Whilst *infrastructure*, *data*, *feature engineering*, *evaluation* and *deployment* are all **vital** components, especially when going from experimentation all the way to production. This is probably one of the reasons why *ML Engineering* has been so popular in the recent years. In my experience the end-to-end (E2E) system design should be thought of already in the earlier stages of developing a ML system to ensure sucess of a ML powered project or product.
Looking inside the "box" of a ML-system, *infrastructure*, *data*, *feature engineering*, *evaluation* and *deployment* are all **vital** components, especially when going from experimentation all the way to production. This is probably one of the reasons why *ML Engineering* has been so popular in recent years. In my experience the end-to-end (E2E) system design should be thought of already in the early stages of developing a ML system to ensure the success of a ML powered project or product.

On another note, here we use the term *ML System* but you might have also heard *data product*:

{{< notice note >}}
Some might se a ML system a form of **data product**. There are many other examples but key thing is that *data* is an important component to building the product experience.
Some might see a ML system as a form of **data product**. There are many other examples (*data source*, *a table*, *dashboard*, etc.) but the key thing here is that *data* is an important component in building the *product experience*.
{{< /notice >}}


@@ -80,13 +80,13 @@ However, best practice is to test as much as you can e.g. `test-driven` developm

By design ML systems and ML algorithms are `non-deterministic` and, depending on the algorithm you choose, it might be hard to understand the inner workings exactly (i.e. **white-box** vs **black-box** approaches). An ML system is not better than the data we feed to it, i.e. *Garbage in Garbage Out* (GIGO), and the data we use tends to be *biased* in some way or form.

Also with the advent of *Large Language Models* (LLMs) which is making the access to and development of ML powered systems accessible to anyone with API calling skills. Testing and making sure that such a system works (on common problem for LLMs is e.g. *hallucinations*), is imperative.
Also, with the advent of *Large Language Models* (LLMs), the access to and development of ML powered systems has become accessible to anyone with API calling skills. Testing and making sure that such a system works (one common problem for LLMs is e.g. *hallucinations*) is imperative.

## Testing a ML system vs testing a traditional software system

The image below shows some key difference between a *traditional* software system (SW), what some mmight call *software 1.0* and a Machine Learning (ML) powered system, what some would call *software 2.0*:
The image below shows some key differences between a *traditional* software system (SW), what some might call *software 1.0*, and a Machine Learning (ML) powered system, what some would call *software 2.0*:
1) In a traditional SW system *data* together with *logic* is used as input to produce a *desired behaviour*.
2) In a ML system *data* together with *desired behaviour* is used to as input to produce some *logic*.
2) In a ML system *data* together with *desired behaviour* is used as input to produce some *logic*.

![SW vs ML System](/sw-vs-ml-system.png "Differences between a traditional SW system and ML system. Note the differences in input and output for the different approaches. Adopted from J. Jordan (2020) Effective testing for machine learning systems.")

Expand Down Expand Up @@ -196,7 +196,7 @@ These type of tests do normally fall into two different groups: *invariance test
### Invariance test(s)
Real-world data might change due to various reasons, as we alluded to previously. These tests aim to check how **stable** and **consistent** the ML model is to **perturbations**. The logic around these types of tests can also be applied to training a model, in the form of **data augmentation**.

Some tests to consider:
Some tests that could be good to consider:

{{< notice tip >}}
1) Assert that model output is consistent under small changes to a *feature* of interest.
@@ -221,7 +221,7 @@ Then at time *t+1* the dataset looks like the below instead:

:question: Do you notice any changes here in the underlying data? This type of behaviour is something we want to test and make sure that our model learns to handle in order to be considered *stable* and *consistent*.

[faker](https://faker.readthedocs.io/en/master/) and [factor_boy](https://factoryboy.readthedocs.io/en/stable/) are some good libraries, that I have used to generate dummy data for these type of tests:
Should you test with real data or dummy data? The short answer is that it depends. I have found that [faker](https://faker.readthedocs.io/en/master/) and [factory_boy](https://factoryboy.readthedocs.io/en/stable/) are some good libraries to generate dummy data for these types of tests:

{{< highlight python "linenos=inline, style=monokai" >}}
import factory
@@ -294,16 +294,16 @@ def test_amount_invariance(self):
Note that we are using `assertAlmostEqual` here in the test and allow a deviation of `5%` in predictions in this example. If we did not do so, you would see some *flaky* failed builds in your CI/CD pipeline :tools:.
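
Since the full test code is collapsed in this diff view, below is a minimal, self-contained sketch of what such an invariance test could look like, using `factory_boy` / `faker` to generate dummy data and `assertAlmostEqual` with a `5%` tolerance. Note that `TransactionFactory` and `predict_risk` are illustrative stand-ins, not the actual code from the post:

{{< highlight python "linenos=inline, style=monokai" >}}
import unittest

import factory
from factory import fuzzy


def predict_risk(transaction: dict) -> float:
    """Stand-in for the real model; returns a score in [0, 1]."""
    return min(transaction["amount"] / 10_000, 1.0)


class TransactionFactory(factory.DictFactory):
    """Generates dummy transactions with faker / fuzzy attributes."""
    customer_name = factory.Faker("name")
    country = factory.Faker("country_code")
    amount = fuzzy.FuzzyFloat(10, 5_000)


class TestInvariance(unittest.TestCase):
    def test_small_amount_perturbation_is_invariant(self):
        # Perturb a single feature slightly and assert the prediction
        # stays within a ~5% tolerance of the baseline prediction.
        transaction = TransactionFactory(amount=1_000.0)
        perturbed = {**transaction, "amount": transaction["amount"] * 1.01}

        baseline = predict_risk(transaction)
        score = predict_risk(perturbed)

        self.assertAlmostEqual(baseline, score, delta=0.05 * baseline)


if __name__ == "__main__":
    unittest.main()
{{< / highlight >}}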

### Directional Expectations test(s)
Similar to the previous section, these type of tests allows us to define a set of **pertubations** to the input which should have a predictable effect on the model output. Meaning that we would only vary a feature of interest, by keeping everything else the same. Similar to what you would do with e.g. a `partial` depdency plot but applied to testing. The logic around these types of tests, can also be applied to training a model which is a form of **data augmentation**.
Similar to the previous section, these types of tests allow us to define a set of **perturbations** to the model input which should have a predictable effect on the model output. This means that we only vary a feature of interest, keeping everything else the same, similar to what you would do with e.g. a `partial` dependence plot but applied to testing. The logic around these types of tests can also be applied to training a model, in the form of **data augmentation**.

Some tests to consider:
Some tests that could be good to consider:

{{< notice tip >}}
1) Assert that model output is *similar* by increasing a certain *feature* of interest, whilst keeping all other features constant.
2) Assert that model output is *similar* by decreasing a certain *feature* of interest, whilst keeping all other features constant.
{{< /notice >}}

Note the use of *similar* above, as we cannot guarante that the model output will be 100% equal in these case. Instead, on needs to operate on a range of allowable threhsolds e.g. **1-3** standard deviation from the mean or +/- **2,5** p.p. as examples. What you should set as good threhsolds depends on your use case and data.
Note the use of *similar* above, as we cannot guarantee that the model output will be 100% equal in these cases. Instead, one needs to operate on a range of allowable *thresholds*, e.g. **1-3** standard deviations from the mean or +/- **2.5** p.p. as examples. What you should set as good thresholds depends on your use case and data.

We build another `DataTypeFakeFactory` for the directional expectations test:

Expand Down Expand Up @@ -342,7 +342,7 @@ def test_increasing_amount_directional_expectations(self):
{{< / highlight >}}
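
The directional expectations test itself is also collapsed above, so here is a hedged sketch of the pattern: vary only the feature of interest, keep everything else constant, and assert that the prediction does not move in the unexpected direction beyond an allowed threshold (here +/- **2.5** p.p.). `ApplicantFactory` and `predict_default_probability` are hypothetical names used only for illustration:

{{< highlight python "linenos=inline, style=monokai" >}}
import unittest

import factory
from factory import fuzzy


def predict_default_probability(applicant: dict) -> float:
    """Stand-in for the real model; higher income should not increase risk."""
    return max(0.0, min(1.0, 0.5 - applicant["income"] / 200_000))


class ApplicantFactory(factory.DictFactory):
    """Generates dummy loan applicants for testing."""
    name = factory.Faker("name")
    income = fuzzy.FuzzyFloat(20_000, 80_000)


class TestDirectionalExpectations(unittest.TestCase):
    def test_increasing_income_does_not_increase_risk(self):
        # Vary only `income`, keep all other features fixed, and check that
        # the prediction moves in the expected direction (here: not upwards).
        applicant = ApplicantFactory(income=40_000.0)
        richer = {**applicant, "income": applicant["income"] * 1.5}

        base = predict_default_probability(applicant)
        new = predict_default_probability(richer)

        self.assertLessEqual(new, base + 0.025)  # allow +/- 2.5 p.p. tolerance


if __name__ == "__main__":
    unittest.main()
{{< / highlight >}}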

## Data-drift test(s)
In a real-world setting input data and model output will change (it is not *stationary*), the image below shows an example of drift in income distribution:
In a real-world setting input data and model output will change (they are not *stationary*); the image below shows an example of *drift* in income distribution:

![Drift](/drift.png "Example of data drift or label shift where the income distribution in a certain market has drifted to the left between two different time periods, from low income to more people with high income. This can be due to natural reasons such as improved living standards, but needs to be handled by a ML system built for e.g. predicting income or using income as a feature.")

@@ -361,13 +361,13 @@ Due to the cases above we need ways of identifying when data is *drifting* betwe
* Adding static rules for handling of edge-cases
* Collecting more data to make the sample more representative.

Some tests to consider:
Some tests that could be good to consider:

{{< notice tip >}}
1) Test that the distribution of a certain *feature* has not changed *too much* over two time periods.
{{< /notice >}}

You can replace *feature* with any feature of interest but do note, that we use *test* and *too much* above. This as what it normally boils down to is to evaluate via statistical tests if there has been any *significant* difference in the underlying distribution of the input data or the model predictions. [Albi Detect](https://github.com/SeldonIO/alibi-detect) maintend by the company [Seldon](https://www.seldon.io/) has a quite nice list of drift detection [methods](https://github.com/SeldonIO/alibi-detect#drift-detection).
You can replace *feature* with any feature of interest, but do note that we use *test* and *too much* above. This is because it normally boils down to evaluating, via *statistical tests*, whether there has been any *significant* difference in the underlying distribution of the input data or the model predictions. [Alibi Detect](https://github.com/SeldonIO/alibi-detect), maintained by the company [Seldon](https://www.seldon.io/), has a quite nice list of drift detection [methods](https://github.com/SeldonIO/alibi-detect#drift-detection).

What I have used before in these scenarios:
{{< notice tip >}}
@@ -406,7 +406,7 @@ def test_mean_drift(self):
self.assertTrue(True)
{{< / highlight >}}
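
As the `test_mean_drift` example is collapsed in this view, here is an illustrative sketch of one way a drift test could look, using a two-sample Kolmogorov-Smirnov test from `scipy` on a reference sample versus a current sample. The synthetic data and threshold are assumptions and not necessarily what the post's actual code does:

{{< highlight python "linenos=inline, style=monokai" >}}
import unittest

import numpy as np
from scipy import stats


class TestDataDrift(unittest.TestCase):
    def test_income_distribution_has_not_drifted(self):
        # Illustrative data: `reference` stands in for the training period,
        # `current` for fresh production data with a negligible shift.
        rng = np.random.default_rng(seed=42)
        reference = rng.normal(loc=30_000, scale=5_000, size=1_000)
        current = reference + 100  # tiny shift, should not count as drift

        # Two-sample Kolmogorov-Smirnov test: a small p-value would mean the
        # two distributions differ significantly, i.e. the feature has drifted.
        _, p_value = stats.ks_2samp(reference, current)

        self.assertGreater(p_value, 0.05, "Income feature appears to have drifted")


if __name__ == "__main__":
    unittest.main()
{{< / highlight >}}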

In the examples below, you can of course replace *mean* with any other statistical metric such as *median*, *variance* etc. The features don't necessary have to be *numerical* in order for you to do *drift* tests. For `non-numerical` features you need to transform them to a distribution via e.g. `binning` or creating `indicator` features. If the drift checks should be alerts in another system or parts of your CI pipeline is up to you, the important take away is that you have a process around it and can get alerted either before deployment or after.
In the examples below, you can of course replace *mean* with any other statistical metric such as *median*, *variance*, etc. The features don't necessarily have to be *numerical* in order for you to do *drift* tests. For `non-numerical` features you need to transform them to a distribution via e.g. `binning` or creating `indicator` features (see the sketch below). Whether the drift checks should be alerts in another system or part of your CI pipeline is up to you; the important takeaway is that you have a process around it and will be alerted either before or after deployment. Preferably both.
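
To make the `binning` / `indicator` idea concrete, here is a small sketch (with made-up data) that turns a categorical feature into frequency distributions for two time periods and compares them with a chi-squared test from `scipy`:

{{< highlight python "linenos=inline, style=monokai" >}}
import pandas as pd
from scipy import stats

# Categorical feature (payment method) observed in two time periods.
reference = pd.Series(["card"] * 700 + ["invoice"] * 200 + ["cash"] * 100)
current = pd.Series(["card"] * 650 + ["invoice"] * 240 + ["cash"] * 110)

# Turn the categories into frequency distributions (the "binning" step) ...
categories = sorted(set(reference) | set(current))
ref_counts = reference.value_counts().reindex(categories, fill_value=0)
cur_counts = current.value_counts().reindex(categories, fill_value=0)

# ... and compare them with a chi-squared test: a small p-value indicates
# that the distribution of the feature has drifted between the periods.
expected = ref_counts / ref_counts.sum() * cur_counts.sum()
statistic, p_value = stats.chisquare(f_obs=cur_counts, f_exp=expected)
print(f"chi2={statistic:.2f}, p-value={p_value:.3f}")
{{< / highlight >}}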

Much more can be said about *drift-detection*; that might be a topic for another post in the future.

