Enhancement/591 Previewing of OpenAI Embeddings #622

faisal-alvi · 2023-11-24T16:42:50Z

Description of the Change

The Post Classification feature within Language Processing allows for previewing settings changes when utilizing the IBM Watson service provider, but not for OpenAI Embeddings. This PR adds previewing for embeddings.

Closes #591

How to test the Change

Visit the embedding settings and confirm the previewing works.

Changelog Entry

Added - Previewing of OpenAI Embeddings

Credits

Props @jeffpaul @dkotter @faisal-alvi

Checklist:

I agree to follow this project's Code of Conduct.
I have updated the documentation accordingly.
I have added tests to cover my change.
All new and existing tests pass.

faisal-alvi · 2023-11-24T18:47:58Z

@dkotter this is ready for your review.

includes/Classifai/Providers/OpenAI/Embeddings.php

dkotter

Couple things I'm noticing when testing:

The preview box is overlapping the settings for me. I compared it to the Watson preview box and it seems to be rendering at a larger width, not sure why though:

If the feature isn't turned on the preview box shouldn't show. This is how the Watson preview behaves

faisal-alvi · 2023-11-29T10:19:04Z

The preview box was not overlapping for me not sure why, but checking again it was overlapping. I saw there was a negative 5rem of margin applied to the preview box (not sure why but) that does not seem okay so I removed that and the box was in the correct position.

Before

After

feature isn't turned on the preview box shouldn't show

Fixed.

jeffpaul · 2023-12-06T21:24:30Z

@faisal-alvi @dkotter one item I noticed during UAT today was the Embeddings Preview appears to only show sample recommended terms if they were at or above the threshold set (e.g. 75%). It's possible that the demo site didn't have :ALOT: of sample terms so perhaps this was a sample data issue, but one thing that would be good to check is adding some terms that we know should be below the 75% threshold and then seeing if they appear but under the default 75% threshold in the previewer (e.g. 68%, 54%).

A separate item that I noticed was when Embeddings and Watson were both active was that the Watson Taxonomies were showing in the Embeddings preview even if those taxonomies were not selected in the Embeddings settings. So we should ensure the Embeddings preview only shows the taxonomies that are enabled in the settings there (and probably the same for Watson that only the taxonomies set are what are show in the Watson preview).

… the threshold value. After running the calculation, remove any results that are above our threshold. Use a higher level of precision when converting our decimals to percentages so we can more easily see the differences in each value.

dkotter · 2023-12-07T18:59:43Z

one item I noticed during UAT today was the Embeddings Preview appears to only show sample recommended terms if they were at or above the threshold set (e.g. 75%). It's possible that the demo site didn't have :ALOT: of sample terms so perhaps this was a sample data issue, but one thing that would be good to check is adding some terms that we know should be below the 75% threshold and then seeing if they appear but under the default 75% threshold in the previewer (e.g. 68%, 54%).

Couple things I noted (and fixed) while testing this today.

The way we were using the new threshold value in our similarity calculation was incorrect. We were passing that value in and using it to set the maximum value that would be returned. What this means is any value that was above the threshold would still be returned but will have it's value set to the threshold value. This is one reason why the same values were showing a lot (like 75%). I've changed this calculation to function more like it used to and added some code that removes any results that are above the threshold
We were trimming down to 2 decimal places after the calculation and then we round to 2 decimal places when rendering. As mentioned before, the scores we get here are all pretty close so we need a higher level of precision to more accurately see the differences between each item. I've increased our calculation from using 2 decimal places to using 10 decimal places (display still only uses 2) and results are more accurately displayed now:

The results we display is limited to the Number of terms setting. So in UAT, I had this set to 2 which is why we only ever saw 2 results. I thought about removing this limit during previews but for sites with hundreds or thousands of terms, the UI would not be great

A separate item that I noticed was when Embeddings and Watson were both active was that the Watson Taxonomies were showing in the Embeddings preview even if those taxonomies were not selected in the Embeddings settings. So we should ensure the Embeddings preview only shows the taxonomies that are enabled in the settings there (and probably the same for Watson that only the taxonomies set are what are show in the Watson preview).

I tried replicating this today and was not able to; everything seems to be working as expected. I think this was a consequence of having multiple tabs open and switching between features without fully refreshing each time but if I notice it again, I can look to fix

dkotter · 2023-12-07T19:01:50Z

One other thing I was going to note that we may want to look at (probably in a separate PR at this point) is our threshold value goes from 0 to 100. As you can see in the screenshot above, we see values that are pretty close to each other, often separated by a decimal or two. It may make sense to allow more granular control in our threshold picker so you could choose something like 75.5% or 76.25%, as examples. There may also be other ways to more intuitively handle these very small differences in scores.

faisal-alvi · 2023-12-08T12:19:16Z

I was testing the changes and found the threshold limit is no longer respected. We can see the assigned terms even if they do not match the min—required threshold.

I found a false condition that was causing this; fixing it fixed the issue. bcee113

faisal-alvi · 2023-12-08T12:29:49Z

one item I noticed during UAT today was the Embeddings Preview appears to only show sample recommended terms if they were at or above the threshold set (e.g. 75%). It's possible that the demo site didn't have :ALOT: of sample terms so perhaps this was a sample data issue, but one thing that would be good to check is adding some terms that we know should be below the 75% threshold and then seeing if they appear but under the default 75% threshold in the previewer (e.g. 68%, 54%).

@jeffpaul the previewer respects the settings, and that is why it was not displaying the terms below 75%. I have updated the logic, so for the previewer, we will not consider the threshold limit setting and display the terms even if they have threshold value below the set limit.

Example screenshot; when 85% is set as category threshold, 80% for the other taxonomies:

before

after

faisal-alvi · 2023-12-08T12:32:27Z

@dkotter also since the develop is merged here, I can see the E2E tests failing, not sure the exact cause.

dkotter · 2023-12-08T15:44:01Z

@dkotter also since the develop is merged here, I can see the E2E tests failing, not sure the exact cause.

Yeah, tests started failing a couple days ago on the Set up WP environment step. I did some initial digging and I think the problem is GitHub updated the environment we use (ubuntu-latest) and this is breaking things, though not exactly sure how to fix this yet

faisal-alvi added 7 commits November 24, 2023 21:31

run an ajax for preview

83e4215

init dry run for embedding terms

390037f

return terms instead of saving

d07faa8

display the preivew snippet on the tab

3a128ae

check for nonce

2b6e488

handle the client side actions

cd1862d

add e2e test

64a02fb

faisal-alvi added the type:enhancement New feature or request. label Nov 24, 2023

faisal-alvi added this to the 2.6.0 milestone Nov 24, 2023

faisal-alvi self-assigned this Nov 24, 2023

faisal-alvi requested review from dkotter, jeffpaul and a team as code owners November 24, 2023 16:42

faisal-alvi added 4 commits November 24, 2023 22:32

eslint fixes

40ae9a7

fix e2e tests

6f777cf

fix eslint errors

0060ddf

typo fix

35aeebc

dkotter requested changes Nov 27, 2023

View reviewed changes

includes/Classifai/Providers/OpenAI/Embeddings.php Outdated Show resolved Hide resolved

includes/Classifai/Providers/OpenAI/Embeddings.php Outdated Show resolved Hide resolved

faisal-alvi added 3 commits November 28, 2023 15:53

correct the function doc

3e28c0c

dry run with a seprate function

28d14bf

code improvements

38c5b99

faisal-alvi requested a review from dkotter November 28, 2023 11:30

dkotter modified the milestones: 2.6.0, 2.5.0 Nov 28, 2023

dkotter reviewed Nov 28, 2023

View reviewed changes

includes/Classifai/Providers/OpenAI/Embeddings.php Outdated Show resolved Hide resolved

dkotter reviewed Nov 28, 2023

View reviewed changes

includes/Classifai/Providers/OpenAI/Embeddings.php Outdated Show resolved Hide resolved

Update since version

7bd994f

dkotter requested changes Nov 28, 2023

View reviewed changes

Merge branch 'develop' into enhancement/591

928f88d

faisal-alvi added 3 commits November 29, 2023 15:39

remove unnecssary -ve left margin

e86a213

fix the provide class selection

4fd02c1

handle the classification names

d7dc76b

remove unnecessary code

e04e66e

faisal-alvi requested a review from dkotter November 29, 2023 10:26

dkotter previously approved these changes Nov 29, 2023

View reviewed changes

dkotter requested a review from berkod November 29, 2023 16:46

berkod previously approved these changes Dec 4, 2023

View reviewed changes

jeffpaul mentioned this pull request Dec 6, 2023

Release version 2.5.0 #631

Closed

22 tasks

jeffpaul mentioned this pull request Dec 6, 2023

Add support for Embeddings setup for sites with high numbers of terms #633

Closed

1 task

dkotter added 2 commits December 7, 2023 10:50

Merge branch 'develop' into enhancement/591

c79ec83

dkotter dismissed stale reviews from berkod and themself via ae66799 December 7, 2023 18:59

fix the issue

bcee113

dont consider threshold for preview

df0d677

faisal-alvi requested a review from dkotter December 8, 2023 12:30

dkotter approved these changes Dec 8, 2023

View reviewed changes

Merge branch 'develop' into enhancement/591

faff524

jeffpaul mentioned this pull request Dec 8, 2023

Increase precision of Embeddings threshold percentages to allow for 2 decimal places #637

Open

1 task

dkotter merged commit 7e6f0b7 into develop Dec 12, 2023
13 checks passed

dkotter deleted the enhancement/591 branch December 12, 2023 17:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhancement/591 Previewing of OpenAI Embeddings #622

Enhancement/591 Previewing of OpenAI Embeddings #622

faisal-alvi commented Nov 24, 2023 •

edited

Loading

faisal-alvi commented Nov 24, 2023

dkotter left a comment

faisal-alvi commented Nov 29, 2023

jeffpaul commented Dec 6, 2023

dkotter commented Dec 7, 2023

dkotter commented Dec 7, 2023

faisal-alvi commented Dec 8, 2023

faisal-alvi commented Dec 8, 2023

faisal-alvi commented Dec 8, 2023

dkotter commented Dec 8, 2023

Enhancement/591 Previewing of OpenAI Embeddings #622

Enhancement/591 Previewing of OpenAI Embeddings #622

Conversation

faisal-alvi commented Nov 24, 2023 • edited Loading

Description of the Change

How to test the Change

Changelog Entry

Credits

Checklist:

faisal-alvi commented Nov 24, 2023

dkotter left a comment

Choose a reason for hiding this comment

faisal-alvi commented Nov 29, 2023

Before

After

jeffpaul commented Dec 6, 2023

dkotter commented Dec 7, 2023

dkotter commented Dec 7, 2023

faisal-alvi commented Dec 8, 2023

faisal-alvi commented Dec 8, 2023

before

after

faisal-alvi commented Dec 8, 2023

dkotter commented Dec 8, 2023

faisal-alvi commented Nov 24, 2023 •

edited

Loading