
Fix merge conflicts in feature-openapi #285

Merged
merged 21 commits into from
Apr 3, 2023

Conversation

adrianwyatt
Contributor

Motivation and Context

Conflict resolutions in preparation for feature-openapi merge to main.

dluc and others added 21 commits March 27, 2023 15:54
### Motivation and Context

Add GPT2/GPT3 tokenizer, to allow counting tokens when using OpenAI.

### Description

Add C# port of the tokenizers recommended by OpenAI. See
https://platform.openai.com/tokenizer for more info.
### Bugfix

Minutes and seconds were returning incorrect values.

### Motivation and Context

Remove hardcoded models; allow the use of ChatGPT, image generation models
like DallE, any custom text generation model, and text embedding.

Currently SK supports only "text completion" and "text embedding", and
only OpenAI and Azure OpenAI. One cannot use custom LLMs; if one wants
to put a proxy in front of OpenAI with a custom API, SK doesn't support
it. "Completion" and "Embedding" are not explicitly scoped to "text",
while the implementation supports only text completion and text
embeddings. SK also has no helpers for DallE and ChatGPT.

### Description

The PR contains the following features and improvements:

* Introduce the concept of a Service Collection (without using the .NET
ServiceCollection yet, because that would take more work). The service
collection can contain 4 service types:
  * Text Completion services
  * Text Embedding generators
  * Chat Completion services
  * Image Generation services
* Add client for OpenAI ChatGPT, and example (example 17)
* Add client for OpenAI DallE, and example (example 18)
* Show how to use a custom LLM, e.g. in case of proxies or local
HuggingFace models (example 16)
* Add Secret Manager to syntax examples, to reduce the risk of leaking
secrets used to run the examples
* Add 3 new syntax examples: custom LLM, ChatGPT, DallE
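The service-collection idea described above can be sketched as a registry keyed by service type and service id. This is an illustrative Python sketch only; the class and method names (`ServiceRegistry`, `register`, `get`) are assumptions, not SK's actual API, which is implemented in C#.

```python
class ServiceRegistry:
    """Minimal registry holding multiple service types, each keyed by a
    service id (e.g. text completion, chat completion, image generation)."""

    def __init__(self):
        # (service_type, service_id) -> service instance
        self._services = {}

    def register(self, service_type, service_id, instance):
        self._services[(service_type, service_id)] = instance

    def get(self, service_type, service_id):
        return self._services[(service_type, service_id)]


registry = ServiceRegistry()
registry.register("text-completion", "davinci", object())
registry.register("chat-completion", "chatgpt", object())
registry.register("image-generation", "dalle", object())
```

A real implementation would also handle default service ids and missing-service errors; this sketch only shows the lookup structure.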
1. Why is this change required?
The current example prompts at the end of the sample don't flow well:
the first prompt asks for generic ideas, while the second prompt
assumes that just one idea was output. The third abruptly switches
to asking about a book, with no mention of a book in the previous two
prompts and responses.

2. What problem does it solve?
A more intuitive and realistic chat experience for a new developer trying
out the sample for the very first time.

3. What scenario does it contribute to?
A better developer experience while conveying the power of related
(follow-up) prompts aligned with a more realistic flow.

### Description
Cosmetic, non-code changes ensuring that this particular example also
flows well just like the other examples.
### Motivation and Context
<!--

Please help reviewers and future users, providing the following
information:

1. Why is this change required?
2. What problem does it solve?
3. What scenario does it contribute to?
4. If it fixes an open issue, please link to the issue here.
-->
Most modern operating systems, including macOS, Linux, and Unix, use
`\n` as the line-ending character. Windows, on the other hand, uses
`\r\n` as the line-ending character. However, when sending data over the
internet, `\n` is the standard line-ending character.

Therefore, if you're working on a Windows machine and your text contains
`\r\n`, you should convert it to `\n` before sending the prompt to
OpenAI API. This will ensure that the API can correctly interpret the
line endings in your prompt.

Prompt example:
```
Given a json input and a request. Apply the request on the json input and return the result. Put the result in between <result></result> tags
Input:
{\"name\": \"John\", \"age\": 30}

Request:
name
```

Expected completion: (this is what happens when using `\n` for line
endings)
```
<result>John</result>
```

Actual completion:
```
" to upper case.
<result>\"name\": \"JOHN\", \"age\": 30}</result>
```

### Description
<!--

Describe your changes, the overall approach, the underlying design.

These notes will help understanding how your code works. Thanks!
-->
1. Implemented `NormalizePrompt` method in `OpenAIClientAbstract` class.
2. Added usage of new method in `AzureTextCompletion` and
`OpenAITextCompletion` classes.
3. Added integration tests for verification.
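The normalization described above amounts to replacing CRLF with LF before the prompt is sent. The PR's `NormalizePrompt` is C#; this is a minimal Python sketch of the same idea (the function name mirrors the PR's method but the code is illustrative):

```python
def normalize_prompt(prompt: str) -> str:
    """Convert Windows CRLF line endings to the LF endings the API expects."""
    return prompt.replace("\r\n", "\n")


# A prompt authored on Windows carries \r\n between lines:
prompt = 'Input:\r\n{"name": "John", "age": 30}\r\n\r\nRequest:\r\nname'
normalized = normalize_prompt(prompt)
```

After normalization, the prompt contains only `\n` separators, matching the behavior that produces the expected completion shown earlier.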
### Motivation and Context

Latest version 0.9 has some new configuration APIs and breaking changes
that require the notebooks to be updated

### Description

Use new SK API in the notebooks
### Motivation and Context

Update README with new API.
Lock notebooks to 0.9 so we can release new APIs breaking notebooks.
I was trying to run the Getting Started Notebook from step 3 on the
[Setting up Semantic
Kernel](https://learn.microsoft.com/en-us/semantic-kernel/get-started)
documentation page.

In the first cell, I got a nuget error:
`error NU1101: Unable to find package ...`

Incorporated a fix from https://stackoverflow.com/a/73961223/620501 into
the troubleshooting section of the readme.
### Motivation and Context

This pull request adds the ability for the Semantic Kernel to persist
embeddings/memory to external vector databases like Qdrant. This
submission contains modifications and additions enabling integration
with the current SK Memory architecture and with the SDKs/APIs of
various vector databases.

**Why is this change required?**
This change is required in order to allow SK developers/users to persist
and search for embeddings/memory from a Vector Database like Qdrant.

**What problem does it solve?**
Adds long-term memory/embedding storage capabilities, allowing the
Semantic Kernel to persist embeddings to vector databases.

**What scenario does it contribute to?**
Scenario: long-term, scalable memory storage, retrieval, search, and
filtering of embeddings.

**If it fixes an open issue, please link to the issue here.**
N/A

### Description

This PR currently includes the connection for the Qdrant vector DB only,
removing the initial Milvus vector DB addition and the vector DB client
interfaces for consistency across external vector databases; those will
be provided in a forthcoming PR.

- Addition and modification of the Qdrant.Dotnet SDK
- Addition of the new namespace Skills.Memory.QdrantDB
- Creation of the Qdrant memory client class and QdrantMemoryStore.cs,
adding methods for connecting to a Qdrant vector database in the cloud
and retrieving collections and embeddings from it.
### Motivation and Context

Cleaning up tech debt accumulated over the last couple of months, as planned:

* Moved OpenAI code to Connectors namespace
* Label => Service Id
* Backend => Service / Connector
* EmbeddingGenerator => EmbeddingGeneration / TextEmbeddingGeneration
**Motivation and Context**
We want to be able to compress and decompress files.

**Description**
Add the FileCompressionSkill in dotnet samples folder
### Motivation and Context
With the `0.9.61.1-preview` nuget package, the API for adding the text
completion model needs to be updated.
### Motivation and Context

[HuggingFace](https://huggingface.co/) is a platform with over 120k
different models, so we need to provide a way to use these models with
Semantic Kernel. While users can implement their own backend
implementation for HuggingFace models, we can provide a default
implementation that is ready for use out of the box.

This PR is for merging
[experimental-huggingface](https://github.com/microsoft/semantic-kernel/tree/experimental-huggingface)
branch to `main` from new branch in order to avoid a lot of merge
conflicts.

### Description

PR includes:
1. Local inference server for running models locally with Python. 
2. Implemented `HuggingFaceTextCompletion` and
`HuggingFaceTextEmbeddingGeneration` classes.
3. `HuggingFaceTextCompletion` works with local models as well as remote
models hosted on HF servers.
4. Unit and integration tests for new classes.
5. HuggingFace usage example in `KernelSyntaxExamples` project.
### Motivation and Context

Reduce the risk of checking in secrets by removing the ".env" files in
the repo. ".env" is already in .gitignore, but some files were checked
in earlier.


### Description

Rename all .env to .env.example.
… classes (#250)

### Motivation and Context
`MemoryRecord`, along with several useful classes in the
`SemanticKernel.Memory.Collections` namespace, currently has an internal
visibility modifier.

This prevents 3rd-party libraries from reusing code when creating
implementations of `IEmbeddingWithMetadata<float>`, which is needed in
implementations of `IMemoryStore<TEmbedding>`.

Two implementations are currently blocked on this:
* Skills.Memory.Sqlite
* Skills.Memory.CosmosDB

Because neither of these backing stores is a vector DB, they implement
cosine-similarity comparisons locally using `TopNCollection` and its
dependencies, all of which are currently internal.

Related community feedback item here:
#202

### Description
Change the following classes from internal to public:

* MemoryRecord
* MinHeap
* Score
* ScoredValue
* TopNCollection
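The local scoring these classes enable can be illustrated with a small Python sketch: compute cosine similarity against each stored embedding and keep the top-N matches with a heap. This is illustrative only; the names and structure are assumptions, not the C# `TopNCollection` API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query, records, n):
    """Return the n highest-scoring (score, key) pairs.

    records: iterable of (key, embedding) pairs, e.g. from a local store.
    """
    scored = ((cosine_similarity(query, emb), key) for key, emb in records)
    return heapq.nlargest(n, scored)


records = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
best = top_n([1.0, 0.0], records, 2)
```

`heapq.nlargest` keeps memory proportional to `n`, which is the same reason a dedicated top-N collection is useful when scanning a large local store.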
### Motivation and Context
When using a database backed memory store, there is typically a unique
id constraint. The current code will supply the same id when there are
multiple paragraphs that have been summarised for the same file.

### Description
This PR appends a number to the end of each id to guarantee uniqueness.
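The fix described above can be sketched as deriving each record id from the base id plus a running index, so multiple summarized paragraphs from the same file never collide on a unique-id constraint. The function name and shape here are illustrative, not the PR's actual code:

```python
def paragraph_ids(base_id, paragraphs):
    """Append an index to the base id so each paragraph gets a unique key."""
    return [f"{base_id}_{i}" for i, _ in enumerate(paragraphs)]


ids = paragraph_ids("report.txt", ["summary one", "summary two", "summary three"])
```

With plain `base_id` for every paragraph, a database-backed store would reject all but the first insert; the suffix guarantees uniqueness per file.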
This change removes the unnecessary destructor/finalizer declared by the
OpenAIClientAbstract class and inherited by the backend classes
implementing it.

Finalizers are usually used to release/clean **unmanaged** resources
referenced directly through OS handles. Since none of the SK SDK
backends uses any unmanaged resource directly, the finalizer is not
really needed, and having the IDisposable.Dispose method is enough to
release **managed** resources.

There is also a needless loss of performance associated with class
finalizers: every instance of a class that declares a finalizer is put
into a special finalization queue by the garbage collector (GC), the
queue is then processed by the GC, and the instances are removed from
memory only on the next GC cycle.
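The same principle can be shown with a Python analogy (an assumption for illustration; the PR itself concerns .NET's IDisposable): cleanup goes in a deterministic disposal path, here a context manager, and no finalizer (`__del__`) is declared, so instances never enter the finalization machinery.

```python
class ManagedClient:
    """Releases a managed resource deterministically on scope exit,
    analogous to IDisposable.Dispose; no __del__ finalizer is declared."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release managed resources here (e.g. an HTTP client).
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions


with ManagedClient() as client:
    pass  # use the client
# Cleanup ran at scope exit, not at the garbage collector's discretion.
```

Deterministic disposal avoids both the delayed reclamation and the extra GC bookkeeping that finalizers incur.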
Fix README to reflect the content of the latest nuget (AddAzureOpenAITextCompletion call and required libs)
@adrianwyatt adrianwyatt self-assigned this Apr 3, 2023
@adrianwyatt adrianwyatt requested a review from shawncal April 3, 2023 17:14
@adrianwyatt adrianwyatt merged commit 8e6f87a into microsoft:feature-openapi Apr 3, 2023
dehoward pushed a commit to lemillermicrosoft/semantic-kernel that referenced this pull request Jun 1, 2023
### Motivation and Context
Conflict resolutions in preparation for feature-openapi merge to main.
golden-aries pushed a commit to golden-aries/semantic-kernel that referenced this pull request Oct 10, 2023
### Motivation and Context
We want to run integration tests as part of our CI/CD pipeline

### Description
Run integration tests upon check-ins to merge

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
johnoliver pushed a commit to johnoliver/semantic-kernel that referenced this pull request Jun 5, 2024
### Motivation and Context
Conflict resolutions in preparation for feature-openapi merge to main.