
Fix merge conflicts in feature-openapi #285

Merged
merged 21 commits into from
Apr 3, 2023

Conversation

adrianwyatt
Contributor

Motivation and Context

Conflict resolutions in preparation for feature-openapi merge to main.

dluc and others added 21 commits March 27, 2023 15:54
### Motivation and Context

Add GPT2/GPT3 tokenizer, to allow counting tokens when using OpenAI.

### Description

Add C# port of the tokenizers recommended by OpenAI. See
https://platform.openai.com/tokenizer for more info.
### Bugfix

Minutes and seconds were returning incorrect values.

### Motivation and Context

Remove hardcoded models; allow the use of ChatGPT, image generation models
like DallE, any custom text generation model, and text embedding.

Currently SK supports only "text completion" and "text embedding", and
only OpenAI and Azure OpenAI. One cannot use custom LLMs; if one wants
to put a proxy in front of OpenAI with a custom API, SK doesn't support
it. "Completion" and "Embedding" are not explicitly scoped to "text",
while the implementation supports only text completion and text
embeddings. SK also has no helpers for DallE and ChatGPT.

### Description

The PR contains the following features and improvements:

* Introduce the concept of a Service Collection (without using the .NET
ServiceCollection yet, because that would take more work). The service
collection can contain 4 service types:
  * Text Completion services
  * Text Embedding generators
  * Chat Completion services
  * Image Generation services
* Add client for OpenAI ChatGPT, and example (example 17)
* Add client for OpenAI DallE, and example (example 18)
* Show how to use a custom LLM, e.g. in case of proxies or local
HuggingFace models (example 16)
* Add Secret Manager to syntax examples, to reduce the risk of leaking
secrets used to run the examples
* Add 3 new syntax examples: custom LLM, ChatGPT, DallE
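The service-collection idea described above can be sketched as a registry keyed by service type and service id. This is an illustrative Python sketch only; the class and method names (`ServiceRegistry`, `register`, `get`) are assumptions, not SK's actual API, which is implemented in C#.

```python
class ServiceRegistry:
    """Minimal registry holding multiple service types, each keyed by a
    service id (e.g. text completion, chat completion, image generation)."""

    def __init__(self):
        # (service_type, service_id) -> service instance
        self._services = {}

    def register(self, service_type, service_id, instance):
        self._services[(service_type, service_id)] = instance

    def get(self, service_type, service_id):
        return self._services[(service_type, service_id)]


registry = ServiceRegistry()
registry.register("text-completion", "davinci", object())
registry.register("chat-completion", "chatgpt", object())
registry.register("image-generation", "dalle", object())
```

A real implementation would also handle default service ids and missing-service errors; this sketch only shows the lookup structure.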
1. Why is this change required?
The current example prompts at the end of the sample don't flow well:
the first prompt asks for generic ideas, while the second prompt
assumes that just one idea was output. The third abruptly switches
to asking about a book, with no mention of a book in the previous two
prompts and responses.

2. What problem does it solve?
A more intuitive and realistic chat experience for a new developer trying
out the sample for the very first time.

3. What scenario does it contribute to?
A better developer experience while conveying the power of related
(follow-up) prompts aligned with a more realistic flow.

### Description
Cosmetic, non-code changes ensuring that this particular example also
flows well just like the other examples.
### Motivation and Context
<!--

Please help reviewers and future users, providing the following
information:

1. Why is this change required?
2. What problem does it solve?
3. What scenario does it contribute to?
4. If it fixes an open issue, please link to the issue here.
-->
Most modern operating systems, including macOS, Linux, and Unix, use
`\n` as the line-ending character. Windows, on the other hand, uses
`\r\n` as the line-ending character. However, when sending data over the
internet, `\n` is the standard line-ending character.

Therefore, if you're working on a Windows machine and your text contains
`\r\n`, you should convert it to `\n` before sending the prompt to
OpenAI API. This will ensure that the API can correctly interpret the
line endings in your prompt.

Prompt example:
```
Given a json input and a request. Apply the request on the json input and return the result. Put the result in between <result></result> tags
Input:
{\"name\": \"John\", \"age\": 30}

Request:
name
```

Expected completion: (this is what happens when using `\n` for line
endings)
```
<result>John</result>
```

Actual completion:
```
" to upper case.
<result>\"name\": \"JOHN\", \"age\": 30}</result>
```

### Description
<!--

Describe your changes, the overall approach, the underlying design.

These notes will help understanding how your code works. Thanks!
-->
1. Implemented `NormalizePrompt` method in `OpenAIClientAbstract` class.
2. Added usage of new method in `AzureTextCompletion` and
`OpenAITextCompletion` classes.
3. Added integration tests for verification.
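The normalization described above amounts to replacing CRLF with LF before the prompt is sent. The PR's `NormalizePrompt` is C#; this is a minimal Python sketch of the same idea (the function name mirrors the PR's method but the code is illustrative):

```python
def normalize_prompt(prompt: str) -> str:
    """Convert Windows CRLF line endings to the LF endings the API expects."""
    return prompt.replace("\r\n", "\n")


# A prompt authored on Windows carries \r\n between lines:
prompt = 'Input:\r\n{"name": "John", "age": 30}\r\n\r\nRequest:\r\nname'
normalized = normalize_prompt(prompt)
```

After normalization, the prompt contains only `\n` separators, matching the behavior that produces the expected completion shown earlier.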
### Motivation and Context

Latest version 0.9 has some new configuration APIs and breaking changes
that require the notebooks to be updated

### Description

Use new SK API in the notebooks
### Motivation and Context

Update README with new API.
Lock notebooks to 0.9 so we can release new APIs breaking notebooks.
I was trying to run the Getting Started Notebook from step 3 on the
[Setting up Semantic
Kernel](https://learn.microsoft.com/en-us/semantic-kernel/get-started)
documentation page.

In the first cell, I got a nuget error:
`error NU1101: Unable to find package ...`

Incorporated a fix from https://stackoverflow.com/a/73961223/620501 into
the troubleshooting section of the readme.
### Motivation and Context

This pull request adds the ability for the Semantic Kernel to persist
embeddings/memory to external vector databases like Qdrant. This
submission contains modifications and additions enabling integration
with the current SK Memory architecture and with the SDKs/APIs of
various vector databases.

**Why is this change required?**
This change is required in order to allow SK developers/users to persist
and search for embeddings/memory from a Vector Database like Qdrant.

**What problem does it solve?**
Adds long-term memory/embedding storage capabilities, allowing the
Semantic Kernel to persist embeddings to vector databases.

**What scenario does it contribute to?**
Scenario: long-term, scalable memory storage, retrieval, search, and
filtering of embeddings.

**If it fixes an open issue, please link to the issue here.**
N/A

### Description

This PR currently includes the connection for the Qdrant vector DB only,
removing the initial Milvus vector DB addition and the vector DB client
interfaces for consistency across external vector databases; those will
be provided in a forthcoming PR.

- Addition and modification of the Qdrant.Dotnet SDK
- Addition of the new namespace Skills.Memory.QdrantDB
- Creation of the Qdrant memory client class and QdrantMemoryStore.cs,
adding methods for connecting to a Qdrant vector database in the cloud
and retrieving collections and embeddings from it.
### Motivation and Context

Cleaning up tech debt accumulated over the last couple of months, as planned:

* Moved OpenAI code to Connectors namespace
* Label => Service Id
* Backend => Service / Connector
* EmbeddingGenerator => EmbeddingGeneration / TextEmbeddingGeneration
**Motivation and Context**
We want to be able to compress and decompress files.

**Description**
Add the FileCompressionSkill in dotnet samples folder
### Motivation and Context
With the `0.9.61.1-preview` nuget package, the API for adding the text
completion model needs to be updated.
### Motivation and Context

[HuggingFace](https://huggingface.co/) is a platform with over 120k
different models, so we need to provide a way to use these models with
Semantic Kernel. While users can implement their own backend
implementation for HuggingFace models, we can provide a default
implementation that is ready for use out of the box.

This PR is for merging
[experimental-huggingface](https://github.com/microsoft/semantic-kernel/tree/experimental-huggingface)
branch to `main` from new branch in order to avoid a lot of merge
conflicts.

### Description

PR includes:
1. Local inference server for running models locally with Python. 
2. Implemented `HuggingFaceTextCompletion` and
`HuggingFaceTextEmbeddingGeneration` classes.
3. `HuggingFaceTextCompletion` works with local models as well as remote
models hosted on HF servers.
4. Unit and integration tests for new classes.
5. HuggingFace usage example in `KernelSyntaxExamples` project.
### Motivation and Context

Reduce the risk of checking in secrets by removing the ".env" files in
the repo. ".env" is already in .gitignore, but some files were checked
in earlier.


### Description

Rename all .env to .env.example.
… classes (#250)

### Motivation and Context
`MemoryRecord`, along with several useful classes in the
`SemanticKernel.Memory.Collections` namespace, currently has an internal
visibility modifier.

This prevents 3rd-party libraries from reusing code when creating
implementations of `IEmbeddingWithMetadata<float>`, which is needed in
implementations of `IMemoryStore<TEmbedding>`.

Two implementations are currently blocked on this:
* Skills.Memory.Sqlite
* Skills.Memory.CosmosDB

Because neither of these backing stores is a vector DB, they implement
cosine-similarity comparisons locally using `TopNCollection` and its
dependencies, all of which are currently internal.

Related community feedback item here:
#202

### Description
Change the following classes from internal to public:

* MemoryRecord
* MinHeap
* Score
* ScoredValue
* TopNCollection
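The local scoring these classes enable can be illustrated with a small Python sketch: compute cosine similarity against each stored embedding and keep the top-N matches with a heap. This is illustrative only; the names and structure are assumptions, not the C# `TopNCollection` API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query, records, n):
    """Return the n highest-scoring (score, key) pairs.

    records: iterable of (key, embedding) pairs, e.g. from a local store.
    """
    scored = ((cosine_similarity(query, emb), key) for key, emb in records)
    return heapq.nlargest(n, scored)


records = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
best = top_n([1.0, 0.0], records, 2)
```

`heapq.nlargest` keeps memory proportional to `n`, which is the same reason a dedicated top-N collection is useful when scanning a large local store.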
### Motivation and Context
When using a database backed memory store, there is typically a unique
id constraint. The current code will supply the same id when there are
multiple paragraphs that have been summarised for the same file.

### Description
This PR appends a number to the end of each id to guarantee uniqueness.
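The fix described above can be sketched as deriving each record id from the base id plus a running index, so multiple summarized paragraphs from the same file never collide on a unique-id constraint. The function name and shape here are illustrative, not the PR's actual code:

```python
def paragraph_ids(base_id, paragraphs):
    """Append an index to the base id so each paragraph gets a unique key."""
    return [f"{base_id}_{i}" for i, _ in enumerate(paragraphs)]


ids = paragraph_ids("report.txt", ["summary one", "summary two", "summary three"])
```

With plain `base_id` for every paragraph, a database-backed store would reject all but the first insert; the suffix guarantees uniqueness per file.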
This change removes the unnecessary destructor/finalizer declared by the
OpenAIClientAbstract class and inherited by the backend classes
implementing it.

Finalizers are usually used to release/clean **unmanaged** resources
referenced directly through OS handles. Since none of the SK SDK
backends uses any unmanaged resource directly, the finalizer is not
really needed, and having the IDisposable.Dispose method is enough to
release **managed** resources.

There is also a needless loss of performance associated with class
finalizers: every instance of a class that declares a finalizer is put
into a special finalization queue by the garbage collector (GC), the
queue is then processed by the GC, and the instances are removed from
memory only on the next GC cycle.
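The same principle can be shown with a Python analogy (an assumption for illustration; the PR itself concerns .NET's IDisposable): cleanup goes in a deterministic disposal path, here a context manager, and no finalizer (`__del__`) is declared, so instances never enter the finalization machinery.

```python
class ManagedClient:
    """Releases a managed resource deterministically on scope exit,
    analogous to IDisposable.Dispose; no __del__ finalizer is declared."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release managed resources here (e.g. an HTTP client).
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions


with ManagedClient() as client:
    pass  # use the client
# Cleanup ran at scope exit, not at the garbage collector's discretion.
```

Deterministic disposal avoids both the delayed reclamation and the extra GC bookkeeping that finalizers incur.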
Fix README to reflect the content of the latest nuget (AddAzureOpenAITextCompletion call and required libs)
@adrianwyatt adrianwyatt self-assigned this Apr 3, 2023
@adrianwyatt adrianwyatt requested a review from shawncal April 3, 2023 17:14
@adrianwyatt adrianwyatt merged commit 8e6f87a into microsoft:feature-openapi Apr 3, 2023
dehoward pushed a commit to lemillermicrosoft/semantic-kernel that referenced this pull request Jun 1, 2023
### Motivation and Context
Conflict resolutions in preparation for feature-openapi merge to main.
golden-aries pushed a commit to golden-aries/semantic-kernel that referenced this pull request Oct 10, 2023
### Motivation and Context
We want to run integration tests as part of our CI/CD pipeline

### Description
Run integration tests upon check-ins to merge

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
johnoliver pushed a commit to johnoliver/semantic-kernel that referenced this pull request Jun 5, 2024
### Motivation and Context
Conflict resolutions in preparation for feature-openapi merge to main.