-
Notifications
You must be signed in to change notification settings - Fork 37
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #281 from Hero2323/main
chore(report): AI Powered License Detection Reports for weeks 8-15 Reviewed-by: shaheem.azmal@siemens.com
- Loading branch information
Showing
15 changed files
with
901 additions
and
8 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
--- | ||
title: Week 8 | ||
author: Abdelrahman Jamal | ||
--- | ||
<!-- | ||
SPDX-License-Identifier: CC-BY-SA-4.0 | ||
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com> | ||
--> | ||
|
||
# Meeting 8 | ||
|
||
*(July 18,2024)* | ||
|
||
## Attendees: | ||
- [Kaushlendra Pratap](https://github.com/Kaushl2208) | ||
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd) | ||
- [Ayush Bhardwaj](https://github.com/hastagAB) | ||
- [Avinal Kumar](https://github.com/avinal) | ||
- [Anupam Ghosh](https://github.com/ag4ums) | ||
- Katharina Ettinger | ||
|
||
## Discussion: | ||
- Improvements on semantic search (somehow) | ||
- test nomos | ||
Discussed several interesting ideas including license compatibility and obligations and so on | ||
|
||
|
||
### Evaluation of Semantic Search on Nomos Test Dataset | ||
|
||
- **Dataset Overview:** | ||
The Nomos test dataset consists of 2054 rows, but I evaluated the algorithm using only 1000 rows to gather initial results. | ||
|
||
- **Initial Results:** | ||
1. **Accuracy:** 65.1% | ||
2. **Coverage:** 58.2% | ||
|
||
**Challenges:** | ||
- My parser isn't perfect, and there are cases where the algorithm matches minor variations of licenses (e.g., AFL-1.2 instead of AFL-1.1), which is counted as incorrect. | ||
- The real accuracy is likely higher, in the range of 70-75%, considering these minor variations. | ||
|
||
- **Next Steps:** | ||
1. Refine the parser to better handle license versions and variations. | ||
2. Reevaluate accuracy with a more comprehensive dataset to improve these metrics. | ||
|
||
### Work on License Obligations | ||
|
||
- **Introduction to License Obligations:** | ||
License obligations refer to the specific legal requirements imposed by open-source licenses to ensure compliance. I began working on this aspect to explore its potential in the project. | ||
|
||
**Key Concepts:** | ||
OSADL (Open Source Automation Development Lab) has developed the "Open Source License Obligations Checklists" project, which helps organizations comply with open-source licenses by: | ||
1. **Encoding Obligations:** Defining what actions are required or prohibited by different licenses. | ||
2. **Creating Checklists:** Structured lists of obligations for various licenses. | ||
3. **Evaluating Compatibility:** Assessing how different licenses interact and whether they can be used together. | ||
|
||
- **Progress:** | ||
I started by using OSADL’s checklist as a framework for obligations. Here’s a [link](https://osadl.org/Access-to-raw-data.oss-compliance-raw-data-access.0.html) to OSADL's obligations for various licenses. | ||
|
||
**Example Obligation for the Academic Free License v2.0 (AFL-2.0):** | ||
``` | ||
USE CASE Source code delivery OR Binary delivery | ||
YOU MUST Reference License text | ||
YOU MUST Search License acceptance | ||
ATTRIBUTE Reasonable | ||
IF Software modification | ||
YOU MUST Forward Copyright notices | ||
YOU MUST Forward Patent notice | ||
YOU MUST Forward Trademark notice | ||
YOU MUST Forward License notice | ||
YOU MUST Provide Modification notice | ||
YOU MUST NOT Promote | ||
IF Modified work Under Original license | ||
EITHER | ||
YOU MUST Include Source code Of Modified work | ||
ATTRIBUTE Machine-readable | ||
OR | ||
YOU MUST Provide Delayed source code delivery Of Modified work | ||
ATTRIBUTE Machine-readable | ||
ATTRIBUTE Via Internet | ||
ATTRIBUTE No profit | ||
ATTRIBUTE Duration As long as distributed | ||
YOU MUST Reference Source code | ||
ATTRIBUTE Below Copyright notices | ||
PATENT HINTS Yes | ||
``` | ||
### Experimenting with License Obligation Conversion via LLM | ||
|
||
- **Objective:** | ||
I attempted to evaluate whether an LLM could generate license obligations similar to OSADL’s checklist through prompt engineering. | ||
|
||
- **Initial Results:** | ||
Using prompt engineering, I tested multiple LLMs to generate obligations directly from license texts. However, the results were inconsistent due to: | ||
1. **Ambiguity in Interpretation:** OSADL's obligations are highly detailed and rely on specific legal interpretations. These are difficult for an LLM to replicate without detailed context. | ||
2. **LLM Limitations:** While the LLM could generate obligations, they were not structured or detailed enough to meet the OSADL standard. | ||
|
||
|
||
- **Challenges and Next Steps:** | ||
1. The LLM produced obligations, but the results did not meet the precision required, mainly due to different interpretations. | ||
2. Moving forward, I plan to refine the prompt further and explore additional ways to guide the LLM towards creating obligations more aligned with legal standards. | ||
|
||
## Conclusions and Next Steps | ||
|
||
- **Further Work on Obligations:** | ||
I will continue working on converting licenses into obligations, refining the prompt-engineering process and exploring different approaches to improve the LLM’s performance. | ||
|
||
- **Other Tasks:** | ||
1. Begin acknowledging licenses from notice files, extending the current capability of license identification. | ||
2. Continue refining the parser and matching algorithm to better handle edge cases and improve overall accuracy and speed. | ||
|
||
## Additional Discussions and Potential Advancements | ||
|
||
In this meeting, several potential directions forward were discussed regarding obligations and other interesting license-related applications: | ||
|
||
1. **Validate Obligation Rule Against License Text**: | ||
* Provide an obligation rule (e.g., "YOU MUST Provide Modification report") to the LLM. | ||
* Ask the LLM to check if this rule exists within the license text and provide an explanation. | ||
|
||
2. **Potential Advancements** | ||
|
||
1. **Create Human-Readable Obligations from Machine-Readable Obligations**: | ||
* Convert OSADL's structured, machine-readable obligations into a more user-friendly format. | ||
|
||
2. **Check License Compatibility Using Obligations or License Text**: | ||
* Explore using obligations or the raw license text to determine if different licenses are compatible with each other. | ||
|
||
3. **Convert License Text to Obligations Using LLM**: | ||
* Continue refining the prompt engineering process to improve the LLM's ability to generate obligations directly from license text. | ||
|
||
These potential advancements offer various avenues for further exploration and development within the project. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
--- | ||
title: Week 9 | ||
author: Abdelrahman Jamal | ||
--- | ||
<!-- | ||
SPDX-License-Identifier: CC-BY-SA-4.0 | ||
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com> | ||
--> | ||
|
||
# Meeting 9 | ||
|
||
*(July 25,2024)* | ||
|
||
## Attendees: | ||
None | ||
|
||
## Discussion: | ||
No meeting was held this week due to my travel commitments with family. | ||
|
||
|
||
|
Oops, something went wrong.