Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(report): AI Powered License Detection Reports for weeks 8-15 #281

Merged
merged 1 commit into from
Sep 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/2024/license-detection/updates/2024-05-30.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 1
Expand Down
2 changes: 1 addition & 1 deletion docs/2024/license-detection/updates/2024-06-06.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 2
Expand Down
2 changes: 1 addition & 1 deletion docs/2024/license-detection/updates/2024-06-13.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 3
Expand Down
2 changes: 1 addition & 1 deletion docs/2024/license-detection/updates/2024-06-20.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 4
Expand Down
2 changes: 1 addition & 1 deletion docs/2024/license-detection/updates/2024-06-27.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 5
Expand Down
2 changes: 1 addition & 1 deletion docs/2024/license-detection/updates/2024-07-04.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 6
Expand Down
4 changes: 2 additions & 2 deletions docs/2024/license-detection/updates/2024-07-11.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,12 @@ author: Abdelrahman Jamal
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <email.here>
SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 7

*(July 4,2024)*
*(July 11,2024)*

## Attendees:
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
Expand Down
130 changes: 130 additions & 0 deletions docs/2024/license-detection/updates/2024-07-18.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
---
title: Week 8
author: Abdelrahman Jamal
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 8

*(July 18,2024)*

## Attendees:
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)
- [Ayush Bhardwaj](https://github.com/hastagAB)
- [Avinal Kumar](https://github.com/avinal)
- [Anupam Ghosh](https://github.com/ag4ums)
- Katharina Ettinger

## Discussion:
- Improvements on semantic search (somehow)
- test nomos
Discussed several interesting ideas including license compatibility and obligations and so on


### Evaluation of Semantic Search on Nomos Test Dataset

- **Dataset Overview:**
The Nomos test dataset consists of 2054 rows, but I evaluated the algorithm using only 1000 rows to gather initial results.

- **Initial Results:**
1. **Accuracy:** 65.1%
2. **Coverage:** 58.2%

**Challenges:**
- My parser isn't perfect, and there are cases where the algorithm matches minor variations of licenses (e.g., AFL-1.2 instead of AFL-1.1), which is counted as incorrect.
- The real accuracy is likely higher, in the range of 70-75%, considering these minor variations.

- **Next Steps:**
1. Refine the parser to better handle license versions and variations.
2. Reevaluate accuracy with a more comprehensive dataset to improve these metrics.

### Work on License Obligations

- **Introduction to License Obligations:**
License obligations refer to the specific legal requirements imposed by open-source licenses to ensure compliance. I began working on this aspect to explore its potential in the project.

**Key Concepts:**
OSADL (Open Source Automation Development Lab) has developed the "Open Source License Obligations Checklists" project, which helps organizations comply with open-source licenses by:
1. **Encoding Obligations:** Defining what actions are required or prohibited by different licenses.
2. **Creating Checklists:** Structured lists of obligations for various licenses.
3. **Evaluating Compatibility:** Assessing how different licenses interact and whether they can be used together.

- **Progress:**
I started by using OSADL’s checklist as a framework for obligations. Here’s a [link](https://osadl.org/Access-to-raw-data.oss-compliance-raw-data-access.0.html) to OSADL's obligations for various licenses.

**Example Obligation for the Academic Free License v2.0 (AFL-2.0):**
```
USE CASE Source code delivery OR Binary delivery
YOU MUST Reference License text
YOU MUST Search License acceptance
ATTRIBUTE Reasonable
IF Software modification
YOU MUST Forward Copyright notices
YOU MUST Forward Patent notice
YOU MUST Forward Trademark notice
YOU MUST Forward License notice
YOU MUST Provide Modification notice
YOU MUST NOT Promote
IF Modified work Under Original license
EITHER
YOU MUST Include Source code Of Modified work
ATTRIBUTE Machine-readable
OR
YOU MUST Provide Delayed source code delivery Of Modified work
ATTRIBUTE Machine-readable
ATTRIBUTE Via Internet
ATTRIBUTE No profit
ATTRIBUTE Duration As long as distributed
YOU MUST Reference Source code
ATTRIBUTE Below Copyright notices
PATENT HINTS Yes
```
### Experimenting with License Obligation Conversion via LLM

- **Objective:**
I attempted to evaluate whether an LLM could generate license obligations similar to OSADL’s checklist through prompt engineering.

- **Initial Results:**
Using prompt engineering, I tested multiple LLMs to generate obligations directly from license texts. However, the results were inconsistent due to:
1. **Ambiguity in Interpretation:** OSADL's obligations are highly detailed and rely on specific legal interpretations. These are difficult for an LLM to replicate without detailed context.
2. **LLM Limitations:** While the LLM could generate obligations, they were not structured or detailed enough to meet the OSADL standard.


- **Challenges and Next Steps:**
1. The LLM produced obligations, but the results did not meet the precision required, mainly due to different interpretations.
2. Moving forward, I plan to refine the prompt further and explore additional ways to guide the LLM towards creating obligations more aligned with legal standards.

## Conclusions and Next Steps

- **Further Work on Obligations:**
I will continue working on converting licenses into obligations, refining the prompt-engineering process and exploring different approaches to improve the LLM’s performance.

- **Other Tasks:**
1. Begin acknowledging licenses from notice files, extending the current capability of license identification.
2. Continue refining the parser and matching algorithm to better handle edge cases and improve overall accuracy and speed.

## Additional Discussions and Potential Advancements

In this meeting, several potential directions forward were discussed regarding obligations and other interesting license-related applications:

1. **Validate Obligation Rule Against License Text**:
* Provide an obligation rule (e.g., "YOU MUST Provide Modification report") to the LLM.
* Ask the LLM to check if this rule exists within the license text and provide an explanation.

2. **Potential Advancements**

1. **Create Human-Readable Obligations from Machine-Readable Obligations**:
* Convert OSADL's structured, machine-readable obligations into a more user-friendly format.

2. **Check License Compatibility Using Obligations or License Text**:
* Explore using obligations or the raw license text to determine if different licenses are compatible with each other.

3. **Convert License Text to Obligations Using LLM**:
* Continue refining the prompt engineering process to improve the LLM's ability to generate obligations directly from license text.

These potential advancements offer various avenues for further exploration and development within the project.
22 changes: 22 additions & 0 deletions docs/2024/license-detection/updates/2024-07-25.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
---
title: Week 9
author: Abdelrahman Jamal
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Abdelrahman Jamal <abdelrahmanjamal5565@gmail.com>
-->

# Meeting 9

*(July 25,2024)*

## Attendees:
None

## Discussion:
No meeting was held this week due to my travel commitments with family.



Loading