Update Documentation : General Meeting & Overhauling Scheduler Design Discussion Meeting #219

Closed
wants to merge 11 commits into from
8 changes: 4 additions & 4 deletions docs/2024/scheduler/_category_.json
@@ -1,4 +1,4 @@
{
"label": "Scheduler Overhaul",
"position": 3
}
{
"label": "Scheduler Overhaul",
"position": 3
}
75 changes: 44 additions & 31 deletions docs/2024/scheduler/index.md
@@ -1,31 +1,44 @@
---
sidebar_position: 1
title: Introduction
slug: /2024/scheduler/
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Aditya Singh <email.here>
-->

## Author

[Aaditya Singh](https://github.com/aadsingh)

## Contact info

- [Email](mailto:email.here)
- [LinkedIn](https://linkedin.com/in/my-user)

## Project title

Scheduler overhaul

## What's the project about?

Insert Text Here

## What should be done?

What are the plans for the project?
---
sidebar_position: 1
title: Introduction
slug: /2024/scheduler/
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Aditya Singh <email.here>
-->

## Author

[Aaditya Singh](https://github.com/Aaditya-Singh78)

## Contact info

- [Email](mailto:singh.aaditya889@gmail.com)
- [LinkedIn](https://www.linkedin.com/in/aadi-singh/)
- [Twitter](https://twitter.com/__Aadityasingh)

## OVERHAULING SCHEDULER DESIGN
The FOSSology Scheduler, essential for managing tasks in the FOSSology application, is held back by its aging C implementation, which lacks robust error handling and the ability to manage multiple tasks at once. The current design does not manage job execution effectively: jobs that need to run independently for each upload instead block other, unrelated tasks.

![Linear Queue](./static/img/linear_queue.png)


## What's the project about?

The project aims to overhaul the FOSSology Scheduler, which is critical for managing tasks within the FOSSology application. The current scheduler, written in C, lacks robust error handling and does not manage multiple tasks efficiently, especially tasks that must run mutually exclusively for one upload while staying independent of other uploads. As a result, independent jobs block each other unnecessarily. The proposal is to port the scheduler from C to Go and to introduce scheduling algorithms and an architecture that improve flexibility, error handling, and task management.

## What should be done?

The plan for the project involves several key steps:

1. **Refactoring the Current Scheduler**: The existing C-based scheduler will be rewritten in Go. The shift improves readability and maintainability and brings language-level error handling that C lacks: explicit `error` return values plus `panic`/`recover` (Go has no exceptions in the C++ or Java sense). Go's concurrency primitives (goroutines and channels) and garbage-collected memory management will provide a more robust foundation for the scheduler.

2. **Improving Queue Management**: The current linear queue will be replaced with a more sophisticated system built on a priority-based scheduling algorithm. This makes it possible to adopt a data-driven strategy and build either preemptive or non-preemptive scheduling.

3. **Adopting a New Architecture**: A Multi-Level Feedback Queue approach, combined with the principles of a finite state machine, adds the ability to integrate data pre-processing for thorough error analysis.

4. **Extensive Testing and Documentation**: Alongside development, the project will include comprehensive testing using Go's testing library. Documentation will be updated to reflect the new scheduler, so that all features and functionality are documented for current and future development needs.

5. **Deployment and Monitoring**: Before full integration, the new scheduler will be deployed in a staging environment to monitor its performance and make adjustments based on real-world usage.
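
As a rough illustration of the priority-based queue mentioned in step 2, the following sketch uses Go's standard `container/heap` package. The `Job` type, its fields, the `PriorityQueue` wrapper, and the sample agent names and priorities are assumptions made for this example, not FOSSology's actual data model. Jobs with equal priority are served in arrival order.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Job is a hypothetical unit of work; the fields are assumptions for illustration.
type Job struct {
	Name     string
	Priority int // a higher value runs first
	seq      int // arrival order, used to break ties FIFO
}

// jobHeap implements heap.Interface as a max-heap on Priority.
type jobHeap []*Job

func (h jobHeap) Len() int { return len(h) }
func (h jobHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h jobHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *jobHeap) Push(x any)   { *h = append(*h, x.(*Job)) }
func (h *jobHeap) Pop() any {
	old := *h
	n := len(old)
	job := old[n-1]
	*h = old[:n-1]
	return job
}

// PriorityQueue wraps the heap and stamps each job with its arrival order.
type PriorityQueue struct {
	h    jobHeap
	next int
}

// Enqueue adds a job with the given priority.
func (q *PriorityQueue) Enqueue(name string, priority int) {
	heap.Push(&q.h, &Job{Name: name, Priority: priority, seq: q.next})
	q.next++
}

// Dequeue returns the highest-priority job, or false when the queue is empty.
func (q *PriorityQueue) Dequeue() (*Job, bool) {
	if q.h.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.h).(*Job), true
}

func main() {
	q := &PriorityQueue{}
	q.Enqueue("maintenance", 1)
	q.Enqueue("nomos", 5)
	q.Enqueue("copyright", 5)
	for {
		job, ok := q.Dequeue()
		if !ok {
			break
		}
		fmt.Println(job.Name, job.Priority)
	}
}
```

A non-preemptive scheduler could simply call `Dequeue` whenever a worker becomes free; a preemptive variant would additionally need a way to interrupt a running job, which this sketch does not cover.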
Binary file added docs/2024/scheduler/static/img/bucket.png
Binary file added docs/2024/scheduler/static/img/linear_queue.png
25 changes: 0 additions & 25 deletions docs/2024/scheduler/updates/2023-05-30.md

This file was deleted.

90 changes: 90 additions & 0 deletions docs/2024/scheduler/updates/2024-05-30.md
@@ -0,0 +1,90 @@
---
title: Weekly Updates (Community Bonding)
author: Aaditya Singh
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Aditya Singh <singh.aaditya889@gmail.com>
-->

# Meeting 1

*(May 30, 2024)*

## Attendees:

- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

- [Gaurav Mishra](https://github.com/GMishx)

- [Kaushlendra Pratap](https://github.com/Kaushl2208)

- [Ayush Bhardwaj](https://github.com/hastagAB)

- [Samuel Dushimimana](https://github.com/dushimsam)

- [Soham Banerjee](https://github.com/soham4abc)

- [Shreya Singh](https://github.com/SinghShreya05)

- [Abdelrahman jamal](https://github.com/Hero2323)

- [Aaditya Singh](https://github.com/Aaditya-Singh78)

- [Abhishek Kumar](https://github.com/abhi-kumar17871)

- [Akash Sah](https://github.com/Akashsah2003)

- [Divij Sharma](https://github.com/dvjsharma)

- [Rajul Jha](https://github.com/rajuljha)

- [Valens Niyonsenga](https://github.com/valens200)

## Discussion:

### Mentors

- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd): General updates regarding the project. Contributors should update the GSoC Documentation tool before joining the Thursday call.

- [Gaurav Mishra](https://github.com/GMishx): Explained the GSoC Documentation tool and the rules for contributing code and documentation.

### Contributors

- [Abdelrahman jamal](https://github.com/Hero2323)

- Wrote basic code to parse file text from FOSSology and pass it as a prompt to different LLMs. The LLM returns license names, sometimes several per file, and an empty result when no license is found.

- Checking which LLM is best.

- [Shreya Gautam](https://github.com/ShreyaGautamm)

- Absent.

- [Aaditya Singh](https://github.com/Aaditya-Singh78)

- FOSSology installation done. Created a pull request to improve cross-platform compatibility and address line encoding issues.

- Discussed with a mentor how to solve another issue with the mimetype agent.

- [Abhishek Kumar](https://github.com/abhi-kumar17871)

- Exploring and working on the SPDX 3.0 JSON format; will open a pull request soon.

- [Akash Sah](https://github.com/Akashsah2003)

- I was looking into ways to store license expressions and found abstract syntax to be one option. I have shared the document with the mentors so they can make a decision.

- [Divij Sharma](https://github.com/dvjsharma)

- No major updates. I was looking into ways to implement OAuth via the API.

- [Rajul Jha](https://github.com/rajuljha)

- In a call with the mentors, we discussed the general milestones for the CI/CD project. Working on line number extraction and differential scans.

- [Valens Niyonsenga](https://github.com/valens200)

- I don't have many updates. I have gone through the idea of Monk-based text scanning and need to discuss further how to start.

153 changes: 153 additions & 0 deletions docs/2024/scheduler/updates/2024-05-31.md
@@ -0,0 +1,153 @@
---
title: OVERHAULING SCHEDULER DESIGN (Discussion)
author: Aaditya Singh
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2024 Aditya Singh <singh.aaditya889@gmail.com>
-->

# Meeting 2

*(May 31, 2024)*

## Attendees:

- [Anupam Ghosh](https://github.com/ag4ums)

- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)

- [Gaurav Mishra](https://github.com/GMishx)

- [Kaushlendra Pratap](https://github.com/Kaushl2208)

- [Avinal](https://github.com/avinal)

- [Aaditya Singh](https://github.com/Aaditya-Singh78)

## Discussion:

### Contributor

- [Aaditya Singh](https://github.com/Aaditya-Singh78): Reading through the Job Scheduler [Documentation](https://github.com/fossology/fossology/wiki/Job-Scheduler) led to the following picture of the current scheduler.
**Current Scheduler Design**
---

The schematic, drawn from the documentation, gives an architectural overview of the Job Scheduler.

![currentDesign](../static/img/currentScheduler_Design.png)

*Architectural Overview:*

It is based on a client-server model, where:

1. **Scheduler Port:** It serves as the communication hub for clients and the scheduler.
2. **Main Thread:** It is responsible for job queuing, job allocation, and event management.
3. **Worker Threads:** They are implemented to handle specific tasks asynchronously, increasing throughput and reducing response times.

*Key Components:*

1. **Scheduler:** Coordinates all job scheduling operations and serves as the entry point for job requests.
2. **Asynchronous Workers:** Handle tasks in parallel, significantly improving processing time.
3. **Logging and Monitoring:** Integrated logging system for real-time monitoring and troubleshooting.

### Does the Current Scheduler Design align with Job Scheduler [Documentation](https://github.com/fossology/fossology/wiki/Job-Scheduler)?

#### Mentor
- [Gaurav Mishra](https://github.com/GMishx): Yes, it aligns with current architecture of job scheduler.


**New Scheduler Design**
---

The revised schematic gives an architectural overview of the new Job Scheduler design, which focuses on an efficient multithreaded approach.

![AlteredDesign](../static/img/AlteredScheduler_Design.png)

*Architectural Overview:*

1. **Main Thread:** Coordinates with various components of the scheduler and ensures that tasks are handed over to the appropriate threads for execution.

2. **Worker Thread:** Handles interactions between different agents (or services) that are part of the scheduler. This thread ensures that all components are synchronized and operate without conflicts.

3. **Event Queue:** Handles all system-level events and ensures proper event handling and error logging.

*Key Components:*

1. **Scheduler:** Acts as the central command that receives tasks from the client. It uses a round-robin technique for managing tasks, ensuring a fair and efficient distribution of CPU time among tasks.

2. **Queue Storage:** This component is responsible for holding the tasks before they are processed. It operates under the FIFO (First In, First Out) principle but is managed dynamically to adapt to varying workload conditions.
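
The round-robin handling of a FIFO queue described above could be sketched as follows. This is a minimal, single-threaded illustration: the `Task` type, the abstract "work units", and the `RoundRobin` function are assumptions for the example and do not come from the FOSSology code base.

```go
package main

import "fmt"

// Task is a hypothetical job with an amount of remaining work, in abstract units.
type Task struct {
	Name      string
	Remaining int
}

// RoundRobin takes tasks from the front of a FIFO queue, gives each at most
// `quantum` units of work per turn, and re-queues unfinished tasks at the back.
// It returns the task names in the order in which they finished.
func RoundRobin(tasks []Task, quantum int) []string {
	if quantum < 1 {
		quantum = 1 // guard against a slice that never makes progress
	}
	queue := make([]Task, len(tasks))
	copy(queue, tasks)

	var finished []string
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		if t.Remaining > quantum {
			t.Remaining -= quantum
			queue = append(queue, t) // not done yet: back of the line
			continue
		}
		finished = append(finished, t.Name)
	}
	return finished
}

func main() {
	order := RoundRobin([]Task{
		{Name: "nomos", Remaining: 5},
		{Name: "copyright", Remaining: 2},
		{Name: "monk", Remaining: 3},
	}, 2)
	fmt.Println(order) // prints [copyright monk nomos]
}
```

Short tasks finish early even when they arrive behind a long one, which is the fairness property round-robin is meant to provide.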


> ⚠️ **Disclaimer**: Discrepancies were discovered through post-meeting analysis.

**Trade-off**
---

| *Old Job Scheduler* | *New Job Scheduler* |
|--------------------|--------------------|
| Easier to maintain | Difficult to maintain |
| Centralised monitoring | Decentralised monitoring |
| Static resource management | Dynamic resource management |


### What are the benefits of New Scheduler Design? Will it be Effective?

#### Mentor
- [Gaurav Mishra](https://github.com/GMishx):
Given the complexity of the project's tasks, a simple time-based Round-robin setup might not be enough. A finite state machine architecture would likely be more suitable as it facilitates easier tracking and management of the project. This approach is worth considering.

- [Anupam Ghosh](https://github.com/ag4ums): What specific requirements or functionalities are you looking to address with the job scheduler?


- [Gaurav Mishra](https://github.com/GMishx): A significant issue that could be opened as a project idea for the scheduler is the lack of nuanced control over job execution, specifically for jobs that should run **mutually exclusively** per upload but end up blocking other unrelated tasks. This flaw can cause inefficiencies in job processing and **resource allocation**.

**Introduced the Idea of Using Buckets**

The approach I suggested is to create a dedicated queue, or "bucket", for each upload, with each bucket managing its own prioritized tasks. This structure enables precise management of tasks such as "nomos" or "copyright" for specific uploads. There is also a plan for a universal queue that handles tasks not linked to any specific upload, such as routine maintenance, so that these operations run without disrupting the individual upload processes.

![bucket](../static/img/bucket.png)
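
One possible way to express the bucket idea in Go is sketched below. Every name in it (`Job`, `Buckets`, `Add`, `Run`, and the convention that upload ID 0 means "not tied to an upload") is an assumption made for illustration. Each bucket is processed by its own goroutine, so jobs for the same upload run one after another while different uploads and the universal queue proceed concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical agent run; UploadID 0 marks work not tied to an upload.
type Job struct {
	UploadID int
	Agent    string
}

// Buckets groups pending jobs per upload, plus one universal queue.
type Buckets struct {
	perUpload map[int][]Job
	universal []Job
}

// NewBuckets returns an empty set of buckets.
func NewBuckets() *Buckets {
	return &Buckets{perUpload: make(map[int][]Job)}
}

// Add places a job in its upload's bucket, or in the universal queue.
func (b *Buckets) Add(j Job) {
	if j.UploadID == 0 {
		b.universal = append(b.universal, j)
		return
	}
	b.perUpload[j.UploadID] = append(b.perUpload[j.UploadID], j)
}

// Run processes every bucket in its own goroutine. Jobs inside one bucket run
// sequentially (mutually exclusive per upload); separate buckets run
// concurrently. It returns the agents completed per bucket, in order.
// exec must be safe to call from several goroutines at once.
func (b *Buckets) Run(exec func(Job)) map[int][]string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		done = make(map[int][]string)
	)
	process := func(id int, jobs []Job) {
		defer wg.Done()
		for _, j := range jobs {
			exec(j)
			mu.Lock()
			done[id] = append(done[id], j.Agent)
			mu.Unlock()
		}
	}
	for id, jobs := range b.perUpload {
		wg.Add(1)
		go process(id, jobs)
	}
	wg.Add(1)
	go process(0, b.universal)
	wg.Wait()
	return done
}

func main() {
	b := NewBuckets()
	b.Add(Job{UploadID: 1, Agent: "nomos"})
	b.Add(Job{UploadID: 1, Agent: "copyright"})
	b.Add(Job{UploadID: 2, Agent: "nomos"})
	b.Add(Job{Agent: "maintenance"}) // goes to the universal queue
	fmt.Println(b.Run(func(Job) {}))
}
```

The sketch keeps plain FIFO order inside each bucket; per-bucket priorities would need a priority queue in place of the slice.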

##### Reference:

> ⚠️ **Disclaimer:** Learned additional details about wfx after the meeting.


###### **[WFX](https://github.com/siemens/wfx)**:
Workflow Executor (wfx) by [Siemens](https://github.com/siemens/) is a versatile, lightweight workflow executor optimized for managing jobs as part of a workflow system. It is designed around a model where workflows are treated as finite-state machines, and it can dynamically manage and progress these workflows in coordination with client systems.

Key features relevant to a scheduler system:

1. State Management: wfx manages workflows as state machines, ensuring that each job progresses through its defined states efficiently.

2. Modularity: Its core is designed to be compact yet scalable, allowing it to handle diverse workflows without becoming cumbersome.

3. Cross-Platform Efficiency: Developed in Go, wfx offers high performance and is compatible with multiple operating systems, making it suitable for various deployment environments.

4. Dynamic Configuration: Workflows can be loaded or unloaded at runtime, supporting dynamic changes to the scheduler without downtime.

5. Persistence Support: Includes built-in support for databases like SQLite, PostgreSQL, and MySQL, which is crucial for maintaining the state and history of workflows over time.
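
To make the finite-state-machine idea concrete, here is a minimal Go sketch of a job that may only move along allowed transitions. It does not use the wfx API; the state names, the transition table, and the `Advance` method are hypothetical and chosen only for illustration. It also shows Go's error-value style of reporting a rejected transition.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical job state; these names are not taken from wfx or FOSSology.
type State string

const (
	Queued    State = "queued"
	Running   State = "running"
	Completed State = "completed"
	Failed    State = "failed"
)

// transitions lists the states reachable from each state.
// Completed has no entry, which makes it terminal.
var transitions = map[State][]State{
	Queued:  {Running},
	Running: {Completed, Failed},
	Failed:  {Queued}, // retry
}

// ErrInvalidTransition is returned when a move is not in the transition table.
var ErrInvalidTransition = errors.New("invalid state transition")

// Job carries its current state.
type Job struct {
	Name  string
	State State
}

// Advance moves the job to next only if the transition table allows it.
func (j *Job) Advance(next State) error {
	for _, allowed := range transitions[j.State] {
		if allowed == next {
			j.State = next
			return nil
		}
	}
	return fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, j.State, next)
}

func main() {
	job := &Job{Name: "nomos", State: Queued}
	for _, next := range []State{Running, Failed, Queued, Running, Completed} {
		if err := job.Advance(next); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(job.Name, "is now", job.State)
	}
	// Completed is terminal, so any further transition is rejected.
	if err := job.Advance(Running); err != nil {
		fmt.Println("error:", err)
	}
}
```

Keeping the allowed transitions in one table makes it easy to audit which job histories are possible, which is the tracking benefit attributed to the state-machine approach.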

### How to Start Working on FOSSology Issue [#2742](https://github.com/fossology/fossology/issues/2742): Incorrect MIME Type Recognition for Text Files?

Issue [#2742](https://github.com/fossology/fossology/issues/2742) in the FOSSology project concerns incorrect MIME type recognition for text files. MIME types are not being recognized or applied accurately, which can affect file handling and other operations within FOSSology.

#### Mentor
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd): Found the cause: the user had selected the option "Ignore SCM files (Git, SVN, TFS) and files with particular Mimetype", which causes these files to be omitted. Conversely, this behaviour is inactive when that option is not enabled.

### [Wfx](https://github.com/siemens/wfx) Project Approach

#### Mentor

- [Kaushlendra Pratap](https://github.com/Kaushl2208): Expressed interest in holding an internal discussion about the [wfx](https://github.com/siemens/wfx) project approach.

### Mentor
- [Anupam Ghosh](https://github.com/ag4ums):
Could you share what key achievements you expect to reach this week?

### Contributor

- [Aaditya Singh](https://github.com/Aaditya-Singh78): Milestones to achieve this week:
1. Rewrite the existing C code in Go.
2. Implement the queue in Go.