LASSO - an Observatorium for the Dynamic Selection, Analysis and Comparison of Software.
LASSO's platform enables scalable software code analysis and observation of big code. It provides mass analysis of software code, combining dynamic and static program analysis techniques to observe functional (i.e., behavioural) and non-functional properties of software code. It is primarily used to conduct active research in software engineering (contact us), but can also be used by practitioners.
Based on these capabilities, LASSO can be used to realize reusable code analysis services using a dedicated pipeline language, LSL (LASSO Scripting Language). This includes services such as:
- code search and retrieval (interface-driven code search, test-driven code search)
- N-version assessment based on alternative implementations either retrieved via code search or generated using generative AI
- automated (unit) test generation
- test-driven software experimentation as a service
- benchmarking of tools/techniques
- ...
LASSO's core building blocks consist of several well-defined concepts and data structures:
- Sequence Sheet Notation (SSN) - for representing tests (sequences)
- Stimulus Response Matrices (SRM) - for creating input configurations of system and test pairs, and for storing arbitrary observations (inputs/outputs) once executed by the special test driver for mass execution of code called the arena
- Stimulus Response Hypercubes - for enabling offline analysis of runtime observations stored in SRMs using popular data analytics tools
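To make the idea of a stimulus response matrix more concrete, here is a minimal sketch in plain Python. It is not LASSO's implementation or API: the function names (`build_srm`, `group_by_behaviour`), the dictionary layout, and the three toy `gcd_*` candidates are all assumptions made for illustration. The sketch runs every stimulus (test input) against every candidate system, records each response as an observation, and then groups systems whose response columns are identical.

```python
from collections import defaultdict

# Hypothetical candidate implementations of the same interface (illustration only).
def gcd_euclid(a, b):
    while b:
        a, b = b, a % b
    return abs(a)

def gcd_recursive(a, b):
    return abs(a) if b == 0 else gcd_recursive(b, a % b)

def gcd_buggy(a, b):
    # Deliberately wrong: simply returns the smaller of the two arguments.
    return min(abs(a), abs(b))

SYSTEMS = {
    "gcd_euclid": gcd_euclid,
    "gcd_recursive": gcd_recursive,
    "gcd_buggy": gcd_buggy,
}
STIMULI = [(12, 18), (7, 0), (9, 3)]

def build_srm(stimuli, systems):
    """Run every stimulus against every system and record the response.

    Returns a dict mapping (stimulus_index, system_name) -> observed response.
    Exceptions are stored as observations instead of aborting the run.
    """
    srm = {}
    for i, args in enumerate(stimuli):
        for name, fn in systems.items():
            try:
                srm[(i, name)] = fn(*args)
            except Exception as exc:
                srm[(i, name)] = type(exc).__name__
    return srm

def group_by_behaviour(srm, n_stimuli, system_names):
    """Group systems whose response columns are identical."""
    groups = defaultdict(list)
    for name in system_names:
        column = tuple(srm[(i, name)] for i in range(n_stimuli))
        groups[column].append(name)
    return dict(groups)

if __name__ == "__main__":
    srm = build_srm(STIMULI, SYSTEMS)
    for column, names in group_by_behaviour(srm, len(STIMULI), SYSTEMS).items():
        print(column, names)
```

In this toy run the two correct candidates end up in one group and the faulty one in another, which is the kind of purely data-driven comparison the matrix is meant to support.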
The platform is realized in Java using Spring Boot (https://spring.io/projects/spring-boot), while its architecture is realized on top of Apache Ignite (https://ignite.apache.org/). The platform's architecture, therefore, is distributed by design. It follows the manager/worker architecture style. The platform can be accessed via its website (RESTful API) and a webapp GUI.
There are two ways to get started:
- Get started with our quickstart guide (see quickstart.md)
- Read details about LASSO's core concepts, data structures and platform in recent publications (further down)
Read our quickstart.md guide to get started with the LASSO platform.
The scripts of the tool demo can be found in script_examples/benchmarking_llms.
A preprint of the tool demo paper is available on arXiv.
The (Groovy) DSL command documentation can be found in dsl, and examples of the DSL commands can be found in LSLLanguageSystemTest.groovy.
Examples of the language are provided in systemtests.
A comprehensive description of LASSO and its core concepts and data structures is provided in
@phdthesis{madoc64107,
title = {LASSO - an observatorium for the dynamic selection, analysis and comparison of software},
year = {2023},
author = {Marcus Kessel},
address = {Mannheim},
language = {Englisch},
abstract = {Mining software repositories at the scale of 'big code' (i.e., big data) is a challenging activity. As well as finding a suitable software corpus and making it programmatically accessible through an index or database, researchers and practitioners have to establish an efficient analysis infrastructure and precisely define the metrics and data extraction approaches to be applied. Moreover, for analysis results to be generalisable, these tasks have to be applied at a large enough scale to have statistical significance, and if they are to be repeatable, the artefacts need to be carefully maintained and curated over time. Today, however, a lot of this work is still performed by human beings on a case-by-case basis, with the level of effort involved often having a significant negative impact on the generalisability and repeatability of studies, and thus on their overall scientific value.
The general purpose, 'code mining' repositories and infrastructures that have emerged in recent years represent a significant step forward because they automate many software mining tasks at an ultra-large scale and allow researchers and practitioners to focus on defining the questions they would like to explore at an abstract level. However, they are currently limited to static analysis and data extraction techniques, and thus cannot support (i.e., help automate) any studies which involve the execution of software systems. This includes experimental validations of techniques and tools that hypothesise about the behaviour (i.e., semantics) of software, or data analysis and extraction techniques that aim to measure dynamic properties of software.
In this thesis a platform called LASSO (Large-Scale Software Observatorium) is introduced that overcomes this limitation by automating the collection of dynamic (i.e., execution-based) information about software alongside static information. It features a single, ultra-large scale corpus of executable software systems created by amalgamating existing Open Source software repositories and a dedicated DSL for defining abstract selection and analysis pipelines. Its key innovations are integrated capabilities for searching for and selecting software systems based on their exhibited behaviour and an 'arena' that allows their responses to software tests to be compared in a purely data-driven way. We call the platform a 'software observatorium' since it is a place where the behaviour of large numbers of software systems can be observed, analysed and compared.},
url = {https://madoc.bib.uni-mannheim.de/64107/}
}
See publications.md for more on LASSO and the platform.
See pipelines.md for a few example LSL pipelines.
A core design principle of the LASSO platform is to conduct extensive software analytics in external, popular data analytics tools. The platform, therefore, stores tracing data and reports in its distributed database using tabular representations.
Use our jupyterlab playground to explore and manipulate SRMs with Python pandas (https://pandas.pydata.org/).
See analytics.md for how resulting SRMs can be analyzed in Python and R.
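As a rough illustration of what offline analysis of tabular observations can look like, the following sketch uses only the Python standard library (the playground itself uses pandas; the standard library is used here just to keep the example dependency-free). The CSV column names (`stimulus`, `system`, `response`), the sample rows, and the helper functions are assumptions for this example and do not reflect LASSO's actual export schema. The sketch pivots long-format rows into a matrix and computes, per system, how often its response matches the majority response.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical long-format export: one row per (stimulus, system) observation.
# The schema and the values are made up for this sketch.
EXPORT = """stimulus,system,response
push_pop,StackA,3
push_pop,StackB,3
push_pop,StackC,3
pop_empty,StackA,EmptyStackException
pop_empty,StackB,null
pop_empty,StackC,EmptyStackException
size_after_push,StackA,1
size_after_push,StackB,1
size_after_push,StackC,2
"""

def load_observations(text):
    """Parse the CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def pivot(rows):
    """Turn long-format rows into a matrix: stimulus -> {system: response}."""
    matrix = defaultdict(dict)
    for row in rows:
        matrix[row["stimulus"]][row["system"]] = row["response"]
    return dict(matrix)

def agreement_with_majority(matrix):
    """Fraction of stimuli on which each system matches the majority response.

    Ties are broken in favour of the response encountered first.
    """
    agree = Counter()
    systems = set()
    for responses in matrix.values():
        systems.update(responses)
        majority, _ = Counter(responses.values()).most_common(1)[0]
        for system, response in responses.items():
            if response == majority:
                agree[system] += 1
    total = len(matrix)
    return {system: agree[system] / total for system in sorted(systems)}

if __name__ == "__main__":
    scores = agreement_with_majority(pivot(load_observations(EXPORT)))
    for system, score in scores.items():
        print(f"{system}: {score:.2f}")
```

With pandas, the pivot step would typically be a single `DataFrame.pivot` call; the majority-agreement score is only one example of a metric that can be derived from such a table.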
The platform is designed to scale software code analysis and observation for big code.
See distributed.md for instructions on how to set up a LASSO cluster.
See development.md for developer details (extension points, system tests etc.).
See known_issues.md for known issues, and security.md for security concerns.
Note: For up-to-date information, have a look at the modules here
- pom.xml - parent POM of all modules (sets global properties and versions)
- evosuite-maven-plugin - customized version of EvoSuite's Maven plugin
- randoop-maven-plugin - customized version of Randoop's Maven plugin
- lasso-maven-extension - LASSO's Maven Spy for Maven-based test drivers (reports events etc.)
- ranking - Ranking module that offers preference-based ranking of software components based on multiple objectives
- crawler - LASSO's Maven Artifact Crawler
- analyzer - LASSO's Maven Artifact Analyzer (index creation etc.)
- index-maven-plugin - LASSO's plugin to index any (built) Maven-managed projects
- core - Core module shared with other LASSO modules (mainly contains common data models, interfaces etc.)
- lql - LASSO's Query Language (describing interfaces and method signatures)
- testing-harness - test generators, test suite minimization etc.
- datasource-maven - Query layer for LASSO's executable corpus
- sandbox - LASSO's sandbox execution environment based on Docker containerization
- lsl - LASSO's domain language (pipeline language)
- benchmarks - Temporary module that integrates various (LLM) benchmarks
- gai - contains interfaces for generative AI (e.g., OpenAI API etc.)
- engine - LASSO's workflow engine and actions API
- arena-support - Arena support module (shared classes)
- arena - LASSO Arena module (arena test driver)
- worker - LASSO's cluster worker application (web-based using spring-boot)
- webui - LASSO's next-generation web application based on Angular 16 and Material
- notebooks - Jupyterlite distribution to analyze SRMs in browsers as part of the webui
- service - LASSO's cluster service and manager (web-based using spring-boot)
- lasso-llm - Companion module to benchmarks (facilities to integrate generated code obtained by MultiPL-E experiment)
Any contributions, including tool/technique integrations, pipeline scripts, improvements etc. are welcome!
To reach out to us, contact Marcus Kessel from the Software Engineering Group @ University of Mannheim (https://www.wim.uni-mannheim.de/atkinson/).
LASSO - an Observatorium for the Dynamic Selection, Analysis and Comparison of Software
Copyright (C) 2024 Marcus Kessel (University of Mannheim) and LASSO contributors
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
A few integrations in LASSO required minor code modifications. The original code is released under the following licenses:
- randoop (https://github.com/randoop/randoop) - MIT
- EvoSuite (https://github.com/EvoSuite/evosuite) - LGPL-3.0
- Maven plugin for EvoSuite (https://github.com/EvoSuite/evosuite) - LGPL-3.0
- Maven plugin for randoop (https://github.com/zaplatynski/randoop-maven-plugin) - MIT
- JaCoCo Code Coverage (https://github.com/jacoco/jacoco) - EPL-2.0