First principles datasets #181

gAldeia · 2024-09-03T21:58:25Z

Data comes from two symbolic regression repos:

- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

Each dataset has a first-principles equation that was originally derived from data. The respective papers use these datasets to show that symbolic regression has the potential to retrieve the original equation when only observational data is available.

While some of them have just a few samples and others are synthetically generated, they are challenging for symbolic regression methods and can be used to evaluate these algorithms.
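As a rough illustration of what evaluating an algorithm on this kind of data can look like, here is a minimal sketch that scores candidate expressions against noisy observations using R². The ground-truth law, the noise level, and the candidate expressions are all made up for the example; none of them come from the PySR or MvSR datasets.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination between observed and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ground_truth(x):
    # Hypothetical first-principles law, used only for illustration.
    return 2.0 * x ** 1.5


# Small synthetic sample with 1% multiplicative noise (assumed noise model).
rng = random.Random(0)
xs = [rng.uniform(0.5, 5.0) for _ in range(30)]
ys = [ground_truth(x) * (1.0 + rng.gauss(0.0, 0.01)) for x in xs]

# Candidate expressions, as a symbolic regression run might return them.
candidates = {
    "2*x**1.5": lambda x: 2.0 * x ** 1.5,
    "4.5*x - 3": lambda x: 4.5 * x - 3.0,
}

# Score each candidate on the observations and keep the best one.
scores = {
    name: r2_score(ys, [f(x) for x in xs]) for name, f in candidates.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R2 = {score:.4f}")
print("best candidate:", best)
```

A high R² alone does not prove the original equation was recovered, so in practice the best expression would also be compared symbolically against the known equation.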

The idea of adding them to PMLB is to help other users quickly set up experiments with the data.

I still need to write proper metadata for them. My understanding is that opening a PR will trigger a GitHub Action that pushes some new files to my fork, which I should complete before the new datasets go to review. Please let me know if there is anything I got wrong and need to update!


gAldeia and others added 3 commits September 3, 2024 18:48

Re-generated broken datasets

f23672c

CI was failing to parse the contents of these specific ones.

update dataset files

42b29f7

Created by https://github.com/gAldeia/pmlb/actions/runs/11616806556 from f23672c on 2024-10-31
