Kolmogorov-Arnold Networks (KANs)

This is the GitHub repo for the paper "KAN: Kolmogorov-Arnold Networks". See the Documentation section below for the docs, and the Author's note at the end for a response to the current hype around KANs.

Kolmogorov-Arnold Networks (KANs) are promising alternatives to Multi-Layer Perceptrons (MLPs). KANs have strong mathematical foundations, just like MLPs: MLPs are based on the universal approximation theorem, while KANs are based on the Kolmogorov-Arnold representation theorem. KANs and MLPs are dual: KANs have learnable activation functions on edges, while MLPs have fixed activation functions on nodes. This simple change makes KANs better (sometimes much better!) than MLPs in terms of both model accuracy and interpretability. A quick intro to KANs is available here.
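
The duality between the two layer types can be made concrete with a small sketch. This is an illustration only and does not use pykan's API: the helper names mlp_layer, kan_layer, and kan_multiply are hypothetical, and the edge functions here are fixed closed-form functions, whereas a real KAN learns them (the paper parameterizes them as splines). The example builds a [2, 2, 1] KAN that computes x*y exactly through the identity xy = ((x+y)^2 - (x-y)^2)/4.

```python
import math

def mlp_layer(x, weights, biases, activation=math.tanh):
    """MLP layer: scalar weights on edges, one fixed activation on each output node."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def kan_layer(x, edge_functions):
    """KAN layer: one univariate function on each edge; the nodes only sum."""
    return [sum(phi(xi) for phi, xi in zip(row, x)) for row in edge_functions]

# Hypothetical hand-picked edge functions (a trained KAN would learn these).
# Hidden node 1 computes x + y, hidden node 2 computes x - y.
inner = [[lambda t: t, lambda t: t],
         [lambda t: t, lambda t: -t]]
# The output node sums u1^2 / 4 and -u2^2 / 4.
outer = [[lambda u: u * u / 4, lambda u: -u * u / 4]]

def kan_multiply(x, y):
    """Evaluate the [2, 2, 1] KAN above, which equals x * y."""
    hidden = kan_layer([x, y], inner)
    return kan_layer(hidden, outer)[0]

print(kan_multiply(3.0, 4.0))  # 12.0
```

Note that the nodes in kan_layer do nothing but add; all of the nonlinearity lives on the edges, which is the reverse of mlp_layer.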

Accuracy

KANs have faster neural scaling than MLPs: their loss falls more quickly as the number of parameters grows. As a result, KANs achieve better accuracy than MLPs with fewer parameters.
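
One way to make "faster scaling" concrete is to assume the test loss follows a power law, loss ≈ C * N^(-alpha), where N is the number of parameters; a larger alpha means faster scaling. The sketch below estimates alpha with a least-squares fit in log-log space. The function name fit_scaling_exponent is hypothetical, and the data points are synthetic (generated from an exact power law with alpha = 2), not results from the paper.

```python
import math

def fit_scaling_exponent(param_counts, losses):
    """Estimate alpha in loss ~ C * N**(-alpha) via least squares on log-log data."""
    xs = [math.log(n) for n in param_counts]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance  # the fitted slope is -alpha

# Synthetic data only: an exact power law with C = 3 and alpha = 2.
param_counts = [10, 100, 1000, 10000]
losses = [3.0 * n ** -2.0 for n in param_counts]
alpha = fit_scaling_exponent(param_counts, losses)
```

Fitting the same function to (parameter count, loss) pairs from a KAN sweep and an MLP sweep would give two exponents that can be compared directly.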

Example 1: fitting symbolic formulas

Example 2: fitting special functions

Example 3: PDE solving

Example 4: avoiding catastrophic forgetting

Interpretability

KANs can be intuitively visualized. KANs offer interpretability and interactivity that MLPs cannot provide. We can use KANs to potentially discover new scientific laws.

Example 1: Symbolic formulas

Example 2: Discovering mathematical laws of knots

Example 3: Discovering physical laws of Anderson localization

Example 4: Training of a three-layer KAN

Installation

There are two ways to install pykan: from PyPI or from GitHub.

Installation via GitHub

git clone https://github.com/KindXiaoming/pykan.git
cd pykan
pip install -e .

Installation via PyPI

pip install pykan

Requirements

# python==3.9.7
matplotlib==3.6.2
numpy==1.24.4
scikit_learn==1.1.3
setuptools==65.5.0
sympy==1.11.1
torch==2.2.2
tqdm==4.66.2

To install requirements:

pip install -r requirements.txt
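
After installing, it can be useful to confirm that the installed versions match the pins listed above. The following is a minimal sketch using only the standard library; check_pins is a hypothetical helper and not part of pykan. It assumes each requirement line has the form name==version and skips everything else, including the commented-out python pin.

```python
from importlib import metadata

def check_pins(requirement_lines):
    """Compare 'name==version' pins against the installed package versions."""
    report = {}
    for line in requirement_lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == pinned:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, pinned {pinned}"
    return report

# A few of the pins from the list above; pass the lines of requirements.txt
# to check all of them.
pins = ["# python==3.9.7", "numpy==1.24.4", "torch==2.2.2", "tqdm==4.66.2"]
report = check_pins(pins)
for name, status in report.items():
    print(f"{name}: {status}")
```

A mismatch is not necessarily a problem; the pins record the versions the repo was developed with, and nearby versions may work as well.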

Computation requirements

Examples in the tutorials typically run on a single CPU in less than 10 minutes. All examples in the paper run on a single CPU in less than one day. Training KANs for PDEs is the most expensive and may take hours to days on a single CPU. We train our models on CPUs because we carried out parameter sweeps (for both MLPs and KANs) to obtain Pareto frontiers. These sweeps involve thousands of small models, which is why we use CPUs rather than GPUs. Admittedly, our problem scales are smaller than those of typical machine learning tasks, but they are typical for science-related tasks. If your task is large in scale, it is advisable to use GPUs.
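
To illustrate what a Pareto frontier over a parameter sweep looks like, here is a minimal sketch. The function pareto_frontier is a hypothetical helper, not code from this repo, and the sweep results are made-up numbers rather than measurements from the paper. Each result is a (number of parameters, test loss) pair, and a model is on the frontier when no smaller-or-equal model reaches a lower-or-equal loss.

```python
def pareto_frontier(results):
    """Return the (num_params, loss) pairs that are not dominated.

    A model is kept only if its loss is strictly lower than that of every
    model with fewer or equally many parameters; ties keep one representative.
    """
    frontier = []
    best_loss = float("inf")
    # Sorting by parameter count (then loss) lets a single pass find the frontier.
    for num_params, loss in sorted(results):
        if loss < best_loss:
            frontier.append((num_params, loss))
            best_loss = loss
    return frontier

# Hypothetical sweep results: (number of parameters, test loss).
sweep = [(40, 0.35), (10, 0.5), (20, 0.4), (80, 0.1), (20, 0.3)]
frontier = pareto_frontier(sweep)
print(frontier)  # [(10, 0.5), (20, 0.3), (80, 0.1)]
```

Because each model in such a sweep is small and independent of the others, the runs parallelize naturally across CPU cores.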

Documentation

The documentation can be found here.

Tutorials

Quickstart

Get started with the hellokan.ipynb notebook.

More demos

More notebook tutorials can be found in the tutorials folder.

Citation

@misc{liu2024kan,
      title={KAN: Kolmogorov-Arnold Networks}, 
      author={Ziming Liu and Yixuan Wang and Sachin Vaidya and Fabian Ruehle and James Halverson and Marin Soljačić and Thomas Y. Hou and Max Tegmark},
      year={2024},
      eprint={2404.19756},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contact

If you have any questions, please contact zmliu@mit.edu

Author's note

I would like to thank everyone who's interested in KANs. When I designed KANs and wrote the code, I had math & physics examples (which are quite small scale!) in mind, so I did not put much thought into optimizing efficiency or reusability. I'm honored to receive this unwarranted attention, which is way beyond my expectations. So I accept any criticism from people complaining about the efficiency and reusability of the code, with my apologies. My only hope is that you find model.plot() fun to play with :).

For users who are interested in scientific discoveries and scientific computing (the originally intended users), I'm happy to hear about your applications and to collaborate. This repo will remain mostly for this purpose, probably without significant updates for efficiency. In fact, there are already implementations like efficientkan or fourierkan that look promising for improving efficiency.

For users who are machine learning focused, I have to be honest that KANs are likely not a simple plug-in that can be used out of the box (yet). Hyperparameters need tuning, and more tricks specific to your application should be introduced. For example, GraphKAN suggests that KANs are better used in latent space (with embedding and unembedding linear layers after the inputs and before the outputs). KANRL suggests that some trainable parameters are better kept fixed in reinforcement learning to increase training stability.

The most common question I've been asked lately is whether KANs will be next-gen LLMs. I don't have good intuition about this. KANs are designed for applications where one cares about high accuracy and/or interpretability. We do care about LLM interpretability for sure, but interpretability can mean wildly different things for LLMs and for science. Do we care about high accuracy for LLMs? I don't know; scaling laws seem to imply so, but probably not to very high precision. Accuracy can also mean different things for LLMs and for science. This subtlety makes it hard to directly transfer the conclusions of our paper to LLMs, or to machine learning tasks in general. However, I would be very happy if you have enjoyed the high-level idea (learnable activation functions on edges, or interacting with AI for scientific discoveries), which is not necessarily the future, but can hopefully inspire and impact many possible futures. As a physicist, the message I want to convey is less "KANs are great" and more "try thinking critically about current architectures and seeking fundamentally different alternatives that can do fun and/or useful stuff".

I would like to welcome people to be critical of KANs, but also to be critical of the critiques. Practice is the only criterion for testing understanding (实践是检验真理的唯一标准). We don't know many things beforehand until they are really tried and shown to succeed or fail. As much as I'd like to see success modes of KANs, I'm equally curious about their failure modes, to better understand the boundaries. KANs and MLPs cannot replace each other (as far as I can tell); they each have advantages in some settings and limitations in others. I would be intrigued by a theoretical framework that encompasses both and could even suggest new alternatives (physicists love unified theories, sorry :).
