An npm CLI app that converts complex project directory structures and files into a single flat file (or set of flat files), optimized for processing with In-Context Learning (ICL) prompts; example prompts are included.

samestrin/llm-prepare

llm-prepare

Version 1.0.18 License: MIT Built with Node.js

llm-prepare converts complex project directory structures and files into a single flat file or set of flat files, facilitating processing using In-Context Learning (ICL) prompts.

This Node.js tool recursively scans a project directory based on provided arguments (at least a directory). It generates a simplified layout view that includes all directories and matching files. Additionally, it combines this layout view with the aggregated text content of the entire project. By default, the aggregated file content is stripped of comments and unnecessary whitespace. The tool supports output compression to reduce token use and can handle large projects by chunking the output. Example prompts are included for guidance.
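The whitespace reduction mentioned above can be illustrated with a minimal sketch. It is not the tool's actual code: compressWhitespace is a hypothetical helper, and the rules it applies (strip trailing whitespace, collapse runs of blank lines) are assumptions about what "unnecessary whitespace" means. Comment removal is deliberately left out because doing it safely depends on the language being processed.

```javascript
// Hypothetical helper; not part of llm-prepare's published interface.
function compressWhitespace(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+$/, "")) // drop trailing spaces and tabs
    .join("\n")
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines into a single blank line
    .trim(); // remove leading and trailing blank space from the whole text
}

const sample = "function add(a, b) {   \n\n\n\n  return a + b;\t\n}\n\n";
console.log(compressWhitespace(sample));
```

Indentation is left untouched here so that whitespace-sensitive languages such as Python stay valid.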

What is In-Context Learning (ICL)

In-Context Learning (ICL) enables a model to perform tasks by interpreting context provided within a prompt, eliminating the need for additional training or fine-tuning.

Learn more about In-Context Learning (ICL)

v0.1.8

  • Configurable via JSON: Use the --config option to load a JSON configuration file containing pre-defined arguments and paths to include.

Features

  • Layout View: Provides an ASCII layout view of your project.
  • Directory Traversal: Recursively scan through the project directory.
  • Custom File Filtering: Include files based on specified patterns.
  • Ignore Support: Automatically respects .ignore files to exclude specific files or directories.
  • Output Consolidation: Generates a single flat file containing a consolidated view of file contents and directory structure.
  • Multifile Output: Generates multiple flat files from a consolidated view of file contents and directory structure based on a provided chunk size.
  • Optionally Remove Layout View: Optionally remove the layout view from the output.
  • Optionally Include Comments: Optionally include comments in the output.
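As an illustration of the Layout View feature, the sketch below turns a list of relative file paths into an ASCII tree. It is only an approximation under stated assumptions: buildTree and renderTree are hypothetical names, and the connector characters and sort order are choices made for this example, so the tool's real output may look different.

```javascript
// Hypothetical helpers; the layout format produced by llm-prepare may differ.
function buildTree(paths) {
  const root = new Map();
  for (const p of paths) {
    let node = root;
    // Walk the path one segment at a time, creating nodes as needed.
    for (const part of p.split("/").filter(Boolean)) {
      if (!node.has(part)) node.set(part, new Map());
      node = node.get(part);
    }
  }
  return root;
}

function renderTree(node, prefix = "") {
  const names = [...node.keys()].sort();
  const lines = [];
  names.forEach((name, index) => {
    const isLast = index === names.length - 1;
    lines.push(prefix + (isLast ? "`-- " : "|-- ") + name);
    // Children of the last entry get blank padding; others keep the vertical bar.
    lines.push(...renderTree(node.get(name), prefix + (isLast ? "    " : "|   ")));
  });
  return lines;
}

const layout = renderTree(buildTree(["src/index.js", "src/lib/scan.js", "README.md"]));
console.log(layout.join("\n"));
```

A Map is used instead of a plain object so that file names such as "constructor" cannot collide with built-in properties.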

Example Prompts

  • Code Review: Interactive code review with a simulated senior software engineer.
  • Generate MySQL Create Table: Generate a MySQL CREATE TABLE statement based on your provided CSV content.
  • Question and Answer: Interactive question and answer session powered by your project code.
  • Readme Generation: A simulated senior technical writer generates a README.md based on your project code.
  • Simple Add Comments: A set of simple prompts that generate comments based on your project code (C#, JavaScript, PHP, Python, Ruby, Rust, and TypeScript).
  • Technical Document Generation: A simulated senior technical writer generates technical documentation based on your project code.
  • Test Generation: Interactive test generation with a simulated senior software engineer and simulated QA.
Plus many more (including new CSV-oriented prompts). All example prompts have been tested with ChatGPT GPT-4.

Dependencies

  • Node.js: The script runs in a Node.js environment.
  • fs-extra: An extension of the standard Node.js fs module, providing additional methods and promise support.
  • ignore: Used to handle .ignore files similar to .gitignore.
  • istextorbinary: Determines whether a given file contains text or binary data.
  • open: Opens URLs in your default browser.
  • yargs: Helps in building interactive command line tools, by parsing arguments and generating an elegant user interface.
  • yargs/helpers: Provides utility methods for yargs.

Installing Node.js

Before installing, ensure you have Node.js and npm (Node Package Manager) installed on your system. You can download and install Node.js from the official Node.js website.

Installing llm-prepare

To install and use llm-prepare, follow these steps:

Clone the Repository: Begin by cloning the llm-prepare repository to your local machine.

git clone https://github.com/samestrin/llm-prepare/

Navigate to the root directory of the cloned repository and run:

npm install

To make llm-prepare available from any location on your system, you need to install it globally. You can do this using npm.

Run the following command in the same directory:

npm link

This will create a global symlink to your script. Now, you can run the script using llm-prepare from anywhere in your terminal.

Platform-Specific Installation Instructions

macOS and Linux

The provided installation steps should work as-is for both macOS and Linux platforms.

Windows

For Windows, ensure that Node.js is added to your PATH during the installation. The npm link command should also work in Windows PowerShell or Command Prompt, allowing you to run the script globally.

Usage

To run the script, you need to provide one mandatory argument: the path to the project directory (--path or -p).

Common Usage Examples:

Basic Usage:

This will process all files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to your console. The file pattern defaults to "*".

llm-prepare -p "/path/to/project"

or

llm-prepare --path "/path/to/project"

Specify a File Pattern:

This will process all JavaScript files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to your console.

llm-prepare -p "/path/to/project" -f "*.js"

Specify an Output Filename:

This will process all files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to output.txt.

llm-prepare -p "/path/to/project" -o "output.txt"

If you don't specify a filename, this will process all files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to project.txt. The filename is auto-generated from the top-level directory in the --path argument.

llm-prepare -p "/path/to/project" -o

You may optionally set the LLM_PREPARE_OUTPUT_DIR environment variable. If the LLM_PREPARE_OUTPUT_DIR environment variable is set, the output files are written to that directory.
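The filename and output directory rules described above can be sketched as follows. This is a minimal illustration rather than the tool's actual code: resolveOutputPath is a hypothetical helper, and the assumption that the directory from LLM_PREPARE_OUTPUT_DIR is simply joined with the filename is made for this example.

```javascript
const path = require("node:path");

// Hypothetical helper; the way llm-prepare combines these values is an assumption.
function resolveOutputPath(projectPath, outputFilename, env = process.env) {
  // With a bare -o, fall back to "<top-level directory>.txt".
  const filename = outputFilename || `${path.basename(path.resolve(projectPath))}.txt`;
  const outputDir = env.LLM_PREPARE_OUTPUT_DIR;
  // Write into the configured directory when the variable is set.
  return outputDir ? path.join(outputDir, filename) : filename;
}

console.log(resolveOutputPath("/path/to/project", undefined, {}));
console.log(resolveOutputPath("/path/to/project", "output.txt", { LLM_PREPARE_OUTPUT_DIR: "/tmp/out" }));
```

Passing the environment as a parameter keeps the function easy to test without changing real environment variables.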

Specify Custom Ignore Rules (Through the command line):

If you don't want to include specific files or directories, you can specify the rules using --custom-ignore-string.

llm-prepare -p "/path/to/project" -o --custom-ignore-string "*docs*,*test*"
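To show how a comma-separated ignore string like the one above might be interpreted, here is a simplified sketch. It is an assumption-laden illustration: parseIgnoreString and isIgnored are hypothetical helpers that treat "*" as "any run of characters", whereas the tool relies on the ignore package, whose .gitignore-style matching rules are richer and may give different results.

```javascript
// Hypothetical helpers; real .gitignore-style matching has more rules than this.
function parseIgnoreString(ignoreString) {
  return ignoreString
    .split(",")
    .map((pattern) => pattern.trim())
    .filter(Boolean)
    .map((pattern) => {
      // Escape regex metacharacters, then turn each "*" into ".*".
      const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
      return new RegExp(`^${escaped}$`);
    });
}

function isIgnored(relativePath, matchers) {
  return matchers.some((matcher) => matcher.test(relativePath));
}

const matchers = parseIgnoreString("*docs*,*test*");
console.log(isIgnored("docs/guide.md", matchers)); // true
console.log(isIgnored("src/index.js", matchers)); // false
```

Note that a broad pattern such as "*test*" also matches names like src/latest.js, so narrower patterns are usually safer.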

Specify Custom Ignore Rules (Using a file):

If you don't want to include specific files or directories, you can specify the rules in an external file and pass it with --custom-ignore-filename. Use .gitignore file formatting.

llm-prepare -p "/path/to/project" -o --custom-ignore-filename "/path/to/.ignorefile"

Using a Configuration File:

You can use a JSON configuration file to predefine the arguments and paths to include in the processing.

Example config.json file:

{
  "args": {
    "output-filename": "output.txt",
    "compress": true
  },
  "include": ["./src/", "./lib/"]
}

To run the script with a config file:

llm-prepare --config "config.json"
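One way to think about a configuration file like the example above is as a set of default arguments plus a list of paths. The sketch below illustrates that idea under stated assumptions: mergeConfig is a hypothetical helper, and the rule that values given on the command line override values from the file is an assumption, not documented behavior.

```javascript
// Hypothetical helper; the precedence of command-line values is an assumption.
function mergeConfig(configText, cliArgs = {}) {
  const config = JSON.parse(configText);
  return {
    // Later spreads win, so command-line values replace config values.
    args: { ...(config.args || {}), ...cliArgs },
    include: Array.isArray(config.include) ? config.include : [],
  };
}

const configText = JSON.stringify({
  args: { "output-filename": "output.txt", compress: true },
  include: ["./src/", "./lib/"],
});

const merged = mergeConfig(configText, { "output-filename": "custom.txt" });
console.log(merged.args);
console.log(merged.include);
```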

Options

      --help                    Show help                              [boolean]
  -p, --path                    Path to the project directory[string] [required]
  -f, --file-pattern            Pattern of files to include, e.g., '\.js$' or
                                '*' for all files        [string] [default: "*"]
  -o, --output-filename         Output filename                         [string]
  -i, --include-comments        Include comments? (Default: false)     [boolean]
  -c, --compress                Compress? (Default: false)             [boolean]
      --chunk-size              Maximum size (in kilobytes) of each file[number]
  -s, --suppress-layout         Suppress layout in output (Default: false)
                                                                       [boolean]
      --default-ignore          Use a custom default ignore file        [string]
      --ignore-gitignore        Ignore .gitignore file in the root of the
                                project directory                      [boolean]
      --show-default-ignore     Show default ignore file               [boolean]
      --show-prompts            Show example prompts in your browser   [boolean]
      --custom-ignore-string    Comma-separated list of ignore patterns [string]
      --custom-ignore-filename  Path to a file containing ignore patterns
                                                                        [string]
      --config                  Path to the config file                 [string]
  -v, --version                 Display the version number             [boolean]
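The --chunk-size option above takes a maximum size in kilobytes for each output file. A minimal sketch of size-based chunking follows. It is not the tool's implementation: chunkBySize is a hypothetical helper, and both the line-boundary splitting and the 1024-bytes-per-kilobyte conversion are assumptions made for the example.

```javascript
// Hypothetical helper; splitting on line boundaries is an assumption.
function chunkBySize(text, chunkSizeKb) {
  const maxBytes = chunkSizeKb * 1024;
  const chunks = [];
  let lines = [];
  let bytes = 0;
  for (const line of text.split("\n")) {
    const lineBytes = Buffer.byteLength(line, "utf8") + 1; // +1 for the newline
    // Start a new chunk when adding this line would exceed the limit.
    if (lines.length > 0 && bytes + lineBytes > maxBytes) {
      chunks.push(lines.join("\n"));
      lines = [];
      bytes = 0;
    }
    lines.push(line); // a single oversized line is kept whole in its own chunk
    bytes += lineBytes;
  }
  if (lines.length > 0) chunks.push(lines.join("\n"));
  return chunks;
}

const longText = Array.from({ length: 100 }, () => "x".repeat(63)).join("\n");
const chunks = chunkBySize(longText, 1);
console.log(chunks.length); // 7
```

Measuring bytes rather than characters keeps the limit meaningful for files that contain multi-byte UTF-8 characters.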

In-Context Learning (ICL)

"While finetuning with full datasets is still a powerful option if the data vastly exceeds the context length, our results suggest that long-context ICL is an effective alternative – trading finetuning-time cost for increased inference-time compute. As the effectiveness and efficiency of using very long model context lengths continues to increase, we believe long-context ICL will be a powerful tool for many tasks."

In-Context Learning (ICL) allows a Large Language Model (LLM) to perform tasks by interpreting the context provided within the prompt without additional training or fine-tuning. This approach differs significantly from previous methods where models were explicitly trained on a specific task using vast datasets. Instead, ICL leverages the model's pre-trained knowledge base—a comprehensive understanding accumulated during its initial extensive training phase.

As the context window (the number of tokens an LLM can process and generate in a single request) has grown dramatically, the value of ICL has become even more significant. Larger context windows allow LLMs to handle longer and more complex inputs and outputs, which enhances their ability to understand and generate sophisticated text.
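Before pasting a generated file into a prompt, it can help to check roughly whether it will fit. The sketch below is a heuristic only: estimateTokens and fitsInContext are hypothetical helpers, and the figure of about four characters per token is a common rule of thumb for English text, not an exact measure. Real tokenizers vary by model and by content, and source code often tokenizes less efficiently than prose.

```javascript
// Rough heuristic only; use the model's own tokenizer when accuracy matters.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// Checks the estimate against a context window, leaving room for the reply.
function fitsInContext(text, contextWindowTokens, reservedForOutput = 0) {
  return estimateTokens(text) + reservedForOutput <= contextWindowTokens;
}

const flattened = "x".repeat(4000);
console.log(estimateTokens(flattened)); // 1000
console.log(fitsInContext(flattened, 8000, 1000)); // true
```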

In-Context Learning (ICL) Prompts

In-Context Learning (ICL) prompts guide a large language model (LLM) in performing tasks by providing relevant context within the input prompt. These prompts typically include examples, instructions, or patterns that help the model understand how to generate appropriate responses. Here are some characteristics of ICL prompts:

Characteristics of ICL Prompts:

  • Contextual Examples: Provide examples within the prompt to demonstrate the desired output format.
  • Detailed Instructions: Offer clear and detailed instructions on how to process the input and generate the output.
  • Patterns and Templates: Use patterns or templates that the model can follow to ensure consistency in the responses.
  • Relevant Data: Include any necessary data or background information that helps the model understand the context of the task.
  • Incremental Guidance: Sometimes, step-by-step guidance is included to lead the model through complex tasks.
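The characteristics listed above can be combined into a single prompt. The following sketch shows one possible assembly and is purely illustrative: buildIclPrompt is a hypothetical helper, and the section labels and their order are assumptions rather than a format required by any model or by the prompts shipped with this project.

```javascript
// Hypothetical helper; the section labels are illustrative, not a required format.
function buildIclPrompt({ instructions, examples, context, task }) {
  // Contextual examples demonstrate the desired output format.
  const exampleText = examples
    .map((ex, i) => `Example ${i + 1}\nInput: ${ex.input}\nOutput: ${ex.output}`)
    .join("\n\n");
  return [
    `Instructions:\n${instructions}`,
    `Examples:\n${exampleText}`,
    `Project context:\n${context}`, // relevant data, such as a flattened project file
    `Task:\n${task}`,
  ].join("\n\n");
}

const prompt = buildIclPrompt({
  instructions: "Add a one-line comment above each function.",
  examples: [
    {
      input: "function add(a, b) { return a + b; }",
      output: "// Adds two numbers.\nfunction add(a, b) { return a + b; }",
    },
  ],
  context: "<contents of the flat file generated for your project>",
  task: "Comment every function in the project context.",
});
console.log(prompt);
```

Keeping the instructions first and the task last is one common arrangement; other orders may work better for a given model.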

Contribute

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.
