Getting Started with mlx-lm for Python

1 Introduction to mlx-lm

mlx-lm is a powerful Python library designed for running and fine-tuning large language models (LLMs) on Apple silicon. It is built on top of Apple’s MLX framework, which is optimized for high-performance machine learning on Apple hardware. mlx-lm makes it easy to download, run, and customize a wide range of LLMs.

This guide will walk you through the process of setting up your environment, installing the necessary packages, and running a pre-trained model.

2 Environment Setup

We will use uv to create a virtual environment and manage our dependencies.

Code
# Initialize a new uv project
uv init
Code
# Add the required packages (uv creates and manages the virtual environment)
uv add mlx-lm nbclient nbformat

Let’s verify the Python version we are using.

Code
import sys
print(sys.version)
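
Since everything here depends on MLX running on Apple silicon, it can also be worth checking that mlx-lm imports cleanly and that MLX sees the GPU. A minimal sketch, assuming a recent release (which exposes mlx_lm.__version__ and mx.default_device()):

Code
import mlx_lm
import mlx.core as mx

# Confirm the install and check the default compute device;
# on Apple silicon this should report the GPU
print(mlx_lm.__version__)
print(mx.default_device())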

3 Running a Pre-trained Model

Now that our environment is set up, we can download and run a pre-trained LLM.

3.1 Load the Model and Tokenizer

We will use the load function from mlx_lm to download and load the model and its corresponding tokenizer. In this example, we are using a 4-bit quantized version of the Mistral-7B-Instruct-v0.3 model from the mlx-community Hugging Face organization. The first time you run this, the weights are downloaded from Hugging Face and cached locally; subsequent runs load from the cache.

Code
from mlx_lm import load, generate

# Load the pre-trained model and tokenizer
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
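
Before generating, it can help to see what the tokenizer actually does by round-tripping a short string. This is just an illustrative sketch; mlx-lm wraps the underlying Hugging Face tokenizer, so the familiar encode and decode methods should be available:

Code
# Round-trip a short string to see the token IDs the model consumes
ids = tokenizer.encode("Hello, MLX!")
print(ids)
print(tokenizer.decode(ids))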

3.2 Generate Text

Once the model and tokenizer are loaded, we can use the generate function to produce text from a prompt. Because this is an instruction-tuned model, we first format the prompt with the tokenizer’s chat template so it matches the conversational format the model was trained on.

Code
# Define the prompt
prompt = "Write a story about Einstein"

# Format the prompt with the model's chat template
# (by default this returns token IDs, which generate accepts directly)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# Generate the text
text = generate(model, tokenizer, prompt=formatted_prompt, verbose=True)
text
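
By default, generate caps the response at a fixed number of tokens. You can raise the limit with max_tokens, and mlx-lm also provides stream_generate for printing tokens as they arrive. A hedged sketch; in recent mlx-lm releases each yielded item is a response object with a .text attribute, while older releases yield plain strings:

Code
from mlx_lm import stream_generate

# Stream the response as it is produced, capping it at 256 new tokens;
# in recent mlx-lm releases each yielded item exposes the new text via .text
for response in stream_generate(
    model, tokenizer, prompt=formatted_prompt, max_tokens=256
):
    print(response.text, end="", flush=True)
print()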

4 Conclusion

mlx-lm provides a simple and efficient way to work with large language models on Apple silicon. Because it is built on the MLX framework, it benefits from MLX’s unified-memory model and GPU acceleration on Apple hardware, making it a strong choice for developers and researchers who want to run LLMs locally on their Apple devices.
