Getting Started with mlx-lm for Python

1 Introduction to mlx-lm

mlx-lm is a powerful Python library designed for running and fine-tuning large language models (LLMs) on Apple silicon. It is built on top of Apple’s MLX framework, which is optimized for high-performance machine learning on Apple hardware. mlx-lm makes it easy to download, run, and customize a wide range of LLMs.

This guide will walk you through the process of setting up your environment, installing the necessary packages, and running a pre-trained model.

2 Environment Setup

We will use uv to create a virtual environment and manage our dependencies.

Code
# Initialize a new uv project
uv init
Code
# Add the required packages (uv creates and manages the virtual environment)
uv add mlx-lm nbclient nbformat

Let’s verify the Python version we are using.

Code
import sys
print(sys.version)
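
Since everything here depends on MLX running on Apple silicon, it can also be worth checking that mlx-lm imports cleanly and that MLX sees the GPU. A minimal sketch, assuming a recent release (which exposes mlx_lm.__version__ and mx.default_device()):

Code
import mlx_lm
import mlx.core as mx

# Confirm the install and check the default compute device;
# on Apple silicon this should report the GPU
print(mlx_lm.__version__)
print(mx.default_device())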

3 Running a Pre-trained Model

Now that our environment is set up, we can download and run a pre-trained LLM.

3.1 Load the Model and Tokenizer

We will use the load function from mlx_lm to download and load the model and its corresponding tokenizer. In this example, we are using a 4-bit quantized version of the Mistral-7B-Instruct-v0.3 model from the mlx-community Hugging Face organization. The first time you run this, the weights are downloaded from Hugging Face and cached locally; subsequent runs load from the cache.

Code
from mlx_lm import load, generate

# Load the pre-trained model and tokenizer
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
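
Before generating, it can help to see what the tokenizer actually does by round-tripping a short string. This is just an illustrative sketch; mlx-lm wraps the underlying Hugging Face tokenizer, so the familiar encode and decode methods should be available:

Code
# Round-trip a short string to see the token IDs the model consumes
ids = tokenizer.encode("Hello, MLX!")
print(ids)
print(tokenizer.decode(ids))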

3.2 Generate Text

Once the model and tokenizer are loaded, we can use the generate function to produce text from a prompt. Because this is an instruction-tuned model, we first format the prompt with the tokenizer’s chat template so it matches the conversational format the model was trained on.

Code
# Define the prompt
prompt = "Write a story about Einstein"

# Format the prompt with the model's chat template
# (by default this returns token IDs, which generate accepts directly)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# Generate the text
text = generate(model, tokenizer, prompt=formatted_prompt, verbose=True)
text
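
By default, generate caps the response at a fixed number of tokens. You can raise the limit with max_tokens, and mlx-lm also provides stream_generate for printing tokens as they arrive. A hedged sketch; in recent mlx-lm releases each yielded item is a response object with a .text attribute, while older releases yield plain strings:

Code
from mlx_lm import stream_generate

# Stream the response as it is produced, capping it at 256 new tokens;
# in recent mlx-lm releases each yielded item exposes the new text via .text
for response in stream_generate(
    model, tokenizer, prompt=formatted_prompt, max_tokens=256
):
    print(response.text, end="", flush=True)
print()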

4 Conclusion

mlx-lm provides a simple and efficient way to work with large language models on Apple silicon. Because it is built on the MLX framework, it benefits from MLX’s unified-memory model and GPU acceleration on Apple hardware, making it a strong choice for developers and researchers who want to run LLMs locally on their Apple devices.
