Run AI Models on a Local Machine

AI
R
Python
Author

Tony D

Published

March 18, 2025

Running LLM models on a local machine with Ollama, Hugging Face, and more

Ollama

Download and install the Ollama app

https://ollama.com/download

and open the app on your computer.
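
Once the app is open, Ollama serves a local API at http://localhost:11434 by default; a quick standard-library Python check (a minimal sketch) confirms the server is reachable before going further:

Code
from urllib.request import urlopen

# a running Ollama server answers its root URL with a short status message
print(urlopen("http://localhost:11434").read().decode())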

Run an LLM model with Ollama

Run in R

download the package and check the connection

Code
install.packages("ollamar")
Code
library(ollamar)
test_connection() 

download model

Code
ollamar::pull("llama3.1")

list downloaded models

Code
list_models()

show model details

Code
# show metadata for a downloaded model
ollamar::show("llama3.1")

run model

Code
resp <- generate("llama3.1", "tell me a 5-word story")
resp
Code
# get just the text from the response object
resp_process(resp, "text")
Code
# get the text as a tibble dataframe
resp_process(resp, "df")

Run in terminal

download model

Code
!ollama pull llama3.1

run model

Code
!ollama run llama3.1 "tell me a 5-word story"

Run in Python

Code
!pip install ollama
Code
from ollama import chat
from ollama import ChatResponse
Code
import ollama

download model

Code
ollama.pull("llama3.1")

show downloaded models

Code
ollama.list()

Run model

Code
ollama.chat(model='llama3.1', messages=[{'role': 'user', 'content': 'who are you?'}])
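
For longer replies, the same call can stream tokens as they are generated instead of returning one finished message; stream=True makes the client yield partial responses:

Code
# stream the reply chunk by chunk rather than waiting for the whole message
stream = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'who are you?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)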

create a model with a system prompt

Code
ollama.create(model='Mario', from_='llama3.1', system="You are Mario from Super Mario Bros.")
Code
ollama.chat(model='Mario', messages=[{'role': 'user', 'content': 'who are you?'}])

delete model

Code
# delete the custom model created above
status = ollama.delete('Mario')
status

Hugging Face

Using the MLX-format DeepSeek-R1-4bit as an example: https://huggingface.co/mlx-community/DeepSeek-R1-4bit

Using Python 3.11

Code
# point reticulate at the local Python 3.11 installation
Sys.setenv(RETICULATE_PYTHON = "/Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11")
library(reticulate)
use_python("/Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11")
Code
from platform import python_version
print(python_version())
Code
!pip3.11 install mlx-lm
Code
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-R1-4bit")

prompt = "hello"

# wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
response
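
mlx-lm also installs a command-line entry point, so the same model can be tried without writing any Python; a sketch assuming the mlx_lm.generate script and its --model / --prompt flags:

Code
!mlx_lm.generate --model mlx-community/DeepSeek-R1-4bit --prompt "hello" --max-tokens 100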

Terminal

Running mlx_whisper as an example

Install dependencies and run the model (the weights download from Hugging Face on first use)

Code
!brew install ffmpeg
!pip install mlx-whisper
Code
!mlx_whisper xxxx.mp3 --model mlx-community/whisper-turbo --language 'Chinese' --initial-prompt '以下是普通話的句子,請以繁體輸出'


Code
# the audio file to transcribe; reuse the xxxx.mp3 placeholder from above
file_name <- "xxxx.mp3"
command <- paste0("mlx_whisper '", file_name, "' --model mlx-community/whisper-turbo --language 'Chinese' --initial-prompt '以下是普通話的句子,請以繁體輸出'")

command
Code
import os

# the command string built in the R chunk is exposed to Python via reticulate's `r` object
command = r.command
os.system(command)
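
mlx_whisper can also be called directly from Python instead of shelling out; a minimal sketch, assuming the transcribe() helper and its path_or_hf_repo / initial_prompt arguments from the mlx-whisper package, with the same placeholder file name as above:

Code
import mlx_whisper

# transcribe the audio file with the same Whisper model; returns a dict with the text
result = mlx_whisper.transcribe(
    "xxxx.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    initial_prompt="以下是普通話的句子,請以繁體輸出",
)
print(result["text"])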

mall package

The mall package runs LLM prompts such as sentiment, summarization, and classification over data frame columns, using locally served Ollama models as the backend:

https://mlverse.github.io/mall/
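
A minimal sketch assuming mall's Polars-based Python interface (llm.use() / llm.sentiment()); the example text is made up, and the model is the llama3.1 pulled earlier:

Code
import mall
import polars as pl

# a toy data frame with one text column to analyze
reviews = pl.DataFrame({"review": [
    "This has been the best TV I've ever used.",
    "The product was not good at all.",
]})

# point mall at a local Ollama model, then run a sentiment prompt on each row
reviews.llm.use("ollama", "llama3.1", options=dict(seed=100))
reviews.llm.sentiment("review")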