Run AI Models on a Local Machine

Categories: AI, R, Python
Author: Tony D

Published: March 18, 2025

Running AI models on a local machine with Ollama, Hugging Face, and more.

1. Ollama

Download and install the Ollama app:

https://ollama.com/download

Then open the app on your computer.
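To confirm the app is actually serving requests, you can hit the local API it exposes (by default at http://localhost:11434); a minimal Python check:

Code
import urllib.request

# The Ollama app serves a local HTTP API on port 11434 by default
with urllib.request.urlopen("http://localhost:11434") as resp:
    print(resp.read().decode())  # expected: "Ollama is running"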

Run LLM models with Ollama in R

download the package and check the connection

Code
pak::pak("ollamar")            # install the ollamar package
pak::pkg_deps_tree("ollamar")  # inspect its dependency tree
Code
library(ollamar)
test_connection()  # check that the local Ollama server is reachable

download a model

Code
ollamar::pull("llama3.1")

list downloaded models

Code
list_models()

show model details

Code
ollamar::show("gemma3")

run a model

Code
resp <- generate("gemma3", "tell me a 5-word story")
resp
Code
# get just the text from the response object
resp_process(resp, "text")
Code
# get the text as a tibble dataframe
resp_process(resp, "df")

run the same prompt across multiple models

Code
(list_models())$name
Code
models_name <- list_models()$name[-1]  # drop the first model from the list
models_name
Code
input_prompt <- "tell me a 5-word story"
Code
all_model <- c()

for (i in models_name) {
  resp <- generate(i, input_prompt)
  print(resp_process(resp, "text"))                        # print each model's reply
  all_model <- rbind(all_model, resp_process(resp, "df"))  # collect replies in a data frame
}
Code
all_model
The same pull and run steps also work from the Ollama CLI (the ! prefix runs a shell command from a notebook cell):

Code
!ollama pull llama3.1
Code
!ollama run llama3.1 "tell me a 5-word story"

Run in Python

install the package

Code
!pip install ollama

load packages

Code
import json
import pandas as pd
from pandas import json_normalize


from ollama import chat
from ollama import ChatResponse
import ollama

download a model

Code
# ollama.pull('llama3.2:1b')  # uncomment to download the model

list all downloaded models

Code
ollama_model = ollama.list()
Code
# Extracting data from the ListResponse
data = []
for model in ollama_model.models:
    model_data = {
        'model': model.model,
        'modified_at': model.modified_at,
        'digest': model.digest,
        'size': model.size / 1000000000,  # size in GB
        'parent_model': model.details.parent_model,
        'format': model.details.format,
        'family': model.details.family,
        'families': model.details.families,
        'parameter_size': model.details.parameter_size,
        'quantization_level': model.details.quantization_level
    }
    data.append(model_data)

# Convert the list of dictionaries into a pandas DataFrame
ollama_model_df = pd.DataFrame(data)

# Show the DataFrame
print(ollama_model_df)

show model details

Code
ollama.show('deepseek-r1:7b-qwen-distill-q4_K_M')

delete model

Code
# ollama.delete('llama3.2:1b')  # uncomment to delete the model

run a model

Code
response: ChatResponse = ollama.chat(model='deepseek-r1:7b-qwen-distill-q4_K_M', messages=[
  # system prompt (Chinese): "You are a poet; you may only output Chinese"
  {'role': 'system',
   'content': '你是一个诗人,你只能输出中文'},

  {'role': 'assistant',
   'content': ''},

  {'role': 'user',
   'content': 'give me a 3-line story'}
])
Code
print(response.message.content)
Code
response: ChatResponse = ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
Code
print(response.message.content)
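chat() can also stream the reply chunk by chunk instead of returning it all at once, which helps for long generations; a minimal sketch reusing the gemma3 model from above:

Code
# Ask for a streamed response and print chunks as they arrive
stream = ollama.chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)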

create a custom model

Code
ollama.create(model='example_model', from_='llama3.2', system="You are Mario from Super Mario Bros.")
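The new model then behaves like any other local model; asking who it is should get an in-character answer, given the system prompt set in the create call:

Code
response = ollama.chat(
    model='example_model',
    messages=[{'role': 'user', 'content': 'Who are you?'}],
)
print(response.message.content)  # should answer as Mario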

push the model to Ollama (replace user with your Ollama account name)

Code
ollama.push('user/example_model')

2. Hugging Face

Using DeepSeek-R1-Distill-Qwen-1.5B as an example:

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

using a pipeline

Code
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

messages = [
    {"role": "user", "content": "Who are you?"},
]


pipe(messages)
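pipeline() passes generation keyword arguments through to the model, so you can bound and shape the output; the values below are arbitrary examples:

Code
pipe(
    messages,
    max_new_tokens=128,  # cap the length of the generated reply
    do_sample=True,      # sample rather than greedy decoding
    temperature=0.7,     # higher values give more varied output
)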

Load model directly

Code
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

dir_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

text = "tell me a 5-word story"

dir_pipeline(text)
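Loading the model directly also lets you bypass pipeline() entirely and call generate() yourself, which makes the tokenize/generate/decode steps explicit:

Code
# Tokenize the prompt, generate, then decode back to text
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))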

3. Terminal

https://github.com/allenai/olmocr

https://github.com/gradio-app/gradio

https://www.youtube.com/watch?v=XF3Q_ZjwfaI

running mlx_whisper as an example

Code
import os

# file_name is assumed to hold the path to your audio file
file_name = "recording.m4a"

# Build the CLI call; the initial prompt (Chinese) says:
# "The following are Mandarin sentences; please output Traditional Chinese"
command = (
    f"mlx_whisper '{file_name}' "
    "--model mlx-community/whisper-turbo "
    "--language 'Chinese' "
    "--initial-prompt '以下是普通話的句子,請以繁體輸出'"
)

print(command)
os.system(command)
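os.system works for a quick run, but subprocess.run avoids shell-quoting problems with file names and captures the transcript output; a sketch under the same assumptions:

Code
import subprocess

# Passing arguments as a list sidesteps shell quoting entirely
result = subprocess.run(
    ["mlx_whisper", file_name,
     "--model", "mlx-community/whisper-turbo",
     "--language", "Chinese",
     "--initial-prompt", "以下是普通話的句子,請以繁體輸出"],
    capture_output=True, text=True,
)
print(result.stdout)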

4. vLLM
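vLLM (https://github.com/vllm-project/vllm) is a high-throughput engine for running models locally. A minimal sketch of its offline inference API; the model id here is only an assumption, any Hugging Face causal LM that fits in memory will do:

Code
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # assumed model id; swap in any HF model
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["tell me a 5-word story"], params)
print(outputs[0].outputs[0].text)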

5. mall package

The mall package applies LLM prompts, served locally through Ollama, row by row to a data frame column (e.g. sentiment, summarization, classification):

https://mlverse.github.io/mall/