Run AI Models on a Local Machine

Categories: AI, R, Python
Author: Tony D

Published: March 18, 2025

Running AI models on a local machine with Ollama, Hugging Face, and more.

1. Ollama

Download and install the Ollama app:

https://ollama.com/download

Then open the app on your computer.
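To confirm the app is actually serving requests, you can hit the local API it exposes (by default at http://localhost:11434); a minimal Python check:

Code
import urllib.request

# The Ollama app serves a local HTTP API on port 11434 by default
with urllib.request.urlopen("http://localhost:11434") as resp:
    print(resp.read().decode())  # expected: "Ollama is running"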

Run LLM models with Ollama in R

download the package and check the connection

Code
pak::pak("ollamar")            # install the ollamar package
pak::pkg_deps_tree("ollamar")  # inspect its dependency tree
Code
library(ollamar)
test_connection()  # check that the local Ollama server is reachable

download a model

Code
ollamar::pull("llama3.1")

list downloaded models

Code
list_models()

show model details

Code
ollamar::show("gemma3")

run a model

Code
resp <- generate("gemma3", "tell me a 5-word story")
resp
Code
# get just the text from the response object
resp_process(resp, "text")
Code
# get the text as a tibble dataframe
resp_process(resp, "df")

run the same prompt across multiple models

Code
(list_models())$name
Code
models_name <- list_models()$name[-1]  # drop the first model from the list
models_name
Code
input_prompt <- "tell me a 5-word story"
Code
all_model <- c()

for (i in models_name) {
  resp <- generate(i, input_prompt)
  print(resp_process(resp, "text"))                        # print each model's reply
  all_model <- rbind(all_model, resp_process(resp, "df"))  # collect replies in a data frame
}
Code
all_model
The same pull and run steps also work from the Ollama CLI (the ! prefix runs a shell command from a notebook cell):

Code
!ollama pull llama3.1
Code
!ollama run llama3.1 "tell me a 5-word story"

Run in Python

install the package

Code
!pip install ollama

load packages

Code
import json
import pandas as pd
from pandas import json_normalize


from ollama import chat
from ollama import ChatResponse
import ollama

download a model

Code
# ollama.pull('llama3.2:1b')  # uncomment to download the model

list all downloaded models

Code
ollama_model = ollama.list()
Code
# Extracting data from the ListResponse
data = []
for model in ollama_model.models:
    model_data = {
        'model': model.model,
        'modified_at': model.modified_at,
        'digest': model.digest,
        'size': model.size / 1000000000,  # size in GB
        'parent_model': model.details.parent_model,
        'format': model.details.format,
        'family': model.details.family,
        'families': model.details.families,
        'parameter_size': model.details.parameter_size,
        'quantization_level': model.details.quantization_level
    }
    data.append(model_data)

# Convert the list of dictionaries into a pandas DataFrame
ollama_model_df = pd.DataFrame(data)

# Show the DataFrame
print(ollama_model_df)

show model details

Code
ollama.show('deepseek-r1:7b-qwen-distill-q4_K_M')

delete model

Code
# ollama.delete('llama3.2:1b')  # uncomment to delete the model

run a model

Code
response: ChatResponse = ollama.chat(model='deepseek-r1:7b-qwen-distill-q4_K_M', messages=[
  # system prompt (Chinese): "You are a poet; you may only output Chinese"
  {'role': 'system',
   'content': '你是一个诗人,你只能输出中文'},

  {'role': 'assistant',
   'content': ''},

  {'role': 'user',
   'content': 'give me a 3-line story'}
])
Code
print(response.message.content)
Code
response: ChatResponse = ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
Code
print(response.message.content)
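chat() can also stream the reply chunk by chunk instead of returning it all at once, which helps for long generations; a minimal sketch reusing the gemma3 model from above:

Code
# Ask for a streamed response and print chunks as they arrive
stream = ollama.chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)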

create a custom model

Code
ollama.create(model='example_model', from_='llama3.2', system="You are Mario from Super Mario Bros.")
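The new model then behaves like any other local model; asking who it is should get an in-character answer, given the system prompt set in the create call:

Code
response = ollama.chat(
    model='example_model',
    messages=[{'role': 'user', 'content': 'Who are you?'}],
)
print(response.message.content)  # should answer as Mario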

push the model to Ollama (replace user with your Ollama account name)

Code
ollama.push('user/example_model')

2. Hugging Face

Using DeepSeek-R1-Distill-Qwen-1.5B as an example:

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

using a pipeline

Code
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

messages = [
    {"role": "user", "content": "Who are you?"},
]


pipe(messages)
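pipeline() passes generation keyword arguments through to the model, so you can bound and shape the output; the values below are arbitrary examples:

Code
pipe(
    messages,
    max_new_tokens=128,  # cap the length of the generated reply
    do_sample=True,      # sample rather than greedy decoding
    temperature=0.7,     # higher values give more varied output
)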

Load model directly

Code
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

dir_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

text = "tell me a 5-word story"

dir_pipeline(text)
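Loading the model directly also lets you bypass pipeline() entirely and call generate() yourself, which makes the tokenize/generate/decode steps explicit:

Code
# Tokenize the prompt, generate, then decode back to text
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))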

3. Terminal

https://github.com/allenai/olmocr

https://github.com/gradio-app/gradio

https://www.youtube.com/watch?v=XF3Q_ZjwfaI

running mlx_whisper as an example

Code
import os

# file_name is assumed to hold the path to your audio file
file_name = "recording.m4a"

# Build the CLI call; the initial prompt (Chinese) says:
# "The following are Mandarin sentences; please output Traditional Chinese"
command = (
    f"mlx_whisper '{file_name}' "
    "--model mlx-community/whisper-turbo "
    "--language 'Chinese' "
    "--initial-prompt '以下是普通話的句子,請以繁體輸出'"
)

print(command)
os.system(command)
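os.system works for a quick run, but subprocess.run avoids shell-quoting problems with file names and captures the transcript output; a sketch under the same assumptions:

Code
import subprocess

# Passing arguments as a list sidesteps shell quoting entirely
result = subprocess.run(
    ["mlx_whisper", file_name,
     "--model", "mlx-community/whisper-turbo",
     "--language", "Chinese",
     "--initial-prompt", "以下是普通話的句子,請以繁體輸出"],
    capture_output=True, text=True,
)
print(result.stdout)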

4. vLLM
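vLLM (https://github.com/vllm-project/vllm) is a high-throughput engine for running models locally. A minimal sketch of its offline inference API; the model id here is only an assumption, any Hugging Face causal LM that fits in memory will do:

Code
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # assumed model id; swap in any HF model
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["tell me a 5-word story"], params)
print(outputs[0].outputs[0].text)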

5. mall package

The mall package applies LLM prompts, served locally through Ollama, row by row to a data frame column (e.g. sentiment, summarization, classification):

https://mlverse.github.io/mall/