# AI local

Why do we need to run AI on our own computers?

  1. Speed: local processing reduces latency, making devices more responsive
  2. Privacy: data stays on the device, enhancing user privacy
  3. Offline: devices can function without internet connectivity, providing consistent performance

# meta llama 3.1

# setup

MacBook Pro: macOS 15.0 Beta, M2 Max chip, 96GB RAM

We will be using Ollama, so the first step is to install and run it.

Open a terminal and run `ollama pull llama3` to download the 8B chat model, with a size of about 4.7GB.

Or run `ollama pull llama3:70b` for the 70B model, about 39GB.

# running

Run `ollama run llama3` or `ollama run llama3:70b`.

You can then ask questions and chat with the Llama 3 model.
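Besides the interactive terminal, Ollama also exposes a local REST API on port 11434. A minimal sketch of calling it from Python (the helper names `build_request` and `ask` are my own; the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields are Ollama's documented API):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt):
    """Payload for Ollama's /api/generate endpoint (stream=False => one JSON reply)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    """Send a prompt to the local Ollama server and return the answer text."""
    data = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama server to be running locally):
# print(ask("llama3", "Why run LLMs locally?"))
```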

# memory usage

Llama 3 8B:

Memory before running 8B (screenshot)

Memory after running 8B (screenshot)

Answer from 8B (screenshot)


Llama 3 70B:

Memory before running 70B (screenshot)

Memory after running 70B (screenshot)

Answer from 70B (screenshot)
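As a rough sanity check on the download sizes above, the weight memory of a quantized model is approximately parameter count × bits per weight / 8, plus extra for the KV cache and runtime. A sketch of that rule of thumb (the ~4.7 and ~4.5 effective bits per weight for q4 quantization are my approximations, not official figures):

```python
def model_size_gb(n_params_billion, bits_per_weight):
    """Rough weight-file size: params * bits / 8, expressed in gigabytes."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Llama 3 8B at ~4.7 effective bits/weight matches the ~4.7GB download
print(round(model_size_gb(8, 4.7), 1))   # -> 4.7
# Llama 3 70B at ~4.5 effective bits/weight matches the ~39GB download
print(round(model_size_gb(70, 4.5), 1))  # -> 39.4
```

Actual RAM usage while chatting will be higher than the file size, since the KV cache grows with context length.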

# open webui

The terminal is not convenient, so we need a UI.

If Ollama is running on your computer, use this command:

```shell
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

Then open http://localhost:3000/ in your browser.
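Once the container is up, you can check that the UI answers before opening the browser. A minimal sketch (the helper name `webui_is_up` is my own; it only tests that port 3000 returns HTTP 200):

```python
import urllib.request

def webui_is_up(url="http://localhost:3000/", timeout=3):
    """Return True if the Open WebUI front page responds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, DNS failure, timeout, ...
        return False

print(webui_is_up())
```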

# unsloth

This section covers fine-tuning Llama 3.1 with Unsloth.

# step 1


Create a training dataset in Alpaca format, for example Fine-tuning.json. This sample teaches the model to answer "你是谁" ("Who are you?") with "我是王东东" ("I am Wang Dongdong"):

```json
[
    {
        "instruction": "你是谁",
        "input": "",
        "output": "我是王东东"
    }
]
```
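A malformed JSON file or a missing key will break the data-prep step later, so it can help to generate and validate the file programmatically. A small sketch (the file name and example row match this step; everything else is just stdlib `json`):

```python
import json

# Alpaca-style rows: each row needs instruction / input / output keys
rows = [
    {"instruction": "你是谁", "input": "", "output": "我是王东东"},
]

with open("Fine-tuning.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, ensure_ascii=False, indent=4)

# Validate: every row must have exactly the three expected keys
with open("Fine-tuning.json", encoding="utf-8") as f:
    loaded = json.load(f)
assert all(set(r) == {"instruction", "input", "output"} for r in loaded)
print("ok:", len(loaded), "rows")
```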

# step 2

Open https://github.com/unslothai/unsloth.

Find the "Finetune for Free" section.

Click "Start for free" next to Llama 3.1 (8B) to open the Colab notebook.

# step 3

In the Colab sidebar on the left, open Files and create a folder named data.

Upload the Fine-tuning.json file into it.

Find the "Data Prep" section.

Replace `dataset = load_dataset("yahma/alpaca-cleaned", split = "train")` with `dataset = load_dataset("/content/data/", split = "train")`.

(screenshot: unsloth_data_prep)
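During data prep, each instruction/input/output row gets rendered into a single training string using an Alpaca-style template. A sketch of that mapping (the template wording and the `</s>` end-of-sequence token only approximate what the Colab notebook does; the notebook uses the tokenizer's actual EOS token):

```python
# Alpaca-style prompt template (wording approximates the Colab notebook's)
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def format_row(row, eos_token="</s>"):
    """Turn one instruction/input/output row into a single training string."""
    return ALPACA_TEMPLATE.format(**row) + eos_token

print(format_row({"instruction": "你是谁", "input": "", "output": "我是王东东"}))
```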

# step 4

Find the "GGUF / llama.cpp Conversion" section:

```python
# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```

and change `False` to `True` so the model is exported in q4_k_m GGUF format:

```python
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```

(screenshot: unsloth_gguf)

# step 5

This step is not required, but I suggest you do it, because downloading the GGUF file directly from Colab is too slow.

In the sidebar on the left, click "Mount Drive".

Scroll to where you want the code and click "+ Code" to add a new cell:

```python
import shutil

# Copy the quantized model from the Colab filesystem to Google Drive
local = '/content/model-unsloth-Q4_K_M.gguf'
google_drive = '/content/drive/My Drive/model-unsloth-Q4_K_M.gguf'
shutil.copy(local, google_drive)
print('copy success')
```

(screenshot: unsloth_google_drive)

# step 6

Select Runtime > Run all and wait about half an hour.

(screenshot: unsloth_file_path)

Go to the Google Drive website, open My Drive, find model-unsloth-Q4_K_M.gguf, and download it.


Warning: step 5 will trigger a privacy pop-up window asking for access to Google Drive.

# other

Ollama website

Meta Llama

Calculating GPU memory for serving LLMs

Open WebUI GitHub

Open WebUI documentation

Unsloth