How to Install and Use GPT OSS Models Locally on Windows or Ubuntu

GPT OSS (Open Source GPT) refers to open-source alternatives to OpenAI’s GPT models. Developed by research communities and companies, these models can be downloaded and run entirely locally, which makes them ideal for developers, researchers, or anyone who wants AI offline and under their own control.

Popular Open-Source GPT Models

Below is a list of widely used open-source GPT-style models:

| Model | Publisher | Notes |
|-------|-----------|-------|
| GPT-J | EleutherAI | 6B parameters, great general-purpose model |
| GPT-Neo | EleutherAI | Lightweight models (1.3B, 2.7B) |
| GPT-NeoX | EleutherAI | Large-scale 20B model |
| LLaMA | Meta AI | High-performance family, includes LLaMA 2 and 3 |
| Mistral | Mistral AI | Efficient and powerful newer model |
| Phi-2 | Microsoft | Lightweight, runs on a CPU or small GPUs |
| OpenChat, OpenAssistant | Community | Chat-focused, instruction-tuned models |

Recommended Method: Install GPT OSS with Ollama

Ollama is a tool that simplifies downloading and running open models such as LLaMA and Mistral, and it works on both Windows and Ubuntu.

Install Ollama

On Ubuntu

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

On Windows

  1. Visit https://ollama.com
  2. Download and run the Windows installer
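After installing on either platform, it's worth confirming the CLI is on your PATH before pulling any models:

```shell
# Check that the CLI is installed and print its version
ollama --version

# If the background service isn't already running
# (the installers normally set it up for you), start it with:
#   ollama serve
```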

Run Your First Model

```shell
ollama run mistral
```

Replace `mistral` with other models such as `llama2`, `phi`, or `llama3`.
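Besides the interactive session, `ollama run` also accepts a prompt directly on the command line, which is handy for scripting. For example:

```shell
# Ask a one-off question without entering the interactive prompt
ollama run mistral "Explain what a quantized model is in one sentence."
```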

Useful Commands

```shell
ollama list         # List installed models
ollama pull llama3  # Download and install the LLaMA 3 model
```
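Ollama also exposes a local REST API (on port 11434 by default), so your own scripts and applications can talk to a model. A minimal sketch using the documented `/api/generate` endpoint:

```shell
# Send a prompt to a locally running model via Ollama's HTTP API.
# "stream": false returns one JSON object instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The response is a JSON object whose `response` field contains the generated text, which makes it easy to consume from any language with an HTTP client.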

Alternative: Text Generation Web UI

If you want a customizable interface with more extensions, try Text Generation Web UI.

Installation

  1. Clone the repository:

```shell
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
```

  2. Start the installer:

```shell
# On Ubuntu
bash start_linux.sh

# On Windows
start_windows.bat
```

Then open http://localhost:7860/ in your browser.
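Text Generation Web UI can also serve an OpenAI-compatible API when launched with its `--api` flag. The port and endpoint below reflect the project's defaults at the time of writing and may differ in your installation, so treat this as a sketch:

```shell
# Assumes the UI was started with the API enabled, e.g.:
#   bash start_linux.sh --api
# The OpenAI-compatible endpoint listens on port 5000 by default.
curl -s http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
```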

Hardware Requirements

| Model Size | Recommended Hardware |
|------------|----------------------|
| Small (e.g., Phi-2, Mistral 7B) | 8–16 GB RAM, optional GPU |
| Medium (LLaMA 2 13B) | 24–32 GB RAM or GPU with ≥12 GB VRAM |
| Large (20B+) | High-end server or cloud instance with ≥40 GB RAM |

You can also use quantized models (in the GGUF format), which trade a small amount of accuracy for much lower memory use, making larger models practical on limited hardware.
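In Ollama, quantization levels are selected via model tags. The exact tag names vary from model to model, so the tag below is illustrative; check https://ollama.com/library for the tags actually published for each model:

```shell
# Pull a 4-bit quantized build of a model
# (tag is an example; see the library page for available tags)
ollama pull llama3:8b-instruct-q4_0
```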

Conclusion

Thanks to projects like Ollama and Text Generation Web UI, it’s now easier than ever to run GPT OSS models locally. Whether you’re building an offline assistant, automating tasks, or experimenting with AI, these tools make powerful language models accessible to everyone.

🔗 Explore Ollama: https://ollama.com

🔗 Browse Models: https://ollama.com/library