GPT Models — From GPT-1 to GPT-4

How the GPT series evolved from language modelling to instruction-following assistants via RLHF.

Intermediate · 18 min read

The GPT Evolution

Version	Year	Params	Key Innovation
GPT-1	2018	117M	Pre-train then fine-tune paradigm established
GPT-2	2019	1.5B	Zero-shot transfer — works on unseen tasks
GPT-3	2020	175B	Few-shot in-context learning
InstructGPT	2022	175B	RLHF — human feedback alignment
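ChatGPT	2022	Undisclosed (GPT-3.5 series)	Conversational RLHF fine-tuning
GPT-4	2023	Undisclosed (unofficial estimates: ~1.8T MoE)	Multimodal (image input), advanced reasoning; 8K/32K context at launch, 128K with GPT-4 Turbo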
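
The difference between zero-shot prompting (GPT-2) and few-shot in-context learning (GPT-3) is easiest to see in the prompt text itself. The following is a minimal sketch: the helper name `build_prompt` and the `Input:`/`Output:` layout are illustrative assumptions, not a format prescribed by any of these models.

```python
def build_prompt(instruction, query, examples=()):
    """Build a plain-text prompt.

    With no examples this is a zero-shot prompt. With examples it is a
    few-shot prompt: the model infers the task from demonstrations placed
    in its context window, and no weights are updated.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # The final block is left open for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


zero_shot = build_prompt("Translate English to French.", "cheese")
few_shot = build_prompt(
    "Translate English to French.",
    "cheese",
    examples=[("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
)
print(few_shot)
```

Both prompts end with an open `Output:` line; the only difference is whether worked examples precede the query.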

RLHF — Making GPT Helpful

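Flow:

1. SFT (supervised fine-tuning): fine-tune the pretrained model on human-written demonstrations.
2. Reward Model: train a separate model to predict which of two responses humans prefer.
3. PPO (Proximal Policy Optimization): use RL to optimize the SFT model against the reward model.

Result: an aligned model that aims to be helpful, harmless, and honest.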
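
The reward-model and RL steps above can be made concrete with two small scalar formulas. This is a hedged sketch, not a full PPO implementation: it shows a pairwise preference loss of the kind commonly used to train reward models, and a KL-penalized reward of the kind commonly fed to the RL step. The function names and the default `beta` value are assumptions chosen for illustration.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Reward-model loss for one comparison: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it ranks them the
    wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_penalized_reward(reward, logprob_policy, logprob_reference, beta=0.1):
    """Reward-model score minus a KL-style penalty.

    The penalty discourages the policy from drifting far from the SFT
    (reference) model. `beta` is a hypothetical coefficient.
    """
    return reward - beta * (logprob_policy - logprob_reference)


print(pairwise_preference_loss(2.0, 0.0))  # correct ranking: low loss
print(pairwise_preference_loss(0.0, 2.0))  # wrong ranking: high loss
print(kl_penalized_reward(1.0, logprob_policy=-0.5, logprob_reference=-1.5))
```

PPO itself adds clipped policy updates and a value function on top of this reward signal; those parts are omitted here.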

from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise Python expert."},
        {"role": "user",   "content": "Explain list comprehensions in 3 sentences."},
    ],
    temperature=0.3,
    max_tokens=200,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True,
)
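for chunk in stream:
    # Each chunk carries an incremental delta; content can be None (e.g. the final chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # end the streamed output with a newline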
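
When the full reply is needed after streaming (for logging or for appending to the message history), the deltas can be accumulated as they arrive. The sketch below assumes only the chunk attributes used in the streaming loop above (`choices[0].delta.content`); `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real API stream so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text deltas from an iterable of streamed chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None or empty on some chunks
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in that mimics only the attributes read above."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("1, 2, "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("3")]
print(collect_stream(demo))
```

With a real client, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `demo`.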

Part of the NLP & Language Models series on Tekivex.