Understand the "Brain" of the operation before we start performing surgery on it.
1. What is an LLM? (No Math)
An LLM (Large Language Model) is not a "knowledge base" or a "search engine." It is a Next-Token Prediction Engine.
The Mental Model: Imagine the world's best auto-complete. If you type "The capital of France is", the model doesn't "know" geography. It calculates that "Paris" is statistically the most likely next word.
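To make the "auto-complete" picture concrete, here is a minimal sketch that asks a small open model for its next-token probabilities. It uses the Hugging Face transformers library and GPT-2 purely as assumptions; any causal language model exposes the same kind of scores.

```python
# Minimal sketch: inspect next-token probabilities with a small causal LM.
# Assumptions: the `transformers` and `torch` packages are installed, and
# GPT-2 is used only because it is small -- any causal model works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # a score for every vocabulary token, at every position

next_token_logits = logits[0, -1]        # scores for the *next* position only
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>10}  {prob.item():.2%}")
# " Paris" should dominate the list -- not because the model "knows" geography,
# but because it is statistically the most likely continuation.
```

Run it and the model is doing nothing more than ranking likely continuations, which is exactly the mental model to keep for everything that follows.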
2. The 3 Pillars of Control
Tokens: The currency of LLMs. The model processes text in chunks (tokens), not words. Rough math: 1,000 tokens ≈ 750 words (see the counting sketch after this list).
Context Window: The "Short-Term Memory." It's the amount of text the model can see at once when answering you. If the conversation exceeds this limit, the model "forgets" the beginning.
Temperature: The "Creativity Knob."
Temp = 0.0: Precise, near-deterministic, factual (use for coding/data extraction).
Temp = 1.0: Creative, varied, diverse (use for brainstorming/poetry). A sketch comparing both settings follows this list.
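To see the first two pillars in action, here is a minimal sketch that counts tokens with the tiktoken library and trims old messages so a conversation still fits a context window. The cl100k_base encoding and the 8,000-token budget are assumptions; check the real limits of the model you actually use.

```python
# Minimal sketch: count tokens and trim a conversation to a context budget.
# Assumptions: the `tiktoken` package is installed; the encoding name and the
# 8,000-token budget are placeholders, not limits of any specific model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_BUDGET = 8_000  # hypothetical window size, in tokens

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def trim_to_budget(messages: list[str], budget: int = CONTEXT_BUDGET) -> list[str]:
    """Drop the oldest messages until the rest fit inside the window."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest-first, keep what fits
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

print(count_tokens("The capital of France is Paris."))  # a handful of tokens, not 7 "words"
```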
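And for the third pillar, a minimal sketch of the temperature knob using the OpenAI Python SDK. The gpt-4o-mini model name is an assumption, the call needs an OPENAI_API_KEY in your environment, and temperature 0.0 should be read as "near-deterministic" rather than perfectly repeatable.

```python
# Minimal sketch: the same prompt at two temperatures via the OpenAI SDK.
# Assumptions: `openai` v1+ is installed, OPENAI_API_KEY is set, and
# "gpt-4o-mini" is just an example model name.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Temperature 0.0: run it twice and the answers should be (nearly) identical.
print(ask("List three facts about Paris.", temperature=0.0))

# Temperature 1.0: answers vary from run to run -- useful for brainstorming.
print(ask("Give me three startup name ideas.", temperature=1.0))
```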
3. Cloud vs. Local (Ollama)
Cloud (OpenAI/Gemini): Smarter models, but your data leaves your device and you pay per token.
Local (Ollama): Runs on your laptop. Private. Free. Works offline. Great for testing agents without burning money.
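As a starting point for the local route, here is a minimal sketch that talks to Ollama's REST API on its default port. It assumes the Ollama server is already running and that you have pulled a model (llama3 is used here purely as an example name).

```python
# Minimal sketch: one completion from a local model via Ollama's REST API.
# Assumptions: Ollama is running on the default port 11434 and the model has
# been pulled beforehand (e.g. `ollama pull llama3`).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain tokens in one sentence.",
        "stream": False,                  # return a single JSON object instead of a stream
        "options": {"temperature": 0.0},  # same knobs as the cloud APIs
    },
    timeout=120,
)
print(response.json()["response"])
```

Nothing leaves your machine here, which is why local models are such a cheap, private sandbox for testing agents before pointing them at a paid cloud API.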