offline-ai-toolkit

$npx mdskill add TerminalSkills/skills/offline-ai-toolkit

Build self-contained AI systems that work offline with local models and knowledge bases

  • Solve the need for AI tools in air-gapped or low-connectivity environments
  • Uses Ollama for local LLMs, SQLite for embedded vector search, and pre-downloaded knowledge
  • Selects appropriate models and knowledge sources based on task requirements and resource constraints
  • Delivers results through a PWA interface or integrates directly with AI agents for automated workflows

SKILL.md

.github/skills/offline-ai-toolkitView on GitHub ↗
---
name: offline-ai-toolkit
description: >-
  Build offline-capable AI systems with local models, embedded knowledge bases,
  and no internet dependency. Use when: building AI tools for offline use,
  creating self-contained knowledge systems, deploying AI in air-gapped
  environments.
license: MIT
compatibility: "Node.js 18+ or Python 3.10+, Ollama"
metadata:
  author: terminal-skills
  version: "1.0.0"
  category: development
  tags: [offline, local-ai, ollama, knowledge-base, edge-ai]
  use-cases:
    - "Build a self-contained AI assistant that works without internet"
    - "Create an offline knowledge base with local LLM for field work"
    - "Deploy AI tools in air-gapped or low-connectivity environments"
  agents: [claude-code, openai-codex, gemini-cli, cursor]
---

# Offline AI Toolkit — Self-Contained AI Systems

## Overview

Build AI systems that work completely offline — local LLMs via Ollama, embedded vector search with SQLite, knowledge bases from pre-downloaded content, and a PWA interface that runs without connectivity. Ideal for field work, air-gapped environments, or privacy-first deployments.

## Instructions

### Step 1: Install Ollama for Local LLMs

```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:8b       # General purpose (4.7GB)
ollama pull nomic-embed-text   # Embeddings (274MB)
```

| Use Case | Model | RAM Needed |
|----------|-------|------------|
| General Q&A | llama3.1:8b | 8GB |
| Quick answers | phi3:mini | 4GB |
| Code help | codellama:7b | 8GB |
| Embeddings | nomic-embed-text | 2GB |

### Step 2: Build the Offline Knowledge Base

```python
import requests, sqlite3, os

def init_knowledge_db(db_path='knowledge.db'):
    """Initialize SQLite database for knowledge storage."""
    conn = sqlite3.connect(db_path)
    conn.execute('''CREATE TABLE IF NOT EXISTS documents (
        id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT,
        source TEXT, content TEXT, category TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )''')
    conn.execute('''CREATE TABLE IF NOT EXISTS embeddings (
        id INTEGER PRIMARY KEY, doc_id INTEGER REFERENCES documents(id),
        chunk_text TEXT, embedding BLOB, chunk_index INTEGER
    )''')
    conn.execute('''CREATE VIRTUAL TABLE IF NOT EXISTS fts_documents
        USING fts5(title, content, category)''')
    return conn

def download_wikipedia_articles(topics, conn):
    """Download Wikipedia articles for offline knowledge."""
    for topic in topics:
        url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{topic}"
        try:
            r = requests.get(url, timeout=10)
            data = r.json()
            content = data.get('extract', '')
            if content:
                conn.execute(
                    'INSERT INTO documents (title, source, content, category) VALUES (?, ?, ?, ?)',
                    (data.get('title', topic), f'wikipedia:{topic}', content, 'encyclopedia'))
        except Exception as e:
            print(f"Failed to download {topic}: {e}")
    conn.commit()
```

### Step 3: Generate Embeddings with Ollama

```python
import struct

def get_embedding(text, model='nomic-embed-text'):
    """Get embedding vector from Ollama."""
    response = requests.post('http://localhost:11434/api/embeddings',
                             json={'model': model, 'prompt': text})
    return response.json()['embedding']

def embedding_to_blob(embedding):
    return struct.pack(f'{len(embedding)}f', *embedding)

def blob_to_embedding(blob):
    n = len(blob) // 4
    return list(struct.unpack(f'{n}f', blob))

def embed_all_documents(conn, chunk_size=500):
    """Generate embeddings for all documents in the database."""
    cursor = conn.execute('SELECT id, content FROM documents')
    for doc_id, content in cursor.fetchall():
        words = content.split()
        for i in range(0, len(words), chunk_size):
            chunk = ' '.join(words[i:i + chunk_size])
            if len(chunk.strip()) < 20:
                continue
            emb = get_embedding(chunk)
            conn.execute(
                'INSERT INTO embeddings (doc_id, chunk_text, embedding, chunk_index) VALUES (?, ?, ?, ?)',
                (doc_id, chunk, embedding_to_blob(emb), i // chunk_size))
    conn.commit()
```

### Step 4: Offline Vector Search

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0

def search_knowledge(query, conn, top_k=5):
    """Search the knowledge base using vector similarity."""
    query_emb = get_embedding(query)
    cursor = conn.execute('SELECT doc_id, chunk_text, embedding FROM embeddings')
    results = []
    for row in cursor.fetchall():
        sim = cosine_similarity(query_emb, blob_to_embedding(row[2]))
        results.append({'chunk_text': row[1], 'doc_id': row[0], 'similarity': sim})
    results.sort(key=lambda x: x['similarity'], reverse=True)
    return results[:top_k]
```

### Step 5: RAG with Local LLM

```python
def ask_offline(question, conn, model='llama3.1:8b'):
    """Answer questions using local RAG pipeline."""
    results = search_knowledge(question, conn, top_k=3)
    context = '\n\n'.join([r['chunk_text'] for r in results])
    response = requests.post('http://localhost:11434/api/generate', json={
        'model': model,
        'prompt': f"""Answer using ONLY the context provided.
If the context doesn't contain the answer, say "I don't have information about that."

Context:
{context}

Question: {question}
Answer:""",
        'stream': False
    })
    return {
        'answer': response.json()['response'],
        'sources': [r['chunk_text'][:100] for r in results],
        'model': model
    }
```

## Examples

### Example 1: Build an Offline Field Research Assistant

A wildlife researcher prepares an offline AI assistant before a 2-week trip to a remote area with no connectivity:

```python
# While online: download knowledge and build embeddings
conn = init_knowledge_db('field_research.db')

# Load species identification guides and park documentation
download_wikipedia_articles([
    'Grizzly_bear', 'Gray_wolf', 'Elk', 'Moose',
    'Yellowstone_National_Park', 'Wildlife_tracking',
    'Bear_safety', 'GPS_navigation'
], conn)

# Ingest local field manuals (markdown files on laptop)
for root, _, files in os.walk('./field-manuals'):
    for f in files:
        if f.endswith('.md'):
            with open(os.path.join(root, f)) as fh:
                conn.execute('INSERT INTO documents (title, source, content, category) VALUES (?,?,?,?)',
                             (f, os.path.join(root, f), fh.read(), 'field-manual'))
conn.commit()
embed_all_documents(conn)

# In the field (fully offline):
result = ask_offline("What are the signs of a nearby grizzly bear den?", conn)
# Answer: "Look for excavated hillside entrances, claw marks on nearby trees,
#  matted vegetation, and a strong musky odor. Dens are typically on north-facing
#  slopes at elevations above 6,000 feet..."
```

### Example 2: Air-Gapped Developer Documentation Server

A defense contractor sets up an offline coding assistant for a secure facility with no internet:

```python
conn = init_knowledge_db('dev_docs.db')

# Pre-load language and framework documentation
import os
for doc_dir in ['./docs/python-stdlib', './docs/react-docs', './docs/kubernetes']:
    for root, _, files in os.walk(doc_dir):
        for f in files:
            if f.endswith(('.md', '.txt', '.rst')):
                path = os.path.join(root, f)
                with open(path, 'r', errors='ignore') as fh:
                    conn.execute('INSERT INTO documents (title,source,content,category) VALUES (?,?,?,?)',
                                 (f, path, fh.read(), 'dev-docs'))
conn.commit()
embed_all_documents(conn)

# Developer queries the system (no internet needed):
result = ask_offline("How do I create a Kubernetes CronJob that runs every 6 hours?", conn)
# Answer: "Create a CronJob manifest with schedule '0 */6 * * *' and specify
#  your container image in the jobTemplate spec. Set restartPolicy to OnFailure..."
print(result['sources'])  # Shows which doc chunks were used as context
```

## Guidelines

- **Download everything while online** — models, knowledge content, and embeddings must be prepared beforehand
- **Test offline before deploying** — disconnect WiFi and verify the full pipeline works end-to-end
- **Choose models by hardware** — phi3:mini for 4GB RAM devices, llama3.1:8b for 8GB+, llama3.1:70b for workstations
- **Use FTS as fallback** — SQLite full-text search works when embeddings are unavailable or for exact matches
- **Package for portability** — bundle everything on a USB drive or Docker image for easy deployment
- **Keep knowledge fresh** — sync new content and re-embed when connectivity returns

## References

- [Ollama](https://ollama.ai/) — local LLM runtime
- [SQLite FTS5](https://www.sqlite.org/fts5.html) — full-text search
- [PWA docs](https://web.dev/progressive-web-apps/) — offline-first web apps

More from TerminalSkills/skills