
GitHub Spec Kit: A Practical Guide to Structured AI Development

1. Introduction

It’s Tuesday night, 11 PM. A developer is staring at code that works but makes no sense. The AI wrote it three weeks ago. All the reasoning is buried somewhere in a chat log. A teammate messages: “Why did you name this function like that?” There’s no good answer.

This happens all the time now. Teams try AI coding tools. At first it feels great. Type a prompt, get working code, ship it. Then maintenance starts. Nobody understands anything. Code reviews become awkward. “Why this approach?” gets answered with “The AI suggested it.”

GitHub Spec Kit tries to fix this. It’s not another code generator. Think of it more like bumpers on a bowling lane. You still throw the ball. You still aim. But you’re less likely to end up in the gutter.

The idea is simple: write down what you’re building BEFORE you prompt the AI. Sounds boring? Maybe. But it saves hours of confusion later.

This guide shows what happened when real teams tried Spec Kit. Actual prompts. Actual bugs. Moments where it worked surprisingly well.

2. What Is Spec Kit?

Spec Kit is GitHub’s toolkit for structured AI coding. They call it “spec-driven development.” Here’s how it works:

Write a spec first. What are you building? Why? What are the limits?

Let AI make a plan. How should it be built? What tech? What architecture?

Break it into tasks. Split the plan into small pieces.

Generate code task by task. AI writes code. You review. You iterate.

There’s also a file called constitution.md. Teams write their coding rules once. “No ORMs.” “Everything needs tests.” “Use TypeScript.” The AI remembers these rules. You don’t have to repeat them in every prompt.

The workflow feels less like chatting and more like following steps. You can skip ahead if you want. But you’ll probably regret it.

3. Why GitHub Built It

GitHub didn’t wake up loving process. They built this because everyone using AI to code hit the same wall.

First it’s magic. You type. You get code. You ship. Amazing. A week later you need to change something. Nobody knows how it works. Your teammate looks at your PR and asks why you did it this way. You shrug. “The AI suggested it.”

Spec Kit fixes three problems:

- AI gives different results every time (same prompt, different code)
- Nobody writes down WHY decisions were made
- Teams can't work together when everyone's having private AI chats

That’s it. GitHub saw the pattern and built a tool.

4. Setup and Installation

Getting Spec Kit running takes about 10 minutes. Maybe 15 if you type slow.

What You Need

- Node.js (v18 or newer)
- Git
- GitHub account
- API key for OpenAI, Anthropic, or Gemini

How to Install

Get the code:

git clone https://github.com/github/spec-kit.git
cd spec-kit

Install stuff:

npm install

Takes 2-3 minutes usually.

Add your API key:

cp .env.example .env

Open .env and paste your key:

OPENAI_API_KEY=sk-proj-whatever
# or if you use Claude:
ANTHROPIC_API_KEY=your_key_here

Use whatever AI service you already pay for.

Start a project:

npm run init my-first-project

This makes a folder with:

- constitution.md for your rules
- spec.md for specs
- plan.md for plans
- tasks/ for task files

Set Up Your Rules

Open constitution.md and write down how you like to code. Here’s an example:

# Project Constitution

## Code Style
- TypeScript everywhere
- Functions need JSDoc comments
- Functions over classes when possible
- Keep it simple

## Testing
- Every function needs a test
- Use Jest
- Aim for 80% coverage

## Architecture
- Keep business logic separate
- No ORMs, use raw SQL
- No circular dependencies
- Files under 300 lines

The AI reads this and follows it. Sounds fake but it actually works.

Check if it’s working:

npm run verify

If you see “Setup complete,” you’re good.

5. How It Works

Spec Kit has four stages. You go in order. Don’t skip.

/specify – Write the Spec

This is where you stop and think. What are you actually building? Why? What are the rules?

You write it down. The AI asks questions. You go back and forth until it’s clear.

Example: You say “build a photo gallery.” The AI asks “What file types?” and “How are photos organized?” Annoying but useful questions.

/plan – Make a Plan

Once your spec is clear, run /plan. The AI writes up the architecture. APIs, database, folders, libraries. Everything you’d normally sketch on a whiteboard.

Cool part: it reads your constitution.md. If you said “no ORMs,” it won’t suggest Prisma.

/tasks – Split Into Tasks

This command breaks the plan into small chunks. Each task should take 30-60 minutes max. Each one makes sense on its own. You can show a task to someone else and they’ll get it right away.

Build It

Now you actually code. Well, the AI codes and you review piece by piece.

Here’s the good part: if something looks wrong, you don’t just patch the code. You go back to the spec or plan. Fix the root problem. Regenerate. Most AI tools are one-way streets. Once code exists, you’re stuck. Spec Kit lets you fix things at the design level.

6. Why This Helps

Testing this on a few projects showed something interesting. It didn’t make coding faster. It made it less stressful.

Normally when you build something, you’re making a million tiny decisions. Where does this go? What should this be called? Library or custom code? Every choice takes mental energy.

With Spec Kit, most choices are already made during planning. Building feels more like checking boxes than solving puzzles.

What people noticed:

Caught bugs earlier. Problems showed up during planning, before any code existed. Way easier to fix.

Docs wrote themselves. The spec and plan ARE the docs. No need for a separate README later.

Reviews got easier. Instead of “why did you do this?”, people just read the spec. All the reasoning was already there.

New people caught up faster. Junior devs read the spec and tasks and understood projects in under an hour.

The structure made AI feel less like a black box. More like a tool you actually control.

7. Real Example: Photo Gallery App

To test Spec Kit properly, someone built a photo gallery over a weekend. Upload images, drag to reorder, auto-generate thumbnails, store in SQLite.

Step 1 – The Spec

What they typed:

Create a spec for a photo gallery web app. Users upload photos, see them grouped by date, drag to reorder, thumbnails get generated. Use SQLite.

What the AI wrote:

project: Photo Gallery Web App

goals:
  - Upload, view, organize photos
  - Drag-and-drop reordering
  - Auto-group by upload date
  - Thumbnail generation

constraints:
  - SQLite storage
  - Single-level gallery (no folders)
  - Max image: 10MB
  - Formats: JPG, PNG, WebP

The “Max image: 10MB” wasn’t requested. But it makes sense. Without a limit, someone uploads a 50MB file and everything crashes. The AI caught a bug before any code was written.

Step 2 – The Plan

What they typed:

Make a technical plan. Use Vite + vanilla JS frontend, Node.js + Express backend.

The AI’s plan (shortened):

Backend: Node.js + Express + SQLite

Endpoints:
  - POST /photos (upload + thumbnail)
  - GET /photos (grouped by date)
  - PATCH /photos/:id/order

Frontend: Vite + vanilla JS
  - Upload form with drag-drop
  - Gallery grid
  - Thumbnail renderer

Data Model:
  Photo(id, filename, path, thumbnail_path, upload_date, order_index)

This looked like what you’d sketch yourself. But it took 30 seconds instead of 20 minutes.
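
For the curious, the plan's Photo model translates to one small table. Here's a rough sketch of what that looks like as raw SQL (per the constitution's "no ORMs" rule), with better-sqlite3 as an assumed driver, not something from the plan:

// db.ts — sketch only: the plan's Photo model as a raw SQL table
import Database from "better-sqlite3"; // assumed driver

const db = new Database("gallery.db");

db.exec(`
  CREATE TABLE IF NOT EXISTS photos (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    filename       TEXT NOT NULL,
    path           TEXT NOT NULL,
    thumbnail_path TEXT,
    upload_date    TEXT NOT NULL DEFAULT (datetime('now')),
    order_index    INTEGER NOT NULL DEFAULT 0
  )
`);

export default db;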

Step 3 – The Tasks

Running /tasks gave:

  1. Setup Node + Express server
  2. Setup SQLite with Photo schema
  3. POST /photos with file upload (multer)
  4. Thumbnail generation (sharp)
  5. GET /photos with date grouping
  6. PATCH /photos/:id/order
  7. Build upload form
  8. Build gallery grid with drag-drop
  9. Wire up API calls
  10. Error handling + validation

Each task felt doable. Nothing too big or too vague.

Step 4 – What Happened

Working through the tasks one by one:

Upload API: AI suggested multer. It’s a solid library. Code worked first try. Had to tweak the file path a bit but that’s it.
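
For a sense of scale, the core of that endpoint looks something like this. Not the exact generated code, just a sketch assuming Express and multer, with the spec's 10MB and format constraints wired in:

// upload sketch: POST /photos with multer enforcing the spec's constraints
import express from "express";
import multer from "multer";

const upload = multer({
  dest: "uploads/",
  limits: { fileSize: 10 * 1024 * 1024 }, // spec: max image 10MB
  fileFilter: (_req, file, cb) => {
    // spec: JPG, PNG, WebP only
    cb(null, ["image/jpeg", "image/png", "image/webp"].includes(file.mimetype));
  },
});

const app = express();

app.post("/photos", upload.single("photo"), (req, res) => {
  if (!req.file) return res.status(400).json({ error: "No valid image uploaded" });
  // save metadata to SQLite and kick off thumbnail generation here
  res.status(201).json({ filename: req.file.filename });
});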

Thumbnails: Image processing usually has weird bugs. But the AI used sharp (good library) and the code just worked. Thumbnails came out looking good. Right size. Right quality.
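
The thumbnail code boils down to a single sharp call. Roughly this (the size and fit are guesses for illustration, not from the plan):

// thumbnail sketch with sharp
import sharp from "sharp";

async function makeThumbnail(src: string, dest: string): Promise<void> {
  await sharp(src)
    .resize(300, 300, { fit: "cover" }) // square crop; dimensions assumed
    .toFile(dest);
}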

Drag-and-drop: Vanilla JS drag-drop is annoying to code. Expected to spend an hour debugging. But the AI’s code worked. Drag a photo, it moves. Had to fix some CSS to make animations smooth, but the core logic was solid.
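
The core of that logic is just the three HTML5 drag events. A stripped-down sketch (the .photo class name is made up for illustration):

// drag-and-drop reordering sketch, vanilla TS
let dragged: HTMLElement | null = null;

document.querySelectorAll<HTMLElement>(".photo").forEach((el) => {
  el.draggable = true;
  el.addEventListener("dragstart", () => { dragged = el; });
  el.addEventListener("dragover", (e) => e.preventDefault()); // required to allow dropping
  el.addEventListener("drop", (e) => {
    e.preventDefault();
    if (dragged && dragged !== el) {
      el.before(dragged); // move in the DOM
      // then PATCH /photos/:id/order to persist the new order
    }
  });
});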

Bug: At one point the gallery stopped saving new order after dragging. Photos jumped back on refresh. Instead of debugging for hours, they went back to Spec Kit. Refined the task: “Make sure order_index updates in the database after reordering.” AI regenerated that task’s code. Bug gone.
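
The regenerated code came down to actually writing the new index back to the database. Something in this spirit, reusing the app and db handles from the sketches above:

// sketch of the regenerated PATCH handler that persists order_index
app.patch("/photos/:id/order", express.json(), (req, res) => {
  db.prepare("UPDATE photos SET order_index = ? WHERE id = ?")
    .run(req.body.order_index, req.params.id);
  res.sendStatus(204);
});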

By Sunday evening there was a working gallery. More importantly, there were docs explaining every decision. If someone else needed to work on it later, everything was already written down.

Whole thing took maybe half the usual time.

8. Real Example: To-Do List App

Second test was a to-do list app. Add, edit, delete tasks. Filter by status. Store locally. Pretty standard stuff.

Step 1 – The Spec

What they typed:

Spec a to-do list app. Add/edit/delete tasks. Each task has title, description, deadline, status. Local storage. Filter by status.

AI’s spec:

project: To-Do List App

features:
  - Add, edit, delete tasks
  - Filter by status (pending/completed)
  - Deadlines with optional reminders
  - Mark complete/incomplete

constraints:
  - SQLite for local storage
  - Responsive web UI
  - No authentication (single user)

“Optional reminders” wasn’t requested but it makes sense. Deadlines without reminders are kind of pointless.

Step 2 – The Plan

What they typed:

Plan it with React + Tailwind frontend, Node.js + SQLite backend.

AI’s plan:

Frontend: React + Tailwind
Backend: Node + Express + SQLite

APIs:
  - POST /tasks
  - GET /tasks (optional status filter)
  - PATCH /tasks/:id
  - DELETE /tasks/:id

Data Model:
  Task(id, title, description, deadline, status, created_at)

Nothing weird. Pretty standard setup.

Step 3 – The Tasks

  1. Setup React + Tailwind
  2. Setup Node backend + SQLite
  3. CRUD APIs
  4. Task list UI
  5. Filter dropdown
  6. Connect frontend to backend
  7. Deadline reminders
  8. Validation (no empty titles)

Step 4 – What Happened

Frontend setup: Tailwind config is usually annoying. Spec Kit made a clean setup instantly.

CRUD APIs: Solid boilerplate code. Only changed error messages to be nicer.

Filter by status: Expected this to be tedious. Dropdowns always are. But AI wired it up cleanly in React. Just a few lines.
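
For context, that wiring really is just a dropdown driving a filter over the list. A sketch, with the type names invented for illustration:

// status filter sketch in React
import { useState } from "react";

type Task = { id: number; title: string; status: "pending" | "completed" };
type Filter = "all" | "pending" | "completed";

function TaskList({ tasks }: { tasks: Task[] }) {
  const [filter, setFilter] = useState<Filter>("all");
  const visible = filter === "all" ? tasks : tasks.filter((t) => t.status === filter);

  return (
    <div>
      <select value={filter} onChange={(e) => setFilter(e.target.value as Filter)}>
        <option value="all">All</option>
        <option value="pending">Pending</option>
        <option value="completed">Completed</option>
      </select>
      <ul>{visible.map((t) => <li key={t.id}>{t.title}</li>)}</ul>
    </div>
  );
}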

Deadline reminders: This was interesting. AI first suggested setTimeout on the client side. That breaks when you reload the page. So they refined it: “Reminders need to persist.” AI then proposed node-cron on the backend. That actually made sense.
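
The node-cron approach looks roughly like this: a job that wakes up every minute and checks for overdue tasks. A sketch only, with the query and table names assumed:

// persistent reminders sketch with node-cron
import cron from "node-cron";
import Database from "better-sqlite3"; // assumed driver

const db = new Database("todo.db");

cron.schedule("* * * * *", () => { // every minute
  const due = db
    .prepare("SELECT id, title FROM tasks WHERE status = 'pending' AND deadline <= datetime('now')")
    .all() as { id: number; title: string }[];
  for (const task of due) {
    console.log(`Reminder: "${task.title}" is due`); // swap for a real notification
  }
});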

Bug: Saving a task with empty title crashed the app. AI forgot validation. They went back to the spec. Added “titles can’t be empty.” Regenerated the plan. Validation appeared automatically.
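
The validation that showed up was the obvious guard clause. Roughly:

// sketch of the server-side title check that appeared after the spec update
import express from "express";

const app = express();
app.use(express.json());

app.post("/tasks", (req, res) => {
  const title = (req.body.title ?? "").trim();
  if (!title) {
    return res.status(400).json({ error: "Title can't be empty" }); // spec: titles can't be empty
  }
  // insert the task into SQLite here
  res.status(201).json({ title });
});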

Two evenings later: working to-do app with filtering and reminders. More importantly, the project made sense. Come back in six months and you wouldn’t be confused. Everything’s documented.

9. Comparing Different Ways

After a few projects, here’s how things stack up:

| Way of Working | What's Good | What's Not |
| --- | --- | --- |
| Spec Kit | Clear docs, small tasks, easy to maintain | Takes time upfront, still kinda new |
| Just prompt AI | Super fast, no setup | Inconsistent, breaks easily, hard to maintain |
| Traditional agile | Everyone knows it, very thorough | Slow, not built for AI |
| Multi-agent tools | Powerful, lots of automation | Complex, hard to learn |

For quick throwaway stuff, just prompt away. But for real projects you’ll maintain, Spec Kit works better.

10. Problems and Limits

Spec Kit isn’t perfect. Real issues:

Takes time to learn. Your brain wants to jump into code. Forcing yourself to spec first feels weird for the first couple projects. Eventually it clicks.

Overkill for small things. Don’t use this for a 50-line script. Not worth it.

AI quality matters. Spec Kit is only as good as your AI model. Bad model means bad plans.

Still experimental. GitHub says this is early stage. Don’t expect tons of docs or a big community yet.

Not magic. You still need to know what you’re building. Spec Kit won’t fix vague requirements. It’ll just show you the problems faster.

11. Conclusion

Spec Kit changed how teams work with AI. But not in the expected way. It didn’t make coding faster. It made code make sense.

The win isn’t speed. The win is clarity. When someone asks “why this way?”, the answer is in the spec. When you come back to old code, you don’t have to reverse engineer your own thinking.

If AI-generated code has burned you before, try Spec Kit. If AI coding feels like productive chaos, try Spec Kit. It won’t fix everything. But it gives you a process for dealing with the mess.

For teams serious about using AI in development, it turns “prompt and hope” into something that actually feels like building software.
