20 Days: What We Built
Time for a comprehensive retrospective. Here’s what happened in 20 days of building What If Classics:
Shipped:
- 3 complete story packs (Jekyll & Hyde, Frankenstein, Pride & Prejudice)
- Full bilingual support (English + Korean with native-quality prose)
- 267 pages generated (stories + blog + library)
- 109 optimized images (70-85% size reduction)
- 23-minute automated generation pipeline
- 20 building-in-public blog posts
Compared to the original 100-day plan:
- MVP planned for Day 30 → Delivered by Day 14 (16 days ahead)
- 1 story pack expected → Built 3 (3× the target)
- Korean support planned for Days 50-58 → Done by Day 16
- Image optimization planned for Days 66-72 → Done by Day 20
Speed: Everything measured in “days ahead of schedule.”
Key Learnings (Days 1-20)
Building at this pace taught us some valuable lessons:
1. AI Prompt Engineering. “Please make 8 endings” → got 3 endings. “MINIMUM 12 endings (NON-NEGOTIABLE)” → got 12. Lesson: AI needs explicit constraints, not polite suggestions.
2. Native Content > Translation. Korean API translation cost money and produced mechanical text; native generation via Claude Code cost $0 and produced natural prose. Lesson: for cultural content, invest in native generation.
3. Developer Blindness. We spent weeks building and couldn’t see obvious UX issues (unreadable code blocks, missing buttons). Lesson: you can’t see your own UX problems.
4. Windows UTF-8 Hell. Korean Windows defaults to cp949 encoding; we fixed this bug across 5+ Python scripts. Lesson: always specify encoding explicitly on Windows.
5. Performance Isn’t Optional. We served 215MB of unoptimized PNGs until Day 20; Astro’s Image component cut that to ~50MB with 3 lines of code. Lesson: implement image optimization from day one.
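The cp949 fix is worth spelling out, because it bites every Python script that touches files on Korean Windows. A minimal sketch of the pattern, using only the standard library (helper names and file contents are illustrative, not our actual code):

```python
import sys
from pathlib import Path

# On Korean Windows, open() defaults to cp949 and chokes on UTF-8 story
# files. Passing encoding="utf-8" explicitly makes scripts behave the
# same on every platform.
def read_story(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

def write_story(path: str, text: str) -> None:
    Path(path).write_text(text, encoding="utf-8")

# stdout can also default to cp949, garbling printed Korean text;
# reconfigure it when possible (Python 3.7+).
if hasattr(sys.stdout, "reconfigure") and \
        (sys.stdout.encoding or "").lower() not in ("utf-8", "utf8"):
    sys.stdout.reconfigure(encoding="utf-8")
```

The same two-line habit (`encoding="utf-8"` on every `open`, read, and write) is what we repeated across the 5+ scripts.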
Then I Ran the Numbers
Everything looked great on the dashboard. But during the retrospective, I tested the product with a critical eye.
The One-Choice Personality Test
Our core feature is MBTI personality analysis based on story choices. It’s supposed to be the viral hook—the thing people share on social media.
I ran an analysis script on our Frankenstein story:
Choices per path:
- Min: 1 choice
- Max: 6 choices
- Avg: 3.7 choices
Problem: 9 of 12 paths have fewer than 5 choices.
One choice.
Imagine telling someone “Based on this one decision, you’re an INTJ.” They’d laugh at you. I would too.
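The analysis itself is simple: walk the story graph and count the real branch points along every root-to-ending path. A minimal sketch under an assumed data shape (node names and the dict format are invented for illustration; forced single-option transitions don't count as choices):

```python
from statistics import mean

# Hypothetical story format: each node maps to its possible next nodes;
# an empty list marks an ending.
story = {
    "start": ["lab", "letter"],
    "lab": ["create", "refuse"],
    "letter": ["ending_a"],                # forced transition, not a choice
    "create": ["ending_b", "ending_c"],
    "refuse": ["ending_d"],                # forced transition, not a choice
    "ending_a": [], "ending_b": [], "ending_c": [], "ending_d": [],
}

def paths_to_endings(node, depth=0):
    """Yield the number of choices made along each root-to-ending path."""
    targets = story[node]
    if not targets:                        # reached an ending
        yield depth
        return
    is_choice = len(targets) > 1           # only real forks count
    for nxt in targets:
        yield from paths_to_endings(nxt, depth + (1 if is_choice else 0))

counts = list(paths_to_endings("start"))
print(f"Min: {min(counts)}  Max: {max(counts)}  Avg: {mean(counts):.1f}")
print(f"Short paths (<5 choices): {sum(c < 5 for c in counts)} of {len(counts)}")
```

Even in this toy graph, one ending sits behind a single choice, which is exactly the failure mode the real script surfaced.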
Why This Matters
MBTI has four dimensions (E/I, S/N, T/F, J/P). To feel remotely credible, you need at least 2-3 measurements per dimension. That’s 8-12 choices minimum.
We have 1-6 choices per path. The math doesn’t just fall short—it’s embarrassing.
75% of our story paths fail the minimum credibility threshold.
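To make the floor concrete: if each choice is tagged with the MBTI axis it measures, per-axis coverage is trivial to compute, and a short path can't cover all four axes twice. A toy sketch (the axis-tagging scheme here is hypothetical, not our actual data model):

```python
from collections import Counter

AXES = ["EI", "SN", "TF", "JP"]  # the four MBTI dimensions

# A 4-choice path covers each axis at most once:
path_axis_tags = ["EI", "TF", "SN", "JP"]

coverage = Counter(path_axis_tags)
per_axis = [coverage[a] for a in AXES]
credible = all(n >= 2 for n in per_axis)   # need 2-3 signals per axis
print(f"signals per axis: {per_axis}, credible: {credible}")
```

Four axes times two to three signals each is where the 8-12 choice minimum comes from; anything shorter leaves at least one dimension measured once or not at all.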
What Went Wrong
We optimized for the wrong metrics:
- ✅ “12+ endings per story” - Achieved!
- ✅ “Non-convergent branching” - Achieved!
- ✅ “3-5 minute playback time” - Achieved!
- ❌ “Enough choices for credible personality analysis” - Completely missed
Speed masked quality. We built three stories in the time planned for one, but none of them actually work for their intended purpose.
The Fix
Minimum 10 choices per path. Non-negotiable.
This means:
- Rewrite the story generator with strict validation
- Regenerate all three existing stories
- Extend playback time to 5-7 minutes
- Test with real users before launching
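The "strict validation" step can be a hard gate in the pipeline: reject any generated story whose shortest path falls below the floor, and regenerate instead of shipping. A sketch of that gate (function and constant names are hypothetical; the per-path choice counts are assumed to come from the analysis step upstream):

```python
MIN_CHOICES_PER_PATH = 10  # the new non-negotiable floor
MIN_ENDINGS = 12           # existing target, kept as a second check

def validate_story(path_choice_counts: list[int]) -> list[str]:
    """Return a list of violations; an empty list means the story passes."""
    errors = []
    short = [c for c in path_choice_counts if c < MIN_CHOICES_PER_PATH]
    if short:
        errors.append(
            f"{len(short)} of {len(path_choice_counts)} paths have fewer "
            f"than {MIN_CHOICES_PER_PATH} choices (shortest: {min(short)})"
        )
    if len(path_choice_counts) < MIN_ENDINGS:
        errors.append(
            f"only {len(path_choice_counts)} endings (need {MIN_ENDINGS}+)"
        )
    return errors

# The generator loops: generate → validate → regenerate on failure,
# instead of trusting the first output.
print(validate_story([1, 3, 4, 6] * 3) or "OK")
```

Had this gate existed on Day 14, all three story packs would have been rejected at generation time rather than discovered broken on Day 20.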
Timeline: 11 days to fix everything properly.
Marketing delay: from Day 21 to Day 33.
Why I’m Sharing This
Building in public means sharing the failures, not just the wins.
What I learned:
- Speed without quality validation is worthless
- Metrics can hide fundamental problems
- User testing reveals what dashboards don’t show
- Better to catch this now than after marketing
What building in public prevented: I was literally one day away from setting up social media accounts and starting to promote a broken product. The retrospective exercise forced me to actually look at the data.
Eleven days fixing quality now saves months of marketing a product nobody would take seriously.
What 20 Days Actually Proved
Technical capabilities (validated ✅):
- Fast content generation (23 min per story)
- Automated pipeline with quality control
- Bilingual native-quality experiences
- Performance optimization at scale
Product quality (failed ❌):
- Core MBTI feature doesn’t work properly
- Metrics showed success, reality showed failure
- Speed masked fundamental quality issues
The Revised Plan
- Days 1-20: Built fast infrastructure
- Days 21-32: Fix core product quality (11-day delay)
- Days 33+: Market with confidence
Better to launch late with quality than early with a product people would mock.
Building in public means showing the wins AND the failures. This is both.