Building in Public: Day 8 - Story Pack Generator Development (Part 2)

Stories Need Pictures

After getting the story generator working on Day 7, I opened a story page on my phone to see how it looked.

Just text. No images. It felt flat.

For Jekyll & Hyde alone, I’d need 18+ unique illustrations, one for each scene. If I commissioned an artist at $50-100 per illustration, that’s $900-1,800 per story pack. For 10 story packs, that’s $9,000-18,000: millions of won spent on illustrations alone.

That wasn’t realistic for a side project.

Using Gemini for Image Generation

I use Claude as my primary AI assistant, but it can’t generate images, so I checked out Google’s Gemini instead.

The pricing looked good: with the generous free tier, my usage would cost essentially nothing.

More importantly, Gemini supports reference images. You can pass example images alongside your prompt, and it tries to match that style.

This is crucial because I need consistency. If Mr. Hyde looks completely different in every scene, it breaks the story flow.

Cover Images vs Scene Images

I realized I needed two types of images:

Cover images (1:1 square) - Like book covers, with the title and author name as text on the image itself.

Scene images (4:3 landscape) - Illustrations for each story node, without any text.

The tricky part was getting Gemini to understand when to include text and when not to. AI image models are weird about text: if you tell them “no text,” they sometimes seem to try even harder to add it. So I had to be very explicit in the prompts.
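To make the cover/scene split concrete, here’s a minimal sketch of how a prompt builder could encode it. The `build_request` function, the `ImageRequest` class, and the exact prompt wording are my own assumptions for illustration, not the actual contents of generate_illust.py. The point is just that every prompt states explicitly what text, if any, belongs on the image, and carries the aspect ratio for its image type.

```python
from dataclasses import dataclass


# Hypothetical request description; the real generate_illust.py may differ.
@dataclass
class ImageRequest:
    prompt: str
    aspect_ratio: str  # "1:1" for covers, "4:3" for scenes


def build_request(kind: str, description: str, title: str = "", author: str = "") -> ImageRequest:
    """Build an explicit prompt for a cover or scene image (illustrative sketch)."""
    if kind == "cover":
        if not title or not author:
            raise ValueError("cover images need a title and an author")
        prompt = (
            f"Book cover illustration: {description}\n"
            f'Render the title "{title}" and the author name "{author}" as text on the image.\n'
            "Do not add any other text."
        )
        return ImageRequest(prompt, "1:1")
    if kind == "scene":
        prompt = (
            f"Story scene illustration: {description}\n"
            "The image must contain NO TEXT: no letters, numbers, captions, signs, or symbols."
        )
        return ImageRequest(prompt, "4:3")
    raise ValueError(f"unknown image kind: {kind!r}")
```

For example, `build_request("scene", "A foggy London street at night")` yields a 4:3 request whose prompt spells out the no-text rule, while a cover request must name the title and author it should render.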

Building the Generator Scripts

I built two Python scripts with Claude’s help:

generate_illust.py generates a single image (cover or scene).

generate_all_nodes.py reads the story JSON and generates images for all scenes automatically.

The second script was important because manually running commands for 18+ images would be tedious.
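Here’s a rough sketch of what the batch loop boils down to, assuming a story JSON shaped like `{"nodes": [{"id": ..., "description": ...}]}`. That schema, the function signature, and the injected `generate` callable (prompt in, PNG bytes out) are all hypothetical; the real generate_all_nodes.py and story format may look different. Skipping files that already exist makes it cheap to rerun after a few images fail.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical signature: takes a prompt, returns PNG bytes.
GenerateFn = Callable[[str], bytes]


def generate_all_nodes(story_path: Path, out_dir: Path, generate: GenerateFn) -> list[Path]:
    """Generate one scene image per story node, skipping images that already exist.

    Assumes a story JSON like {"nodes": [{"id": ..., "description": ...}, ...]};
    the real story format may differ.
    """
    story = json.loads(story_path.read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for node in story["nodes"]:
        target = out_dir / f"{node['id']}.png"
        if target.exists():
            continue  # already generated on an earlier run
        prompt = f"Story scene illustration: {node['description']}\nNO TEXT in the image."
        target.write_bytes(generate(prompt))
        written.append(target)
    return written
```

Passing `generate` in as a parameter keeps the loop testable with a fake generator, and lets the actual Gemini call live in one place.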

# This generates all 18 scene images automatically
python generate_all_nodes.py jekyll-and-hyde

It took about 5-7 minutes to generate all images for Jekyll & Hyde.

Reference Images for Consistency

The key to making this work was creating reference images first:

Style guide: One hero image showing the art style and color palette I want
Character references: Individual images for Dr. Jekyll, Mr. Hyde, and Mr. Utterson

I spent about 2 hours creating these reference images. Then the generator automatically uses them when creating scene images.

If a scene mentions “Dr. Jekyll,” the script loads DrJekyll.png as a reference and tells Gemini to match that character’s appearance.
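The lookup itself can be very simple. Below is a minimal sketch assuming a plain name-to-file table; only DrJekyll.png comes from my setup, and `MrHyde.png`, `MrUtterson.png`, `style_guide.png`, and both functions are hypothetical names for illustration. The style guide always goes first, followed by one reference per character the scene text mentions, plus a line of prompt text explaining what each attached image is for.

```python
from pathlib import Path

# Hypothetical mapping from character names to reference files
# (only DrJekyll.png is from the actual project).
CHARACTER_REFS = {
    "Dr. Jekyll": "DrJekyll.png",
    "Mr. Hyde": "MrHyde.png",
    "Mr. Utterson": "MrUtterson.png",
}
STYLE_GUIDE = "style_guide.png"  # hypothetical file name


def pick_references(scene_text: str, ref_dir: Path) -> list[Path]:
    """Return the style guide plus a reference image for each character the scene mentions."""
    refs = [ref_dir / STYLE_GUIDE]
    for name, filename in CHARACTER_REFS.items():
        if name in scene_text:  # plain substring match on the full name
            refs.append(ref_dir / filename)
    return refs


def reference_instructions(refs: list[Path]) -> str:
    """Prompt text telling the model what each attached image is for."""
    lines = [f"Match the art style and color palette of {refs[0].name}."]
    for ref in refs[1:]:
        lines.append(f"Match the character appearance shown in {ref.name}.")
    return "\n".join(lines)
```

One caveat with exact substring matching: a scene that only says “Jekyll” or “Hyde” won’t match “Dr. Jekyll” or “Mr. Hyde,” so a list of aliases per character would be a natural next step.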

This solved the consistency problem. Now Dr. Jekyll looks the same in every scene, and the overall art style is consistent throughout the story.

Some Issues I Ran Into

Even with “NO TEXT” repeated multiple times in the prompt, occasionally Gemini would add mysterious symbols or text-like marks in the background. Maybe 5% of images needed regeneration.

But regenerating is fast (30 seconds), so it’s not a big problem.
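If you wanted to automate the “just try again” step, a small retry wrapper would do it. This is a sketch under assumptions: `generate_with_retry` is a hypothetical helper, and `looks_ok` is a placeholder for whatever check you trust. Reliably detecting stray text is the hard part, so that hook could be an OCR check or simply a human glancing at the image.

```python
from typing import Callable, Optional


def generate_with_retry(
    generate: Callable[[str], bytes],
    looks_ok: Callable[[bytes], bool],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Regenerate until looks_ok accepts the image or attempts run out.

    looks_ok is a hypothetical hook (OCR check, manual review, ...).
    Returns None if every attempt was rejected.
    """
    for _ in range(max_attempts):
        image = generate(prompt)
        if looks_ok(image):
            return image
    return None
```

Capping the attempts keeps a stubborn prompt from looping forever; a `None` result is a signal to rewrite that scene’s prompt rather than keep rolling the dice.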

Also, I initially generated images one by one, which was annoying. After the fifth manual command, I realized I needed the batch script. It took an extra hour to build but saved me much more time later.

The Results

Now when I open a What If Classics story on my phone, there are illustrations on every page. The art style is consistent, characters look the same throughout, and it feels like an actual polished product instead of just a text prototype.

And the cost? $0 thanks to Gemini’s free tier.

Next, I need to work on the personality reveal screen. That’s the moment users will want to share on social media, so it needs to look good.

To be continued…