Reducing Story Generation from 20 Minutes to 2 with Temporal.io
How durable workflow orchestration and parallel processing cut AI story generation latency by 10x — a practical guide for Python developers.
When building an AI-powered personalized storybook platform, I faced a critical performance challenge: generating a single story took 20 minutes. After implementing Temporal.io for workflow orchestration with parallel processing, I reduced this to just 2 minutes—a 10x improvement. Here’s how I did it.
The Problem: Sequential Processing Bottleneck
The initial story generation pipeline looked like this:
async def generate_story(prompt: str) -> Story:
    # Step 1: Generate story text (5 minutes)
    story_text = await openai_client.generate(prompt)

    # Step 2: Generate chapter images (5 min x 5 chapters = 25 minutes)
    images = []
    for chapter in story_text.chapters:
        image = await image_client.generate(chapter.description)
        images.append(image)

    # Step 3: Generate narration (5 minutes)
    narration = await elevenlabs_client.generate(story_text)

    # Step 4: Render final PDF (5 minutes)
    pdf = await render_pdf(story_text, images, narration)
    return pdf
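Total time: ~40 minutes if you add up the per-step estimates in the comments for a 5-chapter story. Those estimates are rough; a real story took about 20 minutes end to end.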
This sequential approach was a bottleneck. Each step waited for the previous one to complete, even though many operations could run in parallel.
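To see why the image loop dominates, here is a minimal, self-contained sketch that swaps the real API calls for `asyncio.sleep`. The function names and the scaled-down durations are made up for illustration and are not taken from the platform's code.

```python
import asyncio
import time


async def fake_image_call(chapter: int, seconds: float = 0.05) -> str:
    """Stand-in for a slow image API call (hypothetical, scaled-down duration)."""
    await asyncio.sleep(seconds)
    return f"image-{chapter}"


async def images_sequential(chapters: int) -> list[str]:
    # One call at a time: total time grows with the number of chapters.
    return [await fake_image_call(i) for i in range(chapters)]


async def images_parallel(chapters: int) -> list[str]:
    # All calls in flight at once: total time is roughly the slowest call.
    return list(await asyncio.gather(*(fake_image_call(i) for i in range(chapters))))


async def timed(coro) -> tuple[list[str], float]:
    """Await a coroutine and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start
```

Running `asyncio.run(timed(images_sequential(5)))` and then the parallel variant should show the sequential version taking roughly five times as long, which is the same shape as the real pipeline.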
Enter Temporal.io
Temporal.io is a durable execution platform that makes it easy to:
- Orchestrate complex workflows
- Handle failures and retries automatically
- Run activities in parallel
- Maintain workflow state across failures
Solution: Parallelized Workflow
Here’s how I restructured the workflow with Temporal:
Define Activities
from datetime import timedelta
from temporalio import activity, workflow

# openai_client, elevenlabs_client, pdf_generator and parse_chapters are
# assumed to be set up elsewhere in the application.

@activity.defn
async def generate_story_text(prompt: str) -> dict:
    """Generate the story text with chapters"""
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return parse_chapters(response.choices[0].message.content)

@activity.defn
async def generate_chapter_image(chapter: dict) -> str:
    """Generate image for a single chapter"""
    response = await openai_client.images.generate(
        model="dall-e-3",
        prompt=chapter["image_prompt"]
    )
    return response.data[0].url

@activity.defn
async def generate_narration(text: str) -> bytes:
    """Generate audio narration"""
    # Temporal caps payload sizes, so large audio or PDF bytes are often
    # stored externally and passed between activities by URL instead.
    response = await elevenlabs_client.generate(text=text)
    return response.content

@activity.defn
async def render_story_pdf(story: dict, images: list, audio: bytes) -> bytes:
    """Render final PDF"""
    return await pdf_generator.render(story, images, audio)
Define the Workflow
import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.workflow import execute_activity

@workflow.defn
class StoryGenerationWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> bytes:
        # Step 1: Generate story text
        story_data = await execute_activity(
            generate_story_text,
            prompt,
            start_to_close_timeout=timedelta(minutes=10)
        )

        # Step 2: Generate all chapter images IN PARALLEL
        image_tasks = [
            execute_activity(
                generate_chapter_image,
                chapter,
                start_to_close_timeout=timedelta(minutes=5)
            )
            for chapter in story_data["chapters"]
        ]
        images = await asyncio.gather(*image_tasks)

        # Step 3: Generate narration
        narration = await execute_activity(
            generate_narration,
            story_data["full_text"],
            start_to_close_timeout=timedelta(minutes=5)
        )

        # Step 4: Render final PDF
        # An activity with several parameters takes them through args=[...]
        pdf = await execute_activity(
            render_story_pdf,
            args=[story_data, images, narration],
            start_to_close_timeout=timedelta(minutes=5)
        )
        return pdf
Performance Breakdown
With the parallelized approach:
Story Text Generation: 2 minutes (sequential)
├── Chapter Images: 2 minutes (parallel, not 10!)
├── Narration: 1 minute (can overlap with images)
└── PDF Rendering: 1 minute (final step)
Total: ~6 minutes per story
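The arithmetic behind that breakdown can be checked in a few lines of Python. This is only a sketch using the per-step figures quoted above; `pipeline_minutes` is a hypothetical helper, and it assumes five chapters and that the parallel images all finish at about the same time.

```python
def pipeline_minutes(text: float, image: float, chapters: int,
                     narration: float, pdf: float,
                     parallel_images: bool) -> float:
    """Total wall-clock minutes when the stages run one after another.

    With parallel_images=True the image stage costs one image's time;
    otherwise it costs one image's time per chapter.
    """
    image_stage = image if parallel_images else image * chapters
    return text + image_stage + narration + pdf


# Figures from the breakdown above (minutes), assuming 5 chapters.
images_one_by_one = pipeline_minutes(2, 2, 5, 1, 1, parallel_images=False)
images_together = pipeline_minutes(2, 2, 5, 1, 1, parallel_images=True)
print(images_one_by_one, images_together)  # 14 6
```

The only term that changes is the image stage, which drops from 10 minutes to 2 once the five requests run concurrently.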
But we can do even better by starting narration at the same time as the chapter images, instead of waiting for the images to finish:
@workflow.defn
class OptimizedStoryGenerationWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> bytes:
        # Story text has to come first: images and narration depend on it
        story_data = await execute_activity(
            generate_story_text,
            prompt,
            start_to_close_timeout=timedelta(minutes=10)
        )

        # Fire off all remaining activities simultaneously
        # (the "..." stands for the timeout options, elided for brevity)
        results = await asyncio.gather(
            # All chapter images in parallel
            *[execute_activity(generate_chapter_image, ch, ...)
              for ch in story_data["chapters"]],
            # Narration in parallel with images
            execute_activity(generate_narration, story_data["full_text"], ...),
            return_exceptions=True
        )
        # return_exceptions=True hands failures back as values, so re-raise
        # them here instead of passing an exception object to the renderer
        for result in results:
            if isinstance(result, BaseException):
                raise result

        images = results[:-1]  # All except narration
        narration = results[-1]
        pdf = await execute_activity(render_story_pdf, ...)
        return pdf
Final result: ~2 minutes per story
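Slicing `results` works, but it depends on narration being the last argument to `gather`. One alternative, sketched here with plain `asyncio` and stand-in coroutines (`fake_image` and `fake_narration` are hypothetical), is to nest a second `gather` for the images so the two kinds of result come back already separated.

```python
import asyncio


async def fake_image(chapter: str) -> str:
    """Stand-in for the image activity."""
    await asyncio.sleep(0.01)
    return f"{chapter}.png"


async def fake_narration(text: str) -> bytes:
    """Stand-in for the narration activity."""
    await asyncio.sleep(0.01)
    return text.encode()


async def images_and_narration(chapters: list[str], text: str) -> tuple[list[str], bytes]:
    # The inner gather groups the images; the outer gather runs that group
    # concurrently with narration and returns exactly two results.
    images, narration = await asyncio.gather(
        asyncio.gather(*(fake_image(c) for c in chapters)),
        fake_narration(text),
    )
    return images, narration
```

Because this version does not pass `return_exceptions=True`, a failure in any call propagates out of the `await` instead of showing up as a value in the results.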
Setting Up Temporal with Python
Installation
pip install temporalio
Worker Configuration
import asyncio

from temporalio.worker import Worker
from temporalio.client import Client

async def main():
    # Connect to Temporal server
    client = await Client.connect("localhost:7233")

    # Run the worker
    worker = Worker(
        client,
        task_queue="story-generation-queue",
        workflows=[StoryGenerationWorkflow],
        activities=[
            generate_story_text,
            generate_chapter_image,
            generate_narration,
            render_story_pdf
        ]
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
Starting a Workflow from FastAPI
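import uuid
from datetime import timedelta

from fastapi import FastAPI, Response
from temporalio.client import Client

app = FastAPI()

@app.post("/generate-story")
async def create_story(prompt: str):
    # Connecting per request keeps the example short; a long-lived shared
    # client is the usual choice in production.
    client = await Client.connect("localhost:7233")
    # execute_workflow waits for the result, so this request stays open
    # until the story is finished.
    pdf_bytes = await client.execute_workflow(
        StoryGenerationWorkflow.run,
        prompt,
        id=f"story-{uuid.uuid4()}",
        task_queue="story-generation-queue",
        execution_timeout=timedelta(minutes=15)
    )
    # The workflow returns the PDF bytes, not a URL, so send them as a file
    return Response(content=pdf_bytes, media_type="application/pdf")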
Key Benefits of Temporal
1. Automatic Retries
Activities retry automatically on failure:
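@activity.defn
async def generate_chapter_image(chapter: dict) -> str:
    # If this raises (for example on an API error), Temporal schedules
    # the activity again according to its retry policy
    activity.heartbeat("requesting image")  # Report liveness before the slow call
    response = await image_api.generate(chapter)
    return response.url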
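How long a string of retries takes depends on the activity's retry policy. As a rough sketch, the wait before each retry grows geometrically from an initial interval and is capped at a maximum interval. The helper below is hypothetical and only models that arithmetic; its defaults mirror what I understand Temporal's default policy to be (1 second initial interval, coefficient 2.0, cap at 100 times the initial interval), so check them against the SDK version you run.

```python
from datetime import timedelta


def retry_delays(attempts: int,
                 initial: timedelta = timedelta(seconds=1),
                 coefficient: float = 2.0,
                 maximum: timedelta = timedelta(seconds=100)) -> list[timedelta]:
    """Delay before each retry for an activity that fails `attempts` times in a row."""
    delays = []
    for retry in range(attempts):
        delay = initial * (coefficient ** retry)
        delays.append(min(delay, maximum))  # Never wait longer than the cap.
    return delays


# First five waits with the assumed defaults, in seconds.
print([d.total_seconds() for d in retry_delays(5)])  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In the Python SDK these settings live on `temporalio.common.RetryPolicy`, which is passed as `retry_policy=` to `execute_activity`.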
2. Durable Execution
Workflow state persists even if your server crashes:
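# Temporal records each completed step in the workflow's event history
@workflow.defn
class ResumableWorkflow:
    @workflow.run
    async def run(self, prompt: str):
        # step1..step3 stand in for execute_activity calls.
        # If the worker crashes during step 3, another worker replays the
        # history, reuses the recorded results of steps 1 and 2, and
        # continues from step 3.
        step1_result = await step1()
        step2_result = await step2()
        step3_result = await step3()  # Execution resumes here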
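The mechanism is easier to picture with a toy model. The sketch below is not Temporal's implementation; it is a hypothetical in-memory stand-in in which a `history` dict plays the role of the event history, and steps whose results are already recorded are skipped on a rerun.

```python
from typing import Callable


def run_with_history(steps: dict[str, Callable[[], str]],
                     history: dict[str, str]) -> dict[str, str]:
    """Run steps in order, reusing any result already recorded in history."""
    for name, step in steps.items():
        if name in history:
            continue  # Completed before the crash: reuse the recorded result.
        history[name] = step()  # Record the result as soon as the step finishes.
    return history


calls: list[str] = []
crash_once = {"pending": True}


def make_step(name: str, fail_first: bool = False) -> Callable[[], str]:
    def step() -> str:
        calls.append(name)
        if fail_first and crash_once["pending"]:
            crash_once["pending"] = False
            raise RuntimeError("simulated worker crash")
        return f"{name}-done"
    return step


steps = {"step1": make_step("step1"),
         "step2": make_step("step2"),
         "step3": make_step("step3", fail_first=True)}
history: dict[str, str] = {}

try:
    run_with_history(steps, history)   # First run crashes inside step3.
except RuntimeError:
    pass
run_with_history(steps, history)       # Rerun picks up at step3.
print(calls)  # ['step1', 'step2', 'step3', 'step3']
```

Steps 1 and 2 run only once even though the pipeline was started twice, which is the property that makes a crash during a long story generation cheap to recover from.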
3. Visibility Dashboard
Temporal provides a web UI to monitor workflow executions, see history, and debug issues.
Production Tips
- Set appropriate timeouts: Each activity should have a realistic timeout
- Use heartbeats: For long-running activities, send heartbeats and set a heartbeat timeout so a stalled or crashed worker is detected quickly
- Handle signals: Allow workflows to be cancelled or updated mid-execution
- Monitor resources: Parallel activities can spike API usage—implement rate limiting
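For the last tip, one simple option is to cap in-flight API calls inside the activity code with an `asyncio.Semaphore`. This is a minimal sketch with a stand-in API call; `limited_call`, `run_batch` and the limit of 3 are assumptions for illustration. Temporal workers also expose their own concurrency settings, which may be a better fit depending on the deployment.

```python
import asyncio


async def limited_call(sem: asyncio.Semaphore, stats: dict[str, int], item: int) -> int:
    async with sem:  # At most `limit` calls pass this point at once.
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # Stand-in for the real API request.
        stats["in_flight"] -= 1
        return item * 2


async def run_batch(items: list[int], limit: int = 3) -> tuple[list[int], int]:
    """Process all items concurrently, returning results and peak concurrency."""
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(limited_call(sem, stats, i) for i in items))
    return list(results), stats["peak"]
```

Calling `asyncio.run(run_batch(list(range(10))))` returns all ten results in order while the recorded peak never exceeds the limit.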
Results
After implementing Temporal.io:
- Story generation: 20 mins → 2 mins (10x faster)
- Throughput: Capable of hundreds of stories per minute, given enough workers and upstream API quota
- Reliability: Automatic retries handle API failures gracefully
- Scalability: Easy to scale workers horizontally
Conclusion
Temporal.io transformed our AI story generation from a slow, sequential process into a fast, parallel workflow. The key insight was identifying independent operations and running them concurrently. For any AI pipeline with multiple steps, Temporal is a game-changer.
Want to discuss workflow orchestration or share your own experiences? I’d love to connect!
AI engineer & full-stack developer building LLM products, automation, and RAG pipelines.