Type something to search...
Reducing Story Generation from 20 mins to 2 mins with Temporal.io

Reducing Story Generation from 20 mins to 2 mins with Temporal.io

When building an AI-powered personalized storybook platform, I faced a critical performance challenge: generating a single story took 20 minutes. After implementing Temporal.io for workflow orchestration with parallel processing, I reduced this to just 2 minutes—a 10x improvement. Here’s how I did it.

The Problem: Sequential Processing Bottleneck

The initial story generation pipeline looked like this:

async def generate_story(prompt: str) -> Story:
    # Step 1: Generate story text (5 minutes)
    story_text = await openai_client.generate(prompt)

    # Step 2: Generate chapter images (5 min x 5 chapters = 25 minutes)
    images = []
    for chapter in story_text.chapters:
        image = await image_client.generate(chapter.description)
        images.append(image)

    # Step 3: Generate narration (5 minutes)
    narration = await elevenlabs_client.generate(story_text)

    # Step 4: Render final PDF (5 minutes)
    pdf = await render_pdf(story_text, images, narration)

    return pdf

Total time: ~40 minutes (with 5 chapters)

This sequential approach was a bottleneck. Each step waited for the previous one to complete, even though many operations could run in parallel.

Enter Temporal.io

Temporal.io is a durable execution platform that makes it easy to:

  • Orchestrate complex workflows
  • Handle failures and retries automatically
  • Run activities in parallel
  • Maintain workflow state across failures

Solution: Parallelized Workflow

Here’s how I restructured the workflow with Temporal:

Define Activities

from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def generate_story_text(prompt: str) -> dict:
    """Generate the story text with chapters"""
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return parse_chapters(response.choices[0].message.content)

@activity.defn
async def generate_chapter_image(chapter: dict) -> str:
    """Generate image for a single chapter"""
    response = await openai_client.images.generate(
        model="dall-e-3",
        prompt=chapter["image_prompt"]
    )
    return response.data[0].url

@activity.defn
async def generate_narration(text: str) -> bytes:
    """Generate audio narration"""
    response = await elevenlabs_client.generate(text=text)
    return response.content

@activity.defn
async def render_story_pdf(story: dict, images: list, audio: bytes) -> bytes:
    """Render final PDF"""
    return await pdf_generator.render(story, images, audio)

Define the Workflow

from temporalio import workflow
from temporalio.workflow import execute_activity

@workflow.defn
class StoryGenerationWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> bytes:
        # Step 1: Generate story text
        story_data = await execute_activity(
            generate_story_text,
            prompt,
            start_to_close_timeout=timedelta(minutes=10)
        )

        # Step 2: Generate all chapter images IN PARALLEL
        image_tasks = [
            execute_activity(
                generate_chapter_image,
                chapter,
                start_to_close_timeout=timedelta(minutes=5)
            )
            for chapter in story_data["chapters"]
        ]
        images = await asyncio.gather(*image_tasks)

        # Step 3: Generate narration
        narration = await execute_activity(
            generate_narration,
            story_data["full_text"],
            start_to_close_timeout=timedelta(minutes=5)
        )

        # Step 4: Render final PDF
        pdf = await execute_activity(
            render_story_pdf,
            story_data,
            images,
            narration,
            start_to_close_timeout=timedelta(minutes=5)
        )

        return pdf

Performance Breakdown

With the parallelized approach:

Story Text Generation:     2 minutes  (sequential)
├── Chapter Images:        2 minutes  (parallel, not 10!)
├── Narration:             1 minute   (can overlap with images)
└── PDF Rendering:         1 minute   (final step)

Total: ~6 minutes per story

But we can do even better by splitting chapter generation:

@workflow.defn
class OptimizedStoryGenerationWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> bytes:
        # Start text generation immediately
        story_task = execute_activity(
            generate_story_text,
            prompt,
            start_to_close_timeout=timedelta(minutes=10)
        )

        # Wait for story text
        story_data = await story_task

        # Fire off all remaining activities simultaneously
        results = await asyncio.gather(
            # All chapter images in parallel
            *[execute_activity(generate_chapter_image, ch, ...)
              for ch in story_data["chapters"]],
            # Narration in parallel with images
            execute_activity(generate_narration, story_data["full_text"], ...),
            return_exceptions=True
        )

        images = results[:-1]  # All except narration
        narration = results[-1]

        pdf = await execute_activity(render_story_pdf, ...)
        return pdf

Final result: ~2 minutes per story

Setting Up Temporal with Python

Installation

pip install temporalio

Worker Configuration

from temporalio.worker import Worker
from temporalio.client import Client

async def main():
    # Connect to Temporal server
    client = await Client.connect("localhost:7233")

    # Run the worker
    worker = Worker(
        client,
        task_queue="story-generation-queue",
        workflows=[StoryGenerationWorkflow],
        activities=[
            generate_story_text,
            generate_chapter_image,
            generate_narration,
            render_story_pdf
        ]
    )

    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())

Starting a Workflow from FastAPI

from fastapi import FastAPI
from temporalio.client import Client

app = FastAPI()

@app.post("/generate-story")
async def create_story(prompt: str):
    client = await Client.connect("localhost:7233")

    result = await client.execute_workflow(
        StoryGenerationWorkflow.run,
        prompt,
        id=f"story-{uuid.uuid4()}",
        task_queue="story-generation-queue",
        execution_timeout=timedelta(minutes=15)
    )

    return {"status": "completed", "pdf_url": result}

Key Benefits of Temporal

1. Automatic Retries

Activities retry automatically on failure:

@activity.defn
async def generate_chapter_image(chapter: dict) -> str:
    # Will retry on API failures
    with activity.heartbeater():
        response = await image_api.generate(chapter)
        return response.url

2. Durable Execution

Workflow state persists even if your server crashes:

# Temporal remembers where it left off
@workflow.defn
class ResumableWorkflow:
    @workflow.run
    async def run(self, prompt: str):
        # If this crashes at step 3, it resumes from there
        step1_result = await step1()
        step2_result = await step2()
        step3_result = await step3()  # Will resume from here

3. Visibility Dashboard

Temporal provides a web UI to monitor workflow executions, see history, and debug issues.

Production Tips

  1. Set appropriate timeouts: Each activity should have a realistic timeout
  2. Use heartbeats: For long-running activities, send heartbeats to avoid timeout
  3. Handle signals: Allow workflows to be cancelled or updated mid-execution
  4. Monitor resources: Parallel activities can spike API usage—implement rate limiting

Results

After implementing Temporal.io:

  • Story generation: 20 mins → 2 mins (10x faster)
  • Throughput: Capable of hundreds of stories per minute
  • Reliability: Automatic retries handle API failures gracefully
  • Scalability: Easy to scale workers horizontally

Conclusion

Temporal.io transformed our AI story generation from a slow, sequential process into a fast, parallel workflow. The key insight was identifying independent operations and running them concurrently. For any AI pipeline with multiple steps, Temporal is a game-changer.

Want to discuss workflow orchestration or share your own experiences? I’d love to connect!

Related Posts

Building RAG Pipelines with pgvector and OpenAI: A Practical Guide

Building RAG Pipelines with pgvector and OpenAI: A Practical Guide

Retrieval-Augmented Generation (RAG) has become the go-to architecture for building AI applications that need to reason over custom data. After implementing RAG pipelines that achieved 99% accuracy in

Read More
Integrating WhatsApp Business API for Sales Automation

Integrating WhatsApp Business API for Sales Automation

In traditional sales workflows, creating quotations can be a tedious process involving manual data entry, switching between applications, and repetitive copying. By integrating WhatsApp Business API d

Read More