AI Task Testing Cycle

A practical system for tracking AI progress: every 3-6 months, test current AI tools against real business tasks to determine which ones are ready for automation.

Core Evaluation Framework

  • Tests AI against 10 specific business tasks currently done by humans
  • Evaluates progress on a spectrum from 0/10 to 10/10 tasks automated
  • Focuses on real business impact rather than abstract benchmarks
  • Considers actual cost savings and human resource implications

Task Automation Assessment Levels

  • Level 1: Cannot do the task at all
  • Level 2: Can do it but not well enough for practical use
  • Level 3: Can do the task with human oversight or assistance
  • Level 4: Full automation - no human needed
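
The levels and the 0/10 to 10/10 score can be combined into a simple scorecard. The following is a minimal sketch, not part of the original framework: the names `Level` and `automation_score`, the task names, and the example results are all hypothetical, and it assumes that only Level 4 counts as "automated" toward the score.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four assessment levels from the list above."""
    CANNOT_DO = 1
    NOT_GOOD_ENOUGH = 2
    NEEDS_OVERSIGHT = 3
    FULL_AUTOMATION = 4


def automation_score(results):
    """Return (automated, total).

    Assumption: only Level 4 tasks count as automated.
    """
    automated = sum(
        1 for level in results.values() if level == Level.FULL_AUTOMATION
    )
    return automated, len(results)


# Hypothetical results for illustration only, not real test data.
example_results = {
    "task_a": Level.FULL_AUTOMATION,
    "task_b": Level.NEEDS_OVERSIGHT,
    "task_c": Level.NOT_GOOD_ENOUGH,
}

automated, total = automation_score(example_results)
print(f"{automated}/{total} tasks automated")
```

Rerunning the same scorecard at each 3-6 month test gives a comparable number over time. A stricter or looser rule (for example, counting Level 3 as partially automated) would be an equally reasonable choice.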

Real Business Examples Tested

  1. E-commerce Photography ($5-10k/month current cost)

    • Product photography
    • Model photography
    • Video content generation
    • Current status: AI can produce this content, but with quality issues (morphing faces, visual artifacts)
  2. Inventory Forecasting

    • Currently requires 2 full-time employees
    • Handles multiple product variations
    • Uses historical and real-time sales data
    • Goal: Replace the human forecasters or augment them with AI assistance

Success Metrics

  • Real cost reduction
  • Quality comparison to human work
  • Practical implementation feasibility
  • Actual business value delivered
  • Reduction in human workforce needed
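
As a rough illustration of the "real cost reduction" metric, the arithmetic can be sketched as below. Only the $5-10k/month figure comes from the e-commerce photography example. The function name `monthly_savings`, the AI tooling cost, and the human oversight cost are hypothetical placeholders, not measured values.

```python
def monthly_savings(human_cost, ai_cost, oversight_cost=0.0):
    """Net monthly savings: current human cost minus AI cost and any remaining oversight cost."""
    return human_cost - (ai_cost + oversight_cost)


# From the e-commerce photography example: $5-10k/month current cost.
current_cost_range = (5_000, 10_000)

# Hypothetical values for illustration only.
ai_cost = 500           # assumed monthly AI tooling cost
oversight_cost = 1_000  # assumed monthly cost of human review (Level 3 scenario)

savings_range = [
    monthly_savings(cost, ai_cost, oversight_cost) for cost in current_cost_range
]
print(f"Estimated savings: ${savings_range[0]:,} to ${savings_range[1]:,} per month")
```

A result at or below zero would suggest the task is not yet worth automating, even if the AI can technically perform it.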

Philosophy

  • Focuses on practical business applications over academic benchmarks
  • Tests against actual business costs and needs
  • Evaluates based on real-world implementation feasibility
  • Retests regularly as AI capabilities evolve

Shaan Puri

Host of MFM

Shaan Puri is the Chairman and Co-Founder of The Milk Road. He previously worked at Twitch as a Senior Director of Product, Mobile Gaming, and Emerging Markets. He also attended Duke University.
