Case Study

AI Infrastructure Product Workflows

Turning backend-heavy AI infrastructure capabilities into usable customer-facing and internal product workflows.

Summary

Role

Frontend lead for AI Services, frontend architecture, product workflow design, API contract collaboration

Scope

Model APIs, deployments, GPU resources, notebooks, storage, AI agents, analytics, billing, permissions, admin operations

Stack

React, TypeScript, REST APIs, dashboard UI, internal tooling

Focus

State modeling, API contracts, permissions, async workflows, operational usability

TL;DR

I helped turn EdgeCloud AI Services from separate infrastructure capabilities into a coherent product system for customers and internal operators.

Context

Theta EdgeCloud exposed AI infrastructure workflows across model APIs, notebooks, GPU resources, storage, agents, usage, billing, permissions, and internal admin operations.

The frontend challenge was turning backend-heavy capabilities into safe customer workflows and reliable internal operator tools.


Problem

The challenge was not just building screens. The product needed clear paths around configuration, validation, permissions, async provisioning states, usage visibility, billing context, and failure recovery.

The main risks showed up as:

Inconsistent states
Scattered permissions
Operational handoffs

Ownership

I led most frontend implementation for EdgeCloud AI Services, covering customer-facing AI infrastructure flows and the internal admin dashboard used by support, QA, operations, and engineering.

My work often sat between product intent and platform constraints: deciding what should be visible to users, what needed backend confirmation, what belonged in reusable UI patterns, and where internal tooling needed different assumptions from customer-facing workflows.

Product system map

Product System Map: customer surfaces / shared layer / internal ops

Customer-facing workflows

Model APIs, Notebooks, GPU Resources, Storage, AI Agents

Shared product layer

Org / Project Context, Permissions, Async States, Resource Lifecycle, Usage Context

Internal admin ops

Account Lookup, Feature Flags, Model / Agent Config, VM / Storage Settings, Platform Metrics

Separate infrastructure capabilities became one product system through shared org/project context, permissions, async states, lifecycle patterns, and admin operations.

Representative workflow

GPU lifecycle workflow: permissions / status / project context

Discover → Filter → Check quota → Provision → Monitor → Recover

Cross-cutting support

Org / project context, permissions, deployment status, event metadata, polling ownership, row-level actions

The same surface had to support customer evaluation and operator follow-up without flattening lifecycle detail into a generic table.
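
One way to keep lifecycle detail visible in a table is to derive each row's actions from its deployment status and the viewer's permissions, rather than rendering the same buttons everywhere. The sketch below is illustrative only: the status values, permission strings, and action names are hypothetical and not taken from the EdgeCloud codebase.

```typescript
// Hypothetical names throughout: statuses, permissions, and actions are
// assumptions made for illustration, not the product's real vocabulary.
type DeploymentStatus = "provisioning" | "running" | "stopped" | "failed";
type Permission = "gpu:read" | "gpu:operate" | "gpu:delete";
type RowAction = "view-logs" | "stop" | "restart" | "retry" | "delete";

interface GpuRow {
  id: string;
  projectId: string;
  status: DeploymentStatus;
}

// Actions that make sense for each lifecycle status, before permission checks.
const actionsByStatus: Record<DeploymentStatus, RowAction[]> = {
  provisioning: ["view-logs"],
  running: ["view-logs", "stop"],
  stopped: ["restart", "delete"],
  failed: ["view-logs", "retry", "delete"],
};

// The single permission each action requires.
const requiredPermission: Record<RowAction, Permission> = {
  "view-logs": "gpu:read",
  stop: "gpu:operate",
  restart: "gpu:operate",
  retry: "gpu:operate",
  delete: "gpu:delete",
};

// Keep only the status-appropriate actions the viewer is allowed to perform.
function availableActions(
  row: GpuRow,
  granted: readonly Permission[],
): RowAction[] {
  return actionsByStatus[row.status].filter((action) =>
    granted.some((permission) => permission === requiredPermission[action]),
  );
}
```

Under these assumptions, a customer with read and operate permissions sees "view-logs" and "retry" on a failed row, while an operator who also holds the delete permission gets "delete" from the same function.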

Key Decisions

Make workflow states explicit

Model complex flows around clear states such as draft, validating, provisioning, ready, failed, disabled, and permission-blocked instead of treating them as generic loading states.
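
As a rough illustration, the states named above could be modeled as a TypeScript discriminated union, so each state carries only the data it needs and the compiler flags any state the UI forgets to handle. The payload fields (`startedAt`, `resourceId`, `reason`, `retryable`, `missingPermission`) and the message wording are assumptions for this sketch, not the actual types used in the product.

```typescript
// Hypothetical state model; the state names come from the text above,
// the payload fields are assumptions added for illustration.
type WorkflowState =
  | { kind: "draft" }
  | { kind: "validating" }
  | { kind: "provisioning"; startedAt: number }
  | { kind: "ready"; resourceId: string }
  | { kind: "failed"; reason: string; retryable: boolean }
  | { kind: "disabled" }
  | { kind: "permission-blocked"; missingPermission: string };

// Map every state to the message shown to the user.
function describeState(state: WorkflowState): string {
  switch (state.kind) {
    case "draft":
      return "Finish configuration to continue.";
    case "validating":
      return "Checking configuration...";
    case "provisioning":
      return "Provisioning in progress.";
    case "ready":
      return `Resource ${state.resourceId} is ready.`;
    case "failed":
      return state.retryable
        ? `Failed: ${state.reason}. You can retry.`
        : `Failed: ${state.reason}. Contact support.`;
    case "disabled":
      return "This feature is disabled for the current project.";
    case "permission-blocked":
      return `Missing permission: ${state.missingPermission}.`;
    default: {
      // Exhaustiveness check: adding a new kind without a case
      // makes this assignment a compile-time error.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

The point of the sketch is that "failed" and "permission-blocked" stop being variations of a spinner: each has its own data and its own recovery copy.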

Design customer and operator paths differently

Customer-facing pages need confidence and guardrails. Internal tools need fast lookup, safe defaults, status visibility, and recovery paths.

Push repeated parsing to the boundary

Work with backend engineers so analytics, billing, configuration, and admin views can share stable response shapes.
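
A minimal sketch of what parsing at the boundary can look like: one function turns an untrusted response into a UI-ready shape, and every view downstream consumes the typed result instead of re-checking fields. The payload keys (`project_id`, `gpu_hours`, `cost_usd`) and the `UsageRow` shape are hypothetical and stand in for whatever contract is agreed with the backend.

```typescript
// Hypothetical UI-ready shape; field names are assumptions for illustration.
interface UsageRow {
  projectId: string;
  gpuHours: number;
  costUsd: number;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Accept finite numbers or non-empty numeric strings; reject everything else.
function toFiniteNumber(value: unknown): number | undefined {
  if (typeof value === "number") {
    return isFinite(value) ? value : undefined;
  }
  if (typeof value === "string" && value.trim() !== "") {
    const parsed = Number(value);
    return isFinite(parsed) ? parsed : undefined;
  }
  return undefined;
}

// Validate and normalize one raw usage record at the API boundary.
function parseUsageRow(raw: unknown): ParseResult<UsageRow> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "expected an object" };
  }
  const record = raw as Record<string, unknown>;
  const projectId = record["project_id"];
  if (typeof projectId !== "string" || projectId.length === 0) {
    return { ok: false, error: "missing project_id" };
  }
  const gpuHours = toFiniteNumber(record["gpu_hours"]);
  const costUsd = toFiniteNumber(record["cost_usd"]);
  if (gpuHours === undefined || costUsd === undefined) {
    return { ok: false, error: "non-numeric usage fields" };
  }
  return { ok: true, value: { projectId, gpuHours, costUsd } };
}
```

With a parser like this, analytics, billing, and admin views can all render `UsageRow` directly, and a malformed response surfaces as one explicit error rather than as scattered `undefined` checks.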

Impact

Turned AI Services into a more coherent self-serve product across model APIs, notebooks, GPU resources, storage, agents, and billing.

Built the internal admin dashboard from scratch and made common support, QA, ops, and engineering workflows self-serve.

Created reusable frontend patterns for permissions, async states, configuration forms, validation, and recovery paths.

Improved the boundary between customer-facing product areas and internal operations.

Trade-offs

A more explicit state model adds upfront structure, but makes complex workflows easier to debug and extend.
UI-ready API contracts require backend alignment, but reduce duplicated frontend parsing.
Internal admin tools can move faster than customer-facing pages, but still need permission clarity, auditability, and safe failure handling.

Closing

The core challenge was not rendering AI infrastructure data. It was making complex platform capabilities feel understandable, operable, and trustworthy.