Case Study

AI Infrastructure Product Workflows

Turning backend-heavy AI infrastructure capabilities into usable customer-facing and internal product workflows.

Summary

Role

Frontend lead for AI Services, frontend architecture, product workflow design, API contract collaboration

Scope

Model APIs, deployments, GPU resources, notebooks, storage, AI agents, analytics, billing, permissions, admin operations

Stack

React, TypeScript, REST APIs, dashboard UI, internal tooling

Focus

State modeling, API contracts, permissions, async workflows, operational usability

Overview

My role

Led most frontend implementation for EdgeCloud AI Services, turning complex AI infrastructure capabilities into customer-facing workflows and internal admin tools across model APIs, deployments, notebooks, GPU resources, storage, agents, billing, analytics, and operational dashboards.

Context

Theta EdgeCloud exposes complex AI infrastructure workflows across on-demand model APIs, model and deployment management, Jupyter notebooks, GPU nodes and clusters, persistent storage, AI agents, usage analytics, billing, organization / project / user roles, and internal admin operations.

Many of these capabilities start from backend-heavy or operations-heavy concepts: provisioning compute resources, configuring model templates, tracking usage, managing billing context, handling permissions, and supporting operational debugging.

The frontend needed to make these capabilities understandable and safe for external customers, while also giving internal teams reliable tools to support, test, and operate the platform.

Problem

The challenge was not just building screens. The product needed clear workflows around configuration, validation, permissions, async provisioning states, usage visibility, billing context, and failure recovery.

Without strong frontend boundaries, these areas could easily become inconsistent:

  • each page modeling loading, empty, and error states differently
  • permissions being checked in scattered places
  • API responses being parsed differently across views
  • customer-facing and internal workflows becoming tightly coupled
  • support and QA depending on engineering for routine operations
  • internal configuration changes requiring too much manual engineering involvement

Ownership

I led most of the frontend implementation for the EdgeCloud AI Services area, with deep ownership across customer-facing AI infrastructure workflows and internal admin tooling.

I designed and implemented customer-facing workflows for model APIs, deployments, notebooks, GPU resources, storage, AI agents, billing, and analytics.

I also built the internal admin dashboard from scratch and evolved it into a default entry point for support, QA, operations, and engineering workflows. The internal tooling covered model template configuration, organization / project / account lookup and management, feature flags, VM and persistent settings, platform-wide usage metrics, and detailed Grafana chart views.

My work often sat between product intent and platform constraints: deciding what should be visible to users, what needed backend confirmation, what belonged in reusable UI patterns, and where internal tooling needed different assumptions from customer-facing workflows.

Product Surfaces

Customer-facing workflows

  • On-demand model API playgrounds and configuration flows
  • Model and deployment management
  • Jupyter notebook workflows
  • GPU node and cluster management
  • Persistent storage workflows
  • AI agent configuration
  • Usage analytics and billing-related visibility

Internal admin and ops workflows

  • Model template configuration
  • Organization, project, and account lookup / management
  • Feature flag management
  • VM and persistent settings
  • Platform-wide usage metrics
  • Detailed Grafana chart views
  • Support, QA, and operations workflows

Product System Map

Shared frontend patterns

Customer-facing workflows

Model APIs, Deployments, Notebooks, GPU Nodes / Clusters, Persistent Storage, AI Agents, Billing / Quota, Organizations / Projects

Shared product layer

Org / Project Context, Permissions, Feature Flags, Async States, Resource Lifecycle, Usage / Billing

Internal admin ops

Account / Org / Project Lookup, Feature Flags, Model Config, AI Agents Config, VM / Persistent Settings, Platform Metrics, Quota Settings

Separate AI infrastructure capabilities became one product system through shared org/project context, permissions, async states, lifecycle patterns, and admin workflows.

Key Decisions

Make workflow states explicit

Model complex flows around clear states such as draft, validating, provisioning, ready, failed, disabled, and permission-blocked instead of treating them as generic loading states.
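
As a rough illustration of this decision, the states named above could be modeled as a TypeScript discriminated union. This is a minimal sketch, not the production code: the state names come from the text, while every field, message, and function name (describeState, canSubmit) is a hypothetical chosen for the example.

```typescript
// Hypothetical state model: the state names come from the case study text,
// but every field and message here is an illustrative assumption.
type WorkflowState =
  | { kind: "draft" }
  | { kind: "validating" }
  | { kind: "provisioning"; startedAt: number }
  | { kind: "ready"; resourceId: string }
  | { kind: "failed"; reason: string; retryable: boolean }
  | { kind: "disabled" }
  | { kind: "permission-blocked"; missingPermission: string };

// Each state gets its own user-facing message instead of a shared spinner.
function describeState(state: WorkflowState): string {
  switch (state.kind) {
    case "draft":
      return "Finish the configuration to continue.";
    case "validating":
      return "Checking configuration...";
    case "provisioning":
      return "Provisioning resources...";
    case "ready":
      return `Ready: ${state.resourceId}`;
    case "failed":
      return state.retryable
        ? `Failed: ${state.reason}. You can retry.`
        : `Failed: ${state.reason}. Contact support.`;
    case "disabled":
      return "This feature is not enabled for the current project.";
    case "permission-blocked":
      return `Requires permission: ${state.missingPermission}`;
    default: {
      // Compile-time exhaustiveness check: adding a new state without
      // handling it here becomes a type error.
      const unhandled: never = state;
      return unhandled;
    }
  }
}

// Submission is only offered from a draft or from a retryable failure.
function canSubmit(state: WorkflowState): boolean {
  return state.kind === "draft" || (state.kind === "failed" && state.retryable);
}
```

The exhaustiveness check in the default branch is the main benefit of this shape: a new state cannot be added without every consumer deciding how to present it.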

Keep backend authorization as the source of truth

Use frontend permission gates to improve UX and reduce confusion, but never treat frontend checks as the security boundary.
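
One way to picture this split is a client-side gate that only shapes the UI, paired with handling for the case where the server still says no. The sketch below assumes a hypothetical permission vocabulary and hypothetical helper names (gate, interpretResponse); the only non-invented detail is that HTTP 403 conventionally means the server refused the action.

```typescript
// Hypothetical permission names, for illustration only.
type Permission = "deployment:create" | "deployment:delete" | "billing:view";

interface GateResult {
  allowed: boolean;
  hint?: string; // shown next to a disabled control to explain why
}

// UX-only check: decides whether to enable a button, never whether
// the action is actually authorized.
function gate(granted: readonly Permission[], required: Permission): GateResult {
  return granted.indexOf(required) !== -1
    ? { allowed: true }
    : { allowed: false, hint: `Requires ${required}` };
}

type ActionOutcome =
  | { status: "ok" }
  | { status: "permission-blocked"; hint: string }
  | { status: "error"; message: string };

// The backend remains the source of truth: even if the gate allowed the
// click, a 403 from the server is surfaced as a permission state.
function interpretResponse(httpStatus: number): ActionOutcome {
  if (httpStatus >= 200 && httpStatus < 300) {
    return { status: "ok" };
  }
  if (httpStatus === 403) {
    return {
      status: "permission-blocked",
      hint: "Your role does not allow this action.",
    };
  }
  return { status: "error", message: `Request failed with status ${httpStatus}` };
}
```

Because the gate can be stale (for example after a role change), the 403 path matters as much as the gate itself.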

Separate customer-facing and internal admin workflows

Customer-facing pages optimize for clarity, confidence, and guided actions. Internal tools optimize for scanability, recovery, configuration accuracy, and operational speed.

Push for UI-ready API contracts

Work with backend engineers to avoid repeated frontend parsing and to make analytics, billing, configuration, and admin views easier to extend.
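
To make "UI-ready" concrete, here is a small sketch of the kind of shape such a contract might converge on, with a single adapter for the cases where the raw payload still differs. All field names and units (project_id, gpu_seconds, cost_micro_usd) are hypothetical and do not describe the actual EdgeCloud API.

```typescript
// Hypothetical raw payload: storage-oriented names and units.
interface RawUsageRecord {
  project_id: string;
  gpu_seconds: number;
  cost_micro_usd: number;
}

// Hypothetical UI-ready shape: the units and names views actually render.
interface UsageRow {
  projectId: string;
  gpuHours: number;
  costUsd: number;
}

// Conversion happens once at the API boundary, so analytics, billing,
// and admin views all consume the same shape instead of re-parsing.
function toUsageRow(raw: RawUsageRecord): UsageRow {
  return {
    projectId: raw.project_id,
    gpuHours: raw.gpu_seconds / 3600,
    costUsd: raw.cost_micro_usd / 1e6,
  };
}

// Views aggregate over the normalized rows only.
function totalCostUsd(rows: readonly UsageRow[]): number {
  return rows.reduce((sum, row) => sum + row.costUsd, 0);
}
```

When the backend can return the UI-ready shape directly, the adapter disappears. Until then, keeping it in one place limits how far a contract change can spread.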

Build reusable patterns for configuration-heavy workflows

Use consistent patterns for forms, validation, async submission, error recovery, empty states, confirmation states, and permission-aware actions.
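
A shared submission lifecycle is one way such a pattern could be expressed. The sketch below is a pure transition function under assumed state and event names (editing, submitting, confirmed; submit, resolved, rejected). It is meant to show the idea of one reusable lifecycle for every configuration form, not the actual implementation.

```typescript
// Hypothetical submission lifecycle shared by configuration forms.
type SubmitState =
  | { status: "editing"; error?: string }
  | { status: "submitting" }
  | { status: "confirmed" };

type SubmitEvent =
  | { type: "submit" }
  | { type: "resolved" }
  | { type: "rejected"; message: string };

// Pure transition function: events that do not apply to the current
// state are ignored by returning the same state object.
function nextSubmitState(state: SubmitState, event: SubmitEvent): SubmitState {
  switch (event.type) {
    case "submit":
      // Guards against double submission while a request is in flight.
      return state.status === "editing" ? { status: "submitting" } : state;
    case "resolved":
      return state.status === "submitting" ? { status: "confirmed" } : state;
    case "rejected":
      // Recovery path: back to the editable form with the error attached.
      return state.status === "submitting"
        ? { status: "editing", error: event.message }
        : state;
    default:
      return state;
  }
}
```

Keeping the transitions pure means the same function can sit behind a React reducer, a custom hook, or a unit test without modification.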

Optimize internal tools for operator speed

For support, QA, and ops users, prioritize fast lookup, safe defaults, clear status visibility, and recovery paths over decorative UI complexity.

Impact

The work helped turn complex AI infrastructure capabilities into product surfaces that were easier to understand, operate, support, and extend.

  • Led most frontend implementation for AI Services across model APIs, deployments, notebooks, GPU resources, storage, agents, analytics, and billing.
  • Built the internal admin dashboard from scratch and made common support, QA, ops, and engineering workflows self-serve.
  • Created reusable frontend patterns for permissions, async states, configuration forms, validation, and recovery paths.
  • Clarified the boundaries between customer-facing workflows, internal operations, and API contracts.

Trade-offs

  • A more explicit state model adds upfront structure, but makes complex workflows easier to debug and extend.
  • UI-ready API contracts require backend alignment, but reduce duplicated frontend parsing.
  • Internal admin tools can move faster than customer-facing pages, but still need permission clarity, auditability, and safe failure handling.
  • Customer-facing workflows need more guardrails and explanation, while internal tools need speed, density, and recovery paths.

Closing

The core challenge was not rendering AI infrastructure data. It was making complex platform capabilities feel understandable, operable, and trustworthy.