The Modern AI Stack Explained: Your 2025 Guide

Published on January 24, 2025

Key takeaways

  • The boom in AI spending requires a modern AI stack to ensure reliability, repeatability, and scalability. Organizations must devote investments to AI infrastructures to support growing AI expectations.

  • Modern AI stacks are layered to strategically deploy compute, data, model, and observability capabilities. When building an AI infrastructure with discrete platforms and applications, interoperability and integration are key requirements.

  • A data catalog serves as an important centralized hub for modern AI stacks, easing the discovery and use of high-quality, governed data and accelerating time-to-value for AI initiatives.

  • AI maturation requires a methodical approach and realistic expectations. A modern AI stack provides the foundation for increasingly complex and advanced AI deployments.

  • Scalability, automation, metrics, and collaboration form the cornerstone of a successful AI infrastructure. 

Introduction

AI spending is booming. Goldman Sachs says global AI capital expenditures will eclipse $1 trillion within the next few years. That includes spending for software applications, services, hardware, and more by organizations and technology companies combined.

However, running AI applications, especially generative AI, is a compute-intensive, data-demanding endeavor. While cloud offerings are available, many organizations choose to build their own AI stacks to manage data, storage, and networking, plus the accompanying governance, maintenance, development, and other requirements.

This AI stack portion is expected to account for roughly 10% of that trillion-dollar total: IDC projects $100 billion in global AI infrastructure spending by 2028. So, what is all that money buying? Everything from processing power and storage to security and talent.

Let’s dive a bit deeper into the modern AI stack and what the world should expect for $100 billion.

Defining the modern AI stack

Like most technology stacks, the modern AI stack takes a layered approach to supporting AI applications. The essential components of AI infrastructure include the following:

Layer 1: Compute and models

This layer forms the foundation of an effective AI infrastructure. AI requires massive amounts of processing power to manage, train, deploy, and execute complex large language models (LLMs) and other AI-based models. This layer must also be scalable and flexible.

Today, we’re seeing a combination of graphics processing units (GPUs), tensor processing units (TPUs), and neural processing units (NPUs) that can be clustered and optimized for faster training, deep learning, natural language processing, computer vision, and more. Cloud-based vendors are currently popular for compute and model needs.

Solution providers in this layer include Amazon Web Services for compute power, Anthropic and OpenAI for foundation models, and Modular for training.

Layer 2: Data storage

Data is crucial to AI for training and execution, and more advanced AI needs more and more data. With data being so valuable to AI applications, many content providers are clamping down on “data harvesting,” which forces organizations to collect and manage ever-increasing in-house or outsourced data volumes. The challenge is in deploying high-performance data storage infrastructures to meet the data speed, volume, and diversity needs AI requires, plus the related cleansing, pre-processing, integrations, and more. 

Concepts like data lakes and data lakehouses are increasingly popular in this layer of the AI stack. Leading organizations are also leveraging data catalogs to ensure AI uses trusted, traceable, high-quality data.

Application vendors in this layer include Databricks for data lakehouse solutions, Gable for pre-processing, and Nomic for ETL.
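To make the cleansing and pre-processing work above concrete, here is a minimal Python sketch of the kind of quality gates a Layer 2 pipeline might apply before data reaches model training. The `Record` type, its fields, and the thresholds are all hypothetical illustrations, not any vendor's API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    """A hypothetical training record; fields are illustrative only."""
    text: str
    source: str
    license_ok: bool

def clean_for_training(records: List[Record]) -> List[Record]:
    """Drop records that fail basic quality and licensing gates,
    mimicking the cleansing step a data pipeline performs before
    data is handed to model training."""
    seen = set()
    cleaned = []
    for r in records:
        if not r.license_ok:            # respect content-provider terms
            continue
        if len(r.text.strip()) < 10:    # discard near-empty text
            continue
        if r.text in seen:              # deduplicate exact copies
            continue
        seen.add(r.text)
        cleaned.append(r)
    return cleaned

raw = [
    Record("A long enough example sentence.", "web", True),
    Record("A long enough example sentence.", "web", True),   # duplicate
    Record("short", "web", True),                             # too short
    Record("Licensed data we cannot use here.", "partner", False),
]
print(len(clean_for_training(raw)))  # 1 record survives the gates
```

Real pipelines layer on far more (schema validation, PII scrubbing, lineage tracking), but the pattern of filtering against explicit, documented rules is the same.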

Layer 3: Model deployment, governance, and orchestration

This AI stack layer enables developers to provide users with access to AI innovations. It includes the tools to implement AI solutions, platforms for agentic AI and purpose-built AI agents, solutions for AI governance, and routing of AI models to AI applications. This layer of the AI stack also includes security and privacy tools and data catalogs to control and govern AI data access.

Solution providers in this layer include Alation for data catalog and governance, Vellum for prompt management, and Orkes for orchestration.
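As a rough illustration of the "routing of AI models to AI applications" mentioned above, the sketch below maps a task type to a model and falls back to a default when no route matches. The route table and model names are placeholders invented for this example, not real endpoints or any product's API.

```python
# Toy model router: pick a model by task type and fall back safely.
ROUTES = {
    "summarize": "small-fast-model",
    "code":      "code-tuned-model",
}
DEFAULT_MODEL = "general-model"

def route(task: str) -> str:
    """Return the model a request should be sent to.

    Unknown task types fall back to a general-purpose model rather
    than failing, so new request types degrade gracefully.
    """
    return ROUTES.get(task, DEFAULT_MODEL)

print(route("code"))       # code-tuned-model
print(route("translate"))  # general-model (fallback)
```

Production orchestration layers add authentication, cost-aware routing, and retries, but the core idea is the same lookup-with-fallback.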

Layer 4: Insights, observability, and optimization

While AI consumes huge amounts of data, it’s also important to gather data on the AI itself to evaluate and improve AI-driven outcomes. This layer provides insights into LLM and application behaviors, performance and speed, threats and security concerns, the costs of using AI applications and related data, and more. Even with the immense potential benefits of AI looming ahead, it’s always a best practice to quantify the return on AI investments, and this layer captures and surfaces those and other insights.

Application vendors in this layer include Humanloop for AI evaluations and AgentOps for agentic AI testing.
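To illustrate what this layer captures, the sketch below instruments a stubbed model call with a Python decorator that records latency and a crude output-size proxy. The `observe` decorator, `METRICS` list, and `fake_llm` stub are invented for illustration; real observability tools capture far more (cost, error rates, full prompt/response traces).

```python
import time
from functools import wraps

METRICS = []  # in a real stack this would feed an observability platform

def observe(model_name):
    """Record latency and rough output size for each model call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(prompt):
            start = time.perf_counter()
            result = fn(prompt)
            METRICS.append({
                "model": model_name,
                "latency_s": time.perf_counter() - start,
                "output_tokens": len(result.split()),  # crude proxy
            })
            return result
        return wrapper
    return decorator

@observe("demo-llm")
def fake_llm(prompt):
    # Stand-in for a real model call; returns a canned answer.
    return "This is a canned response from a stub model."

fake_llm("What is the modern AI stack?")
print(METRICS[0]["model"], METRICS[0]["output_tokens"])
```

Metrics gathered this way are what make the ROI quantification described above possible: you cannot optimize latency or cost you never measured.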

Across these AI stack layers, it’s prudent to evaluate the advantages and disadvantages of open-source versus proprietary applications in building out AI infrastructure. Since each layer comprises distinct and sometimes overlapping technologies and applications, interoperability and ease of integration are valuable for connecting data, LLMs, tools, and insights.

A best practice is to consider a robust data catalog to act as a modern AI stack’s central hub. A good data catalog will accelerate AI time-to-value by streamlining data discoverability, improving data quality and team collaboration, and enhancing trust and governance across the entire AI stack.
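As a toy illustration of the hub idea, the sketch below models a catalog as an in-memory dictionary of governance metadata and gates AI pipeline access on it. The dataset names, metadata fields, and `approved_for_ai` check are all hypothetical, not any catalog product's API.

```python
# A toy in-memory "catalog": dataset name -> governance metadata.
CATALOG = {
    "sales.transactions": {"owner": "finance",   "pii": True,  "certified": True},
    "web.clickstream":    {"owner": "marketing", "pii": False, "certified": False},
    "product.reviews":    {"owner": "support",   "pii": False, "certified": True},
}

def approved_for_ai(dataset: str) -> bool:
    """Allow a dataset into an AI pipeline only if the catalog marks it
    certified and it carries no personally identifiable information."""
    entry = CATALOG.get(dataset)
    if entry is None:
        return False  # unknown data never enters the pipeline
    return entry["certified"] and not entry["pii"]

print(approved_for_ai("product.reviews"))    # True: certified, no PII
print(approved_for_ai("sales.transactions")) # False: contains PII
print(approved_for_ai("web.clickstream"))    # False: not certified
```

The point of the pattern: governance decisions live in one place, so every pipeline consults the same trusted metadata instead of re-deciding per project.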

AI infrastructure maturity and expectations

Deploying AI at the pace and scale expected by today’s organizations takes time and resources and requires a strategic, methodical approach to drive acceptable returns on these considerable investments. Best practices call for a maturity model to evaluate needs, track progress, benchmark against peers, and plan for the future. 

For a modern AI stack, maturity flows from initial experimentation to deploying transformative AI applications at scale across the organization. Of course, it doesn’t happen overnight. Organizations must set realistic expectations to avoid falling into the hype trap and, instead, focus on measurable outcomes and advancing AI infrastructure maturity.

Here is a typical maturity model for a modern AI stack:

Level 1: Ad hoc — Organizations just beginning an AI journey typically see projects pop up as siloed experiments driven by individuals or teams. Infrastructure is built as needed and without coordination, and tends to require considerable manual effort. Little centralization, governance, or orchestration happens at this initial stage of maturity, leading to the key challenges of limited scalability, expensive and unrepeatable deployments, and inconsistent outcomes.

Level 2: Repeatable — As organizations recognize the value of AI solutions, leaders take notice and work to organize efforts. Developers begin to standardize toolsets and define processes, improve version controls and deployment methods, and strive for consistency and repeatability. Rudimentary AI and data governance capabilities are also put in place at this stage. Challenges here include continued siloing of efforts and insufficient data governance that compromises confidence in AI outcomes.

Level 3: Scalable — With more repeatable results, organizations move to a strategically planned and managed effort to scale AI benefits into more areas. Deploying centralized AI platforms and data catalogs provides a solid foundation for modern AI stack layers as detailed above. A comprehensive data catalog is implemented to improve and mature data discovery, lineage, and governance. Initial AI monitoring and observability tools are also deployed to improve ROI and find opportunities for new AI applications. Challenges include managing the increasing complexity of the AI stack and maintaining data security at scale.

Level 4: Optimized — AI stack observability and monitoring tools prove their worth in this stage with teams optimizing AI applications for performance and cost-effectiveness, all to drive increased ROI. Automations are included and, ironically, AI is relied on more and more to improve AI outcomes. Tools like data catalogs and AI platforms are highly integrated to automate processes and deliver comprehensive insights. New challenges arise as organizations work to keep up with constantly advancing AI technologies and complexity, and concerns about AI ethics, biases, and responsibility become more important.

Level 5: Transformative — AI is now deeply embedded in enterprise-wide business processes to drive decisions, enable human-AI collaboration, and intelligently automate end-to-end processes. A data catalog becomes critical to data management from discovery through metadata storage as AI delivers new levels of speed and previously unforeseen growth opportunities. Challenges reach new levels of strategic importance as organizations at this stage balance the societal impacts of AI with the transformative benefits now within reach.

Best practices and tips for AI infrastructures

As organizations begin to develop a modern AI stack suitable for their unique business, it is helpful to consider some best practices along the way. Some AI infrastructure tips include:

  • Stay flexible. AI is constantly changing. Scalability and flexibility are important for AI maturation, so ensure tools and AI infrastructure selections enable future options instead of tying decisions to a single vendor or approach.

  • Be structured. Documentation, processes, version control, and other governance considerations will ensure repeatability and continuous improvement.

  • Embrace automation. AI initiatives pile repetitive tasks onto developers and deployment teams, and humans cannot manually parse the huge data volumes AI success requires. Automation is a proven way to streamline and offload processes that would otherwise slow AI maturity.

  • Rely on metrics. Observability and monitoring, and the related insights, are crucial for improving AI outcomes and ROI. Be sure individual stack components like the data catalog have built-in metrics and measurement capabilities.

  • Work together. Collaboration between data scientists, engineers, and business stakeholders will accelerate progress along the AI maturity model. Important tools like the data catalog facilitate communication and knowledge sharing around data assets and AI effectiveness.

With these and other best practices, along with the maturity model above, organizations can build a strategic plan for creating a modern AI stack.

Using Alation as a key component of the modern AI stack

Every organization has the opportunity to become AI-enabled, and AI proficiency is a competitive imperative in the modern economy. Those who delay AI initiatives will fall behind. Those who move to advance AI maturity by building a modern AI stack will leap ahead of the pack.

A robust data catalog is a crucial component of the modern AI infrastructure. Data catalogs provide a platform to ensure trusted, traceable, high-quality data is readily available to feed the insatiable hunger of AI models and applications. Data catalogs also offer advanced capabilities to control and govern data access that increases trust and facilitates responsible AI practices.

Alation provides a proven component for any AI stack, delivering governed, accurate data organizations rely on to accelerate AI model development. 

Schedule an Alation demo to learn how it helps companies like Discover Financial Services build better AI models, create a thriving data culture, and improve AI literacy across the organization. 
