Big Data in Business: The Lifecycle of an AI Project


Before reading this piece, make sure to check out the previous parts of my “Big Data in Business” series.

When I teach my AI and business workshop at Stanford, one of the most surprising and insightful topics is the lifecycle of an AI project. Many assume it’s just another software project (far from it!), and they are completely caught off guard when they realize the difference in pace, costs, and data emphasis. 

An AI project is a big deal. It marks the high point of an organization’s transition into an AI-driven business, so it should come as no surprise when it impacts all aspects of the company. AI projects are multidisciplinary and often massive undertakings, requiring a competent development team and effective business leaders. 

While an AI project never truly “ends” due to the need for adjustments and refinements along the way, there is a point when the product can be launched. This “deployment” point doesn’t coincide with the end of the project; it coincides with the model reaching readiness, or maturity. The deployed product can help an organization dramatically increase its value and position within its industry. 

With that said, it’s not an easy task arriving at that end product. If you want to minimize any risks and set your business up for the best chance of success, it’s essential to understand each phase of an AI project’s lifecycle. 

The lifecycle can be broken down into two stages, each with two phases. 

  • Pilot Stage: Includes the scoping and research phases. 

  • Implementation Stage: Includes the development and deployment phases. 

Let’s break down each one of these stages and phases, what they mean, and how they lead to a successful AI project. 

Stage 1: Pilot Stage

Scoping Phase

The pilot stage of an AI project starts with the scoping phase, which is a crucial step before developing an AI solution. The main goal of the scoping phase is to understand the project needs and constraints, which ensures a higher chance of success for any machine learning and AI algorithms or models. Before any solution can be implemented, leadership and the AI development team must understand the business requirements, define the project goals, and devise work packages for the data scientists and engineers. 

The first focus of the scoping phase should be the business goal: what the organization wants to achieve with AI. To ensure success, the project team needs to understand a few key points: 

  • The exact goal of the business and why money is being allocated toward it.

  • How the company can ensure the project delivers what is expected.

  • The project’s relationship to strategic and tactical business goals.

  • The expected timeline.

  • The amount of money and resources the company is willing to spend. 

Settling these points up front helps the project stay on track throughout its entire lifecycle. 

Research Phase

The second phase of the pilot stage is the research phase, which is largely focused on data acquisition and exploration. (We can’t perform data analysis with no data!) But it can’t just be any data; a company must gather data from reliable sources, especially since real-life data can be full of human errors like spelling mistakes or wrong labeling. 
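
To make the point about human errors concrete, here is a minimal sketch of one way to catch misspelled category labels with Python’s standard difflib module. The label list, the function name normalize_label, and the 0.8 similarity cutoff are all assumptions chosen for illustration; a real project would pick its own.

```python
from difflib import get_close_matches

# Hypothetical list of valid labels; a real project would define its own.
KNOWN_LABELS = ["electronics", "grocery", "clothing"]


def normalize_label(raw, known=KNOWN_LABELS, cutoff=0.8):
    """Map a possibly misspelled label to its closest known label.

    Returns None when nothing is similar enough, so a person can review it.
    """
    cleaned = raw.strip().lower()
    matches = get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


raw_labels = ["Electronis", "grocry ", "clothing", "furniture"]
cleaned_labels = [normalize_label(label) for label in raw_labels]
```

Labels that come back as None are the ones worth a manual look, since guessing at them would only introduce new errors.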

Data are everywhere we look. They can be collected from a wide range of sources like databases, web pages, devices like cameras or sensors, public surveys and records, and much more. 

All of these data can be broken down into two main categories: Primary and Secondary. 

  • Primary Data: Primary data are collected directly from the source, often for a specific purpose such as a research project. These data are usually reliable and authentic.

  • Secondary Data: Secondary data are data that have been collected by someone else and made available to others. These data are usually shared publicly, meaning they’re easier for both researchers and individuals to access. That said, secondary data are generally less reliable than primary data, and they tend to be general rather than tailored to a specific need. 

Data exploration is the first step of data analysis. It helps you visualize data, derive insights, and identify patterns. It involves operations like data cleaning (finding missing values and removing useless data) and basic statistical analysis. 

During data exploration, data analysts rely on data visualization and statistical techniques to describe dataset characteristics like size, quantity, and accuracy. These techniques can include both manual analysis and automated data exploration tools that identify relationships between data variables, the dataset structure, outliers, and more. 

Manual data exploration methods can involve writing scripts to analyze raw data or manually filtering data into spreadsheets. On the other hand, automated data exploration tools and software enable data scientists to monitor data sources easily and perform data exploration on large datasets. 
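
As a rough idea of what a manual exploration script might look like, here is a small sketch using only Python’s standard library. The sensor readings are made up, and the 1.5 × IQR rule for flagging outliers is one common convention I’m assuming here, not the only option.

```python
import statistics

# Hypothetical raw sensor readings; None marks a missing value.
readings = [21.5, 22.0, None, 21.8, 22.3, 95.0, 21.9, None, 22.1]


def explore(values):
    """Return a basic profile: size, missing count, summary stats, outliers."""
    present = [v for v in values if v is not None]
    # Quartiles of the non-missing values, used for the IQR outlier rule.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "outliers": [v for v in present if v < low or v > high],
    }


profile = explore(readings)
```

On this toy data, the profile reports two missing values and flags the 95.0 reading as an outlier, which is exactly the kind of thing you want to catch before any modeling starts.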

There are many effective tools for exploratory data analysis (EDA), which is the process of “exploring” and understanding data by creating graphs and charts, exploring the distribution of each variable, cleaning data, and identifying outliers or anomalies. 

Some of the most popular EDA tools include: 

  • Polymer Search: EDA tool that enables you to generate insights from data and create interactive databases with AI. 

  • Pandas Profiling: An open-source Python module (since renamed ydata-profiling) that lets you perform EDA and present the results in an interactive, web-based report. 

  • DataPrep: A Python tool that cleans and prepares data and performs EDA. You can use this tool to create interactive graphs and distribution charts. 

  • Trifacta: An EDA tool with an interactive user interface that helps prepare and explore datasets on a cloud data warehouse or cloud data lakehouse. 

  • Other EDA Tools: Some of the other top EDA tools include Rattle (an R package), KNIME, Excel, RapidMiner, and IBM Cognos Analytics.

>>> To learn more about data’s role in your organization’s transformation, make sure to check out part 1 of this series, “Data Requirements for AI-Driven Businesses.” 

Stage 2: Implementation Stage

Development Phase

The third phase of an AI project’s lifecycle, and the beginning of the second stage, is the development phase. It is in this phase that your business will carry out model development. 

But before we dive into the development phase…what exactly is an AI model? And what is AI modeling? 

An AI model is a combination of data and algorithms that can carry out a cognitive task. Another way to think of it: a model is a representation of knowledge (data + algorithm) that makes performing that task possible. 

AI models are the foundation for developing advanced intelligence methodologies like real-time data analysis, predictive analysis, and augmented analysis. These models help automate an organization's logical inference and decision-making processes, improve data analysis, and scale with increasing amounts of data. 

The development phase of an AI project involves the creation of these AI models, which rely on various types of algorithms, like linear or logistic regression, to identify patterns and draw conclusions. Once an organization has created its AI model, the next step is to train it, which usually involves running large amounts of data through the model in iterative loops. The results are then checked for accuracy to ensure the model operates correctly. Training an AI model is hands-on, with engineers modifying and improving the model as it learns. 
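
To give a feel for what “training in iterative loops” means, here is a minimal sketch of logistic regression fitted by gradient descent in plain Python. The toy data (machine age versus failure), the learning rate, and the number of epochs are assumptions for illustration; in practice a team would normally reach for an established library rather than hand-rolling this.

```python
import math

# Toy labeled data, assumed for illustration: machine age in years -> failed (1) or not (0).
ages = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
failed = [0, 0, 0, 0, 1, 1, 1, 1]

# Center the feature around its mean, a common preprocessing step.
mean_age = sum(ages) / len(ages)
centered = [age - mean_age for age in ages]


def predict_proba(x, weight, bias):
    """Logistic function applied to a linear score."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def train(xs, ys, epochs=500, learning_rate=0.1):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict_proba(x, weight, bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias


def accuracy(xs, ys, weight, bias):
    """Share of examples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(x, weight, bias) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


weight, bias = train(centered, failed)
train_accuracy = accuracy(centered, failed, weight, bias)
```

Each pass through the loop nudges the parameters a little, and the accuracy check at the end is the “results are then checked” step. Note that this measures accuracy on the training data only; a real project would also hold out data the model has never seen.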

AI modeling has two main approaches: 

  • Learning-Based Approach: With this approach, the model learns from the datasets it is fed, and there are a few different types of learning-based methods. With supervised learning, the training data are labeled before being used to train and test the model. The model learns the relationship between the input and output data, which enables it to classify new, unseen data and predict outcomes. Unsupervised learning is a more hands-off approach in which the model processes large amounts of unlabeled, raw data and finds structure in them without human-provided answers. The last primary method is reinforcement learning, in which the model interacts with an environment and is rewarded or penalized for its actions. Through these continuous interactions, the model improves itself over time. 

  • Rule-Based Approach: A rule-based system relies on rules to represent knowledge. The rules are coded into the system, with the main goal being to capture the knowledge of a human expert in a specialized domain and then embed it within the AI model. These types of AI models are solely based on predetermined rules, unlike those with a learning-based approach.
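
The difference between the two approaches is easier to see side by side. In this sketch, the rule-based function hard-codes an expert’s thresholds, while the learning-based function derives its threshold from labeled examples. All numbers, names, and thresholds here are hypothetical.

```python
# Rule-based: an expert's knowledge is written directly as explicit rules.
# The thresholds below are hypothetical, chosen only for illustration.
def rule_based_risk(temperature_c, vibration_mm_s):
    if temperature_c > 90:
        return "high"
    if temperature_c > 70 and vibration_mm_s > 5.0:
        return "high"
    return "normal"


# Learning-based (supervised): the threshold is learned from labeled examples.
def learn_threshold(values, labels):
    """Pick the midpoint cut that classifies the labeled data best."""
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def correct(cut):
        return sum((v > cut) == bool(y) for v, y in zip(values, labels))

    return max(candidates, key=correct)


temperatures = [55, 60, 68, 72, 88, 93, 97, 101]
overheated = [0, 0, 0, 0, 1, 1, 1, 1]
learned_cut = learn_threshold(temperatures, overheated)
```

If the labeled data change, the learned threshold changes with them; the rule-based function only changes when someone edits the rules.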

It’s important to note that not all AI models work right away, and model failure is common. Iteration is fundamental to model development, enabling stakeholders to build and test as they go along. The iterative nature of the AI lifecycle is most apparent in this phase, with models being tested and altered until the right one is found for the organization. The accuracy of the model increases progressively before plateauing, at which point further training yields only marginal gains. Once the model reaches that level, it is ready for deployment. 
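
One simple way to decide that accuracy has plateaued is to stop once several consecutive iterations each improve by less than some small amount. The sketch below assumes a patience of three rounds and a minimum gain of 0.005; both values, and the accuracy history itself, are made up for illustration.

```python
def has_plateaued(accuracy_history, patience=3, min_gain=0.005):
    """Return True when the last `patience` rounds each improved by less than `min_gain`."""
    if len(accuracy_history) <= patience:
        return False
    recent = accuracy_history[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


# Hypothetical validation accuracy after each training iteration.
history = [0.62, 0.71, 0.78, 0.83, 0.86, 0.872, 0.874, 0.875, 0.875]

# First round (1-based) at which the plateau check fires, or None if it never does.
stop_round = next(
    (i for i in range(1, len(history) + 1) if has_plateaued(history[:i])),
    None,
)
```

With this history, the check fires at round 9, after three rounds in a row of gains below the threshold.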

Deployment Phase

The final phase of an AI project’s lifecycle is the deployment phase. (Starting to get real!) 

Deployment refers to your business integrating the AI model into the existing production environment. This process is crucial. The model must be effectively deployed into production if you want all of the hard work to pay off. If you cannot extract insights from the model, it is useless. 

For example, the implementation of an AI solution that predicts energy consumption would gather related data, process and analyze them, then send a prediction to a web portal or app, which companies could then view.
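
The energy example above boils down to three steps: gather, predict, publish. Here is a bare-bones sketch of that flow. The data source, the moving-average “model,” and the payload field names are all placeholders I’ve assumed; a real deployment would plug in a trained model and an actual portal or API.

```python
import json
from statistics import mean


def gather_readings():
    """Stand-in for a real data source such as a meter feed or database (hypothetical)."""
    return [{"hour": h, "kwh": kwh} for h, kwh in enumerate([4.1, 3.9, 4.4, 5.0, 5.2, 4.8])]


def predict_next_hour(readings, window=3):
    """Placeholder model: average of the most recent `window` readings."""
    recent = [r["kwh"] for r in readings[-window:]]
    return round(mean(recent), 2)


def build_portal_payload(readings):
    """Package the prediction as JSON that a web portal or app could display."""
    return json.dumps({
        "metric": "energy_consumption_kwh",
        "predicted_next_hour": predict_next_hour(readings),
        "based_on_readings": len(readings),
    })


payload = build_portal_payload(gather_readings())
```

Keeping the three steps as separate functions makes it easier to swap the placeholder model for the real one later without touching the data or portal code.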

Model deployment requires strong coordination between data scientists, software developers, IT teams, and business professionals within a company to ensure the model is reliable. 

The lifecycle of an AI project must be viewed as a continuous process. It does not just end. Just because your AI model is now deployed into the real world and bringing back insights doesn't mean the process is over. 

It is critical that your company monitors, reviews, and ensures that the solution continues to deliver the desired outcomes. There will almost certainly be adjustments needed along the way, and a lot of it will depend on trial and error or customer and staff feedback. The model will also require new data to stay up-to-date. 
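
Monitoring can start very simply: compare the model’s recent error against the error it had at launch, and raise a flag when the gap gets too large. In this sketch, the 1.5× tolerance, the baseline error, and the sample values are assumptions for illustration only.

```python
from statistics import mean


def needs_retraining(actuals, predictions, baseline_error, tolerance=1.5):
    """Flag the model when its recent mean absolute error exceeds
    `tolerance` times the error measured at deployment time."""
    recent_error = mean([abs(a - p) for a, p in zip(actuals, predictions)])
    return recent_error > tolerance * baseline_error, recent_error


# Hypothetical numbers: error at launch was 0.5; recent predictions have drifted.
actual_values = [10.0, 12.0, 11.0, 13.0]
predicted_values = [9.0, 11.0, 12.0, 12.0]
flag, recent_mae = needs_retraining(actual_values, predicted_values, baseline_error=0.5)
```

Here the recent error is double the baseline, so the flag is raised and the team knows it is time to look at fresh data.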

And if the business starts to change its needs or operations…you guessed it! The model needs to be updated accordingly. For example, a retail store might add new printing services, creating a brand new aspect of the business that the AI solution does not yet cover. 

Whether the model needs to be updated due to business needs or performance monitoring, you should go back to square one to understand the required changes. You can then identify new and updated data, retrain the current model to create the next version, and deploy the updated model to start extracting new insights. 

Far More Than a Software Project

It’s vital to recognize the complexity (and simplicity) of each phase of an AI project’s lifecycle. Doing so will help you determine whether and where you need to bring in outside help. Many successful AI projects owe their success to an effective artificial intelligence development team, which will help your organization reach new levels. 

Besides an effective team, the second biggest factor in a project’s success is a deep level of understanding. A clear understanding of the different stages and phases of an AI project gives your company the best chance of success. When you recognize that an AI project entails far more than a typical software project, the organization can plan and prepare its operations accordingly, leading to the effective implementation of AI solutions. 

>>> Make sure to look out for the next installment of this series on business applications across industries!

>>> Follow MVYL partner Giancarlo Mori’s blog here.


Giancarlo Mori