Optimizing SaaS Customer Lifetime Value (CLTV) requires continuous collection and monitoring of customer behavior data. Traditional data science pipelines lack scalability due to manual feature engineering and misalignment between transactional and analytical systems. This article introduces a Hybrid AI architecture to operationalize predictive modeling enterprise-wide. The proposed end-to-end architecture connects an enterprise data warehouse (Snowflake) and an advanced machine learning sandbox back to the source of truth using a recurring transfer tool and Delta Sharing.
By leveraging AutoML for model creation and a centralized Feature Store for accuracy, the system mitigates data leakage, concept drift, and scalability issues. The pipeline fully automates bi-weekly retraining and daily inference with guaranteed freshness. This framework has led to a 15% increase in upsell identification, a 92% reduction in manual modeling overhead, and processed over 500,000 daily records with sub-hour latency.
Introduction
The customer path is a map of how customers interact with us over time. It spans the initial purchase, onboarding and setup, value realization, expansion or renewal, and finally churn risk or actual churn. Each of these stages leaves a distinct data trail of hidden signals that, if decoded early, can forecast the health and future direction of the customer relationship. For example, a user who has purchased a product but has not started a first project within 30 days of activation is a likely churn risk. Conversely, a customer in the Value Realization stage who is logging into advanced modules to manage usage may be an ideal candidate for an upsell.
Figure 1: Customer Lifecycle Stages.
1.1 The Signal Extraction Challenge
Extracting these signals is hard to do at scale. Today, teams tend to solve the problem ad hoc, relying on manual analysis performed by data scientists working in isolation. This creates several issues:
• Resource Scarcity: Every lifecycle stage needs its own specialized models, built and maintained at scale. Doing this manually would require a number of data scientists that is simply not feasible.
• Data Latency: Behavioral data takes a long time to move from the source system into the modeling environment because the systems are poorly integrated.
• Training-Serving Skew: The way features are built during training (for example, in a Jupyter notebook) often differs from the way they are built when the model is serving predictions. This mismatch degrades model performance in production. Like Data Latency, Training-Serving Skew stems from systems that are not properly integrated.
1.2 The Hybrid Framework Solution
To address these problems, this paper presents a scalable MLOps framework that combines Automated Machine Learning (AutoML) with enterprise data engineering. We call it the Hybrid Framework. It uses Snowflake as the secure system of record and Databricks for compute-intensive processing. The rest of the paper describes the Hybrid Framework in detail.
2 System Architecture
The core design principle is to keep things simple by separating the Storage Layer, which we call the System of Record, from the Processing Layer, which we call the Intelligence Engine. This keeps the data safe while allowing complex computation to run quickly, without burdening the database with heavy analytical work. The architecture has three components:
1. The System of Record is the Storage Layer. It is the final word on what our data is, and it is designed to keep that data consistent as changes are made. It is not a place for experimentation; it exists to store data safely.
2. The Intelligence Engine is the muscle of the system. It is the layer that handles machine learning, big data analytics, and real-time transformations. It scales up under heavy load and scales down when demand is low, which keeps large analytical requests from overwhelming the database.
3. The Bridge is the High-Frequency Transfer Utility, a connection that moves data quickly from the System of Record to the Intelligence Engine so it can be used for analytics.
Figure 2: High-Level Hybrid Architecture.
2.1 The Data Transfer Utility
A dedicated transfer tool moves data between the two platforms. It refreshes the data on an hourly or daily schedule, which shortens the time to insight, and writes the data to Databricks Delta tables.
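One common way to keep recurring transfers cheap is a timestamp watermark: each run copies only the rows changed since the previous run. The following is a minimal sketch of that idea in plain Python, not the transfer utility itself. The `updated_at` column, the function name, and the sample rows are all hypothetical.

```python
from datetime import datetime

def select_incremental_rows(rows, last_synced_at):
    """Return the rows changed after the watermark and the advanced watermark."""
    fresh = [r for r in rows if r["updated_at"] > last_synced_at]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in fresh), default=last_synced_at)
    return fresh, new_watermark

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
source_rows = [
    {"account_id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"account_id": 2, "updated_at": datetime(2024, 1, 1, 9, 30)},
    {"account_id": 3, "updated_at": datetime(2024, 1, 1, 10, 15)},
]

watermark = datetime(2024, 1, 1, 9, 0)
fresh_rows, watermark = select_incremental_rows(source_rows, watermark)
print([r["account_id"] for r in fresh_rows], watermark)
```

Persisting the watermark between runs is what lets an hourly job stay small even as the source table grows.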
2.2 Data Schema Design
For the machine learning pipeline to be reliable, we follow strict rules for naming and organizing data. These rules apply from the source, which is Snowflake, all the way to the processing and serving environment, which is Databricks. A precise, predictable data layout lets us automate many tasks and makes collaboration between data engineering and machine learning teams easier.
2.2.1 Snowflake Schema (Source)
Snowflake is the data source for the ML lifecycle, storing raw, pre-processed, and final output data. Tables are prefixed with customer_playbook_{project}.
| Naming Convention | Role in ML Lifecycle | Description |
| customer_playbook_{project}_train | Training Data | Comprehensive historical dataset (features, labels, metadata) for stable model training and validation. |
| customer_playbook_{project}_inference | Incremental Scoring Data | Recent, incremental data batches for real-time/near-real-time model scoring. |
| customer_playbook_{project}_output | Model Predictions | Final destination for model results (scores, identifiers, confidence) for downstream use and reporting. |
2.2.2 Databricks Bronze/Silver Schema (Target)
Snowflake data syncs to Databricks, where Spark performs feature engineering, transformation, and inference. Processed, standardized data is stored in the prod_silver_dts_data schema (our Lakehouse’s key layer), with tables optimized for Spark.
| Naming Convention | Role | Description |
| {project_name}_feature_store | Centralized Feature Repository | Core repository for all cleaned, transformed, and engineered features, ensuring consistent use for training and inference, preventing drift. |
| {project_name}_inferred_output | Inference Job Results (Pre-Sync) | Temporary Databricks storage for inference output before validation and synchronization to the customer_playbook_{project}_output table in Snowflake. |
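Because the conventions above are purely mechanical, table names can be derived from a project identifier instead of being typed by hand. The sketch below shows one way to do that. The helper names and the example project `upsell` are hypothetical, and placing both Databricks tables under `prod_silver_dts_data` is an assumption based on the schema described in this section.

```python
def snowflake_tables(project):
    """Build the Snowflake table names for one project."""
    prefix = f"customer_playbook_{project}"
    return {role: f"{prefix}_{role}" for role in ("train", "inference", "output")}

def databricks_tables(project_name, schema="prod_silver_dts_data"):
    """Build the Databricks table names; the schema qualifier is an assumption."""
    return {
        "feature_store": f"{schema}.{project_name}_feature_store",
        "inferred_output": f"{schema}.{project_name}_inferred_output",
    }

# "upsell" is a hypothetical project identifier used only for illustration.
sf = snowflake_tables("upsell")
dbx = databricks_tables("upsell")
print(sf["train"], dbx["feature_store"])
```

Generating names this way means a new project needs only an identifier, and jobs cannot drift from the convention through typos.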
3 Feature Engineering Strategy: A Foundation for Reliable Predictive Modeling
Feature engineering is the foundation of reliable prediction, and it requires a change in how we practice Machine Learning Operations. Today, features are built in many different ways with little organization. We want a single place that holds all features: a Feature Store. One major problem is that features are often not correct as of the time of prediction, which is known as the “Point-in-Time” problem. A disciplined feature engineering strategy and a Feature Store are key to models that perform well in production.
3.1 Point-in-Time Correctness: Preventing Future Peeking
Point-in-Time (PIT) correctness is essential for trustworthy models. When we build training data to predict a future event, every piece of information used must be something we would have known at that time. To enforce this, every feature is linked to a reference date, the as_of_date timestamp. For example, days_since_last_login is computed relative to the as_of_date, not the current system date. The resulting features are a snapshot of what was known at the moment of prediction.
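A minimal sketch of a point-in-time feature follows, assuming login events are available as plain dates. The function signature and the sample data are hypothetical. The key detail is that logins after the as_of_date are discarded before the feature is computed.

```python
from datetime import date

def days_since_last_login(login_dates, as_of_date):
    """Days between as_of_date and the latest login known at that time."""
    # Ignore anything that happened after the reference date (no future peeking).
    known = [d for d in login_dates if d <= as_of_date]
    if not known:
        return None  # the customer had not logged in yet as of that date
    return (as_of_date - max(known)).days

logins = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 10)]

# The 10 February login is invisible to a prediction made on 1 February.
feature = days_since_last_login(logins, as_of_date=date(2024, 2, 1))
print(feature)  # 12
```

Computing the same feature against today's date would silently use the February login and leak future information into the training row.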
3.2 Leakage Detection Protocols
Every candidate feature must pass a data leakage check before it enters the Feature Store. This check acts as quality control and includes:
1. Timestamp Validation (PIT Check): We verify that the timestamps of the events a feature is derived from are never later than the target label’s as_of_date. For example, a feature derived from an event that occurs after the target event must not be used.
2. Correlation Analysis (Suspicious Feature Check): We look for features that are very highly correlated with the target label, or that share a lot of mutual information with it. Such features can act as a proxy for the label itself. For instance, a feature that only exists after the target event is suspicious and must be investigated. These features usually have to be excluded.
3. Availability Checks (Historical Consistency Check): We verify that a feature is consistently available for all entities of interest across the entire training period, so that no groups are missing values because of process or schema changes.
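The three checks can be sketched with the standard library alone. This is an illustration under stated assumptions, not the production protocol: the row layout (`event_date`, `as_of_date`), the 0.95 correlation threshold, and the use of Pearson correlation as the suspicion test are all hypothetical choices.

```python
from datetime import date
from statistics import mean, pstdev

def pit_violations(feature_rows):
    """Timestamp validation: rows whose source event is later than as_of_date."""
    return [r for r in feature_rows if r["event_date"] > r["as_of_date"]]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

def is_suspicious(feature_values, labels, threshold=0.95):
    """Correlation analysis: flag features that nearly reproduce the label."""
    return abs(pearson(feature_values, labels)) >= threshold

def availability(feature_values):
    """Availability check: share of entities with a non-missing value."""
    return sum(v is not None for v in feature_values) / len(feature_values)

rows = [
    {"event_date": date(2024, 1, 10), "as_of_date": date(2024, 2, 1)},
    {"event_date": date(2024, 2, 5), "as_of_date": date(2024, 2, 1)},  # leaks
]
labels = [0, 1, 0, 1, 1]

print(len(pit_violations(rows)))
print(is_suspicious([0.0, 1.0, 0.0, 1.0, 1.0], labels))  # label proxy
print(is_suspicious([3, 1, 4, 1, 5], labels))
print(availability([1.0, None, 2.0, 3.0]))
```

In practice the availability ratio would be computed per time window and per segment, so that a feature introduced midway through the training period shows up as a gap.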
3.3 Normalization and Drift Management
To guarantee consistent and reliable behavior between the model’s training phase and its subsequent inference phase, a critical aspect of production ML, all numeric features undergo standardization. Specifically, we employ Z-score normalization:
z = (x − µ) / σ
The statistics (µ and σ) must be computed only on the training data and stored as immutable metadata in the Feature Store. During inference, these stored training-time µ and σ are applied to live data. This is crucial to prevent “training-serving skew,” to ensure the model receives data scaled against the exact distribution it was trained on, maintaining predictable production performance.
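The sketch below illustrates the fit-once, apply-everywhere pattern described above. The `ScalerStats` class and function names are hypothetical, and the population standard deviation is an assumed choice. The frozen dataclass stands in for the immutable metadata kept in the Feature Store.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass(frozen=True)  # frozen: the stored statistics cannot be reassigned
class ScalerStats:
    mu: float
    sigma: float

def fit_scaler(train_values):
    """Compute mu and sigma on training data only."""
    return ScalerStats(mu=mean(train_values), sigma=pstdev(train_values))

def transform(values, stats):
    """Apply z = (x - mu) / sigma using the stored training-time statistics."""
    return [(x - stats.mu) / stats.sigma for x in values]

train = [10.0, 20.0, 30.0]
stats = fit_scaler(train)

# Live data is scaled with the training statistics, never with its own.
live_scaled = transform([20.0, 40.0], stats)
print(stats, live_scaled)
```

A production version would also need to handle constant features, where sigma is zero and the division is undefined.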
4 The Hybrid Workflow for Model Development and Deployment
Our methodology is a Hybrid Workflow for model development and deployment built around a Human-in-the-Loop. It combines the speed of AI for model generation and hyperparameter optimization with human expertise. AI handles the computational heavy lifting so that data scientists can focus on defining problems, refining features, interpreting outputs, and planning deployment strategies. This ensures we can move fast while maintaining quality and fair governance.
4.1 Automated Model Generation (AutoML)
Automated Model Generation (AutoML) is a core part of the Hybrid Workflow. Our framework uses Databricks AutoML to create models, so we do not have to select and tune algorithms by hand. It works quickly, trying many models in parallel to find the best ones. The main models we use are:
• XGBoost: High-performance for structured data, used as a benchmark.
• LightGBM: Optimized, distributed gradient boosting for speed with large datasets.
• Logistic Regression: An interpretable baseline ensuring simplicity and stability.
Each model is evaluated against a single, pre-defined priority metric (e.g., metric_priority: “AUC_PR”). This strategic automation enables the rapid prototyping and deployment of highly optimized models, drastically reducing the weeks of dedicated data science labor typically required.
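Selecting a winner by a single priority metric reduces to one comparison over the run leaderboard. The sketch below assumes each run is available as a dictionary of metrics. The leaderboard values are made up for illustration and are not results from this project.

```python
def select_best(leaderboard, metric_priority):
    """Return the run with the highest value of the priority metric."""
    return max(leaderboard, key=lambda run: run[metric_priority])

# Illustrative numbers only; real values would come from the tracked runs.
leaderboard = [
    {"algorithm": "xgboost", "AUC_PR": 0.71, "AUC_ROC": 0.88},
    {"algorithm": "lightgbm", "AUC_PR": 0.74, "AUC_ROC": 0.87},
    {"algorithm": "logistic_regression", "AUC_PR": 0.62, "AUC_ROC": 0.81},
]

best = select_best(leaderboard, metric_priority="AUC_PR")
print(best["algorithm"])  # lightgbm
```

Note that the choice of metric changes the winner here: ranking the same runs by AUC_ROC would pick XGBoost, which is why the priority metric is fixed in advance.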
4.2 The Model Registry & Promotion Logic
The MLflow Model Registry serves as the single, authoritative source for all models, ensuring reliability and governance through a strict four-stage lifecycle:
1. Experimentation: AutoML generates “Candidate” models, automatically tracking all runs, parameters, and metrics.
2. Staging: The single best-performing Candidate model moves to “Staging” for final, pre-production assessment.
3. Validation: Staging models must pass an automated validation job against a held-out test or shadow dataset, ensuring robust generalization.
4. Production: Promotion is conditional: the Staging model must outperform the current Production model on validation metrics, or be the first model for the task.
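The promotion condition in step 4 can be expressed as a small gate function. This is a sketch of the rule as stated, with a hypothetical function name and metric dictionaries. Using a strict greater-than comparison (so ties keep the current model) is an assumption.

```python
def should_promote(staging_metrics, production_metrics, metric="AUC_PR"):
    """Decide whether the Staging model replaces the Production model."""
    if production_metrics is None:
        # No Production model exists yet, so the first model is promoted.
        return True
    # Ties keep the incumbent (assumed policy).
    return staging_metrics[metric] > production_metrics[metric]

print(should_promote({"AUC_PR": 0.74}, None))              # True
print(should_promote({"AUC_PR": 0.74}, {"AUC_PR": 0.71}))  # True
print(should_promote({"AUC_PR": 0.70}, {"AUC_PR": 0.71}))  # False
```

Both sets of metrics should come from the same validation dataset, otherwise the comparison is not meaningful.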
These stages are driven by scheduled Databricks workflows, which keep the predictive model performing reliably in production.
Figure 3: Automated Training and Evaluation Pipeline.
5 Implementation & Orchestration
The system’s operational heartbeat is maintained through a highly reliable and scheduled series of Databricks workflows, designed to ensure both timely inference and continuous model performance improvement. These automated pipelines are the backbone of how the predictive model is deployed and managed in production.
5.1 Daily Inference Pipeline
The wf_{project}_inference_daily job generates daily predictions by processing only incremental data for efficiency. The process involves five steps:
1. Incremental Data Ingestion: The Transfer Utility fetches new/updated rows from Snowflake using timestamp columns and stages the data in Databricks.
2. Production Model Retrieval: The approved production model is loaded from the MLflow Model Registry (models:/marketing_{project}_model/Production) to ensure consistency.
3. Prediction Generation: The scoring engine uses the fresh data and production model to generate predictions (labels), probabilities (confidence scores), and model version metadata for traceability.
4. Results Publication: The scored results and metadata are synced back to designated tables in Snowflake.
5. Consumption: The structured data is immediately available to downstream business systems like Salesforce/CRMs to inform marketing and sales strategies.
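The scoring steps above can be sketched without any platform dependencies. In this illustration a plain Python function stands in for the registered model. The row fields, the 0.5 decision threshold, the `upsell` project name, and the version string are all hypothetical.

```python
def model_uri(project, stage="Production"):
    """Build the registry URI in the format used by the pipeline."""
    return f"models:/marketing_{project}_model/{stage}"

def score_batch(rows, predict_proba, model_version, threshold=0.5):
    """Attach a label, a probability, and version metadata to each row."""
    results = []
    for row in rows:
        probability = predict_proba(row)
        results.append({
            "account_id": row["account_id"],
            "probability": probability,
            "label": int(probability >= threshold),
            "model_version": model_version,  # kept for traceability
        })
    return results

# Stand-in for the real model: in an actual job, the model would be loaded
# from the registry by URI, for example with mlflow.pyfunc.load_model(uri).
def fake_model(row):
    return min(1.0, row["logins_30d"] / 20)

uri = model_uri("upsell")
batch = [{"account_id": 1, "logins_30d": 4}, {"account_id": 2, "logins_30d": 16}]
scored = score_batch(batch, fake_model, model_version="3")
print(uri, scored)
```

Writing the model version on every scored row makes it possible to trace any downstream decision back to the exact model that produced it.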
5.2 Bi-Weekly Training Pipeline
To combat model drift and maintain high predictive accuracy, a dedicated biweekly training and promotion job, wf_{project}_training_biweekly, is executed every 14 days.
The four phases are:
1. Refresh: All historical data, including features and target labels, is updated so that the training set is complete and current.
2. Retrain: The AutoML process trains many candidate models across different algorithm families on the refreshed data and selects the best performers according to the pre-defined metric.
3. Evaluate & Promote: New candidate models are rigorously tested against the current production model using metrics such as AUC, precision, recall, and stability. If a candidate significantly outperforms the current model and satisfies all governance rules, it is automatically moved to Staging in the MLflow Model Registry, one step away from Production.
4. Notify: The system generates a report covering the best model’s performance, the champion versus challenger comparison, and whether a new model was promoted. The report is posted to the project Slack channel so everyone can see it right away.
5.3 Scheduling and SLAs
The prediction system is reliable because it adheres to defined Service Level Agreements (SLAs) that guarantee data freshness. These SLAs ensure that business teams receive information on time, which directly affects how well sales and marketing perform. The daily inference job typically runs before the business day starts, for example at 6:00 AM CST, so that decision-makers and automated systems have up-to-date information. Monitoring tools track when pipelines start and finish. If a job fails or runs too long, these tools send alerts immediately, which helps us resolve problems quickly and minimize business impact.
Table 1: Operational SLAs and Schedule
| Job Type | Frequency | Time (UTC) | SLA |
| Inference | Daily | 3:00 | < 60 min |
| Training | Bi-Weekly | 2:00 | < 120 min |
| Stats/Metrics | Weekly | 4:00 | < 30 min |
| Data Sync | Hourly | – | < 15 min |
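The SLA limits in Table 1 lend themselves to an automated check on each job run. The sketch below is one possible form of that check. The function name and the sample start and finish times are hypothetical; the limits are taken from the table.

```python
from datetime import datetime

# SLA limits in minutes, taken from Table 1.
SLA_MINUTES = {"Inference": 60, "Training": 120, "Stats/Metrics": 30, "Data Sync": 15}

def sla_breached(job_type, started_at, finished_at):
    """True when a run took as long as, or longer than, its SLA limit."""
    elapsed_minutes = (finished_at - started_at).total_seconds() / 60
    # The SLA is a strict "less than", so hitting the limit counts as a breach.
    return elapsed_minutes >= SLA_MINUTES[job_type]

inference_late = sla_breached(
    "Inference", datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 1, 3, 42)
)
sync_late = sla_breached(
    "Data Sync", datetime(2024, 1, 1, 5, 0), datetime(2024, 1, 1, 5, 20)
)
print(inference_late, sync_late)  # False True
```

A breach result of this kind is what would trigger a message to the alerting channel.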
Table 2: Operational Performance Comparison
| Metric | Manual Process | Hybrid Framework |
| Model Dev Time | 3-4 Weeks | 2-3 Days |
| Data Latency | 24+ Hours | < 1 Hour |
| Feature Reuse | 0% (Siloed) | 100% (Shared) |
| Deployment Freq. | Quarterly | Bi-Weekly |
6 Operational Observability
In order to maintain stakeholder trust, we developed a robust operational observability layer. This system offers a real-time, 360-degree view of model health and performance, enabling proactive anomaly resolution. Immediate, pervasive notifications are integrated via Slack, our central communication hub.
6.1 Drift Detection
The model needs representative data to make good predictions, so we monitor for drift continuously. For the most important features, we compare summary statistics, including the mean, standard deviation, and skew, between training time and serving time. The system then computes a score that measures how far the live data has diverged from the training data. If the score for any monitored feature exceeds a set threshold, the system immediately sends a Drift Alert for that model. The MLOps team is also notified of system problems such as pipeline failures, anomalous behavior, or instrumentation errors. This prevents predictions from degrading silently and preserves the quality of the decisions that depend on them.
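The exact drift score is not specified here, so the sketch below uses one simple possibility as an assumption: the shift in the mean, measured in units of the training standard deviation. The 0.5 threshold, the feature names, and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def summarize(values):
    """Mean, standard deviation, and skew of one feature."""
    mu, sigma = mean(values), pstdev(values)
    skew = mean([((x - mu) / sigma) ** 3 for x in values]) if sigma else 0.0
    return {"mean": mu, "std": sigma, "skew": skew}

def drift_score(train_stats, live_stats):
    """Assumed score: mean shift in units of the training standard deviation."""
    return abs(live_stats["mean"] - train_stats["mean"]) / train_stats["std"]

def drifted_features(train, live, threshold=0.5):
    """Names of monitored features whose score exceeds the threshold."""
    return [
        name for name in train
        if drift_score(summarize(train[name]), summarize(live[name])) > threshold
    ]

train = {"logins_30d": [4, 6, 8, 10, 12], "seats_used": [10, 12, 14, 16, 18]}
live = {"logins_30d": [5, 7, 8, 9, 11], "seats_used": [16, 18, 20, 22, 24]}

alerts = drifted_features(train, live)
print(alerts)  # ['seats_used']
```

Other scores, such as the Population Stability Index or a Kolmogorov-Smirnov statistic, would fit the same structure and are more sensitive to changes in distribution shape.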
6.2 Alerting Mechanisms
The platform monitors itself through a tiered alerting system that sends real-time updates to dedicated Slack channels, so people can quickly decide what to do.
• #ml-model-stats: Posts detailed, quantitative reports (AUC, Precision, Recall, F1-Score, confusion matrices) after every training/validation run for data scientists to track performance and stability.
• #ml-platform-alerts: Critical operations channel for system health incidents and failures (e.g., missed SLA latency, feature pipeline crashes, data sync errors, deployment failures), requiring immediate action from MLOps and engineering.
Successful training generates a comprehensive notification payload including:
• Leaderboard Summary: Compares the top model’s performance to competing AutoML algorithms.
• Model Version/Artifact Hash: Unique ID of the new, validated, containerized production model (e.g., v3.1.5-prod).
• Quantitative Improvements: Clear metrics showing the new model’s gain over the production version (e.g., “+0.02 AUC vs previous version,” “25% reduction in False Positives”).
• Resource Utilization: Compute cost/optimization metadata from the training run.
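The payload fields listed above can be assembled into a single message body. The sketch below builds a JSON string with `channel` and `text` keys. This shape, the function name, and all sample values are assumptions for illustration, and the actual delivery to Slack is left out.

```python
import json

def build_training_notification(leaderboard, model_version, new_auc,
                                previous_auc, compute_minutes):
    """Assemble the post-training summary as a JSON message body."""
    best = max(leaderboard, key=lambda run: run["auc"])
    lines = [
        f"Leaderboard: best={best['algorithm']} (AUC {best['auc']:.2f}) "
        f"of {len(leaderboard)} candidates",
        f"Model version: {model_version}",
        f"Improvement: {new_auc - previous_auc:+.2f} AUC vs previous version",
        f"Compute: {compute_minutes} min",
    ]
    return json.dumps({"channel": "#ml-model-stats", "text": "\n".join(lines)})

# Illustrative values only.
body = build_training_notification(
    leaderboard=[{"algorithm": "lightgbm", "auc": 0.84},
                 {"algorithm": "xgboost", "auc": 0.83}],
    model_version="v3.1.5-prod",
    new_auc=0.84,
    previous_auc=0.82,
    compute_minutes=47,
)
payload = json.loads(body)
print(payload["text"])
```

Keeping the message builder separate from the delivery code makes the report format easy to test without network access.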
7 Results and Business Impact
The strategic deployment of this hybrid, automated machine learning framework has transformed how the organization leverages its customer data across the entire customer lifecycle, shifting it from reactive analysis to proactive, predictive engagement.
7.1 Operational Metrics
The shift to an automated, framework-driven MLOps paradigm has vastly improved efficiency and reduced operational load.
• Faster Deployment: Standardized MLOps pipelines cut model deployment time by over 80%, from weeks to a single day.
• Feature Reuse: The centralized Feature Store enables instant reuse of validated features, halving feature preparation time and eliminating redundant code.
• Fresher Models: Automated bi-weekly retraining significantly reduces the average age of models in production.
7.2 Business Outcomes
The framework is making a measurable difference in key business metrics. We piloted it in a project focused on the “Value Realization” phase for upsell opportunities, and the results were strong. The model achieved a prediction accuracy score of 0.89, indicating strong discriminative ability. It also reduced false positives by 22 percent compared to the old system, which let the sales team focus on the most promising leads. As a result, lead-to-sale conversion rose by 15 percent for some accounts, generating additional revenue for the company. The model is also stable and scalable: it processes over 500,000 customer records every day without issues.
8 Conclusion
Scaling SaaS companies must cope with a rapidly growing volume of customer data, more than traditional analysis can handle. The Hybrid AI Framework addresses this problem by helping extract the most value from that data. We used Snowflake to store our data and Databricks AutoML to build models quickly. The combination worked well and produced a system that is efficient and handles large data volumes.
Author: Shishir Tewari, Senior Manager, Data Engineering, Procore Technologies