Predictive Intent Data: Build ML Models for Early Detection
Predictive intent data tells you who's about to buy, typically 30-90 days before reactive intent signals appear. Predictive models identify early signals in your historical customer data, then apply those patterns to current accounts.
This guide covers building supervised ML models, training data requirements, feature engineering, model accuracy, and retraining cadence. See: Pipeline Velocity Optimization and Intent Data Tools Comparison.
What Is Predictive Intent?
Predictive intent is a machine learning model trained on your historical customer data to predict which accounts are likely to buy in the next 30-90 days based on signals observed today.
Reactive vs. Predictive:
Reactive intent: "Company X is actively researching solutions" (they're buying now) Predictive intent: "Company Y has early signals matching companies that bought 60 days later" (they're about to buy)
Predictive intent questions answered:
- Which accounts are likely to be in-market in the next 30 days? (early warning)
- Which of our customers are likely to expand in the next 90 days? (expansion prediction)
- Which accounts are showing churn risk in the next 6 months? (retention warning)
- Which target accounts should we prioritize this quarter? (propensity scoring)
Supervised vs. Unsupervised Learning for Intent
Two approaches to building predictive intent models: supervised and unsupervised.
Supervised Learning (Most Common)
How it works:
1. Start with historical customer data: companies that bought (positive examples) and companies that didn't (negative examples)
2. Extract features from their behavior 60-90 days before purchase (or non-purchase); a sketch of this step follows the list
3. Train a model to identify patterns that predict purchase
4. Apply the model to current accounts to predict future buying
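A minimal sketch of steps 1-2, assuming two hypothetical CRM exports, deals.csv (one row per closed-won deal) and activity.csv (one row per engagement event); all file and column names are illustrative:

```python
import pandas as pd

# Hypothetical exports: deals.csv (account_id, close_date) and
# activity.csv (account_id, event, timestamp). Names are illustrative.
deals = pd.read_csv("deals.csv", parse_dates=["close_date"])
activity = pd.read_csv("activity.csv", parse_dates=["timestamp"])

rows = []
for _, deal in deals.iterrows():
    # Observe behavior in the 90-to-60-day window before the deal closed
    start = deal["close_date"] - pd.Timedelta(days=90)
    end = deal["close_date"] - pd.Timedelta(days=60)
    window = activity[(activity["account_id"] == deal["account_id"])
                      & activity["timestamp"].between(start, end)]
    rows.append({
        "account_id": deal["account_id"],
        "email_opens": (window["event"] == "email_open").sum(),
        "page_views": (window["event"] == "page_view").sum(),
        "purchased": 1,  # positive label; build negatives the same way with 0
    })
train_df = pd.DataFrame(rows)
```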
Example features:
- Website engagement: pages visited, time on site, return visits
- Email engagement: opens, clicks, download rates
- Content downloaded: topics, types
- Company characteristics: size, growth rate, funding, hiring
- Third-party signals: intent data, job changes, analyst coverage
- Technographics: tools used, tech stack maturity
Data required:
- Minimum: 50 closed-won deals (positive) + 200 non-customers (negative)
- Better: 200 closed-won + 1,000 non-customers
- Best: 500 closed-won + 5,000 non-customers
Accuracy:
- Typical supervised model: 70-85% accuracy at predicting who will buy
- With tuning: 80-90% accuracy
- Real-world caveat: test accuracy often runs 10-15 points higher than production accuracy, because training data is never perfectly representative
Training process:
1. Split data: 70% for training, 30% for testing
2. Train the model on the 70% (e.g., random forest, gradient boosting, neural network)
3. Test on the held-out 30% to measure accuracy
4. If accuracy is 75%+, deploy to production
5. If it's lower, add more features or data and retrain
Pros:
- Directly optimizes for your business (what actually converts for you)
- Leverages your proprietary data (a competitive advantage)
- Typically higher accuracy than unsupervised
Cons:
- Requires historical customer data (it may take 6-12 months to accumulate enough)
- Requires technical expertise (data science skills)
- Prone to bias if the training data is biased
Unsupervised Learning
How it works:
1. Start with all accounts (customers + prospects)
2. Find patterns in how similar accounts behave, without labeling "bought" vs. "didn't"
3. Cluster accounts into groups based on behavioral similarity
4. Identify which clusters contain your highest-value customers
5. Score all accounts on how similar they are to your best-customer clusters (see the sketch below)
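A minimal clustering sketch with scikit-learn, assuming X_all is a numeric feature matrix covering all accounts (no purchase labels required); the cluster count of 5 is illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Scale features so no single signal dominates the distance metric
X_scaled = StandardScaler().fit_transform(X_all)  # X_all is assumed, see above

# Cluster accounts by behavioral similarity (no labels needed)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_scaled)

# Next: find the cluster(s) your best customers land in, then score prospects
# by their distance to those cluster centers via kmeans.transform(X_scaled)
```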
Example use cases:
- Lookalike modeling: "Find accounts similar to our best customers"
- Segment discovery: "What are the natural account clusters in our market?"
- Cohort analysis: "Which behaviors correlate with retention?"
Data required:
- Minimum: 500 accounts with behavioral data (no purchase labels needed)
- Better: 2,000+ accounts
- Easier than supervised because you don't need to label who bought
Accuracy:
- Unsupervised models typically reach 60-75% accuracy
- Less accurate than supervised because they're not directly optimized for your outcome
- But faster to deploy (no historical purchase data needed)
Pros:
- Faster to implement (no 6-month data accumulation)
- No historical data labeling needed
- Can discover unexpected patterns (what makes customers similar)
Cons:
- Less accurate than supervised (not optimized for your specific outcome)
- May identify irrelevant patterns
- Requires more interpretation
---

Building a Supervised Predictive Intent Model: 8-Week Process
Week 1-2: Data Preparation
Step 1: Gather historical data
Export from your CRM:
- All customers who purchased in the last 2 years (closed-won deals)
- Deal amount, industry, company size, close date
- Contact names, titles, company names
- Sales cycle length (first touch to close)
Export prospects who didn't convert:
- Accounts contacted in the past 2 years that didn't buy (closed-lost or abandoned)
- Why they didn't convert (lost to a competitor? budget? no fit?)
Step 2: Extract behavioral features
For each customer and non-customer, look back 60-90 days before purchase/contact:
From your marketing automation platform:
- Email opens (count, rate)
- Email clicks (count, rate)
- Unsubscribes, bounces
- Content downloads (count, topic, type)
From your website analytics:
- Sessions (count, duration, frequency)
- Pages visited (especially pricing and product pages)
- Time on site
- New vs. returning visitor
From your CRM:
- Sales activities (calls, meetings booked)
- Contact engagement (responses, meeting attendance)
- Company size, industry, geography
From third-party data:
- Intent score (Bombora, 6sense, G2)
- Job changes (LinkedIn API)
- Funding activity
- Hiring growth
Create a spreadsheet (or CSV) where each row is an account and each column is a feature:
| Company | Founded | Employees | Industry | Intent Score | Email Opens | Page Views | Content Downloads | Purchased (Y/N) |
|---|---|---|---|---|---|---|---|---|
| Company A | 2018 | 450 | SaaS | 65 | 12 | 34 | 3 | Y |
| Company B | 2015 | 200 | Tech | 35 | 2 | 5 | 0 | N |
| Company C | 2019 | 800 | SaaS | 78 | 18 | 56 | 5 | Y |
Step 3: Data cleaning
- Remove duplicates
- Handle missing data (fill gaps, or remove rows missing critical fields)
- Remove outliers (accounts with implausibly high or low values)
- Normalize numerical features (so large-scale features don't dominate small ones)
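A minimal cleaning sketch in pandas, assuming a hypothetical training_data.csv laid out like the table above; the column list is illustrative:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the feature table (one row per account, one column per feature)
df = pd.read_csv("training_data.csv")
df = df.drop_duplicates(subset="Company")

# Drop rows missing the label; fill remaining gaps with 0
df = df.dropna(subset=["Purchased (Y/N)"]).fillna(0)

# Clip extreme outliers to the 1st/99th percentiles
numeric_cols = ["Employees", "Intent Score", "Email Opens", "Page Views"]
for col in numeric_cols:
    df[col] = df[col].clip(df[col].quantile(0.01), df[col].quantile(0.99))

# Normalize numeric features so large scales don't dominate small ones
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```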
Week 2-3: Model Training
Step 1: Split data
- 70% training set (e.g., 300 customers + 1,200 non-customers)
- 30% test set (e.g., 130 customers + 520 non-customers)
Step 2: Select algorithm
Common supervised learning algorithms for intent prediction:
- Random Forest: usually the best balance of accuracy and interpretability
- Gradient Boosting (XGBoost, LightGBM): highest accuracy, more complex
- Logistic Regression: simple, fast, easiest to interpret
- Neural Networks: best for very large datasets (10,000+ examples)
For most B2B intent prediction: start with Random Forest or XGBoost.
Step 3: Train model
Use Python libraries (scikit-learn, XGBoost) or no-code tools (HubSpot's predictive lead scoring, Salesforce Einstein).
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X = feature matrix, y = purchased labels (1 = bought, 0 = didn't)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")  # should be 0.75+
```
Step 4: Feature importance
Identify which features matter most for prediction:
| Feature | Importance |
|---|---|
| Intent Score (Bombora) | 0.25 |
| Email Opens | 0.18 |
| Page Views | 0.16 |
| Company Size | 0.12 |
| Job Changes | 0.10 |
| Downloaded Content | 0.09 |
| Industry | 0.07 |
| Funding | 0.03 |
This tells you that intent score drives 25% of the model's predictive power, email engagement 18%, company size 12%, and so on.
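With scikit-learn, a table like the one above comes straight from the trained model, assuming X_train is a pandas DataFrame with named columns:

```python
import pandas as pd

# Rank features by their contribution to the model's predictions
importance = pd.Series(model.feature_importances_, index=X_train.columns)
print(importance.sort_values(ascending=False))
```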
Week 4: Validation and Tuning
Step 1: Test on held-out data
Run the model on the 30% test set. Measure:
- Accuracy: % of accounts correctly classified
- Precision: % of predicted buyers that actually bought
- Recall: % of actual buyers the model identified
- F1 score: the harmonic mean of precision and recall
Goal: 80%+ accuracy, 75%+ precision, 70%+ recall
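All four metrics are available in scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = model.predict(X_test)
print(f"Accuracy:  {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall:    {recall_score(y_test, y_pred):.2f}")
print(f"F1:        {f1_score(y_test, y_pred):.2f}")
```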
Step 2: Confusion matrix
Understand prediction errors:
| Actual | Predicted Buy | Predicted No Buy |
|---|---|---|
| Actual Buy | 117 (true positive) | 13 (false negative) |
| Actual No Buy | 58 (false positive) | 462 (true negative) |
- True positives (117): Correctly identified buyers - good
- False negatives (13): Missed buyers - cost of under-prediction
- False positives (58): Predicted buy but didn't - wasted sales effort
- True negatives (462): Correctly identified non-buyers - good
For ABM, you want a high true positive rate (find buyers) and a low false positive rate (don't waste sales time). If the false positive rate is high (>20%), adjust the threshold.
Step 3: Adjust threshold
By default, the model predicts "buy" when the predicted probability is above 50%. You can adjust:
- Raise to 70%: fewer predictions, higher confidence (fewer false positives, more false negatives)
- Lower to 30%: more predictions, lower confidence (more false positives, fewer false negatives)
For aggressive ABM: use 50% (balanced). For conservative sales: use 70% (only high-confidence).
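A minimal sketch of applying a custom threshold with scikit-learn:

```python
# Apply a custom decision threshold instead of the default 50%
buy_probability = model.predict_proba(X_test)[:, 1]  # P(buy) for each account
threshold = 0.70  # conservative; drop to 0.50 for a balanced ABM motion
predicted_buy = buy_probability >= threshold
```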
Week 5: Production Deployment
Step 1: Integrate with CRM
- Export the model as an API or SQL query
- Score all current accounts
- Create a "Predicted Intent Score" field in the CRM
- Rank accounts by score
Step 2: Scoring schedule
- Initial score: all historical accounts
- Ongoing: score new accounts as they enter the CRM
- Rescore existing accounts monthly (as their behavior changes)
Step 3: Set alert threshold
Define action triggers:
- "If account score > 75, escalate to sales within 24 hours"
- "If account score increases 20+ points in a month, flag as accelerating"
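A hypothetical scoring job, assuming accounts is a pandas DataFrame holding the model's feature columns (feature_cols) plus last_month_score from the previous run; both names are illustrative:

```python
# Score on a 0-100 scale and apply the alert rules above
accounts["predicted_intent_score"] = (
    model.predict_proba(accounts[feature_cols])[:, 1] * 100
).round()
escalate = accounts[accounts["predicted_intent_score"] > 75]
accelerating = accounts[
    accounts["predicted_intent_score"] - accounts["last_month_score"] >= 20
]
```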
Week 6-8: Monitor and Retrain
Step 1: Track predictions vs. outcomes
For accounts predicted to buy:
- Did they actually buy?
- How long did it take?
- What was the deal size?
Build a measurement dashboard, for example:
- Accounts scored 80+: 60% conversion to opportunity
- Accounts scored 60-79: 35% conversion
- Accounts scored 40-59: 15% conversion
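A sketch of the underlying query, assuming outcomes is a DataFrame with predicted_intent_score and a 0/1 became_opportunity column (both illustrative names):

```python
import pandas as pd

# Conversion to opportunity by score band
bands = pd.cut(outcomes["predicted_intent_score"],
               bins=[40, 60, 80, 101], right=False,
               labels=["40-59", "60-79", "80+"])
print(outcomes.groupby(bands, observed=True)["became_opportunity"].mean())
```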
Step 2: Identify drift
If week-6 predictions are 15% less accurate than week-1 predictions, the model is drifting. Causes:
- Your customer base changed (you're targeting a different segment)
- The market changed (buying signals look different)
- The training data is stale
Response: retrain the model with recent data.
Step 3: Plan retraining
Retrain the model every 3-6 months:
- Add new customers and non-customers from the past 3 months
- Re-evaluate feature importance (what predicts buying now?)
- Adjust model parameters if accuracy has declined
Model Decay and Retraining
Predictive models decay over time. Why?
- Distribution shift: Your customer base changes, or your market changes, so historical patterns don't apply
- Feature decay: If you stop collecting certain signals (e.g., email engagement drops company-wide), that feature becomes less predictive
- Seasonality: Buying patterns may differ by season (budget reset in Q1, hiring freeze in Q4)
- Product changes: If your product changes, customer fit signals change
Monitoring for decay:
Track these metrics weekly:
- Model accuracy: is it still 80%+?
- Precision: are predictions still reliable?
- Coverage: what % of accounts get scored?
- Lift: do scored accounts convert 3x better than random?
If accuracy drops below 75%, model is decaying.
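A minimal weekly decay check, assuming recent is a DataFrame of predictions whose outcome window has closed, with 0/1 columns predicted_buy and actually_bought (illustrative names):

```python
from sklearn.metrics import accuracy_score

# Compare recent predictions against what actually happened
accuracy = accuracy_score(recent["actually_bought"], recent["predicted_buy"])
if accuracy < 0.75:
    print(f"Model decaying: accuracy {accuracy:.2f}; schedule a retrain")
```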
Retraining frequency:
- Fast-moving market (VC-backed startups): retrain monthly
- Stable market (enterprise software): retrain quarterly
- Mature market: retrain semi-annually
Common Predictive Intent Mistakes
Mistake 1: Insufficient training data
With only 20 customers, your model will be unreliable. Wait until you have 100+ customers before building.
Mistake 2: Using the wrong features
If your training data is missing critical signals (e.g., no intent data), the model will be less accurate. Collect comprehensive data.
Mistake 3: Overfitting
The model achieves 95% accuracy on training data but 60% on new data: it memorized the training data instead of learning general patterns. Use cross-validation to prevent this.
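A quick cross-validation check with scikit-learn:

```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation: each fold is held out once while the model trains
# on the other four; a large gap vs. training accuracy signals overfitting
scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```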
Mistake 4: Not validating with sales
The model says Company X is likely to buy. Sales says, "We've been talking to them for 6 months; they're not serious." Validate model predictions with human judgment.
Mistake 5: No retraining schedule
Model accuracy decays. If you train once and never retrain, after 6 months you're making decisions on stale patterns. Schedule retraining.
Mistake 6: Treating prediction as certainty
"The model says 80% probability of buying" means "based on historical patterns, 8 out of 10 similar accounts bought." It doesn't guarantee this account will. Use the score as an input to the decision, not the decision itself.
---

Practical Implementation: When to Build vs. Buy
Build your own model if:
- You have 200+ customer purchase records
- You have data science resources
- You need to optimize for your specific segment
- You want a proprietary competitive advantage
Buy from a vendor if:
- You're early-stage (fewer than 50 customers)
- You lack data science expertise
- You want an out-of-the-box, low-maintenance solution
- You want to combine multiple data sources
Vendors providing predictive intent:
- 6sense: predictive account scoring
- Demandbase: AI-assisted account identification
- HubSpot: predictive lead scoring (built in)
- Salesforce Einstein: revenue intelligence and propensity scoring
- Custom: build with your data engineering team
Bringing It Together
Predictive intent identifies accounts showing early warning signs of buying intent. Combined with reactive intent (what's happening now), predictive intent creates a complete picture: "This account shows signals of buying in the next 30 days."
Start simple: a random forest model trained on your best 100 customers and 500 non-customers. Expect roughly 75-80% accuracy. Measure results for 4-8 weeks. If conversion lift is 2-3x, expand to all accounts. If not, add better features or more training data.
By month 6, you'll have a tuned model that's accurate and reliable. By month 12, you'll have retrained twice, and accuracy should be 85%+.
Predictive intent is a lever for early-stage account identification. Use it to reach out 60 days before reactive intent appears. That lead time compounds with faster sales cycles. See: Pipeline Velocity Optimization.
That lead time is a competitive advantage.