Most B2B teams inherit an account scoring model they did not design. It was built by a previous ops person, it runs on outdated criteria, and nobody is quite sure why certain weights are set the way they are. When it produces a list, sales ignores it because the top-scored accounts are wrong-fit companies.
Building a scoring model from scratch is not as complicated as it sounds. The hardest part is not the math. It is getting alignment on what “good account” means across sales, marketing, and leadership before you start assigning weights.
This guide walks through every stage: defining the model architecture, sourcing the data, assigning initial weights, validating against your historical data, and running the model in production.
An account scoring model produces a numerical representation of how likely a given company is to become a customer, weighted by how valuable that customer would be. It combines two types of inputs:
- Fit signals: Does this company look like the companies you have successfully sold to? Industry, company size, revenue range, tech stack, headcount in relevant departments, geography, business model.
- Behavioral signals: Is this company showing active interest in your category or your specific solution? Website visits, content downloads, email engagement, intent data spikes, event attendance, product trial activity.
A good scoring model weights both. A company that fits your ICP perfectly but is completely dark on behavioral signals is a good cold outbound target. A company that is highly engaged but a poor ICP fit is a weak sales target and a potential churn risk if it does close.
The goal of the model is to surface the accounts that are both high-fit and showing active interest, and to surface them before your competitors identify them.
Before assigning any weights, you need a precise definition of your ICP that can be operationalized in data fields. The ICP definition as a prose paragraph is useful for marketing positioning. For a scoring model, you need specific, matchable criteria.
Build the ICP data model by analyzing your best customers. Pull your last 12 to 24 months of closed-won deals. Filter for the customers with the highest ACV, lowest churn rate, and best net promoter scores. Those are your best customers.
For each best customer, capture:
- Industry and sub-industry
- Employee count at time of sale
- Annual revenue at time of sale (or estimated)
- Geography and primary market
- Technology in the relevant categories (CRM, marketing automation, data platforms)
- Headcount in marketing and sales departments specifically
- Funding stage if applicable
- Growth rate if available (headcount growth is a reasonable proxy)
Look for the attributes that cluster among your best customers. If 80% of your best customers are Series B to Series D B2B SaaS companies with 50 to 500 employees and a HubSpot or Salesforce implementation, those are your primary ICP dimensions.
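As a concrete illustration, here is a minimal sketch of that clustering pass in Python, assuming the closed-won records have already been exported. The field names and example values are placeholders, not your CRM's actual schema:

```python
from collections import Counter

# Illustrative best-customer records from the closed-won analysis.
# Map these placeholder fields to your CRM's actual fields.
best_customers = [
    {"industry": "B2B SaaS", "size_band": "50-500", "tech": "HubSpot", "stage": "Series B"},
    {"industry": "B2B SaaS", "size_band": "50-500", "tech": "Salesforce", "stage": "Series C"},
    {"industry": "Fintech", "size_band": "50-500", "tech": "Salesforce", "stage": "Series D"},
]

# Count how often each attribute value appears among best customers.
# Values present in roughly 80%+ of records are candidate primary ICP dimensions.
for field in ("industry", "size_band", "tech", "stage"):
    counts = Counter(c[field] for c in best_customers)
    for value, n in counts.most_common():
        share = n / len(best_customers)
        print(f"{field}={value}: {share:.0%} of best customers")
```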
Define negative ICP criteria too. Look at your closed-lost and churned accounts. What attributes were consistently present in deals that went badly? Certain industries, company sizes below a threshold, companies without a specific tech dependency, geographic markets you cannot serve well. These become negative scoring factors that pull accounts down in the model.
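One simple way to represent negative criteria is as point penalties subtracted from the fit score computed in the next step. A minimal sketch, where the specific criteria and penalty values are illustrative assumptions, not recommendations:

```python
# Illustrative negative scoring factors drawn from closed-lost and churn
# analysis; the flags and point penalties here are assumptions.
NEGATIVE_FACTORS = {
    "below_size_threshold": -15,
    "unserved_geography": -10,
    "missing_tech_dependency": -10,
}

def apply_negative_factors(score: float, flags: set[str]) -> float:
    """Subtract a penalty for each negative criterion present; floor at 0."""
    penalty = sum(NEGATIVE_FACTORS.get(flag, 0) for flag in flags)
    return max(0.0, score + penalty)

print(apply_negative_factors(70, {"unserved_geography"}))  # 60.0
```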
The fit score measures how closely a given account matches your ICP criteria.
Assign weights to each ICP dimension based on predictive importance. Not all criteria are equal. Some dimensions are strongly correlated with closed-won outcomes. Others are nice-to-have filters.
A common starting weighting structure:
| Dimension | Weight |
|---|---|
| Industry match (primary) | 25 points |
| Company size (employee range) | 20 points |
| Tech stack match (key integrations) | 20 points |
| Funding stage or revenue range | 15 points |
| Geographic match | 10 points |
| Department headcount (relevant function) | 10 points |
This adds to 100 points for perfect fit. Adjust the weights based on what you observe in your data. If tech stack is actually a stronger predictor of close rate than company size for your specific product, swap those weights.
Handle partial matches. ICP dimensions are rarely binary. An account may be in your target industry but at the edge of your preferred size range. Build partial credit into the model: full points for a direct match, half points for an adjacent match, zero for a miss.
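Here is a minimal sketch of a fit score with partial credit, using the starting weights from the table above. The dimension keys and match labels are illustrative:

```python
# Weights mirror the starting table above; adjust as your data dictates.
FIT_WEIGHTS = {
    "industry": 25,
    "company_size": 20,
    "tech_stack": 20,
    "funding_or_revenue": 15,
    "geography": 10,
    "dept_headcount": 10,
}

# Partial-credit multipliers: direct match, adjacent match, miss.
MATCH_CREDIT = {"full": 1.0, "adjacent": 0.5, "miss": 0.0}

def fit_score(matches: dict[str, str]) -> float:
    """matches maps each dimension to 'full', 'adjacent', or 'miss'."""
    return sum(
        weight * MATCH_CREDIT[matches.get(dim, "miss")]
        for dim, weight in FIT_WEIGHTS.items()
    )

# Example: target industry, edge of the size range, right stack, rest missing.
print(fit_score({"industry": "full", "company_size": "adjacent", "tech_stack": "full"}))
# -> 25 + 10 + 20 = 55.0
```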
Build the fit score in your CRM. Most CRM platforms support calculated fields or scoring rules. If your CRM cannot do this natively, run the scoring in a connected spreadsheet or a RevOps tool and push the score back to the account record via API.
The behavioral score measures how actively engaged an account is with your brand and category.
First-party behavioral signals (highest confidence, lowest cost): website visits, especially to high-intent pages like pricing and demo, content downloads, email engagement, event attendance, and product trial activity.
Third-party intent signals (lower confidence, broader coverage): intent data spikes on your category or solution topics, sourced from an intent data provider.
Score decay. Behavioral signals go stale. A demo page visit from three months ago should carry less weight than one from last week. Build a decay function into your behavioral score: full weight for activity within the last 14 days, 50% weight for activity 14 to 30 days ago, 25% weight for activity 30 to 60 days ago, minimal weight beyond 60 days.
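A sketch of that decay schedule in Python. The 0.05 floor for activity older than 60 days is an assumption standing in for "minimal weight":

```python
from datetime import datetime, timedelta, timezone

def decay_multiplier(activity_date: datetime, now: datetime) -> float:
    """Decay schedule: full weight under 14 days, then 50%, 25%, minimal."""
    age = (now - activity_date).days
    if age <= 14:
        return 1.0
    if age <= 30:
        return 0.5
    if age <= 60:
        return 0.25
    return 0.05  # assumed floor for "minimal weight" beyond 60 days

def behavioral_score(activities: list[tuple[datetime, float]], now: datetime) -> float:
    """activities: (timestamp, base_points) pairs, e.g. a demo page visit."""
    return sum(points * decay_multiplier(ts, now) for ts, points in activities)

now = datetime.now(timezone.utc)
activities = [
    (now - timedelta(days=3), 10.0),   # demo page visit last week: full weight
    (now - timedelta(days=90), 10.0),  # same visit three months ago: minimal weight
]
print(behavioral_score(activities, now))  # 10.0 + 0.5 = 10.5
```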
The most effective models combine fit and behavioral into a single composite score, with the weighting between the two components calibrated to your specific sales motion.
The weighting between fit and behavioral should reflect your go-to-market reality:
If you run primarily outbound, fit score should carry more weight. You are identifying accounts to go after, many of which have not yet engaged. A 70/30 split (70% fit, 30% behavioral) is reasonable for heavy outbound motions.
If you run primarily inbound or PLG, behavioral score should carry more weight. You have incoming signals to prioritize and your job is to identify which engaged accounts are worth sales investment. A 40/60 split (40% fit, 60% behavioral) makes more sense.
For a balanced inbound and outbound motion, start at 50/50 and adjust based on what you observe in the validation step.
Map the composite score to tiers. A composite score is only useful if it maps to an action:
- Tier 1: added to the active ABM list and worked directly by sales
- Tier 2: prioritized for outbound, watched for movement into Tier 1
- Tier 3: demand gen nurture, monitored for score increases
Calibrate the tier thresholds based on how many accounts you can realistically work at each tier. If your Tier 1 threshold produces 500 accounts but your sales team can only work 30, the threshold is too low.
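Putting the blend and the tier mapping together in a minimal sketch. The thresholds below are illustrative placeholders, and both component scores are assumed to be normalized to a 0 to 100 range:

```python
# Blend weights reflect the sales motion: 70/30 for outbound-heavy,
# 40/60 for inbound or PLG, 50/50 for a balanced motion.
def composite_score(fit: float, behavioral: float, fit_weight: float = 0.5) -> float:
    return fit_weight * fit + (1 - fit_weight) * behavioral

# Tier thresholds are illustrative; calibrate them so each tier produces
# a volume of accounts your team can actually work.
TIER_1_THRESHOLD = 80
TIER_2_THRESHOLD = 60

def tier(score: float) -> str:
    if score >= TIER_1_THRESHOLD:
        return "Tier 1"
    if score >= TIER_2_THRESHOLD:
        return "Tier 2"
    return "Tier 3"

s = composite_score(fit=90, behavioral=30, fit_weight=0.7)  # outbound motion
print(s, tier(s))  # 72.0 Tier 2
```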
Before running the scoring model in production, validate it against your historical closed-won and closed-lost data.
Retrospective scoring. Take a sample of your historical accounts from the past 12 months, run them through the model, and check whether the model would have scored your closed-won accounts higher than your closed-lost accounts.
Run a basic analysis: for the top quartile of scores, what percentage of accounts in that quartile actually closed? For the bottom quartile, what percentage closed? If the model is predictive, the top quartile close rate should be significantly higher than the bottom quartile.
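A sketch of that quartile analysis, assuming historical accounts have been scored retrospectively. The scores and outcomes below are made-up demo data:

```python
import statistics

# Historical accounts as (model_score, closed_won) pairs; demo data only.
history = [(92, True), (88, True), (81, False), (77, True),
           (64, False), (58, True), (45, False), (33, False),
           (70, True), (50, False), (40, False), (85, True)]

scores = sorted(s for s, _ in history)
qs = statistics.quantiles(scores, n=4)  # quartile cut points
q1, q3 = qs[0], qs[2]

def close_rate(accounts):
    return sum(won for _, won in accounts) / len(accounts)

top = [a for a in history if a[0] >= q3]
bottom = [a for a in history if a[0] <= q1]
print(f"top quartile close rate:    {close_rate(top):.0%}")
print(f"bottom quartile close rate: {close_rate(bottom):.0%}")
```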
Check for false positives and false negatives. False positives are high-scored accounts that never closed. Look at the attributes these accounts share. Are they in an industry that looks like your ICP on paper but actually buys differently? Is there a tech stack component missing from the model? False positives indicate dimensions that are in the model but have low predictive value.
False negatives are low-scored accounts that actually closed and became good customers. Look at what they have in common. Are they missing a dimension that should be in the model? False negatives indicate that the model is missing a relevant fit criterion.
Adjust weights based on validation findings. The first version of any scoring model is a hypothesis. The validation step is where you start replacing hypothesis with evidence. Expect to make two or three rounds of weight adjustments before the model is reliably predictive.
Once the model is validated, run it in production with a defined operating cadence.
Automated scoring updates. Behavioral scores should update in near-real-time as new engagement data arrives. Fit scores update less frequently because firmographic data changes slowly. A weekly refresh of fit scores and a daily refresh of behavioral scores is a reasonable cadence for most teams.
Scoring triggers and alerts. Build CRM automations that alert the account owner when a Tier 2 or Tier 3 account crosses the threshold into a higher tier. The alert should include the specific signals that drove the score increase, so the rep knows what angle to take in outreach.
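A generic sketch of the tier-crossing check and alert payload. In practice this logic lives in your CRM's workflow engine or a RevOps tool, and the field names here are illustrative:

```python
TIER_ORDER = {"Tier 3": 0, "Tier 2": 1, "Tier 1": 2}

def tier_alert(account: dict, old_tier: str, new_tier: str) -> dict | None:
    """Build an alert payload when an account moves up a tier."""
    if TIER_ORDER[new_tier] <= TIER_ORDER[old_tier]:
        return None  # no upward movement, no alert
    return {
        "owner": account["owner"],
        "account": account["name"],
        "moved": f"{old_tier} -> {new_tier}",
        # Include the signals that drove the increase, so the rep
        # knows what angle to take in outreach.
        "driving_signals": account["recent_signals"],
    }

alert = tier_alert(
    {"owner": "jane@example.com", "name": "Acme Co",
     "recent_signals": ["pricing page x3", "intent spike: integrations"]},
    old_tier="Tier 3", new_tier="Tier 2",
)
print(alert)
```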
Prevent over-rotation on score. Score is a prioritization tool, not a qualification tool. A high score tells you to look at the account now. It does not tell you the deal is qualified. Reps should use the score to prioritize their research and outreach, not to skip qualification steps.
Review the model quarterly. The world your model was trained on changes. New competitors enter the market. Your product expands into new use cases. Your ICP evolves as you learn more about who actually succeeds with your product. Schedule a quarterly 90-minute model review: look at false positive and false negative rates from the previous quarter, check whether the top-tier accounts are producing pipeline, and update the model accordingly.
The scoring model becomes most powerful when it feeds your ABM program automatically.
Score as the mechanism for ABM tier assignment. Instead of manually maintaining a target account list, let the score determine which accounts belong in which ABM tier. Accounts crossing the Tier 1 threshold get automatically added to the active ABM list. Accounts dropping below the Tier 2 threshold over consecutive weeks get moved to the demand gen nurture pool.
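A sketch of that list-maintenance rule. The two-consecutive-week window below implements "dropping below the Tier 2 threshold over consecutive weeks"; the exact window length is an assumption, and the thresholds match the illustrative values used earlier:

```python
TIER_1_THRESHOLD = 80
TIER_2_THRESHOLD = 60

def update_abm_lists(account_scores: dict[str, list[float]],
                     abm_list: set[str], nurture_pool: set[str]) -> None:
    """account_scores maps account id -> weekly composite scores, newest last."""
    for account, scores in account_scores.items():
        if scores[-1] >= TIER_1_THRESHOLD:
            # Crossed the Tier 1 threshold: add to the active ABM list.
            abm_list.add(account)
            nurture_pool.discard(account)
        elif len(scores) >= 2 and all(s < TIER_2_THRESHOLD for s in scores[-2:]):
            # Below the Tier 2 threshold for consecutive weeks: move to nurture.
            abm_list.discard(account)
            nurture_pool.add(account)

abm, nurture = {"acme"}, set()
update_abm_lists({"acme": [75, 55, 52], "globex": [70, 85]}, abm, nurture)
print(abm, nurture)  # {'globex'} {'acme'}
```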
Score as the trigger for personalization. If you have a website personalization layer, use account score as one of the inputs for the personalization decision. A high-scored account visiting your website should see the most targeted version of your messaging. A low-scored account should see the default experience.
Score as the input for content recommendations. Build content recommendation logic that surfaces different assets based on the account's score and the specific behavioral signals driving that score. An account spiking on integration topics should see integration-focused content. An account spiking on pricing topics should see comparison or ROI-focused content.
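A minimal sketch of that topic-to-asset mapping. The topic names and asset types are placeholders for your own taxonomy:

```python
# Illustrative mapping from the behavioral topics driving an account's
# score to the asset types worth surfacing.
TOPIC_TO_CONTENT = {
    "integrations": ["integration guides", "technical docs"],
    "pricing": ["comparison pages", "ROI calculator"],
}

def recommend_content(driving_signals: list[str]) -> list[str]:
    """Surface assets for each topic found in the account's driving signals."""
    recs = []
    for topic, assets in TOPIC_TO_CONTENT.items():
        if any(topic in signal for signal in driving_signals):
            recs.extend(assets)
    return recs

print(recommend_content(["spike: pricing pages", "webinar attended"]))
# -> ['comparison pages', 'ROI calculator']
```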
Building the model without sales buy-in. A scoring model that marketing builds in isolation will be ignored by sales. Get sales to co-design the ICP criteria and the tier thresholds. When reps have contributed to the model, they are more likely to use it.
Using too many dimensions. A model with 15 dimensions and complex weights is difficult to explain and difficult to maintain. Start with five to seven dimensions. Add complexity only when the data supports it.
Not building in score decay. A model without decay produces a growing list of high-scored accounts based on stale engagement data. Accounts that were highly engaged six months ago but have gone dark should not still be at the top of the list.
Mistaking correlation for causation. If your best customers happen to be based in the western United States, putting geography in the model as a primary dimension may overfit to a historical pattern that does not reflect future opportunity. Validate every dimension against causal logic: does this criterion actually predict success, or is it just correlated with past success in a way that could be coincidental?