Account scoring is the math that decides which accounts your team should care about today. Most teams either do not have a model at all (and run on rep instinct) or have one so over-engineered that the data team is permanently held hostage to it. There is a third option: a defensible, transparent, weighted-average model you can stand up in four weeks without a dedicated data scientist. This is how to set it up.
Full disclosure: Abmatic AI ships an account-scoring layer on top of CRM data, so we have a financial interest in teams running serious scoring programs. The framework here is platform-agnostic; the same model can be built in Snowflake plus dbt plus reverse ETL, in HubSpot's native scoring tool, in 6sense, or in Abmatic. The principles do not change.
A working account-scoring model has two layers: a fit score (firmographic and technographic) that is stable, and an intent score (engagement and signal) that is dynamic. Combine them with explicit weights, expose every input to the team, refresh the intent score daily and the fit score quarterly, and tune the thresholds against actual close-rate data after 90 days. Skip the black-box ML model on day one; start simple, ship, and iterate.
To see an account-scoring model running live on real CRM data, book a demo.
Without a score, prioritization happens by rep instinct, recency, or whoever yells loudest in the pipeline meeting. Rep instinct is undervalued; it is also unscalable, untestable, and biased toward whatever the rep saw work last quarter. Scoring is the artifact that turns prioritization into a system.
Most account-scoring projects fail in one of three ways:
The version that works is opinionated about exactly two things: the model has to be transparent (every rep can read the score and explain it), and it has to be living (refreshes happen automatically without a data-team ticket).
Almost every working account-scoring system in B2B is some version of fit times intent. Other compositions exist (additive, multiplicative, threshold-gated), but the conceptual split is universal: who they are versus what they are doing.
| Layer | What it measures | Refresh cadence | Who owns it |
|---|---|---|---|
| Fit score | Firmographic, technographic, and ICP attributes | Quarterly (daily for new accounts on entry) | Marketing operations |
| Intent score | Engagement and behavioral signal, first and third party | Daily | Marketing and rev ops jointly |
| Composite score | Combined ranking used for prioritization and routing | Daily (recomputed when either input changes) | Rev ops |
For the intent-side data foundations, see how to use intent data, first-party intent data, and predictive intent data.
This is the deliberately scoped build that ships in four weeks with one analyst plus part-time RevOps support, not the version that takes six months and a data-engineering squad. You can always add sophistication later; you cannot easily undo bad early-stage credibility.
Sit down with sales leadership, RevOps, and marketing for one workshop. Output: a list of 8 to 15 fit attributes and 5 to 10 intent attributes, each with a numeric weight that sums to 100 within the layer.
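The workshop output can be captured as plain data with a sanity check that each layer's weights actually sum to 100. A minimal sketch, with all attribute names and weights illustrative, not prescriptive:

```python
# Workshop output: attribute -> weight. Names and values are illustrative.
fit_weights = {
    "industry": 20, "employee_count": 20, "revenue_band": 15,
    "geography": 10, "tech_stack_adjacency": 15,
    "funding_signal": 10, "hiring_signal": 10,
}
intent_weights = {
    "pricing_page_visits": 20, "comparison_page_visits": 15,
    "demo_form": 20, "third_party_surge": 15,
    "trigger_events": 10, "sales_engagement": 10, "content_engagement": 10,
}

# Guardrail: weights must sum to 100 within each layer.
for name, weights in [("fit", fit_weights), ("intent", intent_weights)]:
    total = sum(weights.values())
    assert total == 100, f"{name} weights sum to {total}, expected 100"
```

Keeping the weights in version-controlled data rather than buried in tool configuration makes the quarterly tuning conversation concrete: the diff is the decision.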
Do not skip the workshop. The whole credibility of the model depends on sales leadership having co-authored the weights. A score that the data team unilaterally assigned will be argued with forever; a score the field helped author will be defended.
Three categories of source data:
The data is rarely as clean as the workshop assumed. Plan for one analyst-week of data cleaning. The most common gotchas: industry codes inconsistent between sources, employee counts off by an order of magnitude in long-tail accounts, and tech-stack signals stale by 12 to 24 months.
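The gotchas above are cheap to catch mechanically before scoring starts. A hypothetical sketch, assuming two enrichment sources per account (function and field names are illustrative):

```python
def employee_count_suspect(count_a: int, count_b: int) -> bool:
    """Flag accounts where two sources disagree by roughly an order of magnitude."""
    if not count_a or not count_b:
        return True  # missing data is itself a flag for review
    ratio = max(count_a, count_b) / min(count_a, count_b)
    return ratio >= 5  # tune the threshold to your sources

# Map inconsistent industry labels from different sources onto one taxonomy.
# A real alias table would be longer; this shape is the point.
INDUSTRY_ALIASES = {
    "software": "Software", "saas": "Software", "computer software": "Software",
    "fintech": "Financial Services", "fin tech": "Financial Services",
}

def normalize_industry(raw: str) -> str:
    return INDUSTRY_ALIASES.get(raw.strip().lower(), "Unreviewed")
```

Anything that lands in the "Unreviewed" or suspect pile goes to the analyst, not into the model; scoring on known-bad inputs burns credibility faster than a slow launch does.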
This is the part where the temptation to over-engineer is strongest. Resist it. The first version of the model is a weighted average. Each input scores 0 to 100, the layer score is the weighted sum, the composite is fit times intent or fit plus intent (pick one and document it).
Multiplicative composites (fit times intent) penalize low-fit-high-intent accounts heavily; additive composites treat fit and intent as substitutes. Multiplicative is closer to how most B2B sales actually work; additive is easier to explain to executives. Either is defensible.
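The whole layer math fits in a few lines. A sketch assuming each input has already been mapped to 0-100 per the rubric (names illustrative):

```python
def layer_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0-100 input scores; weights sum to 100 within a layer."""
    assert set(scores) == set(weights), "every weighted input needs a score"
    return sum(scores[k] * weights[k] for k in weights) / 100

def composite_multiplicative(fit: float, intent: float) -> float:
    """Penalizes low-fit-high-intent accounts heavily."""
    return fit * intent / 100

def composite_additive(fit: float, intent: float, fit_share: float = 0.5) -> float:
    """Treats fit and intent as substitutes; fit_share sets the blend."""
    return fit_share * fit + (1 - fit_share) * intent
```

Pick one composite, document the choice, and resist adding a third until the 90-day tuning data says you need it.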
The output of the layer is a bucket: A (90-100), B (75-89), C (50-74), D (25-49), E (under 25). Buckets are easier for reps to act on than raw scores; "this is an A account" is operational, "this is an 87-point account" is information.
The score has to drive operating decisions or it is a dashboard. Three minimum operational integrations:
A reference rubric. Tune the weights to your business; do not adopt verbatim.
Fit inputs:

| Input | Weight | Scoring |
|---|---|---|
| Industry | 20 | Tier-1 industry: 100. Tier-2: 60. Tier-3: 30. Out of ICP: 0. |
| Employee count | 20 | Inside target band: 100. Adjacent band: 60. Outside: 20. |
| Revenue band | 15 | Inside target band: 100. Adjacent: 60. Outside: 20. |
| Geography | 10 | Tier-1 region: 100. Tier-2: 50. Tier-3: 25. |
| Tech stack adjacency | 15 | 3+ adjacent tools: 100. 1 to 2: 60. None: 0. |
| Funding or growth signal | 10 | Funded in last 18 months or 30 percent+ headcount growth: 100. Otherwise: 50. |
| Hiring signal | 10 | 2+ relevant role openings: 100. 1: 60. None: 30. |
Intent inputs:

| Input | Weight | Scoring |
|---|---|---|
| Pricing-page visits (last 30 days) | 20 | 3+ visits from 2+ visitors: 100. 1 to 2 visits: 50. None: 0. |
| Comparison-page visits (last 30 days) | 15 | 2+ visits: 100. 1 visit: 50. None: 0. |
| Demo or contact form (last 30 days) | 20 | Submitted: 100. Viewed but not submitted: 30. Otherwise: 0. |
| Third-party intent surge | 15 | Surge on 2+ relevant topics: 100. 1 topic: 60. None: 0. |
| Trigger events (last 90 days) | 10 | 2+ relevant: 100. 1: 60. None: 30. |
| Sales engagement (last 30 days) | 10 | Meeting booked: 100. Email or call connect: 50. None: 0. |
| Content engagement | 10 | 3+ pieces consumed: 100. 1 to 2: 50. None: 0. |
Composite score = fit score * intent score / 100, mapped to A through E buckets at thresholds 90, 75, 50, 25.
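The bucket mapping is a four-threshold lookup. A sketch, with a worked example showing how a high-fit, low-intent account lands in a middle bucket under the multiplicative composite:

```python
def to_bucket(score: float) -> str:
    """Map a 0-100 composite to the A-E buckets at thresholds 90/75/50/25."""
    for bucket, floor in [("A", 90), ("B", 75), ("C", 50), ("D", 25)]:
        if score >= floor:
            return bucket
    return "E"

# Worked example: strong ICP match, little recent signal.
fit, intent = 95, 40
composite = fit * intent / 100  # 38.0
print(to_bucket(composite))     # prints "D": nurture/outbound, not front of queue
```

This is the intended behavior, not a bug: a perfect-fit account with no intent signal belongs in nurture until the signal arrives, at which point the daily refresh re-buckets it automatically.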
This rubric will not be optimal for any specific business. The point is to start with explicit weights and adjust based on observed close-rate differential after 90 days, not to find the perfect rubric in week one.
A lead score evaluates a person's behavior; an account score evaluates a company. The two are complementary, not interchangeable. ABM motions need account scores; high-velocity demand-gen motions need lead scores; most teams need both. See lead scoring for the breakdown.
Reps need to be able to look at the score, look at the inputs, and explain why the account is rated where it is. Black-box ML models, even sophisticated ones, fail this test. Use ML for diagnostics and feature suggestion, but keep the production model interpretable.
An intent score that refreshes weekly is not an intent score; it is a lagging fit score. Daily refresh is the floor. Real-time is better where the underlying systems support it.
The weights you guessed in the workshop are guesses. After 90 days, look at the actual close rate and pipeline conversion by bucket. If A accounts are not meaningfully better than B accounts, the weights are wrong. Tune them. Re-tune annually.
The first model is the v1. Plan a quarterly review. Plan an annual rebuild. Plan to add new signals as new sources become available. The teams that win at scoring are the teams that ship a v1 in four weeks and run twelve revisions over two years.
Most teams score every account in the database, including the long tail of irrelevant ones. The model spends compute on the wrong population. Score only the addressable market. The qualified-out cohort gets a "not in ICP" flag, not a score.
Reps should be able to override the score (they have ground truth the model does not). The override should be logged with a reason and reviewed monthly. If the override pattern reveals a missing signal, add the signal to the model. Silent overrides destroy the model's diagnostic value.
Three diagnostic metrics to watch in the first 90 days:
A-bucket close rate should be meaningfully higher than B, B higher than C, and so on. If the curve is flat, the model is not differentiating. If A and B are similar but both far above C, the model differentiates but the threshold between A and B is set too tight.
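The differential check is a grouped rate plus a monotonicity test. A hypothetical sketch, assuming a simple list of accounts with their bucket and closed-won outcome over the review window (the record shape is illustrative):

```python
from collections import defaultdict

def close_rate_by_bucket(accounts: list[dict]) -> dict[str, float]:
    """accounts: [{'bucket': 'A', 'closed_won': True}, ...] over the window."""
    won, total = defaultdict(int), defaultdict(int)
    for a in accounts:
        total[a["bucket"]] += 1
        won[a["bucket"]] += a["closed_won"]  # True counts as 1, False as 0
    return {b: won[b] / total[b] for b in total}

def is_differentiating(rates: dict[str, float]) -> bool:
    """Flat or inverted curves mean the weights need tuning."""
    ordered = [rates[b] for b in "ABCDE" if b in rates]
    return all(hi > lo for hi, lo in zip(ordered, ordered[1:]))
```

Run it at day 90 and again at every quarterly review; a curve that was monotonic in Q1 and flattens in Q3 is an early warning that a signal has gone stale.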
A accounts should also move through the funnel faster. Median time-to-opportunity, time-to-close, and time-to-renewal should all be shorter in higher buckets.
Track how often reps manually override the bucket. A 5 to 15 percent override rate is healthy (reps adding judgment); above 25 percent means the model is missing something the field knows. Investigate the override reasons; the patterns reveal the missing signals.
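The monthly override review is a rate plus a grouped count over the logged reasons. A minimal sketch, with the log schema hypothetical:

```python
from collections import Counter

def override_report(overrides: list[dict], scored_accounts: int):
    """Return the override rate and the most common logged reasons."""
    rate = len(overrides) / scored_accounts
    top_reasons = Counter(o["reason"] for o in overrides).most_common(5)
    return rate, top_reasons

rate, reasons = override_report(
    [{"reason": "existing exec relationship"},
     {"reason": "existing exec relationship"},
     {"reason": "bad industry mapping"}],
    scored_accounts=60,
)
# rate == 0.05, inside the healthy 5-15 percent band;
# a recurring reason is a candidate new signal for the model.
```

The point of requiring a logged reason is exactly this report: a free-text field nobody aggregates teaches the model nothing.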
Abmatic AI builds the two-layer model directly in your CRM, with the fit and intent inputs configured during onboarding (typically two to three weeks), the daily refresh running automatically, and the bucketed score writing to a CRM field every rep can see. The transparent-rubric design is intentional; reps can pull up any account and see exactly why it is rated where it is. Where most teams either run shadow IT in a spreadsheet or hold the data team hostage on a black-box model, Abmatic ships the working v1 in weeks and the v2 and v3 as quarterly revisions on top.
Related reading: best ABM platforms 2026, how to build an ICP, marketing qualified account, identify in-market accounts.
Lead scoring evaluates an individual person's behavior and fit. Account scoring evaluates a company's collective behavior and fit. ABM motions need account scoring as the primary signal because the buying decision is made by a committee, not a person.
Start rule-based with explicit weights. Once you have 12 to 18 months of outcome data, ML can refine the weights or surface non-obvious feature interactions. Going ML-first usually fails because the team cannot explain the score, the field stops trusting it, and the project gets shelved before the data is rich enough for ML to help.
Fit score refreshes quarterly (and on entry for new accounts). Intent score refreshes daily. The composite refreshes whenever either input changes. Weekly refresh is acceptable for fit but borderline-late for intent.
8 to 15 fit inputs and 5 to 10 intent inputs. More than 25 total starts to over-fit and becomes hard to explain; fewer than 10 total leaves too much signal on the table.
Close rate, deal velocity, and average deal size should all be meaningfully higher in higher buckets. If the curves are flat, the model is not differentiating, and the weights need to be tuned against the actual outcomes.
They get a high fit score and a low intent score, which produces a moderate composite. They go to nurture or to outbound discovery, not to the front of the queue. Once the intent signal arrives, they re-bucket automatically.
A working account-scoring model is two layers, eight to fifteen inputs per layer, transparent weights, daily refresh, quarterly tuning. It ships in four weeks with one analyst, not six months with a data-engineering squad. The hardest part is not the math; it is the discipline of keeping the model transparent, the cadence regular, and the weights tuned to actual outcomes.
If you want to see what a working two-layer scoring model looks like running live on your CRM data, with the daily refresh, bucket fields, and routing rules all wired up, book a 30-minute Abmatic AI demo. We will walk through the model on a slice of your accounts and show you the differential against your actual close-rate history.