Account scoring is the math that decides which accounts your team should care about today. Most teams either do not have a model at all (and run on rep instinct) or have one so over-engineered that the data team is permanently held hostage to it. There is a third option: a defensible, transparent, weighted-average model you can stand up in two weeks without a dedicated data scientist. This is how to set it up.
Full disclosure: Abmatic AI ships an account-scoring layer on top of CRM data, so we have a financial interest in teams running serious scoring programs. The framework here is platform-agnostic; the same model can be built in Snowflake plus dbt plus reverse ETL, in HubSpot's native scoring tool, in 6sense, or in Abmatic. The principles do not change.
The 30-second answer
A working account-scoring model has two layers: a fit score (firmographic and technographic) that is stable, and an intent score (engagement and signal) that is dynamic. Combine them with explicit weights, expose every input to the team, refresh the intent score daily and the fit score quarterly, and tune the thresholds against actual close-rate data after 90 days. Skip the black-box ML model on day one; start simple, ship, and iterate.
To see an account-scoring model running live on real CRM data, book a demo.
Why account scoring exists (and why most attempts fail)
Without a score, prioritization happens by rep instinct, recency, or whoever yells loudest in the pipeline meeting. Rep instinct is genuinely valuable; it is also unscalable, untestable, and biased toward whatever the rep saw work last quarter. Scoring is the artifact that turns prioritization into a system.
Most account-scoring projects fail in one of three ways:
- Over-engineering on day one. The team assembles 40 features, an ensemble model, a custom training pipeline, and three months later the project is still pre-launch. Reps lose patience, the project gets shelved.
- Under-engineering forever. A flat lead score (form-fill plus webinar attend equals MQL) gets called "account scoring" and produces ranks that do not match what the team actually believes about the accounts.
- Black-box trust failure. The model produces scores reps cannot explain, reps stop trusting them, the field reverts to instinct, and the model becomes shelfware.
The version that works is opinionated about exactly two things: the model has to be transparent (every rep can read the score and explain it), and it has to be living (refreshes happen automatically without a data-team ticket).
The two-layer model
Almost every working account-scoring system in B2B is some version of fit times intent. Other compositions exist (additive, multiplicative, threshold-gated), but the conceptual split is universal: who they are versus what they are doing.
| Layer | What it measures | Refresh cadence | Who owns it |
| --- | --- | --- | --- |
| Fit score | Firmographic, technographic, and ICP attributes | Quarterly (and on entry for new accounts) | Marketing operations |
| Intent score | Engagement and behavioral signal, first and third party | Daily | Marketing and rev ops jointly |
| Composite score | Combined ranking used for prioritization and routing | Daily (recomputed when either input changes) | Rev ops |
What goes into fit
- Industry and vertical
- Employee count and revenue band
- Geography (HQ region and operating regions)
- Tech stack signals (CRM, MAP, data warehouse, key adjacent tools)
- Funding stage and recency (for venture-backed ICPs)
- Public role openings (proxy for product or operating need)
- Public infrastructure or compliance signals where relevant
What goes into intent
- First-party site behavior (pricing-page visits, comparison-page visits, demo requests, content engagement)
- Third-party intent topics (Bombora, G2, public review activity)
- Trigger events (funding round, leadership change, M&A activity, product launch)
- Campaign engagement (ad clicks, webinar attends, event check-ins)
- Sales engagement (email opens, meetings booked, calls connected)
- Product-led signals where applicable (trial signups, freemium activity)
For the intent-side data foundations, see how to use intent data, first-party intent data, and predictive intent data.
The four-week build (without burning your data team)
This is the build that ships in four weeks with one analyst plus part-time RevOps support, not the version that takes six months and a data-engineering squad. The scope is deliberate: you can always add sophistication later; you cannot easily recover the credibility a bad early version burns.
Week 1: Define the inputs and weights
Sit down with sales leadership, RevOps, and marketing for one workshop. Output: a list of 8 to 15 fit attributes and 5 to 10 intent attributes, each with a numeric weight; the weights sum to 100 within each layer.
Do not skip the workshop. The whole credibility of the model depends on sales leadership having co-authored the weights. A score that the data team unilaterally assigned will be argued with forever; a score the field helped author will be defended.
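The workshop output can live as a plain config anyone on the team can read and argue with. A minimal sketch in Python, using the starter-rubric attributes from later in this piece; the names and numbers are illustrative, not a recommendation:

```python
# Illustrative workshop output: one weight map per layer, each summing to 100.
FIT_WEIGHTS = {
    "industry": 20,
    "employee_count": 20,
    "revenue_band": 15,
    "geography": 10,
    "tech_stack_adjacency": 15,
    "funding_or_growth": 10,
    "hiring_signal": 10,
}

INTENT_WEIGHTS = {
    "pricing_page_visits": 20,
    "comparison_page_visits": 15,
    "demo_or_contact_form": 20,
    "third_party_intent_surge": 15,
    "trigger_events": 10,
    "sales_engagement": 10,
    "content_engagement": 10,
}

# Guardrail: weights must sum to 100 within each layer.
for layer, weights in {"fit": FIT_WEIGHTS, "intent": INTENT_WEIGHTS}.items():
    assert sum(weights.values()) == 100, f"{layer} weights must sum to 100"
```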
Week 2: Source the data
Three categories of source data:
- Firmographic. Internal CRM, supplemented by an enrichment vendor (ZoomInfo, Apollo, Cognism, Clearbit-equivalent). See ZoomInfo alternatives and Cognism alternatives for vendor selection.
- Technographic. BuiltWith, HG Insights, Clearbit, internal stack-detection from email signatures or job-posting scraping.
- Intent and engagement. Site analytics (with visitor identification), MAP, ad platforms, third-party intent feeds.
The data is rarely as clean as the workshop assumed. Plan for one analyst-week of data cleaning. The most common gotchas: industry codes inconsistent between sources, employee counts off by an order of magnitude in long-tail accounts, and tech-stack signals stale by 12 to 24 months.
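Much of that analyst-week is reconciliation logic. A sketch of the employee-count case, assuming counts arrive from several sources (CRM plus enrichment vendors); the function and its thresholds are our invention, not a standard:

```python
import statistics

def reconcile_employee_count(values: list[int]) -> tuple[int | None, bool]:
    """Merge employee counts from multiple sources into one best guess.

    Returns (best_guess, needs_review). Sources that disagree by an order
    of magnitude are flagged for review rather than silently averaged.
    """
    counts = [v for v in values if v and v > 0]
    if not counts:
        return None, False
    best_guess = int(statistics.median(counts))
    if max(counts) >= 10 * min(counts):
        # Order-of-magnitude disagreement: common in long-tail accounts.
        return best_guess, True
    return best_guess, False
```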
Week 3: Build the math
This is the part where the temptation to over-engineer is strongest. Resist it. The first version of the model is a weighted average. Each input scores 0 to 100; the layer score is the weighted sum; the composite is fit times intent or fit plus intent (pick one and document it).
Multiplicative composites (fit times intent) penalize low-fit-high-intent accounts heavily; additive composites treat fit and intent as substitutes. Multiplicative is closer to how most B2B sales actually work; additive is easier to explain to executives. Either is defensible.
The model's output is a bucket, not a raw number: A (90-100), B (75-89), C (50-74), D (25-49), E (below 25). Buckets are easier for reps to act on than raw scores; "this is an A account" is operational, "this is an 87-point account" is information.
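The whole week-3 computation fits in a few lines. A minimal sketch using the multiplicative composite (substitute your documented additive formula if you chose that path); the function names are ours:

```python
def layer_score(inputs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0-100 input scores; weights sum to 100 per layer."""
    return sum(inputs[name] * weight for name, weight in weights.items()) / 100

def composite_score(fit: float, intent: float) -> float:
    """Multiplicative composite, normalized back to the 0-100 range."""
    return fit * intent / 100

def bucket(score: float) -> str:
    """Map a 0-100 score to the A-E buckets at the 90/75/50/25 thresholds."""
    for label, floor in (("A", 90), ("B", 75), ("C", 50), ("D", 25)):
        if score >= floor:
            return label
    return "E"
```

To see the multiplicative penalty in action: a low-fit, high-intent account (fit 40, intent 95) composites to 38, a D bucket; a simple additive average would put the same account at roughly 68, a C.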
Week 4: Wire the score into operations
The score has to drive operating decisions or it is a dashboard. Three minimum operational integrations:
- CRM field. Every account record carries the fit bucket, intent bucket, and composite bucket as visible fields, updated automatically.
- Routing rules. A-bucket accounts route to senior AEs and to fast-lane SDR cadences. B and C buckets route to standard pods. D and E buckets route to nurture. (A minimal routing sketch follows this list.)
- Reporting. Pipeline, conversion, and velocity are all reported by score bucket. The test at the end of week 4 is whether the model produces meaningful differentiation; if A accounts close at the same rate as C accounts, the model is wrong, and you need to investigate before shipping it widely.
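The routing rules above reduce to a small lookup your CRM workflow or reverse-ETL job can apply on every recompute. A sketch; the queue names are placeholders for whatever your CRM and sales-engagement platform actually call them:

```python
# Illustrative bucket-to-queue routing, applied whenever the composite changes.
ROUTING = {
    "A": "senior_ae_fast_lane",
    "B": "standard_pod",
    "C": "standard_pod",
    "D": "nurture",
    "E": "nurture",
}

def route(composite_bucket: str) -> str:
    """Return the owning queue for an account's composite bucket."""
    return ROUTING[composite_bucket]
```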
The starter scoring rubric
A reference rubric. Tune the weights to your business; do not adopt it verbatim.
Fit score (out of 100)
| Input | Weight | Scoring |
| --- | --- | --- |
| Industry | 20 | Tier-1 industry: 100. Tier-2: 60. Tier-3: 30. Out of ICP: 0. |
| Employee count | 20 | Inside target band: 100. Adjacent band: 60. Outside: 20. |
| Revenue band | 15 | Inside target band: 100. Adjacent: 60. Outside: 20. |
| Geography | 10 | Tier-1 region: 100. Tier-2: 50. Tier-3: 25. |
| Tech stack adjacency | 15 | 3+ adjacent tools: 100. 1 to 2: 60. None: 0. |
| Funding or growth signal | 10 | Funded in last 18 months or 30 percent+ headcount growth: 100. Otherwise: 50. |
| Hiring signal | 10 | 2+ relevant role openings: 100. 1: 60. None: 30. |
Intent score (out of 100)
| Input | Weight | Scoring |
| --- | --- | --- |
| Pricing-page visits (last 30 days) | 20 | 3+ visits from 2+ visitors: 100. 1 to 2 visits: 50. None: 0. |
| Comparison-page visits (last 30 days) | 15 | 2+ visits: 100. 1 visit: 50. None: 0. |
| Demo or contact form (last 30 days) | 20 | Submitted: 100. Viewed but not submitted: 30. Otherwise: 0. |
| Third-party intent surge | 15 | Surge on 2+ relevant topics: 100. 1 topic: 60. None: 0. |
| Trigger events (last 90 days) | 10 | 2+ relevant: 100. 1: 60. None: 30. |
| Sales engagement (last 30 days) | 10 | Meeting booked: 100. Email or call connect: 50. None: 0. |
| Content engagement | 10 | 3+ pieces consumed: 100. 1 to 2: 50. None: 0. |
Composite
Composite score = fit score * intent score / 100, mapped to A through E buckets at thresholds 90, 75, 50, 25.
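To make the arithmetic concrete, here is one invented account run through the starter rubric (a worked sketch, not data):

```python
# Hypothetical account: solid fit, early-stage intent.
# Fit rows per the rubric: tier-1 industry, in-band employees, adjacent
# revenue, tier-2 region, 1-2 adjacent tools, no funding signal, 1 opening.
fit_rows = [(100, 20), (100, 20), (60, 15), (50, 10), (60, 15), (50, 10), (60, 10)]
fit = sum(score * weight for score, weight in fit_rows) / 100        # 74.0

# Intent rows: 1-2 pricing visits, no comparison visits, form viewed but
# not submitted, 1 surging topic, 1 trigger event, email connect, 1-2 pieces.
intent_rows = [(50, 20), (0, 15), (30, 20), (60, 15), (60, 10), (50, 10), (50, 10)]
intent = sum(score * weight for score, weight in intent_rows) / 100  # 41.0

composite = fit * intent / 100                                       # 30.3 -> D bucket
```

High fit with weak intent lands in D: outbound-discovery territory, not the front of the queue.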
This rubric will not be optimal for any specific business. The point is to start with explicit weights and adjust based on observed close-rate differential after 90 days, not to find the perfect rubric in week one.
Common scoring mistakes
Confusing lead score and account score
A lead score evaluates a person's behavior; an account score evaluates a company. The two are complementary, not interchangeable. ABM motions need account scores; high-velocity demand-gen motions need lead scores; most teams need both. See lead scoring for the breakdown.
Hiding the weights
Reps need to be able to look at the score, look at the inputs, and explain why the account is rated where it is. Black-box ML models, even sophisticated ones, fail this test. Use ML for diagnostics and feature suggestion, but keep the production model interpretable.
Refreshing too slowly
An intent score that refreshes weekly is not an intent score; it is a lagging fit score. Daily refresh is the floor. Real-time is better where the underlying systems support it.
Not tuning to actuals
The weights you guessed in the workshop are guesses. After 90 days, look at the actual close rate and pipeline conversion by bucket. If A accounts are not meaningfully better than B accounts, the weights are wrong. Tune them. Re-tune annually.
Treating the model as final
The first model is the v1. Plan a quarterly review. Plan an annual rebuild. Plan to add new signals as new sources become available. The teams that win at scoring are the teams that ship a v1 in four weeks and run twelve revisions over two years.
Scoring everything
Most teams score every account in the database, including the long tail of irrelevant ones. The model spends compute on the wrong population. Score only the addressable market. The qualified-out cohort gets a "not in ICP" flag, not a score.
Letting reps override silently
Reps should be able to override the score (they have ground truth the model does not). The override should be logged with a reason and reviewed monthly. If the override pattern reveals a missing signal, add the signal to the model. Silent overrides destroy the model's diagnostic value.
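The override log does not need to be elaborate; a flat record like the sketch below (field names hypothetical) is enough to mine monthly for missing signals:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ScoreOverride:
    """One logged rep override; the monthly review reads these for patterns."""
    account_id: str
    model_bucket: str   # what the model assigned
    rep_bucket: str     # what the rep set instead
    reason: str         # free text, the raw material for the review
    rep_id: str
    logged_on: date
```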
How to know the model is working
Three diagnostic metrics to watch in the first 90 days:
Close-rate differential by bucket
A-bucket close rate should be meaningfully higher than B, B higher than C, and so on. If the curve is flat, the model is not differentiating. If A and B are similar but both far above C, the model differentiates but the threshold between A and B is drawn in the wrong place.
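Assuming you can export worked accounts with the bucket they carried at time of engagement and the outcome, the differential check is a one-line groupby; the column names here are hypothetical:

```python
import pandas as pd

# One row per account worked in the last 90 days: the bucket it carried
# when engaged, and whether it closed-won.
worked = pd.DataFrame({
    "bucket": ["A", "A", "B", "B", "C", "C", "C", "D"],
    "closed_won": [1, 1, 1, 0, 0, 1, 0, 0],
})

close_rate = worked.groupby("bucket")["closed_won"].mean().sort_index()
print(close_rate)  # healthy models show a curve falling from A to E
```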
Velocity differential by bucket
A accounts should also move through the funnel faster. Median time-to-opportunity, time-to-close, and time-to-renewal should all be shorter in higher buckets.
Rep override rate
Track how often reps manually override the bucket. A 5 to 15 percent override rate is healthy (reps adding judgment); above 25 percent means the model is missing something the field knows. Investigate the override reasons; the patterns reveal the missing signals.
Where Abmatic fits in this
Abmatic AI builds the two-layer model directly in your CRM, with the fit and intent inputs configured during onboarding (typically two to three weeks), the daily refresh running automatically, and the bucketed score writing to a CRM field every rep can see. The transparent-rubric design is intentional; reps can pull up any account and see exactly why it is rated where it is. Where most teams either run shadow IT in a spreadsheet or hold the data team hostage on a black-box model, Abmatic ships the working v1 in weeks and the v2 and v3 as quarterly revisions on top.
Related reading: best ABM platforms 2026, how to build an ICP, marketing qualified account, identify in-market accounts.
FAQ
What is the difference between account scoring and lead scoring?
Lead scoring evaluates an individual person's behavior and fit. Account scoring evaluates a company's collective behavior and fit. ABM motions need account scoring as the primary signal because the buying decision is made by a committee, not a person.
Should we use machine learning or a rule-based model?
Start rule-based with explicit weights. Once you have 12 to 18 months of outcome data, ML can refine the weights or surface non-obvious feature interactions. Going ML-first usually fails because the team cannot explain the score, the field stops trusting it, and the project gets shelved before the data is rich enough for ML to help.
How often should the score refresh?
Fit score refreshes quarterly (and on entry for new accounts). Intent score refreshes daily. The composite refreshes whenever either input changes. Weekly refresh is acceptable for fit but borderline-late for intent.
How many inputs should the model have?
8 to 15 fit inputs and 5 to 10 intent inputs. More than 25 total starts to over-fit and becomes hard to explain; fewer than 10 total leaves too much signal on the table.
How do we tell if the model is working?
Close rate, deal velocity, and average deal size should all be meaningfully higher in higher buckets. If the curves are flat, the model is not differentiating, and the weights need to be tuned against the actual outcomes.
How do we handle accounts with no intent signal?
They get a high fit score and a low intent score, which produces a moderate composite. They go to nurture or to outbound discovery, not to the front of the queue. Once the intent signal arrives, they re-bucket automatically.
The takeaway
A working account-scoring model is two layers, eight to fifteen inputs per layer, transparent weights, daily refresh, quarterly tuning. It ships in four weeks with one analyst, not six months with a data-engineering squad. The hardest part is not the math; it is the discipline of keeping the model transparent, the cadence regular, and the weights tuned to actual outcomes.
If you want to see what a working two-layer scoring model looks like running live on your CRM data, with the daily refresh, bucket fields, and routing rules all wired up, book a 30-minute Abmatic AI demo. We will walk through the model on a slice of your accounts and show you the differential against your actual close-rate history.