Most translation agencies track two things about their linguists: did the job arrive on time, and did the client complain? That is a reactive quality model. You only get signal after something goes wrong — and by then, you have already delivered substandard work to a client.
A proactive quality model measures performance across multiple dimensions before problems reach clients. This article describes the KPIs that generate real signal, how to collect them without creating administrative overhead, and how to use them for vendor decisions that hold up to scrutiny.
Why Most Agencies Measure the Wrong Things
"On-time delivery rate" is the most common linguist KPI in the translation industry. It is also nearly useless as a quality signal.
Delivery time measures your project management process as much as it measures the linguist. A linguist with unclear briefs, late source files, and no CAT tool support will miss deadlines. A linguist with the same capabilities in a well-managed environment will deliver on time. The variable you are measuring is highly contaminated.
Client complaint rate is worse. It measures client tolerance as much as linguist quality. A demanding direct enterprise client will generate complaints on work a mid-tier agency would accept. A volume buyer who never reads the translations closely will generate zero complaints on work with systematic quality problems. You are measuring client management as much as translation output.
Neither KPI helps you answer the question you actually need to answer: which linguists produce better output, and by how much?
The Five KPIs That Generate Real Signal
1. Edit Distance from MT Baseline
If your workflow includes machine translation post-editing (MTPE), you can measure how much a linguist changes the MT output per segment. This gives you two data points:
- Net edit distance — percentage of characters changed relative to the MT output. High edit distance can mean thorough post-editing, though it can also reflect weak MT output for that content type; very low edit distance is a warning sign for linguists who may be under-editing.
- Edit consistency across sessions — does the same linguist apply the same level of effort across similar content types? High variance is a signal worth investigating.
This KPI requires MTPE workflow data, but even partial data (for linguists doing MTPE work) is useful for calibration.
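Both data points fall out of a character-level edit distance between the MT suggestion and the delivered target. A minimal sketch, assuming you can pair each MT segment with its post-edited counterpart (the function names here are illustrative, not any tool's API):

```python
from statistics import mean, pstdev

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: insertions,
    # deletions, and substitutions each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def net_edit_distance(mt: str, post_edited: str) -> float:
    """Characters changed relative to the MT output, as a ratio."""
    if not mt:
        return 1.0 if post_edited else 0.0
    return levenshtein(mt, post_edited) / len(mt)

def session_profile(pairs):
    """pairs: list of (mt_output, post_edited) segments from one session.
    The spread across sessions feeds the edit-consistency signal."""
    ratios = [net_edit_distance(mt, pe) for mt, pe in pairs]
    return {"mean_edit": mean(ratios), "spread": pstdev(ratios)}
```

Comparing `spread` across a linguist's sessions on similar content types is the variance check described above: a stable mean with a sudden low-effort session is the pattern worth investigating.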
2. TM Leverage Utilization
When a linguist has access to a populated translation memory, what percentage of fuzzy matches do they confirm unmodified versus edit? The TM leverage rate tells you:
- Whether the linguist is reading fuzzy matches carefully before confirming (good) or rubber-stamping them without review (bad)
- Whether the linguist is improving TM quality by correcting legacy errors (very good)
You should see a linguist confirming fewer high-quality fuzzy matches unmodified as they get deeper into a project, because as context accumulates, even strong TM matches should be tailored to the current text's style and terminology. A linguist who confirms 90%+ of 75–84% fuzzy matches unmodified throughout a project is not reviewing carefully.
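The per-band unmodified-confirmation rate can be computed directly from segment records. A sketch under the assumption that each confirmed segment carries its TM match score, the TM suggestion, and the delivered target (the record shape is hypothetical):

```python
from collections import defaultdict

def leverage_report(segments):
    """segments: iterable of (match_pct, tm_suggestion, final_target).
    Returns, per fuzzy-match band, the share confirmed unmodified."""
    bands = {"75-84": (75, 84), "85-94": (85, 94), "95-99": (95, 99)}
    stats = defaultdict(lambda: [0, 0])  # band -> [unmodified, total]
    for match_pct, tm_suggestion, final_target in segments:
        for band, (lo, hi) in bands.items():
            if lo <= match_pct <= hi:
                stats[band][1] += 1
                if final_target == tm_suggestion:
                    stats[band][0] += 1
    return {band: unmod / total for band, (unmod, total) in stats.items()}
```

A 75–84% band sitting near 1.0 for a whole project is the rubber-stamping pattern described above.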
3. Revision Acceptance Rate
When a translation is revised by a reviewer (LQA step), how many of the reviewer's suggested changes does the original translator accept when given the opportunity to respond? In a proper revision workflow:
- Linguist delivers
- Reviewer marks corrections
- Linguist reviews corrections and either accepts, rejects with justification, or opens discussion
The acceptance rate — and the quality of rejection justifications — is one of the richest quality signals available. A linguist who rejects 40% of corrections with well-reasoned arguments is engaged and competent. A linguist who accepts 100% of corrections without comment may be disengaged or unable to distinguish valid corrections from reviewer preferences.
This KPI requires a documented revision step in your workflow, which most agencies running translation quality assurance should have anyway.
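If your revision workflow records each response, the acceptance rate and the share of justified rejections reduce to simple counting. A rough sketch, assuming responses are logged as dicts with an `action` field and an optional `justification` (an assumed record shape, not any tool's schema):

```python
from collections import Counter

def acceptance_profile(responses):
    """responses: list of dicts with 'action' in
    {'accept', 'reject', 'discuss'} and optional 'justification'."""
    actions = Counter(r["action"] for r in responses)
    total = sum(actions.values())
    justified_rejects = sum(
        1 for r in responses
        if r["action"] == "reject" and r.get("justification", "").strip()
    )
    return {
        "acceptance_rate": actions["accept"] / total if total else 0.0,
        "reject_rate": actions["reject"] / total if total else 0.0,
        "justified_reject_share": (justified_rejects / actions["reject"]
                                   if actions["reject"] else None),
    }
```

A high reject rate with a high justified-reject share is the engaged-and-competent pattern; a 100% acceptance rate with no comments is the disengagement flag.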
4. Segment-Level Consistency Score
Within a single project, how consistently does a linguist translate repeated segments and near-identical source strings? Inconsistency here is:
- A signal of inattention, even in an otherwise capable linguist
- A signal of possible terminology confusion
- A direct quality problem — readers notice when the same term is translated three different ways in the same document
Your CAT tool should flag internal repetitions. Tracking how often a linguist introduces inconsistency on segments that were consistent in the source gives you a project-level score that aggregates to a per-linguist trend.
KanCAT's features track this within the translation editor and surface consistency warnings before delivery — giving the linguist an opportunity to self-correct before the job reaches the revision stage.
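The project-level score itself is straightforward: among source strings that repeat within the project, what share received more than one distinct translation? A minimal sketch (whitespace-insensitive comparison is an assumption; real CAT tools apply more nuanced matching):

```python
from collections import defaultdict

def inconsistency_score(segment_pairs):
    """segment_pairs: list of (source, target). Returns the share of
    repeated source strings that received more than one distinct
    translation."""
    targets_by_source = defaultdict(set)
    counts = defaultdict(int)
    for source, target in segment_pairs:
        targets_by_source[source].add(target.strip())
        counts[source] += 1
    repeated = [s for s, n in counts.items() if n > 1]
    if not repeated:
        return 0.0
    inconsistent = sum(1 for s in repeated if len(targets_by_source[s]) > 1)
    return inconsistent / len(repeated)
```

Averaging this score across a linguist's projects gives the per-linguist trend described above.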
5. LQA Error Rate by Category
If you run a formal LQA (linguistic quality assurance) step, score errors by category:
- Accuracy — mistranslation, omission, addition
- Fluency — unnatural language, grammatical errors, punctuation
- Terminology — deviation from client glossary or style guide
- Formatting — tag errors, whitespace, number formatting
Track error rates per category per linguist, normalized by word count. A linguist who is strong on accuracy but weak on terminology adherence needs a different kind of support than one with fluency issues. The category breakdown tells you where to invest in feedback.
The industry standard scoring model here is MQM (Multidimensional Quality Metrics), which provides a structured taxonomy. For smaller agencies, a simplified 3-tier severity model (critical / major / minor) applied to the four categories above is sufficient.
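Normalizing by word count with severity weighting is a short calculation. A sketch of the simplified 3-tier model (the weights shown are illustrative defaults, not an MQM-mandated scheme):

```python
from collections import defaultdict

# Illustrative severity weights; calibrate these to your own program.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def lqa_rates(errors, word_count):
    """errors: list of (category, severity). Returns weighted error
    points per 1,000 words, broken down by category."""
    points = defaultdict(float)
    for category, severity in errors:
        points[category] += SEVERITY_WEIGHTS[severity]
    return {cat: pts * 1000 / word_count for cat, pts in points.items()}
```

For a 2,500-word job with one major accuracy error and one minor terminology error, this yields 2.0 accuracy points and 0.4 terminology points per 1,000 words, which is the per-category, per-linguist number to track over time.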
Collecting Data Without Drowning Your PMs
The objection to structured performance measurement is always: "Our PMs don't have time to do all this." The answer is that most of these KPIs should be generated automatically by your tools, not manually by your PMs.
What your CAT tool should generate automatically:
- Segment word counts and match category breakdown
- TM leverage statistics
- Internal consistency reports
- Tag error reports
What your QA workflow should generate:
- LQA error scores by category
- Revision acceptance data (if your revision workflow captures this)
What your project management system should aggregate:
- Per-linguist delivery timeliness (relative to assigned deadline, not a global standard)
- Per-linguist project history with linked quality scores
The PM's role is to review aggregated scores, not to manually collect data. If your tools require manual PM data entry to generate quality reports, the workflow is broken at the tool level.
Using KPIs for Vendor Decisions
Once you have 3–6 months of data across multiple linguists, the data supports three types of decisions:
Rate negotiation. A linguist with consistently strong LQA scores across a high-volume, complex content type has demonstrated value that justifies a rate increase. A rate negotiation backed by data — "your average accuracy error rate is 0.8 per 1,000 words versus the portfolio median of 1.6" — is a professional conversation. Without data, rate increases are arbitrary.
Tier assignment. Some clients have higher quality requirements than others. Assign your highest-performing linguists (by LQA score and consistency data) to clients where errors have the highest cost — legal, medical, financial content. Reserve newer or lower-scoring linguists for lower-stakes content types while they build track record.
Development conversations. A linguist whose accuracy scores are strong but whose terminology scores are weak needs client-specific glossary training, not a general quality warning. Category-level data enables targeted development rather than vague feedback.
The Data Privacy Obligation
Collecting performance data on contractors creates obligations. In the EU and UK, freelance translators are data subjects under GDPR. Performance scores linked to their identity are personal data.
You need to:
- Inform linguists that performance data is collected and how it is used (in your contractor agreement or a data processing notice)
- Retain data only as long as the business purpose requires
- Provide access to their own performance data if requested
None of this prevents you from running a quality program — it just requires transparency. The agencies most resistant to transparency here are often the ones most likely to use performance data in ways that do not hold up to scrutiny. KanCAT's privacy and audit-trail design keeps data retention and access control compliant by default.
Building Toward a Vendor Performance Dashboard
The goal is a single view per linguist that shows:
- Languages and content types (from their profile)
- Projects completed (volume, weighted word count)
- Average LQA score over time (trending chart)
- Top error categories (bar chart)
- TM leverage utilization rate
- Delivery timeliness rate
- Active rate card (per language pair)
With this view, a PM assigning a new project can make a data-informed decision in thirty seconds rather than relying on memory or gut feel. That is the real productivity gain from structured translation quality assurance — not the compliance benefit, not the client reporting benefit, but the decision speed benefit inside your own team.
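As a minimal sketch, the per-linguist view aggregates the KPIs above into one record (the field names here are illustrative, not a KanCAT schema):

```python
from dataclasses import dataclass, field

@dataclass
class LinguistDashboard:
    # Illustrative record; real systems would link to project history.
    name: str
    language_pairs: list
    projects_completed: int = 0
    weighted_words: int = 0
    lqa_scores: list = field(default_factory=list)  # chronological
    on_time_deliveries: int = 0
    total_deliveries: int = 0

    @property
    def avg_lqa(self):
        return (sum(self.lqa_scores) / len(self.lqa_scores)
                if self.lqa_scores else None)

    @property
    def timeliness(self):
        return (self.on_time_deliveries / self.total_deliveries
                if self.total_deliveries else None)
```

The thirty-second assignment decision is then a scan of `avg_lqa`, the LQA trend, and `timeliness` rather than a search through spreadsheets and email threads.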
KanCAT's vendor management features support per-linguist performance tracking integrated with the project Kanban board and invoicing system. The Team Pro tier unlocks the full audit trail and vendor performance analytics.
The translation industry has the raw material for excellent performance measurement — every project generates word counts, match rates, revision cycles, and delivery records. The gap is in aggregation and presentation: most agencies have the data scattered across CAT tool exports, email threads, and spreadsheets, but no single view that connects it.
Start with one KPI you can collect cleanly from your existing tools. LQA error rate by category, if you have a revision step, is usually the highest-signal starting point. Build from there. The agencies that make the shift from reactive to proactive quality management consistently report fewer client escalations, better retention of high-quality linguists, and more defensible vendor decisions.