Training and QA

Inter-Rater Reliability for Transcripts

Inter-rater reliability helps supervisors, instructors, and teams compare two independent transcripts of the same language sample. ConductSpeech reports agreement across boundaries, words, morpheme counts, codes, and an overall score.

This is useful for graduate training labs, onboarding, and clinical QA because it turns transcript differences into a concrete review conversation.

Compare Two Transcripts

Sample result

Reviewed

Boundary agreement

How closely raters split utterances

Word agreement

How closely transcript words match

Coding agreement

How consistently codes were applied

Boundary

Utterance agreement

Words

Word-level agreement

Morphemes

Count correlation

Codes

Coding agreement

How it fits into a speech workflow

Collect

Start from a recording, transcript, or saved session.

Review

Check speaker turns and make clinical edits before relying on results.

Measure

See boundary, word, morpheme-count, and coding agreement for the two transcripts, along with the overall score.

Use

Bring the output into reports, progress review, or research exports.

Compare the same sample

The reliability workflow is built for two transcripts of the same recording. ConductSpeech rejects pairs drawn from different recordings, so the agreement score reflects rater differences rather than differences in source material.
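
The same-recording rule can be pictured as a guard that runs before any scoring. The sketch below is a minimal illustration, not ConductSpeech's implementation: the `Transcript` type, the `recording_id` field, and `require_same_recording` are hypothetical names, and it assumes each transcript carries some identifier for its source recording.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record used only for this illustration."""
    rater: str
    recording_id: str  # assumed identifier for the source recording
    utterances: Tuple[str, ...]


class DifferentRecordingError(ValueError):
    """Raised when two transcripts do not share a source recording."""


def require_same_recording(a: Transcript, b: Transcript) -> None:
    """Refuse to compare transcripts that come from different recordings."""
    if a.recording_id != b.recording_id:
        raise DifferentRecordingError(
            f"{a.rater} transcribed {a.recording_id!r} but "
            f"{b.rater} transcribed {b.recording_id!r}"
        )


first = Transcript("rater_a", "rec-001", ("the dog is running",))
second = Transcript("rater_b", "rec-001", ("the dog was running",))
require_same_recording(first, second)  # passes silently: same recording
```

Failing early keeps a mismatched pair from producing a score that looks like rater disagreement but actually reflects different source material.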

Useful for university programs

Instructors can have students transcribe and code the same sample, then compare agreement. The resulting scores make it easier to identify whether the class is struggling with boundaries, word accuracy, morpheme counts, or coding conventions.
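
As a rough illustration of that class-level review, an instructor could average each agreement measure across students and look at the lowest one. The snippet is a sketch under assumptions: the metric keys, the `weakest_metric` helper, and every number are made up for the example and are not ConductSpeech output or export formats.

```python
from statistics import mean


def weakest_metric(class_scores):
    """Return (metric, class average) for the lowest-scoring measure."""
    metrics = class_scores[0].keys()
    averages = {m: mean(s[m] for s in class_scores) for m in metrics}
    weakest = min(averages, key=averages.get)
    return weakest, averages[weakest]


# Illustrative values only: one dict of agreement scores per student.
class_scores = [
    {"boundary": 0.92, "words": 0.95, "morphemes": 0.90, "codes": 0.70},
    {"boundary": 0.88, "words": 0.97, "morphemes": 0.85, "codes": 0.64},
    {"boundary": 0.95, "words": 0.93, "morphemes": 0.91, "codes": 0.73},
]

metric, average = weakest_metric(class_scores)
print(f"{metric}: {average:.2f}")
```

With these invented numbers the lowest average is coding agreement, which would point the next class session toward coding conventions.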

Practical agreement metrics

ConductSpeech summarizes utterance boundary agreement, word-level agreement, morpheme-count correlation, coding agreement, and an overall score.
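
To make these four measures concrete, here is one simple way each could be computed. This is a sketch under stated assumptions, not ConductSpeech's formulas: the function names are hypothetical, boundary agreement is approximated as a Dice overlap of utterance-end positions, word agreement uses the `difflib` similarity ratio, the correlation is Pearson's r over per-utterance counts, and coding agreement is the share of aligned utterances with identical codes. How the overall score combines them is not stated on this page, so it is left out.

```python
from difflib import SequenceMatcher
from math import sqrt


def boundary_agreement(utts_a, utts_b):
    """Dice overlap of utterance-end positions, counted in words.

    Simplification: positions use each rater's own word count, so word
    insertions or deletions also shift later boundaries.
    """
    def ends(utts):
        positions, total = set(), 0
        for utt in utts:
            total += len(utt)
            positions.add(total)
        return positions

    a, b = ends(utts_a), ends(utts_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def word_agreement(words_a, words_b):
    """difflib ratio: 2 * matched words / total words in both lists."""
    if not words_a and not words_b:
        return 1.0
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def morpheme_count_correlation(counts_a, counts_b):
    """Pearson correlation of per-utterance morpheme counts."""
    if len(counts_a) != len(counts_b) or len(counts_a) < 2:
        raise ValueError("need two equal-length lists of at least two counts")
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(counts_a, counts_b))
    var_a = sum((x - mean_a) ** 2 for x in counts_a)
    var_b = sum((y - mean_b) ** 2 for y in counts_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("undefined when one rater's counts never vary")
    return cov / sqrt(var_a * var_b)


def coding_agreement(codes_a, codes_b):
    """Share of aligned utterances where both raters applied the same codes."""
    if len(codes_a) != len(codes_b):
        raise ValueError("utterances must be aligned one to one")
    if not codes_a:
        return 1.0
    same = sum(set(x) == set(y) for x, y in zip(codes_a, codes_b))
    return same / len(codes_a)


# Rater B merged two utterances and heard "was" where rater A heard "is".
rater_a = [["the", "dog", "is", "running"], ["he", "is", "fast"]]
rater_b = [["the", "dog", "was", "running", "he", "is", "fast"]]
flat_a = [w for utt in rater_a for w in utt]
flat_b = [w for utt in rater_b for w in utt]

print(round(boundary_agreement(rater_a, rater_b), 2))
print(round(word_agreement(flat_a, flat_b), 2))
```

In this toy pair the word score stays high while the boundary score drops, which is the kind of contrast that tells a reviewer the disagreement is about segmentation rather than what was heard.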

What users see

Reliability report fields

The report presents each agreement measure in plain, reviewable language rather than as a technical readout.

Boundary agreement

How closely raters split utterances

Word agreement

How closely transcript words match

Coding agreement

How consistently codes were applied

Clinical interpretation notes

Reliability scores are meaningful only when both transcripts come from the same sample.
The scores identify disagreement; a supervisor still decides which transcript is clinically correct.

Related pages

SALT Transcript Editor

Edit AI-drafted transcripts with SALT-style codes, review suggestions, and re-analyze language sample metrics instantly.

SALT-Compatible Language Sample Analysis

AI language sample analysis with SALT-style coding, C-units, SI, mazes, grade norms, reliability, and clinical reports.

Clinical Language Sample Reports

Generate IEP-ready language sample reports with MLU, PGU, SI, C-units, maze summaries, norms, and fluency context.

SALT-compatible analysis methodology

Read how ConductSpeech documents conventions, validation, and limitations.

Ready to try it?

Start with a real language sample.

Create an account, upload or review a sample, and see how this feature appears inside the ConductSpeech workflow.
