ConductSpeech

Training and QA

Inter-Rater Reliability for Transcripts

Inter-rater reliability helps supervisors, instructors, and teams compare two independent transcripts of the same language sample. ConductSpeech reports agreement across boundaries, words, morpheme counts, codes, and an overall score.

This is useful for graduate training labs, onboarding, and clinical QA because it turns transcript differences into a concrete review conversation.

Sample result

Reviewed

  • Boundary agreement: how closely raters split utterances
  • Word agreement: how closely transcript words match
  • Coding agreement: how consistently codes were applied

  • Boundary: utterance agreement
  • Words: word-level agreement
  • Morphemes: count correlation
  • Codes: coding agreement

How it fits into a speech workflow

1. Collect: Start from a recording, transcript, or saved session.
2. Review: Check speaker turns and make clinical edits before relying on results.
3. Measure: See the agreement scores and notes for the two transcripts.
4. Use: Bring the output into reports, progress review, or research exports.

Compare the same sample

The reliability workflow is built for two transcripts of the same recording. ConductSpeech rejects different recordings so the agreement score reflects rater differences rather than different source material.
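A minimal sketch of the same-recording check described above. The `Transcript` class, the `recording_id` field, and `require_same_recording` are hypothetical names chosen for illustration; how ConductSpeech actually identifies a recording is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    rater: str
    recording_id: str  # hypothetical identifier for the source recording
    utterances: tuple  # one string per utterance


def require_same_recording(a: Transcript, b: Transcript) -> None:
    """Raise ValueError unless both transcripts point at the same recording."""
    if a.recording_id != b.recording_id:
        raise ValueError(
            f"Cannot compare: {a.rater} transcribed {a.recording_id!r}, "
            f"but {b.rater} transcribed {b.recording_id!r}."
        )
```

Failing early like this keeps every later score interpretable as a difference between raters, which is the point of the restriction.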

Useful for university programs

Instructors can have students transcribe and code the same sample, then compare agreement. The resulting scores make it easier to identify whether the class is struggling with boundaries, word accuracy, morpheme counts, or coding conventions.

Practical agreement metrics

ConductSpeech summarizes utterance boundary agreement, word-level agreement, morpheme-count correlation, coding agreement, and an overall score.
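To make three of these metrics concrete, here is one common way each could be computed. This is an illustrative sketch under stated assumptions, not ConductSpeech's formulas, which this page does not specify: word agreement uses the similarity ratio from Python's `difflib`, morpheme-count correlation uses Pearson's r over per-utterance counts, and coding agreement uses Cohen's kappa. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt


def word_agreement(words_a, words_b):
    """Similarity ratio 2*M/T (M = matched words, T = total words in both)."""
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def morpheme_correlation(counts_a, counts_b):
    """Pearson's r between per-utterance morpheme counts from two raters."""
    if len(counts_a) != len(counts_b) or len(counts_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 counts")
    n = len(counts_a)
    mean_a, mean_b = sum(counts_a) / n, sum(counts_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(counts_a, counts_b))
    var_a = sum((x - mean_a) ** 2 for x in counts_a)
    var_b = sum((y - mean_b) ** 2 for y in counts_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when counts never vary")
    return cov / sqrt(var_a * var_b)


def coding_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement on codes, corrected for chance agreement."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty, equal-length code lists")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    labels = set(codes_a) | set(codes_b)
    expected = sum(
        (codes_a.count(label) / n) * (codes_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters used one identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, `word_agreement("the dog is running fast".split(), "the dog running fast".split())` matches 4 words out of 9 total and returns 8/9, about 0.89. How the individual scores are combined into the overall score is not stated here, so the sketch leaves that step out.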

What users see

Reliability report fields

A compact result view presents the scores in plain, reviewable language rather than as a technical readout.

  • Boundary agreement: how closely raters split utterances
  • Word agreement: how closely transcript words match
  • Coding agreement: how consistently codes were applied

Clinical interpretation notes

  • Reliability scores are meaningful only when both transcripts come from the same sample.
  • The scores identify disagreement; a supervisor still decides which transcript is clinically correct.
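Because a score only flags that raters disagree, a reviewer usually needs to see where. The sketch below lists the word-level differences between two transcripts of one utterance using Python's standard `difflib`; the function name and output shape are assumptions for illustration, not a description of the ConductSpeech report.

```python
from difflib import SequenceMatcher


def word_disagreements(words_a, words_b):
    """Return (kind, rater A words, rater B words) for each mismatched span."""
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / delete / insert spans
    ]


rater_a = "he goed to the store".split()
rater_b = "he went to the store".split()
print(word_disagreements(rater_a, rater_b))
# [('replace', ['goed'], ['went'])]
```

The output shows that the raters differ on one word, but not which one is right: deciding whether the child said "goed" or "went" remains the supervisor's call.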

Ready to try it

Start with a real language sample.

Create an account, upload or review a sample, and see how this feature appears inside the ConductSpeech workflow.

Compare Two Transcripts