From: Classification-based spoken text selection for LVCSR language modeling
Spoken language | Written language |
---|---|
(1) A sentence is incomplete or fragmented (missing a subject or a verb) [10, 34]. Connected phrases may be found continuously [34]. | (1) A sentence is complete. |
(2) A sentence is less sophisticated: fewer subordinate clauses [34]. | (2) A sentence is more sophisticated: more subordinate clauses [34]. |
(3) A sentence starts with a topic-comment structure [34]. | (3) A sentence starts with a subject-predicate form [34]. |
(4) Repetition (word duplication or paraphrasing) often appears [35]. | (4) A sentence contains less repetition [35]. |
(5) A filler, i.e. a word or expression uttered while the speaker is thinking, often appears [35]. | (5) A filler does not appear [35]. |
(6) A final particle, e.g. /khâʔ/, /khráp/, /nî:aʔ/, and /c-â:ʔ/, often appears [35]. | (6) A sentence contains fewer final particles [35]. |
(7) Slang and foreign words are often used. | (7) Formal lexicon is used. |
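
Rows (4) to (6) describe surface cues that can be counted directly on a tokenized sentence. The following is a minimal sketch of that idea, not the method used in the source paper: the function name `surface_features`, the word lists passed to it, and the example tokens are all hypothetical, and the sentence is assumed to be already segmented into words.

```python
from typing import Dict, List, Set


def surface_features(
    tokens: List[str],
    fillers: Set[str],
    final_particles: Set[str],
) -> Dict[str, int]:
    """Count simple spoken-style cues in one pre-tokenized sentence.

    The word lists are supplied by the caller; nothing here is taken
    from the paper beyond the three cues named in rows (4) to (6).
    """
    # Row (4): immediate word duplication, i.e. the same token twice in a row.
    repeats = sum(1 for prev, cur in zip(tokens, tokens[1:]) if prev == cur)

    # Row (5): number of tokens that belong to the caller's filler list.
    filler_count = sum(1 for tok in tokens if tok in fillers)

    # Row (6): 1 if the sentence ends with a listed final particle, else 0.
    ends_with_particle = int(bool(tokens) and tokens[-1] in final_particles)

    return {
        "repeats": repeats,
        "fillers": filler_count,
        "final_particle": ends_with_particle,
    }


# Hypothetical example: "uh" and "go" are placeholder tokens, while the
# particles reuse the transcriptions listed in row (6).
features = surface_features(
    ["uh", "go", "go", "khráp"],
    fillers={"uh"},
    final_particles={"khâʔ", "khráp"},
)
print(features)
```

Counts of this kind could serve as inputs to a spoken-versus-written classifier, but paraphrasing, fragmented sentences, and clause structure (rows 1 to 3) would need syntactic analysis that this sketch does not attempt.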