Predictive text is a feature so embedded in daily communication that we rarely pause to consider how it works—or what its successes and failures reveal about language, algorithms, and uncertainty. Whether we’re sending a quick text or composing an email, predictive systems offer suggestions that can speed us up, frustrate us, or quietly shift how we write. This essay explores how predictive text systems have evolved, how they function, how their accuracy and uncertainty are evaluated, and what broader lessons they offer about the nature of prediction in a world shaped by language and probability.
Early predictive text systems, like T9 input on flip phones, operated on purely rule-based algorithms. Though largely obsolete, echoes of T9 remain in the three or four small letters printed beneath the numbers on many keypads, both physical and on our smartphones. T9, short for ‘Text on 9 keys,’ mapped sequences of keypresses to possible words based on static frequency dictionaries. For example, pressing 4-6-6-3 could produce good, home, or gone, but it could also produce goof or imod (like the software). Good, home, and gone are all statistically more common than goof or imod, so the list would likely appear in the order good, home, gone, goof, imod, and the user could navigate it with a d-pad or other button to select the intended word. Though rudimentary, this worked well for the time. Users could cycle through options, but the system lacked any contextual awareness—it had no way of knowing what you meant to say beyond brute-force frequency rankings. One might even argue that something like T9 is more correlative than predictive…
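To make the mechanism concrete, here is a minimal Python sketch of a T9-style, frequency-ranked lookup. This is an illustration under stated assumptions, not Tegic’s actual implementation: the keypad mapping is the standard one, but the word list and frequency counts are invented.

```python
# Minimal sketch of a T9-style lookup: map each word to its digit
# sequence, then rank the candidates for a keypress sequence by a
# static frequency count. Frequencies below are invented for illustration.

KEYPAD = {
    'a': '2', 'b': '2', 'c': '2', 'd': '3', 'e': '3', 'f': '3',
    'g': '4', 'h': '4', 'i': '4', 'j': '5', 'k': '5', 'l': '5',
    'm': '6', 'n': '6', 'o': '6', 'p': '7', 'q': '7', 'r': '7', 's': '7',
    't': '8', 'u': '8', 'v': '8', 'w': '9', 'x': '9', 'y': '9', 'z': '9',
}

FREQUENCIES = {'good': 9000, 'home': 8500, 'gone': 7000, 'goof': 300, 'imod': 5}

def to_digits(word: str) -> str:
    return ''.join(KEYPAD[ch] for ch in word.lower())

def t9_candidates(keys: str) -> list[str]:
    matches = [w for w in FREQUENCIES if to_digits(w) == keys]
    return sorted(matches, key=FREQUENCIES.get, reverse=True)

print(t9_candidates('4663'))  # ['good', 'home', 'gone', 'goof', 'imod']
```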
Today’s predictive systems, like those found in smartphone keyboards or tools like Gmail’s Smart Compose, rely on transformer-based models (the best known of these being GPT). Transformer-based models are a type of deep learning system that predicts language by processing entire sequences of text, not just one word at a time, attending to relationships between words across full sentences. Unlike earlier predictive text models that ‘read’ strictly left to right, transformers weigh the entire body of text at once and determine which words matter most for understanding meaning. This allows for more nuanced and context-aware predictions—these can feel almost magical when common phrases like ‘see you [soon]’ or ‘just letting [you know]’ are completed from just the first couple of words. Beyond simple phrases, something like Gmail’s Smart Compose can complete formal structures in emails and even suggest polite phrases like ‘please let me know if you have any questions’ at the end of an email. As Chen et al. explain, Smart Compose (an example of a more complex predictive text model) uses internal confidence thresholds to decide when to offer predictions, a quiet but significant acknowledgement that these systems are built to guess, not to know (Chen et al. 2019). The pinnacle of these transformer-based models is a predictive system like ChatGPT which, in its own words, relies on an
"attention mechanism, which allows the model to consider the entire context of a sentence or conversation at once [...] ChatGPT doesn’t simply guess the next word from a fixed list but dynamically weighs all prior input to generate a coherent and contextually relevant response [... ChatGPT] learns statistical patterns in language that guide these predictions." (OpenAI 2025)
This attention mechanism (Vaswani et al. 2017) allows for predictions so uncannily accurate that ChatGPT appears to be conversing naturally—and though uncanny, this accuracy can be measured.
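The core computation behind that quote is compact enough to write out. Below is a minimal NumPy sketch of the scaled dot-product attention from Vaswani et al. (2017), Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V; the shapes and random values are toy stand-ins, not a trained model.

```python
# Toy scaled dot-product attention (Vaswani et al. 2017).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of each key to each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mix of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional representations
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Every output position is a probability-weighted blend of every input position, which is what lets the model ‘dynamically weigh all prior input’ rather than march through a fixed word list.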
Accuracy in predictive text is often measured with three metrics: perplexity, exact match rate, and real-world speed tests. Perplexity is a statistical measure of how well a model predicts a sample of text, with lower perplexity implying more confident, fluent predictions. Exact match rate evaluates how often the top prediction matches what the user actually types. Real-world speed tests, such as measuring words per minute, assess whether predictive text genuinely helps users write faster and more efficiently. Yet studies show mixed results: Kristensson and Müllners (2021) found that predictive keyboards often don’t improve speed and may even slow users down due to cognitive load, and Quinn and Zhai (2016) observed that overly assertive predictions can disrupt user flow.
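Perplexity has a simple closed form: it is the exponentiated average negative log-probability the model assigned to the tokens it was asked to predict. A small sketch, with probabilities invented purely for illustration:

```python
# Perplexity = exp(mean negative log-probability of each predicted token).
import math

def perplexity(token_probs: list[float]) -> float:
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

confident = [0.9, 0.8, 0.95, 0.85]  # model usually right about the next token
surprised = [0.1, 0.05, 0.2, 0.15]  # model frequently caught off guard

print(round(perplexity(confident), 2))  # ~1.15: low perplexity, fluent prediction
print(round(perplexity(surprised), 2))  # ~9.04: high perplexity, poor prediction
```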
[Figure: heat map from Kristensson and Müllners (2021) relating minimum word length to letters typed before consulting a prediction]
The figure above, taken from Kristensson and Müllners (2021), compares minimum word length on the x-axis with the number of letters typed before looking at a prediction on the y-axis. The dotted line marks the divide between where predictive text increases the “net entry rate” and where it actually impedes the user. In the heat map, the deepest red marks where predictive text is most helpful, while the blue marks where it is unhelpful. The most efficient operating area (shown by the blue point within the deep red) sits significantly above this line, meaning that the average user doesn’t actually benefit that much from predictive text.
I was curious to see if this held true in my own life, so I conducted a small-scale manual test using five common sentence starters like “I can’t wait to…” and “Let me know if…”. For each phrase, I recorded my iPhone’s three suggested completions and compared them to a sample of ten real sentences with those starters gathered from my own iMessage history. The table below presents each phrase I tested alongside the iPhone’s three predictive suggestions, the total number of actual messages examined (ten per phrase), and the words that actually followed each phrase, with each word’s frequency shown after it:
[Table: tested phrases, the iPhone’s three suggestions, and the observed continuations with their frequencies]
While this test was extremely limited in scope and not statistically rigorous, it offers some modest insight into system behavior. It revealed prediction accuracies ranging from 33% to 67%, with an average of 53%. In other words, the odds that one of the three words the iPhone suggests is the word you meant to type were only slightly better than a coin flip. When the system was correct, it performed best on short, highly patterned phrases like “Let me know if…”, where suggestions like “you” or “there’s” were accurate in two-thirds of cases. More open-ended and ambiguous prompts like “It’s…” or “I was thinking…” were more difficult, and predictions like “a” and “of” were only weakly related to actual user responses. Though rudimentary, this mirrored larger trends: predictive text tends to succeed on formulaic language but falters when intent or phrasing becomes idiosyncratic. This reflects both aleatory uncertainty (variation in human intent) and epistemic uncertainty (limits in training data and model understanding), at least in the small, limited context I was testing in.
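For transparency, the score I used amounts to a top-3 exact match rate. The sketch below shows the calculation; the suggestion list and observed continuations are hypothetical stand-ins, not my actual message data.

```python
# Top-3 match rate: a "hit" is any observed continuation that appears
# among the keyboard's three suggestions. All data below is hypothetical.

def top3_match_rate(suggestions: list[str], observed: list[str]) -> float:
    hits = sum(1 for word in observed if word in suggestions)
    return hits / len(observed)

suggestions = ["you", "there's", "I"]                    # the keyboard's three offerings
observed = ["you", "you", "there's", "anything", "you",  # what actually followed in
            "there's", "not", "you", "something", "this"]  # ten (hypothetical) messages

print(top3_match_rate(suggestions, observed))  # 0.6 -> 6 of 10 continuations matched
```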
My pseudo-findings affirm more robust and rigorous studies like those of Kristensson and Müllners (2021) and Quinn and Zhai (2016) above. Though both studies agree that predictive systems can reduce keystrokes, Quinn and Zhai in particular determined that their usefulness depends on how assertively suggestions are presented to the user and whether they are trusted to be correct. This has the potential to create an interesting predictive feedback loop: users are more likely to accept predictions they trust to be correct → high-frequency, highly patterned phrases are easy for the system to predict and suggest correctly → users may be more likely to opt for predictable phrases → those phrases become even more statistically frequent in each user’s typing corpus. A portion of this possible loop was documented by Arnold et al. (2020), who found that predictive keyboards that sped up typing also nudged users toward more generic and predictable language. This raises the question, especially when models are trained on massive datasets, of whether correct predictions are correct because user intent is understood or because they’re statistically likely completions for a broad population of users.
Arnold et al.’s and Quinn and Zhai’s research paint an interesting picture when taken together: the efficacy of a predictive system interacts with its visibility to the user, and the more effective the system, the more likely it is to influence the user’s typing habits. Situating this in real-world contexts with systems like the iPhone’s predictive text or Gmail’s Smart Compose, it would seem that systems are often most useful when they’re least intrusive—when they work seamlessly and the suggestions feel natural. The iPhone makes only three suggestions, which disappear quickly if you keep typing, and Gmail’s suggestions appear in light gray and similarly don’t linger. There is also virtually no information given about how these predictions are to be interpreted; that is, uncertainty in these systems is rarely made visible to users. Unlike weather forecasts, which communicate confidence intervals and probabilities, predictive text offers no indication of how sure it is (an opacity that could lead users to overtrust the system). But internally, developers do consider uncertainty: Gmail’s Smart Compose, for example, suppresses low-confidence predictions entirely, a silent decision designed to protect the user experience (Chen et al. 2019).
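What such suppression might look like is simple to sketch. The function name and threshold value below are assumptions for illustration, not Smart Compose’s actual internals; Chen et al. (2019) describe tuning confidence thresholds against user-experience metrics rather than exposing them.

```python
# Hypothetical confidence gate: only surface a completion when the
# model's probability for it clears a tuned threshold.

THRESHOLD = 0.75  # assumed value; real systems tune this empirically

def maybe_suggest(completion: str, probability: float) -> str | None:
    """Return the suggestion only if the model is confident enough."""
    return completion if probability >= THRESHOLD else None

print(maybe_suggest("let me know if you have any questions", 0.91))  # shown
print(maybe_suggest("per my last email", 0.40))                      # suppressed -> None
```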
It isn’t necessary to make uncertainty invisible. A study by Vasconcelos et al. (2023) explores a novel way to communicate uncertainty in AI autocomplete suggestions (specifically in the context of code editors, but the principles apply broadly). The authors note that AI-generated suggestions can be incorrect, and simply providing probabilities is not always helpful to users. They evaluate an interface that highlights the parts of a suggestion the model is least confident about. In a user study, highlighting tokens that were predicted to be edited (i.e. likely wrong or uncertain) led to faster task completion and more targeted user edits. By contrast, highlighting tokens purely based on low probability (the model’s internal confidence) did not yield benefits over no highlighting (Vasconcelos et al. 2023). Users preferred the intelligent highlighting of uncertain parts, finding it more informative and less overwhelming. This work shows how making uncertainty visible can improve user trust and efficiency with predictive systems, and it offers design guidance for conveying uncertainty (e.g. subtle highlights on low-confidence words) in autocomplete or predictive text interfaces.
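The interface idea is easy to prototype in miniature. The sketch below brackets the tokens of a suggestion that fall under a confidence cutoff, standing in for a visual highlight; note that it uses raw token probabilities, the signal the study found unhelpful on its own, whereas the better-performing condition predicted which tokens a user would edit.

```python
# Toy uncertainty highlighting: wrap low-confidence tokens in [brackets]
# as a stand-in for a UI highlight. Tokens and probabilities are invented.

def highlight_uncertain(tokens: list[str], probs: list[float], cutoff: float = 0.5) -> str:
    return ' '.join(f'[{t}]' if p < cutoff else t for t, p in zip(tokens, probs))

tokens = ['return', 'sorted(', 'items', ',', 'key=len', ')']
probs  = [0.95,      0.80,      0.30,   0.90, 0.25,     0.90]
print(highlight_uncertain(tokens, probs))
# return sorted( [items] , [key=len] )
```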
The main source of uncertainty in predictive text is that language is inherently ambiguous and context-dependent. The model doesn’t know what you intend to say—it can only guess based on patterns it has seen in its training data. This leads to aleatory uncertainty (randomness in what users might type next) and epistemic uncertainty (gaps or biases in what the model has learned). Even with massive datasets, the system is still just making probabilistic predictions, not understanding meaning. This uncertainty is intriguing because it masquerades as epistemic and reducible: perhaps a model trained on a corpus of everything ever written in human history would get very, very good at predictive text. But I believe the uncertainty is ultimately aleatory—it is human. The foundations of Chomskyan grammar treat language as a recursive mechanism: it can produce infinitely many novel sentences, and it is entirely possible, on any given day, to produce a sentence no one has ever written before. Taken alongside the reality that a machine can’t read our minds (yet), this means the uncertainty can’t be truly reduced.

Predictive text has transformed how we communicate. It is incredibly powerful, but at its core it is just a very sophisticated guesser, not a true thinker. The more we understand the uncertainties behind these systems, the smarter we can be about how we use them, and in doing so we shift from mere users to informed participants in a world increasingly shaped by predictive algorithms. When we understand that predictive text is not a prophet but a mirror of past language, we gain agency not just in writing, but in how we relate to the predictive systems that now co-author our digital lives.
Read more (and see references below):
“Can a Machine Learn to Write for The New Yorker?” (2019) – John Seabrook, The New Yorker.
“On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” (2021) – Emily M. Bender, Timnit Gebru, et al.
“AI Is Ushering in a Textpocalypse” (2023) – Matthew Kirschenbaum, The Atlantic.
“The Great Language Flattening” (2025) – Victoria Turk, The Atlantic.
References
Arnold, K. C., Chauncey, K., & Gajos, K. Z. (2020). Predictive text encourages predictable writing. Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI '20), 128–138. https://www.eecs.harvard.edu/~kgajos/papers/2020/arnold20predictive.pdf.
Chen, M. X., Lee, B. N., Bansal, G., Cao, Y., Zhang, S., Lu, J., Tsay, J., Wang, Y., Dai, A. M., Chen, Z., Sohn, T., & Wu, Y. (2019). Gmail Smart Compose: Real-time assisted writing. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2287–2295. https://arxiv.org/abs/1906.00080.
Kristensson, P. O., & Müllners, T. (2021). Design and analysis of intelligent text entry systems with function structure models and envelope analysis. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). ACM. https://doi.org/10.1145/3411764.3445566.
OpenAI. (2025, April 28). Explanation of transformer-based models and predictive text in ChatGPT [Large language model response]. ChatGPT. https://chat.openai.com/.
Quinn, P., & Zhai, S. (2016). A cost–benefit study of text entry suggestion interaction. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 83–88. https://doi.org/10.1145/2858036.2858305.
Vasconcelos, H., Bansal, G., Fourney, A., Liao, Q. V., & Vaughan, J. W. (2023). Generation probabilities are not enough: Exploring the effectiveness of uncertainty highlighting in AI code completions. arXiv preprint arXiv:2302.07248. https://arxiv.org/pdf/2302.07248.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, 30. https://papers.nips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.