
Five things we need to fix with Gen AI

AI is now embedded in the workflows of millions of professionals, myself included. But as with any revolutionary technology, the initial "wow" factor is starting to give way to some very real "what now?" moments. It's time to talk about the quirks and frustrations holding Gen AI back from its true potential. Here are five things, based on my experience, that I believe need a major overhaul right now.

1. The "You're Holding It Wrong" Mentality Needs to Go

Do you remember when, after criticism of the poor phone reception on an early-model Apple iPhone, Steve Jobs infamously said: "You're holding it wrong"? The latest release of GPT-5 from OpenAI is a similar case in point. With many people seeing variable results, OpenAI responded with a voluminous prompt guide on "how to hold it right": cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide

While that guide is useful to those of us who are keen, I believe the whole "expert prompter" trend is getting out of hand. The idea that we need to become experts, carefully crafting the perfect incantation to get a useful response, is a step backward. Prompt engineering is currently a crucial skill for getting the most out of LLMs, but the burden of communication should be on the AI, not the user. Some systems do a better job of this, with routing approaches (like Perplexity's) trying to reduce complexity for the user while still allowing some sophistication.

But the underlying issue remains: the user is expected to adapt to the machine's eccentricities, not the other way around. That should change - right now.

2. Let the Right Tool do the Right Job, Automatically

We've all seen that GenAI can write a birthday greeting or a lame joke about a stray cat, yet it cannot reliably do simple arithmetic (it got date calculations wrong for me very recently).

The underlying reason is that LLMs (the main approach behind GenAI) are token predictors, not calculators. They're great with words, but they attempt numerical calculations by manipulating tokens, which is not real computation. We need AI that is smart enough to automatically call on the right tool for the job. When a query requires numerical precision, it should seamlessly hand the task over to a computational tool without the user even needing to know. The same applies to fact-checking: the AI should cross-reference information with reliable databases to ensure accuracy, especially where precision clearly matters (legal precedents, financial or company report analysis).
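To make the hand-off idea concrete, here is a minimal sketch of a dispatcher that routes purely arithmetic queries to a real calculator and passes everything else to the model. The `call_llm` function is a hypothetical stand-in, not any vendor's API; real tool-calling systems use richer intent detection than this regex.

```python
import ast
import operator
import re

# Map AST operator nodes to real arithmetic, so we never eval() raw input.
SAFE_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a basic arithmetic expression with an actual calculator."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def call_llm(query: str) -> str:
    """Hypothetical placeholder for a call to a language model."""
    return f"[LLM response to: {query}]"

def answer(query: str) -> str:
    """Route numeric queries to the calculator; everything else to the LLM."""
    if re.fullmatch(r"[\d\s\.\+\-\*/\(\)]+", query.strip()):
        return str(safe_eval(query.strip()))
    return call_llm(query)
```

The point of the sketch is that the routing decision happens before the model sees the query, so the user never has to know which tool did the work.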

3. Stop with the "May Be Inaccurate" Backside Covering

GenAI can be confidently wrong, a phenomenon known internationally as "hallucination" but better described, to us Australians, as "bullshitting". While this can occasionally be amusing, it's a significant risk, especially in professional contexts, as a major consulting firm in Australia was recently very publicly reminded. There's a critical need for these systems to be more transparent about the uncertainty of their responses. We need clear, upfront indicators of confidence levels, so users can make informed decisions about how much to trust the information they're given. Covering your backside over this imprecision in the terms of service isn't good enough; the systems need to do WAY better at removing the imprecision, or at least at making it clear when there IS imprecision. And can we get the image generators to fix their numeracy and spelling while we're at it? (My AI-generated snapshot for this newsletter has TWO Section 2's!!)
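One plausible ingredient for the upfront confidence indicators argued for above: several LLM APIs can return per-token log-probabilities, which can be mapped to a coarse user-facing label. This is a sketch under assumptions, not a calibrated method; the `hi`/`lo` thresholds are illustrative, and log-probabilities alone don't catch all hallucinations.

```python
import math

def confidence_label(token_logprobs: list[float],
                     hi: float = 0.9, lo: float = 0.6) -> str:
    """Map per-token natural-log probabilities to a coarse confidence label.

    The thresholds are illustrative assumptions, not calibrated values.
    """
    if not token_logprobs:
        return "unknown"
    # Geometric mean of token probabilities = exp(mean of logprobs).
    mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    if mean_prob >= hi:
        return "high confidence"
    if mean_prob >= lo:
        return "medium confidence"
    return "low confidence: verify before relying on this"
```

Even a blunt signal like this, shown next to the answer, would be a step up from a disclaimer buried in the terms of service.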

4. The "Brain Full" Moment: Context Saturation

I'm sure most of us have been in a long and productive chat with an AI, only to find it suddenly forgetting key details from earlier in the conversation, or just getting totally confused. It happens to us humans as well. For GenAI systems, this occurs as you approach the "context window" limit, which for most systems is around 100K tokens (roughly 70,000 words).

While these context windows are getting larger, they still have limits, and larger windows can even lead to a decrease in performance and an increase in cost. If you code like I have been doing on a side project with Claude Code, you will know you get a warning as the context window fills (Google's AI Studio also displays this parameter).

Then, when it's nearly full, the system forces a "compaction", summarising the past conversation to free up context. It works pretty well.

Instead of the AI's performance degrading, it should proactively inform the user that its context is nearing saturation and suggest ways to summarise or clean up the chat history to continue effectively. This sort of approach should help people avoid the nonsensical (or dangerous) sessions caused by long-running conversations well in excess of useful context windows.
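The warn-then-compact behaviour described above can be sketched in a few lines. The numbers are assumptions for illustration: tokens are approximated at about four characters each (a real system would use the model's tokenizer), and in practice the summary would be written by the model itself rather than a placeholder.

```python
CONTEXT_LIMIT = 100_000   # illustrative window size, in tokens
WARN_AT = 0.8             # warn when 80% of the window is used

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def check_context(history: list[str]) -> str:
    """Report how full the context window is, warning near saturation."""
    used = sum(approx_tokens(turn) for turn in history)
    frac = used / CONTEXT_LIMIT
    if frac >= 1.0:
        return "full: compaction required"
    if frac >= WARN_AT:
        return f"warning: {frac:.0%} of context used, consider summarising"
    return f"ok: {frac:.0%} used"

def compact(history: list[str], keep_recent: int = 2) -> list[str]:
    """Replace older turns with a summary stub, keeping the recent turns.

    A real system would have the model write the summary itself.
    """
    if len(history) <= keep_recent:
        return history
    summary = f"[summary of {len(history) - keep_recent} earlier turns]"
    return [summary] + history[-keep_recent:]
```

Surfacing the `warning` state to the user, rather than silently degrading, is the whole ask of this section.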

5. Paying Users Deserve Stability

For those of us integrating AI into our daily workflows, unexpected changes to the models we pay for can be more than a minor annoyance; they can break entire processes. The "move fast and break things" approach doesn't work when people are relying on your service for their business.

Ideally, users should be notified of significant changes and given the option to stick with a previous version of a model if the new one no longer suits their needs. This kind of version control is standard in the software world, and it's time for the AI industry to catch up.
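As a concrete illustration of that version control, some providers already distinguish dated model snapshots from floating aliases; a workflow can audit itself to ensure nothing business-critical rides an alias that may change underneath it. The model names below are invented for illustration, not real identifiers.

```python
import re

# Illustrative workflow config: role -> model identifier.
# A trailing YYYY-MM-DD date is treated as a pinned snapshot.
PINNED_MODELS = {
    "summariser": "example-model-2025-06-01",  # dated snapshot (stable)
    "chat": "example-model-latest",            # floating alias (may change)
}

def is_pinned(model_id: str) -> bool:
    """Treat a model id ending in a YYYY-MM-DD date as a pinned snapshot."""
    return re.search(r"\d{4}-\d{2}-\d{2}$", model_id) is not None

def audit_pins(models: dict[str, str]) -> list[str]:
    """Return the workflow roles still riding a floating alias."""
    return [role for role, model_id in models.items() if not is_pinned(model_id)]
```

A check like this in a deployment pipeline makes silent model swaps a visible, opt-in decision rather than a surprise.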

And a Bonus...

Let's build on what we've learned together. Current LLMs largely treat each interaction as a fresh start unless the previous conversation is explicitly included in the context (which can consume much of the "new" context). A truly intelligent system would learn from our ongoing interactions, getting a better sense of our goals, preferences, and even our blind spots over time.

We've had well over 12 months of these issues, and in my view it's time we had a better experience. The future isn't about us becoming better prompters; it's about the AI becoming a better collaborator. What do you think?
