Thursday, 9 April 2026

My observations using AI tooling

With special thanks to Margarita Lozian and Peter Reeves for their reviews

Introduction

Over the last eight months (September 2025 – April 2026) I’ve been using AI tooling in a more institutional way as various work-approved applications came online.

In part, that is because AI tools have begun to become ubiquitous in our professional and personal lives. According to a recent BCG study, whilst AI usage has surged, the measurable impact hasn’t kept pace with expectations (BCG, 2025). This disconnect raises an important question: What are the best ways to actually use these tools?

To try to answer this question, I’ve been trying out various AI tools (IBM Bob, Copilot, ChatGPT) to see what works well and what doesn’t. This article explores what I have discovered, and hopefully those with more experience than me can let me know whether they have seen any of these issues, have managed to solve them, or whether some of these are just plain user error.

The AI Tools

In a work setting, I’ve been utilising IBM Bob, which is trained on IBM-specific expertise. It’s very good for technical troubleshooting and product guidance, planning work and compiling data together. It has different modes depending on what you are trying to do, and it’s a tool that we in IBM Expert Labs have been integrating into our delivery practices, as discussed by Damian Boys of IBM Expert Labs UKI Automation Platform Delivery (LinkedIn - Damian Boys).

I’ve also been evaluating Microsoft Copilot, which I’ve found to be better at internet-facing data summaries, like understanding how companies are operating and how real-world scenarios map to some of the technical features I’m working on with products. This is good for things like campaign ideation, exploring alternative implementations and getting rapid information on product features.

From my small amount of usage so far, I have found that IBM Bob is better for project work: lots of files and information from different sources, and building code to transform that data into what I want. Microsoft Copilot is better for single-window context, and seems easier for general questions that draw on general internet sites.

In a personal capacity, I’ve been using ChatGPT for my football simulation engine https://github.com/GallagherAiden/footballSimulationEngine, for advice on gardening, and for financial reasoning. I already had a ChatGPT account which I had rarely used, so decided to use it for more of my day-to-day queries. For example, when I was exploring mortgage renewals and overpayment savings, it was much easier to ask ChatGPT than to run multiple queries on MoneySavingExpert, and I could then ask follow-up questions about financial advice and war-game different interest rates with all the context already there.

My Main Takeaway: Understand What AI Actually Is and Not What It Seems

The thing I have to keep reminding myself is that AI tools are sophisticated reference material reviewers, not sources of independent intelligence or unique thought. Once you internalize this, you can properly contextualize what to expect in responses or chained responses over time.

I now like to think of AI as an incredibly well-read sounding board that can give you ideas about how others achieve things or what your data is telling you. But the real value comes from your own innovative thoughts, ideas and analysis, and from putting the pieces it gives you together.

My Observations

1. Context and Integration Challenges (ChatGPT, IBM Bob)

Even when you provide all your files, AI tools struggle to piece them together properly. They miss connections between files, fail to maintain consistency, and, from my experimenting, seem to bolt things on. Now, I appreciate I might not be giving the tool enough context, and may not be describing my development style, but some of that should be obvious from the code I provide.

Example: Working on my football simulation engine, I provided all the data structures, game logic, and statistical models. Yet the AI would suggest changes to one component without considering ripple effects on others, and would make stylistic coding choices that were the complete opposite of what I had done elsewhere; for example, semicolons at the end of lines in Node JS (let’s not debate that decision here!).

AI treats code generation tasks independently, even when it should consider broader system architecture. It might generate a beautiful function but change the signature in ways that break existing callers even though those callers were in the context provided.

Note: Whilst you can define explicit files to touch and not touch, this requires additional overhead, and obviously the world is moving so fast on AI tooling that this is probably already out-of-date thinking!
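One practical mitigation for the style drift described above is to let a linter, rather than the AI, own style decisions, so machine-generated code that contradicts your conventions gets flagged automatically. A minimal sketch using ESLint’s flat config format (ESLint 9+); the three rules are real ESLint core rules, but the severities and choices here are illustrative, not a recommendation:

```javascript
// eslint.config.js — minimal flat-config sketch (ESLint 9+).
// Rule selection is illustrative; tune to your own project's conventions.
export default [
  {
    files: ['**/*.js'],
    rules: {
      semi: ['error', 'always'],    // enforce the semicolon style debated above
      'no-unused-vars': 'warn',     // surface dead code an AI rewrite leaves behind
      'consistent-return': 'error'  // catch functions that only sometimes return a value
    }
  }
];
```

Run the linter in CI and an AI suggestion that breaks house style fails fast, whichever tool produced it.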

2. The Innovation Limitation (All)

What I found more and more is that the AI tool would constantly push me to copy how other tools do things: “This is how FIFA’s engine works, and Football Manager does it this way too.” Great, but I’m building something different: a compositional, iteration-based engine that users can amend per iteration, not an end-to-end simulation of a match.

I get that this is subtle, but the constant attempts to move me towards known solutions really highlighted the lack of innovation that we might face in AI generated applications and services.

That’s great if you want to be quick; not so great if you want to get the edge and build something novel, faster, more streamlined, or more secure.

3. The Confidence Trap (All)

One of the most infuriating quirks of AI is that it doubles down when wrong. When you point out mistakes, it reveals “assumptions” it supposedly made, or claims you didn’t provide information when you did, possibly because its context window has dropped it.

Example: The AI generated code that broke my application. When I fed back the error, it responded: “Well, of course that won’t work; you did X, which is completely wrong.” But YOU gave it to me.

It’s annoying, condescending, and can make you doubt your own judgement. I know you can change your profile and ask it to speak and interact in a different way, but I haven’t tried this yet, and obviously I can switch tools if I don’t like how one is speaking.

4. Problem Analysis vs. Root Cause Understanding (All)

I find AI is excellent at reviewing large amounts of data and highlighting concerns within it, but it is often wrong about what the root cause of those concerns is. Yes, it can spot patterns, but it lacks the domain knowledge and intuition to understand them properly.

Even with IBM Bob, which is trained on IBM-specific domain knowledge, there’s still a gap between pattern recognition and true understanding. While it performs better within its training domain, it still lacks the intuition and contextual judgement that come from real-world experience. This suggests it is a generic problem across all AI tooling.

Where the tools work really well is in grouping issues and identifying areas for improvement. However, one time I provided Copilot with some data that wasn’t quite right – an API Connect issue mislabeled as an IBM Liberty issue – and it completely skewed the tool’s summary and analysis. What this shows is that there is still, at least for now, a need for human-AI collaboration, as we’re likely to always see some errors in the data, whether human-made or otherwise.

The disconnect can then make the summary seem wrong, and for me (the person using the summary) it raises worries about my own integrity in presenting the findings.
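A small amount of input validation can catch the crudest version of this problem before the data ever reaches the tool. A sketch, where the product names and record shape are my own illustration (note this only catches unknown or misspelt labels; a valid label attached to the wrong product, as in the example above, still needs a human eye):

```javascript
// Reject records whose product label isn't in a known allow-list,
// so an obviously bad label can't silently skew the AI's summary.
const KNOWN_PRODUCTS = new Set(['API Connect', 'IBM Liberty', 'MQ']);

function partitionByLabel(records) {
  const valid = [];
  const suspect = [];
  for (const record of records) {
    // Send `valid` records to the AI; hand `suspect` ones to a human first.
    (KNOWN_PRODUCTS.has(record.product) ? valid : suspect).push(record);
  }
  return { valid, suspect };
}
```

Anything landing in `suspect` gets a human look before the summarisation run, which is cheaper than discovering a skewed analysis afterwards.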

5. Silent Assumptions (All)

I also found that AI doesn’t ask for more context. A person would ask clarifying questions if they didn’t know the answer to something. Existing AI tools almost never ask, and will instead make assumptions that lead down the wrong path.

Think of the wasted resources used instead of just asking! Some tools have started to ask more and I think this will be something solved in not-too-distant iterations of the software.

6. The Always Helpful Problem (All)

Another time, I was quickly trying to answer a customer question and wanted to send them the relevant documentation link with the technical information. 

The AI tools half-guessed a viable link that then didn’t work and, at their naughtiest, completely made up a link! They will also use generic information rather than product-specific detail. For example, I was looking for a compatibility matrix for Java support in a product; the response was that the “application supports Java 12”, but it didn’t tell me that this information came from the documentation of another product and was wrong for the product I was discussing.

Example: Whilst trying to find some specific Maximo v9 documentation, the AI confidently provided links that didn’t work and blended general software best practices with supposedly specific product guidance. When pressed, it couldn’t provide actual documentation sources and finally admitted the advice was generic.

Because I still like to check AI citations myself, I verified all the links ahead of time and caught this. But if you were in a rush, or it got enough right often enough, you could become complacent.

All of this stems from AI tooling wanting to be helpful, and if it can’t be, it begins to hallucinate, which is worse than not answering at all. Maybe my prompts should include “hey, it’s ok to not know everything, little guy”.

7. Practical Limitations

Token limits in a context window are finite, and even where the limit is very high, more context can reduce the quality of the response. When you hit limits, the tools can lose context and “forget” important details from earlier in the conversation. I saw various ignored requests myself, and, as in the cautionary tale of Summer Ye at Meta, this can lead to the deletion of important context. That wouldn’t wash in an MQ design managing millions of payments.

https://techcrunch.com/2026/02/23/a-meta-ai-security-researcher-said-an-openclaw-agent-ran-amok-on-her-inbox/

AI tooling also overwrites with no real sense of version control. It regenerates entire sections rather than making edits, overwrites files, and doesn’t do any form of version control on documents, although it works well with code because of the underlying commit infrastructure already provided by Git. If I ask for a rewrite with a different prompt, it takes away the old one; I can review before I approve, but I might want to keep elements of both.
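Until the tools do this natively, a tiny wrapper that never overwrites helps. A sketch of the idea (this `Drafts` helper is my own invention, not a feature of any AI tool): every rewrite becomes a new numbered version, so you can still merge elements of an earlier draft into a later one.

```javascript
// Keep every AI rewrite as a numbered draft instead of overwriting,
// so earlier versions remain recoverable.
class Drafts {
  constructor() {
    this.versions = [];
  }
  // Store a new draft; returns its 1-based version id.
  save(text, note = '') {
    this.versions.push({
      id: this.versions.length + 1,
      text,
      note,
      at: new Date().toISOString()
    });
    return this.versions.length;
  }
  get(id) {
    return this.versions[id - 1];
  }
  latest() {
    return this.versions[this.versions.length - 1];
  }
}
```

Usage: `drafts.save(myOriginal, 'my draft'); drafts.save(aiRewrite, 'AI attempt 1');` and nothing is ever lost when you ask for another pass.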

Understanding Why AI Works This Way

Note: Hybrid models are emerging that claim to address these processing limitations, potentially making this observation less relevant in the near future.

Some of this is obviously just how AI works:

Pattern Recognition, Not Reasoning: AI models are trained on existing data and excel at pattern matching. They can interpolate between known solutions but can’t extrapolate to genuinely new ones. They generally lack the ability to reason from first principles.

Local vs. Global Understanding: The transformer architecture processes information through attention mechanisms focused on local context windows. While they can “see” all your files, they struggle with maintaining global state and understanding complex interdependencies. The further apart two pieces of information are, the harder it is to connect them.

No Self-Awareness: AI has no mechanism for recognizing what it doesn’t know or identifying gaps in information. It’s trained to generate complete, helpful responses, not to engage in true collaborative problem-solving. It can’t assess its own understanding.

Models are trained using Reinforcement Learning from Human Feedback (RLHF), rewarded for being helpful. But “helpful” often means “always providing an answer with confidence.” The model has no actual uncertainty quantification—it doesn’t “know” when it’s wrong. The confident tone is learned from training data.

The main way to “fix” this is to have multiple AI windows or even different tools review each other’s answers and highlight concerns.
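Once you have two answers in hand, even a crude automatic comparison can tell you when to investigate further. A sketch, where the word-overlap heuristic and the 0.3 threshold are my own rough stand-ins for a real comparison, not an established technique:

```javascript
// Crude disagreement signal between two AI answers to the same question:
// the share of distinct words they have in common. Low overlap = look closer.
function compareAnswers(answerA, answerB) {
  const words = s => new Set(s.toLowerCase().match(/\w+/g) || []);
  const wa = words(answerA);
  const wb = words(answerB);
  const shared = [...wa].filter(w => wb.has(w)).length;
  const overlap = shared / Math.max(wa.size, wb.size, 1);
  return { overlap, suspicious: overlap < 0.3 }; // threshold is arbitrary
}
```

This won’t judge which answer is right, but a low overlap between two confident responses is exactly the prompt a human needs to dig in.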

No Truth Verification: It can’t distinguish between information from actual documentation versus inferred patterns, and has no access to real-time information.

With MCP (Model Context Protocol) tools enabled, referencing and link verification have improved significantly. However, users who disable these features to reduce token consumption may still experience hallucinations and fabricated links, and even then the documentation can be old or incorrect.
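Checking links before you forward them can also be partly automated on your side. A sketch: pull anything URL-shaped out of an AI answer, then probe each with an HTTP HEAD request (assumes Node 18+ for the built-in `fetch`; a 200 response still doesn’t prove the page says what the AI claims, it only rules out outright dead links):

```javascript
// Extract anything that looks like a URL from an AI response.
function extractLinks(text) {
  return text.match(/https?:\/\/[^\s)"]+/g) || [];
}

// Probe each link; a failed or 404 response is a red flag for a fabricated citation.
async function checkLinks(text) {
  const results = [];
  for (const url of extractLinks(text)) {
    try {
      const res = await fetch(url, { method: 'HEAD' });
      results.push({ url, ok: res.ok, status: res.status });
    } catch {
      results.push({ url, ok: false, status: null }); // unreachable: possibly hallucinated
    }
  }
  return results;
}
```

Running this before pasting links into a customer email turns “I hope these work” into a ten-second check.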

Limited Human Verification: If you’re not familiar with the coding language the AI tool builds in, how could you notice that it is wrong? How do you ensure that security requirements are being met? In some cases, people will ask one AI tool to verify another tool’s output. Again, the cost of this makes me shudder.

Computational Constraints: Token limits exist because the attention mechanism’s computational complexity scales with input length. The model has no persistent memory beyond the current context window. Large context windows also have a practical impact: for example, I noticed browser interrupts, more frequent null responses, and slow response times, all directly affected by both client and server compute resources.
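The practical coping strategy is to budget context yourself rather than letting the tool silently evict things. A rough sketch (the four-characters-per-token heuristic is a crude approximation of English text, not any model’s real tokenizer):

```javascript
// Very rough token estimate: ~4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Drop the oldest messages until the conversation fits the budget,
// always keeping the first message (system prompt) and the newest one.
function fitToBudget(messages, maxTokens) {
  const kept = [...messages];
  const total = () => kept.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (total() > maxTokens && kept.length > 2) {
    kept.splice(1, 1); // evict the oldest non-system message
  }
  return kept;
}
```

Evicting deliberately, oldest first, at least means you know what the model can no longer see, instead of discovering mid-conversation that an early requirement has been “forgotten”.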

My Advice:

1. Set Realistic Expectations – use AI for what it’s good at: generating boilerplate, exploring known solutions, processing large amounts of information, and providing starting points, but don’t expect it to actually “think”.

2. Verify Everything – and make sure you are able to understand what you are verifying. If you can’t, then ask someone else, or another AI tool, to assess the response critically.

3. Provide Comprehensive Context – over-explain requirements, explicitly state constraints and dependencies, provide examples, and describe the broader system context.

4. Challenge Confident Assertions – ask: “What are you basing this on?” “Can you provide specific documentation?” “What assumptions are you making?”

5. Use Multiple Tools – different AI tools have different strengths. Use specialized tools for domain-specific questions, general tools for brainstorming, and cross-reference answers when something seems questionable.

6. Treat AI as a Junior Developer – review all code carefully, provide detailed guidance, don’t assume it understands the bigger picture, and verify that changes don’t break existing functionality.

7. Know When to Stop – sometimes traditional methods are more efficient!

Conclusion

AI tools are powerful and useful, but they’re not magic. The gap between AI usage and impact that BCG identified exists because we’re still learning how to use these tools effectively.

The key is understanding what AI actually is: a sophisticated pattern matcher that excels at applying known solutions to familiar problems. Once you have hammered this home to yourself, you can start to get the most out of the AI tools available to you.

Let me know if you have had any of your own pain points, or solved any of mine already.