Uncanny Solutions
Prompt Engineering Challenges
Introduction
In my journey using GitHub Copilot, I have been experimenting with the different models inside VS Code and evaluating their code quality and reasoning.
My perception of AI is that it is an excellent reference system. I consistently ask questions about coding libraries and programming paradigms. The updated Copilot in VS Code adds Ask, Edit, and Agent modes, and these are pretty cool. I do like them quite a bit, and they are positive improvements. Honestly, I can’t wait to see what else is in store.
However, let’s dig into what I am seeing vis-à-vis code quality and reasoning at a macro level. I often have judgment calls to make while programming because, unlike when I was younger, I can’t keep an encyclopedia of knowledge in my head anymore. What this means is that I am using Copilot as a partner and a knowledge base.
I was going to write a much longer evaluation for this article, but as I was writing I realized it was too much. I’m going to break it out into two more articles with examples. For today, I feel the bottom line is in “grounding” one’s usage of Copilot.
How I ground myself with Copilot is by asking these questions:
Is AI truly giving me correct answers?
Am I fooling myself into thinking an answer is correct when it is completely wrong?
Does AI adequately understand the problem to pose a solution?
Am I spending more time fussing with AI vs. writing the code myself?
Am I spending more time fussing with AI vs. writing the code myself?
To answer the last question first: I turn off the suggestions feature entirely in VS Code. For me, it causes too much noise and cannot follow a reasonable code style for production-ready code. It borks simple things that I expect from my own coding. It will not follow pylint no matter how hard I try to convince it to.
Another great frustration is using pandas and NumPy. Those are deep libraries, and AI doesn’t get them; it focuses on the wrong things. It really has a hard time conceptualizing array programming and vectorization.
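To make that concrete, here is the kind of gap I mean. This is a toy illustration written for this article (the prices array and the discount rule are made up): the loop is the style suggestions tend to read like, while the second form is the array-oriented version the libraries are built around.

    import numpy as np

    # A made-up array of one million prices, just to have something to work on.
    prices = np.random.default_rng(42).uniform(5, 50, 1_000_000)

    # Loop-style thinking: handle one element at a time in Python.
    discounted_loop = np.empty_like(prices)
    for i, p in enumerate(prices):
        discounted_loop[i] = p * 0.9 if p > 20 else p

    # Array-style thinking: express the same rule as one vectorized operation.
    discounted_vec = np.where(prices > 20, prices * 0.9, prices)

    assert np.allclose(discounted_loop, discounted_vec)

Both produce the same result, but the second form is what pandas and NumPy are designed for.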
Given that, I write about 70% of my code now and let AI pick up the other 30%, mostly boilerplate. Even so, I would say it’s a 50/50 split on whether I am spending more time fussing with AI than writing the code myself. I should be clear, though, that this split is light-years ahead of where it was a year ago, and that deserves kudos.
Does AI adequately understand the problem to pose a solution?
No. I think this is a given in the industry. I will say that the “Agent” mode of Copilot is a huge upgrade, but even with more context the reasoning engine gets stuck in its own… ego? I cannot tell you how many times I have written in the prompt: “Stop suggesting X, Y, Z solution, we’ve already established that answer does not work.”
Once again, this is especially true for “intuition-based” programming. As a very experienced engineer, I can look at output and know intuitively how things are working. AI does not grow; it just regurgitates.
Am I fooling myself into thinking an answer is correct when it is completely wrong?
Yes. Absolutely. There is an uncanny nature to solutions that look “technically” correct when they could not be more wrong. I have gotten pinched by this, especially when I am trying to solve a really hard problem and, upon seeing an AI solution, blindly accept it in haste instead of asking myself: “Is this correct?”
I find myself reasoning, “I don’t have a better idea, let’s try that,” and going down a rabbit hole when I should be asking, “Does this make sense in terms of computer science and the best practices in my codebase?”
This happens the most when I see an esoteric exception being raised, like the pandas “The truth value of a Series is ambiguous” error, which is nightmare fuel to track down in data sets with millions of permutations.
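For anyone who has not hit it, the error usually comes from treating a Series like a single boolean, for example by combining comparisons with Python’s and/or instead of the element-wise operators. A minimal, made-up illustration (the DataFrame and column names are placeholders):

    import pandas as pd

    df = pd.DataFrame({"price": [10, 25, 40], "qty": [1, 0, 3]})

    # Raises "ValueError: The truth value of a Series is ambiguous..."
    # because `and` forces Python to collapse each Series into one bool.
    # mask = (df["price"] > 20) and (df["qty"] > 0)

    # The element-wise operators (&, |, ~) keep the comparison vectorized.
    mask = (df["price"] > 20) & (df["qty"] > 0)
    print(df[mask])

In a three-line example the mistake is obvious; buried in a pipeline over millions of rows, it is exactly the kind of suggestion that looks plausible until it blows up at runtime.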
“Grounding” myself here means recognizing that even though the answers are getting better, an answer is still far more likely to be “technically” correct than “production ready.” There is a big difference between those two paradigms.
Is AI truly giving me correct answers?
Yes, it truly does! The most frustrating thing about Copilot is that the uncanny nature of the results has me questioning more than trusting. Did I put in enough context? Did I write the prompt in a way that gives enough detail? How far in the past is the model looking, and is that too far back to be relevant today?
That can kill productivity.
Also, now that there is a faux “personality” to Copilot, this friendly egotist is simultaneously endlessly positive and convinced it is correct. It’s like having a very junior developer who is savvy but doesn’t have the experience or maturity to accept some guidance.
I want to learn from AI, but AI needs to learn from me, and together we can find some sort of common ground. Right now, though, my own code is so much cleaner and more testable.
One Last Thought
If you are not a programmer, or would like a different perspective, “Ocean Liner Designs” on YouTube has an excellent video on how the uncanny valley shows up in retouching historical images. I personally feel this is an important topic: as we retouch history, we lose the context of the actual people and events. I highly suggest checking it out, as it also shows how DALL-E and other image generators use existing images to extrapolate incorrect data points.
Overcoming the uncanny valley will, in my opinion, be one of the truly great achievements of human history.
There is more to this, and I have two other compendiums that I am dropping this week as time permits:
“Over-loading a Simple Answer,” which elaborates on another challenge Copilot had, and
“Code Quality Challenges,” which I imagine will be the bane of my AI usage for some time to come.
I’ll add embeds on this page as well.
Thanks for reading, don’t code tired!
Related Article
SQL and the Infinite World of Null


