How reliable is AI like ChatGPT at giving you the code you request?
ChatGPT is a language model; it’s not intended for code, and you’re using it “off label” at your own risk. It can produce working code, which is impressive in itself, but to know whether it’s decent code you still need to be competent in that language. A while back I had someone run a few prompts for me: it ignored central parts of the query, and its output was roughly what a very junior developer would write. Fair enough, but not great, or even that good.
Potentially useful, but if you expect it to be more than one part of the “process”, you might be setting yourself up for trouble.
Edit: just as it’s not a coder, it’s not a search engine or knowledge base either. It just knows language and what seems like it ought to follow a given phrase. Be very aware of this difference, because sometimes it spits out 100% falsehoods with the same confidence and authority as the true stuff.
I think it’s also important for people to truly understand that generative machine learning models like ChatGPT only “know” what they’ve seen before. There’s no interpretation or synthesis. It merely regurgitates what it’s seen, with some sampling from a probability distribution.
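To make the “sampling from a probability distribution” part concrete, here is a toy sketch in Python. The vocabulary and scores are made up for illustration; a real model computes scores over a vocabulary of tens of thousands of tokens with a neural network, but the final step (a softmax with a temperature, then a weighted random pick) looks roughly like this.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Made-up scores for the token following "import " in a Python file.
vocab = ["numpy", "requests", "os", "banana"]
logits = [2.5, 2.0, 1.5, -3.0]

rng = random.Random(0)  # seeded so the run is repeatable
print([sample_next_token(vocab, logits, temperature=0.7, rng=rng) for _ in range(5)])
```

High-scoring tokens win most of the time, but lower-probability ones still show up now and then, which is one reason the same prompt can give different answers.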
This means that if you’re asking for something niche, and it has only seen what you’re prompting for once (or, really, the same text repeated across multiple websites), there’s a very good chance it will just recreate that artifact wholesale.
Which means you need to be cognizant of what the license for that material is before you use it in a product!
Software engineer with decades of experience here: ChatGPT can give you mostly working code for solved problems, but with occasional subtle and weird bugs. It’s very confident and will happily hallucinate. It will not help you with debugging or integrating, which is the majority of coding. It’s a pattern-matching engine, nothing more.
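As a hypothetical illustration (written by hand, not actual ChatGPT output) of the kind of subtle bug that can slip into otherwise working Python: the first function below looks fine and passes a single quick check, but because a mutable default argument is created only once, state leaks between unrelated calls.

```python
def add_tag_buggy(tag, tags=[]):
    # Subtle bug: the default list is created once and shared across all calls.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Fix: create a fresh list on each call when none is passed in.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(add_tag_buggy("a"))  # ['a']
print(add_tag_buggy("b"))  # ['a', 'b']  <- surprising
print(add_tag("a"))        # ['a']
print(add_tag("b"))        # ['b']
```

Spotting this kind of problem is exactly where you still need to know the language yourself.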
I have built several programs with ChatGPT 4 by now, ranging from very basic Python scripts to Python web scrapers and C# in combination with Unity3D.
In the beginning it was much better than it is now. At the moment its handling of context is severely hampered no matter the limit: you’ll be bashing your head against circular arguments, and it will straight up ignore stuff you posted just two messages ago.
Troubleshooting code it wrote a few days ago can be a slog, and at times it’s like dragging yourself over nails. Here’s what I have found helps and makes life better:
1. Be very, very, very precise in your instructions, and save them so you can reuse them later (point 4).
2. From the very start, plan to build your project from small functions that interact (good policy anyway). This makes troubleshooting and changing those functions much easier and keeps you from running into message limits.
3. If it fails to work the way you need, you might have to scrap your code and start over with ChatGPT. Again, this is why point 2 is so important: scrapping one function is much less painful than scrapping an entire tool.
4. Start new chats when you feel the quality degrading. Sometimes it helps, and since the context is garbage at the moment anyway, you don’t lose much.
5. Paste the code it is supposed to fix every single time. Otherwise it will inevitably refer to other code, hallucinated code, etc. Again, this is why point 2 is important.
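A minimal sketch of the tips above in Python: keep precise instructions saved as reusable text, work on one small function at a time, and paste that function’s current source into every request. The instruction wording, the `build_prompt` helper, the `parse_price` function, and the task are all hypothetical examples.

```python
# Saved once, reused in every new chat (the wording here is only an example).
SAVED_INSTRUCTIONS = """\
You are helping me maintain a small Python web scraper.
- Only change the function I paste below.
- Keep its name and signature unchanged.
- Use only the standard library.
- Reply with the complete updated function and nothing else.
"""


def build_prompt(instructions, function_source, request):
    """Combine saved instructions, the exact current code, and one focused request."""
    return (
        f"{instructions.strip()}\n\n"
        f"Current code:\n{function_source.strip()}\n\n"
        f"Task: {request.strip()}\n"
    )


# Hypothetical small function we want changed; paste its latest version every time.
current_code = '''
def parse_price(text):
    return float(text.replace("$", ""))
'''

prompt = build_prompt(
    SAVED_INSTRUCTIONS,
    current_code,
    "Also handle thousands separators, e.g. '$1,299.00'.",
)
print(prompt)
```

Starting a fresh chat then costs little, because everything the model needs is rebuilt from the saved instructions and the current code.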
God, I hate those circular arguments. It’s like arguing with a toddler.
I wanna ask it to write me a better AI and bring on the Singularity!
The biggest issue here is that people aren’t differentiating between models. gpt-4 is probably 20-30 IQ points smarter than gpt-3.5-turbo. Also, your question could be interpreted to include LLMs in general. Most LLMs are absolutely horrible at programming. OpenAI’s models actually can do it when given a limited, specific task. Again, gpt-4 is much better at programming.
Also, OpenAI just released new models. They now have one with a 16k-token context, four times larger than before, so it can take in more instructions or read more code.
For something specific like writing basic SQL queries, or even embedded Chart.js charts to fulfill a user request for a simple report on a table, gpt-4 can be very effective, and gpt-3.5 can often do the job. The trick is that sometimes you have to be very insistent about what you want to do and about gaps or outdated information in its knowledge. And you always need to make sure you feed it the necessary context.
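As a sketch of feeding it the necessary context for a simple report, assuming a hypothetical SQLite `orders` table: you can pull the table’s own `CREATE TABLE` statement out of `sqlite_master` to paste into the prompt, then run whatever query comes back against a few rows whose totals you already know. The query shown is the kind of thing a model might plausibly return, written by hand here.

```python
import sqlite3

# Hypothetical table and a few rows whose totals are easy to check by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)],
)

# Context to paste into the prompt: the table's own CREATE TABLE statement.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
).fetchone()[0]
print(schema)

# The kind of query a model might return for "order count and total sales per region, largest first".
REPORT_SQL = """
SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders
GROUP BY region
ORDER BY total DESC
"""

rows = conn.execute(REPORT_SQL).fetchall()
for region, order_count, total in rows:
    print(f"{region:<6} {order_count:>3} {total:>8.2f}")
```

Checking the result against hand-computed totals (north = 150, south = 80, east = 50 here) is a cheap way to catch a query that looks right but isn’t.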
For something somewhat more complex but still relatively limited in scope, gpt-4 can often handle it when gpt-3.5 screws it up.
What these models are especially good at now, particularly with the version just released, is translating natural language requests into something like API calls. If there isn’t a lot of other stuff to figure out, that can be extremely useful. You can get more involved programs by combining multiple focused requests, but that’s quite hard to do in a fully automated way today. The new function calling should help a lot, though.
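A rough sketch of the application side of function calling, in Python. You describe your functions to the model with JSON Schema; the model replies with a function name plus a JSON string of arguments, and your own code decides whether to run it. The reply dict below is hard-coded rather than coming from a real API call, and `get_weather`, `REGISTRY`, and `dispatch` are hypothetical names; check OpenAI’s API reference for the exact request and response fields.

```python
import json

# JSON Schema description of a hypothetical local function the model may ask to call.
FUNCTIONS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
]


def get_weather(city, unit="celsius"):
    # Hypothetical stand-in; a real version would query a weather service.
    return {"city": city, "unit": unit, "temperature": 21}


REGISTRY = {"get_weather": get_weather}


def dispatch(function_call):
    """Run the local function the model asked for, after basic checks on its arguments."""
    name = function_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"model requested unknown function: {name}")
    args = json.loads(function_call["arguments"])  # arguments arrive as a JSON string
    schema = next(f["parameters"] for f in FUNCTIONS if f["name"] == name)
    missing = [key for key in schema.get("required", []) if key not in args]
    unknown = [key for key in args if key not in schema["properties"]]
    if missing or unknown:
        raise ValueError(f"bad arguments for {name}: missing={missing}, unknown={unknown}")
    return REGISTRY[name](**args)


# Hard-coded example of what a model reply might contain for
# "What's the weather in Oslo, in Fahrenheit?"
fake_call = {"name": "get_weather", "arguments": '{"city": "Oslo", "unit": "fahrenheit"}'}
print(dispatch(fake_call))
```

The checks in `dispatch` matter because the model can still hallucinate a function name or leave out a required field.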
The thing is, wait 3-6 months and this could be totally out of date if someone releases a more powerful model or some of these “AGI” systems built on top of GPT get more effective.
This has some nice examples of how well large language models do with some fairly basic programming requests
I’ve used ChatGPT to answer questions relating to Python. Notably, I asked it how to use QtNetwork to send and receive requests with authentication, because the application I was working with didn’t include the non-standard modules I was more accustomed to, like `requests`, but did have PyQt. Not only did it give me working code snippets, it explained them in a way I was able to understand. No, it’s not perfect. But man, it’s better than hunting Google for that one Stack Overflow post. I have heard it trips up on certain less-used programming languages like Swift, though, so depending on your use case, YMMV. I haven’t used Codex, but a friend of mine has. Apparently it really liked to mention this one specific GitHub profile.
For shits and giggles I asked ChatGPT a while back to represent a Pokemon with a Python class, and it gave me working code. Google Bard would trip up and not use the class when I told it to.
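For reference, a minimal version of that kind of request might look like the sketch below. It is written by hand, not ChatGPT’s or Bard’s actual output, and the fields and methods are illustrative choices. It uses a dataclass so the constructor and repr come for free, and assumes Python 3.9+ for the `tuple[str, ...]` annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Pokemon:
    name: str
    types: tuple[str, ...]
    level: int = 1
    max_hp: int = 20
    hp: int = field(init=False)  # set in __post_init__, not passed to the constructor

    def __post_init__(self):
        self.hp = self.max_hp  # start at full health

    def take_damage(self, amount: int) -> None:
        # HP never drops below zero.
        self.hp = max(0, self.hp - amount)

    @property
    def fainted(self) -> bool:
        return self.hp == 0


pikachu = Pokemon("Pikachu", ("Electric",), level=5, max_hp=35)
pikachu.take_damage(12)
print(pikachu.hp, pikachu.fainted)  # 23 False
```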
I agree with the other comments that ChatGPT isn’t really that good for programming; it hallucinates often, and you end up working too hard just to figure out what it got wrong. However, I have found a good AI engine, phind.com, that has started to replace my Google searches. It’s just a wrapper for ChatGPT, but it cites its sources so you can verify or dig deeper, provides search engine results in a sidebar, and has upvote/downvote options to help it improve. So it feels like a personal Google “agent” that runs off, googles something for you, and comes back with a concise report.
Personally, I just can’t work with a system that lies to me all the time, even if only a little.
I tried ChatGPT, the Bing bot, and phind.com a few times, and every time I got answers that looked real and looked correct but were slightly (and a few times completely) wrong.
Every time, I have to reread the documentation, check the links, and investigate whether there’s a reason the LLM answered this way. Maybe I’m wrong this time and the LLM found something I didn’t… I agree that phind.com gets the best results, but every small inaccuracy here and there irks me and makes me question myself and the answer as a whole.
Update: for general questions, like when you’re investigating a new field, technology, or tooling suite, LLMs are very, very good, especially when you want an overview of a topic you’re interested in.