Key Takeaways
- Recent advances in prose-to-code generation via Large Language Models (LLMs) will make it practical for non-programmers to “program in prose” for programs of practically useful complexity, a long-standing dream of computer scientists and subject-matter experts alike.
- Assuming that correctness of the code and explainability of the results remain important, the code will still have to be tested using more traditional approaches. Hence, non-programmers must understand the notions of testing and coverage.
- Program understanding, visualization, exploration, and simulation will become even more relevant in the future to illustrate to subject-matter experts what the generated program does.
- There is a strong synergy with very high-level programming languages and domain-specific languages (DSLs) because the programs to be generated are shorter (and less error-prone) and more directly aligned with the execution semantics (and therefore easier to understand).
- I think it is still an open question how far the approach scales and what integrated tools will look like that exploit both LLMs’ “prose magic” and more traditional ways of computing. I illustrate this with an open-source demonstrator implemented in JetBrains MPS.
Introduction
As a consequence of AI, machine learning, neural networks, and in particular Large Language Models (LLMs) like ChatGPT, there is an ongoing discussion about the future of programming. It centers on two main areas. The first focuses on how AI can help developers code more efficiently. We have probably all asked ChatGPT to generate small-ish fragments of code from prose descriptions and pasted them into whatever larger program we were developing, or used GitHub Copilot directly in our IDEs.
This works quite well because, as programmers, we can verify that the code makes sense just by looking at it or by trying it out in a “safe” environment. Eventually (or even in advance), we write tests to validate that the generated code works in all relevant scenarios. The AI-generated code doesn’t even have to be completely correct: it is useful to developers even if it is only 80% correct. Just like an answer we look up on Stack Overflow, it can serve as inspiration, an outline, or a hint that allows the programmer to finish the job manually. I think it is indisputable that this use of AI provides value to developers.
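To make this concrete, here is a small, purely hypothetical sketch (the prompt, the function name, and the edge cases are all invented for illustration): a developer asks an LLM to write a function that splits a full name into first and last name, gets back a plausible-looking implementation, and then pins down its behavior with a few tests. The tests, not the prose prompt, are what establish whether the code is correct for the developer’s purposes.

```python
# Hypothetical code an LLM might return for the prompt:
# "Write a function that splits a full name into first and last name."
def split_name(full_name: str) -> tuple[str, str]:
    parts = full_name.strip().split()
    if not parts:
        return ("", "")
    if len(parts) == 1:
        return (parts[0], "")
    # Everything before the last word is treated as the first name.
    return (" ".join(parts[:-1]), parts[-1])


# The developer's tests capture what "correct" means for their use case.
def test_split_name():
    assert split_name("Ada Lovelace") == ("Ada", "Lovelace")
    assert split_name("  Grace  ") == ("Grace", "")
    assert split_name("Jan van der Berg") == ("Jan van der", "Berg")
    assert split_name("") == ("", "")


if __name__ == "__main__":
    test_split_name()
    print("all tests passed")
```

Whether “Jan van der Berg” should really split this way is exactly the kind of decision the tests force the developer to make explicit, rather than leaving it to whatever the model happened to generate.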
The second discussion area is whether this will enable non-programmers to instruct computers. The idea is that they just write a prompt, and the AI generates code that makes the machine do whatever they intended. The key difference from the previous scenario is that the inherent safeguards against generated nonsense aren’t there, at least not obviously.
A non-programmer user can’t necessarily look at the code and check it for plausibility, they can’t necessarily bring a generated 80% solution to 100%, and they don’t necessarily write tests. So will this approach work, and how must languages and tools change to make it work? This is the focus of this article.
Why not use AI directly?
You might ask: why generate programs in the first place? Why don’t we just use a general-purpose AI