One Program. Many Human Languages. One Semantic Core.

What I learned building a multilingual programming language from scratch.

Feb 20, 2026

Some time ago, I started asking a question that felt both obvious and under-explored: what if the surface of a programming language (its keywords and its built-in names) could be written in English, French, Spanish, or Japanese, while the program underneath stayed exactly the same?

Not a translation tool. Not a teaching toy. A real interpreter, with one formal semantic core, and multiple natural-language frontends sitting on top of it.

That experiment became multilingual.

That question feels even more urgent now. In 2025 and beyond, LLM-assisted coding — sometimes called vibecoding — is rapidly dissolving the distinction between describing what you want and writing the code that does it. If the surface layer of programming is already becoming more conversational, more natural, more human, then the language that surface is written in starts to matter enormously. multilingual is an exploration of what it means to take that shift seriously at the interpreter level, not just in a chat prompt.

The idea in one example

English

let a = 10
let b = 3
print("a + b =", a + b)

French

soit a = 10
soit b = 3
afficher("a + b =", a + b)

Both programs run through the same lexer and parser and produce the same AST, so both print a + b = 13. The only thing that changes is the surface syntax: the human-language layer.
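To make the "same AST" idea concrete, here is a minimal sketch that uses Python's own ast module as a stand-in for the project's parser. The mapping FR_TO_CANONICAL and the helper to_canonical are hypothetical illustrations and not the multilingual API. French surface words are rewritten to canonical keywords, double-quoted strings are left alone, and the two programs are then compared node for node.

```python
import ast
import re

# Hypothetical French -> canonical mapping. Plain Python stands in for the
# project's canonical form purely for illustration.
FR_TO_CANONICAL = {
    "pour": "for",
    "dans": "in",
    "intervalle": "range",
    "afficher": "print",
}

# Either a double-quoted string literal (left untouched) or a whole word.
STRING_OR_WORD = re.compile(r'"[^"\n]*"|\b\w+\b')


def to_canonical(source: str, mapping: dict[str, str]) -> str:
    """Swap surface keywords for canonical ones, skipping string literals."""
    def swap(match: re.Match) -> str:
        text = match.group(0)
        if text.startswith('"'):
            return text
        return mapping.get(text, text)
    return STRING_OR_WORD.sub(swap, source)


english = 'for i in range(4):\n    print("i =", i)\n'
french = 'pour i dans intervalle(4):\n    afficher("i =", i)\n'

# ast.dump omits line/column attributes by default, so this compares structure only.
same = ast.dump(ast.parse(english)) == ast.dump(
    ast.parse(to_canonical(french, FR_TO_CANONICAL))
)
print(same)  # True
```

A word-level regex is only a stand-in for a real lexer. It works here because both surfaces share Python's statement shape, which is the same limitation discussed under word order below.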

Why this is not “just localized keywords”

A data-driven keyword registry — add a new language by updating a resource file, not rewriting the parser.
A single execution pipeline for every language: lexer → parser → semantic checks → Python codegen → runtime.
Phrase-level aliases — alternate phrasing patterns, not just single keyword swaps.
RTL syntax support for right-to-left languages.
A REPL where you switch languages live and inspect generated Python in real time.
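The registry and phrase-level aliases can be sketched as follows. This sketch makes several assumptions: the JSON layout (concept → language code → list of surface forms), the concept names, and the helpers build_lookup and normalize are all hypothetical and are not the project's actual resource format. Multi-word entries act as phrase aliases. A greedy longest match prefers pour chaque over pour, so both French phrasings and the English form reduce to the same canonical token stream.

```python
import json

# Hypothetical resource file: concept -> language code -> surface forms.
# Multi-word entries such as "pour chaque" act as phrase-level aliases.
REGISTRY_JSON = """
{
  "FOR":   {"en": ["for"],   "fr": ["pour chaque", "pour"]},
  "IN":    {"en": ["in"],    "fr": ["dans"]},
  "PRINT": {"en": ["print"], "fr": ["afficher"]}
}
"""


def build_lookup(registry: dict, lang: str) -> dict[tuple[str, ...], str]:
    """Map each surface phrase (as a tuple of words) to its canonical concept."""
    lookup = {}
    for concept, forms in registry.items():
        for phrase in forms.get(lang, []):
            lookup[tuple(phrase.split())] = concept
    return lookup


def normalize(words: list[str], lookup: dict[tuple[str, ...], str]) -> list[str]:
    """Replace keywords with concepts, preferring the longest matching phrase."""
    longest = max((len(key) for key in lookup), default=1)
    out, pos = [], 0
    while pos < len(words):
        for size in range(min(longest, len(words) - pos), 0, -1):
            concept = lookup.get(tuple(words[pos:pos + size]))
            if concept is not None:
                out.append(concept)
                pos += size
                break
        else:
            # No keyword starts here: keep the identifier or literal as-is.
            out.append(words[pos])
            pos += 1
    return out


registry = json.loads(REGISTRY_JSON)
fr = build_lookup(registry, "fr")
en = build_lookup(registry, "en")
print(normalize("pour chaque x dans xs".split(), fr))  # ['FOR', 'x', 'IN', 'xs']
print(normalize("pour x dans xs".split(), fr))         # ['FOR', 'x', 'IN', 'xs']
print(normalize("for x in xs".split(), en))            # ['FOR', 'x', 'IN', 'xs']
```

In this layout, supporting another language means adding one more language key to each concept in the data, and normalize itself does not change. Splitting on whitespace is a simplification. A real lexer would also need to handle punctuation, literals, and right-to-left scripts.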

Currently supported: English, French, Spanish, German, Italian, Portuguese, Hindi, Arabic, Bengali, Tamil, Chinese (Simplified), Japanese.

A REPL session in French

$ python -m multilingualprogramming repl --lang fr
>>> soit somme = 0
>>> pour i dans intervalle(4):
...     somme = somme + i
...
>>> afficher(somme)
6
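Behind a session like this sit the last two pipeline stages: Python codegen and runtime. Below is a minimal sketch of those stages for the program above. It assumes a hypothetical word-level mapping, and it also assumes that soit only declares a variable, which plain Python assignment does not need. The function generate_python is illustrative and is not the project's code generator.

```python
import contextlib
import io
import re

# Hypothetical French surface words for this session only.
FR_TO_PYTHON = {"pour": "for", "dans": "in", "intervalle": "range", "afficher": "print"}

STRING_OR_WORD = re.compile(r'"[^"\n]*"|\b\w+\b')
# Assumption: 'soit' only declares a variable, and plain Python assignment
# needs no keyword, so the sketch drops it while keeping the indentation.
DECLARATION = re.compile(r"^([ \t]*)soit[ \t]+", re.MULTILINE)


def generate_python(source: str) -> str:
    """Codegen step: rewrite a French-surface program as Python source."""
    source = DECLARATION.sub(r"\1", source)

    def swap(match: re.Match) -> str:
        text = match.group(0)
        return text if text.startswith('"') else FR_TO_PYTHON.get(text, text)

    return STRING_OR_WORD.sub(swap, source)


program = (
    "soit somme = 0\n"
    "pour i dans intervalle(4):\n"
    "    somme = somme + i\n"
    "afficher(somme)\n"
)

python_source = generate_python(program)
print(python_source)  # the generated Python, as a REPL might display it

# Runtime step: execute the generated code and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(python_source, {})
print(buffer.getvalue().strip())  # 6
```

Here exec only mirrors the runtime stage. Running untrusted generated code this way is unsafe.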

Prior art: what came before and what is different

ALGOL 68 — had localization provisions for keywords, mostly theoretical.
Lingua Romana Perligata — Perl in Latin; a serious point wrapped in a joke.
Hedy — excellent pedagogical language with multilingual keyword support, focused on beginners and gradual syntax introduction.

The key differentiator: multilingual is not about teaching through a simplified language. It is about whether a full programming model can have multiple human-language surfaces without fragmenting its runtime behavior.

Current limitations (honestly)

Word order and morphology: the grammar is still Python-shaped, which feels unnatural in agglutinative or verb-final languages.
Standard library: module and API names stay as canonical Python; localization covers keywords and selected built-ins.
Naturalness: pour chaque i dans intervalle(3) is more French than for i in range(3), but still a Python-shaped sentence. Full grammatical naturalness is an open research question.

Vibecoding and the multilingual moment

Vibecoding, the practice of describing what you want in plain language and letting an AI generate the code, is already changing who can program and how. But most of these tools still produce English-centric code. A developer in Colombia, a student in Casablanca, or a researcher in Osaka who vibes their way through a program still ends up with output they cannot easily read, modify, or teach from, because the code surface defaults to English.

multilingual flips that assumption. If an LLM can generate code in French syntax as easily as English syntax — and multilingual can execute it directly — then vibecoding becomes genuinely multilingual for the first time. You describe your intent in French or Portuguese; the LLM scaffolds a program using the corresponding surface keywords; multilingual runs it. The semantic core stays stable throughout. Nothing gets lost in translation, because there is no translation step.

There is also a deeper implication for legibility. When vibecoding produces code, who reads that code afterward? Today, the generated output is almost always in English. That creates a subtle barrier: the program works, but the programmer cannot fully own it, review it, or modify it without an intermediary. A multilingual surface removes that barrier, and the generated artifact belongs to the programmer in a much fuller sense.

What this is for

Teachers introducing programming to students who think in a language other than English.
Researchers exploring the boundary between natural-language syntax and formal semantics.
PL enthusiasts who want to contribute a new language mapping.
Anyone curious about what language-inclusive programming could look like as LLM-assisted coding makes the surface layer more fluid.
Vibecoding practitioners who want the LLM-generated output to be readable and modifiable in their own language, not just runnable.

It is v0.1.0 — a working prototype with a tested end-to-end pipeline, open-source under GPLv3.

What is next

More language mappings (adding a language is a data file, not a parser rewrite — contributions welcome)
Better tooling and IDE support
A clearer formal specification of the semantic core
LLM-assisted translation workflows between language surfaces
A vibecoding integration layer: prompt-to-multilingual-code pipelines that generate surface-correct programs in the user’s chosen language
Exploration of LLM fine-tuning on multilingual surface syntax to improve generation quality across all supported languages
A playground or online REPL where users can vibe a description in their language and watch the multilingual program execute in real time
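Because a new language mapping is meant to be data only, a contribution could be checked mechanically before it is merged. The sketch below assumes a hypothetical registry shape (concept → language code → list of surface forms). The check_language helper is illustrative and not part of the project. The Spanish entries are made up and include a deliberate collision. The check reports concepts with no surface form and surface forms claimed by two different concepts, since either would make the lexer incomplete or ambiguous.

```python
# Hypothetical registry: concept -> language code -> surface forms.
# The "es" entries are made up and contain a deliberate collision.
REGISTRY = {
    "FOR":   {"en": ["for"],   "fr": ["pour chaque", "pour"], "es": ["para"]},
    "IN":    {"en": ["in"],    "fr": ["dans"],                "es": ["en"]},
    "PRINT": {"en": ["print"], "fr": ["afficher"],            "es": ["para"]},
}


def check_language(registry: dict, lang: str) -> list[str]:
    """List problems that would block merging a language mapping."""
    problems = []
    seen: dict[str, str] = {}  # surface form -> first concept it was mapped to
    for concept, forms in registry.items():
        surfaces = forms.get(lang, [])
        if not surfaces:
            problems.append(f"{lang}: no surface form for {concept}")
        for surface in surfaces:
            owner = seen.setdefault(surface, concept)
            if owner != concept:
                problems.append(f"{lang}: '{surface}' used for both {owner} and {concept}")
    return problems


print(check_language(REGISTRY, "fr"))  # []
print(check_language(REGISTRY, "es"))  # ["es: 'para' used for both FOR and PRINT"]
print(check_language(REGISTRY, "de"))  # three 'no surface form' problems
```

A check like this fits the data-file contribution model: reviewers see a list of gaps and collisions without having to read parser code.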

https://github.com/johnsamuelwrites/multilingual

Feedback, criticism, and new language contributions are equally welcome.

John Samuel
