🚡 Programming language, under-the-hood stuff: Documentary.

Photo by Erik Mclean on Unsplash

A documentary of my week-long research into the under-the-hood stuff in programming languages. Just out of curiosity 🫣

Jun 25, 2023

How I ended up writing this blog

There was a time when I could not handle the never-ending list of unanswered questions saved in my brain. They were clearly taking up a lot of space, so I wrote them on a piece of paper to free up my brain storage. 🧠🧟

After a week of sleepless nights, the dots suddenly started to connect. I didn't go through compiler design books but still managed to understand the fundamentals, which is enough to code up my first programming language, haha. BTW I'm not kidding!

The modern way to explore and learn

Before going into the path that I took to understand the concepts, I want to clarify one thing. Apart from the theory, what drove me was a restless, curious mind and a strong hunger to understand how languages work from the inside.

Traditionally, things have been taught as if they were detached from reality. Hell, seriously. Lemme give you an example: to learn operating system concepts, the only thing you need to do is `build your operating system`. Yup, it's damn simple. Why do people complicate it with generalized learning approaches that toss a bunch of random, never-used terms at you?

I'm not opposing the traditional approach; it's just not the best starting point when your fundamentals aren't yet clear enough to build more advanced stuff on.

Given my magical way of exploring stuff, which works as fast as Neuralink 🥹, I decided to learn the concepts by putting my list of favorite languages on the inspection table 😉

| Language | Compiled | Interpreted |
| --- | --- | --- |
| C/C++ --> Carbon | ✅ | ❌ |
| Golang | ✅ | ❌ |
| Rust | ✅ | ❌ |
| Python | ❌ | ✅ |
| Java/C# | ❌ | ✅ |
| JavaScript | ❌ | ✅ |
| Ruby | ❌ | ✅ |
| Zig | ✅ | ❌ |
| OCaml | ✅ | ✅ |
| Mojo | ✅ | ✅ |

The first question in mind

To date, I've been using multiple programming languages, some compiled and some interpreted. I wondered how a piece of software can convert a highly human-readable language full of symbols into the processor's native code.

Thus I began my exploration in a top-down fashion. The languages I'm familiar with gave me enough motivation to start with them, 'cause I use them. Already knowing a bit about compiled languages saved me a ton of time. While going through them, I took real-life examples like:

  1. C/C++

  2. Golang

  3. Rust

  4. Zig

...

Along the way I explored multiple terms, including ahead-of-time (AOT) and natively compiled. Things got pretty serious when I struggled to wrap my head around unanswered questions like:

  • Can an executable ( code.exe ) run on the same operating system ( Mac ) but a different instruction set architecture?

  • If operating systems ship binaries for multiple processor architectures ( ISA ), why do we need to compile for a specific architecture?

These questions led me to a term called CROSS-COMPILATION, and it had all the answers I needed. WoW, problem solved then? Yeah, kinda 🥹

Cross-compilation lets us compile programs for different processor architectures ( ISA ) while working on a single one. The above diagram illustrates this incredibly well ✨

Lemme give you a real-life example of cross-compilation in Golang. Execute the command below in the terminal to compile Caddy for Windows running on the amd64 instruction set architecture.

env GOOS=windows GOARCH=amd64 go build github.com/mholt/caddy/

The executable will be created in the current directory, using the package name as its name. However, since we built this executable for Windows, the name ends with the suffix .exe. Run the below command to verify the created file.

ls | grep caddy
output
caddy.exe

The env command runs a program in a modified environment. This lets you set environment variables for the current command execution only; they do not persist in your shell after the command finishes.
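To see the "current command only" behaviour in action, here is a minimal Python sketch of the same idea. It is an analogy to what env does, not how env is implemented. The variable name DEMO_GOOS and the helper run_with_extra_env are made up for this illustration.

```python
import os
import subprocess
import sys


def run_with_extra_env(extra):
    """Run a child process that sees `extra` on top of the current environment."""
    # Copy the parent's environment and add the extra variables for the child only.
    child_env = {**os.environ, **extra}
    # The child just prints what it sees for DEMO_GOOS (a hypothetical variable).
    child_code = "import os; print(os.environ.get('DEMO_GOOS'))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


seen_by_child = run_with_extra_env({"DEMO_GOOS": "windows"})
print("child sees: ", seen_by_child)
print("parent sees:", os.environ.get("DEMO_GOOS"))
```

The child process gets the variable, while the parent's own environment stays untouched, which is the same reason GOOS and GOARCH don't stick around in your shell after the build.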

The following table shows some of the possible combinations of GOOS and GOARCH you can use:

| GOOS - Target operating system | GOARCH - Target instruction set architecture |
| --- | --- |
| android | arm |
| dragonfly | amd64 |
| linux | ppc64 |
| netbsd | 386 |
| windows | amd64 |
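If you want to build for several of these targets in one go, a small script can generate the commands for you. The sketch below only prints the commands and does not run them. The helper names (output_name, build_command) are my own, and the naming rule is assumed from the behaviour described above: the last element of the package path, plus .exe for Windows.

```python
# The (GOOS, GOARCH) pairs from the table above.
TARGETS = [
    ("android", "arm"),
    ("dragonfly", "amd64"),
    ("linux", "ppc64"),
    ("netbsd", "386"),
    ("windows", "amd64"),
]

# Package path copied from the build command earlier in this post.
PACKAGE = "github.com/mholt/caddy/"


def output_name(package, goos):
    """Guess the executable name: last path element, plus .exe on Windows."""
    name = package.rstrip("/").split("/")[-1]
    return name + ".exe" if goos == "windows" else name


def build_command(package, goos, goarch):
    """Build the argument list for one cross-compilation run."""
    return ["env", f"GOOS={goos}", f"GOARCH={goarch}", "go", "build", package]


for goos, goarch in TARGETS:
    command = " ".join(build_command(PACKAGE, goos, goarch))
    print(f"{command}  ->  {output_name(PACKAGE, goos)}")
```

Keep in mind that the non-Windows targets would all write a file with the same name into the current directory, so a real script would need to give each build its own output location.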

What's the matter with JAVA?

I have written Java code before. Initially, I used to run my code in an IDE ( Integrated Development Environment ), Eclipse, which gave me a sweet abstraction layer that kept me from looking into the internal execution process. This was long ago, in my high school days, but things have changed now 🥲

This time I played with the SDK instead and did my terminal magic to understand the under-the-hood stuff. The very next moment I got confused between javac and java on the command line. Why do we need a two-step process to execute Java code? Why does a .class file get generated after the first step? My curious mind needs answers now 🤥

# Doing the javac thing first: compile the source into a .class file...

javac demo.java

# Doing the java thing second: run the compiled class...

java demo

Did some digging and finally got to the question, 'Is Java compiled or interpreted?' 🤔 I found out that Java is both compiled and interpreted, but how? Okie, things get pretty juicy here cuz the Java architecture is entirely different from natively compiled languages like Zig or Rust. Let's look at the bigger picture together.

Phase 1 - Compilation

If we look closely, Java's compiler, javac, is functionally very similar to the compiler of a natively compiled language. The only difference is its output, which is bytecode instead of machine code. Bytecode is also called intermediate code: it is closer to the machine than source code, but not as close as the binary stuff.
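You can try the same "compile first, run second" split without leaving Python, since CPython also compiles source to bytecode before executing it. This is only an analogy to javac and java, not how the JVM works, and the one-line program and the `<demo>` filename are made up for the example.

```python
# A hypothetical one-line program, kept as plain source text.
source = "result = 6 * 7"

# Step 1 (roughly what javac does): turn source text into a code object.
code = compile(source, "<demo>", "exec")

# The code object carries the raw bytecode as a bytes value.
print(type(code.co_code), len(code.co_code), "bytes of bytecode")

# Step 2 (roughly what java does): hand the bytecode to the interpreter.
namespace = {}
exec(code, namespace)
print("result =", namespace["result"])
```

The code object here plays a role similar to the .class file: it is not source anymore, and it is not native machine code either.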

The question that arises here is why we even need to compile to bytecode just to have an interpreter execute it afterwards. We could just interpret the source directly, like LISP in the old days. BTW LISP is widely regarded as one of the first interpreted programming languages.

  1. Performance: Bytecode is a lower-level representation of the code compared to the original source code. This allows the JVM to perform various optimizations during the interpretation or Just-In-Time (JIT) compilation process. These optimizations can result in improved performance and execution speed compared to directly interpreting the source code.

  2. Portability: Java aims to be platform-independent, allowing Java programs to run on any system that has a compatible JVM. By compiling Java code into bytecode, which is a standardized intermediate representation, it can be executed on different operating systems and architectures without the need for recompilation.

  3. Interoperability: By using bytecode as an intermediate representation, Java programs can seamlessly interact with other languages that target the JVM, such as Kotlin, Scala, and Groovy. These languages can also be compiled into bytecode, allowing them to utilize Java libraries and frameworks.

Phase 2 - Interpretation

We now have bytecode in hand, so we just need to pass it on to the interpreter. The interpreter then goes through the bytecode instruction by instruction and produces output directly, instead of producing an executable file { specific to the operating system }.
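To get a feel for what "instruction by instruction" means, you can peek at the instruction stream a bytecode interpreter walks through. The sketch below uses Python's dis module on CPython bytecode rather than JVM bytecode, so the opcodes differ from Java's, and the exact names also vary between Python versions. The function add_and_double is just a made-up example.

```python
import dis


def add_and_double(a, b):
    """A tiny function whose bytecode we want to look at."""
    return (a + b) * 2


def list_instructions(func):
    """Return (opcode name, argument description) pairs in execution order."""
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(func)]


# Each printed row is one step the interpreter performs, top to bottom.
for opname, argrepr in list_instructions(add_and_double):
    print(f"{opname:<20} {argrepr}")
```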

Variations in Interpreters

Like native compilers, interpreters keep evolving. Earlier, interpreters executed source code directly (LISP, for example), but things have gotten more advanced since. Let me introduce you to JIT, which stands for just-in-time.

Here's the thing: interpreting is slower than running what a native compiler produces, because in most cases the interpreter decodes and executes bytecode one instruction at a time, which costs a lot of time at runtime. That's why we have JIT compilation, which compiles frequently executed code to machine code and optimizes it on the fly at runtime. A plain interpreter has little room for this kind of optimization because it only looks at one instruction at a time as it executes.

Code optimization lets us execute the same code much faster than before. We won't go deeper into optimization for now; that's a topic for another blog, but you get the point.
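Here is a toy sketch of the "interpret first, compile once it gets hot" idea. It is a heavy simplification: a real JIT emits machine code, while this one only translates a made-up bytecode into a Python function. The instruction set (PUSH, LOAD_ARG, ADD, MUL), the HOT_THRESHOLD value, and the TieredRunner class are all assumptions for illustration.

```python
HOT_THRESHOLD = 3  # hypothetical: number of runs before we bother compiling


def interpret(program, arg):
    """Slow path: decode and execute one instruction at a time."""
    stack = []
    for op, *operands in program:
        if op == "PUSH":
            stack.append(operands[0])
        elif op == "LOAD_ARG":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def compile_program(program):
    """Fast path: translate the whole program into one Python function, once."""
    stack = []  # holds source fragments instead of runtime values
    for op, *operands in program:
        if op == "PUSH":
            stack.append(repr(operands[0]))
        elif op == "LOAD_ARG":
            stack.append("arg")
        elif op in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            sign = "+" if op == "ADD" else "*"
            stack.append(f"({left} {sign} {right})")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return eval(f"lambda arg: {stack.pop()}")


class TieredRunner:
    """Interprets a program until it is 'hot', then switches to the compiled form."""

    def __init__(self, program):
        self.program = program
        self.calls = 0
        self.compiled = None

    def run(self, arg):
        if self.compiled is not None:
            return self.compiled(arg)
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = compile_program(self.program)
        return interpret(self.program, arg)


# Computes arg * 2 + 1 in the made-up bytecode.
program = [("LOAD_ARG",), ("PUSH", 2), ("MUL",), ("PUSH", 1), ("ADD",)]
runner = TieredRunner(program)
print([runner.run(5) for _ in range(5)])
```

The first few calls pay the decoding cost on every instruction; after the threshold, the decoding happens once and every later call just runs the translated function.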

Okie here are the Interpreter's variations in terms of architecture that you will see in most cases:

| Variation | Example language |
| --- | --- |
| Direct Interpreter without JIT | LISP |
| Direct Interpreter with JIT | |
| Bytecode Interpreter without JIT | Python |
| Bytecode Interpreter with JIT | Java, Mojo |

In the background, even compiled programs ( C, C++, etc. ) are interpreted in a sense: the binary is executed by an interpreter that is implemented in hardware by the underlying processor. It's just not commonly described that way.

  • A CPU can be viewed as a hardware-based interpreter for its machine code.

  • A VM can be viewed as a software-based interpreter.

Architectures

Now that I have a broad understanding of a couple of programming language architectures, I began to dig more into language architectures and I found:

  1. Natively compiled ✅

  2. Direct Interpreter ( with or without JIT ) ✅

  3. Bytecode Interpreter ( with or without JIT ) ✅

  4. Transpiler

Multiple Implementations of the same language

Let's take it back to the '90s. We all know the C language, which is epic for systems programming but doesn't have an official implementation with a home like python.org or ruby-lang.org. If you type 'C programming language' into Google, you won't find a website dedicated to the C language; instead, you have to download a compiler that suits your operating system.

| Platform | Implementation |
| --- | --- |
| Windows | MinGW, Cygwin |
| Mac/Linux | Clang, GCC |

List of C compiler Implementations ... https://en.wikipedia.org/wiki/List_of_compilers#C_compilers

Let's roll on to an interpreted language like Python. Earlier, I said Python has an official implementation, so why am I talking about it in this section? Because in addition to the official implementation, it has some unofficial implementations like PyPy.

List of Python Implementations ...

https://wiki.python.org/moin/PythonImplementations?action=show&redirect=implementation
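If you are curious which implementation is actually running your code, the standard library can tell you. A minimal sketch; the helper name describe_runtime is my own.

```python
import platform
import sys


def describe_runtime():
    """Collect a few facts about the Python implementation running this code."""
    return {
        "implementation": platform.python_implementation(),  # e.g. CPython or PyPy
        "version": platform.python_version(),
        "executable": sys.executable,
    }


for key, value in describe_runtime().items():
    print(f"{key}: {value}")
```

Run the same script under two different implementations and only the printed values change, which is the whole point of having multiple implementations of one language.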

To learn about implementations of other languages, refer to this link:

https://en.wikipedia.org/wiki/List_of_compilers
