The Linux File System

"What is Linux?" is an important question to be answered before we even start talking about its components. And to understand this, we need to understand what it is actually doing.
Suppose you have a Chinese friend who is very intelligent. He can solve any problem that you have, which is why you love talking to him. However, there is a catch. You are a native Hindi speaker who knows English but no Mandarin, and your friend speaks neither Hindi nor English. Now what do we do? We would need either a person or a piece of software that can translate for us. That go-between is what we call a translator.
Our computers behave mostly in a similar manner.
Computers are machines. As we know, a machine has just one job: told what to do, it does it precisely and quickly. We want to offload our intensive manual tasks to computers. However, computers only understand 0s and 1s. To bridge this gap, we created programming languages that can be translated into that binary machine code. That was not enough, though. A computer has many parts, such as memory and the CPU, and something has to manage them too. So instead of us handling all of this manually, a piece of software called the kernel was designed to automate it. To interact with the kernel, we are given user-facing software, such as a shell or a graphical desktop, that is easy for us to understand and use. Roughly speaking, the combination of the kernel and this user-facing software is what we call an operating system. Today, there are many popular operating systems like Windows (spoiler alert: it costs money) and Linux (spoiler alert: it's free). Among these, Linux, or "GNU/Linux" as some prefer to call it, is the most widely used operating system on servers. To work efficiently with cloud environments and servers, we need to understand Linux closely.
The File System Hierarchy
There is a very popular joke in the Linux community - "Everything in Linux is a file". This is kind of equivalent to JavaScript's "Everything is an Object". It is, however, a very important idea to internalise, as it gives an intuition for how the Linux filesystem works. We will navigate the filesystem using examples from my own machine, which runs EndeavourOS, an Arch-based distribution.
The Root
In the context of Linux, 'root' can mean a lot of things. Here, we use 'root' to mean the operating system's root directory. This folder is the origin of everything: any file present in the operating system can be found somewhere under it. It is represented by a forward slash (/). If a user tries to move above this directory, for example with "cd ..", they simply stay where they are, as we can see in the image below.
Now, since the root contains everything, let us look at its contents.
By running the "ls" command here, we can see the files and folders at the top level of our OS. Using the 'ls -a' command, we can also see hidden files and folders. The folders visible here are very interesting. Most of them support the core functionality of the operating system and its users. We will try to cover many of these folders, as they are the ones that will help us understand Linux.
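To make the two ideas above concrete, here is a minimal Python sketch. It assumes a Linux (POSIX) machine; the helper name list_root is made up for this post. It first shows that the parent of / is / itself, then lists the top-level entries roughly the way "ls" and "ls -a" would (unlike "ls -a", it does not print the "." and ".." entries).

```python
import os

# The parent of the root directory is the root directory itself,
# so "going up" from / leaves us exactly where we started.
print(os.path.normpath("/.."))
print(os.path.dirname("/"))


def list_root(show_hidden=False):
    """Return the sorted entries of /, roughly like `ls /` or `ls -a /`.

    list_root is a hypothetical helper written for this example.
    """
    entries = sorted(os.listdir("/"))
    if show_hidden:
        return entries
    # By convention, names starting with a dot are treated as hidden.
    return [name for name in entries if not name.startswith(".")]


print(list_root())
print(list_root(show_hidden=True))
```

The exact folder names printed will differ from one distribution to another.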
The bin and sbin folders
As we understand now, to interact with the kernel, we need some programs. For example, when we want to see all the files in a particular directory, we run the ls command. At the lowest level, the "ls" command is also a file that contains a program. Its input is a directory path (the current directory by default), and it outputs the files and directories inside it. Executable files like this that perform system tasks live in the bin and sbin folders. But why 2 folders?
That is a great question if you had it. The reason is that not every task should be open to everyone. Every Linux system has a root user, or admin, who has the power to run anything and do anything, which a normal user does not have. The sbin folder was created to separate out the system administration commands that generally need root privileges, which a user gets either by logging in as root or by authenticating through a tool like sudo. A normal user can often still see these files, but most of them will refuse to do anything useful without those privileges.
Here are some files in the bin directory:
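If you would like to check for yourself that a command such as "ls" really is just an executable file, the following Python sketch may help. The helper describe_command is a made-up name for this post, and the output depends on your machine: on many modern distros the path reported will be under /usr/bin rather than /bin.

```python
import os
import shutil


def describe_command(name):
    """Return (path, is_executable) for a command found on PATH, or None.

    describe_command is a hypothetical helper written for this example.
    """
    path = shutil.which(name)
    if path is None:
        return None
    return path, os.access(path, os.X_OK)


# Where does the "ls" program actually live on this machine?
print(describe_command("ls"))

# Print a handful of entries from /bin, if it exists here.
if os.path.isdir("/bin"):
    print(sorted(os.listdir("/bin"))[:10])
```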
By the way, here is a prank you can do. With root privileges, you can rename any system binary to anything. The command then runs only under its new name, which looks super cool. For example, what if you renamed "ls" to "summonAllChildren"? Anything that still calls the old name will stop working until you rename it back.
The boot folder
First things first - don't touch it. This folder is one of THE most important folders in the system. When we power on the system, the bootloader loads the kernel from the disk into RAM, system processes are started, and important things are configured. The boot folder holds the files needed for this, typically the kernel image, the initial RAM disk (initramfs) and the bootloader's configuration. If anything happens to this folder, the integrity of the system may be compromised, and you might not be able to boot into your system.
The dev folder
Now this is where the 'everything is a file in Linux' joke really shines. The dev folder holds device files that represent our hardware, such as disks, terminals, the keyboard and the mouse. These are not ordinary files stored on disk. They are interfaces created by the kernel, and reading from or writing to them talks to the underlying device driver.
Here are the files and folders in the dev folder:
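The first SATA, SCSI or USB disk that the kernel detects is usually called sda, and its partitions are called sda1, sda2, etc. Additional disks get the next letters, so a USB stick may be recognised as sdb or sdc, while NVMe drives use names like nvme0n1. For disks, these files provide an interface to read and write raw data. Input devices such as the keyboard and mouse appear under /dev/input.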
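Device files come in two main flavours: block devices (such as disks) and character devices (such as terminals). Here is a small Python sketch that tells them apart. The helper device_kind is a name invented for this post, and /dev/sda is only an example path that may not exist on your machine.

```python
import os
import stat


def device_kind(path):
    """Classify a path as a block device, character device, or neither.

    device_kind is a hypothetical helper written for this example.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return "missing or unreadable"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    return "not a device file"


# /dev/sda is an assumption: your disk may have a different name.
for path in ("/dev/null", "/dev/sda", "/dev/tty"):
    print(path, "->", device_kind(path))
```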
The etc folder
The etc folder is a folder that deserves extra respect. It won't be an exaggeration to call it the central nervous system of Linux. This is because it consists of major config files, settings, networking information and a lot more. Let us take a peek inside it:
This folder stores user identities and system configuration such as hosts, DNS resolvers and system services. What I love about this folder is the transparency. It exposes everything for the end user to see and check for themselves and, if needed, edit too. Let us explore this folder in a little detail, because this little 3-letter folder contains a LOT of stuff.
The Hostname
The /etc/hostname file holds the name that identifies my system as a separate node in a network. It can be said to be the name of my system. When you see "parikar@Nivren" in the terminal prompt, that represents the user@system combination. Kind of beautiful if you look at it in that manner.
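As a quick check, the sketch below prints the hostname in two ways: as the running kernel reports it and as stored in /etc/hostname. The helper read_hostname_file is a made-up name, and some systems do not have an /etc/hostname file at all, in which case it returns None.

```python
import socket
from pathlib import Path


def read_hostname_file(path="/etc/hostname"):
    """Return the name stored in the hostname file, or None if it is absent.

    read_hostname_file is a hypothetical helper written for this example.
    """
    file = Path(path)
    if not file.is_file():
        return None
    return file.read_text().strip()


print("kernel reports:", socket.gethostname())
print("/etc/hostname :", read_hostname_file())
```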
The Hosts and resolv.conf
I mention these 2 together because they tie in with the concept of DNS. We all know that names like Google.com, mobile.dev, etc. mean nothing to machines on their own. Machines need IP addresses, so domain names have to be mapped to IP addresses.
In old times, the /etc/hosts file itself stored the domain-to-IP mapping used to resolve names. Since it has largely been superseded by DNS, today this file mostly stores the address of localhost in both IPv4 and IPv6 formats. No protocol or rule makes any entry compulsory, but the localhost entry is one you should always keep. This is because the file is still usually consulted before DNS, so having localhost here means it resolves without any DNS query at all. The file can also contain other IP addresses to make a domain resolve locally, and you can alias projects running on localhost here.
Since our system cannot hold entries for every domain, the /etc/resolv.conf file gives it the address of a special server called a name server. These are DNS servers that our system queries to find the IP of the target domain. You can check out the entire DNS lookup concept at: https://domain-world.hashnode.dev, as it is beyond the scope of this blog.
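The hosts file format is simple enough to parse by hand: an IP address followed by one or more names, with "#" starting a comment. Below is a minimal Python sketch of that idea. The parse_hosts helper and the sample entries (including the myproject.local alias) are invented for illustration, and keeping the first address seen for a name is a simplification of what a real resolver does.

```python
def parse_hosts(text):
    """Map each host name to the IP address given for it in hosts-file text.

    parse_hosts is a hypothetical helper written for this example.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Simplification: keep the first address seen for a name.
            mapping.setdefault(name, ip)
    return mapping


# Sample content, not copied from a real machine.
SAMPLE = """\
127.0.0.1   localhost
::1         localhost
127.0.0.1   myproject.local   # hypothetical alias for a local project
"""

print(parse_hosts(SAMPLE))
```

To try it on your own machine, you could pass the contents of /etc/hosts to the same function.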
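The name servers our system queries are listed in /etc/resolv.conf on lines that start with the word "nameserver". Here is a small Python sketch that pulls them out. The parse_nameservers helper is a made-up name, and the sample addresses are only examples, not a recommendation.

```python
def parse_nameservers(text):
    """Return the addresses on `nameserver` lines of resolv.conf-style text.

    parse_nameservers is a hypothetical helper written for this example.
    """
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Sample content, not copied from a real machine.
SAMPLE = """\
# example resolver configuration
nameserver 1.1.1.1
nameserver 8.8.8.8
search example.com
"""

print(parse_nameservers(SAMPLE))
```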
The etc/fstab file
This file lists the partitions that should be mounted automatically on boot, along with where and how to mount each one.
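Each line of fstab has up to six whitespace-separated fields: the device, the mount point, the filesystem type, the mount options, and two optional numbers (dump and fsck pass) that default to 0. The Python sketch below parses that layout. The names FstabEntry and parse_fstab, and the two sample entries, are invented for this post.

```python
from collections import namedtuple

# Field names are my own labels for the six fstab columns.
FstabEntry = namedtuple(
    "FstabEntry", "device mount_point fs_type options dump fsck_pass"
)


def parse_fstab(text):
    """Return one FstabEntry per non-comment line of fstab-style text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # skip lines this sketch cannot interpret
        # The last two fields are optional and default to 0.
        fields += ["0"] * (6 - len(fields))
        entries.append(FstabEntry(*fields[:6]))
    return entries


# Hypothetical sample, not copied from a real machine.
SAMPLE = """\
# <device>        <mount point>  <type>  <options>        <dump> <pass>
UUID=1234-ABCD    /boot          vfat    defaults         0      2
/dev/sda1         /mnt/hdd       ext4    defaults,nofail
"""

for entry in parse_fstab(SAMPLE):
    print(entry.device, "is mounted at", entry.mount_point)
```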
The etc/shadow, etc/sudoers, etc/passwd files
These are files that store user information. /etc/passwd holds the user definitions, /etc/shadow holds the users' password hashes, and /etc/sudoers decides who can act as a superuser through sudo.
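To see what a "user definition" looks like, here is a Python sketch that splits one /etc/passwd line into its seven colon-separated fields. The helper parse_passwd_line is a made-up name, and the sample line, with its UID, home directory and shell, is hypothetical.

```python
def parse_passwd_line(line):
    """Split one /etc/passwd line into its seven colon-separated fields.

    parse_passwd_line is a hypothetical helper written for this example.
    """
    name, _password, uid, gid, comment, home, shell = line.strip().split(":")
    # The password field is normally just "x": the hash lives in /etc/shadow.
    return {
        "name": name,
        "uid": int(uid),
        "gid": int(gid),
        "comment": comment,
        "home": home,
        "shell": shell,
    }


# Hypothetical entry, not copied from a real machine.
SAMPLE_LINE = "parikar:x:1000:1000::/home/parikar:/bin/bash"

print(parse_passwd_line(SAMPLE_LINE))
```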
The systemd subfolder
It is one of the most important folders. It contains the unit files that systemd uses during the boot process to start background services. Through them, systemd manages daemons (not the fictional kind, we mean background processes here), timers and dependencies between services. It also contains the network subdirectory that is used when systemd-networkd manages networking. Even the starting of Wi-Fi-related services is the task of systemd.
The lib and lib64 folders
These folders show the philosophy of Linux. Most programs on Linux depend on libraries, that is, on shared code they import, so these folders store the libraries that essential system programs, the boot process, etc. need in order to run. The lib64 folder holds the libraries for 64-bit systems, a split that exists because historically some systems were 32-bit.
The mnt folder
This folder is used by the system administrator to manually mount storage devices on a temporary basis. I have mounted my HDD here and made the process automatic.
The opt folder
The opt folder is the home of applications that we install ourselves. For example, if one downloads Chrome on Linux, it could go into /opt/google/chrome, indicating that it is an optional application. The software installed here is meant to be self-contained and independent of the rest of the system.
The sys folder
The contents of this folder are virtual: they are created by the kernel in RAM and are not present on a physical drive. On shutdown, they are lost. While dev is used to read and write data to the hardware, sys is used to view and manage the configuration and metadata of the hardware.
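As an example of such metadata, network interfaces show up under /sys/class/net, each with small text files describing it. The Python sketch below lists them and reads one attribute. The helper names are invented for this post, and the paths are assumed to exist only on a typical Linux system, so the code returns empty results elsewhere.

```python
from pathlib import Path


def list_network_interfaces(base="/sys/class/net"):
    """Return the sorted entry names under base, or [] if it is absent.

    list_network_interfaces is a hypothetical helper for this example.
    """
    root = Path(base)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


def read_attribute(path):
    """Return the stripped text of a sysfs attribute file, or None."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# Print each interface with its MTU, one piece of hardware metadata.
for name in list_network_interfaces():
    print(name, "mtu =", read_attribute(f"/sys/class/net/{name}/mtu"))
```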
The tmp folder
This can be thought of as a slate board. It is used to store temporary, non-essential data. If you want to create a new file that you don't wish to store permanently, you can create it here. On most distros, it is emptied on every reboot.
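Programs rarely hard-code /tmp. They usually ask the system for a temporary location instead. Here is a minimal Python sketch using the standard tempfile module. The helper write_scratch_note is a made-up name, and the directory reported is usually /tmp on Linux unless an environment variable such as TMPDIR points elsewhere.

```python
import os
import tempfile

# Usually /tmp on Linux, unless TMPDIR (or similar) says otherwise.
print(tempfile.gettempdir())


def write_scratch_note(text):
    """Write text to a new temporary file and return the file's path.

    write_scratch_note is a hypothetical helper written for this example.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        return handle.name


path = write_scratch_note("scratch data\n")
print(path, os.path.exists(path))
os.remove(path)  # tidy up ourselves instead of waiting for a reboot
```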
The usr folder
The usr folder is less absolute and more of a story. It is undoubtedly an important presence. Let us take a peek inside it.
We can see that this folder seems to repeat folders from the root: bin, sbin, lib, etc. all appear again. The reason is historical. It was meant to hold user-level programs and their dependencies, kept separate from the system's core files. The name originally stood for user, though today it is often expanded as Unix System Resources. In many modern distros, /bin and /sbin are not real folders at all but links to /usr/bin and /usr/sbin.
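You can check whether your own distro uses such links with a few lines of Python. This is only a sketch: link_target is a made-up helper, and on systems where these are real folders it simply reports None.

```python
import os


def link_target(path):
    """Return where a symbolic link points, or None if path is not a link.

    link_target is a hypothetical helper written for this example.
    """
    return os.readlink(path) if os.path.islink(path) else None


for name in ("/bin", "/sbin", "/lib", "/lib64"):
    print(name, "->", link_target(name))
```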
The var folder
The var folder represents 'variable' data. It stores files that are expected to change or grow. This includes cache files, log files, spool files, etc.
The proc folder
Now this is a big part of Linux, especially to understand the depth of the Linux world. This folder, first of all, is a virtual folder and does not actually exist on the drive. It is created by the kernel. It contains an entry for every process running on our system: any running process has a /proc/PID directory, where PID is its process ID. This is how tools can list processes and find the one to terminate. When a process completes, its entry is also deleted.
A lot of information about a process can be read from the files inside its /proc/PID directory. At its top level, the proc folder also has files that give general information about the system, like uptime and version.
Some important files include:
/proc/cpuinfo: Detailed information about the system's processor.
/proc/meminfo: Real-time data on RAM usage, such as free and buffered memory.
/proc/uptime: Shows how long the system has been running since boot.
/proc/sys: A critical subdirectory that allows administrators to tune kernel parameters (like networking or security settings) at runtime.
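A process can even look itself up. The Python sketch below reads the status file of the running Python process and prints a few fields. The parse_status helper is a name invented for this post, and the file is assumed to exist only on Linux, so the code checks before reading it.

```python
import os
from pathlib import Path


def parse_status(text):
    """Turn the 'Key: value' lines of a /proc/PID/status file into a dict.

    parse_status is a hypothetical helper written for this example.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Look up this very Python process by its PID.
status_file = Path(f"/proc/{os.getpid()}/status")
if status_file.is_file():
    info = parse_status(status_file.read_text())
    print(info.get("Name"), info.get("Pid"), info.get("State"))
```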
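Since these entries are plain text, reading them needs nothing more than ordinary file handling. Here is a minimal Python sketch for /proc/uptime and /proc/meminfo. The helper names are invented for this post, and the files are only read if they exist, as they do on a typical Linux system.

```python
from pathlib import Path


def parse_uptime(text):
    """Return the seconds since boot: the first number in /proc/uptime."""
    return float(text.split()[0])


def parse_meminfo(text):
    """Return meminfo values as integers (most are reported in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


uptime_file = Path("/proc/uptime")
if uptime_file.is_file():
    hours = parse_uptime(uptime_file.read_text()) / 3600
    print(f"up for {hours:.1f} hours")

meminfo_file = Path("/proc/meminfo")
if meminfo_file.is_file():
    memory = parse_meminfo(meminfo_file.read_text())
    print("MemTotal (kB):", memory.get("MemTotal"))
```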
The Home Folder
In Linux we can have multiple users. Each user is assigned a personal folder that comes with some default files and folders, where the user can work, create new directories and store data without any superuser permission. The home folders of all users live under the home folder of the root directory, that is, /home. Each user's home folder is named after their username. On login, this is the folder that the user starts in.
The Root Folder
The /root directory, not to be confused with the root directory /, is the home directory of the superuser or the root user. Only the root user can enter it. It is kept separate from the other home folders so that its integrity and security can be maintained.
Conclusion
We have tried to explain the system in detail, covering the important folders in the root directory. The Linux system becomes clearer as one gets their own hands dirty. Getting into the system, tweaking something here and there, breaking and building things is where real learning happens. Nevertheless, an understanding of the structure of the Linux filesystem makes navigating this complex maze much smoother. As such, I would encourage everyone to try out Linux themselves and start exploring.
