2025-02-02
I'm taking an OS class this semester and, although it doesn't cover the most interesting OS (Windows), it is quite a nice excuse to finally get around to reading OSTEP. Our first assignment was to build a pretty basic Unix shell in C. Of course, since it's a school assignment, I won't post or directly reflect upon the actual code, but I did want to explore some of the ideas I was encouraged to pursue in the toils of my labor.
As a brief overview of my meandering-
As with all assignments (school or otherwise), the specifications were unclear (or rather, the process of understanding them, in spite of their shortcomings, is a labor in and of itself), which gave rise to many questions about certain behaviors. I would like to consider myself moderately well versed in the shell, but I have sometimes left my curiosity about its inner mechanisms as just that. We were granted many liberties, like not having to handle quotes specially when tokenizing (the tokenizer/AST parser is pretty low-tech, actually). One thing that was not immediately clear, although not necessary for this project, was how commands like echo $PATH
are processed.
I was surprised that something like this
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* No shell is involved here, so "$PATH" reaches echo as a literal string. */
    char *arr[3] = { "echo", "$PATH", NULL };
    execv("/bin/echo", arr);
    perror("execv"); /* only reached if execv fails */
    return 1;
}
outputs $PATH
and not the expanded version. To my current understanding, it is the shell, not the echo binary itself (/bin/echo),
that performs shell parameter expansion, globbing, etc. This makes sense in retrospect.
To take the assignment 'to the next level', it seems like I would have to invest more effort into the actual parsing and tokenization of the fetched line, including expanding environment variables. One thing to note is that we were instructed to use execv
for the assignment, and I guess the shell becomes somewhat simpler if you can use variants like execlp
or execvp
(https://linux.die.net/man/3/execvp), which search PATH for the executable for you. They do not expand variables in the arguments, though. At first I thought it necessary to maintain my own KV store for these variables, but doing so would be over-engineered. It seems like combining getenv
with some pre-processing would be sufficient for an in-house implementation.
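As a rough sketch of what that pre-processing could look like (not my assignment code), here is a hypothetical expand_vars helper. It is an assumption of mine, and it only handles the plain $NAME form: no ${NAME}, no quoting rules, no special parameters like $?.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of `token` in which each
 * $NAME is replaced by getenv("NAME") (or by nothing if NAME is unset).
 * The caller frees the result. Returns NULL if allocation fails. */
char *expand_vars(const char *token) {
    size_t cap = strlen(token) + 1, len = 0;
    char *out = malloc(cap);
    if (out == NULL) return NULL;

    for (const char *p = token; *p != '\0'; ) {
        const char *chunk = p;   /* default: copy one literal character */
        size_t chunk_len = 1;

        if (*p == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
            /* Scan the variable name: letters, digits, underscores. */
            const char *start = p + 1, *end = start;
            while (isalnum((unsigned char)*end) || *end == '_') end++;

            size_t n = (size_t)(end - start);
            char *name = malloc(n + 1);
            if (name == NULL) { free(out); return NULL; }
            memcpy(name, start, n);
            name[n] = '\0';

            const char *val = getenv(name);
            free(name);
            chunk = val ? val : "";
            chunk_len = strlen(chunk);
            p = end;
        } else {
            p++;
        }

        /* Grow the output buffer if the next chunk does not fit. */
        if (len + chunk_len + 1 > cap) {
            cap = (len + chunk_len + 1) * 2;
            char *tmp = realloc(out, cap);
            if (tmp == NULL) { free(out); return NULL; }
            out = tmp;
        }
        memcpy(out + len, chunk, chunk_len);
        len += chunk_len;
    }
    out[len] = '\0';
    return out;
}
```

Running each token through something like this before building the argv array for execv would be enough for echo $PATH to print the expanded value.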
At times, my parsing/input processing had some 'off by one' or NULL handling bugs that caused segfaults. I tried to inspect these errors in GDB but initially had a difficult time because the code was running in a child process and I was only attached to the parent. I had never attempted to debug a child process in C before, so it was interesting to figure out how to do so.
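For context, here is a minimal sketch of the fork/exec pattern where this comes up. The spawn_and_wait name is hypothetical and this is not my assignment code. The comments mention the two GDB settings that, as far as I can tell, control which side of the fork the debugger stays with.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, execv argv[0] in the child, and return the
 * child's exit status (or -1 if something went wrong in the parent).
 *
 * By default gdb stays with the parent across fork(). Typing
 *     set follow-fork-mode child
 * at the gdb prompt before `run` makes it follow the child instead, and
 *     set detach-on-fork off
 * keeps both processes under gdb's control. */
int spawn_and_wait(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: a crash in code that runs here is not caught by a gdb
         * session that is only following the parent. */
        execv(argv[0], argv);
        perror("execv");   /* only reached if execv fails */
        _exit(127);
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```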
Lastly, while I have used C for a few (mostly school) projects now, I don't use it very frequently and often have to warm myself back up to the language. I am of the opinion that a programmer of the C language has an easier time reasoning about their code because it is easier to understand how their code actually works/is executed. Comparatively, I find Python much more difficult to reason about (though easier to write, mostly) because much of it is abstracted away. Mileage may vary, especially when you have to sift through large codebases or libraries, but this has been my personal experience. I would be hard pressed to explain every step in how even a straightforward Python program executes on a computer, but would find it much easier to do so for a C program.
That being said, programming in C can be painful at times. No OOP may or may not be a blessing, but the lack of generics highlights a convenience I take for granted in other languages. I know more sophisticated C programmers have workarounds and likely shift their paradigms to accommodate the language, but as I found myself needing to resize arrays manually for different parts of my shell implementation, I looked to create a dyn_array ADT and found it burdensome, though not particularly difficult. I looked into creating macros for the task (which seem fine), but I am not super well versed in them and opted to just... duplicate my code. An unfortunate conclusion, but the first step to getting better is to acknowledge your faults! Or something.
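For the record, the macro route might look something like the sketch below. The DEFINE_DYN_ARRAY name and the generated function names are made up for illustration; this is one common pattern, not what I ended up submitting.

```c
#include <stdlib.h>

/* Hypothetical macro: DEFINE_DYN_ARRAY(int, int_array) generates
 * struct int_array plus int_array_push() and int_array_free().
 * Works for simple element types such as int or char *. */
#define DEFINE_DYN_ARRAY(T, NAME)                                     \
    struct NAME { T *data; size_t len, cap; };                        \
    static inline int NAME##_push(struct NAME *a, T item) {           \
        if (a->len == a->cap) {                                       \
            /* Double the capacity, starting at 8 elements. */        \
            size_t cap = a->cap ? a->cap * 2 : 8;                     \
            T *tmp = realloc(a->data, cap * sizeof *tmp);             \
            if (tmp == NULL) return -1;                               \
            a->data = tmp;                                            \
            a->cap = cap;                                             \
        }                                                             \
        a->data[a->len++] = item;                                     \
        return 0;                                                     \
    }                                                                 \
    static inline void NAME##_free(struct NAME *a) {                  \
        free(a->data);                                                \
        a->data = NULL;                                               \
        a->len = a->cap = 0;                                          \
    }

/* One line per element type instead of one copy of the code per type. */
DEFINE_DYN_ARRAY(int, int_array)
DEFINE_DYN_ARRAY(char *, str_array)
```

A zero-initialized struct (struct str_array tokens = {0};) is a valid empty array, so there is no separate init function to forget.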
My only regrets with this project were my own faults. I had an issue with my file descriptor handling that took a while to find (I was closing stdout rather than the redirection target), but it took me way longer than it should have to realize that, although my logic was otherwise fine, I was passing the wrong variable to some of my parsing and redirection functions, which messed everything up. Much time wasted.
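To make the file descriptor mistake concrete, here is a minimal sketch of output redirection in a forked child. It is a hypothetical run_with_stdout_to helper, not my assignment code. The descriptor to close after dup2 is the one returned by open, not STDOUT_FILENO.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout redirected to `path`.
 * Returns the child's exit status, or -1 on failure in the parent. */
int run_with_stdout_to(char *const argv[], const char *path) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            _exit(126);
        }
        /* Make descriptor 1 refer to the file. */
        if (dup2(fd, STDOUT_FILENO) < 0) {
            perror("dup2");
            _exit(126);
        }
        /* Close the now-redundant target descriptor. Closing
         * STDOUT_FILENO here instead would throw the redirection away. */
        if (fd != STDOUT_FILENO) close(fd);

        execv(argv[0], argv);
        perror("execv");   /* only reached if execv fails */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```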
That being said, it was fun. I'm enjoying the OSTEP book (the dialogues are a really interesting inclusion) and the class.