Hey hey, I’m back again! Do you know what source code is? Do you know how to read it?
These days maybe about 90% of programming questions can be answered with a Google search. Usually the answer turns up on Stack Overflow, on someone’s blog, or on a site dedicated to the topic you are looking for.
Of course, you can only find these answers if you can read English. At a minimum you need to know a few keywords to search for; the rest is the ability to read English text. All sorts of simple questions have already been asked, for example how to split a string in a given language. Often the answer comes with the source code of a function, for example for questions about how to split strings in C, where the standard library has no ready-made function for it.
Besides answers on Stack Overflow and articles on various sites, there are also books and documents you can read to find answers. But one source is the most accurate for programming questions: the source code. Documentation can be wrong or out of date, but the source code always describes what the program actually does.
At present a lot of the software we use is open source: operating systems (Linux, FreeBSD, the Darwin core of OS X, and many more), compilers and interpreters (gcc, llvm, go, php, python, etc.), plain text editors (emacs, Notepad++, vi, etc.) and even IDEs (IntelliJ, Eclipse, Visual Studio Code, etc.). Games, audio editors, video converters and so on all have open source versions. Libraries already exist to do almost anything, from decoding file formats to recognizing faces and sounds.
Not all software has complete documentation. For example, even though millions of people use OS X, and the OS X kernel is open source, there is very little documentation and there are few books about the kernel’s internals. If even very popular software has so little documentation, then of course there are hundreds of thousands of other programs with even less.
The documentation of a function also often leaves out implementation details that can matter for security. For example, the mt_rand and mt_srand documentation in PHP does not say how the generator is seeded when no seed is given explicitly. Of course you can ask this on Stack Overflow, and in this case someone did answer for a specific version of PHP (and he got the answer from the source code).
From the security side, we can find weaknesses by reading the source code. In the case of mt_rand, two bugs have already been found (in 2008 and 2011). Bugs in software written in PHP that stem from the way it uses mt_rand are also still found frequently (for example, a relatively recent one in October 2017).
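To make the seeding point concrete, here is a small sketch in Python rather than PHP. Python’s random module is also based on the Mersenne Twister, but everything else here is an assumption for illustration: token_from_seed and recover_seed are hypothetical names, and this is not how mt_rand or any particular PHP application works. It only shows why a small or guessable seed makes the "random" output predictable.

```python
import random

HEX_DIGITS = "0123456789abcdef"

def token_from_seed(seed, length=8):
    """Hypothetical token generator: the output depends only on the seed."""
    rng = random.Random(seed)  # Mersenne Twister seeded with a known value
    return "".join(rng.choice(HEX_DIGITS) for _ in range(length))

def recover_seed(token, candidate_seeds):
    """Try every candidate seed and return the first one that reproduces the token."""
    for seed in candidate_seeds:
        if token_from_seed(seed, len(token)) == token:
            return seed
    return None

# If the seed space is small (say, a process ID or a second-resolution
# timestamp), an attacker who sees one token can search for the seed
# and then predict every later value from the same generator.
token = token_from_seed(1234)
guessed = recover_seed(token, range(10000))
```

Whether a real program is exposed to this depends on exactly how it seeds its generator, which is the kind of detail you often only learn from the source.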
People rarely read source code without a strong reason. But when we are forced to, we should be able to read other people’s code. Often we don’t need to read deeply; usually we only need the surface or one specific section, depending on what we are after. You cannot read a large code base like a novel, from the first page to the last. It takes a mix of approaches.
The first approach is skimming: look over the list of files, then open a few of them and check whether there are comments. The most important thing is to find out whether a file is actually used. Some files turn out not to be used at all, and some are only compiled when a certain feature is enabled (and that feature may be off by default).
Another important thing you can learn just from skimming is which libraries the program uses and which external programs it calls. Sometimes the source code turns out not to be what we hoped for, because it merely wraps another library or a command line tool.
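This part of skimming is easy to automate roughly. The sketch below is only an illustration with assumed names (find_dependencies and summarize_tree are hypothetical helpers), and the patterns are deliberately crude: they only recognize C-style #include lines and Python-style import lines, and they will also match look-alike lines inside comments.

```python
import re
from collections import Counter
from pathlib import Path

# Rough patterns: C-style includes and Python-style imports only.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
IMPORT_RE = re.compile(r'^\s*(?:import|from)\s+([A-Za-z_][\w.]*)', re.MULTILINE)

def find_dependencies(text):
    """Return the header and module names referenced in one file's text."""
    return INCLUDE_RE.findall(text) + IMPORT_RE.findall(text)

def summarize_tree(root, suffixes=(".c", ".h", ".py")):
    """Count how often each dependency is referenced under a source directory."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            counts.update(find_dependencies(path.read_text(errors="ignore")))
    return counts

# Example use on a checked-out tree (the path is up to you):
#   for name, n in summarize_tree("path/to/source").most_common(20):
#       print(n, name)
```

A header or module that shows up in nearly every file is usually a good hint about what the program is really built on.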
The next approach is to run the program. You can do this right away if the program does not need to be compiled and does not depend on a particular server (some programs need a database server, for example). If the program does need to be compiled, that step also gives you a lot of information: you learn which files are actually compiled, which libraries are linked, and so on.
If you are able to run the program, it is even better if you can debug it. Some programs are very difficult to debug. For example, debugging the Linux kernel on Android requires a setup that isn’t easy, and you also need to learn GDB. But if a debugger can be used, it makes the program much easier to understand. If a debugger cannot be used, the other way is “print” debugging.
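As a small illustration of “print” debugging, here is a sketch in Python. The trace decorator and the parse_port function are made-up examples, not part of any real project; the idea is simply to print what goes into and comes out of a function you are trying to understand.

```python
import functools

def trace(func):
    """Print each call to func with its arguments and its return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"-> {func.__name__} args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        print(f"<- {func.__name__} returned {result!r}")
        return result
    return wrapper

@trace
def parse_port(value):
    # Hypothetical function standing in for the code under study.
    return int(value.strip())

port = parse_port(" 8080 ")
```

In compiled code the same idea usually means adding a temporary printf or log line and rebuilding, which is slower but works almost everywhere.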
For the record: these steps can be done in parallel. Compiling source code sometimes takes a long time (hours), and downloading the requirements (the various libraries needed) can take hours too. While the compilation is running, you can read the code. If the code is too large to download (for example, the Qt source code is hundreds of megabytes), you can read the source code online (for example
An IDE is very helpful for reading programs: it makes it easy to find out where a function is defined and where it is called from. For very large programs (for example the Linux kernel) an IDE is usually not workable (too slow). As an alternative, you can use a cross-reference program (for example LXR).
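To show what “where is it defined, where is it called” means in practice, here is a toy cross-reference for Python source built on the standard ast module. It is only a sketch: cross_reference is a hypothetical helper, it handles one file at a time, and it ignores method calls such as obj.method(). Real tools like an IDE or LXR do far more.

```python
import ast

def cross_reference(source, filename="<source>"):
    """Map each function name to the line numbers where it is defined and called."""
    index = {}

    def entry(name):
        return index.setdefault(name, {"defined": [], "called": []})

    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entry(node.name)["defined"].append(node.lineno)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only plain calls like helper(1); attribute calls are skipped.
            entry(node.func.id)["called"].append(node.lineno)

    for refs in index.values():
        refs["defined"].sort()
        refs["called"].sort()
    return index

sample = (
    "def helper(x):\n"
    "    return x + 1\n"
    "\n"
    "def main():\n"
    "    return helper(1) + helper(2)\n"
)
index = cross_reference(sample)
```

Here index["helper"] records a definition on line 1 and two calls on line 5, which is exactly the question you keep asking when reading unfamiliar code.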
If the program has unit tests, they are another way to understand specific functions. A unit test exercises a single module to check that it behaves correctly, so reading it lets us understand one small part of the program at a time.
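Here is a sketch of how a unit test doubles as documentation, using Python’s unittest module. The split_fields function is a made-up stand-in for whatever function you are studying; the point is that each test states one input and the output the author expects, including edge cases.

```python
import unittest

def split_fields(line, sep=","):
    """Hypothetical function under study: split a line and trim each field."""
    return [field.strip() for field in line.split(sep)]

class SplitFieldsTest(unittest.TestCase):
    def test_trims_whitespace(self):
        self.assertEqual(split_fields("a, b ,c"), ["a", "b", "c"])

    def test_empty_line_gives_one_empty_field(self):
        # An edge case that documentation often does not mention.
        self.assertEqual(split_fields(""), [""])

    def test_custom_separator(self):
        self.assertEqual(split_fields("x|y", sep="|"), ["x", "y"])
```

Saved in a file, these tests can be run with python -m unittest. Even without running them, the three test names already tell you what the function promises.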
Also note that popular applications are sometimes very complex because they contain a lot of optimizations. For example, if you only want to understand how a web server is made, start by reading the source code of a simple one, then move on to more complicated ones. If you do not understand a basic web server and jump straight into the Apache source code, you will most likely be confused.
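To give an idea of how small a “simple” web server can be, here is a sketch in Python using only the socket module. It is an illustration under heavy assumptions, not a model of how Apache or any real server works: build_response and serve are hypothetical names, only GET is handled, one connection is served at a time, and a single recv call is assumed to contain the whole request line.

```python
import socket

def build_response(request: bytes) -> bytes:
    """Turn a raw HTTP request into a minimal HTTP/1.1 response."""
    try:
        request_line = request.split(b"\r\n", 1)[0].decode("ascii")
        method, path, _version = request_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
    if method != "GET":
        return b"HTTP/1.1 405 Method Not Allowed\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
    body = f"You asked for {path}\n".encode("utf-8")
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    )
    return headers.encode("ascii") + body

def serve(host="127.0.0.1", port=8080):
    """Accept one connection at a time, forever. No threads, TLS, or keep-alive."""
    with socket.create_server((host, port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(build_response(conn.recv(4096)))

# Call serve() to try it, then open http://127.0.0.1:8080/hello in a browser.
```

Everything a production server adds on top of this (concurrency, caching, configuration, modules) is what makes its source so much harder to read first.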
Some things need a good theoretical basis. For example, understanding the source code of a compression library requires the basic theory of compression. Understanding the implementation of an MP3 encoder or decoder is practically impossible without some background in how human hearing works (plus a good many mathematical concepts).
The most important thing when reading source code is motivation. Without a specific motivation, reading source code is indeed very boring. There can be many reasons to read source code, and they differ from person to person. Some examples of things that have motivated me:
• A configuration doesn’t work, I don’t understand why, and the documentation is unclear
• I want to understand certain bugs (for example this)
• I need to know the implementation of certain algorithms
• I need to find bugs for pentesting purposes, for example when the target runs old software for which no exploits can be found on the web
Hopefully the tips above will help you when you try reading other people’s source code.