Hack This Site

Assembly using the native "debug" command: a simple walkthrough

I'm going to show you how to write a simple "Hello World" program in assembly. Assembly is a low-level programming language. It's not object-oriented like many high-level languages. Instead, assembly deals directly with the CPU (BE CAREFUL!) by using mnemonics to represent machine instructions (each of which is encoded as an opcode), registers (including flags and pointers) which hold the values the instructions work on, and hexadecimal to represent values. The set of instructions and registers a CPU offers is called its ISA (Instruction Set Architecture). Because assembly maps directly onto the ISA, any binary sequence that is usable on a particular computing system is attainable in assembly. The code will differ depending on the CPU architecture (like x86) and word size (like 16-bit or 32-bit). Most of this information you can google, wiki, or find on the CPU manufacturer's website.

Okay, so first we'll need an assembler. Some popular ones include A86, fasm, nasm, and tasm... but we're not going to use any of those. Instead, we'll use a native debugging tool on Windows (which can act as an assembler, disassembler, or hex dump). You can use the native "gdb" on Linux. I'm going to talk about "debug" on Windows though because it's much easier for a beginner than gdb. If you have Linux and call yourself a hacker, then I'm going to assume that you're knowledgeable enough to download (if necessary) and read up on gdb. Note that debug is a 16-bit tool and is therefore limited to 16-bit programs. This is fine for our purposes though. If you want to write 32-bit programs then you can download debug32 from here. In standard debug you can also enter by hand the hex values of instructions that use the extended (32-bit) registers.

Alright, so first open up command prompt. You can do so by pressing [WINKEY]+R and entering "cmd" without the quotes. Enter "debug", without the quotes, in the prompt. You should now see a little dash and your cursor marker. We're going to assemble at address 100 using the following command...
CODE :

a 0100

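Now you'll see an address in segment:offset form along with your cursor marker. The segment value varies from machine to machine, so the listing below shows only the offsets. Enter each of these commands, paying attention to what is debug's output and what you type yourself.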
CODE :

0100 jmp 111
0102
-e 102 'Hello World!',0D,0A,'$'
-a 111
0111 mov dx, 102
0114 mov ah, 09
0116 int 21
0118 int 20
011A
-h 011a 0100
021A 001A
-n hello.com
-rcx
CX 0000
:001a
-w
Writing 0001A bytes
-q

C:\Users\Admin>hello.com
Hello World!

C:\Users\Admin>_

Don't close your prompt yet!

Congratulations, you've made your first 'Hello World!' program in assembly. Now what the hell does this all mean? Well, it's actually very simple. The first instruction we entered "jumps" to address 111. This is analogous to "GOTO" in batch files. We chose this address because of the length of the string we were going to enter: the string data takes up addresses 102 through 110, so 111 is the first free byte after it. I'll illustrate this in a bit. After exiting the assembler (by pressing Enter on an empty line), we then entered data directly at address 102. The stuff in quotation marks is our string, the next two bytes (represented in hex) are the carriage return and line feed that make a new line, and finally the '$' marks the end of the string for the DOS print function we call later. Other assemblers like nasm do this kind of thing for you with a define directive (db, for example). I'm not too familiar with it.
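The choice of 111 can be checked with a little arithmetic. Here's a minimal sketch in Python (it is not part of debug, and the variable names are made up for illustration). It adds the length of the string data to the address where it starts, and works out the displacement byte of the jump, assuming debug assembled "jmp 111" as the two-byte short form (opcode EB), which fits the next address in the session being 0102.

```python
# Data entered with: e 102 'Hello World!',0D,0A,'$'
data = b"Hello World!" + bytes([0x0D, 0x0A]) + b"$"

DATA_START = 0x0102                  # first byte after the 2-byte jmp at 0100
code_start = DATA_START + len(data)  # first free address after the string

# A short jmp stores a displacement relative to the address of the
# instruction that follows it (assumption: 2-byte short form, opcode EB).
JMP_ADDR = 0x0100
JMP_LEN = 2
displacement = code_start - (JMP_ADDR + JMP_LEN)

print(f"string length: {len(data)} bytes")
print(f"code starts at: {code_start:04X}")
print(f"jmp encoding: EB {displacement:02X}")
```

The string is 15 bytes long, so the code has to start at 0102 + 0F = 0111.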

Anyways, next we assemble at address 111 (remember, that's what we jumped to). Okay, so what does "mov dx, 102" mean? Well, "mov" stands for "move". "dx" is the 16-bit data register. It's used for temporary storage of data. "102" is simply an address. So when we enter this instruction, it's saying: "Put the value 102, which is the address of our string, into the data register". Note that it copies the address itself, not the string stored there. It's not "mov 102, dx" because debug uses Intel syntax, in which the destination operand comes first and the source second. In hex, the whole instruction is actually represented like this: BA 02 01. BA is the opcode for "mov dx" with an immediate 16-bit value, and the address 0102 looks backwards (02 01) because x86 is little-endian: the low-order byte is stored first. This is a very important concept to learn. I recommend reading up on it.
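The BA 02 01 encoding can be reproduced outside of debug. Here's a small Python sketch (the function name is hypothetical) that builds the bytes for "mov dx" with a 16-bit immediate: opcode BA followed by the value with its low byte first, which is how x86 stores multi-byte values.

```python
import struct

def encode_mov_dx_imm16(value):
    """Return the machine code for 'mov dx, value': opcode BA + little-endian word."""
    return bytes([0xBA]) + struct.pack("<H", value)

encoded = encode_mov_dx_imm16(0x0102)
print(" ".join(f"{b:02X}" for b in encoded))  # BA 02 01
```

The "<H" format asks struct for an unsigned 16-bit value in little-endian order, so 0102 comes out as 02 01.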

At this point, I'd like you to open up debug again (your prompt should have still been left open). If you closed it, the string may no longer be in memory, which means you'll have to start over or load the saved program with "debug hello.com". Enter "d 102". Notice that the string ends at address 110 and the bytes of our code (BA 02 01 ...) begin at address 111. I wanted you to see this because it illustrates the data more clearly.

Onto our next command: "mov ah, 09". The 16-bit "ax" register is the accumulator. It's used for arithmetic, logical, shift, rotate, or other similar operations. In this case, we don't need the whole 16-bit register. We only need 8 bits (one byte), which will hold the value 09. "ah" corresponds to the high-order byte of ax and "al" corresponds to the low-order byte. In effect, this instruction says "Select function 9": DOS looks at ah to decide which service to perform when interrupt 21 is triggered.
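To make the relationship between ax, ah, and al concrete, here's a minimal Python sketch that treats ax as a plain 16-bit number. The helper names are made up for illustration, and the starting value 1234 is arbitrary.

```python
def split_ax(ax):
    """Split a 16-bit value into its (ah, al) halves."""
    ah = (ax >> 8) & 0xFF   # high-order byte
    al = ax & 0xFF          # low-order byte
    return ah, al

def set_ah(ax, value):
    """Return ax with its high byte replaced, leaving al untouched."""
    return ((value & 0xFF) << 8) | (ax & 0x00FF)

ax = 0x1234
ax = set_ah(ax, 0x09)   # the effect "mov ah, 09" has on ax
print(f"{ax:04X}")      # 0934
```

Only the high byte changes, which is why whatever happens to be in al doesn't matter for our program.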

Next, we trigger an interrupt with the command "int 21". An interrupt simply interrupts the control flow of the processor. When an interrupt is triggered, the processor will jump to an ISR (Interrupt Service Routine). Each ISR is a routine held in memory which handles a certain interrupt. In our case, we executed interrupt 21, which is the entry point for DOS services. When interrupt 21 is used in conjunction with function 9, it will print the string whose address is in the data register (strictly, at DS:DX), stopping at the '$' character. Here's proof that I'm not bullshitting you: http://stanislavs.org/helppc/int_21-9.html
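What function 9 does with the address in dx can be imitated in a few lines. This is only a rough Python model under simplifying assumptions: memory is a flat 64 KB bytearray standing in for one segment, and the function name is hypothetical. It is not how DOS is implemented.

```python
def dos_print_string(memory, dx):
    """Mimic int 21h, function 09h: return the bytes from offset dx
    up to (but not including) the first '$'."""
    end = memory.index(b"$", dx)
    return bytes(memory[dx:end])

# A toy segment with the string placed at offset 0102, as in the walkthrough.
memory = bytearray(0x10000)
text = b"Hello World!\r\n$"
memory[0x0102:0x0102 + len(text)] = text

output = dos_print_string(memory, 0x0102)
print(output.decode("ascii"), end="")
```

The '$' itself is never printed. It only tells the routine where to stop, which is why the string had to end with it.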

Finally, we execute interrupt 20 with "int 20". It simply closes the program. No functions are used with this interrupt.
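Putting the pieces together, the whole program can be written out byte by byte. Here's a sketch in Python. BA 02 01 comes from the walkthrough; the other encodings (EB 0F for the short jump, B4 09 for "mov ah, 09", CD 21 and CD 20 for the interrupts) are the standard x86 encodings I'd expect debug to produce, so treat them as an assumption until you check them with "d 100".

```python
LOAD_OFFSET = 0x0100  # the offset at which the program is assembled

jmp = bytes([0xEB, 0x0F])         # 0100: jmp 111 (skip over the string)
text = b"Hello World!\r\n$"       # 0102: string data, '$'-terminated
code = bytes([
    0xBA, 0x02, 0x01,             # 0111: mov dx, 102
    0xB4, 0x09,                   # 0114: mov ah, 09
    0xCD, 0x21,                   # 0116: int 21
    0xCD, 0x20,                   # 0118: int 20
])

program = jmp + text + code

print(f"length: {len(program):04X}")                      # 001A
print(f"next free address: {LOAD_OFFSET + len(program):04X}")  # 011A
```

The length of 1A bytes matches the value that gets entered into cx before writing the file, and 011A matches the last address the assembler showed.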

CODE :

-h 011a 0100
021A 001A
-n hello.com
-rcx
CX 0000
:001a
-w
Writing 0001A bytes
-q

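"h" prints the sum and the difference of two hex numbers; we use it to get the program's length (011A - 0100 = 001A bytes). "n" names the file to write. "rcx" displays the cx (count) register and lets us enter a new value for it. "w" writes to that file the number of bytes specified in cx (strictly bx:cx, with bx left at 0), starting at address 100. "q" quits.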
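The "h" arithmetic is easy to double-check by hand or in code. Here's a minimal Python sketch (the function name is made up) that returns the sum and the difference of two values as four-digit hex, assuming results wrap around at 16 bits the way debug's registers do.

```python
def debug_h(a, b):
    """Mimic debug's 'h' command: (sum, difference) as 16-bit hex strings."""
    total = (a + b) & 0xFFFF
    diff = (a - b) & 0xFFFF
    return f"{total:04X}", f"{diff:04X}"

print(*debug_h(0x011A, 0x0100))  # 021A 001A
```

The second number, 001A, is the one that matters here: it is the end address minus the start address, which is the size of the program.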

Well, I hope you've learned something. And more importantly, I hope you've been inspired to study some of this material. It's not too hard once you wrap your head around it. Start out making simple 16-bit programs before working with the extended registers (eax,ebx,ecx,edx).
