Hack This Site

Hello all,

Because so many people have asked, because there's so little information on this subject, and because I broke my computer again (I break things a lot), I'm going to start a small mini-series about basic NASM programming. Most of it will also apply to other Intel-syntax assemblers, such as FASM, MASM, and TASM.

Preface:
Just some information/expectations for you guys:
1. I will be using Linux for these articles. No, they will not work on Windows. However, given the nature of this site, I expect that most if not all of you have access to, at the very least, a virtual machine you can use. If not, head over to VMware and grab a copy of Ubuntu.

2. I will be using Ubuntu 12.04 for these articles. Don't fret, the code will assemble on any Linux-based distro, but you may have to use a different package management system to download the assembler.

3. You need to know hexadecimal for these articles. You don't need to know all that much: just what it is, how to convert to and from it, and be generally comfortable seeing it.

4. This is not for people new to programming. In my opinion, you should learn assembler after you know a bit about C/C++ or some other lower-level language. Because of that, I won't explain programming concepts I expect anyone with moderate programming experience to know (functions, pointers, arrays, etc.).

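5. I will be using an x86 (32-bit) OS. USING A 32-BIT OS IS EXTREMELY IMPORTANT. THESE TUTORIALS ARE WRITTEN FOR 32-BIT x86 AND WILL NOT WORK AS-IS ON A 64-BIT (x64) OS. Assembly language differs from CPU to CPU: a 32-bit x86 CPU cannot run x64 instructions, and 64-bit code uses different registers and system call conventions. A 64-bit CPU can still run 32-bit code, though, so if you're running a 32-bit OS on a 64-bit CPU (like me), you should be just fine. On 64-bit Linux with 32-bit support installed, you can usually build these examples with "nasm -f elf32" and "ld -m elf_i386" instead, but I won't cover that here.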

6. That's it! Let's get started!
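If you want to brush up on hex before we start, here's a rough C sketch you can play with. I used C rather than assembly so it runs anywhere. It converts between decimal and hex using the standard library's snprintf and strtoul. The helper names to_hex and from_hex are just made up for illustration. For a worked example, 0x80 (the interrupt number we'll use later) is 8 * 16 + 0 = 128.

```c
#include <stdio.h>   /* snprintf, size_t */
#include <stdlib.h>  /* strtoul */

/* Hypothetical helper: format value as hex into buf, e.g. 255 -> "0xff". */
void to_hex(unsigned int value, char *buf, size_t size) {
    snprintf(buf, size, "0x%x", value);
}

/* Hypothetical helper: parse hex text such as "0x80" or "ff" into a number.
   With base 16, strtoul accepts an optional "0x" prefix. */
unsigned long from_hex(const char *text) {
    return strtoul(text, NULL, 16);
}
```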

Why should I learn assembler?:

There's a multitude of reasons you might want to learn assembler. One of the most obvious is that it gives you an in-depth view of how your computer actually works. It helps you understand what's going on behind the scenes, and might even help you diagnose some of the errors your compiler/interpreter throws at you. Another common reason I see (especially for people on this site) is that in order to do reverse engineering, you need to be competent in assembler to some degree (that's actually how I got into it). Or if you're writing an OS or a compiler, you need a solid foundation in assembly. And one of the last reasons I can think of is that you just enjoy learning/a challenge. If any of these descriptions fit you, and/or you don't give two shits and are going to do it anyway, then this tutorial is for you!

Different types of assembly-languages:

Just like any other programming language, assembly comes in quite a few different flavors. And just like any other programming language, they're all pretty much the same thing, just with slightly different syntax. However, there are two main syntax families that nearly all of them fit into: AT&T syntax and Intel syntax. Essentially, it comes down to how you prefer your code to look. I'll show you a simple "Hello HTS!" in each syntax.

This is the program in AT&T syntax (assembled with GAS):

CODE :

.globl _start

.text
_start:
    movl $len, %edx
    movl $msg, %ecx
    movl $1, %ebx
    movl $4, %eax
    int $0x80

    movl $0, %ebx
    movl $1, %eax
    int $0x80

.data
msg:
    .ascii "Hello HTS!\n"
    len = . - msg

And this is that same program, in Intel syntax (assembled with NASM):
CODE :

SECTION .data
msg:    db "Hello HTS!", 10
len:    equ $-msg

SECTION .text
global _start

_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, msg
    mov edx, len
    int 0x80

    mov eax, 1
    mov ebx, 0
    int 0x80

Note: I didn't actually assemble either of these examples, I just wrote them out in my text editor. However, they should both assemble fine.

As you can see, the syntax is quite different between the two, but they're both doing the same thing. As I stated above, assemblers can be divided between the two syntaxes. For instance:

AT&T syntax:
GAS

Intel syntax:
NASM (The one we use in these articles)
MASM
FASM

I personally prefer the Intel syntax, as in my opinion it's easier to read, but if you want to go with AT&T syntax, go for it.

Installing NASM:

As I said before, I'm using Ubuntu 12.04 for this, so I'll show you how to install NASM with that. If you're using a different distro, you should be able to find it with your package-management system. Anyway, just plop this code in the terminal, and hit enter:

CODE :

sudo apt-get install nasm

And there we go, NASM is installed.

The sections:

If you look at the "Hello HTS!" code above, you'll notice some odd things. First off, you probably noticed that everything is organized into sections. In assembly, there are three main sections you'll use in a given file:

The .data section:

This is the section where you put initialized variables. These are things whose values you assign in your code (example in C: int i = 10;), not things whose values get assigned at runtime (example in C: int i; scanf("%d", &i);). If you scroll back up to my example, you'll notice that our ".data" section has two entries. The first is the message that's going to be displayed, which we assigned the value "Hello HTS!" (plus a newline). The next one is the length (in bytes) of our message. Both have their values set before the program ever runs. Strictly speaking, len is a constant defined with "equ", so NASM substitutes the number directly and it takes up no space in .data.

The .bss section:

This is the section for uninitialized variables: space you reserve now and fill in at runtime. For example, if you're going to get user input, you don't know ahead of time what the user will type, so you reserve a buffer for it here. There isn't a .bss section in the example above, because we don't need one. That's a good thing to know about assembly: if you don't need a section, don't include it.
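Since you already know some C, it may help to see where C globals typically land. Here is a small sketch, assuming GCC on Linux; exact placement can vary with compiler flags. An initialized global usually goes into .data, and an uninitialized one into .bss, which the loader fills with zeros. The names initialized_value, input_buffer and buffer_is_zeroed are made up for illustration. You can inspect the result of compiling it with "objdump -t".

```c
#include <stddef.h>  /* size_t */

/* Initialized global: typically placed in .data, like "msg: db ..." in NASM. */
int initialized_value = 10;

/* Uninitialized global: typically placed in .bss. It occupies no space in the
   file; the loader zero-fills it, and we'd fill it with input at runtime. */
char input_buffer[64];

/* Returns 1 if every byte of input_buffer is still zero, 0 otherwise. */
int buffer_is_zeroed(void) {
    for (size_t i = 0; i < sizeof input_buffer; i++) {
        if (input_buffer[i] != 0)
            return 0;
    }
    return 1;
}
```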

The .text section:

This is the section that holds the code that actually runs. It's essentially the meat and potatoes of the program. In the example above, we can see that several instructions are executed in the .text section, which output "Hello HTS!" and then exit.

The registers:

Inside your CPU, there are small, extremely fast storage locations called registers. On an x86 CPU, there are 9 registers that you really should know about. There are others, but they're not as important for a beginner to know about. The 9 registers are as follows:

The general purpose registers:
EAX - The accumulator register
EBX - The base register (often holds a base address, such as the start of an array)
ECX - The count register
EDX - The data/general register

The specific registers, and what they do:
EBP - The base pointer; it holds the base address of the current stack frame.
ESI - The source index for string operations.
ESP - The stack pointer; it holds the address of the top of the stack.
EDI - The destination index for string operations.
EIP - The instruction pointer; it holds the address of the next instruction to execute.

The general purpose registers are just that, general purpose. Their names mostly reflect what certain instructions use them for, or plain convention. That doesn't mean you have to count only with ECX, or only hold data in EDX; you can do pretty much anything with them. The kernel does expect system call arguments in specific registers, though, as you'll see in the "Hello HTS!" program.

Some other useful commands:

I'm not going to go completely in-depth on these here, as I'll do that in the section "Dissecting the 'Hello HTS!' program". Think of this as a reference for that section:

mov destination, source - Copies data from the source to the destination (despite the name, the source keeps its value). For example, to put the number 4 in the ECX register, you would write "mov ecx, 4".

global _start - Makes the _start label visible to the linker, which uses _start as the program's entry point, much like a C program starts at its main function. You should generally place this in, or before, your .text section.

jmp functionName - Jumps to whatever function (or address) you specify.

functionName: - Assembly has something like functions, but they're really labels, more like goto labels in C: one will bleed (fall through) into the next. Here's an example of that:

CODE :

_start:
    mov eax, 4
    mov ebx, 2
otherFunc:
    mov eax, 2

otherFunc will still be executed, because like I said before, if you don't jump to a different function, they will bleed into each other. (No, this isn't a bug)
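C's goto labels behave the same way, so here's a small C sketch of the fall-through above. The ints eax and ebx stand in for the registers. The done label and the skip_other parameter are hypothetical additions: passing 1 emulates putting a "jmp done" before otherFunc, which is what it takes to skip it.

```c
/* Simulates the _start/otherFunc example; returns the final "eax". */
int run_labels(int skip_other) {
    int eax, ebx;

    /* _start: */
    eax = 4;
    ebx = 2;
    if (skip_other)
        goto done;      /* like "jmp done": otherFunc is skipped */

otherFunc:              /* nothing jumps here, but execution falls into it */
    eax = 2;

done:
    (void)ebx;          /* ebx only mirrors the assembly example */
    return eax;
}
```

Calling run_labels(0) returns 2, because otherFunc ran. Calling run_labels(1) returns 4, because the jump skipped it.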

push data - Pushes the data onto the stack

pop register - Pops the data on the top of the stack, into the register specified
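To picture what push and pop do with ESP, here's a rough C simulation. The array stack_mem, the pointer esp, and the push/pop functions are all made up for illustration. On 32-bit x86, the stack grows downward: push moves ESP down 4 bytes and then stores the value, and pop reads the value at ESP and then moves it back up. This sketch skips overflow and underflow checks for brevity.

```c
#include <stdint.h>  /* uint32_t */

#define STACK_WORDS 16

static uint32_t stack_mem[STACK_WORDS];
/* Empty stack: esp points just past the end, since the stack grows downward. */
static uint32_t *esp = stack_mem + STACK_WORDS;

/* push: move esp down one 4-byte slot, then store the value there. */
void push(uint32_t value) {
    *--esp = value;
}

/* pop: read the value on top of the stack, then move esp back up. */
uint32_t pop(void) {
    return *esp++;
}
```

Pushing 1, 2, 3 and then popping three times gives back 3, 2, 1: last in, first out.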

;Your comment in NASM - This is how to comment in NASM, much like "//" in C++.

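int 0x80 - "int" stands for interrupt. It triggers software interrupt 0x80, which hands control from your program to the Linux kernel. The kernel reads the system call number from EAX (4 is sys_write, 1 is sys_exit), takes the arguments from EBX, ECX and EDX, runs that system call, and then returns to your program with the result in EAX (except for sys_exit, which never returns). It's hard to explain without seeing it.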
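If it helps, here's a sketch of the same two system calls from C, using glibc's syscall() wrapper and the SYS_write and SYS_exit constants from <sys/syscall.h>. On 32-bit x86 those constants are 4 and 1, the same numbers we put in EAX. syscall() sets up the registers and enters the kernel for you. It may not literally use int 0x80, but the kernel sees the same call number and arguments. On 64-bit Linux the numbers are different, which is another reason the assembly needs a 32-bit setup. The function names write_hello and exit_via_kernel are made up for illustration.

```c
#define _GNU_SOURCE          /* ask glibc to declare syscall() */
#include <unistd.h>          /* syscall() */
#include <sys/syscall.h>     /* SYS_write, SYS_exit */

static const char msg[] = "Hello HTS!\n";

/* Same job as: mov eax, 4 / mov ebx, 1 / mov ecx, msg / mov edx, len / int 0x80
   Returns the kernel's result: bytes written, or -1 on error. */
long write_hello(void) {
    return syscall(SYS_write, 1, msg, sizeof msg - 1);  /* fd 1 = STDOUT */
}

/* Same job as: mov eax, 1 / mov ebx, status / int 0x80 (never returns). */
void exit_via_kernel(int status) {
    syscall(SYS_exit, status);
}
```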

Dissecting the "Hello HTS!" program

Up above, when I was talking about the differences between assemblers and their respective syntaxes (AT&T, Intel), I listed an example of each. Now we're going to dissect the NASM (Intel syntax) version, see what each and every instruction does, and hopefully understand it.

CODE :

SECTION .data               ;Our .data section, where we keep our initialized variables (as explained before)
msg:    db "Hello HTS!", 10 ;Our message. "db" means "define byte"; there's also "dw" (define word, 2 bytes)
                            ;and "dd" (define doubleword, 4 bytes). The 10 at the end is the ASCII newline.
len:    equ $-msg           ;"$" is the current address, so $-msg is the length of msg in bytes.

SECTION .text               ;Explained above
global _start               ;Makes _start visible to the linker; essentially our main function

_start:                     ;Our entry point... though as described above, it's really just a label.
    mov eax, 4              ;4 = sys_write: tells the kernel we're going to do a write
    mov ebx, 1              ;1 = the file descriptor for STDOUT
    mov ecx, msg            ;The address of our message, which is what we're going to print
    mov edx, len            ;The length of our message in bytes
    int 0x80                ;Wake up the kernel and have it do what we just set up

    mov eax, 1              ;1 = sys_exit: tells the kernel we're planning to exit our program
    mov ebx, 0              ;Our exit status; since it's not an error, it's 0
    int 0x80                ;Wake up the kernel again to perform the exit

Save this program as "hellohts.asm", and congratulations! You potentially just understood and wrote your first assembly program! If you didn't understand ANY of this, feel free to message me on the forums.
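To make the sizes behind db, dw and dd and the "$-msg" trick concrete, here's a small C comparison. The variable names and values are made up for illustration. One difference to watch for: a C string literal gets a hidden '\0' at the end, while NASM's db does not, so sizeof needs a "- 1" to match len.

```c
#include <stdint.h>  /* uint8_t, uint16_t, uint32_t */

/* Like: msg: db "Hello HTS!", 10  (plus a '\0' that C adds and NASM doesn't) */
static const char msg[] = "Hello HTS!\n";

/* Like: len: equ $-msg  -> bytes from msg to "here", not counting the '\0' */
#define MSG_LEN (sizeof msg - 1)

/* The data-definition directives and their same-sized C types: */
static const uint8_t  byte_value  = 0x0A;        /* db: 1 byte  */
static const uint16_t word_value  = 0x1234;      /* dw: 2 bytes */
static const uint32_t dword_value = 0x12345678;  /* dd: 4 bytes */
```

Here MSG_LEN is 11: ten characters of "Hello HTS!" plus the newline, which is byte value 10.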

Assembling our program:

In order to run this program, we need to assemble it with NASM. Here's the code for the terminal to do so:

CODE :

nasm -f elf hellohts.asm

That will produce a "hellohts.o", which we need to link, in order to run. To do that, we need to run the command:

CODE :

ld -o hellohts hellohts.o

And now (if everything went right) we should have a runnable file! Just type in:

CODE :

./hellohts

And you should get the output, "Hello HTS!". If not, then something went wrong. Make sure you entered the code right. If it's still going wrong, just send me a message on the forums, and I'll respond as fast as possible.

Hope you enjoyed, more tutorials to come!
~Cent
