Need Help with a program in C

Re: Need Help with a program in C

Post by Monica on Wed Dec 08, 2010 1:36 pm
([msg=50289]see Re: Need Help with a program in C[/msg])

WithOUT Thetan's assistance in C, you would be extremely upset. But WITH Thetan's assistance, you no longer will be upset with C.
hi am new so plz dont troll me or i report 2 the HTS mods ty
User avatar
Monica
Contributor
Contributor
 
Posts: 900
Joined: Thu Oct 02, 2008 12:29 am
Location: In The Shadows
Blog: View Blog (0)


Re: Need Help with a program in C

Post by tgoe on Fri Dec 10, 2010 10:25 am
([msg=50371]see Re: Need Help with a program in C[/msg])

Yeah, sorry for muddling the terms. The point is, worrying about function call overhead is like weighing two 150 pound lightweight fighters in milligrams and calling the heavier one fat. I'd love an example of where it matters. I'm sure it does at some point but I just don't see it.
User avatar
tgoe
Contributor
Contributor
 
Posts: 673
Joined: Sun Sep 28, 2008 2:33 pm
Location: q3dm7
Blog: View Blog (0)


Re: Need Help with a program in C

Post by thetan on Sat Dec 11, 2010 11:23 am
([msg=50395]see Re: Need Help with a program in C[/msg])

ahh, i see how you're looking about it now. I must say, thats not the best way to look at solutions algorithmically. As you are only addressing one of the points with tunnel vision and you're ignoring the algorithm entirely.

You are looking at the solution as a single operation (IE weighing ~150 pound boxers one each side). While in reality the problem is very different. As demonstrated by the loopdeloop and the stack algorithms the problem in reality would be weighing O(n) 150 pound boxers vs weighing O((n+1)*n/2) 150 pound boxers, which does indeed make a significant difference. (FYI, incase you jumped to this post, tgoe is referring to function calls as 150lb boxers, so just to avoid confusion)

Also, in terms of algorithmic performance the order of magnitude is the most important factor. Abstraction and reusability is very important. If done right, you enable your algorithm to be executed abstractly and endlessly at a later point in time. That being said when reused in repetition the single operation time becomes far less significant in comparison to the general order of magnitude of performance. As demonstrated by my two algorithms, stack is an order of magnitude faster then loopdeloop, 1e-06 vs 2e-05 respectively.

_never_ disregard the algorithm. _always_ start with the algorithm.

So it being said that the algorithm will always be the pivotal point of truth, real world applications where such optimizations would be critical are in use on a daily basis. JiT compilers that implement trace trees are a prime example in a field where performance _is_ the point of dominance and it is common to see jump tables over function calls in such implementations as well as heavy inlining of code. It's no coincidence that lex/flex are macro heavy, because performance matters in lexical analyzation. In such related FSM and LL/LR GLL aglorithms are on average more aglorithmically complex then the loopdeloop algorithm by an order of magnitude or two.
"If art interprets our dreams, the computer executes them in the guise of programs!" - SICP

Image

“If at first, the idea is not absurd, then there is no hope for it” - Albert Einstein
User avatar
thetan
Contributor
Contributor
 
Posts: 657
Joined: Thu Dec 17, 2009 6:58 pm
Location: Various Bay Area Cities, California
Blog: View Blog (0)


Re: Need Help with a program in C

Post by tgoe on Tue Dec 14, 2010 10:10 pm
([msg=50464]see Re: Need Help with a program in C[/msg])

No. I think we're on the same page actually. The algorithm is so important that it dwarfs everything else. The two fighters in my analogy represent two competing implementations of the same algorithm differing only in number of function calls (which I contend is like the difference in weight of fighters in the same weight class). Speed is represented by the fighters' weight.

I forget where I first read it but I came across a piece of advice relating to programming that stuck with me:

1. Get it working (correct output)
2. Get it working correctly (correct output + correct algorithm)
3. Get it working fast enough
4. Get it working as fast as possible

.. or something like that. Programs can be hampered by sheer numbers of calls to make step 3-4 impossible. Makes sense but this isn't why loopdeloop is so much slower. Worrying about steps 3-4, function call overhead in particular, is a waste of time if you haven't done steps 1 and 2. If you do that, step 3 usually comes free. Output should be lines here. So, I'm looking at it as if loopdeloop doesn't even pass step 1.

Modified loopdeloop to output strings and added function call counts to both:
Code: Select all
#include <stdio.h>
#include <stdlib.h>

int calls = 0;

void dummy_io(int *i)
{
    *i++;
}

void loopdeloop(char c, int m)
{
    char s[512];
    int i, j;
   
    for (i = 0; i < m; i++)
    {
        for (j = 0; j <= i; j++) {
            s[j] = c;
        }

        s[i+1] = '\0';
        //printf("%s\n", s);
        dummy_io(&j);
        calls++;
    }
}

int main (int argc, char *argv[])
{
    int i;
   
    for (i = 0; i < 99999; i++)
    {
        loopdeloop('a', 75);
        calls++;
    }

    printf("Calls(lloop): %d\n", calls);
   
    return 0;
}


Code: Select all
#include <stdio.h>
#include <stdlib.h>

int calls = 0;

void dummy_io(int *i)
{
    *i++;
}

void stack(char c, int m)
{
    char s[512];
    int i, j;
   
    for (i = 0; i < m;)
    {
        s[i] = c;
        s[++i] = '\0';
        //printf("%s\n", s);
        dummy_io(&j);
        calls++;
    }

}

int main (int argc, char *argv[])
{
    int i;
   
    for (i = 0; i < 99999; i++)
    {
        stack('a', 75);
        calls++;
    }

    printf("Calls(stack2): %d\n", calls);

    return 0;
}


Code: Select all
$ time ./lloop2
Calls(lloop): 7599924

real   0m2.221s
user   0m2.220s
sys   0m0.000s
$ time ./stack2
Calls(stack2): 7599924

real   0m0.134s
user   0m0.140s
sys   0m0.000s



Function call count is the same and performance is just as lopsided. The difference is in the algorithm. Stack is better because it requires far fewer instructions period. Back to my analogy, the two versions of loopdeloop are very similar in performance even though the original does way more function calls.

Here's a further modified stack abstracting a piece away into a function:

Code: Select all
#include <stdio.h>
#include <stdlib.h>

int calls = 0;

void dummy_io(int *i)
{
    *i++;
}

void wrapped(int *i, char *s)
{
    s[++(*i)] = '\0';
}

void stack(char c, int m)
{
    char s[512];
    int i, j;
   
    for (i = 0; i < m;)
    {
        s[i] = c;
        wrapped(&i, s);
        calls++;
        //printf("%s\n", s);
        dummy_io(&j);
        calls++;
    }

}

int main (int argc, char *argv[])
{
    int i;
   
    for (i = 0; i < 99999; i++)
    {
        stack('a', 75);
        calls++;
    }

    printf("Calls(stack3): %d\n", calls);

    return 0;
}


Code: Select all
$ time ./stack3
Calls(stack3): 15099849

real   0m0.191s
user   0m0.190s
sys   0m0.000s



Millions more function calls than the above two yet very similar in performance to the other stack. Moving on to step 4... sure, worry about function call overhead...
User avatar
tgoe
Contributor
Contributor
 
Posts: 673
Joined: Sun Sep 28, 2008 2:33 pm
Location: q3dm7
Blog: View Blog (0)


Re: Need Help with a program in C

Post by thetan on Wed Dec 15, 2010 2:15 am
([msg=50469]see Re: Need Help with a program in C[/msg])

And bringing them all together
Code: Select all
jmoniz-mac:test jmmoniz$ ls
loopdeloop.c   loopdeloop2.c   stack.c      stack2.c
jmoniz-mac:test jmmoniz$ gcc -o loopdeloop loopdeloop.c
jmoniz-mac:test jmmoniz$ gcc -o loopdeloop2 loopdeloop2.c
jmoniz-mac:test jmmoniz$ gcc -o stack stack.c
jmoniz-mac:test jmmoniz$ gcc -o stack2 stack2.c
jmoniz-mac:test jmmoniz$ time ./loopdeloop

real   0m0.954s
user   0m0.951s
sys   0m0.002s
jmoniz-mac:test jmmoniz$ time ./loopdeloop2
Calls(lloop): 7599924

real   0m0.736s
user   0m0.733s
sys   0m0.002s
jmoniz-mac:test jmmoniz$ time ./stack

real   0m0.045s
user   0m0.042s
sys   0m0.002s
jmoniz-mac:test jmmoniz$ time ./stack2
Calls(stack3): 15099849

real   0m0.062s
user   0m0.059s
sys   0m0.002s



So does the function call overhead add an order of magnitude worth of overhead no. Does it have a performance impact sure.

Code: Select all
(gdb) disas loopdeloop
Dump of assembler code for function loopdeloop:
0x0000000100000e5b <loopdeloop+0>:   push   %rbp
0x0000000100000e5c <loopdeloop+1>:   mov    %rsp,%rbp
0x0000000100000e5f <loopdeloop+4>:   sub    $0x20,%rsp
0x0000000100000e63 <loopdeloop+8>:   mov    %esi,-0x18(%rbp)
0x0000000100000e66 <loopdeloop+11>:   mov    %dil,-0x14(%rbp)
0x0000000100000e6a <loopdeloop+15>:   movl   $0x0,-0x4(%rbp)
0x0000000100000e71 <loopdeloop+22>:   jmp    0x100000e9c <loopdeloop+65>
0x0000000100000e73 <loopdeloop+24>:   movl   $0xffffffff,-0x8(%rbp)
0x0000000100000e7a <loopdeloop+31>:   jmp    0x100000e88 <loopdeloop+45>
0x0000000100000e7c <loopdeloop+33>:   lea    -0xc(%rbp),%rdi
0x0000000100000e80 <loopdeloop+37>:   callq  0x100000e4c <dummy_io>
0x0000000100000e85 <loopdeloop+42>:   incl   -0x8(%rbp)
0x0000000100000e88 <loopdeloop+45>:   mov    -0x8(%rbp),%eax
0x0000000100000e8b <loopdeloop+48>:   cmp    -0x4(%rbp),%eax
0x0000000100000e8e <loopdeloop+51>:   jl     0x100000e7c <loopdeloop+33>
0x0000000100000e90 <loopdeloop+53>:   lea    -0xc(%rbp),%rdi
0x0000000100000e94 <loopdeloop+57>:   callq  0x100000e4c <dummy_io>
0x0000000100000e99 <loopdeloop+62>:   incl   -0x4(%rbp)
0x0000000100000e9c <loopdeloop+65>:   mov    -0x4(%rbp),%eax
0x0000000100000e9f <loopdeloop+68>:   cmp    -0x18(%rbp),%eax
0x0000000100000ea2 <loopdeloop+71>:   jl     0x100000e73 <loopdeloop+24>
0x0000000100000ea4 <loopdeloop+73>:   leaveq
0x0000000100000ea5 <loopdeloop+74>:   retq   
End of assembler dump.

(gdb) disas stack
Dump of assembler code for function stack:
0x0000000100000df7 <stack+0>:   push   %rbp
0x0000000100000df8 <stack+1>:   mov    %rsp,%rbp
0x0000000100000dfb <stack+4>:   sub    $0x230,%rsp
0x0000000100000e02 <stack+11>:   mov    %esi,-0x228(%rbp)
0x0000000100000e08 <stack+17>:   mov    %dil,-0x224(%rbp)
0x0000000100000e0f <stack+24>:   mov    0x212(%rip),%rax        # 0x100001028
0x0000000100000e16 <stack+31>:   mov    (%rax),%rdx
0x0000000100000e19 <stack+34>:   mov    %rdx,-0x8(%rbp)
0x0000000100000e1d <stack+38>:   xor    %edx,%edx
0x0000000100000e1f <stack+40>:   cmpl   $0x1ff,-0x228(%rbp)
0x0000000100000e29 <stack+50>:   jg     0x100000e7e <stack+135>
0x0000000100000e2b <stack+52>:   movl   $0x0,-0x214(%rbp)
0x0000000100000e35 <stack+62>:   jmp    0x100000e70 <stack+121>
0x0000000100000e37 <stack+64>:   mov    -0x214(%rbp),%eax
0x0000000100000e3d <stack+70>:   movslq %eax,%rdx
0x0000000100000e40 <stack+73>:   movzbl -0x224(%rbp),%eax
0x0000000100000e47 <stack+80>:   mov    %al,-0x210(%rbp,%rdx,1)
0x0000000100000e4e <stack+87>:   incl   -0x214(%rbp)
0x0000000100000e54 <stack+93>:   mov    -0x214(%rbp),%eax
0x0000000100000e5a <stack+99>:   cltq   
0x0000000100000e5c <stack+101>:   movb   $0x0,-0x210(%rbp,%rax,1)
0x0000000100000e64 <stack+109>:   lea    -0x218(%rbp),%rdi
0x0000000100000e6b <stack+116>:   callq  0x100000de8 <dummy_io>
0x0000000100000e70 <stack+121>:   mov    -0x214(%rbp),%eax
0x0000000100000e76 <stack+127>:   cmp    -0x228(%rbp),%eax
0x0000000100000e7c <stack+133>:   jl     0x100000e37 <stack+64>
0x0000000100000e7e <stack+135>:   mov    0x1a3(%rip),%rax        # 0x100001028
0x0000000100000e85 <stack+142>:   mov    -0x8(%rbp),%rdx
0x0000000100000e89 <stack+146>:   xor    (%rax),%rdx
0x0000000100000e8c <stack+149>:   je     0x100000e93 <stack+156>
0x0000000100000e8e <stack+151>:   callq  0x100000ed0 <dyld_stub___stack_chk_fail>
0x0000000100000e93 <stack+156>:   leaveq
0x0000000100000e94 <stack+157>:   retq   
End of assembler dump.

So the stack implementation produces just under 100 bytes more of machine code but is still faster then loopdeloop.


Even more interesting
Code: Select all
jmoniz-mac:test jmmoniz$ gcc -O3 -o loopdeloop loopdeloop.c
jmoniz-mac:test jmmoniz$ gcc -O3 -o loopdeloop2 loopdeloop2.c
jmoniz-mac:test jmmoniz$ gcc -O3 -o stack stack.c
jmoniz-mac:test jmmoniz$ gcc -O3 -o stack2 stack2.c
jmoniz-mac:test jmmoniz$ time ./loopdeloop

real   0m0.003s
user   0m0.000s
sys   0m0.002s
jmoniz-mac:test jmmoniz$ time ./loopdeloop2
Calls(lloop): 7599924

real   0m0.016s
user   0m0.014s
sys   0m0.002s
jmoniz-mac:test jmmoniz$ time ./stack

real   0m0.003s
user   0m0.000s
sys   0m0.002s
jmoniz-mac:test jmmoniz$ time ./stack2
Calls(stack3): 15099849

real   0m0.019s
user   0m0.017s
sys   0m0.001s

With aggressive optimizations it appears that counter intuitively the loopdeloop2 code you posted is actually slower. Also that with aggressive optimizations the compiler is actually smart enough to optimize loopdeloop and stack into roughly the same thing.

These are fairly simple algorithms though so this impressive optimization should be taken with a grain of salt.
"If art interprets our dreams, the computer executes them in the guise of programs!" - SICP

Image

“If at first, the idea is not absurd, then there is no hope for it” - Albert Einstein
User avatar
thetan
Contributor
Contributor
 
Posts: 657
Joined: Thu Dec 17, 2009 6:58 pm
Location: Various Bay Area Cities, California
Blog: View Blog (0)


Re: Need Help with a program in C

Post by tgoe on Wed Dec 15, 2010 7:42 am
([msg=50473]see Re: Need Help with a program in C[/msg])

Yeah by fewer instructions I don't mean size of the binary necessarily but the number of times an instruction is executed. I think loopdeloop is faster because dummy_io doesn't do much and when fully optimized out translates to an increment operation vs loopdeloop2 doing an assignment in the inner for loop. And of course the original isn't counting its own function calls either.
User avatar
tgoe
Contributor
Contributor
 
Posts: 673
Joined: Sun Sep 28, 2008 2:33 pm
Location: q3dm7
Blog: View Blog (0)


Previous

Return to C and C++

Who is online

Users browsing this forum: No registered users and 0 guests