All times are UTC




Post new topic Reply to topic  [ 196 posts ]  Go to page Previous  1, 2, 3, 4, 5 ... 14  Next
Author Message
 Post subject:
PostPosted: Sun, 16-09-07, 21:36 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
dirkpitt wrote:
I'm back, and armed with a new power adapter I once more "fired up" F-TexTools.
Looks like I'm not the only person "burned" by an adapter. For example look at this guy's:
http://www.neowin.net/forum/index.php?showtopic=531840

A UPS at home? Somehow I think a fire extinguisher is going to be more useful in my case.. :wink:



My God! Computing is dangerous!
Notably, working with the F-TexTools!!! ;-)

Quote:
Anyway, back to work.
I ran test 2 and 3 again. This time the tiles look perfect, and here are the time results:

Test 2 (generate specmap): 26 min 11 seconds (previous result: 30 mins)
=> Fridger * 6.3
Test 3 (generate 2048 png tiles): 88 min 30 seconds (previous result: 99 mins)
=> Fridger * 5.3

So proof that a new power adapter upgrades my computer's speed? :shock:
Of course not! What probably happened, was that my ex-adapter, before it burned out,
was intermittently failing. My laptop - like many others - reduces processor speed when not
running off AC. If the adapter doesn't work properly, the laptop switches to battery mode,
hence the poor benchmarks previously.

Even so, 6x and 5x slower than Fridger's results is not so "hot". :wink: I think the reason,
which I hinted at before, may be because of my slow HD link speed. I'll try and benchmark
by omitting the png steps, later today.


Anyway, thanks very much for your benchmarks! The times are indeed a bit slow... Yet I remember that Runar also saw pretty slow performance when testing the nmtools on his older Mac...

Did you notice that I have already posted a slight update? See Announcements...
I also introduced a locked thread where people can always find the links to the latest tool versions for a quick download.

Did you see the nice F-TexTools image
http://www.forum.celestialmatters.org/v ... 44&start=6

and here

http://www.celestialmatters.org/cm/index.shtml

Cheers,
Fridger

 Post subject:
PostPosted: Mon, 17-09-07, 16:31 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
Ok, I tried out test [1] again, but with the following changes:

- Read and write entirely from the local hard disk, thus IO speed should now be as fast as possible
- Removed bin2png step

This still took about 64 mins 39 seconds, and it's not even converted to PNG yet.
I noticed that most of the cpu time is being taken up by the tx2pow2 tool.
A profile (done using Apple's excellent "Shark" app, although gprof would probably work fine too) revealed that:
- Disk io accounts for about 14% execution time
- Image processing accounts for about 76% (mostly floating-point double load instructions)
- The rest, almost 9%, is taken up by floor().

I'd be curious to know similar numbers for other platforms.
Meanwhile I'll try out a couple of things like tweaking the compile options to see if it makes a difference.
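
As a rough sanity check on those numbers, Amdahl's law bounds what a faster floor() alone can buy: if floor() really accounts for about 9% of the run time, removing it entirely caps the overall gain near 1.10x. A minimal sketch in C; the function name `amdahl_speedup` is made up for illustration, and the 9% share is simply the profile figure above (other platforms may well show a different split).

```c
/* Overall speedup when a part that takes `fraction` of the run time
   (0..1) is made `factor` times faster (Amdahl's law).
   Hypothetical helper, for illustration only. */
static double amdahl_speedup(double fraction, double factor)
{
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

For example, `amdahl_speedup(0.09, 2.0)` comes out at about 1.05, and even an infinitely fast floor() only approaches 1/0.91, roughly 1.10.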


 Post subject:
PostPosted: Mon, 17-09-07, 19:10 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
dirkpitt wrote:
Ok, I tried out test [1] again, but with the following changes:

- Read and write entirely from the local hard disk, thus IO speed should now be as fast as possible
- Removed bin2png step

This still took about 64 mins 39 seconds, and it's not even converted to PNG yet.
I noticed that most of the cpu time is being taken up by the tx2pow2 tool.
A profile (done using Apple's excellent "Shark" app, although gprof would probably work fine too) revealed that:
- Disk io accounts for about 14% execution time
- Image processing accounts for about 76% (mostly floating-point double load instructions)
- The rest, almost 9%, is taken up by floor().

I'd be curious to know similar numbers for other platforms.
Meanwhile I'll try out a couple of things like tweaking the compile options to see if it makes a difference.


I have asked Runar to also try a bench run with his much faster 2.33 GHz Intel Mac / 2 GB RAM. He promised to do it tomorrow. My former PhD student also has one of the weaker Powerbooks (like 1.5 GHz / 1.25 GB RAM) and said the slow performance would not surprise him ;-)

Yes, I know about floor() taking quite some time. There is a faster alternative, but I was not sure whether it exists on all operating systems. We might try.

tx2pow2 is definitely a NON-trivial algorithm with plenty of steps to calculate. So no wonder that it takes its time ;-)

I'll also do a test without bin2png.

Thanks,
Fridger

 Post subject:
PostPosted: Mon, 17-09-07, 21:25 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
DW,

I also ran the benchmarks for example 1) without bin2png on my 2 computers.

The results are only a /little bit/ faster this way:

Linux (Desktop, 3.2 GHz P4): 12.2 min instead of 13.1 min with bin2png
Windows (notebook, 2 GHz Core2Duo): 11.1 min ~ same as with bin2png.

Bye Fridger

 Post subject:
PostPosted: Mon, 17-09-07, 21:30 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
DW,

Another remark: while my desktop machine is not the latest (more than 3 years old), I always build these machines myself and optimize the components carefully. In particular, I used 3 GB of /CL2/ memory(!) and a SATA 133 hard disk that provides a solid 67 MB/sec. So whenever there is plenty of IO, my desktop system does pretty well (still!)

Bye Fridger

 Post subject:
PostPosted: Mon, 17-09-07, 22:59 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
t00fri wrote:
I have asked Runar to also try a bench run with his much faster 2.33 GHz Intel Mac / 2 GB RAM. He promised to do it tomorrow. My former PhD student also has one of the weaker Powerbooks (like 1.5 GHz / 1.25 GB RAM) and said the slow performance would not surprise him ;-)


In that case, I suppose you'll need the binaries that I've been using?


 Post subject:
PostPosted: Mon, 17-09-07, 23:47 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
dirkpitt wrote:
t00fri wrote:
I have asked Runar to also try a bench run with his much faster 2.33 GHz Intel Mac / 2 GB RAM. He promised to do it tomorrow. My former PhD student also has one of the weaker Powerbooks (like 1.5 GHz / 1.25 GB RAM) and said the slow performance would not surprise him ;-)


In that case, I suppose you'll need the binaries that I've been using?


Of course, I forgot that he can't compile himself ;-) Runar would need the binaries in the first place. But it's of course also fine with me.

Bye Fridger

 Post subject:
PostPosted: Tue, 18-09-07, 0:17 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
Ok, I just sent you a mail with the binaries as a zip attachment.


 Post subject:
PostPosted: Tue, 18-09-07, 0:20 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
t00fri wrote:
Yes, I know about floor() taking quite some time. There is a faster alternative, but I was not sure whether it exists on all operating systems. We might try.


There is a documented assembly alternative for PowerPC**, and there's an SSE2 optimization for Intel processors (it's in the Squish code). I could try the PowerPC speedup tonight. Note: it doesn't require vectorizing the code.

**For the curious, here's Apple's official doc for a fast floor (PowerPC only), misleadingly hidden inside their AltiVec docs:
http://developer.apple.com/hardwaredrivers/ve/algorithms.html#fast_floor
Code:
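    #if defined( __GNUC__ )
    #include <ppc_intrinsics.h>   /* provides __fsel() on PowerPC */
    #endif

    /* Only valid for |f| < 2^52, in the default round-to-nearest mode */
    inline double fastfloor( double f )
    {
        /* __fsel( a, b, c ) yields b if a >= 0.0, otherwise c */
        double c = __fsel( f, -0x1.0p+52, 0x1.0p+52 );
        /* Adding and removing 2^52 rounds f to the nearest integer */
        double result = (f - c) + c;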

    #if 1
        /* This case is likely a win for ordinary code */
        if( f < result ) result -= 1.0;

    #else
        /* This case is probably a win for inlining into */
        /* highly parallel/unrolled code */
        result -= __fsel( f - result, 0.0, 1.0 );

    #endif

        return result;
    }


I'm not sure whether this will have a huge impact on performance, but at least it'll be easy to check.
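
The core of the Apple snippet, adding and then subtracting 2^52 so that the FPU itself rounds to an integer, does not strictly need `__fsel`. A minimal portable sketch in plain C, assuming IEEE 754 doubles, the default round-to-nearest mode, and no `-ffast-math`; the name `fastfloor_portable` is hypothetical, and the ordinary branch used in place of `__fsel` may give back part of what the PowerPC version gains.

```c
/* Portable variant of the 2^52 trick (hypothetical helper, not Apple's code). */
static double fastfloor_portable(double f)
{
    const double big = 4503599627370496.0;   /* 2^52 */

    /* Outside (-2^52, 2^52) every finite double is already an integer;
       NaN also fails this test and is returned unchanged. */
    if (!(f > -big && f < big))
        return f;

    /* volatile guards against the compiler simplifying (f - c) + c and
       forces the intermediate to be rounded to double on x87 targets. */
    volatile double c = (f >= 0.0) ? -big : big;
    volatile double t = f - c;      /* rounds f to a nearby integer ... */
    double result = t + c;          /* ... and shifts it back */

    if (f < result)                 /* rounded up: step down to the floor */
        result -= 1.0;
    return result;
}
```

Whether this beats the library floor() depends on the compiler and CPU, so it is worth timing before adopting it.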


 Post subject:
PostPosted: Tue, 18-09-07, 7:59 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
Assembly optimizations aside...
Just to eliminate the more obvious candidates for optimization: is the resampling method currently used in tx2pow2 a one-pass interpolation? Bilinear filtering? Looking at the code, the apparent use of a 3x3 kernel etc. seems to indicate a one-pass approach (O(N^2)), but correct me if I'm wrong.


 Post subject:
PostPosted: Thu, 20-09-07, 7:38 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
The dwarf 5400x2700 test files seem to work nicely.
I'm using them right now on a slow Windows machine to test out a two-pass modification of tx2pow2.
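
For readers wondering what "two-pass" means here: a separable filter can be applied along the rows first and then along the columns, so a kernel k samples wide costs on the order of 2k operations per output pixel instead of k*k for the one-pass 2D form. A minimal sketch, assuming plain linear interpolation on a single-channel float image; this is not the tx2pow2 code, whose actual kernel is not shown in this thread, and the names `resample_line` and `resample_two_pass` are made up.

```c
#include <stdlib.h>

/* Resample one line of n_in samples (in_stride apart) to n_out samples
   by linear interpolation, with the end points aligned. */
static void resample_line(const float *in, int n_in, int in_stride,
                          float *out, int n_out, int out_stride)
{
    for (int i = 0; i < n_out; ++i) {
        /* Position of output sample i in input coordinates. */
        double x = (n_out > 1) ? (double)i * (n_in - 1) / (n_out - 1) : 0.0;
        int x0 = (int)x;                          /* x >= 0, so this is floor */
        int x1 = (x0 + 1 < n_in) ? x0 + 1 : x0;   /* clamp at the border */
        double t = x - x0;
        out[i * out_stride] = (float)((1.0 - t) * in[x0 * in_stride]
                                      + t * in[x1 * in_stride]);
    }
}

/* Two-pass resize of a w x h image into a w2 x h2 image:
   pass 1 rescales every row, pass 2 rescales every column.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int resample_two_pass(const float *src, int w, int h,
                             float *dst, int w2, int h2)
{
    float *tmp = malloc((size_t)w2 * (size_t)h * sizeof *tmp);
    if (!tmp)
        return -1;

    for (int y = 0; y < h; ++y)                   /* horizontal pass */
        resample_line(src + (size_t)y * w, w, 1,
                      tmp + (size_t)y * w2, w2, 1);

    for (int x = 0; x < w2; ++x)                  /* vertical pass */
        resample_line(tmp + x, h, w2, dst + x, h2, w2);

    free(tmp);
    return 0;
}
```

The price is the intermediate buffer of w2 x h samples, which for very large textures would probably have to be processed in strips rather than held in memory at once.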


 Post subject:
PostPosted: Fri, 21-09-07, 23:01 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
DW,

Following up on your suggestion to use the assembly version of nint() for Windows from
http://ldesoras.free.fr/doc/articles/rounding_en.pdf
in tx2pow2, I also find a substantial speed increase: 24.2 %

So now we only need a similar timesaver for Linux!

Cheers,
Fridger

PS: Why don't you integrate and test the Mac version in the tx2pow2 code first, and then I'll do the Windows part?

But surely testing on
#if defined( __GNUC__ )

isn't enough to distinguish Macs from the rest of the gang in the preprocessor statements?

 Post subject:
PostPosted: Sat, 22-09-07, 1:26 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
Actually, you can tell Mac PPC apart from x86 (Windows, Linux, Intel Mac) via the __ppc__ macro.
I'll try to translate the Visual Studio style __asm syntax into the equivalent gcc inline syntax so that it will compile on Linux and Intel Macs.
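
A minimal sketch of how such a compile-time switch might look, assuming the `__fsel` version quoted earlier in the thread for Mac PPC and a plain C fallback everywhere else. The macro `FT_USE_FSEL` and the wrapper `ft_floor` are hypothetical names, and the fallback shown is only a stand-in for whatever x86 version ends up being used.

```c
/* Select a floor implementation at compile time (illustrative only). */
#if defined(__ppc__) && defined(__APPLE__) && defined(__GNUC__)
  #include <ppc_intrinsics.h>
  #define FT_USE_FSEL 1              /* hypothetical macro name */
#else
  #define FT_USE_FSEL 0
#endif

static double ft_floor(double f)     /* hypothetical wrapper name */
{
#if FT_USE_FSEL
    /* Mac PPC: the __fsel-based version from Apple's documentation. */
    double c = __fsel(f, -0x1.0p+52, 0x1.0p+52);
    double result = (f - c) + c;
    if (f < result) result -= 1.0;
    return result;
#else
    /* Everything else (x86 Windows, Linux, Intel Mac): plain C fallback,
       valid only while f fits in a long long. */
    double t = (double)(long long)f;  /* truncates toward zero */
    return (t > f) ? t - 1.0 : t;     /* fix up negative non-integers */
#endif
}
```

Checking `__APPLE__` alongside `__ppc__` keeps the `<ppc_intrinsics.h>` include from being attempted on other systems, where that header is not expected to exist.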


 Post subject:
PostPosted: Sat, 22-09-07, 2:24 GMT 
Offline
User avatar

Joined: Tue, 04-09-07, 2:32 GMT
Posts: 433
Location: South Korea
***Update: I've discovered what appear to be even faster floor() and nint() functions! ***
These versions do not use assembly, but instead cleverly exploit the IEEE floating-point bit layout and normalization.
Since no assembly is used, they are fully cross-platform! (I tested on Windows and Mac PPC; it should also work on Linux.)
Web site: "Know your FPU: Fixing Float Fast"
This guy apparently worked on Poser and for Viewpoint so he knows what he's talking about.


Here are preliminary results of optimizing tx2pow2 and tx2half.
Optimizations used:
- Assembly floor() on Mac PPC and nint() on Windows (dropped in favor of the next item)
- Sree Kotay's fast xs_FloorToInt() and xs_RoundToInt() using "magic numbers" (no assembly code!)
- Two-pass algorithm for tx2pow2
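
For the curious, a minimal sketch of the "magic number" idea, assuming IEEE 754 doubles, the default round-to-nearest mode, and |x| below 2^31. It is written from the general technique rather than copied from Sree Kotay's code: the function names are hypothetical, and the 1.5e-8 bias in the floor variant is an assumed value that may differ from what the original uses.

```c
#include <stdint.h>
#include <string.h>

/* Round to the nearest int (ties go to even) without a float-to-int cast. */
static int32_t round_to_int_magic(double x)
{
    /* Adding 1.5 * 2^52 pushes the integer part of x into the low bits
       of the mantissa; volatile forces rounding to double precision. */
    volatile double shifted = x + 6755399441055744.0;
    double tmp = shifted;

    uint64_t bits;
    memcpy(&bits, &tmp, sizeof bits);   /* raw IEEE 754 bit pattern */
    uint32_t low = (uint32_t)bits;      /* low 32 mantissa bits */

    int32_t out;
    memcpy(&out, &low, sizeof out);     /* reinterpret as two's complement */
    return out;
}

/* Floor built on the rounding above. The small bias keeps exact integers
   off the .5 tie; inputs within about 1.5e-8 below an integer come out
   one too high, so this is an approximation. */
static int32_t floor_to_int_magic(double x)
{
    return round_to_int_magic(x - 0.5 + 1.5e-8);
}
```

Note that the rounding is ties-to-even, so 2.5 becomes 2 while 3.5 becomes 4, which differs from a nint() that rounds halves away from zero.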

== Results (Test 1 using 86400x43200 dataset, final png step omitted) ==

* Mac PPC G4 1.33GHz: 55 mins 49 sec -> 42 mins 25 sec (assembly floor()) -> 40 mins 5 sec (magic numbers)
Speedup (magic numbers): 15 mins 44 sec saved (39% faster, measured against the new time)

* Windows Core 2 Duo 2.13GHz: 9 mins 50 sec -> 5 mins 7 sec (assembly nint()) -> 5 mins (magic numbers)
Speedup (magic numbers): 4 mins 50 sec saved (97% faster, measured against the new time)

We're looking at a very encouraging 2x speedup on a Core 2 Duo!
The speedup is not as great on my slower Mac PPC config, but it's nonetheless quite significant.
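
The percentages can be recomputed from the quoted start and end times. A minimal sketch, assuming "improvement" means time saved divided by the new (shorter) time, which is the reading that reproduces the 39% and 97% figures; both helper names are hypothetical.

```c
/* Convert a "minutes, seconds" timing to seconds (hypothetical helper). */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Gain in throughput, in percent: time saved relative to the NEW time.
   100% therefore means "twice as fast". */
static double speedup_percent(int before_s, int after_s)
{
    return 100.0 * (before_s - after_s) / after_s;
}
```

With these, 55:49 -> 40:05 saves 944 s (15 mins 44 sec) for about 39.3%, and 9:50 -> 5:00 saves 290 s (4 mins 50 sec) for about 96.7%, i.e. roughly 1.97x. Measured against the old time instead, the same runs are about 28% and 49% shorter.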


 Post subject:
PostPosted: Sat, 22-09-07, 8:29 GMT 
Offline
Site Admin
User avatar

Joined: Fri, 31-08-07, 7:01 GMT
Posts: 4643
Location: Hamburg, Germany
Amazing, DW!

just got up ;-) and had a nice Saturday morning breakfast in the garden... Now let's look closer.


Cheers,
F.
