
View Full Version : Switch from 32 bit to 64 bit - Is it worth the hassle?


FD[_4_]
November 26th 12, 04:03 AM
I purchased the Windows 8 upgrade online on October 26th, for machines
running 32-bit Vista and 32-bit XP.

I purchased 4 upgrade licences and burned a 32-bit DVD from the ISO for
installation.

I purchased ONE DVD from Microsoft, which came in the mail 2 days ago,
almost one month later. It contains both the 64-bit and 32-bit versions.

Only one of the 4 computers I have has more than 4 GB of memory.
It has 8 GB, as it was custom built just this September with an
Ivy Bridge 3570 processor and an appropriate motherboard.

I know that 4 gigs of my memory are not being used. It only cost
me about 20 dollars extra to go from 4 gigs to 8 gigs, so it can
remain parked on my motherboard for years and it will not bother me.

I do not do any heavy computer work. 32-bit programs are fine for me.

I did do a test installation of 64-bit Win 8, and its footprint is
about 40% higher.

Is there any reason to do a fresh installation of 64 bit?

FD

Paul
November 26th 12, 04:44 AM
FD wrote:
> [snip]
>
> Is there any reason to do a fresh installation of 64 bit?
>
> FD

There are only a very limited number of software products that come
in "64-bit only" versions now. Adobe is the company pushing this,
and you'd need barrels of cash to afford some of their stuff. If you
did, you'd want the 64-bit OS.

Otherwise, I can't think of a user-centric reason for caring.

The biggest difference I've ever seen this make was when
running some "special math". I used the GMP library, which
provides extended-precision arithmetic. The program was
asked to calculate numbers with 40,000,000 digits (this
is for Mersenne primes). I compiled the program against
32-bit GMP and against 64-bit GMP. The 64-bit version
could run one loop of the code about 70% faster than the
32-bit version could. But considering how slow a loop was
anyway, with a completion time around 100 years, it's not
like I bothered running the code for real. The code was
simply too slow to be practical. (They'll find all the
primes before my code finishes.)

http://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library
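
To give a flavor of what that loop looks like, here's a minimal
sketch using GMP's C interface (not my original program; the exponent
here is a small known Mersenne prime picked for illustration, and the
numbers I described were far larger). On a 64-bit build GMP stores
numbers in 64-bit limbs, so each of these multi-word operations needs
roughly half as many machine instructions as on a 32-bit build:

   /* One Lucas-Lehmer-style step on a huge number.
      Build with: gcc sketch.c -lgmp */
   #include <gmp.h>
   #include <stdio.h>

   int main(void)
   {
       unsigned long p = 132049;    /* a known Mersenne prime exponent */
       mpz_t m, s;

       mpz_init(m);
       mpz_init_set_ui(s, 4);
       mpz_ui_pow_ui(m, 2, p);      /* m = 2^p     */
       mpz_sub_ui(m, m, 1);         /* m = 2^p - 1 */

       /* One iteration of s = s*s - 2 mod m. The full primality
          test runs this p-2 times. */
       mpz_mul(s, s, s);
       mpz_sub_ui(s, s, 2);
       mpz_mod(s, s, m);

       gmp_printf("s now has %zu decimal digits\n",
                  mpz_sizeinbase(s, 10));
       mpz_clear(m);
       mpz_clear(s);
       return 0;
   }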

For most other purposes, you might see a 5% difference,
as continuous math operations are not typical of things
like Microsoft Word, or perhaps your web browser.

Note that on Intel processors, there is a slight loss
of performance when switching from 32-bit to 64-bit
operations. There is an internal packing operation in
the execution pipeline that combines two 32-bit operations
and carries them through the pipe. When you switch to
"pure" 64-bit code, the packing operation can no longer
be done, which reduces the rate at which work moves through
that part of the pipe. The same thing doesn't happen on
AMD64, in a way, because both sizes are slow :-)

I suppose this topic is fun if you do nothing but
benchmark stuff :-) If your processor is "sufficiently fast",
you probably don't care.

Paul

Ed Cryer
November 26th 12, 12:12 PM
Paul wrote:
> [snip]
>
> I suppose this topic is fun if you do nothing but
> benchmark stuff :-) If your processor is "sufficiently fast",
> you probably don't care.
>
> Paul

I once wrote a program to do a knight's tour of a chessboard, on an NCR
8250 with a full monitor display. Simple trial-and-error technique,
back-tracking when a dead end was reached, with all attempts stored in an
internal table.
It looked pretty good running. You could input any starting point. It
went from an empty board to about 50% filled in seconds, then wiped the
latest branches off and started new ones.
In the end I ran it for about an hour with no solution found.

I wonder how long it would take on a modern 3GHz processor with 8GB RAM
and 64-bit architecture.
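
For the curious, here's a rough reconstruction of that search in C
(my guess at the technique, not the NCR code; I've added Warnsdorff's
fewest-onward-moves ordering, without which a plain move-order
backtracker can still churn for ages from some squares). With the
ordering, a modern machine finds a tour from any start in well under
a second:

   /* Knight's tour by backtracking; board[][] records visit order.
      Moves are tried fewest-onward-exits first (Warnsdorff's rule),
      with full backtracking as a fallback. */
   #include <stdio.h>

   #define N 8
   static const int dx[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
   static const int dy[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };
   static int board[N][N];

   static int exits(int x, int y)   /* unvisited squares reachable */
   {
       int n = 0;
       for (int k = 0; k < 8; k++) {
           int nx = x + dx[k], ny = y + dy[k];
           if (nx >= 0 && nx < N && ny >= 0 && ny < N && board[nx][ny] == 0)
               n++;
       }
       return n;
   }

   static int tour(int x, int y, int step)
   {
       if (step == N * N)
           return 1;                /* every square visited */
       /* try onward moves in increasing order of onward-exit count */
       for (int e = 0; e <= 8; e++) {
           for (int k = 0; k < 8; k++) {
               int nx = x + dx[k], ny = y + dy[k];
               if (nx < 0 || nx >= N || ny < 0 || ny >= N || board[nx][ny])
                   continue;
               if (exits(nx, ny) != e)
                   continue;
               board[nx][ny] = step + 1;
               if (tour(nx, ny, step + 1))
                   return 1;
               board[nx][ny] = 0;   /* dead end: backtrack */
           }
       }
       return 0;
   }

   int main(void)
   {
       int sx = 0, sy = 0;          /* starting square; any works */
       board[sx][sy] = 1;
       if (tour(sx, sy, 1)) {
           for (int y = 0; y < N; y++) {
               for (int x = 0; x < N; x++)
                   printf("%3d", board[x][y]);
               printf("\n");
           }
       } else {
           printf("no tour found\n");
       }
       return 0;
   }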

Ed

Paul
November 26th 12, 01:11 PM
On 26/11/2012 7:12 AM, Ed Cryer wrote:
> [snip]
>
> I once wrote a program to do a knight's tour of a chessboard [...]
> In the end I ran it for about an hour with no solution found.
>
> I wonder how long it would take on a modern 3GHz processor with 8GB RAM and 64-bit architecture.
>
> Ed

My ZX-81 used to play chess.

Single-ply depth search, about 30 minutes per move.
A game took all day. Like your opponent was a
real deep thinker.

My current computer is at least 3000 times faster, as
moves there seem to take in the 1-second range,
probably with a different number of plies.

The ZX-81 was playing chess with the 16KB RAM pack added,
as 2KB wasn't enough to play chess :-)

The 64 bits only help if you have profitable ways to use
the entire register. The GMP library gets a speedup because
the register width gets fully used, and instead of doubling
the speed, you get 70% more speed. Lots of other things
don't make use of the register width.

I'd say the power of the 3GHz processor doesn't always
get applied. Plenty of the stuff we do on computers doesn't
seem to scale that well, and leaves you with the feeling
it should have run faster.

Paul

Timothy Daniels[_5_]
November 27th 12, 06:11 PM
"Paul" wrote:
> [ ..... ]
> as continuous math operations are not typical of things
> like Microsoft Word, or perhaps your web browser.

By "continuous" do you mean "floating point"?

*TimDaniels*

Paul
November 27th 12, 09:36 PM
Timothy Daniels wrote:
>
> "Paul" wrote:
>> [ ..... ]
>> as continuous math operations are not typical of things
>> like Microsoft Word, or perhaps your web browser.
>
> By "continuous" do you mean "floating point"?
>
> *TimDaniels*

Microsoft Word would likely have no long sequences of
floating-point instructions, like

FMUL
FDIV

nor would it have long sequences of integer operations, like

MUL
DIV

That sort of thing.

The occasional INC or DEC, shift left, shift right,
comparison, AND, OR, XOR: that's not "math" for me.
A lot of those can be done with the regular ALU,
whereas a MUL or DIV or FMUL or FDIV requires something
with a lot more gates and complexity (like a separate
functional unit).

Lots of programs do a ton of branches and comparisons,
using variables to store logic states, so the code
doesn't really challenge the processor that much.
There have been processors in the past where, if you
wrote assembler code and put a hundred FMUL, FDIV...
type instructions together, the processor would actually
tip over :-) It's because compilers don't produce such
sequences that the affected processors work just fine,
and "nobody notices". The processor's internal noise problem
was only noticed by synthetic testing (and only months
after the processor was released), using sequences a
compiler would not normally produce. But a practitioner,
using carefully crafted assembler code, might trigger it.

An example of hand-optimized code is Prime95, where
a lot of the code used to search for Mersenne primes
is written in assembler (custom FFTs). A developer
at Microsoft working on Word uses a high-level
language, and the "power-sucking code density" isn't all
that great in regular compiler output. But there
are still twits around who write programs entirely
in assembler (twits who do it for no demonstrably
good reason). And they insist on showing you stacks
of paper printout to demonstrate all the work and
agony they went through (I've worked with a couple
of those people :-) ). People who program in high-level
languages don't usually try to impress you
with stacks of paper output. The assembler people
seem to like to print out their work and then
wave it around (or use it as a seat to sit on).

In the GMP library, sequences of 32-bit math instructions
can be replaced by sequences of half as many 64-bit
instructions, and that ends up being around 70% faster.
Normal code doesn't have the density of such improvements
needed to see that kind of speedup. For compares and
branches, the speed doesn't change.

Paul

Robin Bignall
November 28th 12, 12:22 AM
On Tue, 27 Nov 2012 16:36:15 -0500, Paul > wrote:

>[snip]
>
Interesting post. When I was an IBM SE many decades ago I had a seismic
customer that was very proud of its software being more advanced than
many of its competitors because of the brilliance of their algorithms
AND the fact that they wrote in assembler for Univac 1108s, taking
advantage of detailed knowledge of the hardware. They had a dozen or
more mathematicians working full time on this stuff, which they felt
gave them the edge.

--
Robin Bignall
Herts, England

Paul
November 28th 12, 12:44 AM
Robin Bignall wrote:
> [snip]
>
> Interesting post. When I was an IBM SE many decades ago I had a seismic
> customer that was very proud of its software being more advanced than
> many of its competitors because of the brilliance of their algorithms
> AND the fact that they wrote in assembler for Univac 1108s, taking
> advantage of detailed knowledge of the hardware. They had a dozen or
> more mathematicians working full time on this stuff, which they felt
> gave them the edge.
>

If you have an instruction-level simulator, then yes, you
might be able to hand-tune certain loops that are the
critical path in some code. But behavioral simulators
aren't always available.

Modern architectures are complicated enough that you can't hope
to win the optimization battle using only the instruction set
manual. Even with a behavioral simulator it's still tough to do,
and takes hours to make the smallest improvement. Some processors
now are approaching ~1000 possible instructions, which means
there can be multiple ways to write short code segments. The
compiler contents itself with only a tiny percentage of those
instructions. Which raises the question: why do they keep adding
more instructions to the processors? At least one person wrote
an article asking them to stop :-)

Paul

Robin Bignall
November 28th 12, 04:18 PM
On Tue, 27 Nov 2012 19:44:14 -0500, Paul > wrote:

>[snip]
>
Heh! I wonder how much continued effort is put into application
development systems such as Delphi in order to get the best translation
from HLL to running code.
--
Robin Bignall
Herts, England

charlie[_2_]
November 29th 12, 10:48 PM
On 11/28/2012 11:18 AM, Robin Bignall wrote:
> [snip]
>
> Heh! I wonder how much continued effort is put into application
> development systems such as Delphi in order to get the best translation
> from HLL to running code.
>

The problems I saw some years back were related to unused and unneeded
code generated by the then-popular compilers. It was significant enough
that third-party code analyzers were developed and used to locate the
extra unused code in the compiler output. Sometimes the code had to be
hand-patched to eliminate the problems.

In one case I'm aware of, C++-originated code was so slow and
inefficient that the software was rewritten in assembly and machine
code, using whatever was salvageable from the C++ output. The system
and software are still in use today, deployed around the world.

When someone is trying to shoot a missile up your rear, there isn't a
whole lot of time to do something about it!

Paul
November 29th 12, 11:36 PM
charlie wrote:
> [snip]
>
> The problems I saw some years back were related to unused and unneeded
> code generated by the then-popular compilers. It was significant enough
> that third-party code analyzers were developed and used to locate the
> extra unused code in the compiler output. Sometimes the code had to be
> hand-patched to eliminate the problems.
>
> In one case I'm aware of, C++-originated code was so slow and
> inefficient that the software was rewritten in assembly and machine
> code, using whatever was salvageable from the C++ output. The system
> and software are still in use today, deployed around the world.
>
> When someone is trying to shoot a missile up your rear, there isn't a
> whole lot of time to do something about it!

I can't say I've looked at a lot of object-oriented code.
The few times I have (and that was years ago), I saw "name mangling"
adding to the size of the code.

http://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B_Name_Mangling

"It provides a way of encoding name and additional information about
a function, structure, class or another datatype in order to pass
more semantic information"

You'd see code mixed with text strings. I don't know whether a
release build (say, compiling without -g) would remove those or not;
maybe stripped binaries would be missing them.
The code itself didn't seem that out of the ordinary.
(Looking with the "microscope", the code didn't seem wasteful.
The waste might only be apparent when looking at more of the code.)

When I was playing with my GMP/Mersenne prime example, I changed
from C++ code (the original code snippet) to C code (the GMP
library supports both), and the biggest saving was the ability
to remove a few intermediate variables. When a single variable
holds a 40,000,000-digit number, that's a significant saving. (A
single number in that case busts the L3 cache on the processor.)
But in that case I'm changing the style of the code. The resulting
C code is less readable for another reviewer, but it gets the job
done faster.
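
To illustrate the kind of change I mean (a made-up fragment, not my
actual code): GMP's C interface lets you overwrite an operand in
place, where the C++ class interface naturally materializes a
temporary for each subexpression, and here every temporary can be a
40,000,000-digit number:

   /* C++ style (mpz_class): t = s * s - 2; creates a temporary
      for s*s before assigning. The C interface reuses s itself: */
   #include <gmp.h>

   void ll_step(mpz_t s, const mpz_t m)
   {
       mpz_mul(s, s, s);     /* s = s*s, result written over the input
                                (GMP allows output/input to overlap) */
       mpz_sub_ui(s, s, 2);  /* s = s - 2, again in place */
       mpz_mod(s, s, m);     /* s = s mod m; no intermediate mpz_t */
   }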

It depends on the size of the project just how practical it is
to manage the project with the older languages. You could start
a project with the objective of having code fast enough to
avoid "a missile up your rear", and end up with
the project failing entirely and producing no finished code at all.
That's the danger. If you look at the history of large software
projects, it's not very encouraging at all.

One project I worked on, we had a software architect. No coders
had been hired yet (it wasn't time for them). He estimated the size
of code needed for our product, and his estimate showed it would take
twenty minutes for all the object-oriented code to just
*load* into the processor (or processor complex). Nothing
executed yet, just loading. Well, we laughed our asses off
about that (at least I did). It's a good thing I wasn't
the manager, having to deal with that snippet of info.
I don't think the software architect and the manager got
along that well.

Paul

charlie[_2_]
November 30th 12, 04:55 AM
On 11/29/2012 6:36 PM, Paul wrote:
> [snip]
>
> Paul
One of the common trouble areas we found had to do with
compiler-generated "subroutines", supposedly used to decrease the
memory footprint.
It turned out that the more or less "generic" subs carried a lot of
extra code that would never be executed.

The real reason behind all the problems turned out to be "smart"
hardware subassemblies that were more or less independent machines with
embedded CPUs.
They all had access to common memory, and a controlling CPU passed data
and instructions to them via RAM and ROM. Then there were multiple
classes of interrupts, ranging from "I'm ready, doing what I was told",
to "can't do that", to the almost-last-resort "I'm dead, shut me down
and leave me alone".

Almost none of the C++ programmers had any experience with such an
environment, and the compilers were originally written around the
expectation of generating code suitable for "serial" execution.

Glad I got out of that general area of endeavor and moved on to less
frustrating and better-paid parts of my field. When I retired, a few
years ago, I was amused to find that the systems were still in use,
along with the major part of the software, and that no one had figured
out how to do things any differently when it came to "modernization".
