#16
CPU generation question
On 2019-07-26 11:08 p.m., T wrote:
> On 7/26/19 9:01 PM, lonelydad wrote:
>> T wrote:
>>> As far as generations of processors go, the higher the generation, the better the power consumption. I haven't seen more than four cores making any practical difference with Windows. And multi-threading doesn't seem to matter on Windows after four real cores (Linux does make a big difference).
>> I know I will probably catch some flack for this, but the only reason for more than four cores would be if the user is going to run a specially written massively parallel program. An example would be the simulation programs run at Los Alamos, et al., when they do things like simulate nuclear explosions, or when NOAA and others are doing weather forecasts. There are really very few truly parallel processes required in the programs most run on their desktop PCs.
> I have to agree. So no flack from here. And I have not seen Windows being able to take advantage of more than four real cores either. Linux does, but that is a totally different technology.

Prime95 uses all 6 real cores and all 6 virtual cores at 100% CPU usage; I'm sure there are others.

Rene
#17
CPU generation question
T wrote:
> On 7/26/19 11:26 PM, Paul wrote:
>> T wrote:
>>> On 7/26/19 9:01 PM, lonelydad wrote:
>>>> T wrote:
>>>>> As far as generations of processors go, the higher the generation, the better the power consumption. I haven't seen more than four cores making any practical difference with Windows. And multi-threading doesn't seem to matter on Windows after four real cores (Linux does make a big difference).
>>>> I know I will probably catch some flack for this, but the only reason for more than four cores would be if the user is going to run a specially written massively parallel program. An example would be the simulation programs run at Los Alamos, et al., when they do things like simulate nuclear explosions, or when NOAA and others are doing weather forecasts. There are really very few truly parallel processes required in the programs most run on their desktop PCs.
>>> I have to agree. So no flack from here. And I have not seen Windows being able to take advantage of more than four real cores either. Linux does, but that is a totally different technology.
>> What does your statement mean exactly? Here's a quick 7ZIP compression run. All cores in use.
>> https://i.postimg.cc/852B2nd2/compression.gif
>> Paul
> I mean just in "observing" how fast things run, I am not observing any improvement over 4 real cores. Well, in Windows.

So you're looking for a linearity test of some sort? I already know how that works on my processor. It's an Intel issue (a hardware issue), not the OS as such. My 6C/12T processor scales to about 5 cores of performance. That means the ring bus is starving out, on average, about one core of performance at full load. (If there is good locality of reference, performance could stay at 6 cores, but real-world loads tend to give about 5 cores or so.)

These effects were also seen on inferior schemes. The first dual-socket consumer motherboards (and likely Intel Xeon multi-socket offerings as well) used a shared bus with snoop traffic running over it for cache coherency. For example, you could take two E8450 silicon dies, put them on a common substrate, and call that the Q9650. From the outside, it looked like a quad core. But the shared bus held it back, and it would give "3.5 cores" of performance; you would lose half a core of potential performance because the shared bus was clogged (and the memory controller was on another chip at the time).

Intel used counter-rotating rings on processors like mine, up to a fairly high core count. That means people who spent a lot more money than I did lost multiple cores of potential performance on their high-core-count processors. The rings can't keep up. Later generations switched to a mesh of parallel buses, where I presume traffic goes along a horizontal bus, then along a vertical bus, to get from one core to another. I've not seen any comparisons of performance loss on those due to the bus scheme. The aggregate bandwidth of the mesh should be quite a bit higher than the handful of serially chained counter-rotating rings.

Other than that, Windows does little to screw up the resources. It sees cores ready to run, and the scheduler schedules stuff to run on them. It does all the stuff that schedulers do. Every developer who gets a CS degree is taught how to do these things, so it's not like there are secrets or something.

In terms of visual output, everything has scroll throttles and the OS does not look like TempleOS. The graphics on TempleOS are frenetic and abrupt, because there is no throttling at all on visuals. Does that look nice? Not particularly. If I was on meth, I'd want to scratch myself. Because of throttling, you can take low-resource hardware products, and while you can "feel" the slow in them, the graphics molasses is about the same on all of them. Think of it as making the graphics on a Windows 10 desktop look like a jolly big smartphone.

If I had a 64-core processor, my 7ZIP run would exhibit a higher speed rating on the processing-rate field in the display. In the example I showed, it was compressing at 103 MB/sec. If I had more cores, I would expect a higher number, until the system memory performance (which isn't very good) prevents it from going faster. 7ZIP is one of the few benchmarks that benefits from CPU cache (the more you can keep cached, the faster it runs).

Intel memory implementations don't buy you much. Intel made dual-channel, triple-channel, and quad-channel systems. The user is supposed to "believe" the quad-channel one is twice as fast as the dual-channel one. However, that's not how it works. And while Intel has since made CPUs with a multitude of channels, it's hard to say whether they're doing a better job than an extrapolation of what came before. I don't own anything like that to test. My quad-channel system is not really any faster than a dual-channel system (bus efficiency is below 25%).

If you do scaling testing with Cinebench, you should find linear speedup behavior, implying scheduling isn't a problem. Cinebench is pretty lame in terms of the problem it is solving, and that's what gives the linear speedup (the task of computing a graphic is not related to any other graphic, except when the results are plunked on the screen). When a core comes free, the next graphic calc starts. And that's in the figurative sense, since during the time a single calc runs, it can be moved from core to core at the whim of the scheduler. On Windows 10, on an AMD processor, the graphics calc should stay on the same CCX and not migrate, for best efficiency. I saw a claim about a week ago that AMD was aiding Microsoft (somehow) in tuning like that (especially for the just-released silicon). Microsoft was claiming maybe three or four OSes ago that this was a solved problem, but arch analysis still seems to be done by hand, and tuned by hand (if tuned at all). The more whacky an AMD arch, the more tuning it needs (or even needs new code written). And this is so, so challenging, we can't backport it to Windows 7 :-/ So so challenging. If I was on meth, I'd want to scratch myself.

*******

Summary: I've seen nothing on the two platforms that I would call a "unique advantage". Windows *is* reserving some CPU, as part of the weakness in its design (Vista+), but once you know how that works, you can select conditions (oversubscription) to rip the reservation away from the OS. I did that for the compression picture, which is why that one is running at 100% in Task Manager. And doing these things is a natural part of using any OS: doing whatever is necessary to get the very best overall performance.

I know where some of the weaknesses are. NTFS handles about 4K file operations per second, while on Linux, TMPFS can do the equivalent of 186K file operations per second. If I were to set up a 7ZIP test involving compressing a million 4KB files, 7ZIP wouldn't even hit 40% CPU, because it would be waiting on the file system all the time. Even loading the entire file set into the system read cache does not help on Windows; the same 4K-files-per-second cap applies.
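(A rough way to reproduce that small-file ceiling yourself, from a Windows batch file. This is a sketch, not a rigorous benchmark; the file count and temp location are arbitrary choices. Divide N by the elapsed seconds for a files-per-second figure.)

@echo off
rem create many tiny files and timestamp the run
set N=10000
echo start %time%
for /l %%i in (1,1,%N%) do @(echo x> "%TEMP%\probe%%i.tmp")
echo end   %time%
rem clean up the probe files afterwards
del /q "%TEMP%\probe*.tmp"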
But if we're interested in "only showing the scheduler at its best", I will construct a test to show that the scheduler gives me 100% CPU (multicore) when I ask for it. I could do that with Cinebench, but that would take me longer to do. The 7ZIP picture takes no time at all to set up.

Paul
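(A quick way to run the linearity test discussed above is 7-Zip's built-in benchmark; the "b" command and the -mmt thread switch are standard 7-Zip options, while the install path is an assumption. Compare the ratings: if 12 threads scores well under three times the 4-thread rating, the memory subsystem or ring bus is the limiter, not the scheduler.)

rem sketch: multithreaded scaling probe with the 7-Zip benchmark
"C:\Program Files\7-Zip\7z.exe" b -mmt1
"C:\Program Files\7-Zip\7z.exe" b -mmt4
"C:\Program Files\7-Zip\7z.exe" b -mmt12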
#18
CPU generation question
On 7/27/19 8:11 AM, Paul wrote:
> So you're looking for a linearity test of some sort?

No, not really. My customer will only tolerate a certain amount of expense, and I "observe" no real change with more than four cores. So I put their money towards NVMe drives and fast memory buses. Now that makes a TON of difference. The processor is sitting there doing nothing most of the time anyway. Well, almost nothing.

I will have to read your write-up over slowly. Thank you!
#19
CPU generation question
On 7/27/19 6:10 AM, Rene Lamontagne wrote:
> Prime95 uses all 6 real cores and all 6 virtual cores at 100% CPU usage; I'm sure there are others.

Sounds like it is well written. The 64-thousand-dollar question would be: is there any real noticeable difference over a four-core machine? Most applications do not thread worth beans. I instead put my money towards NVMe drives and fast memory buses. Now that makes a YUGE difference.
#20
CPU generation question
On 2019-07-27 3:14 p.m., T wrote:
> On 7/27/19 6:10 AM, Rene Lamontagne wrote:
>> Prime95 uses all 6 real cores and all 6 virtual cores at 100% CPU usage; I'm sure there are others.
> Sounds like it is well written. The 64-thousand-dollar question would be: is there any real noticeable difference over a four-core machine? Most applications do not thread worth beans. I instead put my money towards NVMe drives and fast memory buses. Now that makes a YUGE difference.

Yep, fast M.2 NVMe drives like my two ADATA SX8200PNP drives, reading at 3450 MB/s and writing at 2370 MB/s, plus 3200 MHz RAM coupled with my Intel i7 8700 turboing at 4.28 GHz on an Asus Prime Z390 motherboard, can sure turn over a lot of ground in a hurry. It also plays a hell of a game of chess.

Rene
#21
CPU generation question
T wrote:
> On 7/27/19 8:11 AM, Paul wrote:
>> So you're looking for a linearity test of some sort?
> No, not really. My customer will only tolerate a certain amount of expense, and I "observe" no real change with more than four cores. So I put their money towards NVMe drives and fast memory buses. Now that makes a TON of difference. The processor is sitting there doing nothing most of the time anyway. Well, almost nothing. I will have to read your write-up over slowly. Thank you!

If you look at the benchmark site, there are two kinds of charts.

https://www.cpubenchmark.net/

The red one on the right is single-threaded CPU. Many legacy programs run on one core. A high core clock might make such programs feel "faster" than on the previous system.

The other kind of chart measures multithreaded performance. The CPUs with more cores generally end up higher on that chart. The purpose of this observation is to capture improved 7ZIP performance (less time to finish a compression job).

Photoshop is a mix of both types of code. Photoshop is split in half, with half being single-threaded filters and half being multithreaded. Some of the image filters use "divide and conquer". With four cores, the image is cut in four pieces, the algorithm carried out, and at the end a "stitching" process handles the exception pixels on the edges. If you have enough cores, you could cut the image in 6 or 8 or 12 pieces, and it would finish 6 or 8 or 12 times faster.

Selecting a processor then requires:

1) "Checking the customer's lifestyle".
2) Pointing out that lower-core-count processors can have the highest clock. Like hitting 5GHz.
3) If all the work the individual does scales with core count (fluid simulation through a particle bed), then perhaps a 64-core machine running at 2.2GHz gets, on aggregate, way more done than 4 cores at 5GHz.
4) A 4C/8T machine at 5GHz is not a bad choice. It has good economics (could save a couple hundred on the mobo, could save on RAM sticks when RAM is expensive). The only time it's a bad idea is when the customer says "I need good 7ZIP, RAR, movie encoding performance", in which case a 64-core processor could probably encode eight movies at once and still have enough horsepower left to surf in Firefox without even noticing the load. (Movie encoding on a CPU can have slightly better appearance, and until a video card encoder can match that, there will still be people doing movies on the CPU.)

As an example of parallelism (sketched below), I wanted to re-encode a movie in Cinepak. The only problem was, the Cinepak CODEC is single-threaded. (It's the nature of the divide-and-conquer algorithm used.) So I split the movie into twelve equal-length pieces (each piece a multiple of the Group-Of-Pictures size for the format). I ran twelve copies of the encoding software. The job finished in a day instead of taking an entire week. I was well ahead of schedule when I finished. And that's an example where moar cores helps.

The resulting 12 movie segments were then joined via lossless join, at around 150MB/sec. At that point, I had my finished movie. If I'd had a lower-core-count processor, it would have been late the next day before it was ready for the join. If I ran the entire movie through one core, it would have taken forever. But that's what the software wanted to do.

Paul
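(A minimal sketch of the twelve-instance trick as a Windows batch file. It assumes segments named like the out%03d.mov pattern shown downthread; the paths and output names are illustrative, and "start /b" is what lets all twelve ffmpeg copies run concurrently.)

@echo off
cd /d C:\FFMPEG\bin
rem launch one Cinepak encoder instance per segment, all in parallel
for %%i in (000 001 002 003 004 005 006 007 008 009 010 011) do (
  start /b ffmpeg -i G:\WORK\out%%i.mov -vcodec cinepak -ac 2 -acodec pcm_s16le G:\WORK\a%%i.mov
)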
#22
FFmPeg ?
In article , @. wrote:

>> If I ran the entire movie through one core, it would have taken forever. But that's what the software wanted to do.
> Googling "multi-threaded video editor" doesn't look promising; I doubt video editors use cores efficiently.

they do.
#23
FFmPeg ?
Jeff-Relf.Me @. wrote:
> Paul wrote:
>> The resulting 12 movie segments were then joined via lossless join, at around 150MB/sec.
> Using what, FFmPeg? What are the command-line switches? Can it save off a 1-minute segment from a 60-minute video? 150 MegaBytes/sec?! I thought a 26 MegaBits/sec video was a lot. What was the FPS and resolution?
>> If I ran the entire movie through one core, it would have taken forever. But that's what the software wanted to do.
> Googling "multi-threaded video editor" doesn't look promising; I doubt video editors use cores efficiently.

The only interesting property Cinepak has is the ability to seek and review a video so encoded. If you had a three-hour presentation and you needed to find the bit about topic X, it would be a good format to use for that reason. It is not an archive format. It is not bandwidth efficient. It does not encode easily. It has glitches in it. It is one of the few movie CODECs that does not use DCT or FFT or frequency-domain analysis (the throwing away, on purpose, of high-frequency information to make a movie smaller). It uses a vector-quantization method. And it does not lend itself to parallelization: the N+1th calc relies on the result of the Nth calc.

FFMPEG has the sample code for single-threaded Cinepak.

Chop (to uncompressed intermediate format):

ffmpeg -i KEY01.mp4 -map 0 -copytb 1 -c:v rawvideo -pix_fmt bgr24 -af "volume=6dB" -ac 2 -c:a pcm_s16le -f segment -segment_list out.list -segment_frames 26267,52535,78803,105071,131339,157607,183875,210143,236411,262679,288947 G:\WORK\out%03d.mov

Then, twelve encoding instances similar to this:

cd /d C:\FFMPEG\bin
ffmpeg -i G:\WORK\out000.mov -vcodec cinepak -ac 2 -acodec pcm_s16le G:\WORK\a00.mov

The join operation is almost purely a file copy operation, with only the ends being spliced in a lossless way. The rate operator corrects the damaged frame rate. This command completes relatively quickly, and is disk-limited:

ffmpeg -f concat -r 30000/1001 -i filelist.txt -c copy N:\concattest2.avi

The filelist.txt has twelve lines similar to this:

file 'G:\WORK\a00.avi'

The encoder (the "twelve encoding instances") processes video frames at well under 1 frame per second, unlike most of the CODECs you're used to in ffmpeg. It is god-awful slow. But so is that style of compression.

Paul
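(On the quoted "save off a 1 minute segment" question: ffmpeg can do that with a stream copy, no re-encode. The -ss/-t seek and duration options and -c copy are standard ffmpeg; the file names and timestamps here are placeholders. With a stream copy, the cut starts at the nearest keyframe.)

rem cut one minute starting at the 10-minute mark, without re-encoding
ffmpeg -ss 00:10:00 -i input.mp4 -t 00:01:00 -c copy clip.mp4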
#24
FFmPeg ?
In article , Paul wrote:

> The only interesting property Cinepak has is the ability to seek and review a video so encoded. If you had a three-hour presentation and you needed to find the bit about topic X, it would be a good format to use for that reason.

no. cinepak is ~30 years old and long obsolete and seeking can be done with most codecs.
#25
FFmPeg ?
nospam wrote:
> In article , Paul wrote:
>> The only interesting property Cinepak has is the ability to seek and review a video so encoded. If you had a three-hour presentation and you needed to find the bit about topic X, it would be a good format to use for that reason.
> no. cinepak is ~30 years old and long obsolete and seeking can be done with most codecs.

Back in the day, it was encoding 160x120 or 320x240 videos, and nobody considered what would happen to it if the frame was 1920x1080. It's a codec known for the compression being hard while the playback is "easy", so a gutless processor is more likely to give a decent experience on playback.

*******

There are encoding options for at least one popular movie format that include "bidirectional" encoding, and that gives better forward and rewind (seek) when played back. It likely makes the video a bit bigger, but gives better seek behavior.

In my experience, things like large AVI movies (4GB) encoded as AVI2 OpenDML tend to suck when seeking back and forth. (That's at least one of the formats where sometimes you have to restart the player to recover.) Cinepak is an order of magnitude better at it. But Cinepak is probably two to three times bigger in terms of file size, and that's why it isn't an archive format. It's fun to play with for one video experiment, but it isn't going into anyone's workflow.

Paul
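(For a modern codec, the seek-friendliness described above is largely a keyframe-interval trade-off. A hedged sketch with ffmpeg and x264; -g (GOP length) and -c:a copy are real options, while the interval value and file names are illustrative. A smaller -g means more keyframes, a bigger file, and finer seek granularity.)

rem more frequent keyframes: larger file, snappier seeking
ffmpeg -i input.mp4 -c:v libx264 -g 30 -c:a copy seekable.mp4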
#26
FFmPeg ?
On Sat, 27 Jul 2019 19:01:54 -0400, nospam wrote:
>> I doubt video editors use cores efficiently.
> they do.

The main problem with EVERYTHING you say, nospam, is:
a. You just guess (where your credibility is worse than a coin toss)
b. You don't back anything up with any facts because you have no intention of being actually helpful.

The stellar opposite of you is Paul:
a. He's purposefully helpful, which means he backs everything up, and,
b. His credibility is stellar because he cares about his credibility.

Whatever you say, nospam, is just a mere guess.
o The entire output of your brain could be replaced by a simple coin toss.

That's how I know you have absolutely zero formal education, nospam.
o You couldn't last a week in grad school or in Silicon Valley being wrong more than half the time, which is what your record shows you to be.

Back on topic ... one problem I have with FFMPEG is that there are so many builds out there, and it's not clear what the difference is between them. The FFMPEG I generally use is the one for the youtube-dl.exe conversion tool:
o https://youtube-dl.org/downloads/latest/youtube-dl.exe
o http://ffmpeg.zeranoe.com/builds/

Generally that ffmpeg build contains three different executables:
o ffmpeg.exe
o ffprobe.exe
o ffplay.exe

I also use ffmpeg, at times, to save frames from a video:
o ffmpeg -i L:\movie.mpg -f image2 -q:v 1 -c:v mjpeg a%03d.jpg

One oddity is that both the European and US patents relevant to ffmpeg have expired:
o https://www.ffmpeg.org/legal.html

So I would think we can now get it compiled inside of things like the youtube downloader functionality.
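(youtube-dl can be pointed at whichever ffmpeg build you prefer; --ffmpeg-location is a real youtube-dl option, while the path and URL here are placeholders.)

youtube-dl --ffmpeg-location C:\FFMPEG\bin "https://www.youtube.com/watch?v=XXXXXXXXXXX"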
#27
FFmPeg ?
In article , Paul wrote:

>>> The only interesting property Cinepak has is the ability to seek and review a video so encoded. If you had a three-hour presentation and you needed to find the bit about topic X, it would be a good format to use for that reason.
>> no. cinepak is ~30 years old and long obsolete and seeking can be done with most codecs.
> Back in the day, it was encoding 160x120 or 320x240 videos, and nobody considered what would happen to it if the frame was 1920x1080.

they did, but the hardware back then couldn't handle the bandwidth or storage for 1080p. regardless, cinepak is obsolete and anything encoded with it is difficult to impossible to play today.

> It's a codec known for the compression being hard while the playback is "easy", so a gutless processor is more likely to give a decent experience on playback.

most codecs are asymmetrical, although it no longer matters since encode/decode is now done in hardware and faster than real time. many phones can shoot 4k/60 video, with high-end desktop systems able to handle multiple simultaneous 8k streams, all without a hiccup.
#28
CPU generation question
On 7/27/19 1:49 PM, Rene Lamontagne wrote:
> On 2019-07-27 3:14 p.m., T wrote:
>> On 7/27/19 6:10 AM, Rene Lamontagne wrote:
>>> Prime95 uses all 6 real cores and all 6 virtual cores at 100% CPU usage; I'm sure there are others.
>> Sounds like it is well written. The 64-thousand-dollar question would be: is there any real noticeable difference over a four-core machine? Most applications do not thread worth beans. I instead put my money towards NVMe drives and fast memory buses. Now that makes a YUGE difference.
> Yep, fast M.2 NVMe drives like my two ADATA SX8200PNP drives, reading at 3450 MB/s and writing at 2370 MB/s, plus 3200 MHz RAM coupled with my Intel i7 8700 turboing at 4.28 GHz on an Asus Prime Z390 motherboard, can sure turn over a lot of ground in a hurry. It also plays a hell of a game of chess.
>
> Rene

I think the guy who bragged about what generation of CPU he was using was just being condescending. To me, the generation of the CPU is just what fits on the motherboard that you spec out. This guy was actually demonstrating what little clue about I.T. he had. There is a lot more to I.T. than assembling pop beads.

Your system sounds awesome, by the way!
#29
CPU generation question
On 2019-07-28 8:15 p.m., T wrote:
> On 7/27/19 1:49 PM, Rene Lamontagne wrote:
>> On 2019-07-27 3:14 p.m., T wrote:
>>> On 7/27/19 6:10 AM, Rene Lamontagne wrote:
>>>> Prime95 uses all 6 real cores and all 6 virtual cores at 100% CPU usage; I'm sure there are others.
>>> Sounds like it is well written. The 64-thousand-dollar question would be: is there any real noticeable difference over a four-core machine? Most applications do not thread worth beans. I instead put my money towards NVMe drives and fast memory buses. Now that makes a YUGE difference.
>> Yep, fast M.2 NVMe drives like my two ADATA SX8200PNP drives, reading at 3450 MB/s and writing at 2370 MB/s, plus 3200 MHz RAM coupled with my Intel i7 8700 turboing at 4.28 GHz on an Asus Prime Z390 motherboard, can sure turn over a lot of ground in a hurry. It also plays a hell of a game of chess. Rene
> I think the guy who bragged about what generation of CPU he was using was just being condescending. To me, the generation of the CPU is just what fits on the motherboard that you spec out. This guy was actually demonstrating what little clue about I.T. he had. There is a lot more to I.T. than assembling pop beads. Your system sounds awesome, by the way!

Thanks. It is way overkill for my needs; I like to build with longevity in mind as well as quality. At 85 this could very well be my last build, and it would be passed on to my son if anything should happen, so I would like it to be the best I can build. Besides, I enjoy building and configuring the machines to my liking.

Rene
#30
CPU generation question
T wrote:
> Hi All,
>
> I got talking to a guy yesterday whilst handing out cards. He started expounding on how he built his own computer, and from what I saw, he did a pretty good job. He was able to move 3D graphics in real time. The thing he was the most proud of was the "generation" of the processors he picks. I presume he means Intel's processors.
>
> Now, to me the generation of the processor does not mean a lot. When building a customer's computer, I first find the motherboard I want and then look at the specs to see what processor it takes. Then I check my suppliers' stock to see what is in stock and what is the best value for what is needed. This usually is the current generation and one back.
>
> As far as generations of processors go, the higher the generation, the better the power consumption. I haven't seen more than four cores making any practical difference with Windows. And multi-threading doesn't seem to matter on Windows after four real cores (Linux does make a big difference).
>
> As far as performance goes, the big bottleneck is the hard drive. I adore using NVMe drives and they make a YUGE difference. Next would be the memory bus speed. Last of all would be the generation of the processor.
>
> I go for the motherboard that meets the customer's needs. To me the generation of the processor is what fits on the motherboard. Am I missing something? Does the "generation" of the processor really make that much difference?
>
> -T

OK, here is a table I found, one where I didn't have to work very hard. The number on the right is "normalized" for frequency. Why I am doing that is to see whether the arch of the processor is magically more powerful than previous generations.

I moved the items around in the table a bit, since a "simple-minded" classification scheme someone mentioned isn't exactly right. The "lead digit" in the model number isn't the generation. It's close, but they spread the models around. Really, no method is a reliable method at this level (and on the Ark web pages at Intel, Intel has on purpose not put that info in the entries of the *expensive* processors). And we know that some devices, like say comparing an IvyBridge to an IvyBridge-E, can in fact be different generations. The E, X, EX and so on usually got a crusty chipset with spiffy features missing, and you tended to get the feeling that the high-end stuff came out on a different process or node.

https://cpugrade.com/articles/cinebe...arison-graphs/

223  9900K  5.00GHz  Coffee Lake   9th          223/5.00 = 44.6
201  8700K  4.70GHz  Coffee Lake   8th          201/4.70 = 42.8
189  7700K  4.59GHz  Kaby Lake     7th 2016-17  189/4.59 = 41.1
190  9900X  4.40GHz  Skylake                    190/4.40 = 43.2
184  7900X  4.30GHz  Skylake                    184/4.30 = 42.8
182  6700K  4.20GHz  Skylake       6th 2015-16  182/4.20 = 43.3
155  6900K  3.70GHz  Broadwell                  155/3.70 = 41.9
153  5775C  3.70GHz  Broadwell     5th 2014-15  153/3.70 = 41.4
141  5960X  3.50GHz  Haswell       2013         141/3.50 = 40.3
159  4770K  3.90GHz  Haswell       4th          159/3.90 = 40.8
144  4960X  4.00GHz  Ivy Bridge    2012         144/4.00 = 36.0
135  3960X  3.90GHz  Sandy Bridge               135/3.90 = 34.6
131  2600K  3.80GHz  Sandy Bridge  2011         131/3.80 = 34.5

But anyway, the message in that table is, for the most part, that innovation stopped around Haswell or so. It's hard to explain why the top item in the table is improved, why the 9900K is better per clock than the 8700K, unless the memory on the two systems was quite different or something. There just isn't the level of detail to spot a difference. Maybe they're actually different tech, or the mesh bus setup is different, or... whatever.
https://www.anandtech.com/show/13591...-power-for-sff

Core:  1    2    3    4    5    6    7    8
Freq:  5.0  5.0  4.8  4.8  4.7  4.7  3.6  3.6

https://www.anandtech.com/show/11859...nitial-numbers

Core:  1    2    3    4    5    6
Freq:  4.7  4.6  4.5  4.4  4.4  4.3

If I had a Haswell, I probably wouldn't be feeling too bad at this point.

Paul