
Readers of the site will surely remember our similar project, which we carried out about two and a half years ago. We analyzed PCI Express throughput in November 2004, when the PCI Express (PCIe) interface was still new and did not provide a significant advantage over AGP graphics cards. Today, almost every new computer has a PCI Express interface, which is used to connect the video card, whether integrated or discrete. Graphics cards have made significant progress since then, so it seemed to us that the time had come for a new analysis that would answer the question: what bus bandwidth do graphics cards really need?

The PCI Express interface quickly fueled the growth of the graphics industry, as it allowed nVidia and ATi/AMD to install two or even four graphics cards in a computer. In addition, PCI Express is required for expansion cards with high bandwidth requirements, such as RAID controllers, gigabit network adapters or physics accelerators for 3D applications and games. The processing power of additional graphics cards can be used to increase performance at high resolutions, add visual features, or increase speed at standard resolutions and quality settings. However, the last option is not always interesting, since many modern video cards are powerful enough for standard resolutions of 1024x768 and 1280x1024. The growth potential of ATi CrossFire and nVidia SLI solutions is impressive, but both solutions require the right platform. A universal motherboard that would support CrossFire and SLI at the same time does not exist. At least not yet.

However, configurations on two and four video cards are only part of the graphics market. Most computers and upgrade scenarios are still built on a single graphics card, which is why we decided not to expand our PCI Express scaling tests to two graphics cards. We took typical high-end ATi and nVidia graphics cards and put them through a series of tests in different PCI Express modes.


The most common PCI Express slots: the large one carries 16 lanes, the small one a single lane for the simplest expansion cards.

Unlike the PCI and PCI-X buses, the PCI Express interface is based on a point-to-point serial protocol. That is, the PCI Express interface requires a relatively small number of conductors. However, the interface uses much higher clock rates than parallel buses, resulting in high bandwidth. In addition, bandwidth can be easily increased by tying multiple PCI Express lanes together. The most commonly used slot types are x16, x8, x4, x2, and x1, where the numbers indicate the number of PCI Express lanes.

PCI Express is a bidirectional point-to-point interface that provides the same bandwidth in both directions and does not have to share that bandwidth with other devices, as was the case with PCI. Thanks to the modular architecture, motherboard manufacturers can allocate the available PCI Express lanes to whatever slots they need. For example, 20 available PCI Express lanes can be routed to one x16 PCIe slot and four x1 PCIe slots, which is what happens with many chipsets. A server system, on the other hand, might use five x4 PCIe ports. In general, PCI Express permits almost any lane configuration that adds up. Finally, PCI Express allows chipset bridges from different manufacturers to be mixed.

However, PCI Express has one drawback: the more PCIe lanes, the higher the power consumption of the chipset. It is for this reason that chipsets with 40 or more PCI Express lanes require more power. As a rule, 16 additional PCI Express lanes increase the power consumption of modern chipsets by 10 watts.

Number of PCI Express lanes | Throughput in one direction | Total throughput
x1                          | 256 MB/s                    | 512 MB/s
x2                          | 512 MB/s                    | 1 GB/s
x4                          | 1 GB/s                      | 2 GB/s
x8                          | 2 GB/s                      | 4 GB/s
x16                         | 4 GB/s                      | 8 GB/s
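The doubling in the table is simple lane arithmetic; a minimal Python sketch, taking the article's 256 MB/s per-lane figure as given:

```python
# Per-lane scaling behind the table: bandwidth grows linearly with lane count.
# 256 MB/s per lane, per direction, is the figure used by this article.
PER_LANE_MB_S = 256

def pcie_bandwidth(lanes: int) -> tuple[int, int]:
    """Return (one-direction, total) throughput in MB/s for a given lane count."""
    one_way = PER_LANE_MB_S * lanes
    return one_way, 2 * one_way

for lanes in (1, 2, 4, 8, 16):
    one_way, total = pcie_bandwidth(lanes)
    print(f"x{lanes}: {one_way} MB/s one way, {total} MB/s total")
```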


On most motherboards, 16 PCI Express lanes are used to connect the video card.


On many systems with two graphics cards, the two physical x16 PCI Express slots operate in x8 lane mode each.


In order for the video card to work in x8 PCI Express mode, we sealed some of the contacts with adhesive tape.


In order for the video card to work in x4 PCI Express mode, we had to tape even more contacts.


The same video card, but more contacts are sealed. It works in x4 PCI Express mode.


The same can be said about x1 PCI Express. We sealed all contacts that were not required in x1 mode.


If you mask the extra contacts, the PCI Express video card will run in x1 PCI Express mode only. Throughput is 256 MB/s in each direction.

Keep in mind that not every motherboard can run video cards with a reduced number of PCI Express lanes. In our first article, we had to modify the BIOS of the DFI LANParty 925X-T2 motherboard to make it support the "low" modes. This time we again had to check several models before we found the right one; we ended up with the MSI 975X Platinum PowerUp Edition. The Gigabyte 965P-DQ6 board didn't work right from the start, and the Asus Commando refused to work in "low" modes even after a BIOS update.


Schematic of the x16 PCI Express slot. It can be used to determine which contacts need to be sealed with tape. Click on the picture to enlarge.

Competitors: ATi Radeon X1900 XTX and nVidia GeForce 8800 GTS

We took two high-end video cards from the two competitors, AMD/ATi and nVidia: the Radeon X1900 XTX and the GeForce 8800 GTS. They are not the absolute top models, but they are definitely high-end.

The ATi Radeon X1900 XTX has 384 million transistors and offers 48 pixel shader processors, organized in groups of four into so-called "quads". The GPU runs at 650 MHz, and the card carries 512 MB of GDDR3 memory running at 775 MHz (1.55 GHz DDR). Note that ATi's X1xxx graphics cards are not DirectX 10 compliant.

We took the HIS X1900 XTX IceQ3 model, which uses an improved cooling system. The design is based on the reference card, so the fan is still radial, but a heat-pipe system and a massive heatsink have been added. In our experience, the HIS card is quieter than ATi's reference models.

The GeForce 8 line is nVidia's current flagship. Although these are the first consumer DirectX 10-class video cards, nVidia got off to a rough start under Windows Vista due to driver issues. The 8800 GTS chip runs at 500 MHz, with the stream processors at 1.2 GHz. Cards are available with 320 and 640 MB of RAM, all of which use 800 MHz memory (1.6 GHz DDR).

We took a GeForce 8800 GTS with 320 MB of GDDR3 memory from Zotac. The card is based on the nVidia reference design.

Test configuration

System hardware
CPU: Socket 775, Intel Core 2 Extreme X6800 (Conroe, 65 nm, 2.93 GHz, 4 MB L2 cache)
Motherboard: MSI 975X Platinum PowerUp Edition, chipset: Intel 975X, BIOS: 2007-01-24

General hardware
Memory: 2x 1024 MB DDR2-800 (CL 4-4-4-12), Corsair CM2X1024-6400C3 XMS6403v1.1
Video card I: HIS X1900 XTX IceQ3, GPU: ATi Radeon X1900 XTX (650 MHz), memory: 512 MB GDDR3 (1550 MHz)
Video card II: Zotac GeForce 8800 GTS, GPU: GeForce 8800 GTS (500 MHz), memory: 320 MB GDDR3 (1200 MHz)
HDD: 400 GB, 7200 rpm, 16 MB cache, SATA/300, Western Digital WD4000KD
DVD-ROM: Gigabyte GO-D1600C (16x)

Software
Graphics driver I: ATi Catalyst Suite 7.2
Graphics driver II: nVidia ForceWare 97.92
Intel platform drivers: Chipset Installation Utility 8.1.1.1010
DirectX: version 9.0c (4.09.0000.0904)
OS: Windows XP Professional, Build 2600, SP2

Tests and settings

3D games

Call of Duty 2
  Version: 1.3
  Video mode: 1600x1200
  Anti-aliasing: 4x
  Texture filter: anisotropic
  Timedemo: demo2

Quake 4
  Version: 1.2 (Dual Core Patch)
  Video mode: 1600x1200
  Video quality: Ultra (ATi) / High (nVidia)
  Anti-aliasing: 4x
  Multi CPU: yes
  THG timedemo: waste.map
  timedemo demo8.demo 1 (1 = load textures)

Applications

SPECviewperf 9
  Version: 9.03
  All tests

3DMark06
  Version: 1.1
  Video mode: 1600x1200
  Anti-aliasing: 4x
  Anisotropic filter: 8x

Test results

As you can see, the nVidia GeForce 8800 GTS performs poorly at x1 and x4 speeds, noticeably below the maximum performance level, which it reaches only at x16. The ATi Radeon X1900 XTX, on the other hand, needs no more than x4 PCI Express bandwidth to run Call of Duty 2 properly.

The situation in Quake 4 is completely different. Here, the ATi Radeon X1900 XTX and nVidia GeForce 8800 GTS start to work quite normally at x4 PCI Express speed, and when switching to x8 or x16, they win slightly.

Futuremark's 3DMark06 benchmark is very GPU-intensive, as it was designed to be from the start, so its demands on the interface are small. The nVidia GeForce 8800 GTS reacts more strongly to reduced PCI Express bandwidth than the ATi Radeon X1900 XTX, which runs close to its maximum already at x4 PCI Express speed.

The professional OpenGL graphics test SPECviewperf 9.03 puts a heavy load on both the CPU and the graphics subsystem. As you can see, the results depend significantly on interface speed. It was interesting to watch performance scale from x1 to x4 to x8 PCI Express. The transition to x16 PCI Express still gives a performance boost, though a smaller one. In any case, it is safe to say that professional graphics applications require a high-bandwidth interface. So if you want to work with 3ds Max, Catia, Ensight, Lightscape, Maya, Pro/Engineer or SolidWorks, x16 PCI Express is indispensable.

Conclusion

The conclusion of our 2004 PCI Express scaling analysis was simple: x4 PCIe bandwidth was enough for a single video card and did not create a bottleneck. At that time, x8 or x16 PCIe gave no gain, and the AGP interface was, in principle, also sufficient.

But the situation has changed. As you can see, four PCI Express lanes are no longer enough for maximum performance. While we see differences both between ATi/AMD and nVidia and between games and professional applications, in most cases maximum performance is achieved only with x16 PCI Express. We tested two 3D games, Quake 4 and Call of Duty 2, which aren't the most demanding titles today, yet they clearly benefit from a faster interface. The most striking results came from SPECviewperf 9.03, which showed a significant drop in performance whenever the PCI Express speed fell below x16.

The performance results clearly show that motherboards and chipsets today must support all graphics cards at full x16 PCI Express speed. If you run a high-performance video card on a "weak" interface, such as PCI Express x8, you will sacrifice performance.

In the spring of 1991, Intel completed the first prototype of the PCI bus. The engineers were tasked with developing a low-cost, high-performance solution that could realize the capabilities of the 486, Pentium and Pentium Pro processors. They also had to take into account the mistakes VESA made when designing the VLB bus (its electrical load did not allow connecting more than 3 expansion cards), and to implement automatic device configuration.

In 1992 the first version of the PCI bus appeared; Intel announced that the bus standard would be open and created the PCI Special Interest Group. Thanks to this, any interested developer can create devices for the PCI bus without purchasing a license. The first version of the bus ran at 33 MHz, could be 32- or 64-bit, and devices could use 5 V or 3.3 V signaling. Theoretically the bus bandwidth was 133 MB/s, but in reality it was about 80 MB/s.

Main characteristics:


  • bus frequency: 33.33 or 66.66 MHz, synchronous transmission;
  • bus width: 32 or 64 bits, multiplexed bus (address and data are transmitted over the same lines);
  • peak throughput for the 32-bit version at 33.33 MHz: 133 MB/s;
  • memory address space: 32 bits (4 GB);
  • I/O port address space: 32 bits (4 GB);
  • configuration address space (per function): 256 bytes;
  • signal voltage: 3.3 or 5 V.
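The 133 MB/s figure follows directly from the bus width and clock; a quick Python sanity check (the 64-bit rows match the PCI 64 and PCI 64/66 sections below):

```python
# Peak PCI throughput = (bus width in bytes) x (clock in MHz, i.e. million transfers/s).
def pci_peak_mb_s(width_bits: int, freq_mhz: float) -> float:
    return width_bits / 8 * freq_mhz

print(int(pci_peak_mb_s(32, 33.33)))  # 133 -- base 32-bit PCI at 33 MHz
print(int(pci_peak_mb_s(64, 33.33)))  # 266 -- 64-bit PCI at 33 MHz
print(int(pci_peak_mb_s(64, 66.66)))  # 533 -- 64-bit PCI at 66 MHz
```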

Photos of connectors:

Mini PCI, 124 pin
Mini PCI Express / mSATA, 52 pin
Apple MBA SSD, 2012
Apple SSD, 2012
Apple PCIe SSD
MXM graphics card, 230/232 pin
M.2 (NGFF), 75 pin
  Key A: PCIe x2
  Key B: PCIe x4, SATA, SMBus
MXM3 graphics card, 314 pin
PCI 5V
PCI Universal
PCI-X 5V
AGP Universal
AGP 3.3V
AGP 3.3V + ADC power
PCIe x1
PCIe x16
Custom PCIe
ISA 8-bit
ISA 16-bit
EISA
VESA
NuBus
PDS
PDS
Apple II / IIGS expansion slot
PC/XT/AT expansion bus, 8-bit
ISA (Industry Standard Architecture), 16-bit
EISA
MCA (Micro Channel Architecture), 16-bit
MCA (Micro Channel Architecture) with video, 16-bit
MCA (Micro Channel Architecture), 32-bit
MCA (Micro Channel Architecture) with video, 32-bit
ISA 16-bit + VLB (VESA)
Processor Direct Slot (PDS)
601 Processor Direct Slot (PDS)
LC Processor Direct Slot (PERCH)
NuBus
PCI (Peripheral Component Interconnect), 5V
PCI 3.3V
CNR (Communications/Network Riser)
AMR (Audio/Modem Riser)
ACR (Advanced Communications Riser)
PCI-X 3.3V
PCI-X 5V
PCI 5V + RAID option (ARO)
AGP 3.3V
AGP 1.5V
AGP Universal
AGP Pro 1.5V
AGP Pro 1.5V + ADC power
PCIe (Peripheral Component Interconnect Express) x1
PCIe x4
PCIe x8
PCIe x16

PCI 2.0

The first widely adopted version of the base standard used cards and slots with a signal voltage of 5 V only. Peak bandwidth: 133 MB/s.

PCI 2.1 - 3.0

These versions differed from 2.0 in allowing several bus masters to operate simultaneously (the so-called concurrent mode), and in introducing universal expansion cards able to work both in 5 V slots and in 3.3 V slots (at 33 and 66 MHz respectively). Peak throughput is 133 MB/s at 33 MHz and 266 MB/s at 66 MHz.

  • Version 2.1: support for 3.3 V cards and the corresponding power lines was optional.
  • Version 2.2: expansion cards built to this standard have a universal power-connector key and can work in many later varieties of PCI slots, and in some cases in version 2.1 slots.
  • Version 2.3: incompatible with 5 V PCI cards, despite continued use of 32-bit slots with the 5 V key. Expansion cards have a universal connector but cannot work in 5 V slots of early versions (up to and including 2.1).
  • Version 3.0: completes the transition to 3.3 V; 5 V PCI cards are no longer supported.

PCI 64

An extension of the core PCI standard, introduced in version 2.1, that doubles the number of data lines and thus the bandwidth. The PCI 64 slot is an extended version of the regular PCI slot. Formally, 32-bit cards are fully compatible with 64-bit slots (provided a common supported signal voltage), while 64-bit cards are only partially compatible with 32-bit slots (there will be a performance loss in any case). Operates at 33 MHz; peak bandwidth is 266 MB/s.

  • Version 1 - uses a 64-bit PCI slot and a voltage of 5 volts.
  • Version 2 - uses a 64-bit PCI slot and a voltage of 3.3 volts.

PCI 66

PCI 66 is the 66 MHz evolution of PCI 64. It uses 3.3 V slots; cards have a universal or 3.3 V form factor. Peak throughput is 533 MB/s.

PCI 64/66

The combination of PCI 64 and PCI 66 gives four times the data transfer rate of the base PCI standard. It uses 64-bit 3.3 V slots, compatible with universal and 3.3 V 32-bit expansion cards. PCI64/66 cards come in either a universal form factor (with limited compatibility with 32-bit slots) or a 3.3 V form factor (the latter fundamentally incompatible with the popular 32-bit 33 MHz slots). Peak bandwidth: 533 MB/s.

PCI-X

PCI-X 1.0 is an extension of the PCI64 bus that adds two new operating frequencies, 100 and 133 MHz, as well as a split-transaction mechanism to improve performance when multiple devices work simultaneously. It is generally backward compatible with all 3.3 V and universal PCI cards. PCI-X cards are usually made in the 64-bit 3.3 V format and have limited backward compatibility with PCI64/66 slots; some PCI-X cards come in the universal format and can work (though this has almost no practical value) in ordinary PCI 2.2/2.3 slots. In difficult cases, to be completely sure a given motherboard and expansion card will work together, you need to consult the compatibility lists published by both manufacturers.

PCI-X 2.0

PCI-X 2.0 is a further extension of PCI-X 1.0; frequencies of 266 and 533 MHz were added, as well as error correction (ECC) for data transmission. The bus can be split into four independent 16-bit buses, which is used exclusively in embedded and industrial systems; the signal voltage is reduced to 1.5 V, but the connectors remain backward compatible with all cards using 3.3 V signaling. Today very few motherboards aimed at the non-professional segment of high-performance computers (powerful entry level) support the PCI-X bus; one example is the ASUS P5K WS. In the professional segment it is used in RAID controllers and in SSD drives.

Mini PCI

A PCI 2.2 form factor intended mainly for use in laptops.

PCI Express

PCI Express, or PCIe, or PCI-E (also known as 3GIO, for 3rd Generation I/O; not to be confused with PCI-X or PXI) is a computer bus (although at the physical layer it is not a bus but a point-to-point connection) that uses the PCI bus programming model and a high-performance physical protocol based on serial communication. Intel began developing PCI Express after abandoning the InfiniBand bus. The first base PCI Express specification officially appeared in July 2002. The PCI Special Interest Group develops the PCI Express standard.

Unlike the PCI standard, which used a common bus for data transfer with several devices connected in parallel, PCI Express, in general, is a packet network with star topology. PCI Express devices communicate with each other through a medium formed by switches, with each device directly connected by a point-to-point connection to the switch. In addition, the PCI Express bus supports:

  • hot swapping of cards;
  • guaranteed bandwidth (QoS);
  • energy management;
  • integrity control of transmitted data.

The PCI Express bus is intended for use as a local bus only. Since the PCI Express software model is largely inherited from PCI, existing systems and controllers can be adapted to PCI Express by replacing only the physical layer, without modifying the software. The high peak performance of PCI Express allows it to replace the AGP bus, and all the more so PCI and PCI-X. PCI Express has de facto replaced these buses in personal computers.

  • MiniCard (Mini PCIe): a replacement for the Mini PCI form factor. The Mini Card connector carries x1 PCIe, USB 2.0 and SMBus.
  • ExpressCard: similar to the PCMCIA form factor. The ExpressCard connector carries x1 PCIe and USB 2.0; ExpressCard cards support hot plugging.
  • AdvancedTCA, MicroTCA: form factors for modular telecommunications equipment.
  • Mobile PCI Express Module (MXM): an industrial form factor created for laptops by NVIDIA, used to connect graphics accelerators.
  • The PCI Express cable specifications allow a single connection to reach tens of meters, making it possible to build a computer whose peripherals are located at a considerable distance.
  • StackPC: a specification for building stackable computer systems. It describes the StackPC and FPE expansion connectors and their relative position.

Although the standard allows up to x32 lanes per port, such solutions are physically cumbersome and are not available.

Year | PCIe version | Encoding  | Transfer rate | Throughput per lane count, Gbit/s (×1 / ×2 / ×4 / ×8 / ×16)
2002 | 1.0          | 8b/10b    | 2.5 GT/s      | 2 / 4 / 8 / 16 / 32
2007 | 2.0          | 8b/10b    | 5 GT/s        | 4 / 8 / 16 / 32 / 64
2010 | 3.0          | 128b/130b | 8 GT/s        | ~7.877 / ~15.754 / ~31.508 / ~63.015 / ~126.031
2017 | 4.0          | 128b/130b | 16 GT/s       | ~15.754 / ~31.508 / ~63.015 / ~126.031 / ~252.062
2019 | 5.0          | 128b/130b | 32 GT/s       | ~32 / ~64 / ~128 / ~256 / ~512
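The per-lane figures in the table come from multiplying the transfer rate by the payload fraction of the line code; a small Python check:

```python
# Effective per-lane throughput in Gbit/s: transfer rate (GT/s) times the
# payload fraction of the encoding (8b/10b for PCIe 1.0/2.0, 128b/130b for 3.0+).
def lane_gbit_s(gt_per_s: float, payload_bits: int, frame_bits: int) -> float:
    return gt_per_s * payload_bits / frame_bits

assert lane_gbit_s(2.5, 8, 10) == 2.0          # PCIe 1.0, x1
assert lane_gbit_s(5.0, 8, 10) == 4.0          # PCIe 2.0, x1
print(round(lane_gbit_s(8.0, 128, 130), 3))    # PCIe 3.0, x1 -> 7.877
print(round(lane_gbit_s(16.0, 128, 130), 3))   # PCIe 4.0, x1 -> 15.754
```

Multiply any of these by the lane count to get the other columns of the table.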

PCI Express 2.0

The PCI-SIG released the PCI Express 2.0 specification on January 15, 2007. Main innovations in PCI Express 2.0:

  • Increased throughput: 500 MB/s per lane, i.e. 5 GT/s (gigatransfers per second).
  • Improvements to the device-to-device transfer protocol and the software model.
  • Dynamic link speed management (to control the speed of a link).
  • Bandwidth change notification (to notify software of changes in link speed and width).
  • Access Control Services: optional point-to-point transaction management capabilities.
  • Completion timeout control.
  • Function-level reset: an optional mechanism for resetting individual PCI functions within a PCI device.
  • Power limit override (to raise the slot power limit for devices that consume more power).

PCI Express 2.0 is fully compatible with PCI Express 1.1 (older cards will work in motherboards with new connectors, but only at 2.5 GT/s, since older chipsets cannot support the doubled transfer rate; newer video adapters will work without problems in older PCI Express 1.x slots).

PCI Express 2.1

In physical characteristics (speed, connector) it corresponds to 2.0; the software part adds functions planned for full implementation in version 3.0. Since most motherboards ship with version 2.0, having only a video card that supports 2.1 is not enough to enable 2.1 mode.

PCI Express 3.0

In November 2010 the PCI Express 3.0 specification was approved. The interface has a transfer rate of 8 GT/s (gigatransfers per second). Although that is only 1.6 times the raw rate of PCI Express 2.0, real throughput still doubled, thanks to the more efficient 128b/130b encoding scheme, in which 128 bits of data sent over the bus are encoded in 130 bits. Full compatibility with previous PCI Express versions is maintained: PCI Express 1.x and 2.x cards will work in a 3.0 slot, and vice versa, a PCI Express 3.0 card will work in 1.x and 2.x slots.

PCI Express 4.0

The PCI Special Interest Group (PCI-SIG) stated that PCI Express 4.0 could be standardized before the end of 2016, but as of mid-2016, when a number of chips were already in production, media reported that standardization was expected in early 2017. It is expected to have a transfer rate of 16 GT/s, i.e. twice as fast as PCIe 3.0.


The PCI Express standard is one of the foundations of modern computers. PCI Express slots long ago took a firm place on any desktop motherboard, supplanting other standards such as PCI. But even PCI Express has its own varieties and connection schemes. On newer motherboards, from around 2010 on, you can see a whole set of ports labeled PCIe or PCI-E that differ in the number of lanes: x1, x2, x4, x8, x12, x16 or x32.

So let's sort out the apparent confusion around the seemingly simple PCI Express peripheral port, and find out the purpose of each of the PCI Express x2, x4, x8, x12, x16 and x32 variants.

What is a PCI Express bus?

Back in the 2000s, when the aging PCI (Peripheral Component Interconnect) standard gave way to PCI Express, the latter had one huge advantage: instead of the shared bus that PCI used, point-to-point access was adopted. This meant that each individual PCIe port and the card installed in it could make full use of the maximum bandwidth without interfering with the others, as happened on PCI. In those days plenty of peripherals came as expansion cards: network cards, audio cards, TV tuners and so on, all demanding a fair share of PC resources. Unlike the PCI standard's common bus with several devices connected in parallel, PCI Express, viewed broadly, is a packet network with a star topology.


PCI Express x16, PCI Express x1 and PCI on one board

In layman's terms, imagine your desktop PC as a store. The old PCI standard was like a small grocery store: everyone waited in line to be served, with corresponding service-speed problems, limited to the one salesperson behind the counter. PCI-E is more like a hypermarket: each customer follows their own route to the groceries, and several cashiers take orders at the checkout at once.

Obviously, in terms of speed of service, the hypermarket outperforms the regular store several times over, because the store is limited to the throughput of one seller with one till.

The same goes for dedicated data lanes for each expansion card or built-in motherboard component.

The impact of the number of lines on throughput

Now, to extend our store and hypermarket metaphor, imagine that each department of the hypermarket has its own cashiers reserved just for it. This is where the idea of multiple data lanes comes in.

PCI-E has gone through many changes since its inception. Currently, new motherboards usually use version 3 of the standard, with the faster version 4 becoming more common, with version 5 expected in 2019. But different versions use the same physical connections, and these connections can be made in four basic sizes: x1, x4, x8 and x16. (x32 ports exist, but are extremely rare on regular computer motherboards).

The different physical sizes of PCI Express ports make it easy to distinguish them by the number of simultaneous connections to the motherboard: the physically larger the port, the more connections it can carry to or from the card. These connections are also called lanes. One lane can be thought of as a track consisting of two signal pairs: one for sending data and one for receiving.

Different versions of the PCI-E standard allow different speeds per lane. But generally speaking, the more lanes a single PCI-E port has, the faster data can flow between the peripheral and the rest of the computer.

Returning to our metaphor: a single seller serving one customer is an x1 lane. A store with four cashiers corresponds to x4. And so on: the number of cashiers doubles with each step up in lane count.


Various PCI Express cards

Device types using PCI Express x2, x4, x8, x12, x16, and x32

For PCI Express 3.0 the maximum transfer rate is 8 GT/s per lane. In reality, the usable speed for PCI-E 3.0 is slightly less than one gigabyte per second per lane.

Therefore a device using a PCI-E x1 port, such as a low-power sound card or a Wi-Fi adapter, can transfer data at up to roughly 1 GB/s.

A card that fits into a larger slot - x4 or x8, for example a USB 3.0 expansion card - can transfer data four or eight times faster, respectively.

The transfer rate of a PCI-E x16 port is theoretically limited to a maximum bandwidth of about 15 GB/s. In 2017 this is more than enough for all modern graphics cards from NVIDIA and AMD.


Most discrete graphics cards use a PCI-E x16 slot

The PCI Express 4.0 protocol raises this to 16 GT/s, and PCI Express 5.0 will use 32 GT/s.

But currently there are no components that can saturate that much bandwidth. Modern high-end graphics cards typically use x16 PCI Express 3.0. And it makes no sense to give the same bandwidth to a network card that would use only one lane of an x16 port, since a gigabit Ethernet port transfers at most one gigabit per second, about one-eighth of the bandwidth of a single PCI-E lane (remember: eight bits in one byte).
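The one-eighth estimate is easy to verify; a rough Python calculation, ignoring protocol overhead beyond line coding:

```python
# Share of one PCIe 3.0 lane that a gigabit Ethernet port can actually fill.
lane_gbit_s = 8.0 * 128 / 130   # ~7.88 Gbit/s effective per lane (128b/130b)
ethernet_gbit_s = 1.0           # gigabit Ethernet line rate
share = ethernet_gbit_s / lane_gbit_s
print(f"{share:.0%}")           # prints 13%, i.e. roughly an eighth of one lane
```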

You can find PCI-E SSDs on the market that use an x4 port, but they look likely to be superseded by the rapidly evolving M.2 standard for SSDs, which can also use the PCI-E bus. High-end network cards and enthusiast hardware such as RAID controllers use a mix of x4 and x8 formats.

Port sizes and PCI-E lanes may vary

This is one of the more confusing aspects of PCI-E: a port can be built in the x16 form factor but have fewer lanes actually wired, for example only x4. This is because, even though PCI-E can carry many individual connections, there is still a practical limit to the chipset's lane budget. Cheaper motherboards with budget chipsets may offer only an x8 slot, even if that slot can physically accept an x16 form factor card.

Gamer-focused motherboards, by contrast, include up to four full x16 PCI-E slots with the full number of lanes for maximum throughput.

Obviously this can cause problems. If the motherboard has two x16 slots but one of them has only x4 lanes wired, connecting a second graphics card there will cut its theoretical bandwidth by as much as 75%. That is only a theoretical figure, though; motherboard architecture is such that you will not see a proportional drop in performance.
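The 75% figure is pure lane arithmetic; a minimal sketch:

```python
# Theoretical bandwidth lost when a card designed for x16 runs in a slot
# with fewer lanes wired. Real-world performance drops far less.
def bandwidth_loss(active_lanes: int, full_lanes: int = 16) -> float:
    return 1 - active_lanes / full_lanes

print(f"x4 slot: {bandwidth_loss(4):.0%} of theoretical bandwidth lost")  # 75%
print(f"x8 slot: {bandwidth_loss(8):.0%} of theoretical bandwidth lost")  # 50%
```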

A proper dual graphics card configuration should use two true x16 slots if you want the most from a tandem of two video cards. The manual on the manufacturer's official website will help you find out how many lanes each slot on your motherboard actually has.

Sometimes manufacturers even print the number of lanes on the motherboard's PCB next to the slot.

One thing to keep in mind is that a shorter x1 or x4 card can physically fit into a longer x8 or x16 slot: the layout of the electrical contacts makes this possible. Naturally, a card physically larger than the slot will not fit.

So remember, when buying expansion cards or upgrading current ones, always consider both the physical size of the PCI Express slot and the number of lanes it actually provides.

If you ask which interface should be used for an NVMe SSD, anyone who knows what NVMe is will answer: PCIe 3.0 x4, of course! But they will likely have trouble justifying it. At best you get the answer that such drives support PCIe 3.0 x4, and that interface bandwidth matters. It does, but all the talk about it began only when some drives became constrained in some operations by "regular" SATA. And between SATA's 600 MB/s and the (equally theoretical) 4 GB/s of PCIe 3.0 x4 lies an abyss full of options. What if a single PCIe 3.0 lane is enough, given that it is already one and a half times faster than SATA600? Fuel is added to the fire by controller manufacturers threatening to move budget products to PCIe 3.0 x2, and by the fact that many users simply do not have a suitable connection available: more precisely, it exists in theory, but freeing the lanes means reconfiguring the system or even changing hardware, which nobody wants to do. Yet one still wants to buy a top solid-state drive, and there are fears it will bring no benefit at all (not even the moral satisfaction of benchmark results).

But is that so? In other words, is it really necessary to insist on the drive's full supported mode, or can one compromise in practice? That is what we decided to check today. The check is quick and does not claim to be exhaustive, but the information gained should be enough, it seems to us, at least to think about. In the meantime, let's briefly review the theory.

PCI Express: existing standards and their bandwidth

Let's start with what PCIe is and how fast this interface works. It is often called a "bus", which is somewhat inaccurate: there is no shared bus to which all devices are connected. In fact, there is a set of point-to-point connections (similar to many other serial interfaces) with a controller at the root and devices attached to it (each of which can itself be a hub of the next level).

The first version of PCI Express appeared almost 15 years ago. Since it was designed for use inside the computer (often within a single board), the standard could be made fast: 2.5 gigatransfers per second. The interface is serial and full-duplex, so a single PCIe lane (x1, the atomic unit of the interface) carries data at up to 5 Gbit/s in total, but only half of that, i.e. 2.5 Gbit/s, in each direction. And this is the raw speed of the interface, not the "useful" one: for reliability, every data byte is encoded with 10 bits (the 8b/10b scheme), so the theoretical bandwidth of one PCIe 1.x lane is approximately 250 MB/s in each direction. In practice, service information must be transferred as well, so it is more correct to speak of roughly 200 MB/s of user data. At the time, that not only covered the needs of most devices but also left a solid reserve: suffice it to recall that PCIe's predecessor among mass system interfaces, the PCI bus, provided 133 MB/s. Even counting not just the mass implementation but all PCI variants, the maximum was 533 MB/s, and for the whole bus at that, i.e. this bandwidth was shared among all the devices connected to it. Here, 250 MB/s per lane (full rather than useful bandwidth, since that is how PCI is usually quoted too) is available exclusively. And for devices needing more, the standard provided from the outset for aggregating several lanes into a single link, in powers of two from 2 to 32, so the x32 variant defined by the standard could carry up to 8 GB/s in each direction. In personal computers x32 was never used because of the complexity of designing and routing the corresponding controllers and devices, so 16 lanes became the practical maximum. That width was used (and still is) mainly by video cards, since most devices do not need so much: for a good number of them a single lane is enough, though some, staying on the storage topic, RAID controllers or SSDs, successfully use x4 and x8.
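The arithmetic above can be condensed into a small calculation. The transfer rates and encoding schemes are the ones given in the text; the code itself is only an illustrative sketch.

```python
# Theoretical one-way bandwidth of a PCIe link, per the figures above.
# PCIe 1.x/2.0 use 8b/10b encoding (10 bits per data byte);
# PCIe 3.0 uses 128b/130b. Rates are in gigatransfers per second.

GEN_SPECS = {
    "1.x": (2.5, 8, 10),     # (GT/s, payload bits, encoded bits)
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
}

def bandwidth_mb_s(gen: str, lanes: int = 1) -> float:
    """Theoretical one-way bandwidth in MB/s, before protocol overhead."""
    gt_s, payload, encoded = GEN_SPECS[gen]
    bytes_per_s = gt_s * 1e9 * (payload / encoded) / 8
    return lanes * bytes_per_s / 1e6

print(bandwidth_mb_s("1.x"))            # 250.0 MB/s per lane
print(bandwidth_mb_s("1.x", 32))        # 8000.0 MB/s, the x32 maximum
print(round(bandwidth_mb_s("3.0", 4)))  # ~3938 MB/s, the "almost 4 GB/s"
```

Subtracting packet headers and other service traffic from these theoretical figures gives the practical ≈200 MB/s per PCIe 1.x lane mentioned above.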

Time did not stand still, and about 10 years ago the second version of PCIe appeared. The improvements were not limited to speed, but a step forward was taken there too: the interface moved to 5 gigatransfers per second while keeping the same encoding scheme, i.e. the throughput doubled. It doubled again in 2010: PCIe 3.0 provides 8 (rather than 10) gigatransfers per second, but the encoding redundancy decreased, with 130 bits now used to encode 128 bits of data instead of 160 as before. On paper, the PCIe 4.0 specification with the next doubling of speed is essentially ready, but we are unlikely to see it in mass hardware in the near future. Indeed, PCIe 3.0 still coexists with PCIe 2.0 on many platforms, because the extra performance of the newer version is simply not needed for many applications. And where it is needed, the good old method of lane aggregation works. Only each lane has become four times faster over these years, i.e. PCIe 3.0 x4 is equivalent to PCIe 1.0 x16, the fastest slot of a mid-2000s computer. This is the mode supported by top SSD controllers, and the mode recommended for them. Clearly, where such an opportunity exists, there is no harm in headroom. But what if it does not exist? Will there be problems, and if so, which ones? That is the question we have to answer.
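The generational doublings and the "PCIe 3.0 x4 equals PCIe 1.0 x16" equivalence follow directly from the per-lane rates; here is a quick sanity check using the figures from the text (the code is purely illustrative).

```python
# Per-lane one-way bandwidth in MB/s for each PCIe generation.
def lane_mb_s(gt_s: float, payload: int, encoded: int) -> float:
    return gt_s * 1e9 * (payload / encoded) / 8 / 1e6

pcie1 = lane_mb_s(2.5, 8, 10)     # 250 MB/s
pcie2 = lane_mb_s(5.0, 8, 10)     # 500 MB/s: same encoding, double the rate
pcie3 = lane_mb_s(8.0, 128, 130)  # ~985 MB/s: faster rate, leaner encoding

# A PCIe 3.0 lane is roughly four times a PCIe 1.x lane...
print(round(pcie3 / pcie1, 2))        # ~3.94
# ...so PCIe 3.0 x4 lands right next to PCIe 1.0 x16, the old top slot:
print(round(pcie3 * 4), pcie1 * 16)   # 3938 vs 4000.0
```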

Test Methodology

Running tests with different versions of the PCIe standard is not difficult: almost all controllers allow you to use not only the version they support but also all earlier ones. The number of lanes is trickier: we wanted to test configurations with one or two PCIe lanes directly. The Asus H97-Pro Gamer board on the Intel H97 chipset that we usually use does not support the complete set, but in addition to the "processor" x16 slot (the one normally used), it has another slot that works in PCIe 2.0 x2 or x4 modes. We took advantage of this trio, adding the PCIe 2.0 mode of the "processor" slot in order to see whether there is a difference. After all, in that case there are no intermediaries between the processor and the SSD, whereas when working through the "chipset" slot there is one: the chipset itself, which is in turn connected to the processor by the same PCIe 2.0 x4. We could have added a few more operating modes, but the main part of the study was planned for a different system anyway.

The fact is that we decided to take the opportunity to also check one "urban legend", namely the belief that top-end processors are useful for testing drives. So we took the eight-core Core i7-5960X, a relative of the Core i3-4170 usually used in our tests (Haswell-E and Haswell, respectively) but with four times as many cores. In addition, the Asus Sabertooth X99 board found in our stocks is useful today for its PCIe x4 slot, which can in fact also work as x1 or x2. On this system we tested three x4 variants (PCIe 1.0/2.0/3.0) from the processor, plus chipset-attached PCIe 1.0 x1, PCIe 1.0 x2, PCIe 2.0 x1 and PCIe 2.0 x2 (in all diagrams, chipset configurations are marked with a "(c)" icon). Does it make sense today to bother with the first version of PCIe, given that there is hardly a board left that supports only that version and can still boot from an NVMe device? From a practical point of view, no, but it is useful for checking the a priori expected equivalences of the form PCIe 1.x x4 = PCIe 2.0 x2. If the tests show that bus scalability matches theory, then it does not matter that we could not obtain practically meaningful PCIe 3.0 x1/x2 connections: the first is equivalent to PCIe 1.x x4 or PCIe 2.0 x2, and the second to PCIe 2.0 x4. And those we have.

As for software, we limited ourselves to Anvil's Storage Utilities 1.1.0: it measures the various low-level characteristics of drives quite well, and we need nothing else. On the contrary, any influence from other components of the system is highly undesirable, so for our purposes low-level synthetic benchmarks have no alternative.

As the "working body" we used a 240 GB Patriot Hellfire. As testing showed, it is not a performance record-setter, but its speed characteristics are quite consistent with the best SSDs of the same class and capacity. Moreover, slower devices are already on the market, and there will be more of them. In principle, the tests could be repeated later with something faster, but it seems to us there is no need: the results are predictable. But let's not get ahead of ourselves; let's see what we got.

Test results

While testing the Hellfire we noticed that its maximum speed in sequential operations can only be "squeezed out" by a multi-threaded load, which should be kept in mind for the future: theoretical throughput is theoretical precisely because "real" numbers, obtained in different programs under different scenarios, depend less on it than on those very programs and scenarios, except, of course, when force majeure intervenes :) And it is just such circumstances that we observe now: as said above, PCIe 1.x x1 yields ≈200 MB/s, and that is what we see. Two PCIe 1.x lanes or one PCIe 2.0 lane are twice as fast, and that is exactly what we see. Four PCIe 1.x lanes, two PCIe 2.0 lanes or one PCIe 3.0 lane should double that again, which is confirmed for the first two options, so the third is unlikely to differ. In principle, then, scalability is ideal, as expected: the operations are sequential, the flash handles them well, so the interface matters. For writing, the flash itself tops out around PCIe 2.0 x4 (so PCIe 3.0 x2 would also suffice). Reading "can do" more, but the last step brings only a one-and-a-half-fold rather than the potential twofold gain. Note also that there is no appreciable difference between chipset and processor controllers, nor between the platforms; LGA2011-3 is slightly ahead, but only slightly.
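The pattern just described, speed doubling with the link until the flash itself becomes the limit, can be modeled in one line. The link figures below are the usable rates from the text; the flash ceiling is a hypothetical placeholder, not one of our measured results.

```python
# Expected sequential speed: capped by whichever is lower, the usable link
# bandwidth or the drive's own flash limit. All figures are MB/s; the flash
# ceiling below is an illustrative placeholder, not a measured value.
def expected_speed(link_mb_s: float, flash_limit_mb_s: float) -> float:
    return min(link_mb_s, flash_limit_mb_s)

links = {
    "PCIe 1.x x1": 200,   # usable rate, after protocol overhead
    "PCIe 2.0 x1": 400,
    "PCIe 2.0 x2": 800,
    "PCIe 2.0 x4": 1600,
}
FLASH_WRITE_LIMIT = 1500  # hypothetical ceiling of the drive itself

for name, bw in links.items():
    # Up to x2 the link is the bottleneck; at x4 the flash takes over.
    print(name, expected_speed(bw, FLASH_WRITE_LIMIT))
```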

Everything is smooth and beautiful, but it breaks no molds: the maximum in these tests is just over 500 MB/s, within reach even of SATA600 or (in terms of today's testing) PCIe 1.0 x4 / PCIe 2.0 x2 / PCIe 3.0 x1. So there is no need to fear budget controllers limited to PCIe x2, or M.2 slots on some boards that offer only that many lanes (and version 2.0 of the standard): often more is simply not needed. Sometimes even that much is not needed: the maximum results are achieved with a queue of 16 commands, which is not typical of mass software. Far more common is a queue of 1-4 commands, for which a single lane of the very first PCIe, or even the very first SATA, would suffice. There are overheads on top of that, though, so a fast interface is useful. But "too fast" is, perhaps, merely not harmful.

In this test the platforms behave differently, and with a queue depth of one, fundamentally differently. The "trouble" is not at all that many cores are bad: they are still barely used here, one at most, and not so heavily that turbo mode kicks in at full force. What we have is a difference of about 20% in core frequency and one and a half times in cache speed: in Haswell-E the cache operates at a lower frequency, not synchronously with the cores. All in all, the top platform only helps to squeeze out maximum IOPS in the most multi-threaded mode with a large command queue depth. The only pity is that from the point of view of practical work this is the proverbial spherical benchmark in a vacuum :)

With writes, the state of affairs does not fundamentally change, in every sense. Funnily enough, though, on both systems the fastest mode turned out to be PCIe 2.0 x4 in the "processor" slot. On both! And across multiple checks and re-checks. At this point one may well wonder whether these new standards are needed at all, or whether it is better not to hurry anywhere...

When working with blocks of different sizes, the theoretical idyll breaks down, and it turns out that increasing interface speed does still make sense. The resulting numbers are such that a couple of PCIe 2.0 lanes ought to be enough, but in practice performance in that case is lower than with PCIe 3.0 x4, albeit not by a large factor. And here the budget platform outscores the top one to a much greater extent. Operations of exactly this kind dominate application software, so this diagram is the closest to reality. It is therefore no surprise that fat interfaces and fashionable protocols produce no "wow effect". More precisely, those migrating from mechanical drives will get one, but exactly the same effect any solid-state drive with any interface would provide.

Summary

To make the overall picture easier to take in, we used the total score reported by the program (for reads and writes together), normalized to the "chipset" PCIe 2.0 x4 mode: at the moment it is the most widely available, found even on LGA1155 or AMD platforms without having to "offend" the video card. It is also equivalent to PCIe 3.0 x2, which budget controllers are preparing to master. And on the new AMD AM4 platform, again, this particular mode can be obtained without touching the discrete video card.
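The normalization used for the summary diagram is simple arithmetic; the score values below are hypothetical placeholders inserted only to show the calculation, not our measured results.

```python
# Normalize total scores to the 'chipset' PCIe 2.0 x4 baseline (= 100).
# These score values are hypothetical placeholders, not measured results.
scores = {
    "PCIe 2.0 x4 (c)": 5000,  # the baseline mode
    "PCIe 3.0 x4": 5500,
    "PCIe 2.0 x2 (c)": 4200,
    "PCIe 1.0 x1 (c)": 1900,
}
baseline = scores["PCIe 2.0 x4 (c)"]
normalized = {mode: round(100 * s / baseline) for mode, s in scores.items()}
print(normalized)  # the baseline maps to 100, others are relative percentages
```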

So what do we see? Using PCIe 3.0 x4 where possible is certainly preferable, but not obligatory: for mid-range NVMe drives (in their originally top segment) it brings literally 10% of additional performance, and even that through operations rarely encountered in practice. Why implement this mode at all, then? First, the opportunity existed, and headroom costs nothing. Second, there are drives faster than our test Patriot Hellfire. Third, there are fields of activity where loads "atypical" for a desktop system are entirely typical, and it is precisely there that storage performance, or at least the ability to make part of the storage very fast, is most critical. But to ordinary personal computers all this is irrelevant.

There, as we can see, using PCIe 2.0 x2 (or, equivalently, PCIe 3.0 x1) does not lead to a dramatic drop in performance: only 15-20%, and that despite limiting the controller's potential bandwidth by a factor of four! For many operations even this throughput is enough. A single PCIe 2.0 lane, however, is no longer sufficient, so it makes sense for controllers to support PCIe 3.0 specifically: under the severe shortage of lanes in a modern system this will serve well. The x4 width is also useful: even if the system lacks support for modern PCIe versions, the drive will still work at a reasonable speed (albeit slower than it potentially could), provided there is a wide enough slot.

In essence, the large number of scenarios in which the flash memory itself turns out to be the bottleneck (yes, this is possible, and not only with mechanics) means that on this drive the four lanes of PCIe 3.0 outpace a single lane of the first version by only about 3.5 times, even though the theoretical throughput of the two configurations differs 16-fold. It does not follow, of course, that one should rush back to very slow interfaces: their time has gone forever. It is just that many of the capabilities of fast interfaces will only be realized in the future, or in conditions an ordinary user of an ordinary computer will never directly encounter (enthusiasts of benchmark bragging excepted). And that, actually, is all.

The operating modes of the PCI and ISA system buses are very important: setting incorrect values can lead to unstable operation of expansion cards and conflicts between them. Option location: the CHIPSET FEATURES SETUP item, Advanced (AWARD BIOS 6.0), Advanced Chipset Features.

PCI 2.1 Support - support for revision 2.1 of the PCI bus. On all modern computers this mode should be enabled (Enabled). An exception is possible only if your computer has older PCI expansion cards that do not support this revision, but in that case some PCI cards will refuse to work.

CPU to PCI Write Buffer - use of a buffer when transferring data from the processor to the PCI bus. Enabling (Enabled) this mode has a positive effect on the speed of the computer.

PCI Pipeline (PCI Pipelining) - enabling (Enabled) this option combines the accumulation of data going from the processor to the PCI bus with pipelined processing, which naturally increases performance.

PCI Dynamic Bursting - enables burst mode for data transfers over the PCI bus. To improve performance, this option should be enabled (Enabled).

PCI Master 0 WS Write - disables the delay in exchanges between bus-master devices on the PCI bus and RAM. When enabled (Enabled), this mode increases overall computer performance, but if expansion cards become unstable, the option will have to be disabled (Disabled).

Delayed Transaction (PCI Delay Transaction) - enabling this option allows simultaneous access to both slow ISA cards and fast PCI cards, which noticeably increases overall performance. Disabling it makes it impossible to access PCI devices while cards connected to the ISA bus are being accessed. Naturally, if ISA cards are used in your computer, this parameter should be enabled (Enabled).

Peer Concurrency - allows several devices connected to the PCI bus to operate in parallel. Naturally, for maximum performance the parameter should be enabled (Enabled). But not all expansion cards, especially older ones, support this feature; if the computer becomes unstable after enabling this option, set it to Disabled.



Passive Release - allows the PCI and ISA buses to operate in parallel. Enabling (Enabled) this option has a positive effect on computer performance.

PCI Latency Timer - the maximum number of PCI bus cycles during which a device connected to the bus may keep it busy when another device also needs access. Typically the bus may be held for 32 cycles. If individual expansion cards report errors or become unstable, increase this value.

16 Bit I/O Recovery Time - specifies the delay, in cycles, between issuing a read or write request and the operation itself for sixteen-bit expansion cards connected to the ISA bus. To start with, you can try the minimum delay of 1 cycle; if errors occur with such devices, increase the delay (to a maximum of 4 cycles). If no 16-bit expansion cards are connected to the ISA bus at all, you can specify NA.

AGP bus and video cards

Option location: the BIOS FEATURES SETUP, CHIPSET FEATURES SETUP and INTEGRATED PERIPHERALS menu items (AWARD BIOS 4.51PG and AMIBIOS 1.24), Advanced (AWARD BIOS 6.0), Advanced Chipset Features and Integrated Peripherals (AWARD BIOS 6.0PG and AMIBIOS 1.45).

AGP Aperture Size (Graphics Aperture Size, Graphics Window Size) - the maximum amount of system RAM that can be used to store textures for an AGP video card. As a rule, 64 MB is the optimal allocation.

AGP-2X (4X, 8X) Mode (AGP 4X Supported, AGP 8X Supported) - support for the AGP 2X (4X, 8X) mode. This parameter should be set only if the video card connected to the AGP bus can work in these modes without problems. For all modern video cards, support should be enabled (Enabled).

AGP Mode (AGP Capability) - allows you to specify the AGP mode to use. For all modern video cards, 8X mode should be enabled.

AGP Master 1 WS Write - adds one wait cycle when writing data over the AGP bus. Normally this is unnecessary and the option is better left off (Disabled); only if the video card then becomes unstable or artifacts appear, especially in games, should the additional wait cycle be enabled (Enabled).

AGP Fast Write - essentially the same setting as AGP Master 1 WS Write, but inverted: when this option is enabled (Enabled), data is written without delay; when it is disabled (Disabled), one wait cycle is added.

AGP Master 1 WS Read - adds one wait cycle when reading data over the AGP bus. The recommendations are the same.

AGP to DRAM Prefetch - enables prefetch mode, in which the next data is read automatically. Using (Enabled) this option improves performance.

PCI/VGA Palette Snoop - allows the colors of the video card and of an image captured by a video input/output (video editing) card to be synchronized. If colors are displayed incorrectly during video capture, enable the option (Enabled).

Assign IRQ For VGA - enabling this option reserves an interrupt for the video card. Although most modern video cards do not need a dedicated interrupt, for compatibility and stability it is still better to enable this option (Enabled). Only in case of a shortage of free interrupts (with a large number of expansion cards) may you try to forgo the reservation (Disabled).
