Monday, November 26, 2012

Data Center Transformation: Hierarchical Network to Flat Network


As I read the literature on data center networks, with respect to the enormous increase in data loads and the virtualization of servers, I see that the market is trending toward data center network architectures which are flat in nature. I also hear the term “Fabric” used to refer to data center networks. In this post, I will try to express my understanding of, and opinions on, this transformation of data center networks from 3-tier to flat.

Three-tier Network Architecture - Current Data center network

The network architecture that is dominant in current data centers is the three-tier architecture; most data center networks today are built on it. By three tiers, we mean access switches (Top-of-Rack (ToR) switches, or modular/End-of-Row (EoR) switches) that connect to servers and IP-based storage; aggregation switches, to which the access switches connect via Ethernet; and a set of core switches or routers that forward traffic flows from servers to the intranet and internet, and between the aggregation switches. Typically this can be depicted as follows:





You can see some blocked links in the picture; these links are blocked because of the Spanning Tree Protocol (STP) running in the network.

For detailed connections, with focus on access (ToR/EoR) switches connected to servers, you can always refer to my previous post, which shows a beautiful picture of the interconnections.

In this 3-tier architecture, it is common that VLANs are constructed within access and aggregation switches, while layer 3 capabilities in the aggregation or core switches route between them. Within the high-end data center market, where the number of servers is in the thousands to tens of thousands, where east-west bandwidth (server-to-server traffic) is significant, and where applications need a single layer 2 domain, the existing Ethernet or layer 2 capabilities within this tiered architecture do not meet emerging demands. When I say layer 2 capabilities, I mainly refer to the Spanning Tree Protocol, which keeps the network connected without any loops.

STP..STP…STP.. I thought it was good…what happened?

Radia Perlman created the Spanning Tree algorithm, which became part of the Spanning Tree Protocol (STP), to solve issues such as loops. Ms. Perlman certainly doesn’t need me to come to the defense of Spanning Tree–but I will. I like Spanning Tree, because it works. I would say that in at least 40% of the networks I see, Spanning Tree has never been changed from its default settings, but it keeps the network up, while at the same time providing some redundancy.
However, while STP solves significant problems, it also forces a network design that isn’t optimized for many of today’s data center requirements. For instance, STP paths are determined in a north-south tree, which forces traffic to flow from a top-of-rack switch out to a distribution switch and then back in again to another top-of-rack switch. By contrast, an east-west path directly between the two top-of-rack switches would be more efficient, but this type of path isn’t allowed under STP. The original 802.1D Spanning Tree can take up to 50 seconds to fail over to a redundant link. RSTP (802.1w) is much faster, but can still take up to 6 seconds to converge. It’s an improvement, but six seconds can still be an eternity in the data center.
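Where do these convergence numbers come from? They fall out of the default 802.1D timers. Here is a back-of-the-envelope sketch (the timer values are the 802.1D defaults; the exact worst case depends on the topology and the kind of failure):

```python
# Worst-case 802.1D convergence with default timers: an indirect link
# failure may only be noticed when BPDUs stop arriving (max_age), after
# which the replacement port must still pass through the listening and
# learning states (forward_delay each) before it forwards traffic.
MAX_AGE = 20        # seconds without BPDUs before topology info expires
FORWARD_DELAY = 15  # seconds spent in each of listening and learning

def stp_worst_case_convergence():
    return MAX_AGE + 2 * FORWARD_DELAY

print(stp_worst_case_convergence(), "seconds")  # 50 seconds
```

Tuning the timers down helps a little, but the state-machine delays are structural, which is why RSTP replaced them with an explicit handshake.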
                                                    

So, what is needed???

 The major problems that need to be solved in current networks which use spanning tree topologies are:
  1.     Poor path optimization
  2.     Failover timing
  3.     Limited or expensive reachability
  4.     Latency
Simply put, we need to be able to reach any machine, wherever it is in the network, while using the best path through the LAN to do so. This will lower latency, provide access to more bandwidth, and provide better ROI for the network infrastructure in the data center. If a device fails, we want to recover immediately and reroute traffic to redundant links.

 How does the existing tiered architecture need to change?

 One way to design a scalable data center fabric is often called a “fat-tree” and has two kinds of switches: one kind that connects servers, and a second kind that connects switches, creating a non-blocking, low-latency fabric. We use the term ‘leaf’ switch to denote the server-connecting switches and ‘spine’ to denote the switches that connect leaf switches. Together, a leaf-and-spine architecture creates a scalable data center fabric. Another design is to connect every switch together in a full mesh, with every server being one hop away from every other. I know a picture can help here quite a lot….
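To get a feel for how a two-tier leaf-and-spine fabric scales, here is a small sizing sketch. The port counts are hypothetical parameters I made up for illustration, not tied to any product:

```python
def leaf_spine_capacity(leaf_ports, spine_ports, uplinks_per_leaf):
    """Rough sizing of a two-tier leaf/spine fabric.

    leaf_ports       - total ports per leaf switch
    spine_ports      - ports per spine switch (one consumed per leaf)
    uplinks_per_leaf - leaf ports reserved for spine uplinks
    """
    server_ports_per_leaf = leaf_ports - uplinks_per_leaf
    max_leaves = spine_ports        # each spine needs one port per leaf
    max_spines = uplinks_per_leaf   # each leaf needs one port per spine
    max_servers = max_leaves * server_ports_per_leaf
    return max_spines, max_leaves, max_servers

# e.g. 64-port leaves with 16 uplinks each, 64-port spines:
print(leaf_spine_capacity(64, 64, 16))  # (16, 64, 3072)
```

The nice property is that every server is exactly two hops from every other server (leaf to spine to leaf), and adding spine switches adds east-west bandwidth without redesigning the topology.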



How does this flat network help in DC networks???

 The virtualization and consolidation of servers and workstations causes significant changes in network traffic, forcing IT to reconsider the traditional three-tier network design in favor of a flatter configuration. Tiered networks were designed to route traffic flows from the edge of the network through the core and back, which introduces choke points and delay while providing only rudimentary redundancy.

Enter the flat network. This approach, also called a fabric, allows for more paths through the network, and is better suited to the requirements of the data center, including the need to support virtualized networking, VM mobility, and high-priority storage traffic on the LAN such as iSCSI and FCoE. A flat network aims to minimize delay and maximize available bandwidth while providing the level of reachability demanded in a virtual world. 

Don’t think the flat network is Utopia…

It is not all ready-made or ready to deploy... a flat network also requires some tradeoffs, including the need to rearchitect your data center LAN and adopt either new standards such as TRILL (Transparent Interconnection of Lots of Links) and SPB (Shortest Path Bridging), or proprietary, vendor-specific approaches. There is a debate about how many people in the industry are willing to go for this rearchitecture. I could access a survey in this regard:


Commercial Sample Leaf & Spine Architecture

A commercial leaf-and-spine architecture built using Dell Force10 switches can be shown as follows. In this design, the following Force10 products are used:
  • Spine switches – 4 switches – Z9000 (32 x 40GE)
  • Leaf switches – 32 switches – S4810 (48 x 10GE)


You can see that each S4810 switch has connections to all four Z9000 switches. That is, each switch in the leaf network has multiple paths (four paths) to reach the spine network.
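If we assume each S4810 leaf uses four 40GE uplinks (one to each Z9000 spine) and all 48 of its 10GE ports face servers, the oversubscription ratio of this design works out as follows. Note the uplink count here is inferred from the design above, not taken from a datasheet:

```python
# Hypothetical bandwidth math for one leaf switch in the design above.
server_facing = 48 * 10   # 48 x 10GE ports toward servers, in Gbit/s
spine_facing = 4 * 40     # 4 x 40GE uplinks, one per spine switch

oversubscription = server_facing / spine_facing
print(f"{server_facing} G down, {spine_facing} G up "
      f"-> {oversubscription}:1 oversubscribed")
# 480 G down, 160 G up -> 3.0:1 oversubscribed
```

A 3:1 ratio is a common cost/performance compromise; a fully non-blocking fabric would need as much uplink bandwidth as server-facing bandwidth on every leaf.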



Conclusion….

 These kinds of flat networks are being proposed nowadays to solve problems with traditional STP-based data centers. While it is not a simple decision to go from a 3-tier to a flat network, flat networks are gaining momentum. With server virtualization becoming more prominent in current data centers, several other technologies related to this leaf-and-spine architecture need to be considered when evaluating whether a network needs to go flat or not… These technologies mainly include layer 2 multipathing technologies such as TRILL, SPB, M-LAG, VCS etc., which changed the equations of typical STP-based topologies.. We also need to understand several layer 2 extension technologies which are gaining prominence because of virtualization – NVGRE, VXLAN, Cisco OTV.. Another buzzword I see nowadays is SDN (Software Defined Networking).. All these aspects need to be understood thoroughly for adopting new-generation virtual networks for data centers…

[My next post contains my take on virtual networks with emphasis on L2 Multipathing, L2 extension and SDN]


Monday, November 19, 2012

Impact of Server Virtualization on Networking - 5


Port extension Technology

 VEPA raised some issues which are being tackled by port extension technologies. There are two standards corresponding to port extension technologies – IEEE 802.1Qbh and IEEE 802.1BR. Among these, IEEE 802.1Qbh was withdrawn by the IEEE on September 10th, 2011, while IEEE 802.1BR is active.

    Some years ago, Cisco introduced a new concept to data center networking called “Fabric Extenders”. Cisco uses the term ‘fabric extender’ while the IEEE uses the term ‘port extender’. Honestly, being marketing friendly – I like the term ‘fabric extender’.
    
   Typically, port extender technology connects servers to a controlling switch (edge switch) as shown below:

Cisco’s proprietary technology used in its FEX products became the basis for 802.1Qbh, an IEEE draft that is supposed to standardize the port extender architecture.

The core ideas behind 802.1Qbh are very simple:
  • After power-up, the port extender finds its controlling bridge (connected to the upstream port);
  • The port extender tells the controlling bridge how many ports it has;
  • The controlling bridge creates a logical interface for each port extender port and associates a tag value with it;
  • The port extender tags all packets received through its ports with the tags assigned by the controlling bridge.
Here the concept of tags comes in, in order to segregate each logical interface.
The external network switch connects to an external port extender using logical E-channels. These logical channels appear as virtual ports in the external network switch. Because the port extender has limited functionality, the external network switch manages all the virtual ports and their associated traffic.
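The power-up flow above can be sketched as a toy model. The class names and tag values here are invented for illustration; they are not taken from the 802.1Qbh draft:

```python
class ControllingBridge:
    """Toy model of the bridge side of port extension: one logical
    interface (and one tag value) per physical port-extender port."""

    def __init__(self):
        self.next_tag = 1
        self.logical_ifaces = {}   # tag -> (extender_id, port)

    def register_extender(self, extender_id, num_ports):
        # The extender reports its port count; the bridge creates a
        # tagged logical interface for each port and hands the tag
        # values back so the extender can tag ingress frames with them.
        tags = []
        for port in range(num_ports):
            tag = self.next_tag
            self.next_tag += 1
            self.logical_ifaces[tag] = (extender_id, port)
            tags.append(tag)
        return tags

bridge = ControllingBridge()
tags = bridge.register_extender("pe1", 4)
print(tags)                      # [1, 2, 3, 4]
print(bridge.logical_ifaces[3])  # ('pe1', 2)
```

The point of the model: all forwarding intelligence lives in the bridge's table of logical interfaces; the extender only stamps and strips tags.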

Port extenders either use existing proprietary Cisco technology with VN-tags or will use the upcoming E-tag from the draft IEEE 802.1BR port extension specification. The E-tag is longer than the VN-tag; it has different field definitions and different field locations, but serves the same purpose.
Port extenders use the information in VN-tags or 802.1BR E-tags to:

• Map the physical ports on the port extenders as virtual ports on the upstream switches

• Control how they forward frames to or from upstream switches

• Control how they replicate broadcast or multicast traffic

Here is a pic depicting both CISCO VN-Tag and E-Tag(802.1BR) 
So, how did the port extender solve the network management visibility problem for VM traffic?
All this funda of port extenders started because, with VEPA etc., the problem of management visibility into VM traffic came up. The introduction of port extension technology solves this problem by reflecting all network traffic onto a central controlling bridge. This gives network administrators full access and control, but at the cost of bandwidth and latency.
Hmmm... But.. there are problems with port extension technologies:
  1. Port extension technology adds one or more extra hops to the typical three-tier architecture and can magnify congestion problems.
  2. As data centers support more clustered, virtualized, and cloud-based applications requiring high performance across hundreds or thousands of physical and virtual servers, port extension technology just seems to add cost and complexity.
  3. Remember that the pre-standard VN-tags and the IEEE 802.1BR standard E-tags use different formats. If you adopt VN-tag solutions in your data center, you will have to develop transition strategies when future hardware changes to the IEEE 802.1BR E-tag format.
Ok.. Now the conclusion
In the past 5 posts, we discussed several aspects surrounding the impact of server virtualization on networking. We started with the fact that servers, by virtue of virtualization, have hypervisor software inside them. This hypervisor adds another layer of software called the virtual switch / Virtual Ethernet Bridge. This vSwitch adds complexities in terms of network management and VM mobility. We then discussed the different kinds of VEBs - software VEBs and hardware VEBs (SR-IOV). Issues associated with vSwitches/VEBs are targeted to be solved through IEEE 802.1Qbg by the introduction of Edge Virtual Bridging through VEPA (Virtual Ethernet Port Aggregator) technology and S-Channel (multi-channel VEPA) technology. While IEEE 802.1Qbg solved some problems of the vSwitch, it did raise some issues, which are tackled by IEEE 802.1Qbh and IEEE 802.1BR by introducing port extension technology. While IEEE 802.1Qbh was withdrawn last year, IEEE 802.1BR is active, and it did solve some problems while introducing some others. So, it all comes down to using these solutions effectively as per the use case. It also depends on how much IT budget we have and our IT needs in terms of server requirements.
Personally, I agree with what many experts in this area say - virtual switches won’t be going away anytime soon, but the configuration and management of these virtual network devices shouldn’t reside with the server team merely by virtue of their ownership of the underlying VM management platform. Until the technology allows virtual port management to be pulled into a comprehensive management tool, the network and server teams will have to share authority for the VM platform.

That ends my series of posts on "Impact of Server Virtualization on Networking" .


[I am thinking of a topic for my next blog posts. Most probably I will take up the one which is interesting me nowadays when I read the literature on data center networking.. I read quite a lot about leaf-and-spine architecture, data center fabric, and the transformation of DC networks from hierarchical to flat.. And the most famous - the movement of traffic patterns from "north-south" to "east-west"]...

Tuesday, November 13, 2012

Impact of Server Virtualization on Networking - 4


Edge Virtual Bridging (EVB) - IEEE 802.1qbg

In my previous posts, we discussed vSwitches/VEBs and SR-IOV technologies. None of the devices built upon these technologies can achieve the level of network capability that is built into enterprise-class L2 data center switches. Obviously, L2 data center switches are feature-rich and far richer in terms of capabilities. To solve the management challenges with VEBs, the IEEE 802.1Qbg standard is being developed. The primary goal of EVB is to combine the best of software and hardware VEBs with the best of external L2 network switches.


VEPA (Virtual Ethernet Port Aggregator) : 

EVB is based on VEPA technology. VEPA was proposed by HP and is taken as the basis for the IEEE 802.1Qbg standard. First, let us see what standard VEPA is. It is a way for virtual switches to send all traffic and forwarding decisions to the adjacent physical switch. This removes the burden of VM forwarding decisions and network operations from the host CPU. It also leverages the advanced management capabilities in the access or aggregation layer switches. Traffic between VMs within a virtualized server travels to the external switch and back through a reflective relay, or 180-degree turn (blue line shown in the following pic).


Do you see anything weird here? - Hairpinning

A packet sent out of a port travels to the edge switch and is received back on the same port. Normally, Ethernet frames are not forwarded back out of the same interface they came in on: this action, called hairpinning, would look like a loop at the port, and typical bridge behavior prevents a switch from forwarding a frame back down the port it was received on. But for VEPA-based EVB, we need exactly that phenomenon to happen. Simply, we need that hairpin turn. So, some solution needs to be implemented in the switch to allow such a hairpin turn.
EVB provides a standard way to solve the hairpinning problem. Basically, when a port on a switch is configured as a VEPA port, the standard proposes a negotiation mechanism between the physical server and the switch. Through this negotiation, the switch allows the hairpin turn.
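The change to the forwarding rule is actually tiny: a standard bridge filters any frame whose destination resolves to its ingress port, and a VEPA-enabled (hairpin) port relaxes exactly that check. A minimal sketch, with made-up names and a simplified MAC table:

```python
def egress_ports(mac_table, ingress_port, dst_mac, hairpin_ports=()):
    """Decide where a bridge forwards a frame.

    Standard rule: never forward out the port the frame came in on.
    Reflective relay: ports configured for VEPA may hairpin.
    """
    out = mac_table.get(dst_mac)
    if out is None:
        # Unknown destination: flood to all ports except the ingress
        # port (unless that port allows hairpinning).
        ports = set(mac_table.values())
        if ingress_port not in hairpin_ports:
            ports.discard(ingress_port)
        return sorted(ports)
    if out == ingress_port and ingress_port not in hairpin_ports:
        return []          # normal bridge behavior: filter the frame
    return [out]

table = {"vm-a": 1, "vm-b": 1, "srv-c": 2}
print(egress_ports(table, 1, "vm-b"))                     # [] - filtered
print(egress_ports(table, 1, "vm-b", hairpin_ports={1}))  # [1] - hairpinned
```

Two VMs behind port 1 can only talk through the switch once port 1 is negotiated into hairpin mode; everything else about forwarding stays untouched.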

Point to be noted here - current edge switch infrastructure needs a firmware update with this negotiation mechanism implemented in order for hairpin forwarding to occur.

The good thing about the VEPA-based solution is that it does not require new tags and involves only slight modifications to VEB operation, primarily in reflective relay support. VEPA continues to use MAC addresses and standard IEEE 802.1Q VLAN tags as the basis for frame forwarding, but changes the forwarding rules slightly according to the base EVB requirements.

That's some briefing on EVB with VEPA technology. Let's explore positives and negatives about it..




  •  As the processing overhead related to I/O traffic through the vSwitch is reduced, the server’s CPU and memory usage go down. As the adjacent switch performs the advanced management functions as well, there is some scope to use NICs with low-cost circuitry.. Some cost cutting there…Right???
  •  Now, the control point for VMs is moved to the edge switch (ToR/EoR). So, if a company has already bought a ToR/EoR switch, they do not need to change any infrastructure. VEPA leverages existing investments made in DC edge switching.
  • VEPA can also be implemented in the hypervisor or in an SR-IOV NIC. That gives investors the flexibility to have this either in the server or in the edge switch.





  •        VEPA-enabled EVB technology still does not solve the policy management problem across VMs that I mentioned in previous posts. So, policies attached to VMs still cannot follow them during VM movement.
  •       VEPA can also burden switches with more multicast and broadcast traffic (remember the negotiation mechanism that I mentioned for hairpinning mode).
  •      Switches cannot mix VEPA, VEB, and directly attached connections on the same port.






     S-Channel technology (also referred to as multi-channel VEPA):

      So, we discussed what VEPA is. But this VEPA technology does not satisfy all the use cases for which VEPA is meant. So, S-channel technology was introduced to satisfy some use cases which basic VEPA did not:
  1.          Cases where hypervisor functions require direct access to server NICs.
  2.         Cases where VMs would like to access the server NIC directly.
  3.         Cases where some VMs on the server would like to follow the VEB mechanism and other VMs on the server would like to follow VEPA - that is, sharing the same server NIC to allow both VEB and VEPA connections in order to optimize local VM-to-VM performance.
  4.      Directly mapping a VM that requires promiscuous mode of operation.

     So, to solve the problem of mapping different kinds of virtual connections onto the same server NIC connection, the obvious choice is to explore existing ways of segregating one physical connection into multiple logical connections. We already have such a solution - Service VLAN tags (S-Tags) from IEEE 802.1ad. These VLAN tags let you logically separate traffic on a physical network connection or port (like a NIC device) into multiple channels. Each logical channel operates as an independent connection to the external network.

     S-channel also defines two new port-based, link-level protocols:

      The Channel Discovery and Configuration Protocol (CDCP) allows discovery and configuration of the virtual channels. CDCP uses the Link Layer Discovery Protocol (LLDP) and enhances it for servers and external switches.
     The Virtual Station Interface (VSI) Discovery Protocol (VDP) and its underlying Edge Control Protocol (ECP) send the required attributes for physical and virtual connections to the external switch. VDP/ECP also lets the external switch validate connections and provide the appropriate resources.

      Obviously, a picture which depicts these agents in the server as well as in the edge switch will make understanding much better. The picture uses 802.1Qbg terminology. So, basically, these protocols need to be implemented at both ends in order for S-channel/multi-channel VEPA to work.


   
  How can customers (server admins/network admins) use S-channel?
      S-channel enables complex virtual network configurations in servers using VMs. You can assign each of the logical channels to any type of virtual switch (VEB, VEPA, or directly mapped to any virtual machine within the server). This lets IT architects match their application requirements with the design of their specific network infrastructure (something as shown in this pic).


        
      * VEB can be used for VM-to-VM traffic. VM-to-VM traffic does not need to hairpin now.
      * VEPA/EVB can be used for management visibility of the VM-to-VM traffic. As traffic goes to the edge switch, it can be monitored/managed using the edge switch's monitoring/management technologies.
    

   How issues with VEPA are tackled?

   VEPA raised some issues which are being tackled by bridge port extension technologies. There are two standards for this bridge port extension:

    IEEE 802.1Qbh - this uses Cisco's VN-Tag mechanism
    IEEE 802.1BR  - this uses the E-Tag mechanism.

     These will be explained in my next post.     

  [To be continued - Next post contains VN-Tag and E-Tag]

Tuesday, November 6, 2012

Impact of Server Virtualization on Networking - 3


what caused these problems precisely?


If you are an IT administrator/network manager/networking SW designer/networking silicon designer, did you ever think of a need for something as follows:

"A need for switching traffic from multiple, independent operating systems, each with distinct IP and MAC addresses, and sharing the same physical interface. "

This is one situation the Ethernet protocol itself never anticipated. Thus, the virtual switch design that sits above the hypervisor, which does exactly what is specified above, still stands as an “ad hoc solution” to the problem of VM switching.


Most of the issues came from the introduction of the vSwitch, which is also called a Virtual Ethernet Bridge (VEB). Before going into the proposed solutions, let us first explore how a Virtual Ethernet Bridge works… However, remember that we are still talking about the vSwitch in the server; we have not yet entered into any solution in edge switches.

Virtual Ethernet Bridges

As I mentioned in the previous post, a Virtual Ethernet Bridge (VEB) is a virtual Ethernet switch that you implement in a virtualized server environment. It is anything that mimics a traditional external layer 2 (L2) switch or bridge for connecting VMs. VEBs can switch traffic between VMs on a single physical server, or they can connect VMs to the external network.

The most common implementations of VEBs are software-based or hardware-based.

Software-based VEBs – Virtual Switches

In a virtualized server, the hypervisor abstracts and shares physical NICs among multiple virtual machines, creating virtual NICs for each virtual machine. For the vSwitch, the physical NIC acts as the uplink to the external network. The hypervisor implements one or more software-based virtual switches that connect the virtual NICs to the physical NICs.

Data traffic received by a physical NIC passes to a vSwitch. The vSwitch uses its hypervisor-based configuration information to forward traffic to the correct VMs. When a VM transmits traffic from its virtual NIC, the vSwitch forwards the traffic in one of two ways:
• If the destination is external to the physical server or to a different vSwitch, the vSwitch forwards traffic to the physical NIC. (blue line in pic)
• If the destination is internal to the physical server on the same vSwitch, the vSwitch forwards the traffic directly back to another VM. (gray line in pic)
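Those two cases amount to a simple lookup in the vSwitch. A minimal sketch, with invented names and MAC addresses:

```python
def vswitch_forward(local_vms, dst_mac):
    """Forward a frame arriving from a VM's virtual NIC.

    local_vms maps MAC address -> VM name for VMs on this vSwitch.
    Returns ('local', vm_name) for VM-to-VM traffic on the same
    vSwitch (the gray line), or ('uplink', None) to send the frame
    out of the physical NIC (the blue line).
    """
    if dst_mac in local_vms:
        return ("local", local_vms[dst_mac])
    return ("uplink", None)

vms = {"00:50:56:00:00:01": "web-vm", "00:50:56:00:00:02": "db-vm"}
print(vswitch_forward(vms, "00:50:56:00:00:02"))  # ('local', 'db-vm')
print(vswitch_forward(vms, "00:50:56:99:99:99"))  # ('uplink', None)
```

The 'local' branch is exactly the traffic that never reaches the physical network, which is why external tools have no visibility into it; that blind spot is what the later posts on VEPA and port extension set out to fix.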


I already discussed in my previous post on what kind of problems these VEBs can cause.

Hardware-based VEBs – Single Root –I/O Virtualization enabled NICs

The PCI Special Interest Group (PCI-SIG) proposed a technique called SR-IOV (Single Root I/O Virtualization) which moves VEB functionality into an intelligent NIC instead of the vSwitch. Moving VEB functionality into hardware reduces the performance issues associated with vSwitches. SR-IOV essentially carves up an intelligent NIC into multiple virtual NICs – one for each VM. But how does it do that? It does this by providing independent memory space, interrupts, and DMA streams for each VM.

SR-IOV-enabled NICs let the virtual NICs bypass the hypervisor vSwitch by exposing the virtual NIC functions directly to the guest OS. Thus, the NIC reduces latency between the VM to the external port significantly. The hypervisor continues to allocate resources and handle exception conditions, but it doesn’t need to perform routine data processing for traffic between the VMs and the NIC.

In a VEB implemented as an SR-IOV NIC, traffic flows the same way as with a vSwitch. Traffic can switch locally inside the VEB (gray line in the picture) or go directly to the external network (blue line in the picture).


Intel mainly drove the SR-IOV effort, and it seemed like a promising solution when Intel proposed it several years ago, but it has failed to gain market momentum due to poor interoperability between NICs and scalability concerns as the number of VMs per server grows. Aside from lackluster industry adoption, the problem is that each embedded bridge is yet another device to manage (no different from a software switch), and that management function is not integrated with the overall network management system. Due to implementation differences (that is, extended functions not part of the standard), different NICs may have different bridging capabilities, and these often don’t interoperate.

Ok.. Enough problems….What is the solution??

How about ditching this virtual switch which is causing all the problems? But the vSwitch/VEB has its own advantages as well (of course, without those advantages, the vSwitch could not have come about, right?). Let me recap what the vSwitch is useful for:

·        Good for VM-to-VM switching within a server
·        Can connect to the external networking environment
·        Good for deployment when there is no need for an external switch (how come?? For example, you can run a local network between a web server and a firewall application running on separate VMs within the same physical server)



Sounds reasonable?? So, we might still need the VEB while being able to solve the issues it causes. Anyhow, even if we wanted to completely remove VEB functionality and move bridging entirely to the edge device, what about the devices already in the field which are using VEBs? So, that's another case where we might want to keep existing server advancements intact and still be able to solve these issues.

This is where two new IEEE standards projects come in, with work proceeding on two parallel and largely complementary paths. These two solutions involve edge devices, where every good network engineer believes switching belongs. Both are amendments to the base IEEE 802.1Q VLAN tagging standard.

Edge Virtual Bridging (EVB) – IEEE 802.1Qbg :

This involves VEPA (Virtual Ethernet Port Aggregator) and S-Channel Variants.

Port Extension Technology – IEEE 802.1BR and IEEE 802.1Qbh :

This involves using Cisco’s VN-Tag or 802.1BR E-Tag


I will discuss each of these in my next posts....

[To Be continued - Next post contains EVB, VEPA, S-Channel]