Infrastructure as a Service (IaaS) for cloud environments provides compute, storage, networks, and other fundamental computing resources. To support multi-tenant cloud environments, IaaS relies on virtualization, but conventional virtual (overlay) network architectures for IaaS have been a direct cause of scalability limitations in those environments. In particular, IaaS virtual networks are limited by problems with high availability and load balancing. To solve these problems, we present EYWA, a virtual network architecture that scales to support huge data centers with high availability, load balancing, and large layer-2 semantics. The design of EYWA overcomes these limitations by (1) accommodating, from the cloud service providers' view, a large number of tenants (about 2^24 = 16,777,216) through virtual LANs, each a logically isolated network with its own IP range, and, from the consumers' view, by providing (2) public network service per tenant without a throughput bottleneck or single point of failure (SPOF) in Source and Destination Network Address Translation (SNAT/DNAT) and (3) a single large IP subnet per tenant through large layer-2 semantics. EYWA combines existing techniques into a decentralized scale-out control and data plane. The only component of EYWA is an agent in every hypervisor host that controls packets, and together the agents act as a distributed controller. As a result, EYWA can be deployed in any of today's multi-tenant cloud environments.
4. 3/58EYWA
Contents
1. The conventional Network Architectures
① Physical Networks
② Virtual (Overlay) Networks
2. The Architecture using Legacy Protocols
3. EYWA
4. Summary
5. Scenarios (Traffic Flows)
① Physical Networks
The conventional Data center
Problems & Limitations
New Data center Network Architectures
The Comparison of the new Architectures
Monsoon
Etc
Confidential
Problems & Limitations
1. Fragmentation of Resources
2. No Performance Isolation
3. Poor server-to-server connectivity
4. Need very high reliability near top of the tree (Single Point of Failure)
70~80% of the packets stay inside the data center
The Comparison of the new DC Network Architectures
Architectures: Monsoon; VL2; SEATTLE; FAT-TREE; PortLand; SPAIN; MOOSE; TRILL; DCell; BCube; MDCube
Org.: MS Research; Univ. of Princeton; Univ. of California San Diego; HP; Univ. of Cambridge; MS Research Asia
Publishing: SIGCOMM 2008; SIGCOMM 2009; SIGCOMM 2008; SIGCOMM 2008; SIGCOMM 2009; NSDI 2010; DC CAVES Workshop 2009; RFC 5556 2009; SIGCOMM 2008; SIGCOMM 2009; CoNEXT 2009
Authors: Albert Greenberg…; Albert Greenberg, Changhoon…; Changhoon Kim…; M. Al-Fares…; R.N. Mysore…; J. Mudigonda, M. Al-Fares…; M. Scott…; Radia Perlman; C. GUO…; C. GUO…; H. Wu, C. GUO…
Topology: Clos Network; Clos Network; N/A; Fat-Tree; Fat-Tree; N/A; N/A; N/A; BCube Topology
Packetizing: MAC-in-MAC (802.1ah PBB); IP-in-IP; IP-in-IP(?); IP rewriting; MAC rewriting (PMAC); MAC rewriting; TRILL Hdr
Load Spreading: MAC-Rotation; ECMP; ECMP; ECMP; ECMP
Multi-path: O; O; X; O; O; O; X; O
Mod. of End-Host?: O; O; X; X; X; O; X; X; O
Mod. of switches?: O; X; O; O (Special HW); O (Special HW); X; O (RBridge); △
ARP: Directory Server; Directory Server; DHT on the switches; Fabric Manager; ESADI
Problems & Limitations
1. Public Networks
① High Availability
SPOF of a single (Virtual or Physical) router
② Load Sharing & Balancing
The throughput bottleneck of a single router (SNAT/DNAT)
Scale-up (Physical Router)
Additional layer-4 load balancing service like AWS ELB
or additional physical load balancer (scale-up)
③ Traffic Engineering
Additional waste of network bandwidth
Increased latency to traverse a single router
2. Private Networks
① Layer 2 network is hard to scale out
Broadcast
MAC Flooding
STP
② VLAN (802.1Q) limit
VLAN ID limit of 4,094
3. Cost & Scalability
AWS - Classic & ELB
1. Classic (no VPC) Problems
① No private network per tenant
Higher latency than L2
② No traffic isolation per tenant
Shared GW with other tenants
2. ELB Limitations
① Layer-4 load balancing only
② TCP only
Ports 25, 80, 443 & 1024~65535
③ Domain name only
No static IP & no interoperation with GSLB & firewall
AWS - VPC
3. VPC Limitations
① B-Class size (65,536 addresses)
Smaller layer 2 subnetting
② Internet Gateway bottleneck
③ Limited number of VPCs
④ ELB consumes private IPs
⑤ L2 network cannot be extended over VPN
CloudStack - Traffic Flows
[Figure: Tenant A and Tenant B VMs reach the INTERNET through their tenant's virtual router (VR-A, VR-B). Legend: Tenant-A Public Traffic (Remote VR), Tenant-A Public Traffic (Local VR), Tenant-A Private Traffic.]
CloudStack
1. RVM (Router VM): HAProxy-based
① Feature
Port forwarding & layer-4 load balancing by RVM
② Algorithm
Round robin & least connection
③ Advantage
Load balancing without an additional component
④ Limitations
RVM bottleneck
Layer-4 load balancing only
2. VPX
① Feature
Layer-4 load balancing by VPX device
② Algorithm
Round robin & least connection
③ Advantage
Performance
④ Limitations
Additional HW
Not scalable
Layer-4 load balancing only
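The two algorithms named for both the RVM and the VPX, round robin and least connection, can be sketched in a few lines. This is a generic illustration of the techniques, not the HAProxy or VPX implementation. The class names, method names, and backend labels are assumptions made for the example.

```python
# Generic sketches of round-robin and least-connection backend selection.
# Class names and backend labels are hypothetical; this is not HAProxy code.
from itertools import cycle
from typing import Dict, Iterable


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._rotation = cycle(list(backends))

    def pick(self) -> str:
        return next(self._rotation)


class LeastConnection:
    """Hand out the backend that currently has the fewest open connections."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def pick(self) -> str:
        # min() returns the first backend among ties (dict insertion order).
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # Call when a connection to `backend` closes.
        if self.active[backend] > 0:
            self.active[backend] -= 1


rr = RoundRobin(["vm1", "vm2", "vm3"])
print([rr.pick() for _ in range(4)])

lc = LeastConnection(["vm1", "vm2"])
print(lc.pick(), lc.pick())
lc.release("vm2")
print(lc.pick())
```

Both selectors run inside a single balancer instance, which is why the slide lists the RVM itself as the bottleneck regardless of the algorithm chosen.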
OpenStack
1. Problems
① Dedicated Network Node & Network Node Bottleneck
② By Gartner: OpenStack in the enterprise? Ha ha ha, you must be joking.
The difference between Amazon and OpenStack, though, is that Amazon's core services such as EC2,
S3, and others are stable, while some of OpenStack's core tech such as its Neutron networking
layer are very, very weak.
http://www.theregister.co.uk/2013/11/20/gartner_openstack_criticism/
③ By HP: OpenStack’s networking nightmare Neutron was everyone’s fault
http://www.theregister.co.uk/2014/05/13/openstack_neutron_explainer/
OpenStack - Neutron/DVR
1. DVR (Distributed Virtual Router)
① https://wiki.openstack.org/wiki/Neutron/DVR
② Limitations
http://www.slideshare.net/carlbaldwin/dvr-slides
SNAT is centralized in Network Node.
The Architecture using Legacy Protocols
1. Traffic Flows of Architecture
2. Advantages & Limitations
3. (M)VRRP
4. VxLAN
Advantages
1. Public Network
① Load Sharing & Balancing
Load Balancing by 254 VRs per tenant
Scalable
For Inbound & Outbound Traffic
② High Availability
HA by 254 VRs per tenant
③ Traffic Engineering
Network bandwidth savings
Low latency
2. Private Network
① A large number of tenants
Traffic Isolation by VxLAN (2^24 = 16,777,216)
② Large layer 2 network
VxLAN
Multicast instead of Broadcast (VxLAN)
Decrease in MAC Flooding
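The 2^24 figure for VxLAN-based isolation comes from the 24-bit VXLAN Network Identifier (VNI) carried in the 8-byte VXLAN header defined in RFC 7348. The sketch below packs and unpacks that header to show where the identifier sits. Mapping one VNI to one tenant is an assumption made for this example, and the function names are hypothetical.

```python
# Pack and unpack the 8-byte VXLAN header (RFC 7348 layout):
#   byte 0     flags (0x08 = "VNI present")
#   bytes 1-3  reserved
#   bytes 4-6  24-bit VNI
#   byte 7     reserved
# Function names are hypothetical; one VNI per tenant is an assumption.
import struct

VXLAN_FLAG_VNI_VALID = 0x08
MAX_VNI = 2 ** 24 - 1


def pack_vxlan_header(vni: int) -> bytes:
    """Build the VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI must fit in 24 bits, got {vni}")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


tenant_vni = 5001  # hypothetical VNI assigned to one tenant
header = pack_vxlan_header(tenant_vni)
print(header.hex())
print(unpack_vni(header))
```

Because the field is 24 bits wide, any value above 16,777,215 is rejected, which is the upper bound on isolated segments the slides refer to.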
Limitations
1. Public Network
① Performance
Limited Scalability: Maximum 254 VRs per tenant
② Load Sharing
Bottleneck at the primary backup VR on takeover
VMs need different default GW IP addresses
③ Traffic Engineering
Network bandwidth wasted on advertisement packets
Inefficient default GW for VMs
2. Private Network
① Large layer 2 network
ARP Broadcasting
② Each VR consumes the private IP address pool
The EYWA
1. Traffic Flows
2. Advantages
3. The Architecture
4. Demo
5. The Comparison of the new Architectures
6. Summary
Traffic Flows on EYWA
[Figure: Tenant 1 and Tenant 2 VMs reach the INTERNET through their tenant's virtual routers (VR-1-1, VR-1-2, VR-2-1, VR-2-2). Legend: Tenant-1 Public Traffic (Orphan), Tenant-1 Public Traffic (Normal), Tenant-1 Private Traffic.]
Advantages
1. Public Network
① Load Sharing & Balancing
Load balancing across an unlimited number of VRs
Scalable
Load-balanced inbound & outbound traffic
② High Availability
Active-active structure with an unlimited number of VRs
③ Traffic Engineering
Network bandwidth savings
Low latency
2. Private Network
① A large number of tenants
Traffic Isolation by VXLAN (16,777,216)
② Large layer 2 network
VxLAN
Multicast instead of Broadcast (VxLAN)
Decrease in MAC Flooding
Agent
Eliminates broadcast
3. VM Migration
4. No additional servers or hardware
The Architecture of EYWA
[Figure: Tenant A and Tenant B each have their own VLAN (Tenant-A VLAN, Tenant-B VLAN) and several virtual routers (VR-A-1, VR-A-2, VR-B-1, VR-B-2) connected to the Internet. Every VR and VM group is labeled GW: 10.0.0.1. Labels: Public-IP for VR, Public-IP for VM. Addresses shown include 111.2.3.11, 111.2.3.13, 111.2.3.15, 111.2.3.111 to 111.2.3.113 (public) and 10.0.0.2 to 10.0.0.23 (private). VMs are marked Normal mode or Orphan mode. Callout: HA & LB & Scalable.]
VR (Virtual Router) | VR-{Tenant}-1 | VR-{Tenant}-2 (~Unlimited)
Function | NAT, LB, VPN, DHCP | NAT, LB, VPN
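The slides label VMs as running in Normal mode or Orphan mode and show every VR of a tenant answering as the same gateway (10.0.0.1), but they do not spell out how a VM is bound to a VR. The sketch below is one possible reading, stated as an assumption: a VM is "normal" when a VR of its tenant runs on the same hypervisor host and "orphan" otherwise, and orphans are spread over the tenant's VRs by hashing the VM's MAC address. All names, the hashing rule, and the host labels are hypothetical and not taken from the deck.

```python
# Hypothetical gateway selection for Normal vs Orphan VMs.
# ASSUMPTION: "normal" = a same-tenant VR exists on the VM's host,
# "orphan" = no local VR, so a remote VR is chosen by hashing the VM's MAC.
# None of these names or rules are confirmed by the slides.
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VirtualRouter:
    name: str     # e.g. "VR-A-1"
    host: str     # hypervisor host the VR runs on
    tenant: str   # tenant the VR belongs to


def select_gateway_vr(
    vm_mac: str, vm_host: str, tenant: str, routers: List[VirtualRouter]
) -> Tuple[VirtualRouter, str]:
    """Return the VR that should serve the VM and the resulting mode."""
    candidates = [r for r in routers if r.tenant == tenant]
    if not candidates:
        raise LookupError(f"no VR available for tenant {tenant!r}")
    local = [r for r in candidates if r.host == vm_host]
    if local:
        return local[0], "normal"
    # Deterministic spread of orphan VMs over the tenant's VRs.
    index = zlib.crc32(vm_mac.encode("ascii")) % len(candidates)
    return candidates[index], "orphan"


routers = [
    VirtualRouter("VR-A-1", "host1", "A"),
    VirtualRouter("VR-A-2", "host2", "A"),
    VirtualRouter("VR-B-1", "host2", "B"),
]
print(select_gateway_vr("52:54:00:00:00:15", "host1", "A", routers))
print(select_gateway_vr("52:54:00:00:00:16", "host3", "A", routers))
```

Under this reading, adding a VR to a host turns that host's orphan VMs into normal ones without changing the gateway address the VMs are configured with.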
The Architecture of EYWA
[Figure: host-level view of Tenant A and Tenant B. Labels shown per hypervisor host: Agent (one host is also labeled Controller), VSe (Common), VSi A / VSi B, vport-A / vport-B, vtep-A / vtep-B, vnet0 to vnet3, peth0, peth1, eth0, eth1. The hosts connect to Switch (Public Net.) and Switch (Private Net.). VRs: VR-A-1 (111.2.3.11, 10.0.0.1/8), VR-A-2 (111.2.3.12, 10.0.0.1/8), VR-B-1 (111.2.3.21, 10.0.0.1/8), VR-B-2 (111.2.3.22, 10.0.0.1/8). VMs are marked Normal or Orphan.]
The Comparison of the Virtual Network Architectures
Component | AWS | OpenStack | CloudStack | EYWA
Tenants (Virtual LAN): number of tenants | ? (5 VPCs per tenant in a region) | 2^24 = 16,777,216 | 2^24 = 16,777,216 | 2^24 = 16,777,216
Public Network, Outbound: routers per tenant | One per VPC (Internet GW) | 1 (L3 Agent) | 1 (RVM) | Unlimited (VRs)
Public Network, Outbound: router deployment | ? | All VRs on a single server | All VRs on distributed servers | All VRs on distributed servers
Public Network, Outbound: HA of router | ? | Active-Standby | Active-Standby | Highly available
Public Network, Outbound: LB of router | X | X | X | O
Public Network, Outbound: SPOF or bottleneck | Internet GW | L3 Agent & Network Controller | RVM | Nothing
Public Network, Inbound: LB | ELB | LBaaS | RVM or VPX | Unlimited VRs
Public Network, Inbound: SPOF or bottleneck | ELB (default limit 20) | Network Node | RVM or VPX | Nothing
Private Network: Layer 2 network | Medium (under B-Class) | Small private network | Small private network | Large (A-Class, 2^24 = 16,777,216)
Router & LB | Internet GW & ELB | L3 Agent & LBaaS | All-in-one (RVM) or added LB (VPX) | All-in-one (VRs)
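The "B-Class" and "A-Class" sizes in the Layer 2 row correspond to /16 and /8 prefixes. The sketch below uses Python's standard `ipaddress` module to confirm the address counts quoted in the table. The 10.0.0.0/8 range matches the 10.0.0.1/8 gateway shown in the architecture slides, while the 172.16.0.0/16 range is only an example of a /16 and is not taken from the deck.

```python
# Address counts for a Class A sized (/8) and a Class B sized (/16) subnet.
# 172.16.0.0/16 is an arbitrary example of a /16, not a value from the slides.
import ipaddress

a_class = ipaddress.ip_network("10.0.0.0/8")
b_class = ipaddress.ip_network("172.16.0.0/16")

print(f"{a_class}: {a_class.num_addresses:,} addresses")
print(f"{b_class}: {b_class.num_addresses:,} addresses")

# VM and gateway addresses from the architecture slides share the one /8 subnet.
gateway = ipaddress.ip_address("10.0.0.1")
vm = ipaddress.ip_address("10.0.0.21")
print(gateway in a_class and vm in a_class)
```

The /8 subnet offers 256 times as many addresses as the /16, which is the gap the table summarizes as "Large" versus "Medium".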
Summary
1. Advantages
① Cloud service providers’ view
A large number of tenants (about 2^24 = 16,777,216) by using virtual LANs
② Consumers’ view
Public network service per tenant without a throughput bottleneck or SPOF on SNAT/DNAT
Private network (a single large IP subnet) per tenant
2. Architecture
① Scales to support huge data centers with high availability, load balancing and large layer-2 semantics.
② Decentralized scale-out control and data plane.
③ The only component is an agent in every hypervisor host, and the agents act as a distributed controller.
3. EYWA can be deployed in any of today's multi-tenant cloud environments.