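Architecture Analysis of Stack Overflow, a Large .NET-Based Website

Original article: Stack Overflow Architecture Update - Now At 95 Million Page Views A Month

Compiled by: cnblogs

Stack Overflow URL: http://stackoverflow.com/

Current traffic: 95 million page views per month (over 3 million per day)

Current Alexa rank: 149

.NET technologies used: C#, Visual Studio 2010 Team Suite, ASP.NET 4, ASP.NET MVC 3, Razor, LINQ to SQL + raw SQL

The original English article follows: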

A lot has happened since my first article on the Stack Overflow Architecture (2009-8-5). Contrary to the theme of that last article, which lavished attention on Stack Overflow's dedication to a scale-up strategy, Stack Overflow has both grown up and out in the last few years.

Stack Overflow has grown up by more than doubling in size to over 16 million monthly unique users and multiplying its number of page views nearly 6 times to 95 million page views a month.

Stack Overflow has grown out by expanding into the Stack Exchange Network, which includes Stack Overflow, Server Fault, and Super User for a grand total of 43 different sites. That's a lot of fruitful multiplying going on.

What hasn't changed is Stack Overflow's openness about what they are doing, and that's what prompted this update. A recent series of posts talks a lot about how they've been handling their growth: "Stack Exchange's Architecture in Bullet Points", "Stack Overflow's New York Data Center", "Designing For Scalability of Management and Fault Tolerance", "Stack Overflow Search - Now 81% Less", "Stack Overflow Network Configuration", "Does StackOverflow use caching and if so, how?", and "Which tools and technologies build the Stack Exchange Network?".

Some of the more obvious differences across time are:

  • Just More. More users, more page views, more datacenters, more sites, more developers, more operating systems, more databases, more machines. Just a lot more of more.
  • Linux. Stack Overflow was known for its Windows stack; now it uses a lot more Linux machines for HAProxy (load balancing), Redis (NoSQL database), Bacula (backups), Nagios (monitoring), logs, and routers. All support functions seem to be handled by Linux, which has required the development of parallel release processes.
  • Fault Tolerance. Stack Overflow is now served through two different switches on two different Internet connections, redundant machines have been added, and some functions have moved to a second datacenter.
  • NoSQL. Redis is now used as a caching layer for the entire network. There wasn't a separate caching tier before, so this is a big change, as is running a NoSQL database on Linux.
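The NoSQL point is the biggest architectural change: a dedicated caching tier in front of SQL Server. The sources here do not describe how Stack Overflow reads and writes that cache, so the following is only a minimal sketch of the common cache-aside pattern. `InMemoryCache` is a hypothetical in-process stand-in for Redis (its `get`/`setex` names merely mirror the Redis GET and SETEX commands; it is not the redis-py client), and the `question:{id}` key scheme and 60-second TTL are assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a Redis-like cache: string keys, values with a TTL.

    Method names mirror the Redis GET and SETEX commands; this is NOT redis-py.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict an expired entry on read
            return None
        return value

    def setex(self, key: str, ttl_seconds: float, value: object) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_question(cache: InMemoryCache, question_id: int,
                 load_from_db: Callable[[int], object],
                 ttl_seconds: float = 60.0) -> object:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"question:{question_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(question_id)
    cache.setex(key, ttl_seconds, value)
    return value


if __name__ == "__main__":
    now = [0.0]
    cache = InMemoryCache(clock=lambda: now[0])
    db_calls = []

    def load(qid: int) -> dict:
        db_calls.append(qid)  # record every trip to the "database"
        return {"id": qid, "title": "example"}

    get_question(cache, 1, load)  # miss: loads from the database
    get_question(cache, 1, load)  # hit: served from the cache
    now[0] = 61.0                 # advance past the 60 s TTL
    get_question(cache, 1, load)  # expired: loads again
    print(len(db_calls))
```

With the injectable clock, the first call misses and loads from the database, the second is served from the cache, and once the TTL has passed the entry is reloaded, so the script prints 2. A real deployment would also need to overwrite or delete keys when the underlying data changes.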

Unfortunately, I couldn't find any coverage of some of the open questions I had last time, like how they were going to deal with multi-tenancy (an architecture in which a single software deployment serves multiple client organizations, or tenants) across so many different properties, but there's still plenty to learn from. Here's a roll-up from a few different sources:

The Stats

  • 95 Million Page Views a Month
  • 800 HTTP requests a second
  • 180 DNS requests a second
  • 55 Megabits per second
  • 16 Million Users - Traffic to Stack Overflow grew 131% in 2010, to 16.6 million global monthly uniques.
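These figures can be cross-checked with simple arithmetic. The sketch below assumes a 30-day month, evenly spread traffic, that the 55 Mbit/s is all response traffic, and that requests divide evenly over the 10 IIS web servers listed under Hardware; the source states none of these, so treat the results as rough orders of magnitude.

```python
# Back-of-envelope checks on the published stats.
# Assumptions (not from the source): 30-day month, no traffic peaks,
# 55 Mbit/s is all response traffic, load split evenly over 10 web servers.
PAGE_VIEWS_PER_MONTH = 95_000_000
HTTP_REQUESTS_PER_SEC = 800
BANDWIDTH_MBIT_PER_SEC = 55
UNIQUES_2010 = 16_600_000
GROWTH_2010 = 1.31          # "grew 131% in 2010"
WEB_SERVERS = 10            # from the Hardware section
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60


def page_views_per_day() -> float:
    return PAGE_VIEWS_PER_MONTH / DAYS_PER_MONTH


def page_views_per_sec() -> float:
    return page_views_per_day() / SECONDS_PER_DAY


def kbytes_per_request() -> float:
    """Average response size implied by 55 Mbit/s over 800 requests/s (1 kB = 1000 B)."""
    bytes_per_sec = BANDWIDTH_MBIT_PER_SEC * 1_000_000 / 8
    return bytes_per_sec / HTTP_REQUESTS_PER_SEC / 1000


def uniques_2009() -> float:
    """131% growth means uniques_2010 = uniques_2009 * (1 + 1.31)."""
    return UNIQUES_2010 / (1 + GROWTH_2010)


def requests_per_web_server() -> float:
    return HTTP_REQUESTS_PER_SEC / WEB_SERVERS


if __name__ == "__main__":
    print(f"page views/day:      {page_views_per_day():,.0f}")
    print(f"page views/s (avg):  {page_views_per_sec():.1f}")
    print(f"kB per HTTP request: {kbytes_per_request():.2f}")
    print(f"2009 uniques:        {uniques_2009():,.0f}")
    print(f"requests/s per IIS:  {requests_per_web_server():.0f}")
```

This works out to about 3.17 million page views a day (consistent with the "over 3 million per day" figure at the top), about 37 page views per second on average, roughly 8.6 kB per HTTP request, about 7.2 million monthly uniques before the 2010 growth, and 80 requests per second per web server. The gap between ~37 page views/s and 800 HTTP requests/s suggests either that one page view triggers many HTTP requests or that 800 is a peak rather than an average; the source does not say which.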

Data Centers

  • 1 Rack with Peak Internet in OR (hosts our chat and Data Explorer)
  • 2 Racks with Peer 1 in NY (hosts the rest of the Stack Exchange Network)

Hardware

  • 10 Dell R610 IIS web servers (3 dedicated to Stack Overflow):
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz Quad Core with 8 threads
    • 16 GB RAM
    • Windows Server 2008 R2
  • 2 Dell R710 database servers:
    • 2x Intel Xeon Processor X5680 @ 3.33 GHz
    • 64 GB RAM
    • 8 spindles
    • SQL Server 2008 R2
  • 2 Dell R610 HAProxy servers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 4 GB RAM
    • Ubuntu Server
  • 2 Dell R610 Redis servers:
    • 2x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
    • CentOS
  • 1 Dell R610 Linux backup server running Bacula:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 1 Dell R610 Linux management server for Nagios and logs:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 2 Dell R610 VMware ESXi domain controllers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
  • 2 Linux routers
  • 5 Dell PowerConnect switches

Dev Tools

  • C#: Language
  • Visual Studio 2010 Team Suite: IDE
  • Microsoft ASP.NET (version 4.0): Framework
  • ASP.NET MVC 3: Web Framework
  • Razor: View Engine
  • jQuery 1.4.2: Browser Framework
  • LINQ to SQL, some raw SQL: Data Access Layer
  • Mercurial and Kiln: Source Control (distributed version control)
  • Beyond Compare 3: Compare Tool (file comparison)

Software and Technologies Used

  • Stack Overflow uses a WISC stack via BizSpark
  • Windows Server 2008 R2 x64: Operating System
  • SQL Server 2008 R2 running Microsoft Windows Server 2008 Enterprise Edition x64: Database
  • Ubuntu Server
  • CentOS
  • IIS 7.0: Web Server
  • HAProxy: for load balancing (high-performance TCP/HTTP load balancer)
  • Redis: used as the distributed caching layer (NoSQL database)
  • CruiseControl.NET: for builds and automated deployment (continuous integration tool for .NET)
  • Lucene.NET: for search
  • Bacula: for backups (open-source backup system)
  • Nagios (with n2rrd and drraw plugins): for monitoring (remote monitoring of system status and network information)
  • Splunk: for logs (log analysis tool)
  • SQL Monitor: from Red Gate, for SQL Server monitoring
  • Bind: for DNS
  • Rovio: a little robot (a real robot) allowing remote developers to visit the office "virtually."
  • Pingdom: an external monitor and alert service (website monitoring and speed testing)
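HAProxy spreads incoming requests across the IIS web servers and stops sending traffic to servers that fail health checks. HAProxy itself is driven by its own configuration file rather than code; the Python below is only a toy model of round-robin selection that skips unhealthy backends, to show the idea. The backend names (`web01` to `web10`) are hypothetical.

```python
from typing import Dict, List


class RoundRobinPool:
    """Toy round-robin balancer that skips backends marked unhealthy.

    Models the idea behind round-robin balancing with health checks;
    it is not HAProxy and not its configuration.
    """

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in self._backends}
        self._next = 0  # index of the next backend to try

    def mark(self, backend: str, healthy: bool) -> None:
        """Record a health-check result (a real balancer probes periodically)."""
        if backend not in self._healthy:
            raise KeyError(backend)
        self._healthy[backend] = healthy

    def pick(self) -> str:
        """Return the next healthy backend in rotation order."""
        n = len(self._backends)
        for _ in range(n):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % n
            if self._healthy[backend]:
                return backend
        raise RuntimeError("no healthy backends")


if __name__ == "__main__":
    pool = RoundRobinPool([f"web{i:02d}" for i in range(1, 11)])  # hypothetical names
    pool.mark("web03", False)  # pretend web03 failed its health check
    print([pool.pick() for _ in range(9)])
```

With `web03` marked down, nine consecutive picks visit each of the remaining nine servers exactly once; with every backend down, `pick` raises instead of returning a dead server. Real HAProxy layers weights, connection limits, and configurable check intervals on top of this basic rotation.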

External Bits

Code that is not included as part of the development tools:

  • reCAPTCHA (CAPTCHA verification; acquired by Google)
  • DotNetOpenId (OpenID implementation for .NET)
  • WMD: now developed as open source; see the GitHub network graph (lightweight WYSIWYG editor)
  • Prettify (code syntax highlighting)
  • Google Analytics
  • CruiseControl.NET
  • HAProxy (load balancing)
  • Cacti (network traffic monitoring and graphing tool)
  • MarkdownSharp (C# implementation of the Markdown text processor)
  • Flot (pure-JavaScript plotting library for jQuery)
  • Nginx (reverse proxy server)
  • Kiln (distributed version control)
  • CDN: none; all static content is served from sstatic.net, a fast, cookieless domain intended for static content delivered to the Stack Exchange family of websites.
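Serving static files from a separate domain means browsers do not attach cookies set for the main sites to those requests, so every image, script, and stylesheet request is smaller. A minimal sketch of a URL helper for that setup follows. Only the domain `sstatic.net` comes from the article; the `https` scheme, the path layout, and the `v` cache-busting parameter derived from a content hash are assumptions.

```python
import hashlib
from urllib.parse import quote

STATIC_HOST = "sstatic.net"  # the cookieless static domain named in the article


def content_version(data: bytes) -> str:
    """Short content hash used as a cache-busting token (assumed scheme)."""
    return hashlib.sha1(data).hexdigest()[:8]


def static_url(path: str, version: str, scheme: str = "https") -> str:
    """Absolute URL for a static asset on the cookieless domain.

    Requests to sstatic.net do not carry cookies set for stackoverflow.com,
    which is the point of serving assets from a separate domain.
    """
    clean_path = quote(path.lstrip("/"))  # '/' stays unescaped by default
    return f"{scheme}://{STATIC_HOST}/{clean_path}?v={quote(version)}"


if __name__ == "__main__":
    css = b"body { margin: 0; }"
    print(static_url("/css/all.css", content_version(css)))
```

`static_url("/js/full.js", "abc123")` returns `https://sstatic.net/js/full.js?v=abc123`. Passing `content_version(file_bytes)` as the version changes the URL whenever the file changes, so long cache lifetimes never serve a stale asset.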

Developers and System Administrators

  • 14 Developers
  • 2 System Administrators

Content

  • License: Creative Commons Attribution-Share Alike 2.5 Generic
  • Standards: OpenSearch, Atom
  • Host: Peak Internet
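Atom (RFC 4287) is one of the feed standards listed above. To illustrate what such a feed contains, the following sketch builds a minimal Atom 1.0 document with the Python standard library. The feed and entry IDs, titles, timestamps, author, and URLs in the example are hypothetical; nothing here reflects how Stack Overflow actually generates its feeds.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_id: str, title: str, updated: str, author: str,
                    entries: Iterable[Tuple[str, str, str, str]]) -> str:
    """Serialize a minimal Atom 1.0 feed.

    Each entry is (id, title, updated, alternate_url); timestamps are RFC 3339.
    The feed-level <author> lets entries omit their own author element, and the
    rel="alternate" link satisfies the rule for entries without <content>.
    """
    feed = ET.Element("feed", {"xmlns": ATOM_NS})
    ET.SubElement(feed, "id").text = feed_id
    ET.SubElement(feed, "title").text = title
    ET.SubElement(feed, "updated").text = updated
    author_el = ET.SubElement(feed, "author")
    ET.SubElement(author_el, "name").text = author
    for entry_id, entry_title, entry_updated, url in entries:
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "id").text = entry_id
        ET.SubElement(entry, "title").text = entry_title
        ET.SubElement(entry, "updated").text = entry_updated
        ET.SubElement(entry, "link", {"rel": "alternate", "href": url})
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(build_atom_feed(
        "urn:example:feed:recent",            # hypothetical IDs and URLs
        "Recent questions",
        "2011-03-01T00:00:00Z",
        "Example Author",
        [("urn:example:q:1", "How do I parse a feed?",
          "2011-03-01T00:00:00Z", "https://example.com/questions/1")],
    ))
```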
