I like you very much, just as you are...: July 5, 2009

Friday, July 10, 2009

Cloud computing promise still stormy with reliability issues

Even as Google builds its Chrome OS to use cloud computing resources efficiently, data centers around the world are still experiencing reliability issues, and this week was particularly bad. When the cloud dissipates, there's no ray of sunshine for customers that rely on cloud services.

By Chris Foresman, Jon Stokes | Last updated July 9, 2009 11:15 AM CT

Yesterday's announcement of Google's Chrome OS plans was met with plenty of discussion about what it might mean for the future of computing. The OS is essentially a lightweight version of Linux designed to run the company's Chrome browser to access Google's (or other third-party) cloud computing services, such as Gtalk, Gmail, Google Docs, and more. While there are numerous benefits of using such cloud services, like data persistence across multiple machines, what happens when the servers that run those services run into trouble, burn down, or lose power?

Unfortunately, it seems, there aren't any new answers since we examined this issue almost one year ago. In the last week alone, there have been several high-profile outages at data centers that host sites such as video site DailyMotion, credit card authorization service Authorize.net, and Microsoft's Bing Travel. Even the Google App Engine, a platform for third parties to run their own cloud services, experienced performance issues that resulted in high latency and even data loss.

Rackspace Hosting, which provides servers that run untold numbers of websites, experienced a power outage Tuesday in its Dallas data center for as long as 45 minutes. Once power was restored, though, it took some sites several hours to come back fully. It was the second such power outage for the company's Dallas data center in just over a week, though such outages are not common for Rackspace; as TechCrunch noted, the last time the company had a major outage was in November 2007. However, the recent incidents illustrate the problem: there are still risks associated with using the cloud.

Technical risks aren't the only kind that affect cloud customers—what happens if your cloud service provider goes out of business? In this economy, it could conceivably happen.

While these high-profile cloud outages—whether technical or economic in origin—certainly impact consumers, the main problem with them is that a steady drip of cloud failure news greatly increases the anxiety of IT professionals who already have concerns about using cloud services in their own shops.

It hurts to be on the cutting edge
Many of the IT pros who are evaluating cloud services name reliability as a major concern, and they have been doing so in the Ars forums and in closed-door sessions for over a year now. Many of these folks are at large companies and are used to having control over and responsibility for all of the servers that the business uses, so the idea of putting parts of their business on rented, "black box"-style cloud services makes them uneasy.

Many of the vendors that we hear from either downplay the reliability concerns or offer some version of "it hurts to be on the cutting edge." And with this last response, they do have a point. Often, the decision to use cloud services instead of in-house systems is made for reasons of either cost and flexibility (in the case of SaaS) or development speed (in the case of actual cloud infrastructure).

So the developers in your IT department may love the fact that they can immediately dial up thousands of virtual servers from a cloud provider like Rackspace (the infrastructure-as-a-service tier) or code to an infinitely scalable platform like Google's App Engine (the platform-as-a-service tier), but it's up to IT management to strike the balance between rapidly developing applications at Internet scale and planning for the impact on the business that any cloud-related downtime will have. And the higher up in the cloud stack you go to rent your services, the more vulnerable you are to downtime because you're more locked into that particular provider's solution.

Ultimately, the cloud is here to stay, and cloud outages are here to stay, so enterprise cloud customers will have to carefully balance the downtime risks with the cost and agility benefits of using the different tiers of cloud services (IaaS, PaaS, or SaaS).

Thursday, July 9, 2009

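[The Cloud Computing Era] A flood of packaged cloud products

Easier capacity expansion and feature integration

With cloud computing drawing attention as an IT megatrend, some vendors are introducing packaged products that make it easy to build cloud infrastructure.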

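EMC's "Symmetrix V-Max" storage system is the first product to apply a scale-out storage architecture. Scale-out is a technique that lets storage be expanded without performance degradation, beyond the expansion limits of earlier storage products. It is essential in cloud environments, where vast amounts of storage must be supplied on customer demand.

A virtualized data center is expected to host at least 10 times as many virtual servers as today. Smooth storage expansion matters if service is to remain stable even when those servers share storage or need several times more storage than they do now.

V-Max supports immediate server resource allocation, resource load balancing between servers, and virtual server management, and it is designed to raise storage capacity utilization through virtual provisioning. Data can be moved between devices regardless of drive type, such as Flash (SSD) or SATA II, and failover is supported for the channels between virtual servers and storage volumes. Virtualization is applied from the server, through the storage connection, to the storage itself.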

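HP Korea has introduced "Cloud Matrix," a cloud appliance that integrates servers, storage, and software. It targets private cloud deployments, test and development platforms, and integrated platforms for physical and virtual servers, and it can deliver cloud services without separate tuning. It applies HP's management automation technology to support operating system deployment, bottleneck analysis and management, service automation, and integrated server-storage management. HP plans to introduce another product aimed at large enterprises as early as this year.

In addition, the Korean venture NexR is preparing to sell an appliance based on Hadoop, a distributed large-scale data processing technology, and cloud products from a range of vendors are expected to reach the Korean market one after another starting in the second half of this year.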

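Experts note that these products target the private cloud market.

Their analysis is that, given the institutional and cultural barriers to cloud computing, vendors are first going after the private cloud market at large enterprises, which are well funded and want to improve their existing infrastructure, rather than forcing open the public cloud market. Heo Ju, marketing manager at EMC Korea, said, "It is true that the cloud market is still at an early stage, both in Korea and abroad," and added, "but the arrival of packaged products that can be deployed right after purchase is meaningful for bringing the cloud market into the mainstream."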

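[The Cloud Computing Era] "Flexible infrastructure": the paradigm is shifting

High-density blade servers and virtualization technology increasingly combined
Storage scalability improved with the "scale-out" approach
By Park Sang-hoon, nanugi@dt.co.kr | Posted: 2009-07-08 21:01

■ The Cloud Computing Era
(5) Infrastructure strategy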

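The arrival of the cloud computing era, so-called "rented IT," is expected to bring major changes to existing IT infrastructure products as well. Providing IT infrastructure when customers need it, and in the amount they need, requires flexible IT infrastructure. In servers, the paradigm represented by "blades" is riding the trend toward high performance and high density, and in storage, technology has emerged that expands capacity without performance degradation. In networking, too, results that incorporate virtualization are being announced one after another.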

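◇ Cloud servers go blade = HP is running a new product development project called "CaaS (Cell as a Service)," centered on its headquarters research lab. The product, which could appear as early as the end of this year, targets the enterprise cloud market, which demands high security and stability, as distinct from general-purpose cloud services such as those from Amazon and Google. CaaS is reported to be a solution that provides the hardware needed for cloud services, including servers, storage, and networking, as a single package.

Much about CaaS remains under wraps, but it should show where HP's server technology roadmap, typified by the move to blades, currently stands. A blade is a server that can carry one or more central processing units (CPUs), storage, and other components. Because it reduces floor space in the data center, it is also called a high-density server.

Blades are drawing fresh attention because cloud environments need vast server pools. The focus today is mainly on blades for entry-level servers, but as blades move into the high end, technology is also expected in which the blade acts as a pure computing machine with only CPU and memory, while storage is attached through a virtual connection and allocated as needed.

Han In-jong, a manager in HP Korea's technology consulting division, said, "Blades that keep the advantages of existing blades while offering midrange or greater computing power will be developed soon," and added, "Freedom on the infrastructure side, such as the choice of operating system, which is essential for cloud services, and the distribution of computing power according to the software, will grow further."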

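◇ Storage gets a scale-out makeover = Storage, too, is being refitted for the cloud computing era. In a cloud environment, the key question for storage is how dynamically and flexibly it can be allocated, as with CPU power. In particular, corporate data volumes are growing exponentially as companies adopt IT systems such as enterprise resource planning (ERP).

Existing storage can also be expanded, but performance declines or plateaus as capacity keeps growing. Because resources are added to a single node, much as one adds CPU and memory to a computer, performance actually drops once the system exceeds the maximum that its storage controllers can handle.

This is why "scale-out" storage has recently drawn attention. Because the nodes themselves are added, much like attaching a new computer, there is no need to worry about performance degradation even when a large amount of storage is added. The drawbacks long attributed to scale-out, such as resource sharing between independent systems and the communication needed for fault tolerance, are also being actively addressed. EMC, for example, has announced the "V-Max" storage product, which applies its "Virtual Matrix" technology.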
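The contrast between adding drives to one node and adding whole nodes can be illustrated with a toy model. This is only a sketch: the throughput figures and the function names `scale_up_throughput` and `scale_out_throughput` are invented for illustration and do not describe any vendor's product. The model shows throughput flattening at the controller limit; it does not attempt to model the outright drop the article describes.

```python
# Toy model only: both figures below are assumptions made up for this sketch.
CONTROLLER_LIMIT_MBPS = 1000   # assumed ceiling of one storage controller
DRIVE_MBPS = 100               # assumed throughput contributed by one drive


def scale_up_throughput(drives):
    """Single node: every drive sits behind the same controller."""
    return min(drives * DRIVE_MBPS, CONTROLLER_LIMIT_MBPS)


def scale_out_throughput(drives, drives_per_node=10):
    """Many nodes: each group of drives brings its own controller."""
    full_nodes, remainder = divmod(drives, drives_per_node)
    per_node = min(drives_per_node * DRIVE_MBPS, CONTROLLER_LIMIT_MBPS)
    last_node = min(remainder * DRIVE_MBPS, CONTROLLER_LIMIT_MBPS)
    return full_nodes * per_node + last_node


for drives in (5, 10, 20, 40):
    print(drives, scale_up_throughput(drives), scale_out_throughput(drives))
```

In this model the single node stops gaining throughput after 10 drives, while the multi-node layout keeps growing because every added node contributes another controller.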

Other storage technologies drawing attention in the cloud computing era include "thin provisioning," which allocates storage capacity dynamically according to user demand, and space optimization through data deduplication.
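Thin provisioning can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: `StoragePool` and `ThinVolume` are hypothetical names, blocks are held in memory, and nothing here reflects how a real storage array implements the feature. The point it shows is that a volume advertises a large size but consumes a physical block only when a block is first written.

```python
class StoragePool:
    """Hypothetical pool of physical blocks shared by several volumes."""

    def __init__(self, physical_blocks):
        self.blocks = [None] * physical_blocks
        self.next_free = 0

    def allocate(self):
        if self.next_free >= len(self.blocks):
            raise RuntimeError("physical pool exhausted")
        index = self.next_free
        self.next_free += 1
        return index

    @property
    def used(self):
        return self.next_free


class ThinVolume:
    """Advertises `virtual_blocks` but allocates physical blocks on first write."""

    def __init__(self, pool, virtual_blocks):
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.mapping = {}   # virtual block number -> physical block number

    def _check(self, block_no):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block outside the advertised volume size")

    def write(self, block_no, data):
        self._check(block_no)
        if block_no not in self.mapping:          # first write: allocate now
            self.mapping[block_no] = self.pool.allocate()
        self.pool.blocks[self.mapping[block_no]] = data

    def read(self, block_no):
        self._check(block_no)
        physical = self.mapping.get(block_no)
        return b"" if physical is None else self.pool.blocks[physical]


pool = StoragePool(physical_blocks=8)
vol_a = ThinVolume(pool, virtual_blocks=100)
vol_b = ThinVolume(pool, virtual_blocks=100)
vol_a.write(0, b"x")
vol_a.write(99, b"y")
vol_b.write(5, b"z")
vol_a.write(0, b"w")      # overwrite: no new allocation
print(pool.used)          # 3 physical blocks back 200 advertised blocks
```

The trade-off in this design is that the pool can be oversubscribed: the two volumes together advertise 200 blocks against 8 physical ones, so writes fail once the pool is exhausted.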

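◇ Virtualization and automation expected to keep converging = In networking, too, technologies that incorporate cloud computing are appearing one after another. Cisco, working with a virtualization specialist, has completed development of virtual switch and virtual router technology for virtual server environments and has been announcing the results. The Korean company Clunet has developed CCN (Cloud Computing Network) technology, which applies the cloud concept to the existing CDN (Contents Delivery Network), and has launched a service that offers network bandwidth itself as a cloud. Seo Jun-ho, head of research at Clunet, said, "Network equipment vendors have also been adopting virtualization and cloud technology recently," and added, "These technologies have advanced to the point where they can be applied by partially optimizing existing hardware."

Experts expect this transformation among IT infrastructure vendors to spread further as the cloud computing market grows. Han In-jong of HP Korea's technology consulting division said, "The core of cloud computing infrastructure is the combination of virtualization and automation technology," and added, "Infrastructure technologies that allocate and deliver IT resources flexibly will continue to evolve."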

Wednesday, July 8, 2009

RAC (Real Application Clusters) database

http://blog.naver.com/PostView.nhn?blogId=vigdori&logNo=70048726242

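1. The concept of GRID

1.1 The concept of GRID

1) Definition of GRID computing

For details, see http://blog.naver.com/sbg10?Redirect=Log&logNo=120009432781

The core idea of grid computing is computing as a utility, like telephone service, electricity, or water.
Users can request and receive information or computing work whenever they want and in whatever amount they want.

Grid computing virtualizes distributed computing resources such as processing, network bandwidth, and storage capacity into a single system image, so that users and applications have seamless access to a wide range of IT capabilities.

Put simply, it can be thought of as connecting the small servers scattered across a company and using them as one large computer.

Utility computing is the view "from the client side." Seen "from the server side" (digging behind the grid), the "grid" is about resource allocation, information sharing, and high availability.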

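① Resource allocation
- Ensuring that anyone who requests and needs resources can get what they want.
- Preventing resources from being wasted while there are no requests.

② Information sharing
- Making the information that users and applications need available anytime, anywhere, as needed.

③ High availability
- High availability means that when one node has a problem, another node provides the service or function in its place.
- It describes a system or component that can operate continuously for a long time. Availability is often expressed as a relative measure, such as "100% available," or in terms such as "never fails."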

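2) Why GRID computing is needed

① GRID computing connects small servers and makes more resources usable while maintaining high performance, so it is a good way to use IT infrastructure efficiently.

② Existing infrastructure can be moved to GRID computing without replacing the systems already in use.

③ GRID computing lets a company use its infrastructure efficiently at low cost.

3) Oracle GRID computing
- Many inexpensive computing resources can deliver higher performance than a few expensive ones.

- Because no computer uses its resources at full capacity all the time, the grid can also be understood as lending idle capacity to other workloads.

- Oracle Database 10g is the first database designed for Enterprise Grid Computing.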

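2. The concept of RAC

2.1 Definition of RAC
- Oracle Real Application Clusters (RAC) was developed as the successor to Oracle Parallel Server (OPS) and has been available since Oracle9i.

- RAC lets multiple instances access the same database (storage) at the same time.

- Because the system can be expanded, RAC provides fault tolerance, load balancing, and improved performance.

- Because every node accesses the same database, access to the database is not lost when one instance fails.

- At the heart of Oracle RAC is the shared disk subsystem.

- Every node in the cluster must be able to access the data, redo log files, control files, and parameter files for all nodes in the cluster.

- The data disks must be globally available so that all nodes can access the database.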

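2.2 Advantages of RAC

① Scalability (a structure that can cope when resources such as CPU, memory, or disk run short)

- When new workloads keep being added and server capacity runs short, new servers can be added to the cluster flexibly,
and adding servers does not cause problems.

② High availability (a structure in which the system as a whole keeps running when a failure occurs)

- With a database on a single server, a database failure makes the service unavailable until recovery. With RAC, even if one server fails, the remaining servers can continue to provide service, so the service is not interrupted.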
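The high-availability idea above, several servers in front of one shared database with clients moving to a surviving server, can be sketched as follows. This is a conceptual illustration only and does not use any Oracle API: `Node`, `NodeDown`, and `query_with_failover` are hypothetical names, and the shared database is modeled as a plain dictionary.

```python
class NodeDown(Exception):
    """Raised by a node that cannot serve requests."""


class Node:
    def __init__(self, name, shared_db):
        self.name = name
        self.shared_db = shared_db   # every node sees the same data
        self.up = True

    def query(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.shared_db[key]


def query_with_failover(nodes, key):
    """Try each node in turn; return (node name, value) from the first that answers."""
    for node in nodes:
        try:
            return node.name, node.query(key)
        except NodeDown:
            continue                 # this node failed, try the next one
    raise RuntimeError("no node available")


shared_db = {"order:1": "shipped"}
nodes = [Node("node1", shared_db), Node("node2", shared_db)]
nodes[0].up = False                  # simulate a failure on the first server
print(query_with_failover(nodes, "order:1"))   # ('node2', 'shipped')
```

Because both nodes read the same dictionary, the answer does not depend on which node serves the request, which is the property the shared-disk design is meant to give.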

3. RAC architecture

3.1 RAC architecture

- Multiple servers share and use one physical database.

- All servers use the same data, so logically they operate as a single system.

- A high-speed interconnect network between nodes is required for the Cache Fusion feature.

References
- Building a RAC cluster on Linux and FireWire, Oracle Korea
- RAC & Enterprise Manageability Best Practices, Oracle Korea

================================================
* Oracle information-sharing community oracleclub.com
* http://www.oracleclub.com
* http://www.oramaster.net
* Tutorial author: Kim Jeong-sik (oramaster _at_ naver.com)
================================================
[Source] [Repost] RAC (Real Application Cluster) | Posted by Wonseon

Cloud computing to drive open source

With the cloud computing wave poised to reach the world market in the next 12 to 18 months, open source software and coding techniques are about to hit the big time.

That’s because open source software and the methodologies that accompany it have already been proven to be the chosen route for the vast majority of companies aiming to capitalise on the cloud phenomenon.

For evidence of this, you need look no further than the route companies such as Amazon, Google and Rackspace have taken in building out the massive datacentres they plan to begin selling capacity on in the coming years.

Without fail in each of these examples, open source is either at the core or forms a vital component of what’s on offer. And as cloud computing becomes a more prominent topic, so open source will find greater traction in the market.

The reasons open source is a popular route are not difficult to find. Since the cloud computing players are extremely technically proficient, they have sufficient skill in-house to capitalise on the more open nature of open source – and in doing so, can build a far lower cost solution than what would be on offer from a proprietary technology.

These solutions’ open nature furthermore allows cloud providers to mould and form tools to their own needs, changing and adapting underlying technology rapidly so that extra performance can be eked out of a solution.

Incidentally, cloud companies using open source technologies gain the useful side-effect of adhering to the open standards that the majority of open source solutions subscribe to. This proves to be a great benefit down the line when it comes to integrating disparate line of business systems or solutions providing specific functionality to a business silo.

A number of companies are wondering when exactly cloud computing will hit South Africa, since the topic is becoming an important part of the planning most enterprises in the more developed US or European markets are doing today.

Because of the bandwidth limitations we face locally and despite the arrival of new undersea cables, cloud computing will take on a different form in South Africa to markets where bandwidth is ubiquitously available.

My personal belief is that South African companies will become involved with cloud computing from an internal perspective, building clouds that exist inside their datacentres, but function similarly to clouds located at service providers’ offsite datacentres.

It stands to reason that these customers will need to look at open source technologies just like their outsourced peers, since the level of scalability, customisability and control is just not there in the proprietary world.

For that reason I can’t see why open source won’t go from strength to strength locally over the coming years.

Monday, July 6, 2009

Large Data Set Analysis in the Cloud: Hadoop gets a boost

Traditional business intelligence solutions can't scale to the degree necessary in today's data environment. One solution getting a lot of attention recently: Hadoop, an open-source product inspired by Google's search architecture. Twenty years ago, most companies' data came from fundamental transaction systems: Payroll, ERP, and so on. The amounts of data seemed large, but usually were bounded by well-understood limitations: the overall growth of the company and the growth of the general economy. For those companies that wanted to gain more insight from those systems' data, the related data warehousing systems reflected the underlying systems' structure: regular data schema, smooth growth, well-understood analysis needs. The typical business intelligence constraint was the amount of processing power that could be applied.

Consequently, a great deal of effort went into the data design to restrict the amount of processing required to the available processing power. This led to the now time-honored business intelligence data warehouses: fact tables, dimension tables, star schemas.

Today, the nature of business intelligence has totally changed. Computing is far more widespread throughout the enterprise, leading to many more systems generating data. Companies are on the Internet, generating huge torrents of unstructured data: searches, clickstreams, interactions, and the like. And it's much harder, if not impossible, to forecast what kinds of analytics a company might want to pursue.

Today it might be clickstream patterns through the company website. Tomorrow it might be cross-correlating external blog postings with order patterns. The day after it might be something completely different. And the system bottleneck has shifted. While in the past the problem was how much processing power was available, today the problem is how much data needs to be analyzed. At Internet scale, a company might be dealing with dozens or hundreds of terabytes. At that size, the number of drives required to hold the data guarantees frequent drive failures. And centralizing the data generates too much network traffic for the data to be moved conveniently to the processors.
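The claim that drive count alone makes failures routine can be checked with rough arithmetic. The sketch below is illustrative only: the 1 TB drive size and the 3% annual failure rate are assumptions chosen for the example, not figures from the article, and failures are treated as independent with a constant rate.

```python
import math


def drives_needed(data_tb, drive_tb):
    """Number of drives required to hold the data (no redundancy)."""
    return math.ceil(data_tb / drive_tb)


def expected_failures_per_year(drives, annual_failure_rate):
    """Average number of drive failures per year across the fleet."""
    return drives * annual_failure_rate


def prob_any_failure(drives, annual_failure_rate, days):
    """Chance that at least one drive fails within `days`.

    Assumes independent failures at a constant rate.
    """
    survive_one = (1 - annual_failure_rate) ** (days / 365)
    return 1 - survive_one ** drives


# Assumed inputs: 300 TB of data on 1 TB drives, 3% annual failure rate.
n = drives_needed(data_tb=300, drive_tb=1)
print(f"drives: {n}")
print(f"expected failures per year: {expected_failures_per_year(n, 0.03):.1f}")
print(f"chance of a failure in 30 days: {prob_any_failure(n, 0.03, 30):.0%}")
```

Under these assumptions a few hundred drives already produce several failures a year, so software at this scale has to treat a failed drive as a normal event.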

One thing is clear: the traditional business intelligence solutions can't scale to the degree necessary in today's data environment.

Fortunately, several solutions have been developed. One, in particular, has gotten a lot of attention recently: Hadoop. Essentially, Hadoop is an open source product inspired by Google's search architecture. Interestingly, unlike previous open source products that were usually implementations of previously-existing proprietary products, Hadoop has no proprietary predecessor. The innovation in this aspect of big data resides in the open source community, not in a private company.

Hadoop creates a pool of computers, each with a special Hadoop file system. A central master Hadoop node spreads data across each machine in a file structure designed for large block data reads and writes. It uses a clever hash algorithm to cluster data elements that are similar, making processing data sets extremely efficient. For robustness, three copies of all data are kept to ensure that hardware failures do not halt processing.
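The idea of splitting data into blocks and keeping three copies on different machines can be sketched as follows. This is a simplified illustration, not Hadoop code: `place_blocks` and `readable_after_failure` are hypothetical names, and the round-robin placement used here is a stand-in for Hadoop's actual placement policy.

```python
def place_blocks(data, block_size, nodes, replicas=3):
    """Split `data` into blocks and assign each block to `replicas` distinct nodes.

    Placement is plain round-robin (an assumption for this sketch).
    Returns {block index: [node names]}.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for index, _offset in enumerate(range(0, len(data), block_size)):
        # Start at a different node for each block, then take the next ones.
        placement[index] = [nodes[(index + r) % len(nodes)] for r in range(replicas)]
    return placement


def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one copy on a surviving node."""
    return all(
        any(node not in failed_nodes for node in holders)
        for holders in placement.values()
    )


nodes = ["n1", "n2", "n3", "n4"]
placement = place_blocks(b"x" * 10, block_size=4, nodes=nodes)
print(placement)
print(readable_after_failure(placement, {"n1", "n2"}))        # True
print(readable_after_failure(placement, {"n1", "n2", "n3"}))  # False
```

With three copies per block, any two machines can fail without losing access to the data; losing the three machines that hold the same block makes that block unreadable.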

When it comes time to mine the data, the programmer can

High Performance and Grid Computing in the Cloud

The HPCcloud discussion group has been created in order to address the growing interest in High Performance Computing and Grid Computing in the Cloud. The purpose of this group is to present experiences and scenarios by individuals, organizations and projects to illustrate how Cloud computing can enhance the different types of distributed and high performance computing infrastructures in science and engineering. The group covers the following aspects about innovative potential, benefits and challenges of new Cloud technologies and services in High Performance Computing (HPC) and Grid Computing research and business:
• Cultural, security, political and legal barriers to implementing Cloud provisioning models in HPC and Grid environments
• Architectures for integration of Cloud technologies and services with HPC and Grid infrastructures
• Standardization of interactions between HPC and Grid platforms and Cloud infrastructures
• Limitations of existing Cloud services and technologies for the capability and capacity computing demands of the HPC and Grid communities in the execution of both tightly-coupled HPC and loosely-coupled HTC applications
• HPC Clouds offering platforms with HPC devices and configurations, and Scientific Clouds offering specific services for the scientific and technical computing community
• Impact of virtualization on the performance of memory, CPU and I/O intensive, and latency sensitive applications, and virtualization support for specialized communication transports
• Service and infrastructure scalability and elasticity management for the efficient execution of virtualized HPC and Grid platforms
• Challenges of porting HPC applications to the Cloud and new computing paradigms for HPC on Cloud