Storage High Availability – the Achilles heel of server clusters
December 28th, 2012
Let me start by saying that storage high availability is not a feature that every business is going to need in its computing infrastructure. Fault tolerant systems in general are arguably unimportant to businesses that can tolerate one or two days of downtime. But without fault tolerance, you had better have a well-tested and documented recovery procedure. In fact, a great recovery procedure can probably cut the downtime from a storage system failure to 8 hours or less. But in the real world, few SMBs have a great, well-tested, and documented recovery procedure. Also, although there are a lot of questionable disaster recovery statistics out there, the fact remains that businesses are remarkably dependent on their computing infrastructure for survival. So it makes sense for businesses to take a close look at the real impact a computing service outage will have. In other words, I am going to be pretty skeptical of businesses that think they can go two days without e-mail, for example. So I am going to assume that some level of fault tolerance matters to most businesses.
In this article I want to focus on a common scenario: businesses that have a server cluster that lacks highly available storage. Now I am not talking about using RAID that protects against disk failure, or SAN boxes with redundant controllers and redundant power supplies. I am talking about removing the entire SAN as a single point of failure. Because in the real world, it will be the one non-fault tolerant component (backplane?) that will fail.
The gold standard for larger businesses is probably to have all storage mirrored to a secondary NetApp or EqualLogic box. But that can get pricey. Microsoft is pushing its own Scale Out File Server in Server 2012, and that looks like it will offer a more robust and possibly cheaper alternative to the traditional SAN. But right now, there are few enclosures that meet the requirements for true high availability (SCSI Enclosure Services v.3).
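To put rough numbers on why mirroring the whole SAN matters, here is a minimal sketch of the availability arithmetic. It assumes the two storage units fail independently, which is optimistic in practice (shared power, shared firmware bugs), and the 99.9% figure is a hypothetical input, not a vendor number. The function names are my own.

```python
def mirrored_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` independent mirrored storage units.

    The mirror is unavailable only when every copy is down at the same time,
    so the combined availability is 1 - (1 - single) ** copies.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    single_san = 0.999  # hypothetical availability of one SAN box
    for copies in (1, 2):
        combined = mirrored_availability(single_san, copies)
        hours = downtime_hours_per_year(combined)
        print(f"{copies} unit(s): about {hours:.3f} hours of downtime per year")
```

Under these assumptions, the single box is the dominant source of expected downtime, and a second independent copy shrinks it by roughly three orders of magnitude. Real failures are rarely fully independent, so treat the result as an upper bound on the benefit.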
One cheap solution that we have rolled out in the past for virtualization clusters is to synchronize all the virtual machines to a secondary server with plenty of storage using a tool like vReplicator or Veeam. If the SAN fails, you can fire up a recent version of the production VMs on this machine with reduced performance.
A more expensive solution that we have worked with was provided by nScaled and involves using a Falconstor agent installed on each VM or physical server. Disk snapshots are replicated to a custom-built ESXi server with plenty of storage and then replicated from there up to the cloud. That is a very robust solution that allows for either local or cloud recovery. But it’s pricey.
So another solution we are working on in the lab right now is to build out Linux servers running InfiniBand and DRBD as a cheap, highly available SAN alternative. It’s true that snapshotting will be tricky and this definitely isn’t an off-the-shelf solution with a friendly GUI for administration. But what it lacks in polish it more than makes up for in raw functionality. We can provide fully fault tolerant storage for a fraction of the cost of even a single enterprise SAN. Building out and expanding storage is cheaper as well because we can use after-market drives instead of paying the outrageous prices NetApp or Dell charge for hard drives. SSDs even become a viable option for a broader range of applications at this price point.
I hope that I have shown that there is a broad range of highly available storage solutions to match practically any budget. I strongly advise all businesses to consider adding this feature to their computing systems architecture. Please feel free to contact us at info@globalizenetworks.com if you want to discuss your own challenges.
I recently read this TechRepublic blog post about how GE is restructuring its IT practices to revitalize American manufacturing. GE’s new process is borrowing ideas from Agile development and Gemba among others. One way to think of this approach is to break a big project into smaller parts and deploy incremental solutions that move toward the ultimate business goals. That is the Agile part. The Gemba (“real place”) component involved breaking down the barriers between IT, management, and the end-users:
They created a mission control room for the ERP+ project and co-located the IT people, the business stakeholders, and the employees who would eventually be operating the software all in the same room.
This sounds like a great approach, and really one could argue that most small and medium businesses (SMBs) already have many of the advantages that Gemba provides. However, one IT trend that seems to run counter to this approach is the outsourcing and commodification of IT services, such as that seen in the managed service provider (MSP) model. It seems to me that MSPs can’t afford to iterate on an IT solution; their economy of scale is only achieved when they implement a standard platform across a wide range of clients. So this might be a case where a small, traditional IT services provider can provide more value. A company like Globalize can come to your place of business and work side by side with your management and end-users to develop incremental improvements to your existing infrastructure. We can hear right away what is and is not providing real value to your business and “fail fast to succeed sooner.” This story illustrates that IT is becoming more and more important to the success of businesses everywhere. Businesses with an IT strategy that is integrated with their business plan and actual execution will probably be the most competitive at the end of the day.
Dedicated hardware hosting for test labs
November 30th, 2012
It’s essential to research a new technology before implementing it. You need to read the documentation and scan the blogs and forums for gotchas. Then you need to make a plan. But ideally you would test your plan in a test lab before implementing anything in production.
I know from experience that not all organizations have enough old hardware or extra internet connections lying around to create a proper lab environment. Also, many IT groups are constrained for time and money.
We will be offering access to our new hosted infrastructure for companies that want access to an external lab environment with dedicated hardware. Unlike Amazon or the other commodity cloud providers, we are primarily a services company. So we are here to help design and implement your validation testing. Also, test environments with multiple subnets will be much easier to implement in a dedicated hosting environment such as ours, especially if your team lacks extensive AWS experience. All of these commodity cloud offerings are hermetically sealed to some degree. It is generally difficult and sometimes impossible to reconfigure these platforms to meet non-standard requirements. We offer value to clients that need more freedom to customize their environments.
I am not against public cloud services like AWS. The cloud will certainly become the foundation of all IT services as time goes on. However, there is a rocky path between that bright future and the current situation faced by most companies today. There is still a lot of uncertainty about the best cloud strategies. Microsoft, VMware, and Rackspace are all offering competitive cloud platforms that provide different benefits for different use cases. Most companies want to take a staged approach. Our new offerings can help you do just that by trying out various cloud integration scenarios without impacting production environments.
Once we have our own redundancy and high-availability in place, we will be offering more services such as private cloud hosting. Contact us for more info: info@globalizenetworks.com
Amazon Glacier will be the future of online backups
November 14th, 2012
I was pretty excited when Amazon first announced its new Glacier product. This offering allows archiving of data at $0.01 per GB per month with a 3-5 hour recovery time. Now they have added a feature to automatically archive S3 storage to Glacier based on rules set by admins. I haven’t used it yet, but it seems that CloudBerry will be the first tool for SMBs to check out when exploring online backup. For $79.99, their Server offering seems to be a great deal. We will be testing this in our new lab at Hurricane Electric soon.
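To get a feel for what that rate means for a backup budget, here is a minimal sketch of the storage arithmetic. It uses only the $0.01 per GB per month figure quoted above and deliberately ignores retrieval and request fees, which I have not modeled. The function names and the sample archive sizes are made up for illustration.

```python
GLACIER_RATE_PER_GB_MONTH = 0.01  # USD, the storage price quoted in this post


def monthly_archive_cost(size_gb: float,
                         rate: float = GLACIER_RATE_PER_GB_MONTH) -> float:
    """Storage-only cost in USD for keeping `size_gb` archived for one month."""
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    return round(size_gb * rate, 2)


def yearly_archive_cost(size_gb: float,
                        rate: float = GLACIER_RATE_PER_GB_MONTH) -> float:
    """Storage-only cost in USD for keeping `size_gb` archived for 12 months."""
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    return round(size_gb * rate * 12, 2)


if __name__ == "__main__":
    # Hypothetical archive sizes for a small business, in GB.
    for size_gb in (100, 500, 2048):
        print(f"{size_gb} GB: ${monthly_archive_cost(size_gb):.2f}/month, "
              f"${yearly_archive_cost(size_gb):.2f}/year")
```

At this rate the storage itself is almost a rounding error for most SMBs, so the 3-5 hour recovery time and any retrieval charges are the parts worth testing before committing.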
New Offering: Windows Server 2012 Hyper-V Replication hosting
November 14th, 2012
Microsoft has added a lot of compelling features to Hyper-V in Server 2012 which make it very competitive with VMware’s (industry standard) vSphere product. Ars Technica provided a good general writeup of the third generation Hyper-V here. Microsoft MVP Aidan Finn is my favorite technical authority on Hyper-V and his comparison between 2012 Hyper-V and vSphere 5.1 offers a vicious tear-down of the VMware platform. To be fair, some of the features he specifies as being available in vSphere enterprise are also available in cheaper versions of vSphere. But overall, it appears that Hyper-V is actually surpassing VMware in many areas and all of these features are included in the FREE version of Hyper-V Server.
Some of my favorite Hyper-V 2012 features include:
Shared-Nothing Live Migration – allows us to move VMs between Hyper-V hosts without being part of a cluster or using shared storage. (Nice for maintenance or manual load rebalancing.)
Replica – replicate VMs over the WAN for Disaster Recovery
We are working on an offering to provide Hyper-V VM replication services to small businesses that want to have site redundancy in case of disaster. FEMA estimates 40% of businesses are unable to rebuild post-disaster. We are now offering a pilot program with special pricing for clients that are interested. Contact me for more details: info@globalizenetworks.com.
IE 9 is the most secure browser all of a sudden?
September 29th, 2012
Internet Explorer got a black eye last week from a widely exploited zero-day vulnerability, but a recent NSS report shows that IE 9 does a better job of blocking malware than the other major browsers. It looks like the key to Microsoft’s success is their SmartScreen technology, which basically maintains a blacklist of bad URLs.
https://www.nsslabs.com/reports/your-browser-putting-you-risk-part-1-general-malware-blocking
This flies in the face of conventional wisdom right now which states that Chrome is the most secure browser. I don’t think that I will switch from Chrome on my Mac to IE 9 on a VM yet. But I will feel more comfortable using IE 9 or 10 to browse while I am running a Windows environment.
Further analysis is available here:
BlackHat/Defcon 2012 takeaways
July 31st, 2012
BlackHat/DefCon 2012 Report (Draft)
July 30th 2012
Scott@globalizenetworks.com
Abstract: I attended my first Black Hat/DefCon conference this year and it was pretty amazing. I spoke with a bunch of great people ranging from senior security managers to pentesters and other sysadmins. Many of the people at the conference were dedicated security specialists which really means that they operate in the enterprise and government space. I didn’t really meet any other people working with small and medium sized businesses. This report will focus on the issues most relevant to SMBs.
I will start with a skeletal outline of the top issues from talks that I personally attended. In the coming weeks I will flesh out each topic with dedicated articles and I will add in analysis of other topics from internet sources.
Outline:
- Java
- Problem
- Java is being attacked a lot these days (J. Oh BH2012). 83% of successful BlackHole exploits are Java/Windows7 and BlackHole is the number one web exploit toolkit. (J. Jones BH2012). Sometimes business critical Java applications require older versions of Java.
- Immediate action
- Patch or disable Java where possible.
- Consider terminal services where we can’t patch or disable Java.
- Project work
- Reduce browser attack surface overall.
- flash blocking and ad blocking
- javascript restriction
- restrict plugins and addons
- Patch management for 3rd party software including browser plugins.
- NTLM
- Problem
- NTLM is a weak, deeply broken authentication protocol which is still widely deployed and difficult to remove from more complex Microsoft environments. NTLM is receiving renewed attention from the security community. (Duckwall & Campbell BH2012, Z. Fasel DC2012). New tools (ZackAttack, WCE) simplify the exploitation of NTLM. Zack Fasel pointed out that it took FireSheep to force wider SSL adoption by reducing the technical skills required to intercept HTTP traffic. He wants to speed the removal of NTLM from corporate networks by providing a similar tool for NTLM which could supposedly work even externally (if 445 is allowed outbound).
- Immediate action
- Block tcp port 445 outbound on the firewall.
- Project work
- On the horizon
- Blocking NTLM where possible (if possible).
- Intrusion Detection
- Problem – John “Four” Flynn gave an excellent talk at BH2012 on how ineffective current Intrusion Detection Systems are. He suggested using the kill chain concept to improve IDS: “If you can figure out a way to tag each event with the stage they are part of, you can stack the events in a way that lets you analyze them in terms of potentially part of a kill chain.” Verizon’s 2012 Data Breach Investigations Report shows that only 5% of data breaches are detected by internal mechanisms. Mandiant’s 2012 M-Trends report reveals a similar percentage. The good news for SMBs here is that only 30% of attacks are aimed at organizations with fewer than 1000 employees (per the Verizon report cited above).
- Immediate action
- Understand the limitations of IDS and start treating internal networks as hostile.
- Project work
- begin collecting and aggregating more data
- Snort
- Splunk
- Nagios?
- Consider projectnova.org to hinder and possibly detect the internal Reconnaissance step of the kill chain (this deserves another article for sure.)
- Start building alerts that correlate low priority signals based on their relevance to kill chain steps.
- On the horizon
- New sources to categorize network and host signals into kill chain buckets
- Hardware backdoor
- Problem – J. Brossard gave a presentation at DC2012 of his hardware backdoor PoC Rakshasa. This was a very plausible exploit whereby the BIOS of a computer is replaced with an open source stack of Coreboot, SeaBIOS, and iPXE. The payload would be loaded at boot time over the internet, possibly using an ad hoc wifi or WiMAX connection to completely bypass internal networks and IDS (i.e. connecting to an attacker SSID broadcast from the parking lot). This sort of exploit would be injected directly into memory and would never touch the disk, making it difficult to detect. This particular PoC is of limited scope in that the most recent Intel chipset supported by Coreboot is 5 years old. However, it does show that this is a real threat. Brossard does not believe that a compromised machine can be cleaned perfectly without simultaneously reflashing all of its BIOS and firmware chips using hardware flashing equipment. However, in this imperfect world, using a floppy or boot CD to flash one BIOS or firmware at a time is probably the only practical precaution we can take. Brossard also thinks using an open source BIOS is the way to go since you can examine the code. This would be a great idea if we could dedicate a team of software engineers to reviewing the code line by line. Then again, think of the help desk nightmare of system failures caused by non-standard BIOS problems. I guess Brossard probably never had to do tech support.
- Immediate action
- Routinely flash all system BIOS and PCI firmware with latest versions.
- Project work
- Make BIOS updates part of regular system maintenance for servers and clients.
- SOHO routers
- Problem – A lot of attention at BlackHat/DefCon 2012 was focused on compromising SOHO routers (Cutlip DC2012, Purviance & Brashars BH2012). Purviance & Brashars demoed an unlikely scenario in which JavaScript was used to locate the SOHO router used as a default gateway, crack the password, and then replace the firmware with DD-WRT. Now if you have played with DD-WRT you know how fiddly it is to get working: you need to match the image with the hardware version, unplug the power, etc. So this specific demo wasn’t too convincing. However, it seemed that a lot of the talks were shying away from really cutting edge exploits. Arguably there is too much money to be made by keeping the best secrets to yourself. Therefore, it’s safe to assume that they or someone else has a more reliable version of this exploit and that it will start to appear soon. I don’t really love the idea of a hacker owning the home routers of my corporate users. I also would find it distasteful and deeply shocking to discover any of these SOHO routers in place on any corporate network that I help manage. But crazier things have happened.
- Immediate action
- Audit corporate networks for SOHO routers.
- Project work
- Check and update the firmware of users’ home routers
- On the horizon
- It might actually make more sense to build out a hardened DD-WRT image, deploy it to a standardized hardware router, and provide this to corporate users for use at home.
- VMware
- Problem
- Weaknesses are being exposed in VMware’s vSphere suite. Some of the fixes are incomplete. Researchers are picking around the edges of existing patches and finding similar vulnerabilities in other parts of the suite. Old problems like directory traversal are still being exposed in web apps. (A. Minozhenko DC2012)
- Immediate Action
- Fully patch vCenter.
- Project work
- Remove or disable unused packages to reduce attack surface
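Going back to the Intrusion Detection item in the outline above, here is a minimal sketch of what "tag each event with its stage and stack them" could look like. The stage names follow the commonly cited seven-step kill chain. Everything else is hypothetical: the signal names, the mapping from signals to stages, the host names, and the three-stage threshold are placeholders I made up, not anything from Flynn's talk or from a real IDS.

```python
from collections import defaultdict

# Kill chain stages in order, as commonly described.
KILL_CHAIN = [
    "reconnaissance",
    "weaponization",
    "delivery",
    "exploitation",
    "installation",
    "command_and_control",
    "actions_on_objectives",
]

# Hypothetical mapping from low-priority signal names to kill chain stages.
SIGNAL_STAGE = {
    "port_scan": "reconnaissance",
    "suspicious_attachment": "delivery",
    "java_exploit_signature": "exploitation",
    "new_autorun_entry": "installation",
    "beacon_to_rare_domain": "command_and_control",
}


def stages_by_host(events):
    """Tag each (host, signal) event with its stage and group stages by host."""
    seen = defaultdict(set)
    for host, signal in events:
        stage = SIGNAL_STAGE.get(signal)
        if stage is not None:  # unknown signals are simply ignored here
            seen[host].add(stage)
    return seen


def correlated_alerts(events, min_stages=3):
    """Return hosts whose events span at least `min_stages` distinct stages.

    Each value is the list of observed stages in kill chain order.
    """
    alerts = {}
    for host, stages in stages_by_host(events).items():
        if len(stages) >= min_stages:
            alerts[host] = sorted(stages, key=KILL_CHAIN.index)
    return alerts


if __name__ == "__main__":
    sample_events = [  # made-up events for illustration only
        ("pc-17", "port_scan"),
        ("pc-17", "java_exploit_signature"),
        ("pc-17", "beacon_to_rare_domain"),
        ("pc-22", "port_scan"),
        ("pc-22", "port_scan"),
    ]
    print(correlated_alerts(sample_events))
```

The point of the design is that no single signal here would justify paging anyone. A host only surfaces when individually boring events line up across several stages, which is the correlation idea from the project work list.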
- Change your Yahoo password ASAP
- Never use the same password on more than one site.
- “Yeah right, how do I keep track of all those passwords?”
- Use a password manager like KeePass or 1Password. They are incredibly convenient and allow you to not reuse passwords. Consider this: what if you used the same password for your online banking account as a compromised Yahoo.com password? When the hackers go fishing on other sites with all of their stolen credentials, they might end up logging into your bank account. How about that?
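If you are curious what "a different password for every site" looks like in practice, here is a minimal sketch using Python's standard `secrets` module. This is not how KeePass or 1Password work internally. It only illustrates generating an independent random password per site, and the site names, the 20-character default, and the 12-character minimum are arbitrary choices of mine.

```python
import secrets
import string

# Characters to draw from: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password built with a cryptographically secure RNG."""
    if length < 12:
        raise ValueError("refusing to generate a password shorter than 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for_sites(sites, length: int = 20) -> dict:
    """Generate an independent random password for each site name."""
    return {site: generate_password(length) for site in sites}


if __name__ == "__main__":
    # Hypothetical site names for illustration only.
    for site, password in passwords_for_sites(["mail.example", "bank.example"]).items():
        print(f"{site}: {password}")
```

Because every password is generated independently, a leak at one site tells an attacker nothing about your credentials anywhere else. Storing the results safely is the part a password manager handles for you.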
The MSP model is basically a commodity play
August 26th, 2011
Who wants to be a commodity?
I have struggled with the MSP model since it came out and have considered writing about it for some time. Today, I realized why we aren’t an MSP. It’s that the MSP play turns both the client and the provider into commodities. For those that don’t know, MSP refers to “Managed Service Provider” and has become an ascendant business model for IT service providers. The model is predicated on charging fixed monthly fees (usually per client and per server) for a limited list of services. The provider then tries to scale up and service more clients with fewer technicians by leveraging remote access and monitoring tools.
Patching and monitoring is necessary but not sufficient.
Intuitively I feel that IT services are difficult to commoditize. This isn’t the fast food industry. The MSP model says that proactively patching and monitoring systems will reduce service requests. I find that almost impossible to believe. Patching and monitoring could not have prevented 95% of the service requests that our company sees in the average week. Of course we do patching and monitoring. This is dictated by Best Practice. Of course patching and monitoring can help a service provider “proactively” fix some problems before they become visible to the end users. This is true for everyone who follows best practice, regardless of their model.
So what do we do? How are we different from MSPs?
We didn’t buy an expensive framework to cram all of our clients into. We build custom solutions suited to each individual environment. We support existing systems as long as they are meeting business requirements. MSPs have the tendency to push clients to a unified platform that lowers costs on the admin side. Our model is more flexible. We go on-site regularly. We talk to clients face-to-face and work to understand what they are trying to do. Sure, we can provide remote support as needed; that’s trivial these days. But relying on remote support exclusively turns both the client and the provider into a commodity. First IBM dumped their commodity businesses in favor of high value-add services and now HP seems ready to follow in their footsteps. I feel good about trying to learn from them.
Google Apps Transition
April 21st, 2011
If you have been using Google Apps, you will have noticed that many of Google’s other services, such as the Android Market or Google Voice, would not accept your Google Apps user credentials. Most Google services require that you use a regular personal (consumer) Google account.
But that is all changing now. Google is in the process of transitioning the Google Apps account infrastructure. Early adopters can go ahead and start transitioning selected accounts right now. Users will be given the chance to change the email address on any conflicting accounts and there will be some options for transferring data between accounts.
I had often been annoyed by the need for separate accounts, so I am glad that Google is finally getting their act together and fixing that kludgy mess. Now I am off to grab a bunch of new Google Voice phone numbers!
