So yesterday my Windows Server VMs running in Windows Azure (VM Role) were automatically shut down and then later restarted. I assume this occurred due to an update to the host server and/or environment. I have my servers deployed in pairs, where each pair is in the same availability set. The idea here is that only one VM per availability set will be taken offline at any one time. As servers are added to an availability set they are placed so that no two members share the same rack/fault domain. The theory is this should push your SLA from 99.9% to 99.95% (I assume the last .05% is to account for certificate expiration).
When determining how many machines to add to an availability set you need to ensure the load handled by the machines can be satisfied with x-1 machines, where x represents the number of machines in the availability set. So in my case, for this pair, my x was 2, with the idea that a single server could handle the load. Of course you would likely want to configure more to ensure there is no single point of failure. It is fairly trivial to add 4, 5, or even more servers into an availability set using PowerShell.
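The x-1 sizing rule above can be sketched as a quick capacity check. This is a hypothetical Python helper, not part of any Azure SDK – just the arithmetic:

```python
import math

def availability_set_size(peak_load, per_vm_capacity):
    """Return the smallest availability-set size x such that x - 1
    machines can still carry the peak load (since one VM may be
    taken offline for host updates at any time)."""
    # Machines required to serve the load at full strength...
    serving = math.ceil(peak_load / per_vm_capacity)
    # ...plus one spare, so losing any single VM still leaves enough capacity.
    return serving + 1

# Example: 1500 req/s peak, each VM handles 1000 req/s.
# Two VMs are needed to serve the load, so the set needs three.
print(availability_set_size(1500, 1000))
```

In my 2-VM case this works out because a single server could carry the whole load on its own.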
With the promise from Azure that only one server will ever be down at any one time, the next question you may have is: So how did my Azure-invoked outage yesterday fare?
So as you can see Windows Azure’s Availability Sets worked as advertised!
As an unplanned follow-up to my previous post I wanted to reply to some of the feedback I received and take another run at this little IO test. The feedback was generally around "what" I tested rather than the "why" and "how". I had no doubt the "why" was super clear, and I was not interested in debating the "how" because, as I said before, this is a very informal test, so I am glad I did not receive either of those remarks. As for the "what", the feedback boiled down to:
- You get what you pay for – at $9 a month for Azure and free for EC2 what did you expect?
- Try testing on a more realistic platform, one from which someone may actually expect decent IO.
- How about a newer “Cloud Ready” OS bro?
- Hey buddy, we are friends and I work for Rackspace, so why didn’t you include them in the mix?
All are fair comments – so let's take another stab at this and see what paying a bit more money can get us.
For those that did not read the previous post, the reason I am doing this testing is because the general feeling from a few of us using VMs running in the cloud is that the IO seems or feels pretty slow. While Amazon, Windows Azure, and Rackspace give you options when it comes to the number of CPUs, network speed, disk space, and RAM, it seems when it comes to disk IO you get what you get. While Amazon EC2 does give you designators such as "low" or "high" IO for some of their instances, there is no real indication of what that actually means or how it compares to other providers.
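For context, the kind of informal test being discussed is roughly "write a file sequentially, sync it to disk, and time it". A rough Python sketch of that idea follows – nothing like a rigorous tool such as SQLIO or fio, just enough to compare a gut feeling across VMs:

```python
import os
import tempfile
import time

def sequential_write_mbps(total_mb=64, block_kb=64):
    """Time a sequential write of total_mb megabytes in block_kb chunks
    and return the rough throughput in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk, not just the OS cache
        elapsed = time.perf_counter() - start
        return total_mb / elapsed
    finally:
        os.remove(path)

print(f"~{sequential_write_mbps():.0f} MB/s sequential write")
```

The `fsync` matters: without it you are mostly measuring the OS write cache rather than the disk behind the VM.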
Update: Check out Part 2.
On an email distribution list yesterday someone commented on the disappointing IO performance they received while running a SharePoint farm with VM roles in Windows Azure. As an Azure user I too had noticed the IO felt a bit sluggish, but with a super fast SSD in my laptop just about any VM these days feels that way. Just a few days earlier I was checking out Amazon's EC2 pricing, and it appears the cost to run a VM in EC2 vs. Azure is about the same for about the same configuration. So naturally the next question is: of the two cloud services, which offers better IO?
I get asked all the time about a good SharePoint reading list. I have always found the SharePoint MCM reading lists a really good start. Some of the items are blogs and papers with additional links, and a reader who follows those will normally find it takes a good amount of time to traverse the entire list. There are quite a few SharePoint books out there too, but these lists do not include any books. Here are the links to the SharePoint 2010 and SharePoint 2007 Pre-Reading Lists for the SharePoint MCM program.
SharePoint 2010 MCM Reading List
SharePoint 2007 MCM Reading List
So get your learn on and start reading…
One of the really cool things about working on tools and having the community use them is that sometimes you get really great feedback and suggestions. Recently Heiko Hatzfeld from Microsoft PFE suggested we include method parameter or argument details in the output produced by SNAP when it takes a snap of a process.
I have a service which leverages Azure's Service Bus Queues. Clients post messages with a session ID into a request queue and then wait/block on the response coming back via a response queue. I have a number of Azure Persistent VMs, each running a Windows service which monitors the request queue; once a service has work it takes about 3 seconds to process the request and queue a response into the response queue for the waiting client.
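The request/response pattern above can be sketched like this. In-memory queues stand in for the two Service Bus queues here (the real code uses the brokered messaging API, sessions, and multiple worker VMs); the point is just the shape of the correlation:

```python
import queue
import threading
import uuid

request_q = queue.Queue()  # stands in for the Service Bus request queue
responses = {}             # one response "queue" per session id

def worker():
    """The Windows-service side: pull a request off the request queue,
    do the work (~3 seconds in my real service), and queue the reply."""
    while True:
        session_id, payload = request_q.get()
        result = payload.upper()           # the real work goes here
        responses[session_id].put(result)  # reply routed back by session id

def client_call(payload):
    """The client side: post a request tagged with a session id, then
    block until the matching response arrives."""
    session_id = str(uuid.uuid4())
    responses[session_id] = queue.Queue()
    request_q.put((session_id, payload))
    return responses[session_id].get(timeout=10)

threading.Thread(target=worker, daemon=True).start()
print(client_call("hello"))  # prints "HELLO"
```

The session ID is what lets many clients share one response queue without stealing each other's replies.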
I have been running with a 256GB Crucial C300 in my Lenovo W520 for a while now and it has been great. I love the performance and I love to see applications "pop" (vs. poop) open. Someone, I forget who, tweeted the other day about a great deal on Amazon for a Crucial M4 at $399, so I hit Amazon, did the two-day Amazon Prime shipping, and here I sit with a new drive just populated with Acronis True Image Home 2011.
So have you ever heard this: "There is nothing in the ULS logs"? Or better (worse) yet, you have experienced it. Yeah, me too; it's a real bummer, and your next step is typically to crank up the ULS logging verbosity and cross your fingers. Sometimes you get lucky and sometimes you don't – so where next? I have found myself attaching a debugger and looking at the managed exception messages that trail across the debugger window while I reproduce the problem, and sometimes these are enough to provide either a line of investigation or possibly the answer to my issue.
I am working on an MVC4 RC application within the Visual Studio 2012 RC. I started to notice my web pages were getting a bit sluggish with regard to response times. I found that I could not get any page to return in less than 1 second. So I started to do a bit of digging – I created a TestController controller with a single Index() action and a view which did not use any master page/layout. The page, according to Firebug, was 2-3ms coming from IIS Express running on my local dev machine. Now the view was a very simple page with a DocType, header, and body without any content, so this is about as basic as you can get.
As a follow-up to my previous post, "When Page Output Caching Does Not Output", I have recorded a video which walks you through the steps and issues I documented there. So for those of you who don't like to read all that much, you may watch the video and/or refer back to my previous post on the same subject.
Recently I left Microsoft, where I worked for almost 15 years and where about 10 of those years were spent in Escalation Services, where my daily routine was debugging failing or faulting applications. This all began with user- and kernel-mode Windows processes, and then once the .Net Framework shipped I moved to the ASP.Net and CLR teams and began debugging more managed processes. Normally customers would send my team crash dumps or memory dumps of the offending process(es), and we would use tools such as WinDbg or CDB to dig deeper into the process to determine what was happening. There are several challenges when doing this type of work, and one of the most painful is locating and referencing the correct symbol files (*.pdb).
So let's say you have a sense of humor and your co-worker fails to lock his or her computer (and they have a sense of humor too [very important]). Checking out the calendar you notice it's April first – BAM, a perfect opportunity has just landed in your lap – now what? First, don't do anything that will get you fired, because that isn't really all that funny. So what should we do to this poor sap's computer?
I have a 16GB Lenovo laptop which I use in my daily work. It runs Windows 7, and while you can install SharePoint 2010 on Windows 7 I choose never to do that (you can read more here about why I don't use Windows 7 as my SharePoint development platform). I am not a big fan of dual, triple, quad (or whatever comes next) booting, because as soon as I boot into one OS I will likely need to send email or do something which is set up in another OS. I also don't like running a server OS on my laptop, because I use Bluetooth every once in a while and I like the hibernate and sleep functionality Windows 7 provides. So until Windows 8 hits mainstream with its virtualization platform I must resort to running a 3rd-party virtualization solution, so I chose VMware Workstation and currently I am running their latest version, 8.0.
UPDATE: The SharePoint Foundation 2010 April 2012 Cumulative Update has a fix for this issue. Check out http://todd.in/spversions for more information.
Back in December Microsoft released a patch, MS11-100, which addressed a vulnerability in the .Net Framework. In addition to correcting the original issue it introduced a regression which breaks SharePoint's Page Output Caching. As mentioned in my previous post, while SharePoint puts all the constructs in place for Page Output Caching, it's really ASP.Net which actually stores and manages the Page Output Cache on SharePoint's behalf. When ASP.Net decides what to cache for SharePoint it looks at the HttpResponse's Cookies collection, and if any new cookies are being set/sent back to the client the page content will not be cached. As a result, the next request for the same page which matches the varyby parameters set for SharePoint will result in a cache miss and the page will be processed again.
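The rule ASP.Net applies here can be boiled down to a simple decision. This is an illustrative Python sketch of that rule only; the real logic lives inside ASP.Net's output-cache module, and the cookie name below is just an example:

```python
def should_output_cache(new_response_cookies, cacheability_public=True):
    """Mirror the rule described above: a response that sets/sends any
    new cookie back to the client is not stored in the output cache,
    so the next matching request misses and the page executes again."""
    if not cacheability_public:
        return False  # the page never asked to be publicly cached
    if new_response_cookies:
        return False  # any new Set-Cookie disqualifies the response
    return True

# A page (or any control on it) that writes a cookie is never cached:
print(should_output_cache(["WSS_KeepSessionAuthenticated=443"]))  # False
print(should_output_cache([]))                                    # True
```

This is why a single control adding a cookie on every request can silently defeat caching for the whole page.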
SharePoint's Page Output Caching can offer a massive performance boost to publishing sites, but only when it's working, and working correctly. One of the problems I have seen is that some administrators turn on Page Output Caching and just assume it works. While this may be the desire, and in most cases it may just work for you, I would suggest you verify – and I don't mean hit the site with the browser to see if it speeds up.
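One simple way to verify, assuming you have enabled the debug cache information option on your cache profile (which stamps the render time into an HTML comment), is to fetch the page twice and compare the stamps: identical stamps mean the second response came from the cache. A hypothetical sketch of that comparison (the comment format shown is illustrative):

```python
import re

# Matches the render-time stamp written by the debug cache information comment.
STAMP = re.compile(r"Rendered using cache profile:.* at: ([\d/: ]+)")

def served_from_cache(first_html, second_html):
    """Compare the render-time stamps of two fetches of the same page;
    equal stamps mean the second response was served from the cache."""
    a, b = STAMP.search(first_html), STAMP.search(second_html)
    if not (a and b):
        return False  # no stamp found: caching (or debug info) is off
    return a.group(1) == b.group(1)

page1 = "<!-- Rendered using cache profile:Public at: 2012/04/01 10:00:00 -->"
page2 = "<!-- Rendered using cache profile:Public at: 2012/04/01 10:00:00 -->"
print(served_from_cache(page1, page2))  # True: second hit was cached
```

If the stamp changes on every request, the page is being re-rendered each time and something (often a cookie, per the previous post) is defeating the cache.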
This post is about troubleshooting SharePoint's Page Output Caching. Now if you don't use Page Output Caching, or yours is working just fine, you are the "Master of your Page Output Caching" – as for the rest of us, we will likely need to put on our troubleshooting hat and dig a bit deeper. I find troubleshooting anything is a lot easier if you know a little about how the component you are troubleshooting operates and how it is supposed to work.
This Friday, 11-11-11, will mark my last day at Microsoft. After 15 years of working for the best software company in the world I have decided to accept another challenge. I have so many great memories, experiences, and friends that this move was not trivial, but ultimately it could not be passed up. I owe so much to Microsoft and all of the co-workers I have ever worked with. When my kids were in the hospital I was worried sick, but I did not have to worry about any bills or costs for the visits – Microsoft's health benefits are the best around. Having the opportunity to work with the smartest folks around only made me want to work that much harder, and made me a better engineer but more importantly a better person. I have so many friends I have worked with around the world who have all contributed to my successes, and my hope is that one day I will have the opportunity to pay back what they have so unselfishly given me.
A few observations after downloading the PowerPoint presentations from last week's SharePoint Conference 2011:
In a previous post I spoke about the importance of pre-populating SharePoint's Content Database UserInfo table with users for landing/root (and/or very popular) sites just before a large release of a new SharePoint web application. While I did mention the API you could call to make all this happen, I did not provide any tooling. This post is about a small tool I wrote, which at this point has been used with a couple of customers, to pre-populate UserInfo tables. The tool itself comes in two flavors – one for MOSS 2007 and the other for SharePoint 2010 Server. Both flavors allow you to export users from the User Profile Store to a flat file which can then be imported in a manner which populates the UserInfo table. In addition, the SharePoint 2010 version supports both Windows and Claims users.
When an authenticated user who has never visited a site collection first visits a site, there are a number of tables within the Content database which must be updated. This activity can be expensive, and performance can suffer when the site collection is the root of a web application which has just been announced or released for the first time into production. In fact, I have seen firsthand this behavior take down a very large SQL Server upon the initial launch of a large intranet site, to the point we had to roll back, and I have teammates who have had the same experience (hence this post).