February 9, 2016 at 5:15 pm #35084
I have been asked again to justify what they are saying is the final hurdle to get time and permission to bring DSC into production. I wrote this and would love any feedback anyone is willing to give.
Windows Systems Development is always looking at the returns (value) on our efforts toward the System Provisioning BAU work, and quite frankly, looking back over the last year, we see opportunities for improvement. We needed to adjust to the fact that many of us are spending over 50% of our time on project work, 10% of our time in administrative meetings, and 10 to 30% of our time responding to traffic and BAU items that need immediate attention; the variance depends on our on-call status. This doesn't include training, vacation, and work redistribution due to people being out for training and vacation. There has also been a considerable amount of shifting over the last year in the priorities for which areas we should focus our attention on, causing items being worked on to be left incomplete. What we are left with is that 10 to 30% of our time has been spent on items we have not been able to take to completion, giving us very little return for our efforts.
This situation is also resulting in a degradation of our current provisioning process, taking it from about an hour with little interaction to about 5 hours with a high level of interaction. This is a living, breathing thing; it requires regular care and feeding. How much change have we seen in VMware, Storage, and UCS in the last year? A lot!
What we are proposing as a solution is that we take an evolutionary and incremental approach to improving and maintaining the Windows provisioning process.
Evolutionary, because it will evolve as the environment changes, our knowledge increases, and our perspective shifts. Incremental, because instead of trying to stand up a whole system, we are going to break the process down into smaller logical groups of items, categorizing them by value or weight and considering the environmental changes and the business needs.
But our code development will be directed by our stated goal of standing up Provisioning II, utilizing PowerShell DSC, and moving away from anything that has contributed to inconsistency or has limited accessibility due to the unique skills required to work with it (like vCO). We also feel this better aligns our efforts with the direction that Microsoft has taken with DSC, which Microsoft adopted to compete with the industry trends in configuration management. These trends are demonstrated by the wide acceptance of Puppet and Chef on the Linux side of the aisle.
I would like to deviate slightly for just a moment to state that our goal is to take the provisioning process past its previous efficiency of building a server in less than an hour with an ORT of 15 to 30 minutes, to building 5 or more servers at a time in under an hour, with a separate ORT process automatically executing after the build process and producing a spreadsheet with all the results from the independent testing process. This spreadsheet will be placed in SharePoint and an email sent to the requester and builder with a link to the document.
Back to our proposed solution of taking an evolutionary and incremental approach to achieving the above stated goal. What we are proposing is that we stand up 4 servers in production, which will give us a complete but minimal DSC environment with a PULL server in Earth and a PULL server in Mars. This adds to our current provisioning process the ability to configure new builds with PowerShell modules, applied by the Local Configuration Manager built into Windows Server 2012 R2 and beyond. The Local Configuration Manager is currently our main unutilized resource. The use and purpose of each server will be outlined below. I am estimating 32 hours to get these servers stood up and configured. **
By standing up these servers, we can begin to utilize the resources of DSC on all the items or tasks where it brings immediate relief, reducing the items that are causing the current process to take 5 hours. After we work through that initial list, we can begin to group and prioritize items that can be moved to DSC, and this will further the cause of building out the new provisioning process. This not only stems the current tide, but we also reap the longer-term return on investment by moving toward the platform we will be using for the foreseeable future.
By carefully prioritizing and grouping items to work through, we will be able to stay flexible to the ever-changing demands put on the Windows Server Engineering team. We will be able to make sure that our direction and effort are coordinated and bring us in concert with each other, making us more effective and efficient. We will be able to work each task through to completion so we can realize the full value of our efforts.
After talking with most of the team, listening to their issues with the current state of the server build process, looking through the current list of items that have been waiting to be done, and going through the current ORT spreadsheet, we have created a list of things that can be moved to DSC and need to be prioritized. Of those things, the install and configuration of SEP, Splunk, Snow, BGInfo, Tripwire, TSM (backup) and a few NIC and registry configurations like Disable NetBIOS, IPv6, register DNS, and properly configuring the page file are all items that, if eliminated as manual steps, would reduce manual actions after the builds and ORT time, thereby reducing the time to build a server.
I am estimating that this initial effort, the 32 hours to stand up and configure the 4 servers, and to complete the list of items in the paragraph above, will take 76 hours to complete.
Again, I am going to deviate slightly to give some context in terms of impact potential. We are all aware that toward the end of March, or the first part of April, we will begin the process of building 350 new servers. This doesn't consider the 40 to 50 servers we are building a month now. Looking at the 350, using the 5 hours it is taking to build a server now, it will take 1750 labor hours to build these servers. If our initial items conservatively reduce the build time by one hour before we begin those builds, we will reduce that time to 1400 hours. We invest 76 hours and get 350 hours back. If we can get started right away and reduce the build time by 2 hours, we can reduce that to 1050 hours, and if we can get the build times under 2 hours, we can reduce that workload to under 700 hours. I think this is a pretty significant return on investment in a time frame that makes a pretty compelling case to get started as soon as possible.
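For anyone who wants to check or adjust the numbers in the paragraph above, here is a minimal Python sketch of the same arithmetic. The figures (350 servers, 5 hours per build, 76 hours of investment) come from the text; the function and variable names are only illustrative.

```python
# Figures taken from the paragraph above; names are illustrative only.
SERVERS = 350             # servers in the upcoming project
CURRENT_BUILD_HOURS = 5   # hours per server build today
INVESTMENT_HOURS = 76     # estimated effort for the DSC servers plus the initial items


def project_hours(servers, hours_per_build):
    """Total labor hours to build `servers` machines one at a time."""
    return servers * hours_per_build


baseline = project_hours(SERVERS, CURRENT_BUILD_HOURS)

# Compare the baseline against build times of 4, 3 and 2 hours per server.
for reduced_build_hours in (4, 3, 2):
    total = project_hours(SERVERS, reduced_build_hours)
    saved = baseline - total
    net = saved - INVESTMENT_HOURS
    print(f"{reduced_build_hours} h/build: {total} hours total, "
          f"{saved} saved, {net} net of the investment")
```

Changing the constants at the top makes it easy to rerun the comparison if the server count or the effort estimate changes.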
I would even consider arguing that the time spent on this effort can be billed as prep work for this project, which will help assure the success of this project like very few other things will.
I think it is worth mentioning that when we get the system to the point where it can build multiple servers at the same time, using 5 concurrent builds as a conservative reference point, it would reduce the man hours for all 350 servers to 70 hours. Not only could that process be done by 2 people in the EOC in just under a week; think about the other amazing work and progress the rest of the Windows Server Engineering team would be freed up to do by taking away the 2700 hours of server builds (45 servers per month x 12 months = 540 servers, x 5 hours per server build = 2700 labor hours).
Combining these two areas, we are talking about saving between 3000 and 3400 labor hours over the next 12 to 14 months.
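The batch-build and monthly-build figures above can be reproduced with a short Python sketch. It assumes each batch of 5 servers takes one full hour (the text says "under an hour") and uses 45 servers per month as the midpoint of the 40 to 50 quoted earlier; the names are illustrative.

```python
import math

PROJECT_SERVERS = 350      # servers in the upcoming project
BATCH_SIZE = 5             # servers built at the same time
HOURS_PER_BATCH = 1        # assumption: "under an hour" rounded up to 1
BUILDERS = 2               # people in the EOC running the builds
MONTHLY_SERVERS = 45       # midpoint of the 40 to 50 built each month
HOURS_PER_BUILD_NOW = 5    # hours per server build today


def batched_hours(servers, batch_size, hours_per_batch):
    """Labor hours when servers are built in batches of `batch_size`."""
    return math.ceil(servers / batch_size) * hours_per_batch


project_batched_hours = batched_hours(PROJECT_SERVERS, BATCH_SIZE, HOURS_PER_BATCH)
hours_per_builder = project_batched_hours / BUILDERS

yearly_servers = MONTHLY_SERVERS * 12
yearly_hours_now = yearly_servers * HOURS_PER_BUILD_NOW

print(f"Project in batches: {project_batched_hours} hours "
      f"({hours_per_builder} hours each for {BUILDERS} people)")
print(f"Regular builds today: {yearly_servers} servers, {yearly_hours_now} hours per year")
```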
This is a list of some of the things that were considered for the initial list of 5 and are some of the things that will be worked on shortly thereafter.
SEP, Splunk, Snow, BGInfo, (One line)
Tripwire (DC2), More Effort
TSM, Bigger Effort, (includes interactions with TSM server)
Configure backup schedule
Construct Config file on client
DBA Slipstream SP3
IIS + log mgmt., SQL
Registry key settings
Server owner and App owner
At one point I believe there were between 25 and 30 registry settings that can all be done in DSC.
Server Base Configuration
Set Page File
GPT Disks (non-OS)
RSS and NetBIOS settings on NICs
Rename Local Administrator Account
Disable register DNS
Move Server AD Object to proper OU
Computer add and removal
Administration (28 Hours)
ORT, create a separate testing process along with each item that is developed for the new provisioning process so that we have a separate but automatic build confirmation system in the end.
Implement Automatic Ticket Creation and closing (make tickets relevant to actual servers built).
1) Complete design, specifications and builds for 1 production Authoring Server, 2 IIS Web PULL servers (Earth and Mars), and one Compliance reporting server.
February 10, 2016 at 10:16 am #35118
Just a couple of quick thoughts (hopefully I'll have some more time later to respond in detail).
Love the detail you are bringing in. I would change incremental to iterative. There is a bit of baggage with incremental. Occasionally you find that you'll need to throw out some part of your process, whether it is abandoning a resource, changing a tool, or whatever. Incremental subtly implies that you'll build on what you have and makes it harder to rationalize throwing out something that isn't working. Iterative lends you to thinking you'll try something, evaluate, and then try something else.
In addition to the pull servers, you'll want to get PS 5 and/or PowerShellGet in place as soon as possible. You'll need a NuGet feed or file share to distribute from, but that'll handle module distribution outside of DSC resources.
Initially converting tasks to running through DSC always takes longer than expected. It's actually hard to get things done consistently. Definitely invest in testing up front. Build acceptance tests for what you expect your configuration management to do (and some of what not to do). Use those tests to validate if your DSC configs are making the changes you want and keep them in place as you change other things. This takes a lot longer than most folks expect, but is way worth the effort.
I have a few quotes from Chef customers who have realized the value (and several are using Chef with DSC on Windows – but the comments are more generalized). I'm sure our sales and marketing folks have more detailed stuff, but this is what I have on hand. I'm omitting the specific customers and each quote is from a different org – if you really care we can talk more offline.
I'm including these quotes so you can see that there are other real orgs using these patterns and techniques and realizing business value.
– Estimated cost per server fell from $5,200 to $65. Estimated cost per 200 deployments fell from $1M to just $4K.
– Reduced Internet banking deployment time from 6 months to 18 minutes. Reduced time to deploy SQL update across infrastructure from 3 days to 3 hours.
– Reduced typical virtual server deployment from 80 hours to 1 hour.
– Reduced time to production for new application environments from 4-5 weeks to 1-2 days.
– Precise configurations eliminated error-prone manual processes. Infrastructure can be configured, tested and deployed in minutes vs. days or even weeks before Chef, providing repeatability and compliance across multiple data centers.
February 11, 2016 at 7:26 am #35161
Thanks for the response, I am going to make the change to iterative. I really like the feedback you have gotten from your customers. I have been told to start with a summary and try to add a break subtitle?
Still hoping for a little more feedback. I thought people would have written 100 of these by now.
February 11, 2016 at 8:40 am #35168
You have covered a ton in that. If I were at the mgmt level, the important parts would be the cost savings, the lack of need for "special" skills, and that it's built into Windows. There's a lot of backing from MS to see that this is available in all of their offerings to make it become THE way to manage things.