Custom Groups vs Custom Datacenters

Recently I have been working with a global organisation to assist with capacity planning and wastage across their virtual estate. This organisation has a significant number of business units and network zones whose capacity planning needs to be performed independently, but the clusters within those boundaries are generally managed by the same vCenter Server, with geographically focused datacenters. The challenge, therefore, was how to manage the capacity of groups of clusters (this customer has hundreds of them). One thing to note is that this customer has a standard (which is reflected in a vROps policy) of only using an allocation-based model of capacity utilisation, and they were unable, for a variety of reasons, to move to a demand-based model.

In vROps we have two means of grouping objects; custom groups and custom datacenters. Unfortunately neither of these is ideal for our purposes, due to the reasons shown in the table below:

Custom Group

Pros:
  • Membership can be dynamically configured based on many metrics, relationships or properties
  • Capacity remaining can be calculated via super metrics that sum the capacity remaining of child objects

Cons:
  • Time remaining is not available as a metric on the object
  • Custom groups are just that: groups, as opposed to objects in their own right

Custom Datacenter

Pros:
  • Allows large datacenters to be configured into logical groupings that more readily reflect the breakdown of an organisation
  • Treated as a full object in its own right, meaning that capacity remaining and time remaining are calculated for the custom datacenter based on its child objects
  • Different policies can be configured between custom datacenters and their child objects

Cons:
  • Membership cannot be natively configured
  • Consistency between the values computed for a custom datacenter and its child objects can sometimes be challenging
  • Based entirely on infrastructure, so can only be broken down into clusters or hosts; it is not possible to create a custom datacenter based upon virtual machines directly
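For the custom group route, the "sum of child capacity remaining" super metric mentioned above would look something along these lines. This is a sketch only: the exact attribute key (shown here as capacity|remaining) varies by vROps version and policy, so verify it against your environment before use.

```
sum(${adaptertype=VMWARE, objecttype=ClusterComputeResource, attribute=capacity|remaining, depth=1})
```

The depth=1 argument restricts the calculation to the group's immediate children; the super metric then has to be assigned to the group's object type and enabled in the active policy.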

When discussing these with the customer they asked the obvious question:

Is there a way to determine the time remaining of a group of clusters based on the time remaining of the individual clusters via a super metric?

The simple answer is no, and the more complex answer is still no.

Take the following simple example, where we have two identically sized clusters, every VM that we deploy is the same size, and we deploy a set number of VMs per day (to make this example easier to understand):

  • Cluster A has capacity for 10 VMs and we deploy 2 VMs per day
  • Cluster B has capacity for 20 VMs and we deploy 1 VM per day

Based on the above the time remaining for each of these would be as follows:

  • Cluster A – 5 days
  • Cluster B – 20 days

So what is the time remaining for the clusters as a group? 20 days? 25 days? Something else? The table below shows us how the group of clusters will be filled and therefore how much time remaining there is:

                         Day:   0   1   2   3   4   5   6   7   8   9  10
Cluster A  Capacity Remaining  10   8   6   4   2   0
           Time Remaining       5   4   3   2   1   0
Cluster B  Capacity Remaining  20  19  18  17  16  15  12   9   6   3   0
           Time Remaining      20  19  18  17  16  15   4   3   2   1   0

This is a simple example and yet the maths is already fairly complex. Imagine this with 10 clusters, differing virtual machine sizes, and different and changeable deployment patterns, and it becomes clear that defining a single value via the super metric mechanism is not suitable.
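The fill behaviour in the example above can be sketched as a short simulation. This is illustrative only; the spill-over rule (once a cluster is full, its daily deployments go to whichever clusters still have space) mirrors the table above rather than anything vROps itself does.

```python
# Simulate filling a group of clusters to find the group's time remaining.
# Each cluster has a capacity (in VMs) and a daily deployment rate; once a
# cluster is full, its deployments spill over to clusters with space left.

def group_time_remaining(clusters):
    """clusters: list of dicts with 'capacity' and 'rate' (VMs per day)."""
    remaining = [c["capacity"] for c in clusters]
    days = 0
    while any(r > 0 for r in remaining):
        days += 1
        spill = 0
        # Serve each cluster's own deployments first
        for i, c in enumerate(clusters):
            take = min(remaining[i], c["rate"])
            remaining[i] -= take
            spill += c["rate"] - take
        # Spill any leftover deployments into clusters with space
        for i in range(len(remaining)):
            take = min(remaining[i], spill)
            remaining[i] -= take
            spill -= take
    return days
```

For Cluster A (10 VMs, 2 per day) and Cluster B (20 VMs, 1 per day) this returns 10 days, matching the table: not 20, not 25, and not derivable from the two individual time remaining figures alone.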

The answer therefore seems to be custom datacenters, as by using these we gain access to vROps’ inbuilt capacity engine to calculate the time remaining figure for groups of clusters. Custom datacenters are top-level objects within vROps, with every benefit that goes with that.

There are still some challenges that I’ve experienced specifically around consistency of numbers whereby the number of virtual machines remaining in the clusters that make up a custom datacenter don’t necessarily add up to the number of VMs remaining for the custom datacenter itself. This can however be masked by the use of super metrics.

The second, and much larger, challenge is that there is no dynamic membership mechanism for custom datacenters. In smaller environments this may not be much of a problem, as the membership can be managed manually. However, in environments with hundreds of clusters, managing this manually via the UI is both impractical and prone to mistakes.

With my current customer we’ve used the vROps API both to create the custom datacenters and to manage the relationships between them and the underlying clusters. More information on how to use the vROps API can be found at the following links:

Straight up flying with the vRealize Operations REST API

vRealize Operations Manager API Guide
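As a flavour of what that automation looks like, the sketch below builds the two API payloads involved: creating a custom datacenter resource and attaching clusters as its children. The endpoint paths, the VMWARE/CustomDatacenter resource-kind key, the token header and all names are assumptions to verify against the API guide linked above for your version.

```python
# Sketch: creating a custom datacenter and attaching clusters as children
# via the vROps REST API. Endpoints, resource-kind key and auth handling
# are assumptions - check them against the vROps API guide before use.
import json
import urllib.request

VROPS = "https://vrops.example.com"  # hypothetical vROps node
HEADERS = {"Content-Type": "application/json", "Accept": "application/json",
           "Authorization": "vRealizeOpsToken <token>"}  # token acquisition not shown

def custom_datacenter_body(name):
    """Build the resource body for a new custom datacenter."""
    return {"description": name,
            "resourceKey": {"name": name,
                            "adapterKindKey": "VMWARE",
                            "resourceKindKey": "CustomDatacenter"}}

def children_body(cluster_uuids):
    """Build the relationship body linking clusters to the datacenter."""
    return {"uuids": list(cluster_uuids)}

def post(path, body):
    req = urllib.request.Request(VROPS + path, data=json.dumps(body).encode(),
                                 headers=HEADERS, method="POST")
    return urllib.request.urlopen(req)

# post("/suite-api/api/resources/adapterkinds/VMWARE",
#      custom_datacenter_body("Finance-EMEA"))
# post("/suite-api/api/resources/<dc-uuid>/relationships/children",
#      children_body(["<cluster-uuid>"]))
```

Driving the same calls from a list of clusters per business unit is what makes membership manageable at the scale of hundreds of clusters.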

In doing so, PowervROps was born: a module that allows the vROps API to be used directly from within PowerShell. That, however, is worthy of a post all to itself:

PowervROps – PowerShell cmdlets for the vROps API

 

PowervROps – PowerShell cmdlets for the vROps API

What started out as the functions that were written for a customer I was working with has become something of an obsession that has resulted in a full-scale PowerShell module that exposes various parts of the vROps API.

The module should not be seen as a replacement for the vROps cmdlets available within PowerCLI, more as a complementary set that were born out of the specific requirements of a customer, but have since evolved as additional functions have been written.

The module is available on GitHub at the following URL:

https://github.com/andydvmware/PowervROps

There are currently 41 functions within the module, and for the most part they accept all parameters as defined by the API.


Over the coming days/weeks I’m going to write a series of posts detailing some of the ways in which the module has been used. Some of the highlights of the current release include:

  • Ability to generate and then download reports
  • Ability to add metrics and properties to objects
  • Add, set and delete relationships
  • Create new resources (for example custom datacenter objects)
  • Creation and deletion of custom groups
  • Ability to start/stop monitoring of resources
  • Ability to mark/unmark resources as being maintained

Configuring the time frame that vROps uses when calculating capacity and time remaining values

The challenge: Configure vROps so that it will perform capacity based calculations in a way that is less susceptible to historical events.

By default, vROps will use all available data when calculating the amount of capacity and time remaining for an object. In certain situations and scenarios it may be preferable to only use a certain time frame when calculating these values. Examples of such situations could be:

  • Upgrading the hardware in a cluster which increases the available capacity
  • Large scale provisioning or disposal of virtual machines that cause a significant impact on the amount of used capacity

The capacity and time remaining figures are inextricably linked. Each time capacity is calculated vROps will first work out the capacity remaining and from that the time remaining figures. When calculating these values vROps will use all available data and then try to smooth a curve across the full extent of the data range.

Take the following two images of CPU utilisation. Both charts represent real-world data. In the first the full data range is shown, revealing a number of events that occurred within this cluster: a hardware upgrade and a large-scale decommissioning of test VMs. The second is the same cluster, but the image has been cropped so that only 90 days’ worth of data is shown.

[Figure: full data range]
[Figure: 90-day data range]

The diagrams show quite clearly how it is easier to work out the underlying trend line of the data in the second image compared to the data in the first image.
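That difference can be reproduced with a toy least-squares fit (illustrative numbers and method only; vROps’ actual smoothing is more sophisticated than a straight line). A decommissioning-style event part-way through the history drags the full-range trend downwards, while a 90-day window sees only the steady growth that followed:

```python
# Toy example: linear trend fitted over different history windows.
# Days 0-59: a high plateau in slow decline (before a decommissioning event).
# Days 60-179: usage restarts lower and grows steadily.
capacity = 100.0
history = [80.0 - 0.1 * d for d in range(60)]
history += [40.0 + 0.25 * d for d in range(120)]

def days_remaining(usage, capacity):
    """Fit y = a + b*x by least squares; days until the line reaches capacity."""
    n = len(usage)
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    current = mean_y + slope * ((n - 1) - mean_x)  # fitted value at the last day
    return float("inf") if slope <= 0 else (capacity - current) / slope

full = days_remaining(history, capacity)          # trend skewed by the old event
recent = days_remaining(history[-90:], capacity)  # 90-day window only
```

With these numbers the full-range fit actually slopes downwards, so it projects no exhaustion at all, while the 90-day window projects roughly 121 days: the old events completely dominate the answer.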

It should also be noted that the underlying algorithms used are particularly susceptible to inaccuracy when the allocation model of capacity management is used, and the issues seen here do not necessarily apply when using either the demand based model or when the demand and allocation models are used in combination.

vROps also has a policy element called ‘Time Range’, which is set by default to 30 days. As per the description within the policy, this time range is only used for non-trend analytics and has no bearing on the analytical side of vROps, which both capacity and time remaining fall into.

As mentioned earlier, vROps will use all available data when calculating capacity and time remaining but there are a couple of additional gotchas that should be considered:

  1. The data retained may not match what is configured under ‘Time Series Data’ on the ‘Global Settings’ page. vROps will purge data beyond the value configured here, but not instantaneously: the removal of data is a relatively expensive operation, so the amount of data retained may run a few days or weeks beyond this value.
  2. vROps also rolls data up into coarser-grained values, but still uses these values when calculating capacity.
  3. Changing the time frame value will affect time remaining, trend views and projects.

 

To alter the behaviour of vROps so that it uses a specific time frame for capacity calculation a number of steps need to be performed; firstly to alter the time frame and secondly to alter how the rolled up data is used.

DATA TIME FRAME TO USE

Care should be taken when altering files on a vROps node. The file should be backed up prior to making changes.

SSH into every vROps node as the root user

Modify the following file: $ALIVE_BASE/user/conf/analytics/capacity.properties

Change:

historicDataRangeForTrend=-1

To:

historicDataRangeForTrend=90

The number shown here is for 90 days; enter the appropriate number of days that you wish to use.

Save the file and exit the editor

Restart all vROps services by issuing the following command as root:

service vmware-vcops restart

Once vROps analytics has finished starting up, trend views and projects will be affected immediately. The time remaining values will only be affected after the next capacity calculation.
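On a large multi-node deployment that edit is worth scripting rather than repeating by hand in an editor on every node. A minimal sketch of the edit itself (the path and key are exactly those from the steps above; everything else is illustrative, and the file should still be backed up first):

```python
# Rewrite historicDataRangeForTrend in a capacity.properties-style file,
# leaving all other lines untouched. Back the file up first, as noted above.

def set_property(text, key, value):
    """Return properties-file text with key=value set (appended if missing)."""
    out = []
    found = False
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"

# e.g. applied to the contents of
# $ALIVE_BASE/user/conf/analytics/capacity.properties:
updated = set_property("historicDataRangeForTrend=-1\n",
                       "historicDataRangeForTrend", 90)
```

The restart of the vmware-vcops services is still required afterwards, exactly as in the manual steps.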

ROLLED UP DATA

The version of vROps in use determines how to alter this behaviour. In vROps 6.6 the configuration is directly exposed in the UI, but in prior versions a configuration change needs to be made on each vROps node.

Modification in 6.6

Alter the global setting ‘Additional Time Series’ from its default of 36 months to 0 months.

Modification prior to 6.6

Care should be taken when altering files on a vROps node. The file should be backed up prior to making changes.

SSH into every vROps node as the root user

Modify the following file: $ALIVE_BASE/user/conf/fsdb/rollup.properties

Change:

rollupReadEnabled=true

To:

rollupReadEnabled=false

Save the file and exit the editor

Restart all vROps services by issuing the following command as root:

service vmware-vcops restart

I’d like to credit my colleague James Ang, who spent many hours working with me on this and provided a lot of very helpful information and insight into the inner workings of the vROps capacity analysis engine.

Parsing of XML with vRealize Orchestrator

Parsing XML is fairly straightforward with vRO; however, there is one major gotcha you should be aware of: namespaces.

Depending on the XML content that vRO is trying to parse, it may not return any values if certain namespaces are used. One way around this is to do a string.replace on the XML prior to parsing it.

An example of how to parse XML is shown below:

The code below is based on the following XML:

<?xml version="1.0" encoding="UTF-8"?>

<CreateITDCIResponse xmlns="http://www.ibm.com/maximo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" creationDateTime="2016-06-22T13:07:11+00:00" transLanguage="EN" baseLanguage="EN" messageID="1466600831635951728" maximoVersion="7 1 20110105-1024 V7118-37"><ITDCISet>
<CI>
<CHANGEBY>Someuser</CHANGEBY>
<CHANGEDATE>2016-06-22T13:07:11+00:00</CHANGEDATE>
<CIID>123456</CIID>
<CINAME>CINAMETEST</CINAME>
<CINUM>CITEST</CINUM>
<ITDCIMANAGEDBY>TBD</ITDCIMANAGEDBY>
<ITDCIOPERATINGENV>TBD</ITDCIOPERATINGENV>
<ITDSERVCONTPLAN>0</ITDSERVCONTPLAN>
<STATUS>VERIFY</STATUS>
<STATUSDATE>2016-06-22T13:07:11+00:00</STATUSDATE>
</CI>
</ITDCISet>
</CreateITDCIResponse>

And this is the vRO code used to extract a specific field, in this case the CIID

// Strip out the namespace definition as vRO doesn't handle the definitions very well
var modifiedXMLString = xmlString.replace('xmlns="http://www.ibm.com/maximo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"', "");

var parsedXML = new XML(modifiedXMLString);

// Check to see whether we have any content
if (!parsedXML) {
    throw "Invalid XML document";
}

// Validate that only 1 CI has been returned
if (parsedXML.ITDCISet.CI.length() != 1) {
    throw "Invalid number of CIs returned";
} else {
    for each (var CI in parsedXML.ITDCISet.CI) {
        System.log("The CI number is: " + CI.CIID);
    }
}