This post is a hodgepodge list of what I think of as the IT fundamentals. These are activities tangential to our regular analytics, big data and data science work, but they build a foundation that nudges the odds of project success onto our side. Getting these items right will not garner much praise, but without them the project foundation has cracks, and eventually those cracks will cause real damage to the project and our reputation.
There are three scenarios in which even the most seasoned pro can get a bit sloppy. First, when wrapped up in the excitement of a new project, especially one with new technology, where there are few established methodologies and frameworks. Second, near the end of a project: when under pressure to deliver, no matter how unrealistic the timeline, there is a temptation to play project hero and take risks in order to finish on time and under budget. Third, and the most common source of carelessness, old-fashioned procrastination on boring, low-profile tasks.
Case in point: At a client site where I was recently consulting, the server hosting the QA data warehouse database died suddenly. Fortunately, the database team was very diligent about making regular backups. No problem, we thought, we will just spin up a new server and restore the database. Unfortunately, the database team was less enamored with testing the restore process, and we learned that restoring a backup in this environment had never been tested. As per Murphy’s law, it went wrong and the backup could not be restored. The database had to be rebuilt from scratch and data copied over from the production environment. That was time consuming enough, and then things got worse. When the QA data warehouse was back up and running with data, the ETL was producing numbers that differed from those before the crash. More troubleshooting followed, and even more people got involved. Eventually the source of the problem was found, but it took developers away from new work, the business users were left waiting and the reputation of the entire project team was tarnished.
What should have been a minor hiccup turned into a multi-week delay. A long running project with a large team can absorb this type of mishap but it is devastating to the schedule of a small project team trying to deliver on a short timeline.
This leads to number one on my list of IT fundamentals for success:
Backup and restore
While this is likely to be the responsibility of another team, it behooves us to ensure that regular backups are done and that the restore process is reliable and tested. Bringing back a dead server should take a few hours at the most and not become a project unto itself.
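The restore test itself is easy to automate. Here is a minimal Python sketch of the pattern: restore the backup into a scratch location, then verify the restored copy against the original. The `verify_restore` function and the checksum comparison are illustrative only, and `shutil.copy` stands in for whatever real restore command your environment uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path):
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_restore(backup_file, scratch_dir):
    """Restore a backup into a scratch directory and confirm the restored
    copy matches the original byte for byte. shutil.copy is a stand-in for
    the real restore command (pg_restore, RMAN, and so on)."""
    restored = Path(scratch_dir) / Path(backup_file).name
    shutil.copy(backup_file, restored)
    return checksum(backup_file) == checksum(restored)


with tempfile.TemporaryDirectory() as scratch:
    backup = Path(scratch) / "warehouse.bak"
    backup.write_bytes(b"nightly backup contents")
    print(verify_restore(backup, tempfile.mkdtemp()))
```

Run on a schedule, a check like this turns "we think the backups work" into "the restore was verified last night."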
Have a glue language handy
Knowing a polyvalent scripting language has served me well on countless occasions. Some uses are obvious, like scheduling jobs to run overnight and automating tests. Others are more obscure. For example, I remember a complex report with hundreds of data elements that I was tasked with developing. Rather than using the report builder UI, I created the report in XML format using data exports to CSV files, some text file manipulation and XML transformations with Perl. I saved many hours, if not days, of the tedious and error-prone dragging and dropping that would have been required to develop the same report in the report development UI.
Python is my language of choice these days, although in the past I have relied on Ruby, Perl, Unix shell scripts, Sed/Awk and Windows PowerShell.
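To make the idea concrete, here is a small Python sketch in the same spirit as that Perl trick: generating report XML from a CSV export with a few lines of glue code. The `<report>`/`<item>` schema and the column names are made up for illustration, not any real report format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_report_xml(csv_text):
    """Turn a CSV export into a simple report XML definition,
    one <item> element per data element (hypothetical schema)."""
    report = ET.Element("report")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(report, "item", name=row["name"])
        item.text = row["expression"]
    return ET.tostring(report, encoding="unicode")


sample = "name,expression\nRevenue,[Sales].[Revenue]\nUnits,[Sales].[Units]\n"
print(csv_to_report_xml(sample))
```

With hundreds of data elements, a loop like this beats hundreds of drag-and-drop operations, and the generated file can go straight into source control.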
Source Code Control
My rule of thumb is that any file that gets modified more than once needs to be in a source control system. You just never know when you have to go back a few steps or the client changes direction on a previously agreed upon modification. It also allows you to recover from a mistake or to see how things were working before a change. This applies not only to development work, but also to configuration files and documentation. SVN and Git are free and popular. Google Docs and other cloud document management systems do versioning reasonably well for documents and spreadsheets.
Adequate disk space
Even in 2015, adequate disk space is problematic. Again, this is usually not our responsibility, but it is our team that deals with the consequences. When this comes up on a project I often make an awkward joke about solving the problem by driving to Costco and buying a pallet of portable multi-terabyte drives. Obviously that is not a real solution, but my point is this: if I can easily buy hundreds of terabytes of storage for my personal use and have access to it within a couple of hours (depending on how close you live to a Costco and how quickly you can remove the packaging), then there is no excuse for lacking disk space inside a company big enough to run into this issue in the first place. Enterprise-grade hard disks are cheaper than ever, virtualization technology allows storage adjustments on the fly, SAN technology is mature and robust, and there is a myriad of cloud storage options available.
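Even when provisioning is out of our hands, watching for the problem costs almost nothing. A few lines of Python can check free space and complain before a load job dies mid-write; the 50 GB threshold below is an arbitrary example, not a recommendation.

```python
import shutil


def check_free_space(path="/", min_free_gb=50):
    """Report free space on the given mount and whether it is above a
    threshold (the threshold value is illustrative)."""
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1e9
    return free_gb, free_gb >= min_free_gb


free_gb, ok = check_free_space("/", min_free_gb=50)
print(f"{free_gb:.1f} GB free -> {'OK' if ok else 'LOW DISK SPACE'}")
```

Wire the warning into email or chat and the ETL team hears about shrinking headroom before the overnight load does.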
Have a production support plan
There needs to be a support plan with an escalation path for anything that goes into production. When a business user calls or emails with an issue, there should be a process in place to deal with it. Clear lines of responsibility need to be established in advance so that nothing falls through the cracks or gets stuck in support purgatory. This is especially critical for issues that straddle technical boundaries. In particular for BI and analytics projects, anything newly released needs a joint support team made up of, at a minimum, a business analyst, who knows the business processes and the data, and a technical representative, who is empowered to make modifications quickly. A successful project from a technical perspective can become an “IT sucks again” story in the eyes of the business (who are ultimately footing the bill) if the users cannot get timely and effective help.
Keep Business Requirements Real
My guideline for business requirements is a quotation from Albert Einstein:
“Everything should be made as simple as possible, but not simpler.”
I encourage teams to minimize the time spent documenting detailed reporting or dashboard functionality unless there is a specific reason for doing so. Elaborate report mockups are needless distractions (again, unless there is a specific reason to present the data in a specific format). The focus of the business requirements should be on the questions that need to be answered by the data, who needs these answers and when. Be clear about what business process will be improved by timely data. What are the positive outcomes for the business?
Here is a simple example: the sales management team needs to know every morning how many deals closed the day before, which deals are likely to close in the next ten days, the running total of deals and revenue for the quarter and where each sales team is in comparison to the same date last year. The managers need to see this in a format that can be viewed on a smartphone or printed off from a laptop. This report will be used by the sales managers to prioritize, on a daily basis, which sales reps and customers to spend their time with, and for discussion about sales strategies in the weekly sales management meeting. The target outcomes are an 8% increase in sales and eliminating the time wasted haggling in sales management meetings over who has the “right” numbers.
Break out of the either/or IT paradigm
This last item is more philosophical than practical. The problem with enterprise IT has always been how to balance keeping up with business needs (making changes as often as required) with providing a stable, secure and reliable environment (by making as few changes as possible).
This manifests itself in complaints from the business that making changes to the data warehouse is too slow and the business intelligence reports get out of date too quickly to be useful. In retaliation, the IT side will complain that business people keep changing their minds but everything has to be delivered immediately. In the BI, big data and analytics world, we are balancing two opposing objectives: building a solid data architecture and delivering timely results to the business.
Trying to deliver too quickly results in data quality problems, technical debt and systems that nobody trusts or wants to work with. Being too slow for the business puts the data warehouse on the road to obsolescence as the business turns to self-service data discovery tools and Excel (which can lead to another set of problems).
It is time to extinguish either/or thinking on this issue. In business management terms the solution is known as “both… and” thinking (Jim Collins) or the “third alternative” (Stephen Covey). Enterprise IT projects must incorporate the lessons from agile, scrum, lean development and continuous delivery. This is how fast-moving Silicon Valley startups get software out the door so quickly. None of these methodologies is a panacea, but some of the practices can be adopted inside an enterprise IT environment. For example, technical teams should work in shorter “sprints” that produce discrete chunks of work that can be released immediately for business user acceptance testing.
Most of these ideas find their inspiration in the book, The Pragmatic Programmer, which I recommend to anyone working in IT. The software tools and the buzzwords change but these ideas transcend technology and build a foundation for professionalism in our industry. As business intelligence, big data and data science professionals, it is our responsibility to take a step back from our daily data work and be sure that the IT fundamentals are in place.
Thanks for reading. Let me know in the comments or on Twitter if there is anything else that you would add to this list.
David Currie has been helping businesses get the most out of Cognos Business Intelligence software since 1999, first as a Cognos employee and since 2008 as an independent consultant. He develops the solution architecture to satisfy complex business reporting and analytics requirements, sourcing data from operational databases, data warehouses and now big data repositories. He blogs about business intelligence and big data at davidpcurrie.com. Connect with him through the blog, LinkedIn or Twitter.