DevOps The Hard Way

Michael David Cobb Bowen
Abstract: A story-driven explanation of DevOps as the hard coordination work needed to diagnose and fix cross-system enterprise delivery failures.; Generative answer: The post argues that DevOps is not just startup tooling but the discipline of owning the full delivery chain across databases, middleware, networks, support teams, and business priorities.; Search intent: Understand why DevOps matters for enterprise applications with database, middleware, network, and operations dependencies.; Specific topics: enterprise DevOps, cross-team incident diagnosis, middleware timeouts, database and network dependencies; About: Product delivery, Platform modernization, Heritage systems; OmniArcs journey: Platform Journey, Delivery & Product Engineering; Source categories: DevOps, AWS, Enterprise Technology, Enterprise Mobility, Project Management; Audience: technical decision makers, AI leaders, platform leaders, data leaders, and product engineering teams.

I’m going to be doing more management and ‘glue’ business the next year or so. Part of this business is selling and personifying the value of DevOps. Like Cloud, this is something that is insufficiently understood at a deep(er) and nuanced level. So, as is customary, I’m going to tell a story.

The story, like most of mine, comes from an experience that burned. Something that left scars and had me wondering how people could get into this mess. And so, like the man says, share your scars. The situation was that I was on the very cutting edge of what I could do with Essbase + Essbase Studio. The requirement for drill-through was fairly obvious. We all knew the limits of how much data we could squirrel into a multidimensional cube. So I used Essbase Studio to map back to the Oracle DB and bring back some records. Now it turns out that my customers wanted something on the order of 10,000 records in this detail. Well, that doesn’t seem like much. You could grab 10,000 records across a dozen columns and cut and paste them from one Excel spreadsheet to another, right? That should only take a few seconds. Not the drill-through. That data had to come over the network. Well, you could copy a spreadsheet with a dozen MB of data from a network drive to your desktop, right? That should only take a minute. Not the drill-through. We had to fulfill a query request from a database.

My queries were taking 7 minutes and 30 seconds and then dying.

I had to find out why. Thus began my painful birth into DevOps. The first thing I had to learn was the difference between a view and a materialized view. Well, that wasn’t so difficult to learn. But I had always assumed that my DBA was materializing data for me. Well, he didn’t have enough disk space to do that for ad-hoc queries. So that meant I had to learn the procedure for requesting new disk space from the DBAs. How much did I need? I don’t know. A terabyte? Impossible! Impossible? I can go to Best Buy and get a terabyte. Yeah, but one live terabyte means four other terabytes according to our backup and DR, and we’re at the limit of the current server, which means we’d have to get a SAN device and… well, how about an NFS drive? Nope. Can’t have an NFS drive; that would slow down everything. I need local storage. Well, we’ll get back to you. But how can you be sure that the database is the bottleneck? I don’t know.
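For readers who have not run into the distinction, here is a minimal sketch of why it mattered. It is an illustration only: SQLite has no materialized views, so a table built with `CREATE TABLE ... AS SELECT` stands in for one, and the `ledger` table, its columns, and its values are all made up for the example. Nothing here comes from the actual Oracle schema in the story.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (account TEXT, amount REAL);
    INSERT INTO ledger VALUES ('travel', 100.0), ('travel', 50.0);

    -- A plain view stores only the query text; it is re-run on every read.
    CREATE VIEW totals_view AS
        SELECT account, SUM(amount) AS total FROM ledger GROUP BY account;

    -- Stand-in for a materialized view: the result rows are stored on disk,
    -- which is fast to read but costs space and goes stale until refreshed.
    CREATE TABLE totals_snapshot AS
        SELECT account, SUM(amount) AS total FROM ledger GROUP BY account;
""")

# New data arrives after the snapshot was taken.
conn.execute("INSERT INTO ledger VALUES ('travel', 25.0)")

view_total = conn.execute("SELECT total FROM totals_view").fetchone()[0]
snapshot_total = conn.execute("SELECT total FROM totals_snapshot").fetchone()[0]

print(view_total, snapshot_total)  # 175.0 150.0
```

The view pays the query cost on every read. The stored copy pays in disk space instead, which is exactly the resource the DBA did not have.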

I had to find out where. What is timing out? Was it the Excel add-in? No. Was it the Java middleware? Maybe. Who knows how to read the profile of the Java middleware? Well, there’s no documentation for that; you’ll have to call the engineers at Oracle. OK. Open up a service request and get an appointment. Who has access to the middle tier? Get access to the middle tier so you can log on. Oh, by the way, the one support engineer is in Mumbai. That means you stay late, past 7pm Pacific time, to get your answers when he’s available. OK, change the profile, add in this line for the timeout. That didn’t work? Oh, you have to get the latest patch. Will it work with the version of Essbase Studio we’re running here? Oh snap, we’re going to have to burn a new version for you, but you’re going to have to upgrade your Java app server. OK, now the explicit timeout is 15 minutes.

Still times out.

I had to find out how. What is the mechanism that creates the timeout? Get this new tool called Fiddler; it will help you debug the HTTP stream. Debugging HTTP streams? Well, maybe it’s the size of the download that’s stopping things. OK, did that. It’s not the size. Well, the corporate standard timeout is 10 minutes. What corporate standard? The corporate standard on the firewalls between the users and the data center. Well, can we get an exception? Maybe.

So it basically took six weeks for me to deal with the various network engineers, database admins, support staff and their management, and to prod them all to buy what I was trying to sell, which was the viability of this entire project. My only leverage was that I was consistently riding herd on the problem and I was a very expensive third-party contractor. So the project was late, and the mounting difficulty of justifying business as usual in the various departments was the only thing that motivated people to go to extraordinary lengths to solve the problem. Everybody wanted the problem to be somebody else’s problem. And until we found out exactly what the problem was, everyone kept pointing fingers, right up to the last possible minute. It turned out to be a default in one of the load balancers, which everyone assumed was set to 10 minutes but which communicated 7.5 minutes as an override to the other. Those machines required firmware upgrades as well.
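The lesson of the hunt can be put in a few lines: a request survives only as long as the shortest timeout anywhere on its path, so raising one setting does nothing if a tighter one sits elsewhere. The sketch below is a rough model, not a reconstruction of the real configuration. The `Hop` class and `limiting_hop` function are hypothetical names, and the hop list uses only the three figures mentioned in the story.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One component on the request path and its idle timeout in seconds."""
    name: str
    timeout_s: float


def limiting_hop(path):
    """Return the hop with the smallest timeout; it cuts the request first."""
    return min(path, key=lambda hop: hop.timeout_s)


# Figures taken from the story; the path itself is a simplification.
path = [
    Hop("Java middleware (explicit setting)", 15 * 60),
    Hop("firewall (corporate standard)", 10 * 60),
    Hop("load balancer (override)", 7.5 * 60),
]

limiting = limiting_hop(path)
print(f"{limiting.name}: {limiting.timeout_s:.0f}s")
# load balancer (override): 450s
```

Four hundred fifty seconds is the 7 minutes and 30 seconds at which the queries kept dying. The arithmetic is trivial. The six weeks went into discovering the values, each of which was owned by a different team.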

I have been accustomed, throughout my entire career in BI, to being responsible for the entire data supply chain. That I could do. But middle-tier service configurations, firewall settings and DR disk availability were all above my pay grade. I was not paid to know and I was too expensive to be paid to learn. In that way, I’m accustomed to being like the wily developer whose time is too valuable to waste learning these operational details. At the same time, I was equally demanding of all those dependencies. Give me more memory on the app server! Open up the damned ports I want! Get more disk, you lummox! Of course, let me not forget the memory constraints on the end-user machines.

All of this was a terrestrial implementation and it had other setbacks too, but it was a fascinating six-month engagement. I of course learned a lot about these other systems with respect to how they affected my entire piece of the data warehousing applications. I sensed that I had the capacity to understand, but I’d never remember unless I had some responsibility and permission to make changes. That would be impossible without the cloud. But even when I had the cloud, it took more than just having control of the associated systems; it took really understanding how they worked. That’s a story for another day. What was clear was that it was very difficult to manage all of the departmental areas, and to get the priority within those departments (at their various locations) to solve a showstopper problem in this one application. It was 2011 and we were testing the very limits of the IT capabilities of a global corporation. DevOps might be a cool thing to talk about with web startups, i.e. a DevOps engineer would be cool for your website, but I saw the fundamental management problem that had everything to do with the way multimillion-dollar Enterprise applications were built and maintained, and essentially why they were one-shot deals.
