
Gun Barrel Diagram: Calculate and Visualize Well Spacing Part 1


The Gun Barrel workflow allows the user to quickly find the 3D distances between the midpoints of the lateral section of selected nearby horizontal wells. This critically important information was once only possible to calculate using specialized geoscience software or through painstaking and time-consuming manual work. With the integrations, we can now calculate this information directly from Spotfire:

Figure 1: Gun Barrel View
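As an illustration of the underlying math, the sketch below approximates each lateral's midpoint from its survey stations and computes the 3D distance between two wells. It is a simplified stand-in for the full workflow (real surveys have many stations, and map projections must be accounted for); all names and numbers here are hypothetical.

```python
import math

def lateral_midpoint(stations):
    """Approximate the midpoint of a lateral as the average of its
    survey stations, each given as (easting, northing, tvd) in feet."""
    n = len(stations)
    return tuple(sum(s[i] for s in stations) / n for i in range(3))

def spacing_3d(well_a, well_b):
    """3D distance between the lateral midpoints of two wells."""
    ax, ay, az = lateral_midpoint(well_a)
    bx, by, bz = lateral_midpoint(well_b)
    return math.sqrt((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)

# Two parallel laterals, 660 ft apart horizontally and 100 ft apart vertically
parent = [(0, 0, 9000), (5000, 0, 9000)]
child = [(0, 660, 9100), (5000, 660, 9100)]
print(round(spacing_3d(parent, child), 1))  # → 667.5
```

Averaging stations is a rough midpoint; a production implementation would interpolate along measured depth to find the true center of the lateral section.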


From Science Pads to Every Pad: Diagnostic Measurements Cross the Chasm

My energy finance career began in late 2014. As commodity prices fell from $100 to $35, I had a front row seat to the devastation. To quote Berkshire Hathaway’s 2001 letter, “you only find out who is swimming naked when the tide goes out.” It turned out that many unconventional operators and service businesses didn’t own bathing suits. My job was to identify fundamentally strong businesses that needed additional capital … not to survive, but to thrive in the new “low tide” environment. To follow Buffett’s analogy, I was buying flippers and goggles for the modest swimmers.

Following 18 months of carnage, commodity prices began to improve in 2016. Surviving operators had been forced to rethink their business models: pivoting from “frac ‘n’ flip” to “hydrocarbon manufacturing”. Over the following two years, I familiarized myself with hundreds of service companies and their operator customers. The entire industry was chasing two seemingly conflicting objectives: 1) creating wells that are more productive, and 2) creating wells that are less expensive. This is the Shale Operator Dual Mandate: make wells better while making them cheaper.

Shale Operator Dual Mandate

As an oil service investor, I was uniquely focused on how each company I met helped their customers accomplish the Shale Operator Dual Mandate. Services that achieved one goal would likely survive, but not thrive. However, those that helped customers meet both goals were sure to be the winners in the “new normal” price environment of $50 oil.

Transformations happened quickly throughout the OFS value chain. Zipper fracs drastically improved surface efficiencies, ultralong laterals were drilled further than ever before, in-basin sand mines appeared overnight, and new measurements came to the fore. These new measurements deliver incredible insights: fracture half length, well productivity by zone, vertical frac growth, optimal perforation placement, and much more.

In Basin Sand

New Measurements

While zipper fracs, long laterals, and local sand have taken over their respective markets, new measurements have struggled to gain traction outside of “science pads”. Frustrated technical service providers bemoan the resistance to change and slow pace of adoption in our industry. These obstacles failed to slow the advance of zipper fracs, long laterals, and local sand … why have disruptive new measurement technologies been on the outside looking in?

Challenge #1: Unclear Economics

The first challenge for new measurements is unclear economics. Despite the recent improvements, unconventional development remains cash flow negative … and has been since its inception.

The above data suggests ~ $400B of cash burn since 2001 … small wonder operators are wary of unproven returns on investment! (Note: to be fair, operators were incentivized by capital markets to outspend cash flow for the great majority of this period. Only recently has Wall Street evolved its thinking to contemplate cash on cash returns, as opposed to NAVs).

For any technology to become mainstream, it must either immediately lower costs (e.g. zipper fracs, local sand) or have obvious paybacks (e.g. long laterals). New measurements, by contrast, do not clearly map to economic returns. Instead, these service providers tend to focus on “interesting” engineering data and operational case studies. Operators will not put a technology into wide use until its economic impact is fully understood. This can mean waiting months for offset wells to come online or years for neighboring operators to release results.

Challenge #2: Changing How Customers Work

The second challenge, which is just as important to end users, is that service providers must deliver insights within a customer’s existing workflow. Operators are busier than ever before. E&P companies have experienced waves of layoffs, leaving those remaining to perform tasks previously done by now-departed colleagues.

In addition, many service providers don’t appreciate the opportunity cost of elongating an existing customer workflow to incorporate new variables. A smaller staff is already being asked to perform more work per person; it should be no surprise that customers are hesitant to spend budget dollars on services that create even more work for each individual.

Challenge #3: No Silver Bullets

While each new diagnostic data type is an important piece of the subsurface puzzle, no single element can complete the picture on its own. Instead, each measurement should be contextualized alongside the others. For example, fiber optic measurements can be viewed alongside tracer data to better determine which stages are contributing the most to production. When each diagnostic data source is delivered in a different medium, it becomes nearly impossible to overlay these measurements into a single view.

The Oxbow Theory

The combination of the above factors leads to the “Oxbow Theory” of new measurement abandonment. As rivers age, certain sections meander off course. Over time, sediment is redistributed around the meander, further enhancing the river’s bend. Eventually, the force of the river overwhelms the small remaining ‘meander neck’, and an oxbow lake is created. Sediment deposited by the (now straight) river prevents the oxbow lake from ever rejoining the river’s flow. By the same token, new measurement techniques that do not cater to existing workflows may be trialed but will not gain full adoption. Instead, they become oxbow lakes: abandoned to the side of further-entrenched workflows.

Our Solution: is the only analytics platform designed for oil and gas. If you’re an operator, we can help make sense of the tsunami of data delivered by a fragmented universe of service providers. If you’re a service company, we can help deliver your digital answer product in a format readily usable by your customers. Please reach out to to learn more.


Extracting and Utilizing Valuable Rock Property Data from Drill Cuttings

Why the Fuss About Rock?

The relentless improvement in computing speed and power, furthered by the advent of essentially unlimited cloud-based computing, has allowed the upstream oil and gas industry to construct and run complex, multi-disciplinary simulations. We can now simulate entire fields – if not basins – at high resolution, incorporating multiple wells within highly heterogeneous structural and lithological environments. Computing power is no longer the limitation; we’re constrained by our input data.

Massive volumes of three-dimensional seismic data and petrophysical logs can be interpreted and correlated to produce detailed geological models. However, populating those models with representative mineralogical, geomechanical, and flow properties relies on interpolation between infrequent physical measurements and well logs. This data scarcity introduces significant uncertainty into the model, making it harder to history match with actual well performance and reducing predictive usefulness.
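To make the interpolation step concrete, here is a minimal sketch using inverse-distance weighting, one of the simplest schemes; production models typically use kriging or other geostatistical methods. The sample values and locations are hypothetical.

```python
def idw(sample_points, target, power=2):
    """Inverse-distance-weighted estimate of a rock property at `target`
    from sparse measured samples [(x, y, value), ...]. A stand-in for the
    geostatistical interpolation used to populate model cells."""
    num = den = 0.0
    for x, y, value in sample_points:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return value  # target coincides with a measurement
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den

# Porosity measured at three cored wells; estimate at an uncored location
cores = [(0, 0, 0.08), (10, 0, 0.06), (0, 10, 0.10)]
print(round(idw(cores, (5, 5)), 4))
```

The fewer and farther apart the measured samples, the more the estimate smooths over real heterogeneity, which is exactly the uncertainty that denser sampling (such as cuttings data, discussed below) helps reduce.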

Physical measurements – to which wireline logs are calibrated and from which critical correlation coefficients are determined – are typically made on samples taken from whole cores. Some measurements can be performed on side wall cores collected after the well has been drilled using a wireline tool. However, intact side wall cores suitable for making rock property measurements aren’t always retrieved, and the sampling depth is relatively inaccurate compared to the precise drilling of test plugs from a whole core at surface.

The drilling, retrieval, and analysis of a full core adds significantly to well construction time and can cost anywhere from $0.5 to $2.0 million. In today’s cash-constrained world, that’s a tough ask. Core analysis has become faster and cheaper, but cores are still only collected from less than one percent of wells drilled. This produces a very sparse data set of rock property measurements across an area of interest.

Overlooked Rock Samples

Drill cuttings are an often-overlooked source of high-density rock samples. Although most wells are mud logged for operational geology purposes, such as picking casing points or landing zones, the cuttings samples are frequently discarded without performing any further analysis.

Geochemical analysis at the wellsite is sometimes used to assist with directional drilling. However, detailed characterization typically requires transporting the samples to a central laboratory where they can be properly inspected, separated from extraneous material, and processed to ensure a consistent and representative measurement.

At Premier, we encourage our clients to secure and archive cuttings samples from every well they drill. Even if the cuttings won’t be analyzed right away, they represent a rich, dense sample set from which lateral heterogeneity can be measured and used to fill in the gaps between wells where whole core has been cut and evaluated.

Figure 1: The images above show how rock properties can be collected, analyzed, and visualized from cuttings.

Rapid, high-resolution XRF measurements and state-of-the-art x-ray diffraction (XRD) can be used to match mineralogical signatures from cuttings to chemofacies observed in offset cores.

Information from cuttings samples expedited from the wellsite for fast-turnaround analysis can be used to optimize completion and stimulation of the well being drilled. For example, geomechanical properties correlated with the chemofacies identified along a horizontal well can be used to adjust stimulation stage boundaries. The objective is for every fracture initiation point within a stage to encounter rock with similar geomechanical characteristics, increasing the probability of successful fracture initiation at each point.
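One way to picture the stage-boundary adjustment is as a segmentation problem along measured depth. The sketch below uses a naive greedy rule on a single property; an actual workflow would cluster full chemofacies signatures, and the threshold and log values here are hypothetical.

```python
def segment_stages(md_values, threshold=5.0):
    """Greedy segmentation sketch: walk down the lateral and start a new
    stage whenever a point's property (e.g. a brittleness proxy derived
    from cuttings chemofacies) drifts more than `threshold` from the
    running stage mean. `md_values` is a list of (measured_depth, value)."""
    stages = []
    current = [md_values[0]]
    for md, val in md_values[1:]:
        mean = sum(v for _, v in current) / len(current)
        if abs(val - mean) > threshold:
            stages.append(current)
            current = []
        current.append((md, val))
    stages.append(current)
    return stages

log = [(10000, 40), (10100, 42), (10200, 41),
       (10300, 55), (10400, 57), (10500, 56)]
print(len(segment_stages(log)))  # 2 stages: similar rock grouped together
```

Grouping similar rock into the same stage is what raises the probability that every fracture initiation point within a stage behaves consistently.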

On a slower timeline, a more detailed suite of laboratory measurements can be completed, providing information about porosity, pore structure, permeability, thermal maturity, and hydrocarbon content.

The Cuttings Motherlode

In late 2017, Premier Oilfield Group acquired the Midland International Sample Library – now renamed the Premier Sample Library (PSL) – which houses an incredible collection of drill cuttings and core samples dating back to the 1940s. The compelling story of how we saved it from imminent demise is the subject of another article. Suffice it to say the premises needed some TLC, but the collection itself is in remarkable condition.

Figure 2: Original sample library at the time it was acquired by Premier.

The library is home to over fifty million samples from an estimated two hundred thousand wells – most of them onshore in the United States, and many within areas of contemporary interest like the Permian Basin. It gives us the ability to produce high-density rock property datasets that can be used to reduce the uncertainty in all manner of subsurface models and simulations.

New samples are donated to the library each week, many of them from horizontal wells. This provides invaluable insight into lateral facies changes and reservoir heterogeneity. Instead of relying only on the sparse core measurements discussed earlier, our 3D reservoir models can now be populated with superior geostatistical distributions conditioned with data from hundreds of sets of horizontal well cuttings. In an ideal world, cuttings samples from every new well drilled would be contributed to the collection, preserving that information for future generations of explorers and developers.

We have already helped many clients study previously overlooked intervals by reaching back in time to test samples from wells that passed through those intervals on their way to historically more prolific horizons. Thanks to their predecessors’ rock squirreling habits, these clients have access to an otherwise unobtainable set of data.

Our processing team works around the clock to prepare and characterize library samples, adding more than 2,000 consistently generated data points to our database each day. Since it would take years to work through the entire collection, we have prioritized wells that will give us broad data coverage across basins of greatest industry interest. Over time, guided by our clients’ data needs, we will expand that coverage and create even higher-density data sets.

Share and Apply the Data

At Premier Oilfield Group, we believe that generating and sharing rock and fluid data is the key to making more efficient and more effective field development decisions. The Premier Sample Library is a prime example of that belief in action.

Following an intense program of scanning, digitization, well identification, and location, we are now able to produce a searchable, GIS-based index for a large part of the collection. Many of the remaining boxes are hand-labeled with the names of long-since-acquired operating companies and non-unique well names, but forensic work will continue in an attempt to match them with API-recognized locations.

We are excited to have just launched the first version of our datastak™ online platform. This will finally make the PSL collection visible to everyone. Visitors can see detailed, depth-based information on sample availability and any measurements that have already been performed. Subscribers gain access to data purchase and manipulation tools and, as the platform develops, we will add click-through functionality to display underlying test results and images.

Subscriptions for datastak™ cater to everyone from individual geologists to multi-national corporations. We want the data that’s generated to be as widely available and applicable as possible.

Rock properties available through datastak™ provide insights during several critical workflows. These often require integrating rock properties with other data sets, such as offset well information, pressure pumping data, and historical production. Many of the data types generated through cuttings analysis can already be brought into® and made readily available for engineers and geologists. This provides a rich data set, ready for analytics.

For example,® can be used to build machine learning models that tease apart the effects of completions design and reservoir quality on well performance. Premier and continue working collaboratively to identify additional data types and engineering workflows that will help ground advanced analytics with sound geologic properties.

Armed with a consistent, comprehensive set of rock property data, developers will be in a position to separate spurious correlation from important, geology-driven causality when seeking to understand what drives superior well performance in their area of interest. Whether that work is being carried out entirely by humans or with the assistance of data science algorithms, bringing in this additional information will enable more effective field development and increase economic hydrocarbon recovery.

For further information on Premier Oilfield Group, please visit


Demystifying Completions Data: Collecting and Organizing Data for Analytics (Part 3)

As mentioned in my previous post, to really be of value, we need to extend this analysis to future wells where we won’t have such a complete data set. We need to build multivariate models using common, “always” data – like pump curves, geologic maps, or bulk data. Our approach has been for engineers to build these models directly in Spotfire through a side panel we’ve added, then save them back to a central location so that they can be version controlled and accessed by anyone in the organization. Engineers can quickly iterate through a variety of models trained on our data set to review each model’s performance and sensitivity.

If we have access to historical information from previous wells, we can run our model on a variety of data sets to confirm its performance. These could be past wells that had micro seismic or where we knew there were issues with containment. Based on these diagnostics we can select a model to be applied by engineers on future developments. To make sure the model is applied appropriately, we can set fences on the variables based on the ranges in our training set. Because the models are built by your team – not a third-party vendor – your engineers know exactly what assumptions and uncertainties went into each model. This approach empowers them to explore their data and answer questions without the stigma of a black-box recommendation.

Figure 1: Your team builds the models in Petro.ai – not a third-party vendor – so you know exactly what assumptions and uncertainties went into it. This approach empowers you to explore your data in new ways and answer questions without the limitations of black-box recommendations.
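A minimal sketch of what such fences might look like in practice: record the range of each input variable seen during training, then flag predictions whose inputs fall outside those ranges. The variable names and values below are hypothetical.

```python
def build_fences(training_rows):
    """Record the min/max of each input variable seen during training."""
    fences = {}
    for row in training_rows:
        for name, value in row.items():
            lo, hi = fences.get(name, (value, value))
            fences[name] = (min(lo, value), max(hi, value))
    return fences

def inside_fences(row, fences):
    """Flag any variable outside the range the model was trained on."""
    return {name: fences[name][0] <= value <= fences[name][1]
            for name, value in row.items()}

train = [{"rate_bpm": 80, "lateral_ft": 7500},
         {"rate_bpm": 100, "lateral_ft": 10000}]
fences = build_fences(train)
print(inside_fences({"rate_bpm": 95, "lateral_ft": 12500}, fences))
# lateral_ft falls outside the training range, so the prediction is flagged
```

A flagged prediction is not necessarily wrong, but it tells the engineer the model is extrapolating rather than interpolating.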

However, in addition to fences, we need to make sure engineers understand how and when to apply the models correctly. I won’t go into this topic much, but the direction our industry is moving requires a basic level of statistics and data science understanding from all engineers and geologists. Because of this, we have incorporated training into our standard engagements.

Slightly different hypothesis

This example used a variety of data, but it only answers one question. It’s important to note that even slight variations in the question we ask can alter what data is needed. In our example, if instead of asking whether a specific frac design would stay within our selected interval we wanted to know whether the vertical fracture length changed over time, we would need a different data set. Since micro seismic is a snapshot in time, we wouldn’t know if the vertical frac stays open. A different data type would be needed to show these transient effects.

Data integration is often the biggest hurdle to analytics

We can start creating a map that ties the questions we want to answer back to the data they require. The point of the diagram shown here is not to demonstrate the exact mapping of questions to data types, but rather to illustrate how data integration quickly becomes a critical part of this story. This chart shows only a couple of the questions we may want to ask, and you can see how complicated the integration becomes. Not only are there additional questions, but new data types are constantly being added, and none of them adds value in isolation – there is no silver bullet, no one data type that will answer all our questions.

Figure 2: Data integration quickly becomes complicated based on the data types needed to build a robust model. There is no silver bullet. No single data type can answer all your questions.

With the pace of unconventional development, you probably don’t have time to build dedicated applications and processes for each question. You need a flexible framework for this analysis. Getting to an answer cannot take 6 or 12 months; by then the questions have changed and the answers are no longer relevant.

Wrap up

Bringing these data types together and analyzing them to gain cross-silo insights is critical in moving from science to scale. This is where we will find step changes in completions design and asset development that will lead to improving the capital efficiency of unconventionals. I focused on completions today, but the same story applies across the well lifecycle. Understanding what’s happening in artificial lift requires inputs from geology, drilling and completions. empowers asset teams to operationalize their data and start using it for analytics.

Three key takeaways:

  • Specific questions should dictate data collection requirements.
  • Data integration is key to extracting meaningful answers.
  • We need flexible tools that can operate at the speed of unconventionals.

I’m excited about the progress we’ve already made and the direction we’re going.


Demystifying Completions Data: Collecting and Organizing Data for Analytics (Part 2)

As promised, let’s now walk through a specific example to illustrate an approach to analytics that we’ve seen be very effective.

I’m going to focus more on the methodology and the tools used rather than the actual analysis. The development of stacked pay is critical to the Permian as well as other plays. Understanding containment and vertical frac propagation is key to developing these resources economically. We might want to ask if a given pumping design (pump rate, intensity, landing) will stay in the target interval or break into other, less desirable rock. There are some fundamental tradeoffs that we might want to explore. For example, we may break out of zone if we pump above a given rate. If we lower the pump rate and increase the duration of the job, we need to have some confidence that the additional pumping time (and the day rates that come with it) will yield better returns.

We can first build simulations for the frac and look at the effects of different completions designs. We can look at offset wells and historical data – though that could be challenging to piece together. We may ultimately want to validate the simulation and test different frac designs. We could do this by changing the pumping schedule at different stages along the lateral of multiple wells.

Data collection

With this specific question in mind, we need to determine what data to collect. The directional survey, the formation tops (from reference well logs) and the frac van data will all be needed. However, we will also want micro seismic to see where the frac goes. Since we want to understand why the frac is either contained or not we will also need the stress profile across the intervals of interest. These could be derived from logs but ideally measured from DFITs. We may also want to collect other data types that we think could be proxies to relate back to the stress profile, like bulk seismic or interpreted geologic maps.

These data types will be collected by different vendors, at different times, and delivered to the operator in a variety of formats. We have bulk data, time series data, data processed by vendors, and data interpreted by engineers and geologists. Meaningful conclusions cannot be derived from any one data type; only by integrating them can we start to see the mosaic.


Integrating the data means overcoming a series of challenges. We first need to decide where this data will live. Outlook does not make a good or sustainable data repository. Putting it all on a shared drive is not ideal, as the files are difficult to relate to one another. We could stand up a SQL database or bring all the data into an application and let it live there, but both have drawbacks. Our approach leverages which uses a NoSQL back end. This provides a highly scalable and performant environment for the variety of data we will need. Also, by not trapping the data in an application (in some proprietary format), it can easily be reused to answer other questions or by other people in the future.

Getting the data co-located is a start but there’s more work to be done before we can run analytics. Throwing everything into a data lake doesn’t get us to an answer, and it’s why we now have the term “data swamp”. A critical step is relating the data sets to each other. takes this raw data and transforms it using a standard, open data model and robust well alias system, all built from the ground up for O&G. For example, different pressure pumping vendors will have different names for common variables (maybe even different well names) that we need to reconcile. We use a well-centric data model that currently supports over 60 data types and exposes the data through an open API. also accounts for things like coordinate reference systems, time zones, and units. These are critical corrections to make since we want to be able to reuse as much of our work as possible in future analysis. Contrast this approach with the one-dataset, one-use-case approach, where you essentially rebuild the data source for every question you want to ask. We’ve seen the pitfalls of that approach as you quickly run into sustainability challenges around supporting these separate instances. At this point we have an analytics staging ground that we can actually use.
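To illustrate the kind of reconciliation involved, here is a toy sketch of mapping vendor column names onto a standard schema with unit conversion. The alias table, units, and conversion factors shown are illustrative, not the actual data model.

```python
# Hypothetical alias table; a real system maintains these per vendor.
COLUMN_ALIASES = {
    "treating pressure": "treating_pressure_psi",
    "tr press": "treating_pressure_psi",
    "slurry rate": "slurry_rate_bpm",
    "slry rt (bpm)": "slurry_rate_bpm",
}

UNIT_CONVERSIONS = {
    # (from_unit, to_unit): multiplier
    ("kpa", "psi"): 0.145038,
    ("m3/min", "bpm"): 6.28981,
}

def standardize(record, units):
    """Map one vendor record onto the standard schema, converting units."""
    out = {}
    for raw_name, value in record.items():
        std_name = COLUMN_ALIASES.get(raw_name.strip().lower())
        if std_name is None:
            continue  # unmapped column: route to a review queue in practice
        unit = units.get(raw_name)
        target = std_name.rsplit("_", 1)[-1]  # unit suffix of standard name
        if unit and unit != target:
            value = value * UNIT_CONVERSIONS[(unit, target)]
        out[std_name] = value
    return out

vendor_row = {"TR Press": 55000, "Slry Rt (bpm)": 14.0}
print(standardize(vendor_row, {"TR Press": "kpa"}))
```

The same idea extends to well aliases, time zones, and coordinate reference systems: normalize once, on ingestion, so every downstream analysis can trust the names and units.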

Interacting with and analyzing data

With the data integrated we need to decide how users are going to interact with the data. That could be through Matlab, Spotfire, Python, Excel, or PowerBI. Obviously, there are trade-offs here as well. Python and Matlab are very flexible but require a lot of user expertise. We need to consider not only the skill set of the people doing the analysis, but the skill set of those who may ultimately leverage the insights and workflows. Do only a small group of power users need to run this analysis, or do we want every completions engineer to be able to take these results and apply them to their wells? We see a big push for the latter, and so our approach has been to use a combination of custom web apps we’ve created along with O&G-specific Spotfire integrations. Spotfire is widespread in O&G and it’s great for workflows. We’ve added custom visualizations and calculations to Spotfire to aid in the analysis. For example, we can bring in the directional surveys, grids, and micro seismic points to see them in 3D.

Figure 4: enables a user-friendly interface, meeting engineers where they are already working through integrations with Spotfire and web apps.

We now have the data merged in an open, NoSQL back end, and have presented that processed data to end users through Spotfire where the data can be visualized and interrogated to answer our questions. We can get the well-well and well-top spacing. We can see the extent of vertical frac propagation from the micro seismic data. From here we can characterize the frac response at each stage to determine where we went out of zone. We’re building a 360 view of the reservoir to form a computational model that can be used to pull out insights.
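As a simple illustration of the well-top spacing calculation: given a lateral's survey stations and a formation top, the vertical standoff at each station shows where the well drifts away from the target. The station values and top depth below are hypothetical.

```python
def well_top_spacing(stations, top_tvd):
    """Vertical standoff from each lateral survey station to a formation
    top, given station (md, tvd) pairs and the top's TVD at the well.
    Positive values mean the lateral is below the top."""
    return [(md, tvd - top_tvd) for md, tvd in stations]

# Survey stations (measured depth, TVD) for a lateral drifting downward
stations = [(10000, 9050), (10500, 9065), (11000, 9080)]
print(well_top_spacing(stations, 9000))  # standoff grows toe-ward
```

Plotted alongside the micro seismic cloud, this standoff track makes it easy to see whether out-of-zone frac responses correlate with where the lateral sits relative to the top.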

In the third and final post of this series, we will continue this containment example and review how we can extend our analysis across an asset. We’ll also revisit the data integration challenges as we expand our approach to other questions we may want to ask while designing completions.



Real-time Production in using Raspberry Pi

One of the most pressing topics for data administrators is “what can I do with my real-time production data?”. With the advent of science pads and a move to digitization in the oilfield, streaming data has become one of the most valuable assets. But it can take some practice and getting used to.

I enjoyed tinkering around with the platform, and while we have simulated wells, it’s much more fun to have some real data. doesn’t own any wells, but when the office got cold brew, I saw the opportunity.

We would connect a Raspberry Pi with a sensor for temperature to the cold brew keg and pipe temperature readings directly into the database. The data would come in as “casing temperature” and we’d be able to watch our coffee machine in real-time using!

The Plan

The overall diagram would look like this:

The keg would be connected to the temperature sensor, which would pass real-time readings to the Raspberry Pi. The Pi would then shape the readings into the real-time schema and publish them to the REST API endpoint.

Build out

The first step was to acquire the Raspberry Pi. I picked up a relatively inexpensive one off Amazon and then separately purchased two temperature sensors by Adafruit. These read temperature and humidity, but for the moment we’d just use the former.

There’s enough information online to confirm that these would be compatible. After unpacking everything, I set up an Ubuntu image and booted it up.

The Script

The script was easy enough: the Adafruit sensor came with a code snippet, and for the endpoint it was a matter of picking the right collection to POST to.

[code language="python"]
import datetime
import time

import Adafruit_DHT

# post_live_production handles the POST to the REST endpoint; PETRO_URL
# (the API base URL) is assumed to be defined alongside it in functions.py.
from functions import post_live_production, PETRO_URL

freq_seconds = 3
wellId = 'COFFEE 001'
endpoint = 'RealTimeProduction'
pwi = '5ce813c9f384f2057c983601'

while True:
    # Try to grab a sensor reading. read_retry will retry up to 15 times
    # to get a sensor reading (waiting 2 seconds between each retry).
    humidity, temperature = Adafruit_DHT.read_retry(Adafruit_DHT.DHT22, 4)

    if temperature is not None:
        # Convert the Celsius reading to Fahrenheit.
        casingtemp = temperature * 9 / 5.0 + 32
    else:
        casingtemp = 0

    post_live_production(endpoint, pwi, 0, casingtemp, 0, 0, 0, 0, 0, 0, PETRO_URL)
    print(wellId + " tag sent to " + PETRO_URL + endpoint + " at "
          + datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    time.sleep(freq_seconds)
[/code]


Once connected, we were extremely pleased with the results. With the frequency of readings set to 3 seconds, we could watch the rising and falling of the temperature inside the keg. The well was affectionately named “COFFEE 001”.


Demystifying Completions Data: Collecting and Organizing Data for Analytics (Part 1)

The oil and gas industry collects a huge amount of data trying to better understand what’s happening in the subsurface. These observations and measurements come in a range of data types that must be pieced together to garner insights. In this blog series we’ll review some of these data types and discuss an approach to integrating data to better inform decision making processes.

Before getting into the data, it’s important to note why every company needs a data strategy. Capital efficiency is now the name of the game in unconventionals. Investors are pushing for free cash flow, not just year over year increases in production. The nearby slide is from one operator but virtually every investor deck has a slide like this one. There are positive trends that operators can show – price concessions from service providers, efficiency gains in drilling, completions, facilities and increases in lateral length. Despite these gains, as an industry, shale is still not profitable. How much further can operators push these trends? How will this chart be created next year? Single-silo efficiencies are gone, and the next step change will only come from an integrated approach where the data acquired across the well lifecycle can be unlocked to fuel cross-silo insights.

Figure 1: Virtually every investor deck has a figure like this one. There are positive trends that operators can show– price concessions from service providers, efficiency gains in drilling, completions, facilities and increases in lateral length. Despite these gains, as an industry, shale is still not profitable. How much further can operators push these trends? How will this chart be created next year?

This is especially true in completions, which represent roughly 60% of well costs and touch so many domains. What does completions optimization mean? It’s a common phrase that gets thrown around a lot. Let’s unpack this wide-ranging topic into a series of specific questions.

  1. How does frac geometry change with completions design?
  2. How do you select an ideal landing zone?
  3. What operations sequence will lead to the best outcomes?
  4. What effect does well spacing have on production?
  5. Will diverter improve recovery?

This is just a small subset, but we can see these are complex, multidisciplinary questions. As an industry, we’re collecting and streaming massive amounts of data to try to figure this out. Companies are standing up centers of excellence around data science to get to the bottom of it. However, these issues require input from geology, geomechanics, drilling, reservoir engineering, completions, and production – the entire team. It’s very difficult to connect all the dots.

There’s also no one-size-fits-all solution; shales are very heterogeneous, and your assets are very different from someone else’s, both in the subsurface and on the surface. Tradeoffs exist, and design parameters need to be tied back to ROI. Here again, there are significant differences depending on each company’s strategy and goals.

Managing a data tsunami

When we don’t know what’s happening, we can observe, and there is a lot we can observe and a lot of data we can collect. Here are some examples that I’ve grouped into two buckets: diagnostic data that you would collect specifically to better understand what’s happening, and operational data that is collected as part of job execution.

The amount of data available is massive – and only increasing as new diagnostics techniques, new acquisition systems and new edge devices come out. What data is important? What data do we really need? Collecting data is expensive so we need to make sure the value is there.

Figure 2: Here are some examples of diagnostic data that you would collect specifically to better understand what’s happening and operational data that is collected as part of the job execution.

The data we collect is of little value in isolation. Someone needs to piece everything together before we can run analytics and before we can start to see trends and insights. However, there are no standards around data formats or delivery mechanisms, so operators have had to bear the burden of stitching everything together. This is a burden not only for operators; it also creates problems for service providers whose data is delivered as a summary PDF with raw data in Excel and is difficult to use beyond the original job. The value of their data and their services is diminished when their work product has only limited use.

Thinking through an approach

A common approach to answering questions and collecting data is the science pad, the scope of which can vary significantly. The average unconventional well costs between $6M and $8M, but a science pad can easily approach $12M, and that doesn’t account for the time people will spend planning and analyzing the job. This exercise requires collecting and integrating data, applying engineering knowledge, and then building models. Taking science learnings to scale is the only way to justify the high costs associated with these projects.

Whether on a science pad or just as part of the normal completions process, data should be collected and analyzed to improve the development strategy. A scientific approach to completions optimization can help ensure continuous improvement. This starts with a hypothesis, not data collection. Start with a very specific question. This hypothesis informs what data needs to be collected. The analysis should then either validate or invalidate the hypothesis. If we end there, we’ve at least learned something, but if we can go one step further and find common or bulk data that serve as proxies for these diagnostics, we can scale the learnings with predictive models. Data science can play a major role here in avoiding far-reaching decisions based on very few sample points. Just because we observed something in 2 or 3 wells where we collected all this data does not mean we will always see the same response. We can use data science to validate these learnings against historical data and understand the limits of where they apply versus where we may need to collect more data.

In part 2 of this series, we’ll walk through an example of this approach that addresses vertical frac propagation. Specifically, we’ll dive into collecting, integrating, and interacting with the required data. Stay tuned!


Database, Cloud, & IT Transfer

Death by apps, first on your phone and now in O&G

How many apps do you use regularly on your phone? How many of them actually improve your day? What at first seemed like a blessing has turned into a curse as we flip through pages of apps searching for what we need. Most of these apps are standalone, made by different developers that don’t communicate with each other. We’re now seeing a similar trend in O&G with a proliferation in software, especially around analytics.

O&G has always been a data-heavy industry. It’s well documented that data is one of the greatest assets these companies possess. With the onset of unconventionals, both the types of data and the amount of data have exploded. Companies that best manage and leverage data will be the high performers. However, this can be a challenge for even the most sophisticated operators.

Data is collected throughout the well lifecycle from multiple vendors in a wide range of formats. These data types have historically been ‘owned’ by different technical domains as well. For instance, drillers owned the WITSML data, geoscientists the well logs, completions engineers the frac van data, and production engineers the daily rates and pressures. These different data types are delivered to the operator through various formats and mechanisms: CSV files via FTP sites or client portals, proprietary software, and even PPT or PDF files.

Each domain has worked hard to optimize their processes to drive down costs and increase performance. Part of the gains are due to analytics applications – either built in house or delivered as a SaaS offering from vendors – providing tailored solutions aimed at addressing specific use cases. Many such vendors have recently entered the space to help drillers analyze drilling data to increase ROP or to help reservoir engineers auto-forecast production. However, the O&G landscape is starting to look like all those apps cluttering your phone and not communicating with each other. This usually translates into asset teams becoming disjointed, as each technical discipline uses different tools and has visibility only into their own data. Not only is this not ideal, but operators are forced to procure and support dozens of disconnected applications.

Despite the gains achieved in recent years, certainly due in part to analytics, most shale operators are still cash flow negative. Where will we find the additional performance improvements required to move these companies into the black?

The next step in gains will be found in integrating data from across domains to apply analytics to the overall asset development plan. A cross-disciplinary, integrated approach is needed to really understand the reservoir and best extract the resources. Some asset teams have started down this path but are forced to cobble together solutions, leaving operators with unsupported code that spans Excel, Spotfire, Python, Matlab, and other siloed vendor data sources.

Large, big-name service providers are trying to build out their platforms, enticing operators to go all-in with their software just to more easily integrate their data. Not surprisingly, many operators are reluctant to go down this path and become too heavily dependent on a company that provides both their software and a large chunk of their oilfield services. Is it inevitable that operators will have to go with a single provider for all their analytics needs just to look for insights across the well lifecycle?

An alternative and perhaps more attractive option for operators is to form their own data strategy and leverage an analytics layer where critical data types can be merged and readily accessed through an open API. This doesn’t mean another data lake or big data buzzwords, but a purpose-built analytics staging area to clean, process, blend, and store both real-time and historical data. This layer would fill the gap currently experienced by asset teams when trying to piece their data together. Our platform provides this analytics layer and comes with pre-built capabilities, so operators do not need a team of developers working for 12 months to start getting value. Rather than a SaaS solution to one use case, it is a platform as a service (PaaS) that can be easily extended across many use cases. This approach removes the burden of building a back end for every use case and then supporting a range of standalone applications. In addition, since all the technical disciplines leverage the same back end, there is one true source for data that can be easily shared across teams.
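To make the idea concrete, here is a minimal sketch of the kind of cross-domain blending such a shared layer enables, in plain JavaScript. The well IDs, field names, and values below are all hypothetical, not part of any actual API:

```javascript
// Two domain datasets that normally live in separate silos,
// keyed on a common well identifier (all values hypothetical)
var completions = [
  { wellId: 'W-1', proppantLbs: 12000000 },
  { wellId: 'W-2', proppantLbs: 9500000 }
]
var production = [
  { wellId: 'W-1', cumOilBbl: 150000 },
  { wellId: 'W-2', cumOilBbl: 98000 }
]

// Index production by wellId, then merge onto the completions records
var prodById = {}
production.forEach(function (p) { prodById[p.wellId] = p })

var blended = completions.map(function (c) {
  return {
    wellId: c.wellId,
    proppantLbs: c.proppantLbs,
    cumOilBbl: prodById[c.wellId] ? prodById[c.wellId].cumOilBbl : null
  }
})
```

Once both data types resolve to the same well identifier in one place, questions like “proppant loading versus cumulative oil” become a simple join rather than a data-wrangling project.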

Imagine a phone with a “Life” app rather than music, calendar, weather, chat, phone, social, etc. A single location with a defined data model and open access can empower teams to perform analytics, machine learning, engineering workflows, and ad hoc analysis. This is the direction leading O&G companies are moving to enable the integrated approach to developing unconventionals profitably. It will be exciting to see where we land.

Data Science & Analytics Transfer

Multivariate modelling using the diamond dataset

Today’s post was created by data analyst Omar Ali.

We’ll demonstrate how to create a multivariate model using the well-known diamond dataset from Kaggle. For this project, we’ll be utilizing the new Models feature in our most recent release, which makes it extremely easy to run predictive algorithms on any type of dataset. This tool is constantly being upgraded with added functionality and features per our customers’ feedback. That being said, let’s predict prices!

Before we begin, please download the diamond dataset from the Kaggle page here.

Before building a model, let’s first explore the data. We want to find blank values and odd distributions up front to make sure we build the model on clean data.

Some quick preprocessing shows that there are zero values in x, y, and z; that the depth and table distributions should be fenced to keep outliers from skewing the dataset; and that there are no NULL values in the cut and color columns.
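As a quick sketch of that zero-value check in plain JavaScript (the column names follow the Kaggle dataset; the sample rows are illustrative):

```javascript
// Rows from the diamond dataset (x, y, z are dimensions in mm); a zero
// dimension is physically impossible, so those rows should be dropped
var diamonds = [
  { carat: 0.23, x: 3.95, y: 3.98, z: 2.43 },
  { carat: 0.21, x: 3.89, y: 3.84, z: 2.31 },
  { carat: 0.32, x: 4.35, y: 4.38, z: 0.00 }   // bad row: z is zero
]

var nonZero = diamonds.filter(function (d) {
  return d.x > 0 && d.y > 0 && d.z > 0
})
// nonZero keeps only the rows with all three dimensions positive
```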

The next portion of this exercise will familiarize you with the new Models feature in the PetroPanel.

Click on the PetroPanel icon in Spotfire and select your respective database. Test your connection, and if it works, click on the “Models (beta)” tab.

Once you’re in the Models (beta) tab, you should see the following:

Click on “New Model,” name your model, and click on “Create.” This will bring you to the main tab where you can edit your inputs, choose a machine learning algorithm, and save and train a model.

My Model Options tab looks like this:

To predict prices, we’ll use a random forest with 50 trees and a 20% test set hold out.
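For intuition, a 20% hold-out simply reserves a fifth of the rows for testing and trains on the rest. A minimal sketch in plain JavaScript (in practice the rows would be shuffled first so the test set is a random sample):

```javascript
// Split rows into a test set (testFraction of rows) and a training set
function holdOut(rows, testFraction) {
  var nTest = Math.floor(rows.length * testFraction)
  return { test: rows.slice(0, nTest), train: rows.slice(nTest) }
}

var rows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
var parts = holdOut(rows, 0.2)   // 2 test rows, 8 training rows
```

The model is then fit on `parts.train` only, and the accuracy reported later comes from scoring `parts.test`, which the model never saw during training.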

As you scroll down, you will find the Model Inputs tab. In this tab, you will point the PetroPanel to your table, select your predictors, and assign them to either “Categorical” or “Continuous.” For continuous variables, you can “fence” your dataset to keep outliers from skewing your model. For categorical variables, one-hot encoding is the default, but a lookup table can be built as well. Here’s my Model Inputs tab:

As you can see, I fenced the depth parameter from 55 to 69 to keep the outliers from skewing the dataset.
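To make the fencing and categorical-encoding steps concrete, here is a minimal sketch in plain JavaScript. The helper names are my own, and the sample rows are illustrative; only the 55-to-69 depth cutoffs and the five cut categories come from the dataset itself:

```javascript
// Fence a continuous variable: drop rows whose value falls outside [lo, hi]
function fence(rows, field, lo, hi) {
  return rows.filter(function (r) { return r[field] >= lo && r[field] <= hi })
}

// One-hot encode a categorical value against a fixed category list
function oneHot(value, categories) {
  return categories.map(function (c) { return c === value ? 1 : 0 })
}

var rows = [
  { depth: 61.5, cut: 'Ideal' },
  { depth: 52.3, cut: 'Premium' },  // outside the 55-69 fence
  { depth: 63.0, cut: 'Good' }
]

var fenced = fence(rows, 'depth', 55, 69)   // drops the 52.3 row
var cuts = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
var encoded = oneHot(fenced[0].cut, cuts)   // 'Ideal' -> [0, 0, 0, 0, 1]
```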

Finally, the “Model Outputs” tab is where we can choose what we’re predicting, also with the opportunity to fence. Here’s mine:

Once everything is ready, click “Save.” You can then go back to the “Loaded Models” tab, where you should see the model populated. All you need to do now is train the model and see your results! The predicted values will be added to the original dataset. Below you can see a visualization of the predicted price against the actual price of the diamond.

The model result metrics can be found in the Suite under the “Machine Learning” tab. The Suite is independent of the PetroPanel within Spotfire and is fully functional on its own. From here, you can view the model inputs and outputs, the variable importance, and many other metrics that are required for model evaluation. Here’s what it looks like for the model I just trained:

As you can see, we have a model with 84% accuracy on the test set. A key feature of the tool is that the model is already stored in our database, so we can create a job that runs predictions with this model at whatever frequency we decide on. This means that if we have a model that predicts diamond prices, the Suite can put it into production and the database will store the results. So, if we get diamond data daily, we can load in the data and predict the price through the Suite. If you’re looking for a one-time prediction, you can also manually enter values for your predictors and generate a result to see how the model functions. An example of this can be found below:

I hope you enjoyed my demo of the new Multivariate Modeling feature and of putting machine learning into production through the Suite. If you have any questions or concerns, please comment below!

Developers Corner Transfer

Writing your First JavaScript Vue.js App

Getting set up can be an exciting time and can open quite a few doors for development, especially when it comes to JavaScript apps. Custom applications become a cinch using the API. In the coming weeks I’ll be putting together some simple applications that you can build on top of the platform. We’ll be using an assortment of languages to communicate with the API, so feel free to ask for an example.

Here is the HTML (the `_id` and `name` fields are assumed field names on each well document):

[code language="html"]

<h1>Hello, Wells!</h1>

<div id="hello-wells" class="demo">
  <blog-post
    v-for="well in wells"
    v-bind:key="well._id"
    v-bind:title="well.name">
  </blog-post>
</div>
[/code]

And the JavaScript (Vue.js)

[code language="javascript"]
Vue.component('blog-post', {
  props: ['title'],
  template: '<h3>{{ title }}</h3>'
})

new Vue({
  el: '#hello-wells',
  data: {
    wells: []
  },
  created: function () {
    var vm = this
    // Fetch our array of documents from the wells collection
    // (substitute your own API endpoint here)
    fetch('/api/v1/wells?limit=10')
      .then(function (response) {
        return response.json()
      })
      .then(function (data) {
        vm.wells = data['data']
      })
  }
})
[/code]
And poof! We’ve called the first 10 wells from the wells collection:

Hello, Wells!

ZEAL 4-25-46-26

What’s going on here is that the app is pulling directly from the server asynchronously. In the coming weeks, I’ll show how we can create reactive JavaScript applications that update from the server, so that we can watch things like rig data or real-time production data. This data was provided by GeoLogic, and we’ll be setting up a public instance for everyone to develop against.