Stochastic Data: Gateway to AI

by Dr. Sam L. Savage

Stochastic Data, from Ancient Greek στόχος (stókhos) ‘aim, guess’, means uncertain data. But wait a minute, all data is uncertain.

That’s my point!

AI can make statistical sense out of uncertainty. Visit our webpage Gateway to AI where we have posted 8 short videos on what I call the Stochastic Data Cycle.

Coherent Stochastic Data

AI is trained on Stochastic Data, and AI can produce Stochastic Data. But before that data may be used in subsequent calculations it must be converted to a SIP (Stochastic Information Packet). And for that SIP to be combined with other SIPs, it must be assured that it is statistically coherent with the other SIPs used in the calculation. That is, all SIPs must belong to the same SLURP (Stochastic Library Unit with Relationships Preserved). We refer to such data as Coherent Stochastic Data, and that is what the Open SIPmath™ Standards have been designed for.
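To make the idea of coherence concrete, here is a minimal Python sketch. It assumes that a SIP is simply a list of simulated trials and that a SLURP is a set of SIPs sharing the same trial index. The variable names, the toy relationship between demand, revenue and cost, and the helper chance_below are illustrative assumptions, not part of the Open SIPmath™ Standards.

```python
import random

random.seed(1)
TRIALS = 10_000

# Hypothetical SLURP: the SIPs are generated trial by trial, so that
# element i of each list describes the same simulated future.
demand = [random.gauss(100, 20) for _ in range(TRIALS)]
# Revenue and cost both depend on demand, so the SIPs are related.
revenue = [5.0 * d + random.gauss(0, 10) for d in demand]
cost = [3.0 * d for d in demand]

# Coherent arithmetic: combine the SIPs element by element.
profit = [r - c for r, c in zip(revenue, cost)]

# Incoherent arithmetic: shuffling one SIP destroys the relationship.
shuffled_cost = cost[:]
random.shuffle(shuffled_cost)
bad_profit = [r - c for r, c in zip(revenue, shuffled_cost)]

def chance_below(sip, target):
    """Fraction of trials that fall below the target."""
    return sum(x < target for x in sip) / len(sip)

coherent_loss_chance = chance_below(profit, 0)
incoherent_loss_chance = chance_below(bad_profit, 0)
print(coherent_loss_chance, incoherent_loss_chance)
```

In this toy setup the shuffled version reports a noticeably higher chance of a loss, even though every individual SIP still has exactly the same values. Only the relationships between them were lost.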

You Decide

Imagine that you asked AI to roll a die one million times. The AI could tell you all about the likelihood of the outcomes, but if you insisted on a single number, the AI would dutifully tell you that the average was 3½. This is equivalent to practicing for your craps game with flat dice with 3½ dots on each side. So to summarize:

AI is trained on Stochastic Data.

AI can output Stochastic Data if you have a place to store it and a way to use it.

The Open SIPmath Standard offers both.
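The die example can be sketched in a few lines of Python. This is an illustration only: it treats a plain list of simulated rolls as a stand-in for a SIP, and the variable names are assumptions rather than anything defined by the standard.

```python
import random
from collections import Counter

random.seed(2024)

# A stand-in SIP for one die: a stored list of simulated rolls.
die_sip = [random.randint(1, 6) for _ in range(1_000_000)]

# The single-number answer: close to 3.5, a value no die can show.
average = sum(die_sip) / len(die_sip)

# Statistical queries that the average alone cannot answer.
counts = Counter(die_sip)
chance_of_six = counts[6] / len(die_sip)
chance_at_least_five = (counts[5] + counts[6]) / len(die_sip)

print(round(average, 2), round(chance_of_six, 3), round(chance_at_least_five, 3))
```

Because the rolls themselves are stored, any later question about the die can be answered from the same data, which is the sense in which the data needs a place to be stored and a way to be used.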

Copyright © 2024 Sam L. Savage

Stochastic Data for Project Planning

A PolyBlog featuring research by Dr. Sam Savage and his former student Pace Murray, a View from the Trenches from an even more former student, Jimmy Chavez, and the announcement of a new book by Doug Hubbard.

Illustration from Construction Cost Overruns: Reference-Class Forecasts on Steroids by Pace Murray and Sam Savage in Phalanx Magazine.

Extracting SIPs from Historical Data

By Pace Murray and Dr. Sam L. Savage

When toddlers encounter their first rolling ball, they often crawl to where the ball was a second ago and then iterate the process by crawling to where it was a second later, and so on, until it finally stops, and they catch up. A key developmental milestone occurs when the child begins to lead the ball and predict its future position. Given the number of large projects that are severely behind schedule and over budget, it appears that many project planners have not yet reached this second level of predictive development. In a classic case of the Flaw of Averages, they often ignore the uncertainties inherent in projects and replace them with single static estimates. But help is on the way.[1]

Kahneman and Flyvbjerg

Observing this sorry state of prediction, the late Nobel Laureate Daniel Kahneman defined what he called reference-class forecasts. Instead of trying to calculate what it will cost to build a 100-Megawatt power plant from scratch, Kahneman would have recommended that you, for gosh sakes, at least see what it cost to build the last three power plants.

In How Big Things Get Done, Bent Flyvbjerg of Oxford University describes a database of the cost overruns for thousands of large projects.[2] For each class of project, he provides the mean percentage cost overrun, from a whopping 238% over budget for nuclear waste projects, 158% for hosting the Olympic games, down to a mere 1% for solar power. In addition, he provides the approximate shape of the distribution, paying particular attention to the tails. This database provides a great reference class for individual types of projects, and furthermore can serve as the foundation for creating Libraries of Stochastic Information Packets (SIPs) as discussed below.

SIP Libraries for Construction

Stochastic Data can come in many forms and is as old as uncertainty. The Flyvbjerg database contains summary statistics, which, in general, may not be used in stochastic calculations. That is, if you used standard cost estimating methods for constructing a nuclear waste site or hosting the Olympic games, the database would provide meaningful summary statistics on your potential cost overruns. But you cannot combine summary statistics in a meaningful way to estimate, for example, the cost overrun of hosting the Olympic games at a nuclear waste construction site. SIP Libraries, by contrast, can represent Coherent Stochastic Data based on the principles of probability management; that is, they obey the laws of arithmetic while supporting statistical queries.
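A rough Python sketch of why summary statistics do not combine while SIPs do. The two overrun distributions below are hypothetical lognormal placeholders, not values from Flyvbjerg's database, and the percentile helper is a simple assumption made for illustration.

```python
import random

random.seed(7)
TRIALS = 20_000

def percentile(sip, p):
    """Value below which roughly p percent of the trials fall."""
    ordered = sorted(sip)
    index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[index]

# Hypothetical cost overruns, in $ millions, for two independent projects.
# The lognormal shapes are placeholders chosen only for illustration.
project_a = [random.lognormvariate(3.0, 0.8) for _ in range(TRIALS)]
project_b = [random.lognormvariate(2.5, 1.0) for _ in range(TRIALS)]

# SIP arithmetic: add the projects trial by trial.
combined = [a + b for a, b in zip(project_a, project_b)]

p90_of_sum = percentile(combined, 90)
sum_of_p90s = percentile(project_a, 90) + percentile(project_b, 90)
print(round(p90_of_sum, 1), round(sum_of_p90s, 1))
```

In this example, adding the two 90th percentiles overstates the 90th percentile of the combined overrun, because both projects rarely have a bad outcome in the same trial. The trial-by-trial sum gets this right without any extra theory.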

In a recent article in Phalanx Magazine, Murray and Savage [3] have shown how to create a SIP Library from Flyvbjerg’s data and then apply it to a compound project composed of buildings, rail, and tunnels. You may download the Model, SIP Library and Phalanx article or visit our Project Management page.

Pace Murray is an Army captain with 8 years of service in the Infantry. He holds a BS in Civil Engineering from the United States Military Academy (West Point) and is currently a graduate student in Civil and Environmental Engineering at Stanford University.

Dr. Sam Savage is Executive Director of ProbabilityManagement.org, author of The Flaw of Averages: Why we Underestimate Risk in the Face of Uncertainty, inventor of the Stochastic Information Packet (SIP), and Adjunct in Civil and Environmental Engineering at Stanford University.

A View from the Trenches

By Jimmy Chavez

Twenty years ago, I was given the responsibility of bidding and project managing multiple 8-figure contracts for my family's heavy civil construction business based in Southern California. Just several years earlier, I had been in a Stanford classroom watching Dr. Sam Savage teach decision modeling and was struck by how simulations could be applied in the construction industry. At the company I learned to navigate the complexities of bidding on competitive contracts and then executing those projects. Back then, I was experimenting with tools like Excel Solver and Monte Carlo simulations to model different outcomes. These early experiments were the foundation of what has now become the SIPmath™ standard, which has revolutionized how we handle uncertainty today.

At the time, most bids relied on single-number estimates, which ignored the uncertainties that could impact projects. Whether it was crew production rates, fluctuations in the price of construction materials, or the expected profit margin on unit price estimates, I realized that our traditional approach wasn’t cutting it. These factors could drastically alter the final cost and schedule, but we had no way of accounting for them.

Monte Carlo simulations changed all of that. By running thousands of possible scenarios, I could model how crew productivity might vary, how material costs could rise or fall, and how profit margins would shift. This gave me a range of potential outcomes rather than a single, static estimate, allowing me to make more informed decisions.
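The kind of simulation described above might look like the following Python sketch. Every number in it (quantities, crew cost, price ranges, bid price) is a hypothetical placeholder, and the triangular distributions are an assumption chosen for simplicity, not a description of the models actually used.

```python
import random
import statistics

random.seed(42)
TRIALS = 10_000

# All figures below are hypothetical placeholders for illustration.
QUANTITY_CY = 50_000          # cubic yards of earthwork to move
CREW_COST_PER_DAY = 12_000    # dollars per crew day
BID_PRICE = 1_750_000         # dollars

costs = []
for _ in range(TRIALS):
    # Crew production rate in cubic yards per day (low, high, most likely).
    production = random.triangular(400, 700, 550)
    # Material price per cubic yard, which may rise or fall.
    material_price = random.triangular(8.0, 14.0, 10.0)
    crew_days = QUANTITY_CY / production
    costs.append(crew_days * CREW_COST_PER_DAY + QUANTITY_CY * material_price)

# Profit in each scenario, then two statistical queries on the result.
profits = [BID_PRICE - c for c in costs]
mean_profit = statistics.mean(profits)
chance_of_loss = sum(p < 0 for p in profits) / TRIALS
print(round(mean_profit), round(chance_of_loss, 3))
```

The point of the exercise is the second query. A single-number estimate built from the most likely inputs says nothing about the chance of losing money on the job, while the list of simulated profits answers that question directly.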

Fast forward to today. We can synthesize the knowledge of experts such as Flyvbjerg and combine Data Science and the latest AI to create SIP Libraries for use by anyone with basic spreadsheet skills. We can now model uncertainties in the preconstruction phase so that our bids more accurately capture project realities and reduce the chance of cost and time overruns. Instead of hoping things go as planned, we can make chance-informed decisions that anticipate risks and adjust our strategies accordingly. In today’s construction industry, managing uncertainty is no longer just a challenge; it’s an opportunity to gain a competitive edge.

Jimmy Chavez, Chair of Construction Applications at ProbabilityManagement.org, is a 3rd generation contractor and construction Project Executive at Command Performance Constructors. He is VP of Operations and Division Lead for federal contracting, with experience in construction project management and probabilistic estimating.

How to Measure Anything in Project Management

Announcing an upcoming book by Doug Hubbard

Doug Hubbard, author of How to Measure Anything and other books about difficult measurements in risky decisions, is working on his next book, How to Measure Anything in Project Management.[4] Hubbard is co-authoring this much-needed book with two key individuals from Bent Flyvbjerg’s Oxford Global Projects (OGP): Alexander Budzier, who co-founded OGP with Bent, and Andreas Leed, the head of data science at OGP. Together, they are investigating what is behind the persistent cost and schedule overruns, benefits shortfalls and outright failures of projects in industries as broad as software development, major civil infrastructure, utilities, architecture, aerospace, and more. Despite decades of growing development and adoption of project management methods of all sorts, project planning software, project dashboards, thousands of books and millions of professional certifications, there has been no discernible improvement in the success rate of projects. Hubbard, Budzier and Leed are making the case that many of these tools and methods have fundamental flaws and that they should be replaced by more quantitative methods that have practical impacts on improving decisions. The book will address quantifying project risks, project simulations, project decision options when conditions change, and measuring benefits. It will show how AI can be used in the simulations of projects and the assessment of alternative strategies, and what it may evolve into for project managers. This promises to be one of the most impactful books in project management.

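 [1] Daniel Kahneman, Thinking, Fast and Slow

 [2] Bent Flyvbjerg, How Big Things Get Done

 [3] Pace Murray and Sam Savage, Construction Cost Overruns: Reference-Class Forecasts on Steroids, Phalanx Magazine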

 [4] How to Measure Anything in Project Management. https://www.wiley.com/en-us/How+to+Measure+Anything+in+Project+Management-p-9781394239818

Gateway to AI – Videos from the Chance Age

By Dr. Sam L. Savage 

My Latest Album Will Drop at Risk Awareness Week

October 8, 2024

Stochastic Data: Gateway to AI

CHANCES* Consortium for Natural Hazards

Taking the Chances out of AI

*Conveying Hazards And Catastrophes through Extracted Simulations

Like David Foster Wallace’s fish who had no clue what water was in spite of being immersed in it, many of us have a similar lack of awareness of being immersed in AI. And AI, in turn, is immersed in stochastic data, that is, uncertain data. But isn’t all data uncertain? Exactly. That’s my point. The discipline of probability management allows us to store this data and more importantly do math with it.

• AI is trained on Stochastic Data

• AI can output Stochastic Data if you have a place to store it and a way to use it.

• The Open SIPmath™ Standard offers both and is the only such standard of which I am aware.

My three videos stress that probability management is really about a new category of data, not new tools. I believe that a data category may be defined in terms of the operations which may be performed on it and the queries which may be made of it.

For example, Numeric Data may be operated on by the Arithmetic (accent on the third syllable when used in this context) operators +, -, * and /. The queries are >, < or = relative to some other piece of numeric data. Stochastic Data that obeys the principles of probability management may also be operated on with +, -, * and /, but instead of simple inequalities, it supports statistical queries such as the chance of a data element being greater or less than some target, or a percentile or statistical average.
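As an illustration of a data category defined by its operations and queries, here is a toy Python class. The class name SIP, its methods, and the sample price and units figures are assumptions made for this sketch; it is not the Open SIPmath™ Standard or any published implementation.

```python
import operator
import random
import statistics

class SIP:
    """Toy Stochastic Information Packet: a fixed-length list of trials."""

    def __init__(self, trials):
        self.trials = list(trials)

    def _combine(self, other, op):
        # A plain number applies to every trial; another SIP is combined
        # trial by trial, which assumes both belong to the same library.
        if isinstance(other, SIP):
            if len(other.trials) != len(self.trials):
                raise ValueError("SIPs must have the same number of trials")
            return SIP(op(a, b) for a, b in zip(self.trials, other.trials))
        return SIP(op(a, other) for a in self.trials)

    # The four arithmetic operators.
    def __add__(self, other): return self._combine(other, operator.add)
    def __sub__(self, other): return self._combine(other, operator.sub)
    def __mul__(self, other): return self._combine(other, operator.mul)
    def __truediv__(self, other): return self._combine(other, operator.truediv)

    # Statistical queries replace the simple >, < and = of numeric data.
    def mean(self):
        return statistics.fmean(self.trials)

    def chance_above(self, target):
        return sum(x > target for x in self.trials) / len(self.trials)

    def percentile(self, p):
        ordered = sorted(self.trials)
        return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]

# Hypothetical uncertain price and sales volume.
random.seed(3)
price = SIP(random.uniform(8, 12) for _ in range(10_000))
units = SIP(random.uniform(90, 110) for _ in range(10_000))
revenue = price * units
print(revenue.mean(), revenue.chance_above(1100), revenue.percentile(90))
```

The result of price * units is itself a SIP, so it can be fed into further calculations and queried in the same way as its inputs.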

As another example, Audio Data may be operated on with a mixer, to combine various tracks into a finished piece of music. The only query is Listen or not.

Speaking of Audio Data, my last album, which dropped in 1999, was called Exponential: Music from the Analog Age. For those who want to learn more, or execute the Listen query, read on.

Exponential Liner Notes

In the early 1970s, after I had abandoned traditional Management Science but before I had discovered spreadsheets, I tried unsuccessfully to be a folksinger in Chicago.

There were two things that dissuaded me from a career in music. First, there were a lot of people who were a lot better than I was, and second, they weren’t making it either. During this period, I did some recording on a Sony 4 track reel-to-reel tape deck with my stepbrother John Pearce (who is still an active musician).

I found the decades-old tapes in my garage in 1999 and discovered to my amazement that there were still magnetic signals on them. All the recordings were between 15 and 20 years old at that time. Some pieces, like the patient who has been frozen in liquid nitrogen until a cure is found for his disease, awoke to a world in which they could be substantially improved. The tempo of the title track, Exponential, for example, was sped up digitally without changing the pitch. Click here to listen to the full album.

Sam Savage (left) and John Pearce in the early 1980s in Sun Valley, Idaho

Copyright © 2024 Sam L. Savage

Our History in the Journal of Portfolio Management

By Dr. Sam L. Savage 

The prestigious Journal of Portfolio Management has just published a special issue in memory of Harry Markowitz, and I was honored by an invitation to contribute. I invited my old friend Ben C. Ball as my co-author. Together we had applied Harry’s modern portfolio theory to petroleum exploration, as described in Ch. 28 of my book The Flaw of Averages, work that changed the trajectory of my career.


The JPM article, written as a docudrama, chronicles meeting Ben in the 1980s, Harry in the 1990s, and developing an application that I dubbed the Markowitzatron in the 2000s while consulting to Shell. The work at Shell was really the dawn of the discipline of Probability Management.

An on-line version of the article is available here. You need to enter your name and email to gain access, then search for “Markowitzatron” to be taken to the article.

 

Harry Markowitz, 1927-2023

Sam Savage

Ben C. Ball

Copyright © 2024 Sam L. Savage

Foster a Dog: Get a Call Option on Love

By Dr. Sam L. Savage 

At ProbabilityManagement.org our ultimate goal is to assist people in dealing with uncertainty. In this context, optionality is of great benefit in reducing downside risk. For example, fire insurance is really an option to sell your house to your insurance company at market value, even if it burns to the ground. A call option lets you purchase a stock after the fact, if it goes up, but limits your losses if it goes down. Recently, optionality played a key role in an emotional decision that my wife and I had to make.

Losing Rosey

Recently we tragically and unexpectedly lost our one-year-old dog, Rosey. She had destroyed most of our furniture (thank goodness it was old), ripped out the entire sprinkler system in our back yard (so we no longer have a lawn), and was 50% Husky, which explained both the seductive blue eyes and also her aloof nature. But we loved her beyond words and were devastated by her loss.

Luckily, my wife and I were able to throw ourselves into our work, and eventually the grief faded to sadness and finally to empty spots in our hearts.

The Theory of Options: Harnessing Uncertainty

A call option lets you purchase a share of a certain stock at a certain price (the strike) for a certain period but does not compel you to do so. If the stock price is above the strike at the end of the period, you buy it below market value and cash in. If it is below the strike, you have the option to walk away, limiting your losses to the cost of the option. It is an investment with a potentially huge upside and little downside. See, for example, Ch. 25 of The Flaw of Averages.
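The payoff described above can be written as a one-line formula. The following Python sketch uses a hypothetical strike of $100 and option cost of $5; the function name call_profit is an assumption made for illustration.

```python
def call_profit(final_price, strike, premium):
    """Profit per share at expiry for the buyer of a call option."""
    # Exercise only when the stock ends above the strike; otherwise walk away.
    exercise_value = max(final_price - strike, 0.0)
    return exercise_value - premium

# Hypothetical numbers: strike of $100 and an option that cost $5.
for final_price in (60, 100, 105, 150):
    print(final_price, call_profit(final_price, strike=100, premium=5))
```

However far the stock falls, the loss stays at the $5 cost of the option, while the gain grows dollar for dollar once the price passes the strike.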

The Theory of Dogs: Unnatural Selection

It goes without saying that evolution depends on the probability of an organism getting its genes into the next generation. This, in turn, depends on the probabilities of both reproduction and survival. So how does evolution support an animal that has a neon sign on its butt that screams “Come Eat Me” from 50 yards? What this loses on the survivability side, it makes up for on the reproduction side, by attracting members of the opposite sex, as any female peacock will confirm.

But domesticated species don’t need to appeal to the opposite sex. They need to appeal to their domesticators who arrange their marriages.

So, why did we unnaturally select Rosey from the rescue puppies? Along with her blue eyes, it was because of her cute markings, with a white blaze on her face, white chest, white feet and a white tip on her tail. Well, if this color scheme attracts humans, why don’t they breed horses and cows with Rosey’s décor? They do, of course. 

 
 
 

Finding Daisy

It was less than two months after the tragedy, and my wife wasn’t ready to have another dog. But I wasn’t ready to not have a dog. What seemed like an insurmountable problem was that neither of us had the time or energy to train another puppy.

I began to surf the web sites of local dog pounds and saw Daisy on the San Jose animal shelter site. She had a white blaze on her face, white chest, white feet and a white tip on her tail and was listed as about 2½ years old. We had only had puppies in the past and I had no idea if we could bond with a fully grown dog. But I swiped right.

The site also indicated that we could foster her with the option to adopt. This reduced the downside to the cost of a couple of trips to the animal shelter. My wife still wasn’t ready to go with me, and if she had we probably would have come back with six dogs. A caring docent introduced me to Daisy, who was very energetic and playful for an adult dog. She was also more obedient when asked to sit or lie down than any dog we had trained. The docent told me that it took dogs three days to decompress, three weeks to learn your routine, and three months to feel like they are home. This was not the case with Daisy. Her nickname is Crazy Glue because she bonded instantly. We haven’t done a DNA test yet, but instead of Husky aloofness she displays a Pitbull’s strong cuddling instinct. She is a 60 lb. plug who filled a 50 lb. hole in our hearts. And of course we exercised our option on love and signed the adoption papers right away.

What did the Presidential Debate have to do with Probability? Everything!

By Dr. Sam L. Savage 

As I, like 51 million others, watched the debate the other night, I suddenly realized there was something even more important to be watching: the prediction market reactions. Prediction markets, although not without potential problems, react instantly and reflect where people are putting their own money. They are much faster than polls and potentially more accurate.

The left graph displays real time “Presidential Win” probabilities and market volume for the top six people, candidates or not, for a 24-hour period starting four hours before the debate. The right graph displays 30 days ending eight days after the debate for Trump, Biden and Harris, showing the longer term impact of the debate.

Learn more about prediction markets, and how my father’s work helped lay their foundation, in my Medium post.

What are the Chances of Finding Gold While Saving Our Grid?

Powering Tomorrow: Why Our Grid Needs Chance-Informed Decision-Making

By Dr. Sam Savage, Executive Director

and

Daniel Krashin, Chair of Renewable Energy Applications

ProbabilityManagement.org

What’s Not to Like

What’s not to like? Free energy from the sun and wind has arrived just as the demand for power skyrockets to feed our electric cars and the giant data centers toiling away on the production of crypto currency and artificial intelligence.[i] The good news is that there is plenty of renewable energy to go around. The bad news is that it can only be used if it arrives at the right time at the right place.

The Right Time

The “right time” problem is that renewable energy is generated at random times determined by Mother Nature. Solar power generated at midday does not help power air conditioners in the evening when it’s still hot. This problem can be solved fairly quickly through energy storage with battery systems of different sizes.

The Right Place

The “right place” problem will take more time.

According to NPR,

“So many people want to connect their new solar and wind projects to the grid right now that it's creating a massive traffic jam. All those projects are stuck in line: the interconnection queue.”[ii]

With carefully planned expansion of our power grid this problem will be conquered. However, it will take a while to re-engineer our power grid given that it could cost as much as the combined value of Google and Amazon, plus the need for thousands of skilled electrical engineers. In the meantime, we must figure out how to keep our grid running smoothly, ensuring it delivers stable, clean, and affordable energy.

The Right Communication of Uncertainty

What do all these challenges have in common? Uncertainty. In fact, from long term investment decisions to hourly operational decisions, chance-informed analysis of uncertain power demand and prices will separate the winners from the losers and the neighborhoods with electricity from the neighborhoods with outages. How will such uncertainties be communicated and managed? Our bet is SIP Libraries that take advantage of the discipline of probability management.[iii] For investment decisions they may be updated monthly; for operational decisions they might be updated by the minute.

The Right Analysis

Many of the required analytical techniques such as portfolio and option theory have already evolved in finance and are even more applicable here. Why? The renewable energy market is still young and inefficient, providing significant opportunities for capitalizing on investments, especially in energy arbitrage.

What will the correct analysis accomplish?

  • Identify locations with the highest potential for long-term profitability.

  • Optimize installed capacity to maximize return on investment (ROI).

  • Construct financial plans that accelerate the achievement of the break-even point.

  • Demonstrate the impact of policy changes on outcomes.

  • Provide transparent analysis of the effects from climate disasters or system failures.

Want to learn more?

Download Models at ProbabilityManagement.org - Renewable Energy


References:

[i] https://www.washingtonpost.com/business/2024/03/07/ai-data-centers-power/

[ii] https://www.npr.org/2023/05/16/1176462647/green-energy-transmission-queue-power-grid-wind-solar

[iii] https://en.wikipedia.org/wiki/Probability_management

Copyright © 2024 Sam L. Savage

FAIR Meets SIPmath

By Sam L. Savage

John Button of Gartner, Eng-wee Yeo of Kaiser Permanente, and I have published a three-part blog series at the FAIR Institute: Part 1, Part 2, Part 3.

We were inspired by Eng-wee’s use of SIP Libraries at Kaiser to integrate their risk and investment models. In 1952 the late father of Modern Portfolio Theory and co-founder of ProbabilityManagement.org, Harry Markowitz, showed us that risks and returns have inevitable tradeoffs and cannot be considered in isolation.

The open SIPmath™ Standard provides a means to easily network together stochastic simulations of all sorts, including risk and investment simulations.

The FAIR™ (Factor Analysis of Information Risk) Ontology is a construct to account for and measure the effectiveness of controls against cyber risk. It plays a role analogous to Generally Accepted Accounting Principles (GAAP).

The open SIPmath™ Standard expresses uncertainties as data structures called SIPs that obey both the laws of arithmetic and the laws of probability. That is, you may perform arithmetic operations on two SIPs to generate a third SIP representing the result of the uncertain calculation. In effect they play the role of the Hindu/Arabic Numerals of Uncertainty.

So, FAIR meets SIPmath is like accounting meets numbers, a good idea all around.

I hope you enjoy the blogs and the downloadable SIPmath models that accompany them.

Copyright © 2024 Sam L. Savage

The Three R’s of The Chance Age

Recognize, Reduce, Respond

By Dr. Sam L. Savage

Just as Readin’, ‘Ritin’, and ‘Rithmetic were the pillars of public education, as encouraged in the United States in the early 1800s, the Chance Age will require its own foundational elements.

I offer you Recognize, Reduce, and Respond.

Recognize

Those who do not recognize uncertainty run afoul of the Flaw of Averages or worse. 

I used to think there was nothing worse than representing uncertainties as single numbers until I saw it done with colors.

With the advent of the discipline of probability management, once you recognized a set of uncertainties, you could store them as auditable data (SIPs and SLURPs) that preserved statistical coherence. This unlocked the arithmetic of uncertainty the way Hindu/Arabic numerals unlocked standard arithmetic. Yet even today many professionals do not realize that uncertainties may be added, subtracted, multiplied or divided.

Reduce

In general, reducing uncertainty is a good thing. Forecasts of future prices, costs and demands usually include something called the Standard Error, which is an indication of how much uncertainty remains. If you come up with a way to consistently forecast tomorrow’s stock prices with a lower standard error than anyone else’s forecast, congratulations. You will be the richest person who ever lived.

This subject is guided by the Theory of the Value of Information as discussed in Ch. 15 of my book on the Flaw of Averages and also by Doug Hubbard’s Applied Information Economics.

One important exception to the value of reducing uncertainty is in the area of stock options, in which the value goes up with the uncertainty of the underlying stock price. How cool is that? To survive in the Chance Age, you need to understand the power of options as discussed in Ch. 25, Options: Profiting from Uncertainty, and Ch. 30, Real Options, in The Flaw of Averages.
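A small simulation can illustrate why an option becomes more valuable as uncertainty grows. The Python sketch below is not an option pricing model: it ignores interest rates and discounting, and it assumes a hypothetical stock whose final price is lognormal with the same average in both runs, so that only the uncertainty differs. The function name and all parameter values are illustrative assumptions.

```python
import math
import random

def expected_call_payoff(volatility, start=100.0, strike=100.0,
                         trials=50_000, seed=11):
    """Average payoff at expiry of a call on a hypothetical stock."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Lognormal final price whose mean stays at `start`,
        # so that only the uncertainty changes between runs.
        shock = rng.gauss(-0.5 * volatility ** 2, volatility)
        final_price = start * math.exp(shock)
        # The holder exercises only when it pays to do so.
        total += max(final_price - strike, 0.0)
    return total / trials

low = expected_call_payoff(volatility=0.10)
high = expected_call_payoff(volatility=0.40)
print(round(low, 2), round(high, 2))
```

The more uncertain stock produces the larger average payoff, because the option holder keeps the bigger upside while the downside remains capped at zero.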

Respond

When you’re uncertain, don’t just stand there, do something! But you must do something that explicitly recognizes the uncertainty you face. Making rational decisions in the face of uncertainty is the realm of Decision Analysis, in which my father, Leonard Jimmie Savage, played a role. Decision analysis came of age before the widespread use of computers and assumed relatively simple yes/no decisions of the form, “Do I buy an umbrella in the face of a 15% chance of rain tomorrow?” Shortly after I became an Adjunct in Stanford University’s School of Engineering, I took a class in Decision Analysis from Professor Ron Howard. It was not rocket science, but it was life-altering in its simple applicability, as I describe in Ch. 14 of The Flaw of Averages.
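The umbrella question can be worked through as a tiny expected-cost calculation. In the Python sketch below, only the 15% chance of rain comes from the example; the umbrella price and the cost of getting soaked are hypothetical, and the comparison assumes a decision maker who cares only about average cost and an umbrella that fully prevents the soaking.

```python
CHANCE_OF_RAIN = 0.15     # from the example in the text

# Hypothetical costs in dollars; the text gives only the chance of rain.
UMBRELLA_PRICE = 10.0
COST_OF_GETTING_SOAKED = 50.0

# Expected cost of each choice.
expected_cost_buy = UMBRELLA_PRICE
expected_cost_skip = CHANCE_OF_RAIN * COST_OF_GETTING_SOAKED

# Buying is worthwhile only if the chance of rain exceeds this ratio.
break_even_chance = UMBRELLA_PRICE / COST_OF_GETTING_SOAKED

decision = "buy" if expected_cost_buy < expected_cost_skip else "skip"
print(decision, expected_cost_skip, break_even_chance)
```

With these made-up costs the umbrella is not worth buying until the chance of rain passes 20%. Change either cost and the break-even chance moves with it, which is why the decision depends on more than the forecast.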

Harry Markowitz, who credits my dad for indoctrinating him with rational expectation theory at point blank range at the University of Chicago, made a Nobel Prize winning contribution on how to respond to uncertainty. His famous efficient frontier displayed rational choices of portfolios of stocks with uncertain returns for investors with any risk appetite. The discipline of probability management is indebted to Harry for his generous efforts in getting 501(c)(3) nonprofit ProbabilityManagement.org off the ground.

With the rapid evolution of computers, vastly more complex decisions could be made with thousands of variables using the methods of Stochastic Optimization. I applied this response to the uncertainty of oil exploration at Royal Dutch Shell in 2006. As a matter of fact, stochastic programming employs arrays of coherent Monte Carlo trials that are mathematically equivalent to the SIPs and SLURPs of probability management. At first, all probability management did was to come up with standard formats for these arrays so they could be shared between applications, including corporate databases used for merely recognizing uncertainty. So, in a sense, probability management is just stochastic optimization without the optimization.
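To illustrate the link between coherent trials and optimization, here is a deliberately simple Python sketch. It is not the Shell application and not a stochastic programming solver. It merely evaluates a handful of candidate decisions against the same set of hypothetical trials and keeps the one with the best average. The demand and price models, the unit cost, and the function name profit_sip are all assumptions.

```python
import random
import statistics

random.seed(5)
TRIALS = 5_000

# Hypothetical coherent trials: element i of each list is the same scenario.
demand = [random.triangular(50, 150, 90) for _ in range(TRIALS)]
# In this toy model, price tends to rise with demand.
price = [20 + 0.1 * d + random.gauss(0, 2) for d in demand]

UNIT_COST = 12.0   # hypothetical cost per unit of capacity

def profit_sip(capacity):
    """Profit in every trial for one candidate decision."""
    return [p * min(capacity, d) - UNIT_COST * capacity
            for d, p in zip(demand, price)]

# Evaluate every candidate against the same trials and keep the best average.
candidates = range(50, 151, 10)
best_capacity = max(candidates, key=lambda c: statistics.fmean(profit_sip(c)))
best_mean = statistics.fmean(profit_sip(best_capacity))
print(best_capacity, round(best_mean, 1))
```

Remove the last three lines and what remains is just the stored trials and the arithmetic on them, which is the sense in which probability management is the optimization machinery without the optimization.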

Conclusion

As your organization enters the Chance Age you will hopefully work your way through the three R’s. Nearly every enterprise stands to benefit in some way or another from recognizing, reducing, and responding to uncertainty.

Copyright © 2023 Sam L. Savage

In Memory of Harry Markowitz

By Dr. Sam L. Savage

August 24, 1927 - June 22, 2023

 

It is with deep sadness that I announce the passing of Harry Markowitz, Nobel Laureate in Economics, father of Modern Portfolio Theory, and co-founding Board member of ProbabilityManagement.org, in San Diego on June 22. Harry’s obituary published by the New York Times can be found here.

Harry truly started the war on averages in the early 1950’s at the University of Chicago. He read the academic literature of the time which specified that investment decisions should be based on the average value of the assets. But he knew that averages did not take risk into account. For example, if you hijack an airliner, ask for $1 billion and have one chance in 1,000 of getting away with it, your average return is a cool $1 million, but count me out. 

So, Harry introduced another dimension that measured risk, forming the risk/return plane. He then showed how to create an optimal set of investments based on the covariance between stocks, called the efficient frontier. Any investment on the frontier was rational depending on your risk appetite. Anything to the right of the frontier was nuts because there were investments to the northwest that had both a higher average return and lower risk. Anything to the left was mathematically impossible, which, in fact, led to the detection of the fraudster Bernie Madoff.
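The role of covariance can be seen with just two stocks. The Python sketch below uses hypothetical average returns, standard deviations, and correlation; it traces a few points in the risk/return plane rather than computing a full efficient frontier.

```python
import math

# Hypothetical annual figures for two stocks.
MEAN_A, SD_A = 0.08, 0.15
MEAN_B, SD_B = 0.12, 0.25
CORRELATION = 0.2
COVARIANCE = CORRELATION * SD_A * SD_B

def portfolio(weight_a):
    """Return (risk, average return) for a mix of the two stocks."""
    weight_b = 1.0 - weight_a
    average = weight_a * MEAN_A + weight_b * MEAN_B
    variance = (weight_a ** 2 * SD_A ** 2
                + weight_b ** 2 * SD_B ** 2
                + 2 * weight_a * weight_b * COVARIANCE)
    return math.sqrt(variance), average

# Trace the risk/return plane in steps of 10%.
points = [(w / 10, *portfolio(w / 10)) for w in range(11)]
for weight_a, risk, average in points:
    print(f"{weight_a:.1f}  risk={risk:.3f}  return={average:.3f}")

# The least risky mix on this grid.
min_risk_weight, min_risk, min_risk_return = min(points, key=lambda p: p[1])
```

With these made-up inputs, the least risky mix holds some of the riskier stock B, and it has both lower risk and a higher average return than holding stock A alone. Stock A by itself therefore sits to the right of the frontier, with a better choice to its northwest.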

When I told Harry that the chapter about him in my book, The Flaw of Averages, was called the Age of Covariance, he started singing Age of Aquarius. That was quintessential Harry, and I am choked up thinking about it.

Harry studied at the University of Chicago with both Milton Friedman and my father, Jimmie Savage, but I only met him by chance in the mid-1990s, and we hit it off. It was gratifying to show him how we had applied his efficient frontier concept at Shell in 2005. The article and model on this application may be found here.

In 2012, when the Microsoft Excel data table became powerful enough to support the discipline of probability management, Harry generously and eagerly agreed to help Michael Salama (Lead Tax Counsel of Walt Disney) and me in founding ProbabilityManagement.org. He even offered his office in San Diego as the venue for our first organizational meeting in May of 2012 (see photo below).

Harry’s passing triggers not only sadness but also deep gratitude for his generosity.  Without Harry, ProbabilityManagement.org would not have gotten off the ground. He will be greatly missed by us and many in the probability management community.

Copyright © 2023 Sam L. Savage