Powerfully Creating OpenAI, the Company

This is an excerpt from an article posted on The 41st Square.

OpenAI is something to keep an eye on, and it is already teaching a powerful way to create and staff a company in a hyper-competitive environment.
This mission is helping attract and retain desirable talent. One account states that 9 of 10 researchers who were offered positions at OpenAI accepted, turning down extremely lucrative counteroffers from their existing employers. It appears they did this because of the mission, not because of the money.

Singularity University Exponential Data Talk

Yesterday I had the honor of presenting at Singularity University's Mountain View campus, speaking about data, data science, and big data. I spoke to and met many of the 100 participants of the Executive Program. People come from all over the world for this program.

I had a great time sharing my Exponential Data presentation. Afterwards we had lunch and many excellent conversations!

Stephanie from http://www.thechrysalissolution.com/ was awesome as always and produced this lovely illustration from my talk.

Singularity University Executive Program - Exponential Data Lecture Illustration

My May 2016 lecture on big data and data science at the Singularity University Executive Program.

The Exponential Data talk is my take on the nature of data, its impact on organizations and the people within them. All companies are data companies and need the processes and technologies to adapt. The Data Science Operations framework presented within the talk will help any company begin the process of deriving meaningful insights from data.

This lecture included:

  • Information on the nature of data itself and several key attributes that make data special
  • My framework that describes how to integrate and operate with data in your organization
  • Several real world examples of using data to save lives, heal the planet and generate immense economic value

With the right data, processes and technologies companies can create immense leverage for people. Data is power.

Exponential Data Implementation Framework

Three Things Observed in the Field That Help Make You a Data Scientist

An interesting and little-understood fact about being or becoming a data scientist is that almost anyone can be one.

Like most professions, there are different types and skill levels involved. Some practitioners are more hardcore in areas like math, software engineering, or information design. Others are more focused on day-to-day business. Regardless of their specific role, the traits that make it far more likely someone will take on the title or the work of a data scientist are as follows. I tend to find these traits in the people I like to work with the most.

Scientific Process Approach

The scientific method reigns supreme in real data science. Doing data science is quite different from doing software engineering. Generally, when you build software toward a specific goal, it gets better and better and better until you can release. When you do data science, sometimes no matter how hard you work on your models, you get to the end, it's a dead end, and you simply have to discard the results and go back to the beginning.

Have (and be able to keep) Beginner’s Eyes

Minimizing the impact of bias as much as a human being can is extremely challenging. Some may even argue that it is impossible. Therefore, if you are approaching a data science problem, it's very useful to do so with a beginner's mindset. Do not assume you already know the best algorithm; treat it as a hypothesis that you can then test. Do not assume you know the answer before you have even asked the question. Try to look at each opportunity to apply data science with fresh eyes.
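To make that concrete, here is a minimal sketch of treating the choice of algorithm as a hypothesis to be tested rather than a foregone conclusion. It uses scikit-learn and synthetic data, neither of which appears in the original post, so treat it purely as an illustration: every candidate is scored the same way, and the evidence, not your prior favorite, picks the winner.

```python
# Hypothetical sketch: evaluate several candidate algorithms identically
# instead of assuming up front which one is "best".
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data for illustration; in practice this is your real problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

candidates = {
    "baseline (most frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Score every hypothesis the same way; let the data decide.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

If nothing beats the baseline, that is the dead end described above: discard the results and go back to the beginning.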

Ability to Communicate Results Effectively

No matter how amazing the model you train. No matter how elegant the solution you devise. It doesn't matter at all. No one cares. They care about results. They care whether they can understand you. Becoming a master of the visual display of information could not be more critical to the efficacy of your results over time. The output might be a pretty chart or graph. It might be a highly complex mobile or web application. It might be a PowerPoint presentation. Whatever the delivery medium, it pays huge dividends to invest in the consumability of the output.
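As a small illustration of investing in consumability, here is a minimal matplotlib sketch. The library choice and the numbers are my own assumptions, not from the post; the point is simply that a labeled chart communicates results faster than a raw table.

```python
# Hypothetical sketch: present model results as a labeled chart
# instead of a raw table of numbers.
import matplotlib.pyplot as plt

# Made-up results, purely for illustration.
models = ["Baseline", "Logistic Regression", "Random Forest"]
accuracy = [0.52, 0.78, 0.84]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(models, accuracy, color="steelblue")
ax.set_ylabel("Cross-validated accuracy")
ax.set_ylim(0, 1)
ax.set_title("Model comparison")

# Label each bar so the audience doesn't have to read the axis.
for i, value in enumerate(accuracy):
    ax.text(i, value + 0.02, f"{value:.2f}", ha="center")

fig.tight_layout()
fig.savefig("model_comparison.png", dpi=150)
```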

Will Tensorflow Serving Ease Data Science Operational Pain?

One of the more exciting things I've heard lately is that Google is continuing to open source more and more of the TensorFlow ecosystem with the release of TensorFlow Serving.

TensorFlow
http://www.tensorflow.org

From their site… TensorFlow™ is an open source software library for numerical computation using data flow graphs.

TensorFlow Serving
https://tensorflow.github.io/serving/

From their site… TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

It is exciting to see these types of releases and advances. The operationalization of data science is very challenging for any organization. Managing model builds, versions, deployment, and maintenance is extremely challenging both technically and procedurally within the organization. These products go part of the way toward improving this state of affairs and are a welcome addition to the data science tool chain.
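As a rough sketch of what the client side of that deployment story can look like: later versions of TensorFlow Serving (released after this post) expose a REST predict endpoint alongside the original gRPC interface, so a served model can be queried with a plain HTTP POST. The model name, port, and input shape below are assumptions for illustration only, and presume a server is already running (for example, via the stock tensorflow/serving Docker image with an exported SavedModel).

```python
# Hypothetical sketch: query a model hosted by TensorFlow Serving's REST API.
import json
import requests

# 8501 is the default REST port in the tensorflow/serving Docker image;
# the model name "my_model" is made up for this example.
url = "http://localhost:8501/v1/models/my_model:predict"

# One input instance; the shape must match whatever the model expects.
payload = {"instances": [[1.0, 2.0, 5.0]]}

response = requests.post(url, data=json.dumps(payload))
response.raise_for_status()

print(response.json()["predictions"])
```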

Level of Effort

When you try to estimate, and then track, how much time a software engineering project and the individual tasks within it will take, you need an agreement on your team about how to measure it. For years I have used something I made up when I worked at an agency, because story points didn't work well for my needs. Later, the engineering team at Ekho revised it to be easier to understand and use. I thought the method might be useful to others, so I'm posting it here. It's nice for estimation and simple: you can just take anything greater than 10 and divide by 10 to get the number of sprints. This translates easily to calendars, Gantt charts, etc.

Each request that flowed into our issue tracking system and got assigned to a sprint was given an LOE when it was accepted or estimated. This gave the team the ability to gauge whether they were on track, as well as to make reasonable promises about due dates. I would generally say that once your team is good at using this scale to estimate, you can reliably predict the date of a project to within +/- 5%, and that isn't easy.

Any single feature or job that scores in the high range of 8-10+ really should be carefully analyzed before assignment, and every attempt should be made to break it down into smaller pieces. Sometimes, to do big things, there will be big multi-sprint LOEs. But that should be rare in most software development projects.

Level of Effort (LOE) SCALE

This scale assumes one-week sprints. It's easy to adapt to different-length sprint schedules.

LOE 1 = 1/2 day for one Full-Time Employee (FTE)

0 - A quick fix

1 - Some work that will take about one half (1/2) day for one FTE

2 - Some work that will take about one day for one FTE

4 - Some work that will take about 1/2 sprint for one FTE (about 2-3 days)

8 - Some work that will take a little longer than 1/2 sprint (around 3-4 days)

10 - Some work that will take 1 full sprint (i.e. 1 work week)

10+ - Anything over 1 sprint (1 week) (e.g., LOE 20 = 2 weeks, LOE 30 = 3 weeks)
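The arithmetic above is simple but easy to get wrong in a spreadsheet, so here is a small Python sketch of the conversion. It is my own illustration, not part of the original scale, and it assumes one-week sprints of five FTE working days with LOE 10 = 1 sprint; the day values for LOE 0, 4, and 8 are nominal midpoints of the ranges above.

```python
# Hypothetical sketch: convert LOE scores to rough effort estimates,
# assuming 1 sprint = 1 work week = 5 FTE days and LOE 10 = 1 sprint.

# Day values below the sprint threshold, taken from the scale above
# (LOE 0 "quick fix" is given a nominal placeholder; 4 and 8 use midpoints).
LOE_TO_DAYS = {0: 0.1, 1: 0.5, 2: 1.0, 4: 2.5, 8: 3.5, 10: 5.0}

def loe_to_days(loe: float) -> float:
    """Return the rough number of FTE days for a given LOE score."""
    if loe in LOE_TO_DAYS:
        return LOE_TO_DAYS[loe]
    if loe > 10:
        # Anything over 10: divide by 10 to get sprints, then 5 days per sprint.
        return (loe / 10) * 5.0
    raise ValueError(f"LOE {loe} is not on the scale")

def loe_to_sprints(loe: float) -> float:
    """Return the rough number of one-week sprints for a given LOE score."""
    return loe_to_days(loe) / 5.0

if __name__ == "__main__":
    for loe in (1, 4, 10, 20, 30):
        print(f"LOE {loe}: ~{loe_to_days(loe)} days, ~{loe_to_sprints(loe):.1f} sprints")
```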