What is Data Science

The term “Data Science” has become very popular in recent times. However, there is no clear and universally accepted definition of the term.

I intend to present my views on “Data Science” here. In my opinion:

“Anything and everything pertaining to data is in the purview of data science.”

This includes, but is not limited to,

  • Identifying data
  • Generating or collecting data
  • Managing data
  • Using data
  • Enabling data usage
  • and so on…

More on this will follow.

Interpreting Probability

Over several years of teaching probability, I have observed again and again that students find it very difficult to interpret probability correctly. I believe this is mainly due to the following reason.

In both teaching and evaluation, the ability to ‘compute probability’ is given more importance than the ability to ‘understand probability’. This is possibly because evaluating ‘the understanding of probability’ is more challenging. Unfortunately, students are also more interested in marks than in knowledge.

That said, the interpretation of probability has itself remained a matter of debate for decades. Before I give my views on it, here are some excerpts:

‘Interpreting probability’ is a commonly used but misleading characterization of a worthy enterprise. The so-called ‘interpretations of probability’ would be better called ‘analyses of various concepts of probability’, and ‘interpreting probability’ is the task of providing such analyses. Or perhaps better still, ….

– http://plato.stanford.edu/entries/probability-interpret/


The word probability has been used in a variety of ways since it was first coined in relation to games of chance. Does probability measure the real, physical tendency of something to occur, or is it just a measure of how strongly one believes it will occur? …

– http://en.wikipedia.org/wiki/Probability_interpretations


The single term probability can be used in several distinct senses. These fall into two main groups. A probability can be a limiting ratio in a sequence of repeatable events. Thus …

– From the book Interpreting Probability: Controversies and Developments in the Early Twentieth Century, by David Howie; Cambridge University Press


This section considers two important interpretations of probability. …. And very intelligent people still disagree. So don’t expect this to be resolved by the present discussion.

– J. Pitman in his book Probability published by Springer.
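
One of the senses mentioned in the excerpts above, probability as a limiting ratio in a sequence of repeatable events, can be illustrated with a small simulation. The following is only a minimal sketch and not part of any of the quoted sources: the function name running_relative_frequency, the fixed seed, and the choice of a simulated coin with p = 0.5 are all assumptions made for illustration. It shows how the observed proportion of successes tends to settle near p as the number of trials grows.

```python
import random


def running_relative_frequency(p, n_trials, seed=0):
    """Simulate n_trials independent trials with success probability p.

    Returns a list whose k-th entry is the proportion of successes
    observed in the first k trials. The name, the seed, and the use of
    a pseudo-random generator are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    frequencies = []
    for trial in range(1, n_trials + 1):
        # rng.random() is uniform on [0, 1), so this is a success with probability p
        if rng.random() < p:
            successes += 1
        frequencies.append(successes / trial)
    return frequencies


if __name__ == "__main__":
    freqs = running_relative_frequency(p=0.5, n_trials=100_000)
    # Print the relative frequency at a few checkpoints
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, round(freqs[n - 1], 4))
```

A simulation like this illustrates only the frequency reading. It says nothing about the other reading raised in the excerpts, in which probability measures how strongly one believes something will occur.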



‘Statistical Engineering’ in use

The term and the concept of Statistical Engineering are not new. However, they have not received the attention they deserve.

For example, Roger W. Hoerl and Ron Snee, authors of the book Leading Six Sigma: A Step-by-Step Guide Based on Experience with GE and Other Six Sigma Companies, describe Statistical Engineering as follows:

“The statistical engineering discipline [is] the study of how to utilize the principles and techniques of statistical science for benefit of humankind. From an operational perspective we define statistical engineering as the study of how to best utilize statistical concepts, methods, and tools and integrate them with information technology and other relevant sciences to generate improved results. In other words, engineers—statistical or otherwise—do not focus on advancement of the fundamental laws of science but rather how they might be best utilized for practical benefit.”

  • The site http://statisticalengineering.net uses this term with the same meaning. However, it belongs to a solution provider company, and knowledge creation is not its purpose.
  • The site http://www.nist.gov/itl/sed/ of the Statistical Engineering Division of NIST also uses the term with the same meaning; however, its focus is on a specific domain.
  • There is also an “Institute of Statistical Engineering” that again focuses on a specific domain and objective.

I propose that “Statistical Engineering” should be developed as a discipline with the wider objective of serving every domain (including Statistics!).

Software Developer & Statistician

A software developer and a statistician play similar roles in society.

  • Both of them are useful to almost every business domain
  • Both of them solve problems for clients in other domains
  • Both of them need to understand the problem domain well in order to develop a good solution
  • Both of them can specialize their services by focusing on specific domains
  • Every problem is essentially a new project for both of them
  • Both of them can benefit each other by collaborating

Statistical Engineering

I have always felt that statisticians apply Statistics more like an art. However, I strongly believe that problem solving using statistics should be practiced with an engineering approach.

Engineering is all about developing a solution for a business problem. The developed solution is the engineered product.

If we take an engineering approach to ‘applications of statistics’, the resultant discipline may be termed “Statistical Engineering”.

Statistical Engineering:

  • Forces you to think about the business problem being solved
  • Guides you to follow a disciplined process for developing a ‘Statistical Solution’
  • Forces you to think about the ‘Statistical Solution’ as an engineered product, and hence to insist on the quality of the solution itself
  • Helps you to think about the clients from the business domain as customers of the developed product
  • Forces you to pay attention to customer satisfaction and business relevance

The discipline of Statistical Engineering needs to be developed by formalizing the accumulated experience of the thousands of Statistics professionals whose work is solving problems using statistics.