Software engineering/computer science background
Conversion to data science
Recently joined the big scary world as an independent consultant/entrepreneur
"Data Scientist: The Sexiest Job of the 21st Century"
"I keep saying the sexy job in the next ten years will be statisticians. People think I'm joking, but who would've guessed that computer engineers would've been the sexy job of the 1990s?
The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades."
- Hal Varian, Chief Economist at Google, The McKinsey Quarterly
True, but sexy may not be the right word...
Someone who sits in the middle of this continuum
"We have all this data, what can we do with it?"
"I want my data thing to be better"
"Here's money and data, please generate more money and data"
Supervised: we have labelled data to train on
Classification: need to predict/infer the class/category of each instance
Classifying iris species using k-nearest-neighbours
Classifying email as spam/non-spam using support vector machines
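A minimal sketch of k-nearest-neighbours classification, to make "train on labelled data, then predict a class" concrete. The function name `knn_predict` and the toy measurements are assumptions for illustration; they are not the real iris dataset, and a library implementation would normally be used in practice.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training instances by Euclidean distance to the query.
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    # Count the labels of the k closest instances and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up (petal length, petal width) pairs, labelled by species.
train_points = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
train_labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

prediction_small = knn_predict(train_points, train_labels, (1.6, 0.25))
prediction_large = knn_predict(train_points, train_labels, (4.6, 1.45))
print(prediction_small, prediction_large)
```

The labels are what make this supervised: the prediction for a new instance is only ever one of the classes seen in training.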
Supervised: we have labelled data to train on
Regression: need to predict/infer a numeric quantity for each instance
Linear regression to predict diabetes progression
Predicting rent with decision trees
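A rough sketch of regression with a single feature, using ordinary least squares rather than the decision trees named above. The function `fit_line` and the area/rent figures are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: floor area (square metres) and monthly rent.
areas = [30.0, 45.0, 60.0, 75.0]
rents = [700.0, 1000.0, 1300.0, 1600.0]

slope, intercept = fit_line(areas, rents)
predicted_rent = slope * 50.0 + intercept  # numeric prediction for an unseen 50 m^2 flat
print(slope, intercept, predicted_rent)
```

Unlike classification, the output is a number on a continuous scale, so the model can return values it never saw in training.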
Unsupervised: no labelled training data
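Clustering: group together similar instances; assignments can be hard (each instance in exactly one cluster) or soft (each instance has a degree of membership in every cluster)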
Discovering topics in texts with latent Dirichlet allocation
Segmenting images with spectral clustering
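A small sketch of hard clustering using k-means, which is a simpler technique than the latent Dirichlet allocation and spectral clustering examples above. The helper names, the toy points and the choice of initial centres are all assumptions made for illustration.

```python
from math import dist

def nearest(point, centres):
    """Index of the centre closest to `point` (Euclidean distance)."""
    return min(range(len(centres)), key=lambda c: dist(point, centres[c]))

def kmeans(points, centres, iterations=10):
    """Alternate between assigning points to centres and moving centres to cluster means."""
    centres = list(centres)
    for _ in range(iterations):
        labels = [nearest(p, centres) for p in points]
        for c in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Final hard assignment against the settled centres.
    return [nearest(p, centres) for p in points], centres

# Made-up unlabelled points forming two visibly separate groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres = kmeans(points, centres=[points[0], points[3]])
print(labels, centres)
```

No labels go in; the cluster indices that come out are arbitrary names for the groups, not known categories. Real k-means implementations usually pick initial centres randomly or with a seeding heuristic rather than by hand as here.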
Dimensionality: the number of features describing each instance
Reduction: decreasing the number of features by selection (keeping a subset) or transformation (combining features into fewer new ones)
Decomposing faces with principal component analysis
Discovering movie themes with matrix factorisation
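A minimal sketch of reduction by transformation: finding the first principal component with power iteration and projecting two features down to one. The function name, the iteration count and the toy data are assumptions; a production version would use a linear algebra library and handle edge cases such as zero variance.

```python
from math import sqrt

def first_principal_component(rows, iterations=100):
    """Return (direction, scores): the top principal axis and each row's 1-D projection."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred features.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    # Assumes the data has non-zero variance and the start vector is not
    # orthogonal to the top component.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores

# Made-up data with two perfectly correlated features (second = 2 * first).
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print(direction, scores)
```

Because the two features here carry the same information, a single score per instance preserves all of the variation, which is the idea behind the face and movie examples at larger scale.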
As usual, the client is useless at generating actionable requirements, but some things are clear