Hurray! We made it to the third lesson on Data Science. At this point, I’ll ask you to bear with me a little longer. I promise the next post will dwell more on the coding part of Machine Learning.
What is the final deliverable and what format should it have?
Before jumping into analytics and working with data, you have to ask the first and most important question. When the entire data analysis is done, you will have some answers, and you will need to present them to the stakeholders. How your solution is presented is what is called the Final Deliverable. It could be a report, a PowerPoint presentation, or a snippet of code. It could take any form, but the basic question to settle before analyzing is: what format will the final deliverable be in? Imagine you skipped this, embarked on the analytics and later discovered that your solution is not properly formatted. That means going back and starting all over again, which is why it’s really important to know the format before you start analyzing your data.
Can the deliverable be in any format?
As a budding data scientist, you have to make your storytelling compelling to the people you are presenting to. The quality of the deliverable depends on how finely refined your questions are; if the questions are vague, the solution or result you find will suffer. So yes, it can be in any format, as long as it is spectacular, fascinating and solution-oriented. Personally, I would also suggest that your audience should play a part in how your presentation looks.
What is the second step of getting the final deliverable?
Before we begin to analyze data, we need to look into the resources available: data sets, surveys, questionnaires and existing reports. Based on a quick scan of these resources, we can give a good summary of what we can come up with, which is essential before analyzing. The second step is to know what answers are possible given the resources available.
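To make the "quick scan" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records, the field names (`age`, `department`, `education`, `income`), the helper functions and the 70% threshold are all made up. The idea is simply to measure how complete each field is, then check which questions the available data could support.

```python
# Hypothetical records standing in for a survey extract.
# None marks a missing answer. None of these values are real data.
records = [
    {"age": 34, "department": "sales", "education": "BSc"},
    {"age": 41, "department": None, "education": "MSc"},
    {"age": None, "department": "finance", "education": "BSc"},
    {"age": 29, "department": "sales", "education": None},
]


def completeness(rows):
    """Return the share of non-missing values for every field seen in rows."""
    fields = sorted({key for row in rows for key in row})
    return {
        field: sum(row.get(field) is not None for row in rows) / len(rows)
        for field in fields
    }


def answerable(required_fields, coverage, threshold=0.7):
    """Treat a question as answerable if every field it needs meets the threshold.

    The threshold is an arbitrary choice for this sketch, not a standard value.
    """
    return all(coverage.get(field, 0.0) >= threshold for field in required_fields)


coverage = completeness(records)

# Hypothetical questions mapped to the fields each one would need.
questions = {
    "trend by department": ["department"],
    "trend by income": ["income"],  # this field was never collected
}

for question, fields in questions.items():
    status = "possible" if answerable(fields, coverage) else "not with this data"
    print(f"{question}: {status}")
```

A scan like this does not answer anything yet. It only tells you which answers are realistic to promise before the real analysis starts.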
Is the third step in the process the actual Analytical work?
Well, sad news: the answer is NO. Why not? As a data scientist, you have to be efficient and proficient, and you should not be busy re-inventing the wheel. Given the online resources available, it is very IMPORTANT that we do not write the same code somebody else has already written and made available. I personally like to call this ESME (Eliminating Stupid Mental Effort). So the next step, once you know the questions and the possible answers, is to decide on the appropriate tool, algorithm or method and check whether it is readily available for use. If it is not, check whether others have attempted what you want to do, so that you avoid repeating their work.
Google Scholar is great for finding papers, journals, publications and relevant reports from industry leaders. Blog aggregators, which are also popular among data scientists, let you search for algorithms, code or tools.
What is the last step in the process of getting the final deliverable?
Now we are ready: we know the answers to generate and the tools to use. The first step of your analytics should be very basic tabulations, to see what the distributions look like, how frequently values occur and what the trends are by department, age, education, etc. While you are developing these basic tabulations and cross-tabulations, also try to generate graphics such as scatter plots or bar charts. Why? Because different data sets with different distributions may give you the same average value or the same standard deviation. So start with simple processes and tools to develop a level of familiarity with the dataset. The more familiar you are with it, the quicker you will be to apply advanced algorithms and tools.
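Here is a small sketch of that last point, using only Python’s standard library. The two lists are made-up numbers chosen for illustration, and the helper names are my own. Both lists share the same mean and the same (population) standard deviation, yet a simple frequency table and a text bar chart show that they are distributed very differently.

```python
from collections import Counter
from statistics import mean, pstdev

# Made-up values purely for illustration; they are not real measurements.
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]


def summarize(values):
    """Return (mean, population standard deviation) of the values."""
    return mean(values), pstdev(values)


def frequency_table(values):
    """Return a dict mapping each distinct value to its count, sorted by value."""
    return dict(sorted(Counter(values).items()))


def text_bar_chart(values):
    """Draw one row of '#' marks per distinct value, as a stand-in for a bar chart."""
    return "\n".join(
        f"{value:>2} | {'#' * count}"
        for value, count in frequency_table(values).items()
    )


for name, data in [("spread_out", spread_out), ("two_clusters", two_clusters)]:
    average, deviation = summarize(data)
    print(f"{name}: mean={average}, std={deviation}")
    print(text_bar_chart(data))
    print()
```

If you only looked at the two summary numbers, you would conclude the data sets are alike. The tabulation and the chart are what reveal that one has values spread around the middle while the other has two separate clusters.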
What are the considerations to present the final deliverable?
After analyzing and tabulating, you now have your model. The next thing is to write a report, and after writing it you want to make a final check on whether you have told a good, compelling story. Below are the seven major questions to ask yourself:
- Have you told your readers at the very outset what they might gain by reading your report? i.e. what is the question, what is the answer, and how can they benefit from it?
- Have you made the aim, the purpose, of your work clear?
- Have you explained the significance of your contribution, i.e. the significance of your findings and their effectiveness?
- Have you set your work in the appropriate context?
- Have you addressed the question of practicality and usefulness? Your findings have to be useful, so explain how your solution solves the problem.
- Have you identified future developments?
- Have you structured your report in a clear and logical fashion?
An end-to-end scenario: going through the process, getting the final deliverable and presenting it
This brings us to the end of today’s lesson, but before we close, I’d like to walk through a case study in which we apply all the steps listed above.
My case study is Nigeria, since it’s a developing country with data limitations. Imagine you are hired to work on a study of the school drop-out rate and its likely causes.
The very first step is to think about the format in which we will present the results, and then to look at the question and refine it: is the study measuring drop-outs among girls only, boys only, primary school pupils, high school students or perhaps university students?
The second phase is to ask what answers can be generated from the available resources. Remember you’re in a developing country where authentic data may or may not be available.
Once you can predict the kinds of answers you can derive from the available resources, you move on to researching existing solutions to see how others have answered this question. Look at answers provided by other people within Nigeria or in other developing countries, and see how you can apply their methods to your study.
The fourth step is analytics. This is where you need to start with basic tabulations and graphics.
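As a rough sketch of what a basic cross-tabulation could look like for this case study, here is a standard-library Python example. The records below are invented for illustration and are not real figures from Nigeria or anywhere else. The tuple layout and the function names are also my own assumptions.

```python
from collections import Counter

# Invented records for illustration only: (gender, school_level, dropped_out).
# These are NOT real figures.
students = [
    ("girl", "primary", True),
    ("girl", "primary", False),
    ("girl", "secondary", True),
    ("girl", "secondary", True),
    ("boy", "primary", False),
    ("boy", "primary", False),
    ("boy", "secondary", True),
    ("boy", "secondary", False),
]


def cross_tabulate(rows):
    """Count all students and drop-outs for each (gender, school_level) pair."""
    totals, dropouts = Counter(), Counter()
    for gender, level, dropped_out in rows:
        totals[(gender, level)] += 1
        if dropped_out:
            dropouts[(gender, level)] += 1
    return totals, dropouts


def dropout_rates(rows):
    """Return the share of drop-outs within each (gender, school_level) group."""
    totals, dropouts = cross_tabulate(rows)
    return {group: dropouts[group] / totals[group] for group in sorted(totals)}


for (gender, level), rate in dropout_rates(students).items():
    print(f"{gender:<5} {level:<10} {rate:.0%}")
```

A table like this is only a starting point, but it quickly shows which groups deserve a closer look before you reach for anything more advanced.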
The final stage is to put your model together and present it in the chosen format. Afterward, you can ask yourself: What is my story here? What is the most compelling story? What are the most significant answers I found? What is the significance of my findings for the rate of school drop-outs in this country? How can I help my firm with this information? How can I help my country with this result? How does it help to reduce the rate of school drop-outs in my country? All these questions are powerful enough to make your analytical work efficient, effective, solution-oriented and accepted.
I hope this case study has been helpful enough to take you through the steps required of you as a Data Scientist. If anything is unclear, please feel free to shoot your questions via the comments section.
To get prepared for the next set of tutorials, I’d advise that we learn Python, because we are no longer talking theory but PRACTICAL!!! Till then…Ciao