


Mohamed Ben Hamoudine  |  Read Time 5 mins

INTRODUCTION

As we discussed in our previous articles, Data Science is a relatively recent industry term and trend. Because the field is still in its infancy, you will likely have some difficulty defining the right team setup and project tools for your new “data science” projects. In this blog, we begin to discuss the best practices you can use. At Zifo, our data science experts are here to provide help and solutions for your Data Science projects.

Regarding your project structure, we recommend that you set up auto-documentation processes and hold a retrospective of the good and bad points at the end of the project. One of the main mistakes we see is focusing on technologies and tools rather than proactively evaluating and optimising algorithms and methods.

This focus on outcomes is one of the main differences between a company that leads in AI/ML and one that “just plays”. Creating a “perfect model” is a tempting aim, but chasing it will limit the impact and benefits. Instead, keep in mind how your information quality and capability deliver the most business value – in essence, we need a balance.

We see nine main steps when working on a Data Science project:

UNDERSTAND >> ACQUIRE >> CLEAN >> EXPLORE >> MODEL >> EVALUATE >> COMMUNICATE >> DEPLOY >> MONITOR


WHAT ARE THE MAIN IDEAS BEHIND THE “WORKFLOW”?

The first step, which is vital and must not be skipped for any project, is understanding the question being posed and its objective – by asking … you guessed it, lots of questions. Without a deep understanding of the domain, context, data landscape and expectations, we risk missing the point and failing to deliver.

Since we are working on a Data Science project, we will usually want to obtain as much data as possible to support our analysis.

Once the data has been gathered, the next challenge is to prepare it for our purpose. Not all data is equivalent – some may be high quality and FAIR in all aspects (Findable, Accessible, Interoperable, and Reusable), but life is rarely that simple. You will need support from data engineers and analysts to gather the data from various systems. Do not expect all the data to be available – you may need to change business processes to collect what is required.
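As an illustration, a minimal data-preparation sketch in Python using pandas might look like the following. The file name and column names here are purely hypothetical stand-ins for your own sources.

```python
import pandas as pd

# Load raw data exported from a hypothetical source system
raw = pd.read_csv("assay_results.csv")

# Basic cleaning: drop exact duplicates, standardise column names,
# and handle missing or malformed values explicitly rather than silently
clean = (
    raw.drop_duplicates()
       .rename(columns=str.lower)
)
clean["concentration"] = pd.to_numeric(clean["concentration"], errors="coerce")
clean = clean.dropna(subset=["concentration", "response"])

print(f"{len(raw) - len(clean)} rows removed during cleaning")
clean.to_csv("assay_results_clean.csv", index=False)
```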

Once the data is collected, how do we get an overview of it? Through exploration: building various visualisations and using simple statistics to gain a holistic perspective of the dataset. This global view of the data can also help us tune the parameters of the chosen model, as the sketch below suggests.
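As a rough sketch, assuming the cleaned file from the previous step and that pandas and matplotlib are available, an initial exploration could be as simple as:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical cleaned dataset from the preparation step
df = pd.read_csv("assay_results_clean.csv")

# Simple summary statistics give a first holistic view of the data
print(df.describe(include="all"))
print(df.isna().mean())  # fraction of missing values per column

# Quick visual checks: distributions of every numeric column
df.hist(figsize=(10, 6), bins=30)
plt.tight_layout()
plt.savefig("distributions.png")
```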

SOME THINGS TO REMEMBER

The construction of the model should be an incremental process: start with a simple model and add complexity only if the evaluation metrics justify it, because the more complexity you add, the more your model will “cost” you in time and CPU/GPU resources.
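A minimal sketch of this incremental approach, using scikit-learn on synthetic data as a stand-in for your own feature matrix and labels, compares a simple baseline against a more complex model before committing to the extra cost:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for your own features and labels
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Start simple, then only add complexity if the metrics justify the extra cost
models = {
    "logistic regression (baseline)": LogisticRegression(max_iter=1000),
    "gradient boosting (more complex)": GradientBoostingClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores.mean():.3f}")
```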

As the model evolves, communicate results in many ways: write reports of the different steps, produce plots that show metric results alongside a demonstration on a test dataset, and perhaps run workshops to educate and gather feedback from your business partners. This direct interaction allows the teams to understand what is being done and what is needed next time, and it develops deeper organisational knowledge and capacity.
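For example, assuming a scikit-learn model and a held-out test set, a short script can produce both a text report and a confusion-matrix plot ready to drop into a write-up or workshop slide (the data here is synthetic and purely illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A text report and a confusion-matrix plot for sharing with the business
print(classification_report(y_test, model.predict(X_test)))
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.savefig("confusion_matrix.png")
```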

Now that we have confirmed our model can deliver robust and reliable results, we might want to deploy it in the cloud or include it directly in an application running on any eligible device. Choosing where and how needs to be thought through and planned – ensuring the organisation is ready and that change management has been used to help smooth the deployment. Part of this is done a few steps back, by engaging the business with interactive sessions during model development.
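One common (though by no means the only) deployment pattern is to serialise the trained model and expose it behind a small web service. The sketch below assumes a model saved as "model.joblib" and uses Flask purely as an illustration – your own deployment target may look quite different.

```python
import joblib
from flask import Flask, jsonify, request

# Assume "model.joblib" holds a trained model saved in an earlier step
model = joblib.load("model.joblib")
app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload such as {"features": [[0.1, 0.2, ...]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=8000)
```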

Finally, now that we have delivered the development and deployment of this exciting solution, we must make it easy to update and incorporate future changes. This again needs planning and consideration. One benefit of data science models is that, as you collect more data and new insights, they can be incorporated into the model to enhance its efficiency, accuracy, and value.
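As a simple illustration of monitoring, the following sketch re-evaluates a deployed model on freshly labelled data and flags it for retraining when performance drifts beyond a tolerance. The baseline value and threshold are hypothetical and would come from your own evaluation at deployment time.

```python
import joblib
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.90      # illustrative value recorded at deployment time
DRIFT_TOLERANCE = 0.05   # illustrative acceptable drop before retraining

def check_and_retrain(model_path, X_recent, y_recent):
    """Re-evaluate the deployed model on freshly labelled data and
    flag it for retraining if performance has drifted too far."""
    model = joblib.load(model_path)
    current_auc = roc_auc_score(y_recent, model.predict_proba(X_recent)[:, 1])
    if BASELINE_AUC - current_auc > DRIFT_TOLERANCE:
        print(f"AUC dropped to {current_auc:.3f} - schedule retraining")
    else:
        print(f"AUC {current_auc:.3f} still within tolerance")
```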

To conclude, the backbone of data science in your company should be an agile and collaborative environment, combined with ongoing research into technical topics for the future challenges you might confront.

To find out more about how Zifo can help with AI, ML, Deep Learning, and scientific Data Sciences, please email us directly at info@zifornd.com