Ethics of data scientists more important than data theft and privacy issues

Kunal Jain, CEO & Founder of Analytics Vidhya, talks about what it takes to be a data scientist in today’s day and age.

Sunil Rajguru

Kunal Jain, CEO & Founder of Analytics Vidhya, talks about what it takes to be a data scientist in today’s day and age and how data is key in an ever-changing world.

What are the major applications of data science in today's day and age?

Data Science is already impacting our daily lives in more ways than we think. The evolution of Google Search is essentially an application of data science, and so is your smartphone improving low-light pictures and your photography. I would go so far as to say that it is very difficult to imagine a world without applications of data science now. Think of the number of posts on LinkedIn or Facebook, or the number of articles being published on the web: we are able to find the relevant ones only because of data science and machine learning. We would be lost in today’s world without these technologies assisting us. From doctors getting help from machines to Artificial Intelligence being used in Defence, the impact of data science is far deeper than we realise.

What exactly is a data scientist, and from which fields can one come to become one? What are the major skills that a data scientist needs to have?

In very simple terms, a data scientist is a person who can apply scientific tools and methods to extract the insights and value hidden in data. You might point out that this has been happening for a long time, and that is true. But with the arrival of Big Data and the amount of data growing exponentially, we need specialists who can handle these huge volumes of data quickly for the benefit of the business. This is what data scientists are expected to do.

Given this context, people from Computer Science, Mathematics, Engineering, Economics and Computational Applications are the most common choices for data scientist roles. But the field is not limited to them. Some of the best data scientists I know have come from very diverse fields like Astronomy and Physics.

Coming to the skills required, you need three kinds of skills to be a good data scientist:

  • First of all, you need good technical skills. You need to understand the maths behind these techniques and be able to code them. So you need to have coding skills and mathematical skills.
  • Next, you need to understand the business and subject domain you are in. What are the processes for collecting the data? What are the rules and regulations? Unless you know these details, you cannot truly solve business problems as a data scientist.
  • Finally, you need soft skills to interact with different stakeholders in an organization, influence them and communicate with them to implement your data science projects.

How are Artificial Intelligence and Machine Learning techniques helping in analyzing data?

As the amount of data being generated is increasing exponentially, it is no longer feasible to analyse it the way we did a decade ago. You need machines that can pick up the latest trends and learn from this data automatically. This is where AI and ML techniques come into play, and they have been of great help.

There are tonnes of data that are either unstructured or unread. What are the latest techniques to bring that data to the surface and analyze it?

First and foremost, I would like to clarify the concepts of unstructured and unread data. Unstructured data is actually a relative term. For example, if a bank stores all its customer information in a table in a database, a letter from a customer including images and videos would be unstructured data. For a social media platform, on the other hand, this would be structured data. Similarly, an unread mail in your inbox is a data point that tells your service provider about your interest in that area.

Now, in the last few years, there has been tremendous development in techniques to analyse images, video, voice and free-flowing text. All of these formats were traditionally called unstructured data. Today you have Computer Vision and Natural Language Processing techniques to make use of this data. There is a lot of active research and development in this field currently.

One report said that there will be as many as 500 billion IoT devices by 2030. Are we ready to handle this huge data explosion?

I think this is an ongoing evolution. Over the last decade, I don’t think we have ever been ready for the data explosion that is happening. But we have improved significantly from where we started, and I think the coming decade will largely be the same. We will never be prepared for this scale and velocity of data, but we will figure out ways to extract more and more information from it. New algorithms and better hardware and software will be developed to handle this need. This is actually one of the reasons why this domain is so exciting.

Data privacy and data theft are two major areas of concern. How serious are they, and what are the major steps we are taking to address them?

Both data privacy and data theft are top concerns for any data science professional. Building machine learning and data science algorithms is a lot like raising a child. You obviously need to ensure that the child is safe, but you also need to ensure that the child grows up with the right values and becomes a good citizen. The same applies to data science. While theft and privacy are definitely important, making sure that data science professionals use data ethically is much more important.

The first step in addressing these challenges is to make sure that data science professionals, as well as end users, are aware of the implications of the data, and that these implications are called out explicitly. Some proactive and responsible organisations have started building and adopting these practices. They have built teams of reviewers who are responsible for making sure that data is collected and used in the right manner.

How will advances in hyper-scale quantum computing help in the way we collect, organize and analyze data in the future?

I think quantum computing is still at a very nascent stage, and to be very honest, I would not claim that I understand the implications of a hyper-scale quantum computing system. It is like talking about the potential of today’s supercomputers 20 years ago. Having said that, I think quantum computing would be transformational in the way we store and analyse data. For example, the way we encrypt data today may not be secure with the advent of quantum computing. Similarly, new techniques based on quantum computing will come to the fore to extract different types of insights from data. Combined with today’s tools and techniques, these will enable us to extract more information from data in shorter timeframes.
