Most of a data scientist's time goes into preparing data for analysis. Specialists tinker with programs that manipulate datasets so the data is processed correctly. These IT professionals feel like data cleaners rather than data specialists.
Computerworld selects interesting articles here from the international network of our publisher, IDG.
That's according to CrowdFlower, a crowdsourcing company, which surveyed 80 data scientists with varying levels of experience. While an advanced degree is usually required for the position, a full 60 percent of respondents said they spend most of their time cleaning and organizing data, leaving little for analytical tasks like building training sets and refining algorithms.
"You have your hardest-to-hire resource spending most of their time cleaning data," said Lukas Biewald, CrowdFlower's cofounder and CEO. "It's a humongous waste for organizations."
Cleaning and organizing data, as it turns out, is also data scientists' least favorite part of the job, according to more than half of CrowdFlower's respondents. That makes for an unhappy combination, but data scientists remain undaunted: More than 80 percent said they're happy at work.
CrowdFlower's findings also confirm that there's a shortage of data scientists in the business world. In last year's survey, 79 percent of respondents said there was a shortage; this year, that figure was up to 83 percent.
Want to land a data scientist job for yourself? The most in-demand skills, according to CrowdFlower, are SQL, Hadoop, Python, Java, R, Hive, MapReduce, NoSQL, Pig and SAS.
Coming up next is machine learning, which was singled out as especially important by more than half of the respondents CrowdFlower surveyed. "Over the last couple of years every CEO has been asking, 'what's our big data strategy?'" Biewald said. "They need to start asking about machine learning."