Thursday, January 23, 2025

Sourcing quality data is an “obstacle” for many technologists to create responsible AI

  • Survey shows that the data used for AI models should be sourced responsibly, contain less bias, and remain highly accurate.
  • Business leaders and technologists report a significant gap in ideal vs reality in achieving data accuracy.
  • Covid-19 accelerated the AI strategy of enterprises and it is expected to continue to accelerate.

Data sourcing continues to be a major bottleneck for teams building artificial intelligence applications, and many businesses face the challenge of trying to build great AI with poor datasets, an industry expert said.

“Many factors are in play like a lack of sufficient data for a specific use case, new machine learning techniques that require greater volumes of data, or teams don’t have the right processes in place to easily and efficiently get the data they need,” said Sujatha Sagiraju, Chief Product Officer at Appen, a leader in data for the AI lifecycle.

According to the “State of AI” report conducted by the company, 93 per cent of respondents believe that responsible AI is the foundation of all AI projects, while 42 per cent of technologists say the data sourcing stage of the AI lifecycle is very challenging.

However, business leaders were less likely to report data sourcing as very challenging (24 per cent).

As in 2021, Covid-19 played an important part in the industry’s continued growth.

Data diversity is important

Companies are no longer playing catch-up but are now planning their next AI advances, albeit carefully, to ensure they do not fall behind again.

According to the report’s findings, 51 per cent of participants agree that data accuracy is critical to their AI use case.

To successfully build AI models, Sagiraju said that organisations need accurate and high-quality data.

“Unfortunately, business leaders and technologists report a significant gap in ideal vs reality in achieving data accuracy,” she said.

Appen’s research also found that companies are shifting their focus to responsible AI and maturing in their use of AI.

“More business leaders and technologists are focusing on improving the data quality behind AI projects to promote more inclusive datasets and, as a result, unbiased and better AI,” she said.

The report also showed that 80 per cent of respondents consider data diversity extremely important or very important, and 95 per cent agree that synthetic data will be a key player when it comes to creating inclusive datasets.

Solving the challenges

Mingkuan Liu, Vice-President of Data Science at Appen, said that the gap between data scientists and business leaders is slowly narrowing year over year when it comes to understanding the challenges of AI.

“The emphasis on how important data, especially high-quality data that match with application scenarios, is to the success of an AI model has brought teams together to solve these challenges,” he said.

For AI solutions to properly function, massive volumes of quality data are required to train the underlying neural networks.

A great example is multilingual natural language processing (NLP), which relies on millions of human speech inputs for each language, prepared and delivered in formats that ML models can ingest.

While about four out of five survey respondents (81 per cent) said they have the right amount of data to support an AI project, and 90 per cent said they have access to the tools they need to do their AI-related job, the majority are still struggling with low data quality.

“This typically results in underperforming systems. This becomes even a bigger challenge when integrating multimodality in NLP or connecting multiple individual NLP solutions that support several languages and content types,” the report said.

Using the right data at the beginning of the lifecycle drives greater results through later stages.

The average proportion of time spent managing and preparing data is trending down: it now stands at 47.4 per cent, compared with 53 per cent in 2021.

Data management is major hurdle

“With a large majority of respondents using external data providers, it can be inferred that by outsourcing data sourcing and preparation, data scientists are saving the time needed to properly manage, clean and label their data,” Sagiraju said.

The report showed that the greatest hurdle for AI initiatives is data management, with 41 per cent indicating it as the biggest bottleneck.

Right behind, 39 per cent of respondents reported a lack of qualified talent, with data scientists, technologists, data architects and engineers all scarce. Another 31 per cent indicated a lack of budget for adequate headcount, adding to the challenge of properly staffing data management teams.

“The shortage of qualified data scientists and technologists emphasises the importance of ensuring critical talent is focused on activities that require their valuable skills. To remedy this, companies look to external data providers to reduce their workload in areas such as data sourcing, freeing up scientists’ time for other AI initiatives,” it said.

Sagiraju said that the majority of AI efforts are spent managing data for the AI lifecycle, which means it is an incredible undertaking for AI leads to handle alone and is the area many are struggling with.

“Sourcing high-quality data is critical to the success of AI solutions, and we are seeing organizations emphasize the importance of data accuracy,” she said.
