Man and machine learning: Data projects and the opportunities for developers

Published: June 13th, 2019

As businesses increasingly move their operations to the cloud, they’re recognizing the potential to harness the almost limitless compute power available and tap into artificial intelligence and machine learning technologies to deliver insights and value to the business that were previously beyond their reach.

Businesses have never been in a better position to create value from the vast amounts of data they hold. Developers with the skills and knowledge to unlock this value are therefore in a prime position. But how should businesses approach such projects? Here are four tips for development teams and data scientists who want to help firms bring this value to market.

1. Agree on the use case
It’s imperative to be clear on the objectives for any AI project upfront. Use cases for AI fall into three main areas. Firstly, customer engagement projects aim to improve engagement and serve personalized recommendations to customers. Secondly, business analysis projects optimize processes and support decision-making. Thirdly, operational AI projects use AI to digitize entire processes, delivering increased efficiency, reduced costs, and other savings.

Being clear about the scope of the project and how success will be measured is paramount. Targets could include reducing processing failures, shortening the time a specific process takes, or increasing revenue by a certain percentage.
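To make targets like these measurable, it helps to record a baseline before the project starts and compare later results against it. The Python sketch below is illustrative only: the `SuccessTarget` class, its fields and the example figures are hypothetical and not taken from any specific project.

```python
from dataclasses import dataclass


@dataclass
class SuccessTarget:
    """A hypothetical success criterion agreed before a project starts."""

    name: str
    baseline: float       # value measured before the project
    target_change: float  # desired relative change: -0.20 = cut by 20%, 0.05 = grow by 5%

    def __post_init__(self) -> None:
        if self.baseline == 0:
            raise ValueError("baseline must be non-zero to compute a relative change")

    def relative_change(self, observed: float) -> float:
        """Relative change of an observed value versus the baseline."""
        return (observed - self.baseline) / self.baseline

    def is_met(self, observed: float) -> bool:
        """True if the observed value reaches the agreed target."""
        change = self.relative_change(observed)
        if self.target_change < 0:
            # Reduction targets: the change must be at least as negative as the target.
            return change <= self.target_change
        # Growth targets: the change must be at least as large as the target.
        return change >= self.target_change


failures = SuccessTarget("processing failures per week", baseline=250, target_change=-0.20)
revenue = SuccessTarget("monthly revenue", baseline=1_000_000, target_change=0.05)
print(failures.is_met(190), failures.is_met(210))  # True False
print(revenue.is_met(1_060_000))                   # True
```

Here a 20% reduction target from a baseline of 250 weekly failures is met by 190 failures (a 24% reduction) but not by 210 (16%). Measuring the same way at every stage of a pilot makes it easier to decide whether to expand or to stop.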

I’d recommend starting small, perhaps with one team in one geography. Proving the use case works in a particular scenario allows initial success to be demonstrated quickly. The scope and scale of the project can then be gradually expanded, with the business value measured at every stage. This approach also allows for ‘fast failure’: if something isn’t working, resources can be redirected and the team can start again.

2. Get your Agile game on
If data projects are to succeed once use cases are established, the right teams must be assembled. In my experience, Agile Scrum teams are the most effective. Take a nine-person team as an example. The breakdown of core disciplines should be as follows:

Firstly, a business analyst (BA) must take charge of establishing the use case the project will deliver, understanding the ideation around it and feeding this back to the rest of the team. Through this process, clear objectives can be set, particularly relating to key results for the client, but also what is achievable with sprints on the development side.

Next, and perhaps most important, are the data scientists. In a nine-person team, four would be the optimum number, and this is by no means an overrepresentation. As with any data project, 70% to 80% of the work involves cleaning and arranging the data so that it can be used to deliver the use case agreed at the start. Furthermore, unlike regular software products that are built once and then deployed, data projects demand continuous deployment due to the dynamic nature of data.
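Much of that cleaning work is mundane: normalizing identifiers, parsing numbers and dates, dropping records that cannot be repaired, and removing duplicates. The sketch below shows those steps on a handful of made-up records using only the Python standard library. The record layout and the accepted date formats are assumptions for illustration; a real project would typically use a dataframe library and many more rules.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats seen in the raw data


class Record(NamedTuple):
    customer_id: str
    amount: float
    day: date


def parse_date(text: str) -> Optional[date]:
    """Try each known format; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_record(raw: dict) -> Optional[Record]:
    """Normalize one raw record, or return None if it cannot be repaired."""
    customer_id = raw.get("customer_id", "").strip().upper()
    try:
        amount = round(float(raw.get("amount", "")), 2)
    except ValueError:
        return None  # missing or non-numeric amount
    day = parse_date(raw.get("date", ""))
    if not customer_id or day is None:
        return None
    return Record(customer_id, amount, day)


def clean(records: list) -> list:
    """Clean every record, drop unrepairable ones and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for raw in records:
        row = clean_record(raw)
        if row is not None and row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


raw_records = [
    {"customer_id": " c001 ", "amount": "120.50", "date": "2019-06-01"},
    {"customer_id": "C001", "amount": "120.5", "date": "2019-06-01"},  # duplicate once normalized
    {"customer_id": "C002", "amount": "", "date": "2019-06-02"},       # missing amount
    {"customer_id": "C003", "amount": "abc", "date": "2019-06-03"},    # non-numeric amount
    {"customer_id": "C004", "amount": "75", "date": "06/04/2019"},     # alternative date format
]
for row in clean(raw_records):
    print(row)
```

Only two of the five raw records survive: the duplicate collapses into the first C001 row, the two unusable amounts are dropped, and the C004 date is repaired by the second format.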

Two machine learning engineers are responsible for building the data pipeline. Lastly, two QA members with specific knowledge of the use case agreed at the start complete the team.

3. Use the right data
One of the major concerns of any data project is data sensitivity. AI and machine learning algorithms need significant amounts of data to produce good results; generally, the more data, the better the results. But there are, of course, limits on the types of data that can legitimately be used.

Regulations and privacy concerns are the biggest issues to contend with. Where a data set contains private information that could be valuable for machine learning, it’s essential to handle it in a way that complies with the applicable regulations. This could include anonymizing sensitive data before running the analysis.
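One common approach is to replace direct identifiers with keyed hashes before the data reaches the analysis environment. The sketch below uses Python's standard `hmac` and `hashlib` modules; the field names are hypothetical. Strictly speaking, this is pseudonymization rather than full anonymization: anyone holding the key can recompute the tokens and re-link records. The key should therefore be stored separately, and the legal treatment checked against the applicable regulations.

```python
import hashlib
import hmac
import secrets

# Hypothetical fields treated as direct identifiers in this example.
SENSITIVE_FIELDS = frozenset({"name", "email", "account_number"})


def pseudonymize(record: dict, key: bytes, sensitive_fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of `record` with sensitive values replaced by keyed-hash tokens.

    The same value always maps to the same token under the same key, so records
    can still be joined or counted, but the raw identifier is not exposed.
    """
    result = {}
    for field, value in record.items():
        if field in sensitive_fields and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            result[field] = digest[:16]  # truncated to 64 bits for readability
        else:
            result[field] = value
    return result


# In practice the key would come from a secrets manager, not be generated inline.
key = secrets.token_bytes(32)
customer = {"name": "Jane Doe", "email": "jane@example.com", "balance": 1520.75}
print(pseudonymize(customer, key))
```

A keyed hash is used instead of a plain SHA-256 of the value because common identifiers such as email addresses can otherwise be recovered by hashing candidate values and comparing.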

Given that data is constantly changing, and that data projects follow a process of continuous delivery, the best way to validate a use case is to start small. Once the scope of a data project is validated, it can be rolled out more widely, constantly expanding but always scaled to achieve the key objectives set at the start.

Scaling up can change the context of the data, as can dealing with different customers. It might be possible to build a very accurate model for one customer, but the same model may perform poorly for another. So the model must be adapted and rerun accordingly, then maintained once deployed. This is one of the key differences of data projects: 60% of the work follows deployment, largely due to maintenance requirements.
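Because a model can be accurate for one customer and poor for another, post-deployment monitoring is usually broken down by customer rather than reported as a single global figure. A minimal Python sketch, assuming labelled outcomes arrive as `(customer_id, predicted, actual)` tuples; the 0.8 accuracy threshold and 20-sample minimum are arbitrary placeholders.

```python
from collections import defaultdict


def per_customer_accuracy(results):
    """Map customer_id -> (accuracy, sample_count) from (customer_id, predicted, actual) tuples."""
    counts = defaultdict(lambda: [0, 0])  # customer_id -> [correct, total]
    for customer_id, predicted, actual in results:
        counts[customer_id][1] += 1
        if predicted == actual:
            counts[customer_id][0] += 1
    return {c: (correct / total, total) for c, (correct, total) in counts.items()}


def flag_underperforming(results, min_accuracy=0.8, min_samples=20):
    """Customers whose accuracy is below `min_accuracy`, ignoring those with too few samples."""
    stats = per_customer_accuracy(results)
    return sorted(
        c for c, (accuracy, n) in stats.items() if n >= min_samples and accuracy < min_accuracy
    )


# Made-up outcomes: the model does well for customer A and poorly for customer B;
# customer C has too few labelled outcomes to judge.
outcomes = (
    [("A", 1, 1)] * 27 + [("A", 1, 0)] * 3
    + [("B", 1, 1)] * 18 + [("B", 0, 1)] * 12
    + [("C", 1, 0)] * 5
)
print(flag_underperforming(outcomes))  # ['B']
```

Customers flagged this way are candidates for retraining or a customer-specific model. The sample minimum avoids reacting to noise from customers with very few labelled outcomes.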

This often causes timelines to stretch beyond initial targets. In a regular development project, you can predict with some degree of accuracy how long it will take to deliver the end product, because the software is well understood. With data projects, uncertainty should be expected: the more data that is gathered, the higher the risk that the overall context will change.

Transparency is key. Being open about the nature of data projects from the outset will help to maintain a good relationship with the customer. Bringing them into the process early and piloting the solution as outlined above will reduce the risk of surprises down the line. As long as there is a clear commitment to solving the problem you agreed to solve, friction can be avoided.

4. Take to the cloud
Data-minded developers are in an era that is entirely theirs to own. Open-source tools such as TensorFlow, and cloud platforms such as Microsoft Azure, Google Cloud, AWS and Alibaba Cloud, provide strong support for AI and machine learning projects. In my experience, developers working with DevOps tools and techniques are the most adept at creating value from data propositions, as they are the most familiar with open-source tools and automation, as well as the cloud platforms that marry the two.

These platforms offer major advantages when it comes to data projects. Training machine learning models requires massive infrastructure. The graphics processing units (GPUs) that enable deep learning, for example, can be very expensive to buy and operate, whereas cloud platforms can provide the same capabilities for a fraction of the price.
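Whether renting beats buying depends largely on utilization. The sketch below compares the yearly cost of owning a GPU with paying an hourly cloud rate. All figures are placeholders rather than real prices, and the model is deliberately simple: it spreads the purchase price evenly over an assumed lifetime and treats operating cost as a fixed yearly amount.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_ownership_cost(purchase_price, yearly_operating_cost, lifetime_years):
    """Purchase price spread evenly over the assumed lifetime, plus fixed running costs."""
    return purchase_price / lifetime_years + yearly_operating_cost


def break_even_hours(purchase_price, yearly_operating_cost, lifetime_years, cloud_price_per_hour):
    """GPU-hours per year above which owning becomes cheaper than renting."""
    own = yearly_ownership_cost(purchase_price, yearly_operating_cost, lifetime_years)
    return own / cloud_price_per_hour


def cheaper_option(hours_per_year, purchase_price, yearly_operating_cost,
                   lifetime_years, cloud_price_per_hour):
    """'cloud' if renting costs less for the expected usage, otherwise 'own'."""
    cloud = hours_per_year * cloud_price_per_hour
    own = yearly_ownership_cost(purchase_price, yearly_operating_cost, lifetime_years)
    return "cloud" if cloud < own else "own"


# Placeholder figures only (any currency), not real prices.
params = dict(purchase_price=10_000, yearly_operating_cost=2_000,
              lifetime_years=3, cloud_price_per_hour=3.0)
hours = break_even_hours(**params)
print(f"break-even: {hours:.0f} h/year ({hours / HOURS_PER_YEAR:.0%} utilization)")
print(cheaper_option(500, **params), cheaper_option(4_000, **params))
```

With these placeholder figures, owning only pays off above roughly 1,778 GPU-hours a year, about 20% utilization. Bursty training workloads that sit below that line tend to favor the cloud.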

So, the time is right for developers and data scientists with the knowledge and skills demanded by data projects to bring new value to businesses. The pressure on organizations to innovate at pace has never been greater, and data – when used effectively – can deliver this like never before.

Article Tags

AI, artificial intelligence, data, machine learning, ML

About Shuki Licht

Shuki Licht is Chief Innovation Officer at Finastra
