In late 2022, ChatGPT had its “iPhone moment,” going viral within days of its release and quickly becoming the poster child of the Gen AI movement. For LLMs’ next wave, many technologists are eyeing a big opportunity: going small and hyper-local. 

The core factors driving this next big shift are familiar ones: a better customer experience tied to our expectation of immediate gratification, and more privacy and security baked into user queries. Both come from keeping queries within smaller, local networks – the devices we hold in our hands, or those in our cars and homes – without the round trip to cloud server farms and back, and without the inevitable lag that round trip adds. 

While there are doubts about how quickly local LLMs could catch up with GPT-4’s capabilities – reportedly 1.8 trillion parameters across 120 layers running on a cluster of 128 GPUs – some of the world’s best-known tech innovators are working on bringing AI “to the edge” to enable new services such as faster, more intelligent voice assistants; localized computer imaging that rapidly produces image and video effects; and other types of consumer apps. 

For example, Meta and Qualcomm announced in July that they have teamed up to run big AI models on smartphones. The goal is to enable Meta’s new large language model, Llama 2, to run on Qualcomm chips in phones and PCs starting in 2024. That promises new LLMs that can avoid cloud data centers and their massive data crunching and computing power – which is both costly and a growing sustainability eyesore for big tech companies, one of the budding AI industry’s “dirty little secrets” in the wake of climate-change concerns and the other natural resources required, such as water for cooling. 

The challenges of Gen AI running on the edge

Like the path we’ve seen for years with many types of consumer technology devices, we’ll almost certainly see more powerful processors and memory chips with smaller footprints, driven by innovators such as Qualcomm. The hardware will keep evolving, following Moore’s Law. On the software side, meanwhile, a great deal of research and development has gone into miniaturizing neural networks so they fit on smaller devices such as smartphones, tablets and computers. 

Neural networks are big and heavy. They consume huge amounts of memory and need a lot of processing power to execute, because they consist of many equations involving multiplications of matrices and vectors – similar in some ways to how the human brain is wired to think, imagine, dream, and create. 
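To make that concrete, a single dense layer is little more than one matrix-vector multiplication plus a nonlinearity – and even one layer of modest size carries millions of parameters. A minimal sketch in NumPy, with hypothetical layer sizes chosen for illustration:

```python
import numpy as np

# A single dense layer y = relu(W @ x + b), the basic building block
# behind the "equations involving matrices and vectors" described above.
# The sizes here are hypothetical, purely for illustration.
in_features, out_features = 4096, 4096
W = np.zeros((out_features, in_features), dtype=np.float32)
b = np.zeros(out_features, dtype=np.float32)

def dense_relu(x: np.ndarray) -> np.ndarray:
    # One matrix-vector product, a bias add, and a nonlinearity.
    return np.maximum(W @ x + b, 0.0)

# Even this one layer holds ~16.8 million float32 parameters (~67 MB),
# and a full LLM stacks many such layers.
params = W.size + b.size
print(params, params * 4 / 1e6)  # parameter count, size in MB
```

Multiply that footprint by the dozens of layers in a real model and the memory pressure on a phone-sized device becomes obvious.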

Two approaches are broadly used to reduce the memory and processing power required to deploy neural networks on edge devices: quantization and vectorization. 

Quantization converts floating-point arithmetic into fixed-point arithmetic, which more or less means simplifying the calculations involved: where floating-point performs calculations with decimal numbers, fixed-point performs them with integers. This lets neural networks take up less memory, since floating-point numbers occupy four bytes while fixed-point numbers generally occupy two or even one byte. 
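As a rough sketch of the idea – using symmetric 8-bit quantization, one common scheme among several – 32-bit floating-point weights can be mapped onto one-byte integers with a single scale factor:

```python
import numpy as np

# A hypothetical weight matrix in 32-bit floating point (4 bytes per value).
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((4, 4)).astype(np.float32)

# Symmetric 8-bit quantization: map the float range onto signed integers
# in [-127, 127] using one scale factor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# The int8 copy uses one byte per value instead of four.
print(weights_fp32.nbytes, weights_int8.nbytes)  # 64 vs 16

# At inference time the integers are either used directly with
# integer-only arithmetic, or de-quantized with the stored scale.
weights_restored = weights_int8.astype(np.float32) * scale
```

Production toolchains handle the details differently (per-channel scales, zero points, calibration), but the memory arithmetic is the same: one byte per weight instead of four.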

Vectorization, in turn, uses special processor instructions to execute one operation over several pieces of data at once (Single Instruction, Multiple Data – SIMD – instructions). This speeds up the mathematical operations performed by neural networks, because additions and multiplications can be carried out on several pairs of numbers at the same time.
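The effect is easy to see even from Python: NumPy’s array operations dispatch to BLAS routines that use SIMD instructions under the hood. A sketch comparing an element-at-a-time loop with a vectorized matrix-vector product:

```python
import numpy as np

# A toy layer: weight matrix W and input vector x (sizes are illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

# Scalar version: one multiply-add at a time, as a processor would
# execute it without SIMD.
y_scalar = np.zeros(256, dtype=np.float32)
for i in range(256):
    for j in range(256):
        y_scalar[i] += W[i, j] * x[j]

# Vectorized version: NumPy hands the whole product to optimized
# routines that process several pairs of numbers per instruction.
y_simd = W @ x

print(np.allclose(y_scalar, y_simd, atol=1e-3))  # True
```

Same arithmetic, same result – but the vectorized path is orders of magnitude faster, which is exactly the headroom an edge device needs.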

Other approaches gaining ground for running neural networks on edge devices include the use of Tensor Processing Units (TPUs) and Digital Signal Processors (DSPs), which are processors specialized in matrix operations and signal processing, respectively; and the use of pruning and low-rank factorization techniques, which analyze and remove the parts of the network that make no relevant difference to the result.
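Both pruning and low-rank factorization can be sketched in a few lines of NumPy. Here, magnitude pruning drops the smallest weights, and a truncated SVD approximates the matrix with two thin factors – the matrix sizes, sparsity level, and rank are illustrative assumptions, not values from any particular model:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((128, 128)).astype(np.float32)

# Magnitude pruning: zero out the smallest 80% of weights, on the
# assumption they contribute least; sparse storage then skips the zeros.
threshold = np.quantile(np.abs(W), 0.8)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Low-rank factorization: approximate W by two thin matrices A @ B via
# a truncated SVD. A rank-16 factorization stores 2 * 128 * 16 values
# instead of 128 * 128.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 16
A = U[:, :rank] * S[:rank]   # shape (128, 16)
B = Vt[:rank, :]             # shape (16, 128)
W_lowrank = A @ B

print(np.count_nonzero(W_pruned) / W.size)  # roughly 0.2
print(A.size + B.size, W.size)              # far fewer stored values
```

In practice these compressions are applied to trained weights and followed by fine-tuning to recover accuracy; the sketch only shows where the memory savings come from.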

Techniques like these to shrink and accelerate neural networks could make Gen AI on edge devices a reality in the near future.

The killer applications that could be unleashed soon 

Smarter automations

By combining Gen AI running locally – on devices or within networks in the home, office or car – with various IoT sensors connected to them, it will be possible to perform data fusion at the edge. For example, smart sensors paired with devices could listen to and understand what’s happening in your environment, creating an awareness of context and enabling intelligent actions to happen on their own – such as automatically turning down music playing in the background during incoming calls, turning on the AC or heat when it becomes too hot or cold, and other automations that occur without a user programming them. 

Public safety 

From a public-safety perspective, there’s a lot of potential to improve what we have today by connecting an increasing number of sensors in our cars to sensors in the streets so they can intelligently communicate and interact with us on local networks connected to our devices. 

For example, for an ambulance trying to reach a hospital with a patient who needs urgent care to survive, a connected intelligent network of devices and sensors could automate traffic lights and in-car alerts to make room for the ambulance to arrive on time. This type of connected, smart system could be tapped to “see” and alert people if they are too close together in the case of a pandemic such as COVID-19, or to understand suspicious activity caught on networked cameras and alert the police. 


Extending the Apple Watch model to LLMs that could monitor and provide initial advice on health issues, smart sensors with Gen AI on the edge could make it easier to identify potential problems – from unusual heart rates and elevated temperatures to sudden falls with limited or no movement afterward. Paired with video surveillance for those who are elderly or sick at home, Gen AI on the edge could send urgent alerts to family members and physicians, or provide healthcare reminders to patients. 

Live events + smart navigation

IoT networks paired with Gen AI at the edge have great potential to improve the experience at live events such as concerts and sports in big venues and stadiums. For those without floor seats, the combination could let them tap into a networked camera and watch the live event from a particular angle and location, or even re-watch a moment or play instantly, much as a TiVo-like recording device paired with your TV allows today. 

That same networked intelligence in the palm of your hand could help navigate large venues – from stadiums to retail malls – to help visitors find where a specific service or product is available within that location simply by asking for it. 

While these new innovations are at least a few years out, there’s a sea change ahead of us: valuable new services that can be rolled out once the technical challenges of shrinking LLMs down for local devices and networks have been addressed. Given the added speed, the boost in customer experience, and the reduced privacy and security concerns of keeping it all local versus in the cloud, there’s a lot to love.