With Cloud AI and Edge AI, the architecture adapts to the purpose it serves. Those who design new intelligent services must know how to make good use of both technologies, both to improve the intrinsic performance of the service and to give innovative software solutions an ever better user experience.
When we talk about artificial intelligence, we are essentially talking about methodologies that, if consciously applied to company data (numbers, images, sounds, texts), can speed up processes, automate them, and directly impact corporate revenue, costs and risks.
Furthermore, the ever-increasing democratization of AI and machine learning tools also allows smaller companies to embark on a journey in this area. Today, more than ever, we can count on concrete hardware and software tools that integrate with traditional business platforms and move them towards higher performance levels, introducing predictive statistics as an everyday tool alongside standard descriptive statistics.
Predicting does not mean guessing. It means using methods that rationally and mathematically minimize the error between the estimated probability that an event occurs (or that a piece of data takes a certain value) and what is actually observed. If the error is small, it becomes possible to predict with good precision the next reading that a device will provide, the category to which an image or sound belongs, or the words I will probably write next in this article. It is the “magic” of AI.
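As a minimal sketch of what minimizing the error means in practice, the following Python fragment fits a straight line to a handful of device readings by ordinary least squares and then predicts the next reading. The readings and the names fit_line and mean_squared_error are illustrative assumptions, not data from a real device.

```python
def fit_line(xs, ys):
    """Return the slope and intercept that minimize the squared error
    between the line and the observations (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line's estimate and each observation."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical readings from a device at time steps 0..4
times = [0, 1, 2, 3, 4]
readings = [1.0, 3.1, 4.9, 7.2, 9.0]

slope, intercept = fit_line(times, readings)
fitted_error = mean_squared_error(times, readings, slope, intercept)

# An arbitrary "guess" (reading = time) for comparison
guess_error = mean_squared_error(times, readings, 1.0, 0.0)

# Prediction for the next, not yet observed, time step
next_reading = slope * 5 + intercept
print(round(next_reading, 2))  # about 11.07
```

The fitted line has a far smaller error than the arbitrary guess, which is the difference between predicting and guessing described above.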
To carry out these operations, the computer must perform many complex calculations, which require a considerable amount of hardware resources, especially if they are carried out on a laptop or desktop. When we talk about artificial intelligence, in practice we are talking about matrix calculations, that is, mathematical operations such as additions, multiplications, and transpositions of numerical matrices.
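To make the three operations concrete, here is a small sketch in plain Python, with no external libraries. The tiny "layer" at the end (inputs multiplied by weights, plus a bias) is an illustrative assumption about how these operations are typically combined; real frameworks hand the same arithmetic to optimized libraries and dedicated hardware.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Matrix product: each output cell is a row of a dotted with a column of b."""
    columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


# Hypothetical values: one sample with 2 features, mapped to 3 outputs
inputs = [[1, 2]]            # shape 1x2
weights = [[1, 0, 2],
           [3, 1, 0]]        # shape 2x3
bias = [[1, 1, 1]]           # shape 1x3

outputs = add(matmul(inputs, weights), bias)
print(outputs)  # [[8, 3, 3]]
```

Even this toy case needs six multiplications; with millions of weights and thousands of samples, the volume of arithmetic grows accordingly.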
A typical CPU, even one of the latest generation, will always have a calculation capacity limited both by its intrinsic computing power and by the fact that it must manage an assortment of processes at the same time (it must keep the entire machine running), so it cannot be dedicated entirely to these calculations. Nor is it practical to run many CPUs in parallel in a standard architecture. This becomes a limitation, and it is where the cloud and edge computing come in.
To help with this task, there are GPUs, i.e. Graphics Processing Units: processors, like those built by NVIDIA, designed to handle the complex processing typical of video games (which in the end always comes down to matrix calculations that determine the position of the points and faces of polygons in 3D space). By extension, if a GPU can do these calculations for 3D, it can also do them quickly for any other task that involves continuous processing of numerical data.
Furthermore, it should be noted that, according to NVIDIA's CEO, Jensen Huang, the computing power of graphics processors is growing faster than Moore’s law predicts. This law, formulated in 1965 by Gordon Moore, later a cofounder of Intel, is commonly summarized as saying that chips double their transistor count, and with it roughly their computing power, every two years. Huang, however, pointed out that Moore’s law no longer holds today, with CPU power growing by only a few percentage points per year, while GPU power grows considerably faster.
In fact, in five years, the power of GPUs has grown by more than 25 times, while Moore’s law would have predicted a growth of only about ten times. Furthermore, several GPUs can be run in parallel, thus guaranteeing the right computing power for our AI or 3D workloads. This preamble is necessary to understand that having adequate hardware involves huge investments and the need to keep the machines busy 24/7 to recover the costs. Not all companies have the means and the skills to acquire their own server farm for these purposes.
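These growth figures can be checked with a little compound-growth arithmetic. The sketch below uses only the doubling period and the 25-times figure mentioned in the text, plus the 18-month doubling period that is often quoted as a variant of Moore’s law (an assumption added here for comparison); the function names are illustrative.

```python
def growth_factor(years, doubling_period_years):
    """Total growth after `years` if capacity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


def annual_rate(total_factor, years):
    """Constant yearly growth rate that compounds to `total_factor` over `years`."""
    return total_factor ** (1 / years) - 1


two_year_doubling = growth_factor(5, 2.0)        # about 5.66 times in five years
eighteen_month_doubling = growth_factor(5, 1.5)  # about 10.08 times in five years
gpu_yearly_rate = annual_rate(25, 5)             # about 0.90, i.e. roughly 90% per year

print(round(two_year_doubling, 2), round(eighteen_month_doubling, 2), round(gpu_yearly_rate, 2))
```

A ten-fold increase in five years therefore corresponds to the 18-month doubling period rather than the two-year one, and a 25-fold increase means GPU power almost doubling every year.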
Furthermore, AI occupies machines at two very distinct moments: training, when we teach the machine to carry out a specific task, and inference, when the machine uses the resulting predictive model on new data. Training consumes the machine’s resources intensely, with calculations that can last days, and returns a statistical model that represents the mathematical description of the task we gave the machine to learn.
This model is then used for so-called inference, i.e. the actual prediction activity, which performs the same task as in training but on data the algorithm has never seen. Each prediction is less demanding on the machine than training, but predictions are carried out far more often and so still occupy the machine for a long time overall.
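The split between the two phases can be sketched with a deliberately simple model. The example below uses a nearest-centroid classifier, chosen here only because it fits in a few lines; the sensor readings, labels and function names are hypothetical.

```python
def train(samples, labels):
    """Training: scan the whole labelled dataset once and return a compact
    model, here one average point (centroid) per class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Inference: compare one unseen sample with the stored centroids
    and return the label of the closest one."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: [temperature, vibration] readings from a machine
samples = [[20.0, 0.1], [22.0, 0.2], [80.0, 0.9], [85.0, 1.1]]
labels = ["normal", "normal", "anomaly", "anomaly"]

model = train(samples, labels)           # done once, on all the data

new_readings = [[21.0, 0.15], [90.0, 1.0]]
predictions = [predict(model, r) for r in new_readings]  # repeated for every new reading
print(predictions)  # ['normal', 'anomaly']
```

Training touches every sample, while each prediction only touches the small model; with real models both phases are far heavier, but the asymmetry is the same.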
The Cloud comes to the aid of companies for these very reasons: it lightens multi-year investments in machines and optimizes the Total Cost of Ownership (TCO) of the hardware by transferring the processing load to remote machines. These machines can be activated or deactivated on demand, so they are paid for per use and not per device.
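The trade-off between owning machines and paying per use can be illustrated with a rough break-even calculation. All prices below are invented for the example and do not reflect any real provider or hardware vendor; the function names are likewise illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def owned_cost_per_year(purchase_price, lifetime_years, yearly_running_cost):
    """Yearly cost of an owned server: purchase price spread over its lifetime
    plus running costs. It is paid whether the machine is busy or idle."""
    return purchase_price / lifetime_years + yearly_running_cost


def cloud_cost_per_year(hourly_rate, hours_used):
    """Yearly cost of an on-demand machine: paid only for the hours it is on."""
    return hourly_rate * hours_used


# Hypothetical figures
owned = owned_cost_per_year(purchase_price=30000, lifetime_years=3, yearly_running_cost=5000)
hourly_rate = 3.0

break_even_hours = owned / hourly_rate            # 5000.0 hours per year
occasional_use = cloud_cost_per_year(hourly_rate, 1000)
constant_use = cloud_cost_per_year(hourly_rate, HOURS_PER_YEAR)

print(owned, break_even_hours, occasional_use, constant_use)
# 15000.0 5000.0 3000.0 26280.0
```

Under these assumed prices, the owned server only pays off if it is busy for more than 5000 of the 8760 hours in a year; below that, paying per use is cheaper.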
The usage costs incurred by the company also cover a series of services, some essential, such as cybersecurity, disaster recovery, and GDPR compliance, and others optional, such as ready-to-use, general-purpose machine learning models available on demand. Cloud computing is a great opportunity for both suppliers and customers. For Amazon, for example, AWS accounts for a minority of revenue but has in recent years generated more than half of the company's operating income.
Through services such as AWS, Microsoft Azure or Google Cloud, companies can turn their software products into real platforms (PaaS, Platform as a Service) for which their customers pay a recurring fee. The advantage is being able to perform even complex functions remotely and to scale the configuration of the machines according to one’s needs, even if these needs are not fully known in the preliminary design phase.
With a single click, you can add GPUs, increase the RAM, or add nodes with new machines. This drastically reduces the time needed to set up or scale up a service when it has to support a heavier load of requests. In addition, some providers also offer special accelerators in their cloud services, such as TPUs (Tensor Processing Units), processors explicitly designed to handle the multidimensional data arrays (tensors) typical of the more complex AI algorithms.
This allows you to perform complex calculations in the Cloud in a fraction of the time a local machine would need. Looking instead at the limits of the Cloud, moving data to and from remote servers still involves intense use of network bandwidth and continuous client/server communication with API (Application Programming Interface) microservices. This is because the inference activity is transferred to the remote machine.
Therefore, depending on the type of service we create, the server will be more or less loaded with operations. For example, continuous object recognition (e.g. of faces) needs a well-structured back end to support the demand for segmenting the frames that are streamed to it.
To optimize some of these activities, there is Edge AI, i.e. the possibility of using devices closer to the user to carry out the same inference operations on-site, without transferring information over the network. In short, Edge AI means equipping electronic devices with their own AI and, where appropriate, with connectivity to the network and to one another. What are the benefits of Edge AI?
The advantages are many; just think of how much this architectural approach can improve the effectiveness of a service in terms of latency, for example. Facial recognition or anomaly prediction can be done more effectively on the IoT device itself (camera, sensor, etc.), possibly enhanced with dedicated AI chips such as Intel Movidius and toolkits such as OpenVINO. Edge AI also decreases the privacy risk: if the data travels online, it could be intercepted or corrupted.
By processing the prediction locally, however, the data remains on the device and does not need to be transported elsewhere. Moreover, a single inference on a single device is a less demanding task than managing multiple inference threads in the Cloud for data coming from many devices. Finally, having less data occupying bandwidth makes it possible to build intelligent services on ordinary connection infrastructure, without incurring additional costs for dedicated connections.
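The bandwidth saving can be estimated with back-of-the-envelope arithmetic. The sketch below compares a camera that uploads every frame for cloud inference with one that runs inference locally and sends only a small result message. The frame size, frame rate and message size are assumptions chosen for illustration, as are the function names.

```python
def stream_bandwidth_mbps(frame_kilobytes, frames_per_second):
    """Megabits per second needed to upload every frame (1 kB = 1000 bytes)."""
    return frame_kilobytes * 8 * frames_per_second / 1000


def event_bandwidth_mbps(event_bytes, events_per_second):
    """Megabits per second needed to upload only the inference results."""
    return event_bytes * 8 * events_per_second / 1_000_000


# Hypothetical camera: 50 kB compressed frames at 15 frames per second
cloud_inference = stream_bandwidth_mbps(frame_kilobytes=50, frames_per_second=15)

# Hypothetical edge device: one 200-byte result message per second
edge_inference = event_bandwidth_mbps(event_bytes=200, events_per_second=1)

print(cloud_inference)            # 6.0 (Mbit/s per camera)
print(round(edge_inference, 4))   # 0.0016 (Mbit/s per camera)
```

With these assumed numbers, each camera needs a few thousand times less uplink when inference happens on the device, which is why ordinary connections can be enough.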
A scenario of vast potential opens up, mainly because these intelligent devices allow almost real-time AI model processing. With this technology, it becomes possible to imagine free-flow checkout services without queues, because the camera recognizes the food on the tray; automatic detection of safety equipment (such as masks for COVID); intrusion detection; the monitoring and forecasting of energy consumption; or the predictive maintenance of machinery.
Transferring the power of artificial intelligence to the last mile can, in specific use cases, act as a power multiplier, dividing the computational effort and improving the effectiveness and efficiency of the designed service.
We conclude by saying that, even if there is no single right choice, with Cloud AI and Edge AI the architecture adapts to the purpose it serves, and whoever plans new intelligent services will have to know how to make good use of both technologies, both to improve the intrinsic performance of the service and to give innovative software solutions an ever better user experience.