Radically more efficient deep learning to enable inference on low-power hardware.
We are working on a future in which a forest tells us there’s a fire. A warehouse directs us to the missing box. And a home lets us know if our elderly need help. It’s a future that is safer and more thoughtful thanks to intelligent sensors inside all the things around us.
Yesterday, you had to sit at your PC.
Today, you take your smartphone with you.
Tomorrow, intelligent sensors turn computing ambient.
Making computing ambient requires a large number of intelligent chips, so they have to be small, cheap, and low-power. Our technology works on $1 chips that consume less than 10 milliwatts. These chips are so efficient that they can be powered by a coin battery or a small solar cell.
Processing data on the device is inherently more reliable than relying on a cloud connection. Intelligence shouldn’t have to depend on weak Wi-Fi.
Sending data from the sensor to the cloud, processing it, and sending the result back again takes time. Sometimes whole seconds. This latency is problematic for products that need to respond to sensor input in real time.
Sending sensor data such as audio and video to the cloud increases privacy and security risks. To reduce abuse and give people confidence to let intelligent sensors into their lives, the data should not leave the device.
Ubiquitous connected sensors would overwhelm the network. Plumerai chips only use the network when they have something to report. This keeps bandwidth and mobile data costs low.
The farther we move data, the more energy we use. Sending data to the cloud uses a lot of energy. Processing data on-chip is more efficient by orders of magnitude. If a device needs a battery life of months or years, data needs to be processed locally.
Plumerai has developed a complete software solution for camera-based people detection. Trained on over 30 million images, our software detects people with very high accuracy under a wide variety of conditions. These AI models are so small that they even run on Arm Cortex-M microcontrollers. On Arm Cortex-A CPUs, processor load is so low that there is plenty of compute left for additional applications running on the same device.
By removing price and energy concerns, our speech recognition and sound detection technology can be embedded in the microwave in your kitchen, in the lightbulb in your garage, and in the thermostat in your bedroom. This way, no data has to leave the device, which makes it inherently secure.
Deep learning models can have millions of parameters, and these parameters are encoded in bits. Where others require 32, 16, or 8 bits, Binarized Neural Networks (BNNs) use a single bit for each parameter. This property makes it possible to calibrate the model precisely and get the maximum performance out of every bit.
A BNN needs drastically less memory to store its weights and activations than an 8-bit deep learning model. This saves energy by reducing the need to access off-chip memory and makes it feasible to deploy deep learning on more affordable, memory-constrained devices.
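To make the saving concrete, here is a rough back-of-the-envelope sketch in Python; the layer dimensions are purely illustrative and are not taken from a Plumerai model:

```python
# Illustrative convolution layer: 3x3 kernel, 256 input and 256 output channels.
weights = 3 * 3 * 256 * 256            # 589,824 weights

bytes_8bit = weights                   # 8-bit: one byte per weight
bytes_1bit = weights // 8              # 1-bit: eight weights packed per byte

print(f"8-bit weights: {bytes_8bit / 1024:.0f} KiB")  # 576 KiB
print(f"1-bit weights: {bytes_1bit / 1024:.0f} KiB")  # 72 KiB
```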
In addition, BNNs are radically more efficient computationally. Convolutions, an essential building block of deep learning models, consist of additions and multiplications. Because the complexity of a multiplier is proportional to the square of the bit-width, these operations can be replaced in a BNN by the simple XNOR and POPCOUNT operations.
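The plain-Python sketch below shows why this replacement works; it only illustrates the arithmetic and is not the optimized kernel that runs on-device. When the elements of two {-1, +1} vectors are packed into machine words (bit 1 encoding +1, bit 0 encoding -1), their dot product reduces to an XNOR followed by a POPCOUNT:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors.

    Each vector is packed into an integer: bit value 1 encodes +1, 0 encodes -1.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bits where the elements agree
    matches = bin(agree).count("1")     # POPCOUNT
    return 2 * matches - n              # (#agreements) - (#disagreements)

# Example: a = [+1, -1, +1, -1], b = [+1, +1, -1, -1]  ->  dot product = 0
print(binary_dot(0b0101, 0b0011, 4))    # prints 0
```

On real hardware the same idea is applied to 32 or 64 weights per word at a time, which is where the large speed-up over conventional multiply-accumulate comes from.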
Information is lost when the weights and activations are encoded using 1 bit instead of 8 bits. This affects the accuracy of the model.
Furthermore, the activation functions inside BNNs do not have a meaningful derivative. This is a problem when deep learning models are trained using gradient descent.
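To make the problem concrete: the binarizing activation behaves like a sign function, whose derivative is zero almost everywhere. A common workaround in the BNN literature is the straight-through estimator, sketched below in NumPy; this illustrates the generic technique and is not a description of Plumerai’s own training software.

```python
import numpy as np

def binarize(x):
    # Forward pass: sign-like activation, outputs are +1 or -1.
    return np.where(x >= 0, 1.0, -1.0)

def binarize_grad(x, upstream, clip=1.0):
    # The true derivative of the sign function is zero almost everywhere,
    # so plain gradient descent would receive no learning signal.
    # The straight-through estimator passes the upstream gradient through
    # unchanged wherever the input lies within the clipping range.
    return upstream * (np.abs(x) <= clip)
```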
We solved these issues through our research and technology:
Helwegen et al., Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization, NeurIPS (2019)
Bannink et al., Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks, MLSys (2021)
Larq: Plumerai’s ecosystem of open-source Python packages for Binarized Neural Networks (a brief usage sketch follows below)
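As a rough illustration of what building a small BNN with the open-source Larq packages looks like: the layer sizes below are made up for the example, and the API usage follows the public Larq documentation, which should be treated as the authoritative reference.

```python
import larq as lq
import tensorflow as tf

model = tf.keras.Sequential([
    # First layer: binarized weights, full-precision inputs (a common BNN convention).
    lq.layers.QuantConv2D(32, 3, kernel_quantizer="ste_sign",
                          kernel_constraint="weight_clip", use_bias=False,
                          input_shape=(96, 96, 3)),
    tf.keras.layers.BatchNormalization(),
    # Subsequent layers: both inputs and weights binarized.
    lq.layers.QuantConv2D(64, 3, input_quantizer="ste_sign",
                          kernel_quantizer="ste_sign",
                          kernel_constraint="weight_clip", use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

lq.models.summary(model)  # reports binarized vs. full-precision parameters and memory
```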
We combine our BNN inference stack with our collection of trained BNN models to provide a turnkey solution on off-the-shelf chips. And where applicable, we also provide our IP-core for FPGAs.
To prevent the loss in accuracy that occurs when BNNs are trained like conventional 8-bit deep learning models, we developed new software specifically for training BNNs. Combined with our world-class research on state-of-the-art BNN architectures and training algorithms, this results in tiny but highly accurate BNN models.
We built inference software to enable BNNs to run efficiently on microcontrollers. Our compute engines are optimized for the Arm Cortex-M, Arm Cortex-A, and RISC-V architectures.
We collect, label and build our own datasets and our internal data pipeline identifies failure cases to ensure that our models are highly reliable and accurate.
For customers that use FPGAs and require the most energy-efficient solution, we provide a custom IP-core that is highly optimized for our BNN models and software.
Plumerai’s collection of trained BNN models is available to our customers, along with our inference software stack and our IP-core for FPGAs. Learn more about the impact our technology will have on your product and specific application.
Get started
We’re partnering with industry experts