
Building an open-source AI tool that puts LLM power on any device

Industries:
AI, edge computing, open-source
Duration:
1 year
Team:
3 core contributors (2 ML engineers, 1 OSS architect)
CLI tool in Docker, edge-optimized ONNX output
Community-led, stewarded by our team
Services:
AI architecture, ML engineering, OSS enablement
Technologies:
Docker, ONNX, Python, PyTorch, TensorFlow
Integrations:
GitHub Actions, OpenRouter LLMs, ONNX.js, Rust, Swift
DevOps:
GitHub CI, Flake8, Pytest, Pre-commit

Project Overview

The WhiteLightning project started with a question: Does text classification need the cloud every time?

It began as an internal experiment during our ML hack days, aimed at exploring what could be done with less. LLMs are great, but for most real-world use cases, you don’t need a 175B-parameter model on standby. Instead, you need something fast, portable, and private. Something that works offline, ships inside your app, and doesn’t rack up API bills.

Instead of utilizing LLMs at runtime, we use them once to generate synthetic data, then distill that into a compact, ONNX-based model that runs anywhere. No cloud, no lock-in, no friction. Just a simple way to go from idea to working classifier on your terms.
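
For a sense of what that single LLM pass looks like, here is a rough Python sketch of synthetic data generation against OpenRouter's chat-completions endpoint. The prompt, model ID, and label names are illustrative placeholders, not WhiteLightning's actual internals.

```python
# Illustrative sketch only: one-off synthetic data generation via OpenRouter.
import json
import os

import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]  # injected via the environment, never hard-coded

def generate_examples(task: str, label: str, n: int = 50) -> list[dict]:
    """Ask the teacher LLM for n short texts belonging to `label`."""
    prompt = (
        f"Task: {task}\n"
        f"Write {n} short, realistic example texts for the class '{label}'. "
        "Return them as a JSON array of strings."
    )
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "openai/gpt-4o-mini",  # any OpenRouter model ID works here
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    # Assumes the model returns clean JSON; a real pipeline would validate and retry.
    texts = json.loads(resp.json()["choices"][0]["message"]["content"])
    return [{"text": t, "label": label} for t in texts]

# This is the only point where an LLM (and an API bill) is involved.
dataset = generate_examples("email spam filter", "spam") + \
          generate_examples("email spam filter", "not_spam")
```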

95% cheaper

Uses LLMs once for data generation (~$0.01 vs $1–10 per query)

<1 MB model size

Easily fits in mobile apps, kiosks, or embedded firmware

10–15 min training time

Generate a binary classifier on a laptop in minutes

2,520 texts/sec inference speed

That’s 0.38 ms per input on commodity CPUs

<512 kB RAM usage

Runs on low-power hardware like Raspberry Pi Zero

8 languages and runtimes supported

Identical logits across Python, Rust, Swift, and more

100% offline-ready

No cloud, no vendor lock-in, no latency risks

All-platform deployment

ONNX.js (web), iOS/Android (mobile), MCUs (embedded), laptops (desktop)

The challenge we tried to solve

Running LLMs is expensive. They’re cloud-bound, latency-prone, and often impractical for edge or privacy-sensitive environments. Our team saw a rising demand for fast, cost-free, offline text classification across use cases from email spam filters to IVR routing and parental control on offline consoles.


Existing options couldn’t cut it:


  • Commercial LLMs came with high per-query costs and data exposure risks.
  • TinyML frameworks handled inference but not training.
  • OpenVINO and spaCy required real datasets and GPU-heavy distillation.

So we created a completely different tool: a command-line solution that distills LLMs into tiny, production-ready classifiers that are fast, local, and open.

Under the hood of WhiteLightning

WhiteLightning isn’t a hosted SaaS. It’s a developer-first, no-nonsense CLI tool that does one thing extremely well: turn your prompt into an embeddable classifier. Here’s what’s under the hood:

  • Teacher-student distillation pipeline with LLM-generated synthetic data.
  • ONNX model export: <1 MB binary that runs on Raspberry Pi, mobile, browser, or bare metal.
  • Custom dataset support: bring your own labeled corpus and cut training time by 3–5x.
  • Multi-language test suite: 8-language compatibility via GitHub Actions.
  • Robustness tools: edge-case generators, quantization, vocabulary pruning.
  • Docker packaging: zero setup, reproducible, portable.

The resulting models can detect spam or classify voice commands, and they ship inside your app the way a config file does.
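
To make the teacher-student step concrete, below is a minimal PyTorch sketch of the student side: a tiny hashed bag-of-words classifier trained on the synthetic dataset from the sketch above, then exported to ONNX. It illustrates the technique, not the project's actual training code.

```python
# Minimal student-model sketch: hashed bag-of-words features + a linear layer.
import torch
import torch.nn as nn

VOCAB_SIZE = 5000  # hashed feature dimension; keeps the exported weights tiny

def featurize(text: str) -> torch.Tensor:
    """Hash tokens into a fixed-size bag-of-words vector.
    (Python's built-in hash() is per-process; a real pipeline would use a stable hash.)"""
    vec = torch.zeros(VOCAB_SIZE)
    for token in text.lower().split():
        vec[hash(token) % VOCAB_SIZE] += 1.0
    return vec

class TinyClassifier(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.linear = nn.Linear(VOCAB_SIZE, n_classes)

    def forward(self, x):
        return self.linear(x)  # raw logits; apply softmax downstream if needed

model = TinyClassifier()
X = torch.stack([featurize(row["text"]) for row in dataset])
y = torch.tensor([0 if row["label"] == "spam" else 1 for row in dataset])

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):  # minutes on a laptop CPU at this scale
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Export to ONNX so the same model runs from Python, Rust, Swift, or the browser.
torch.onnx.export(
    model,
    torch.zeros(1, VOCAB_SIZE),
    "classifier.onnx",
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}, "logits": {0: "batch"}},
)
```

At this vocabulary size the exported weights come to a few tens of kilobytes, comfortably inside the sub-1 MB budget quoted above.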

Inside the engineering

WhiteLightning isn’t a typical “tinyML” framework; it delivers real-world performance:

  • Language-agnostic ONNX: validated logits across Rust, Swift, C++, Node.js, Dart, and Python.
  • Blazing-fast inference: up to 2,520 texts/sec in Rust (0.38 ms/text).
  • Secure by design: API key injection via environment variables, no local Python dependencies.

We built WhiteLightning to be easy to use: one CLI command to get started, clean output you can actually make sense of, and workflows that plug right into your CI without hassle.
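
As a quick illustration of consuming the exported model, here is how a rough throughput check might look with onnxruntime in Python, reusing the hypothetical featurize helper from the earlier sketch. The Rust figures quoted above come from the project's own benchmarks, not from this snippet.

```python
# Rough Python throughput check against the exported ONNX classifier.
import time

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])

texts = ["win a free prize now", "see you at standup"]
batch = np.stack([featurize(t).numpy() for t in texts])  # shape (2, VOCAB_SIZE), float32

start = time.perf_counter()
for _ in range(1000):
    logits = session.run(["logits"], {"features": batch})[0]
elapsed = time.perf_counter() - start
print(f"{len(texts) * 1000 / elapsed:.0f} texts/sec, last logits: {logits}")
```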

How to make offline AI text classifiers with WhiteLightning

Who it’s for

WhiteLightning is made for builders who don’t want to rent intelligence from the cloud.

Indie developer

Add sentiment analysis to a desktop app without paying per query or relying on the cloud — it just works offline, out of the box.

Edge AI engineer

Run NLP models on Raspberry Pi or Android with a lightweight (<1 MB) setup that delivers fast inference on any CPU.

Enterprise privacy team

Classify support tickets or chat messages locally and keep all data in-house to meet compliance requirements.

Real-world usage scenarios

WhiteLightning was developed to make intelligent text classification possible anywhere, even in environments where cloud access is limited, restricted, or simply not allowed.

Personal productivity and desktop apps

  • Smart quick-add (e.g., calendar vs. task vs. reminder)
  • Auto-tagging notes in Obsidian or Notion
  • Gmail-style inbox tabs, fully offline

Comms safety and moderation

  • Console games with offline parental controls
  • Secure chat platforms (Matrix/Element) enforcing code of conduct
  • SMS spam filtering on Android ROMs (no Google Play)

Healthcare and life sciences

  • Patient triage kiosks (e.g., refill request vs. symptoms)
  • Symptom classifiers for medical wearables
  • Transcription flaggers for allergy or dosage mentions

Customer support and compliance

  • On-prem ticket routing for banks and hospitals
  • VoIP transcription classifiers
  • Contact-center QA inside closed networks

IoT, automotive, and smart devices

  • Offline voice commands for home automation
  • In-car NLP for media or navigation
  • Industrial alarm logs classified by risk level


Developer and DevOps tools

  • GitHub bots tagging issue types
  • CI pipelines detecting secrets or tone
  • IDE extensions nudging for better commit messages

Education

  • Adaptive e-readers
  • Captioning systems detecting topic shifts

OEM/Embedded hardware

  • Router firmware with built-in parental filters
  • 3D printer UI explaining G-code errors

Yes, it runs even on a potato

WhiteLightning was built to work even on extremely low-spec hardware. With models under 1 MB, no runtime dependencies, and ONNX compatibility, it runs smoothly on:

  • Raspberry Pi Zero
  • Old laptops
  • Budget Android phones
  • In-browser via ONNX.js
  • Microcontrollers with limited RAM

If it can run Python or Rust, it can run WhiteLightning.
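
For the tightest targets, the ONNX file can be squeezed further with dynamic INT8 quantization, in the spirit of the robustness tools listed earlier. The snippet below uses onnxruntime's stock quantization helper as a stand-in, and the file names are placeholders.

```python
# Optional squeeze for constrained hardware: dynamic INT8 weight quantization.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    "classifier.onnx",
    "classifier.int8.onnx",  # roughly 4x smaller weights, same inputs and outputs
    weight_type=QuantType.QInt8,
)
```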

Why WhiteLightning delivers

  • Pay once for LLM access (via OpenRouter or another API)

  • Generate synthetic labeled data from your task prompt

  • Distill into a <1 MB model that runs forever, for free, anywhere

OSS as a collaboration model

WhiteLightning is 100% open-source under GPL-3.0. The classifiers it generates are MIT-licensed and yours to use in commercial apps.
Our team maintains it publicly:

  • GitHub issues, PRs, and the test matrix are open
  • CI/CD runs on every PR via GitHub Actions
  • Dev chat happens on Discord

Future plans

While still early in its adoption, WhiteLightning is poised for:

  • Widespread deployment in edge NLP applications
  • Integration into developer tools, CI bots, and offline apps
  • Future GUI and hosted-API wrappers that stay true to its open-core model

We’re also experimenting with image classification, but text is where our solution shines.