Most AI tools promise flexibility, but they often come with high infrastructure costs, privacy concerns, and models that are far too large for real-world use. What if you could build AI that runs entirely offline, requires no ongoing API fees, and fits comfortably under 1MB?

Meet WhiteLightning, Inoxoft’s open-source CLI tool that transforms LLM-generated data into fast, compact text classifiers. It’s designed to help developers and product teams create deployable AI models with a single prompt — no massive datasets, no GPU clusters, no cloud lock-in.

In this article, we’ll walk you through how WhiteLightning works, why it’s different, and how teams are already using it to bring efficient, private AI to everything from mobile apps to embedded devices. You can explore the code on GitHub.

Key Takeaways

  • Most AI is expensive, cloud-bound, and hard to deploy.
  • WhiteLightning is Inoxoft’s open-source tool that turns a simple prompt into a compact, offline AI model — no real data or API costs.
  • Train fast, export under-1MB classifiers, and run them anywhere.
  • Built for developers. Ready for real-world use.

The Problem: When Your AI Needs a Reality Check

Large Language Models (LLMs) are powerful, but often far more than you need. If your task is to tag support tickets, classify user feedback, or flag content types, deploying a massive, cloud-hosted model is overkill.

Relying on cloud-based LLM APIs brings predictable challenges:

  • Costs scale fast. You’re charged per query, even for simple tasks.
  • Latency adds friction. Round trips to the cloud slow everything down.
  • Privacy concerns arise. Your data leaves your environment, which can be a deal-breaker for many industries.

Tools like TinyML help with inference on small devices, but they don’t handle training. Others, like spaCy or OpenVINO, are great for traditional pipelines, but they require real datasets and serious compute to get started.

The missing piece: Fast, private, offline

WhiteLightning began with a question: Does every text classification task really need the cloud?

Again and again, we saw teams facing the same challenge: needing AI that was fast, cost-effective, and completely offline. Think spam filters on older phones, parental controls on a game console without internet, or support ticket routing in secure enterprise environments where data must stay local.

What these teams needed was a model small enough to ship with an app, private by default, and free from recurring API bills.

WhiteLightning: Inoxoft’s Answer to Lean, Local AI

WhiteLightning is built for developers who want results without the overhead. It’s a command-line tool that does one job exceptionally well: turn your task description into a lightweight, embeddable text classifier.

Rather than running an LLM at inference time, WhiteLightning uses it once (during setup) to generate synthetic training data. It then distills that into a compact ONNX model you can run anywhere: mobile, browser, embedded systems, or on-prem servers.
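The idea behind this two-phase pipeline can be sketched in a few lines of plain Python. Everything below is illustrative: the hardcoded list stands in for the LLM-generated synthetic data, and a tiny Naive Bayes model plays the role of the distilled student (WhiteLightning’s real student models are neural networks exported to ONNX, not Naive Bayes):

```python
import math
from collections import Counter, defaultdict

# Stand-in for the "teacher" step: in WhiteLightning an LLM generates
# this synthetic labeled data; here it is simply hardcoded.
SYNTHETIC_DATA = [
    ("great product works perfectly", "positive"),
    ("absolutely love this app", "positive"),
    ("fast shipping and friendly support", "positive"),
    ("terrible experience want a refund", "negative"),
    ("broke after one day very disappointed", "negative"),
    ("worst purchase I have ever made", "negative"),
]

def train(data):
    """The "student" step: fit a tiny word-count model to the data."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Classify with log-probabilities and add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(SYNTHETIC_DATA)
print(predict(model, "love this works great"))  # → positive
```

The key point the sketch captures: the expensive model is consulted only while building the dataset; what ships is the small, cheap student.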

The engineering: What’s inside

WhiteLightning delivers practical performance thanks to deliberate engineering choices:

  • Teacher–student pipeline. An LLM generates synthetic training data, which is then used to train a smaller, task-specific model through distillation.
  • ONNX export. The final model is a compact ONNX binary (typically under 1MB) that runs anywhere — from Raspberry Pi and mobile devices to browsers (via ONNX.js) and bare metal systems.
  • Custom dataset support. If you already have labeled data, WhiteLightning can train on it directly, often reducing training time by 3–5×.
  • Built-in robustness. Features like edge-case generation, quantization, and vocabulary pruning help produce models that are both accurate and efficient.
  • Zero-setup deployment. Distributed as a Docker image for clean, reproducible runs with no local dependencies.
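To make the size-optimization point concrete, here is a toy illustration of vocabulary pruning. The corpus and threshold are invented for the example, and the real tool’s pruning logic will differ, but the principle is the same: dropping rare tokens shrinks the tables the final model has to carry.

```python
from collections import Counter

# Invented mini-corpus; in practice this would be the synthetic
# training data produced by the teacher step.
corpus = [
    "great product works perfectly",
    "great app love it",
    "terrible product broke fast",
    "love the fast support",
]

def build_vocab(texts, min_count=2):
    """Keep only tokens seen at least `min_count` times."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return sorted(w for w, c in counts.items() if c >= min_count)

print(build_vocab(corpus))  # → ['fast', 'great', 'love', 'product']
```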

WhiteLightning is 100% open source (GPL-3.0), and the models it creates are yours to use freely under the permissive MIT license, even in closed-source commercial projects.

The performance: Where the proof hides

  • Cost-efficient by design. You pay once for synthetic data generation instead of per-query API charges, making it dramatically cheaper than cloud-based LLMs.
  • Compact models. Outputs are typically under 1MB, small enough to embed directly into mobile apps, firmware, or lightweight devices.
  • Quick to train. A binary classifier is ready in about 10–15 minutes on a standard laptop — no GPUs required.
  • Blazing-fast inference. Processes over 2,500 texts per second, with sub-millisecond latency on common CPUs.
  • Minimal resource use. Runs comfortably on low-power hardware like Raspberry Pi Zero, using under 512KB of RAM.
  • Cross-platform ready. Compatible with 8+ runtimes including Python, Rust, Swift, and more — consistent outputs across environments.
  • Truly offline. No cloud, no vendor lock-in, no internet dependency. Just AI that runs anywhere, anytime.
  • Universally deployable. Works in browsers (ONNX.js), mobile apps (iOS/Android), microcontrollers, laptops, and edge devices.

How We Brought WhiteLightning To Life: From Our Curiosity to Your Deployments

We like to experiment, and WhiteLightning started as exactly that. During one of our internal ML hack days, we wondered whether text classification needs the cloud every time. The answer, in many cases, was no. 

So our small team (two ML engineers and an OSS architect) set out to build a better way. We focused on what mattered: minimal setup, clean output, and smooth integration into real-world developer workflows.

What came out of that experiment was WhiteLightning: a straightforward, open-source tool that solves a real problem by turning your prompt into a deployable AI model — quickly, affordably, offline.

WhiteLightning in action: Your step-by-step guide to deployable AI

Getting started with WhiteLightning is simple; you don’t need to be a machine learning expert. All it takes is a clear description of the classification task you want to solve. That one line becomes the blueprint for the entire process.

For example, to classify customer reviews by sentiment, your command might look like this:

docker run --rm -v $(pwd):/app/models -e OPEN_ROUTER_API_KEY="YOUR_OPEN_ROUTER_KEY_HERE" \
ghcr.io/inoxoft/whitelightning:latest -p="Classify customer reviews as positive or negative sentiment"

Once you hit enter, WhiteLightning takes over, automating a complex sequence of steps. Here’s a breakdown of the process WhiteLightning executes to transform your prompt into a production-ready classifier:

Step 1: Synthetic data generation

No need to collect or label thousands of examples yourself. WhiteLightning uses a powerful LLM to automatically generate a balanced dataset tailored to your prompt.

You’ll see logs like:

🧠 - INFO - DataGenerator initialized. Config Model: 'x-ai/grok-3-beta', Model: 'openai/gpt-4o-mini'
🧠 - INFO - Generate Edge Cases: True (Target Volume per class: 50)

It runs prompt refinement cycles and includes edge case generation, making the final dataset more robust. In essence, it teaches itself what types of examples your classifier should learn from before any training begins.
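As a rough mental model of this step (the JSON schema and helper names below are invented for illustration, not WhiteLightning’s actual format), the generator’s job is to turn an LLM completion into a balanced, labeled dataset:

```python
import json
from collections import Counter

def fake_llm_response():
    # Stand-in for a real LLM completion; the schema is hypothetical.
    return json.dumps([
        {"text": "Loved it, five stars", "label": "positive"},
        {"text": "Exceeded my expectations", "label": "positive"},
        {"text": "Arrived broken and late", "label": "negative"},
        {"text": "Would not recommend", "label": "negative"},
    ])

def build_dataset(raw):
    """Parse the completion and check class balance before training."""
    examples = [(e["text"], e["label"]) for e in json.loads(raw)]
    counts = Counter(label for _, label in examples)
    assert len(set(counts.values())) == 1, "classes are imbalanced"
    return examples

dataset = build_dataset(fake_llm_response())
print(len(dataset))  # → 4 examples, evenly split across classes
```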

Step 2: Model distillation and training 

Once the dataset is ready, WhiteLightning distills the LLM’s knowledge into a much smaller, task-specific model. 

Training starts automatically:

📦 - INFO - === Starting Data Generation & Model Training Process ===
⚙️ - INFO - Starting model training using tensorflow strategy...
📈 ━━━━━━━━━━━━━━━━━━━━ Epoch 1/20 - accuracy: 0.4164 - loss: 0.6194
📈 ━━━━━━━━━━━━━━━━━━━━ Epoch 20/20 - accuracy: 1.0000 - loss: 5.3911e-05
✅ - INFO - Test set evaluation - Loss: 0.0006, Accuracy: 1.0000

On a typical laptop, your model is ready in 10–15 minutes. What you get is a highly accurate, compact classifier, perfect for embedding in real-world systems.

Step 3: ONNX export

Once training is complete, WhiteLightning exports your model in ONNX (Open Neural Network Exchange) format, a universal standard for cross-platform AI deployment.

📤 - INFO - Model exported to ONNX: models_multiclass/customer_review_classifier/customer_review_classifier.onnx
⚡ - INFO - === Data Generation & Model Training Process Finished. Duration: 0:10:42 ===

This ensures your model runs seamlessly across environments — with no dependency on the original toolchain or cloud infrastructure.

Step 4: Ready to deploy anywhere 

With your ONNX file (typically under 1MB), you can deploy the model in nearly any environment:

  • Web apps: Run it in-browser via ONNX.js
  • Mobile apps: Integrate into iOS or Android with zero API calls
  • Edge devices: Deploy on Raspberry Pi or similar low-power hardware
  • Microcontrollers: Models run even on tightly constrained systems
  • Desktops and servers: Works offline or in your internal infrastructure

WhiteLightning brings AI down to size: fast, private, and portable.

Curious what you can build? WhiteLightning is open-source and ready to use. Grab the tool from GitHub, run one command, and start building practical, deployable AI right on your own device.

Who Benefits from WhiteLightning

WhiteLightning is built for anyone who needs fast, reliable AI without cloud dependency, recurring API costs, or heavyweight infrastructure. Here’s how different teams are already putting it to work:

Indie developers

Want to add smart features like sentiment analysis or auto-tagging without relying on external APIs?

With WhiteLightning, you can build lightweight AI directly into your app. Think offline inbox filters, note tagging in tools like Obsidian or Notion, or simple classifiers that run locally and respect user privacy.

Edge AI engineers

Trying to deploy NLP on constrained hardware like Raspberry Pi, Android devices, or microcontrollers?

WhiteLightning generates compact models (typically under 1MB) that run fast on basic CPUs (no GPU needed). It’s ideal for voice commands, IoT, and other edge-based NLP tasks.

Enterprise privacy teams

Need to keep sensitive data in-house, whether for compliance or internal policy?

WhiteLightning lets you classify support tickets, chat messages, or internal documents entirely offline. It’s perfect for on-prem ticket routing, secure communication platforms, or DLP tools that monitor local inputs for sensitive content.

Possible Use Cases

We believe AI shouldn’t be limited by cloud access, high costs, or heavyweight infrastructure. 

WhiteLightning makes smart, localized text classification possible in places where traditional AI can’t go.

Comms Safety and Moderation

  • Add offline parental controls to console games without internet access
  • Power secure chat moderation in apps like Matrix/Element, classifying messages locally for code-of-conduct compliance
  • Enable SMS spam filtering on Android ROMs, even without Google Play or cloud dependencies

Healthcare and Life Sciences

  • Run triage kiosks that classify patient messages like “refill request” or “new symptoms”, while keeping all data local
  • Support wearable medical devices with embedded symptom classifiers under 512KB RAM
  • Flag clinical keywords like allergens or dosages in offline ASR transcripts without sharing sensitive data

IoT, Automotive, and Smart Devices

  • Bring offline voice commands to smart home hubs or in-car media systems
  • Use tiny models (~600KB) on ARM Cortex-A processors to categorize industrial alarm logs (e.g. maintenance vs. safety)
  • Let SCADA or factory gateways make local decisions without cloud delay

Does one of these scenarios sound familiar? If so, WhiteLightning might be exactly what you need. Try it out on GitHub and see how far lightweight AI can go.

What’s Next for WhiteLightning

WhiteLightning is already solving real-world problems, but we’re just getting started. Our focus remains on making AI more accessible, portable, and developer-friendly.

  • Broader adoption. We’re seeing increased use in edge NLP applications across healthcare, embedded systems, and enterprise tools. We aim to support even more deployment environments.
  • Deeper integrations. Plans are underway to integrate WhiteLightning into CI/CD workflows, developer bots, and offline-first app ecosystems, streamlining how it fits into real engineering pipelines.
  • Simpler interfaces. While the CLI works well for many, the architecture is modular enough to support future additions like a lightweight GUI or optional hosted API wrappers, all while keeping the project open-core.
  • Beyond text. While text classification is where WhiteLightning shines today, we’re experimenting with image classification use cases built on the same principles: small, fast, and local-first.

We’re continuing to evolve WhiteLightning with real developers and devices. 

Partnering with Inoxoft: Your AI Journey, Our Expertise

When you work with our team, you team up with the very engineers behind WhiteLightning. We actively maintain the open-source project and use the tool in our own client work.

How can we support your AI journey?

  • Custom AI/ML development. From architecture to deployment, we build tailored solutions and use WhiteLightning where it delivers the most value.
  • AI consulting. Not sure where to start? We’ll help define the right use cases and map your path forward.
  • AI agent development. We design intelligent agents that automate decision-making across workflows, with full integration support.

Our approach is the same one that shaped WhiteLightning: start lean, deliver fast, and keep things simple enough to scale.

Why you can trust us with your projects 

  • Real-world expertise. We’ve built and deployed production-grade AI tools, not just prototypes.
  • End-to-end support. From ideation to deployment and iteration, we’re with you at every step.
  • Future-focused. We stay ahead of the curve so your solution works today and tomorrow.

If practical, efficient AI is what you’re after, let’s talk. We’ll help you build something that runs smart, fast, and where you need it.

To Sum Up

WhiteLightning started as a small experiment and turned into a tool we now rely on. It’s a simple command-line utility that takes your task prompt, generates its training data, and produces a compact AI model that runs fully offline. No real dataset required. No ongoing API bills. Just a fast, portable text classifier you can use. If that sounds like something you’ve been missing in your workflow, give it a try. 

And if you need a hand making it work for your setup, we’re just a message away.

Frequently Asked Questions

Can I use WhiteLightning without a labeled dataset?

Yes. WhiteLightning generates synthetic training data based on your task description using a language model, so you don't need an existing labeled dataset to get started.

What kinds of text classification tasks can WhiteLightning handle?

WhiteLightning supports binary and multi-class classification for tasks like sentiment analysis, intent detection, spam filtering, content moderation, routing support tickets, and more.

Do I need machine learning experience to use WhiteLightning?

Not at all. If you're comfortable using the command line and Docker, you can get started with just a simple text prompt describing the problem you want to solve. The tool handles the rest.

Is WhiteLightning suitable for production environments?

Yes. It outputs ONNX models that are optimized for speed and resource efficiency, making them suitable for embedding in production applications across mobile, desktop, web, and edge devices.

Can I fine-tune or retrain models generated by WhiteLightning?

WhiteLightning supports training with your own labeled data as well. You can provide a custom dataset to speed up training or increase accuracy for domain-specific tasks.