Breaking the Shackles of the Cloud

Almost all AI-enabled developer tools depend on the cloud. The main reason is model size; two other important reasons are access to a user’s code base for training and the convenience of replacing or retraining models to combat concept drift. This dependency on the cloud has two adverse consequences: it prevents disconnected use of AI-dependent tooling, ironically keeping developers from using these tools while they are flying among the clouds (despite the increasing prevalence of in-flight connectivity), and it undermines developer privacy, potentially exposing trade secrets.

Below, we pose questions to seed the discussion:


Cloud-based AI:  False Dichotomy?

Above, we’ve given reasons why we think AI-enhanced dev tools are cloud-based. Are we right? What are other reasons? Are there development tasks that would benefit from AI but do not require the cloud?

Multiple versions of a language and its APIs introduce concept drift in AI tools. Is retraining easier if the tooling is cloud-based? This raises the related question of how to detect and handle concept drift: when do we know we have a critical mass of data for updating the models?
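One cheap signal, offered here only as an illustration, is the share of tokens in recently written code that the deployed model has never seen; a jump often accompanies a new language or API version. The sketch below assumes a token stream and a set-valued model vocabulary. `oov_rate`, `drift_suspected`, the 0.2 threshold, and the 1,000-token minimum standing in for "critical mass" are all hypothetical choices, not established values.

```python
from __future__ import annotations

from typing import Iterable


def oov_rate(tokens: Iterable[str], vocab: set[str]) -> tuple[float, int]:
    """Return (share of tokens missing from vocab, number of tokens seen)."""
    seen = unknown = 0
    for tok in tokens:
        seen += 1
        if tok not in vocab:
            unknown += 1
    return (unknown / seen if seen else 0.0), seen


def drift_suspected(tokens: Iterable[str], vocab: set[str],
                    threshold: float = 0.2, min_tokens: int = 1000) -> bool:
    """Flag possible drift once enough evidence exists (hypothetical policy)."""
    rate, seen = oov_rate(tokens, vocab)
    if seen < min_tokens:
        return False  # not yet a "critical mass" of observations
    return rate > threshold
```

For example, if the vocabulary holds `open`, `read`, and `close` but recent code keeps calling a newer `readAllBytes`, a stream of 1,200 tokens in which a quarter are `readAllBytes` would be flagged, while the same ratio over only four tokens would not.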

Development tooling is surely not the first machine learning problem domain in which the question of where to place the model has arisen. Natural language translation appears to require the cloud. Spelling checkers do not, but that may be because they remain unsophisticated, relying mostly on regular expressions and edit distance.
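For comparison, the edit-distance core of such a spelling checker fits in a few lines. This is a minimal sketch of the standard Levenshtein dynamic program plus a naive dictionary scan; `suggest` and its `max_distance` cutoff are illustrative names and parameters, not taken from any particular checker.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
    """Return dictionary words within max_distance of word, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]
```

Because it sees only characters, `suggest("recieve", ["receive", "relieve"])` ranks `relieve` (distance 1) ahead of `receive` (distance 2), a small illustration of how unsophisticated the approach is.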

Multi-tier Models

Can we build multi-tier AI-enhanced dev tooling that can operate both online and offline? How would we architect such a framework? For instance, one could first query a small, lightweight local model built over a small feature set, then fail over to a cloud-based model if the local model’s answer is unsatisfactory.
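A minimal sketch of the failover idea, assuming each model returns a suggestion together with a confidence score and that "unsatisfactory" means a score below a fixed threshold. `TieredCompleter`, `Suggestion`, and the 0.7 threshold are hypothetical; the models are plain callables so that any local or remote backend could be plugged in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    text: str
    confidence: float  # the model's own score in [0, 1] (assumed available)


# A "model" is anything that maps an editing context to a suggestion.
Model = Callable[[str], Suggestion]


class TieredCompleter:
    """Query a local model first; fail over to a remote one when needed."""

    def __init__(self, local: Model, remote: Optional[Model] = None,
                 min_confidence: float = 0.7) -> None:
        self.local = local
        self.remote = remote
        self.min_confidence = min_confidence

    def complete(self, context: str, online: bool) -> Suggestion:
        answer = self.local(context)
        # Keep the local answer if it is good enough or no cloud tier is reachable.
        if answer.confidence >= self.min_confidence or not online or self.remote is None:
            return answer
        try:
            return self.remote(context)
        except OSError:  # e.g. ConnectionError or a socket timeout
            return answer  # degrade gracefully to the local tier
```

The local answer stays the fallback of last resort, so the tool degrades to the small model rather than failing outright when the network drops mid-request.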

Is operating over the larger vocabularies available to a cloud-based tool necessarily an advantage?

What should the update protocol be for local models when the user regains connectivity? Would this be a new application for transfer learning?

Developer Privacy

How do we address the privacy issues that cloud-based AI-enhanced dev tooling entails? What are the implications for trade secrets and intellectual property? Can we obfuscate or obscure the data while retaining its utility?
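One possible direction, sketched under strong assumptions: before code leaves the machine, replace each user-defined identifier with a keyed pseudonym and drop comments, leaving keywords, builtins, and syntactic structure intact for a model to learn from. The sketch handles Python source only, using the standard `tokenize`, `keyword`, and `hmac` modules; `obfuscate` and `pseudonym` are hypothetical names. It leaves string literals untouched and also renames attribute names of third-party APIs, so it illustrates the utility/privacy trade-off rather than resolving it.

```python
import builtins
import hashlib
import hmac
import io
import keyword
import tokenize


def pseudonym(name: str, key: bytes) -> str:
    """Map an identifier to a stable, keyed pseudonym (hypothetical scheme)."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "id_" + digest[:8]


def obfuscate(source: str, key: bytes) -> str:
    """Rename user-defined identifiers and drop comments in Python source."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue  # comments often carry the most sensitive prose
        if (tok.type == tokenize.NAME
                and not keyword.iskeyword(tok.string)
                and not hasattr(builtins, tok.string)):
            out.append((tok.type, pseudonym(tok.string, key)))
        else:
            out.append((tok.type, tok.string))
    # Two-tuples make untokenize regenerate spacing and indentation itself.
    return tokenize.untokenize(out)
```

Because the pseudonyms come from an HMAC under a key the developer keeps, each name maps to the same pseudonym across files, preserving usage patterns, yet the mapping cannot be recomputed without the key.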

This page was last modified on 25 Oct 2017.