All posts by Vladimir D

tinkerer, poet, rascal

Living in La La Land


For a time, this website carried the tagline “Living in La La Land”. And… I should add, it has nothing to do with the movie. The expression “Living in La La Land” has been around for a lot longer than that. It’s a way of saying, “You know… you might want to consider these couple of things,” or… “You know you’re bat-shit crazy, right?”


I’ve been tinkering with the idea of spoken word interfaces for… quite a while now. Building them, playing with them, using them, tweaking them, throwing them out. I learned the word “munge”. We’re still waiting for spoken word interfaces to come of age. Think of how disappointing Siri is, and all of her clones. And if you’d told me several years ago that people would buy “always listening” devices to place in their homes with… informed consent…? But they do.

And why should I want to join that parade?   But I did….  And… the results were not spectacular.  But wait….

Initially, my objective was to build some mobile application functions suitable for a small business – a lot of functions constructed not so deep, but connected and organized in a rough and ready way. Initially, I did this on the Force.com platform. For those of you who don’t know… this is the programmatic undergirding of Salesforce. I was playing with Salesforce1 (SF1), which was in its infancy. I found the experience largely unsatisfactory, mostly because I had a pretty clear idea of how I wanted my functions to behave, out there in the real world, and SF1 would not do… what I wanted it to do. Not everything. And I’m greedy that way.

What I wanted was an efficient way to collect, organize and store data. Key word being “organize”. It is possible, for instance, to collect and store information using the notes features on phones. And you can use speech-to-text to do that. I do this. You do this. I still do this. But then you end up with information that is poorly organized, and information which, supposing you have any further use for it beyond reference, requires further handling.

I wanted three things: (1) a bare minimum of crap, of things that only serve to get in the way; (2) a high level of structure, right out of the barrel, so that I didn’t have to “handle” the information again, at least not in the sense of duplicating the initial effort to gather it; and (3) to accomplish this by talking and (hopefully) little else… when circumstances demanded. I wanted my effing phone to listen to me. And I guess that is the definition of bat-shit crazy. I’m a busy person. Hence, speaking….

So, the initial experience was kind of frustrating. I was able to create the structures that I wanted for organizational purposes, but the processes seemed clunky to me. Sometime in the fall of 2015, I was driving to New York City across I-80 from Pennsylvania and into New Jersey. Daydreaming, I imagine… And it became very clear to me that what I wanted required that I jettison the idea of a constructed image-based user interface, such as SF1, or any other “container” app that would run on a phone. I think I was as wedded as anybody to this idea of what “application” meant: that it was supposed to “look like” something. I was girding myself for some inevitable “work” building these visual interfaces to satisfy the purpose of speaking. It was kind of a surprise – the idea that it did not have to “look like” anything. That any container app was just going to add complexity. That the application could just “be” a spoken word interface glommed onto a text-messaging chassis. How simple is that?

What would I call it? Here’s an aside: I thought I would call it “Speak to me”. A domain name search turned up the fact that “speakToMe.com” was owned and operated as a phone sex site. Funny. I found this out sitting in a rest stop overlooking the Delaware Water Gap. That was a very fine day.

Most people, including people of the developer sort with whom I work from day-to-day, are invested… deeply-invested… in the very-structured graphical user interface.  They tend to not even think of this thing that I have called a spoken word interface because it is inherently unstructured and chaotic.  In fact, it basically boils down to this: because “it” does not have an “appearance”, it just is not.  Data acquires its structure from the user interface.  It’s always been that way.   The graphical interface has always been the easiest way to build key-value pairings.  I’ve spoken with developers who have gone so far as to suggest that this work that does not build upon a structured graphical interface does not constitute application development at all.  They’re entitled to their opinions.

Hence, La La Land, not meaning the movie, but the definition – a fanciful state or dreamworld. Or bat-shit crazy. Take your pick.

Next month, we’ll have a new tagline.

VeeDee

beSpokn


Even trial balloons have architectural considerations.


My “architectural considerations” such as they are, have led me to build spoken word applications using just three tools: text messaging, email and speech-to-text.

I am pretty sure that my guiding principle here comes from the old adage “keep it simple, stupid”, growing out of my belief that a lot of the things that we see, a lot of the things that we use, including software, are over-embellished. How many of you, for instance, use Word, and use all of its features? You know that you could probably design a full Sunday edition of the New York Times as a Word document, right? But who does that?

So, for those of you who are technically-inclined, my initial interest is not to wire up a cloud data resource to a client application that runs in a smart phone. My interest lies simply in leveraging the built-in functions of that phone – specifically the speech-to-text rendering engine –  to plug the output straight into an engine that translates the output more-or-less directly into database elements. I call this a “spoken word interface”.  Anything else dilutes the focus and complicates the execution of functions.  In that context, the client application is garbage.  It’s just stuff that gets in the way, stuff that gives the designer, or the business, an opportunity to do some “branding” work or some other “work” that helps to monetize a campaign.  Like “free” list-keeping apps that tell you where to find a bargain on dish soap twenty miles away… So, for me… no app.  No “container” for functions. And, by the way, no need to “access” proprietary speech-to-text functions from that container application.  Building the container and embedding said proprietary functions would entail licensing.  Who wants to go there?  Whereas using speech-to-text with your text messaging is free and will likely always be.
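To make “translates the output more-or-less directly into database elements” a little more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration, not a description of the actual engine: the field names in `FIELDS`, the spoken separator word “next”, and the function name `message_to_record` are all hypothetical. It just shows how a dictated text message could be munged into field/value pairs without any client app in between.

```python
import re

# Hypothetical field names; a real deployment would define its own.
FIELDS = ("customer", "item", "quantity", "note")


def message_to_record(message: str) -> dict:
    """Turn a dictated text message into field/value pairs.

    Assumed grammar (an illustration only): segments are separated by the
    spoken word "next", and each segment starts with a field name.
    """
    record = {}
    # Speech-to-text tends to add punctuation at the ends of words; strip it
    # there, but leave something like "12.5" alone.
    cleaned = re.sub(r"[.,!?]+(?=\s|$)", "", message).strip()
    for segment in re.split(r"\bnext\b", cleaned, flags=re.IGNORECASE):
        words = segment.split()
        if not words:
            continue
        field = words[0].lower()
        if field in FIELDS:
            record[field] = " ".join(words[1:])
        # Segments that do not start with a known field are ignored here.
    return record


print(message_to_record(
    "Customer Acme Hardware next item garden hose next quantity 12."
))
```

The point of the sketch is that the structure comes from a handful of spoken keywords rather than from a form on a screen.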

The goal is to provide access to database functions with as little fanfare as possible, with a bare minimum of effort by the user, with a low bar for learning.  With a graduated path to using and leveraging increasingly sophisticated functions.  At the end of the day, a list-keeping application (first in the list) is a database function where you put a list item into the database, extract a set of list items from the database, delete one or more list items from the database. That’s it.  Now speak to it!
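Here is a rough sketch of that list-keeping idea, using Python’s built-in `sqlite3` module. The command words (“add”, “show”, “remove”), the one-word list name, the table layout and the function names `open_store` and `handle` are assumptions of mine for the sake of the example; the real thing would sit behind a text-messaging gateway, which is left out here.

```python
import sqlite3

SORRY = "Sorry, I did not understand that."


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the list database. The schema is a guess."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (list_name TEXT, item TEXT)"
    )
    return conn


def handle(conn: sqlite3.Connection, message: str) -> str:
    """Map one incoming message to one database function and a reply."""
    words = message.strip().rstrip(".").split()
    if len(words) < 2:
        return SORRY
    verb = words[0].lower()
    list_name = words[1].lower()
    item = " ".join(words[2:]).lower()

    if verb == "add" and item:
        # Put a list item into the database.
        conn.execute("INSERT INTO items VALUES (?, ?)", (list_name, item))
        conn.commit()
        return f"Added {item} to {list_name}."
    if verb == "show":
        # Extract a set of list items from the database.
        rows = conn.execute(
            "SELECT item FROM items WHERE list_name = ? ORDER BY rowid",
            (list_name,),
        ).fetchall()
        return ", ".join(row[0] for row in rows) or f"{list_name} is empty."
    if verb == "remove" and item:
        # Delete one or more list items from the database.
        cur = conn.execute(
            "DELETE FROM items WHERE list_name = ? AND item = ?",
            (list_name, item),
        )
        conn.commit()
        return f"Removed {cur.rowcount} item(s) from {list_name}."
    return SORRY


conn = open_store()
print(handle(conn, "Add groceries dish soap"))
print(handle(conn, "add groceries milk"))
print(handle(conn, "Show groceries."))
```

Three verbs, three database functions. Nothing here tries to hold a conversation; a message either matches one of the verbs or it gets a polite shrug.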

My premise is that there are a lot of things that can be accomplished using spoken word interfaces which need not impose the requirement of being “conversational”. That seems to be where everybody’s headed. Drop that requirement, and you have significantly lowered the bar in terms of designing natural language interpreters, and everything else that comes along with the deconstruction of messages.

VeeDee