The online home of John Pollard

Google Assistant vs Alexa Development

Building my first action for Google Assistant, and how it compares to developing for Alexa

I’ve been pretty much all-in on the Amazon Echo as my home voice system, and I still love having multiple devices around our home that we can call out commands to.

However I’m always looking to expand Yeltzland onto new platforms, so I’ve ported the Alexa skill I wrote about a while back onto the Actions on Google platform.

This is a summary of what I learnt doing that, and my view on the advantages and disadvantages of developing for each platform.

Actions on Google Advantages

Also available on phones

The main advantage of Google Assistant - one I hadn’t realised until I started this, even though it’s actually pretty obvious - is that it’s available on phones as well as the Google Home speaker.

On newer Android phones Google Assistant comes installed out of the box (and it can be installed on most recent versions), and there is also a nice equivalent iOS app.

I’ve just bought a Google Home Mini to try out, and it’s definitely comparable to the Echo Dot it sits next to, but I’ve found myself using Google Assistant a lot more on my iPhone than expected.

Visual responses are nicer

Because the Google Assistant apps are so useful, there is a lot more emphasis on returning visual responses to questions alongside the spoken responses.

Amazon does have the Echo Show and the Echo Spot that can show visual card information, but my uneducated guess is they make up a small percentage of Echo usage.

Google offers a much richer set of possible response types, which unsurprisingly look a lot like search result answers.

In particular, the Table card - currently in developer preview - offers the chance to provide really rich responses, which suit the football results returned by my action very well.

Screenshot of Google Assistant answer
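
For illustration, here’s roughly what returning a Table from fulfilment looks like using the actions-on-google Node.js library (v2) - a minimal sketch with a made-up intent name and placeholder results, not my actual fulfilment code:

// Sketch only - intent name and result data are hypothetical
const { dialogflow, Table } = require('actions-on-google');

const app = dialogflow();

app.intent('LatestResults', (conv) => {
  // A simple spoken response must accompany the rich Table card
  conv.ask('Here are the latest Halesowen Town results.');

  conv.ask(new Table({
    title: 'Latest results',
    columns: [{ header: 'Opponent' }, { header: 'Score' }],
    rows: [
      { cells: ['Stourbridge', '2-1'], dividerAfter: true },
      { cells: ['Alvechurch', '0-0'], dividerAfter: false },
    ],
  }));
});

module.exports = app;

Note that conv.ask is called twice - the spoken/simple response has to be there before the library will accept the rich Table response.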

Nice development environment

Both the Actions on Google console (used for configuring and testing your action), and the Dialogflow browser app (used for configuring your action intents) are really nice to use.

Amazon has much improved its developer tools recently, but this is definitely a slight win for Google for me. In particular, for simple actions/skills Dialogflow makes it easy to program responses without needing to write any code.

Using machine learning rather than fixed grammars to match questions to intents

Google states it’s using machine learning to build models that match questions to your stated intents, whereas Amazon expects you to be specific in stating the format of the expected phrasing.

Now from my limited testing - and since I’m basically implementing the same responses on both platforms - it’s hard to say how much better this approach is. However, assuming Google are doing a good job (and with their ML skills it’s fair to assume they are!), this is definitely a better approach.

Allowing prompts for missing slot values

Google has a really nice feature where you can specify a prompt for a required slot, used when it has matched an intent but not been able to parse the parameter value.

For example, one of my intents is a query like “How did we get on against Stourbridge?” where Stourbridge is an opposition team matched from a list of possible values.

Amazon won’t find an intent if it doesn’t make a full match, but on Google I can specify a prompt like “for what team?” if it makes a partial match but doesn’t recognise the team name given, and then continue with the intent fulfilment.
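
The prompt itself is configured on the required parameter in the Dialogflow console rather than in code, but for illustration, here’s a rough sketch (with made-up intent and parameter names) of what the fulfilment sees once the slot has been filled:

// Sketch only - intent and parameter names are hypothetical.
// By the time this handler runs, Dialogflow has either matched the team name
// from the original utterance or used the "for what team?" prompt to get it.
const { dialogflow } = require('actions-on-google');

const app = dialogflow();

app.intent('TeamResult', (conv, params) => {
  const team = params.Team; // always populated thanks to slot filling
  conv.ask(`Here's how we got on against ${team}...`);
});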

Actions on Google Disadvantages

Couldn’t parse “Yeltzland” action name

A very specific case, and not a common word for sure, but Google speech input just couldn’t parse the word “Yeltzland” correctly. This was very surprising, as I’ve usually found Google’s voice input to be very good, but it kept parsing it as something like “IELTS LAND” 😞

You also have to get specific permission for a single-word action name - not really sure why that is - so I’ve had to go with “Talk to Halesowen Town” rather than my preferred “Talk to Yeltzland” action invocation. It all works fine on Amazon.

SSML not as good

A couple of my intents return SSML rather than plain text, in an attempt to improve the phrasing of the responses (and add in some lame jokes!).

This definitely works a lot better on the Echo than on Google Assistant.
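
For reference, the responses in question are just standard SSML strings along these lines (the content here is a made-up example, not one of my real responses); Alexa takes it in an outputSpeech object of type SSML, while the actions-on-google library accepts a string wrapped in <speak> tags:

// Hypothetical example response - not the skill's real output
const ssml = '<speak>' +
  'Halesowen Town 2 <break time="300ms"/> Stourbridge 1. ' +
  '<emphasis level="strong">Up the Yeltz!</emphasis>' +
  '</speak>';

// Alexa: return it as SSML output speech in the response
const alexaOutputSpeech = { type: 'SSML', ssml: ssml };

// Google Assistant: the actions-on-google library treats <speak>-wrapped strings as SSML
// conv.ask(ssml);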

What about Siri?

All this emphasises how far Siri is behind the other voice assistants right now.

Siri is inconsistent on different devices, often has pretty awful results understanding queries, and is only extensible in a few limited domains.

I really hope they announce some big changes at next week’s WWDC 2018 - maybe some integration with Workflow as I hoped for last year - but I really don’t hold out much hope any more that they can make significant improvements at any sort of speed. Let’s hope I’m wrong.

Conclusion

As you can tell I’m really impressed with Google’s offering here, and it definitely seems slightly ahead of Amazon in offering a good environment for developing voice assistant apps.

In particular, having good mobile apps offering the chance to return rich visual information alongside the voice response is really powerful.

My “Halesowen Town” action is currently in review with Google (as of May 30th, 2018), so all being well should be available for everyone shortly - look out for the announcement on Twitter!

P.S. If you are looking for advice or help in building out your own voice assistant actions/skills, don’t hesitate to get in touch at johnp@bravelocation.com

Local Testing of an Alexa Lambda function

How I set up some simple unit tests for my Alexa Lambda function, so I could refactor it with confidence without breaking my skill

After watching last week’s fascinating Google I/O Conference, I’ve been thinking about porting my Yeltzland Alexa Skill to Google Assistant.

The Alexa Skill runs as an AWS Lambda function, and as it was my first attempt at writing a skill the code wasn’t particularly well designed for reuse.

Therefore I thought it was a good idea to find out how to:

  1. Run AWS Lambda code locally
  2. Write some unit tests against the code to check it’s running correctly
  3. Refactor the code to extract the reusable business logic into a separate module, ready for reuse (using the unit tests to check I haven’t introduced any regressions)

Running AWS Lambda code locally

There are some great tools from Bespoken that make it pretty easy to run your AWS Lambda code locally.

The steps are as follows:

  1. Install the tools globally with npm install bespoken-tools -g
  2. Start the proxy server by running bst proxy lambda index.js where index.js is your Lambda code module

This sets up the Lambda function listening on http://localhost:10000 for requests.

Writing unit tests against the Lambda code to check it’s running correctly

Firstly, I wrote a simple test harness that would build some JSON in the same format as an Alexa request, POST it to the proxy server set up as above, and check the response.
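
A minimal sketch of that harness (not my exact code - the request fields are trimmed down to the parts the skill actually cares about) might look something like this:

// testHarness.js - builds an Alexa-style IntentRequest and POSTs it to the bst proxy
const http = require('http');

function buildIntentRequest(intentName, slots) {
  return {
    version: '1.0',
    session: {
      new: true,
      sessionId: 'test-session',
      application: { applicationId: 'test-app' }
    },
    request: {
      type: 'IntentRequest',
      requestId: 'test-request',
      locale: 'en-GB',
      intent: { name: intentName, slots: slots || {} }
    }
  };
}

function callSkill(requestBody) {
  return new Promise((resolve, reject) => {
    const payload = JSON.stringify(requestBody);
    const req = http.request({
      hostname: 'localhost',
      port: 10000,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload)
      }
    }, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(JSON.parse(body)));
    });
    req.on('error', reject);
    req.write(payload);
    req.end();
  });
}

module.exports = { buildIntentRequest, callSkill };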

My skill uses dynamic data (my football team’s fixtures and results) that changes over time, so for my unit tests I just wanted to check the first part of the response - generally the non-dynamic part.

This was sufficient for my refactoring efforts, and I didn’t want to go to the effort of mocking the data requests part of my code right now.

I then wrote some simple Mocha unit tests to call each of my skill’s intents, and verify the response was basically as expected.
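
Using the harness sketched above, a test looks roughly like this (the intent name and expected phrase are placeholders rather than the skill’s real ones) - note it only asserts on the fixed part of the speech, not the changing fixture data:

// test/skill.test.js - run with "npm test"
const assert = require('assert');
const { buildIntentRequest, callSkill } = require('../testHarness');

describe('Yeltzland skill', function () {
  this.timeout(5000); // allow time for the proxied Lambda to respond

  it('answers the latest score intent', async function () {
    const response = await callSkill(buildIntentRequest('LatestScoreIntent'));
    const outputSpeech = response.response.outputSpeech;
    const speech = outputSpeech.ssml || outputSpeech.text;

    // Only check the non-dynamic part; the score itself changes over time
    assert.ok(speech.indexOf('latest score') !== -1);
  });
});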

Refactoring the code

Adding the following sections to my package.json file makes it easy to simply run npm test to run all of the unit tests:

"devDependencies": {
    "bst": "0.0.1",
    "mocha": "^5.1.1"
  },
  "scripts": {
    "test": "mocha"
  }

I then moved all of my business logic into a separate yeltzland-speech module, checking the tests still passed after each change. I’m pretty confident I didn’t introduce any problems, even though the code has been significantly refactored.
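
The shape of the refactoring is roughly as follows - a sketch rather than the real yeltzland-speech code, with made-up function and data names - where the speech-building logic becomes platform-neutral and the Lambda handler just wires it up to Alexa:

// yeltzland-speech.js - shared, platform-neutral business logic (sketch only)
exports.latestScoreText = function (fixtures) {
  const latest = fixtures[fixtures.length - 1];
  return 'The latest score is ' + latest.home + ' ' + latest.homeScore +
         ', ' + latest.away + ' ' + latest.awayScore;
};

// index.js - the Alexa Lambda handler stays thin and just delegates
// const speech = require('./yeltzland-speech');
// ... inside the intent handler, after fetching the fixture data:
// const answer = speech.latestScoreText(fixtures);
// this.emit(':tell', answer);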

The code

If you are interested in the code described here, it’s all at GitHub at https://github.com/bravelocation/yeltzland-alexa/

Privacy Policy for My Apps

In an ideal world I'd like to transition to a fully self-hosted analytics and notification service, but until such a solution exists, I can’t see any practical alternative to using offerings such as those provided by Google

N.B. I posted an update to this policy in June 2019

There has been lots of news recently about privacy, driven mostly by Facebook’s rather laissez-faire approach over the years, as well as the upcoming GDPR changes required by the EU.

I’m pretty happy with the general tone of GDPR, and although it adds overhead to the development process, I think forcing developers to be clear about their use of your data is a very good thing.

Therefore I thought it would be a good time to outline the approach I generally take on my mobile apps around data, and try to justify the trade-offs I’m making.

Crash monitoring using Crashlytics

It’s extremely useful to have a crash monitoring and reporting system in place, so I can see any problems as soon as possible. If an app is having non-critical but important problems, it’s good that I can diagnose and fix any issues promptly (especially now Apple’s review process is much quicker).

I also get basic usage figures (daily and monthly active users) from Crashlytics that are sufficient for what I need.

Crashlytics is part of the Fabric suite of apps that was part of Twitter and has recently been bought by Google. I really like the simplicity of both their integration process and their website.

Obviously Google - as with most of their development tools - give this away so they can get aggregated data about which apps are popular, presumably for search ranking and other corporate needs.

I also assume they do send enough information that allows them to piece together a (semi-)anonymous usage pattern for all the apps on a device that use Crashlytics, but that’s just idle speculation.

I think the trade-off is (just about) fine, especially as there is no open-source self-service alternative that I know about. I’d definitely be interested in such a solution if it was easy to install and maintain, but practically I don’t have the time or interest to roll my own. I’m definitely going to investigate this more though.

Google Analytics

A couple of my clients have existing Google Analytics solutions for their websites, so I’ve added GA tracking into their apps so they have a one-stop solution.

I’ll only do this if really necessary, as for me the Crashlytics-only solution outlined above is sufficient. I don’t want to capture more information than necessary. However, now that Crashlytics is owned by Google I’m not sure this policy makes sense, and I suspect the products will be merged together in the not too distant future.

Firebase notifications

For those apps that require remote notifications, I’ve moved over to using Firebase to manage this process.

Firebase offers a nice cross-platform solution for sending notifications, and again it’s something I really don’t want to build myself.

Firebase is another Google acquisition, so just about all the caveats I mentioned above for Crashlytics apply here also.

Summary

Clearly I’m heavily dependent on free Google services to help run and maintain my apps.

On iOS at least, Apple’s dedication to privacy means I think the trade-off is a reasonable one as the amount of personal information transmitted is reasonably restricted.

On Android, I’d assume that because the user is almost certainly logged into a Play Store account it’s easier for Google to join the dots on what you’re running on your device, but seeing as their Play Store data exposes that anyway it’s no additional change to your privacy.

In an ideal world I’d like to transition to a fully self-hosted analytics and notification service, but until such a solution exists, I can’t see any practical alternative.

Let me know on Twitter via @yeltzland if you know of a good alternative solution!

Linking to a Swift Framework in Xamarin

It's a right fiddle to link with a Swift framework in Xamarin. Here's an example of how I did it

I recently had to add a Swift framework into a Xamarin iOS app, and it was really complicated.

There are multiple web pages that try to explain how to do this, but none of them matched my eventual solution. Therefore I thought it would be useful to share what I did, in case it’s of some use to someone wrestling with the same problem.

The problem

A client I’m working with wanted to integrate the Visa Checkout SDK into their Xamarin iOS app.

Visa have pretty good documentation on how to do this for a native iOS app, but the latest version of their SDK is written in Swift.

Now Xamarin has a tool called Objective Sharpie which lets you import an Objective-C library, but it doesn’t natively support Swift frameworks.

However with quite a bit of effort you CAN get this to work. This is what I did …

N.B. For other frameworks and/or setups these exact steps may not work for you! Take what you can from these instructions and good luck (you’ll need it!)

Instructions for binding the VisaCheckoutSDK framework

  1. Install Objective Sharpie - instructions here
  2. Set up a new binding project in Visual Studio for Mac, using Add -> Add new Project … -> iOS -> Library -> Bindings Library on your existing solution
  3. Download the VisaCheckoutSDK via Cocoapods:
    • Make a new directory
    • Run the command sharpie pod init ios VisaCheckoutSDK.
    • This uses Objective Sharpie to setup a pod directory to download the latest SDK code.
  4. Hack the info.plist file in the downloaded pod to make it look like it was built by the version of Xcode Objective Sharpie uses:
    • Run sharpie xcode -sdks and note the SDK used for iPhone builds e.g. “iphoneos11.2”
    • In the info.plist file in the downloaded VisaCheckoutSDK pod, change the DTSDKName to the value from above e.g. “iphoneos11.2”
  5. You can now generate the C# binding files used in the binding project you set up earlier by running sharpie pod bind
  6. Overwrite the ApiDefinition.cs and StructsAndEnums.cs files in the binding project with the files generated in the previous step

The generated files won’t build out of the box, so they need editing to get them to work. This is what I did to make them usable:

  • Changed enums in StructsAndEnums.cs to be long
  • Added Namespace of VisaCheckoutSDK.Touch to generated files
  • Fixed up warnings by commenting out duplicate implementations in generated code i.e. those that have the same parameters etc.
  • Removed empty generated interfaces
  • Commented out “using VisaCheckoutSDK” in ApiDefinition.cs - not 100% sure why this was necessary!

Hopefully you can then build the binding project, and hence use it in your Xamarin iOS app.

One thing to note is I had to add multiple Xamarin.Swift3.* packages in the main app to allow the Swift VisaCheckoutSDK to run. Without the correct Swift packages, the app failed to load after the splash screen on startup.

It was very difficult to consistently get the error message for the missing Swift packages, as the app often crashed before the debugger could attach and capture the error message in the logs. I ended up adding all the Swift packages I could find, and then removing them one by one until the app crashed again. Clearly not a great way to do this :(

Summary

This was insanely hard, and took me a good couple of days to figure out all these steps.

Having to do this is definitely a big downside of using Xamarin versus native iOS solutions, and has definitely made me think again about when to use Xamarin for cross-platform projects.

Cross Platform Mobile Development Options

What technology to use when developing multi-platform apps? It's complicated

Over the last 12 months or so I’ve been lucky (probably!) to use a whole bunch of different technologies to build mobile apps.

I’m often asked about what’s the best strategy for efficiently building cross-platform apps, so I thought it might be interesting to look at the pros and cons of each approach to clarify my thoughts a bit.

TL;DR: It’s complicated, and depends on your business needs.

(New post: Read how my thoughts have changed in 2021)

Challenges

I don’t think it’s too controversial to state that apps written using native SDKs generally offer better user experiences than those written using cross-platform technologies.

Any solution that reuses the same code on both platforms by definition can only use common elements, and therefore can’t fully utilise all native elements on an individual platform.

In particular, any solutions that use a browser control and web technologies just don’t feel as slick as those using native SDK elements.

For most businesses, when they want a mobile app they want both an iOS and Android app.

It goes without saying they want to reduce the cost of doing this, but the UI quality they are happy with is a key factor in deciding what is the appropriate solution.

Cordova/Ionic

I inherited a project using Ionic, a JavaScript/Cordova/Angular framework that - as their website states:

… emulates native app UI guidelines and uses native SDKs, bringing the UI standards and device features of native apps together with the full power and flexibility of the open web

Once I’d got back into the slight craziness of Angular, it was quite a nice way to develop. In particular, the app worked pretty much identically on both iOS and Android without any extra effort.

The app is a pretty simple database-driven solution that presents information and checklists for workers when they are on a remote site. In this case functionality is much more important than a superslick UI.

Usability was definitely more “functional” than “crafted” (although well designed), but TBH, despite a lot of effort spent polishing the code, you can tell it’s a web app underneath.

Animations and transitions don’t feel native, and other UI elements aren’t quite right either - not surprising as they’re being emulated in a browser control.

For this particular app, using Cordova and Ionic was actually a decent solution. Development costs were definitely reduced compared to building two native apps, and the trade off with reduced UI quality was probably the correct one.

Xamarin

Despite having many years experience using C#/.Net I’d never considered using Xamarin before, mainly because it was quite expensive at first before Microsoft bought it a few years ago.

I inherited an app using Xamarin that needed completing, and I was pleasantly surprised how productive it was. All of the business and data logic could be shared across platforms, which was a great saving of both development and testing time.

The app downloads lots of information from a web-based CMS to display to the user, as well as allowing them to build up a local photo gallery.

The downloaded info is stored in a local Realm database, which has a nice cross-platform Xamarin implementation.

Each platform has its own native UI code (which obviously means it was written twice). However, this isn’t quite as inefficient as it sounds, as much of the code is simply binding UI elements to data objects and watching observable events in the shared data library.

The main non-reusable work is designing the native layouts for each platform. This does mean you can design the best UI for each platform, and in consumer-focussed apps this may be an acceptable trade-off.

For apps where the user experience is important, and there is a lot of business/data logic that can be shared, I’d definitely recommend Xamarin as a good solution.

I should also mention there is a “Xamarin Forms” option not used here, a cross-platform UI solution that uses native elements - obviously only UI elements common across both platforms. In some cases this may be an even better solution, especially for simple UIs.

Separate native apps

For a few of my clients I’ve built two separate versions of the same app for iOS and Android, using the native SDK for each.

Clearly this means none of the code is reused, but I don’t think it’s necessarily as inefficient as it sounds.

In both cases I built most/all of the iOS app first, which meant the UI/navigation/business logic could be iterated on until everyone was happy with it. Then once all of the problems had been solved, it was “simply” a case of reimplementing on Android - a task that was really easy (if a little boring) as it was very well defined what needed to be done.

As a small team of one, this is actually a pretty efficient way of working, and probably wasn’t much slower than the Xamarin solution given above. This is especially true for apps where most of the work is in crafting nice UIs rather than writing complicated business/data layer code.

Other advantages of building separate fully native apps include being the quickest way to adopt new technologies, plus the code is written in a way that makes it easier to find developers to maintain and extend it.

Other options

Clearly there are other cross-platform solutions out there - in particular React Native looks very interesting - but as I haven’t really used them I won’t comment on them here.

Summary

As an experienced C# developer, for projects where I get to choose the tech stack I’ll definitely consider using Xamarin again. Having native UIs and reusable business/data code is probably a good compromise in many circumstances. If you have existing .Net development skills, I’d definitely recommend looking into going down this route.

For larger teams, or on consumer-focused projects where UI quality is important, building native apps per platform can result in the highest quality experience, even if not the cheapest solution. Finding developers with native skills on each platform is probably easier too.

Using Cordova/web-based technologies can be the cheapest way of building cross-platform mobile apps - especially if you already have those skills in house - but there is definitely a drop-off in the UI quality compared to native solutions.