Saturday, August 23, 2008

Sync Status Page Goes Live

One of the challenges for us in supporting iPhone and iPod Touch users is that the devices give absolutely no feedback to the user about their sync. Users don't even get to know whether it worked or not! The whirring sun thing rotates a bit, and then all is silent. We wanted to provide a higher quality user experience. Since our web site has access to any error information logged when your device tries to sync (we already use this information when answering support tickets), the obvious next step was to provide a way to see it on our web site. This feature has been in development for a while and finally made it onto the live site earlier today. Here's the page. If an error has been logged, the page will display information about how to fix it. There is a simple expert system that understands how to diagnose the most common errors. If it doesn't understand the error then you will be advised to contact our support team.
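
For readers curious what a "simple expert system" of this kind can look like, here is a minimal sketch in Python. It is an illustration only, not the code running on the site: the error codes, the advice strings, and the `diagnose` function are all hypothetical names made up for the example.

```python
from typing import Optional

# Hypothetical rule table: (error code, advice shown to the user).
# The real error codes and wording are not described in this post.
RULES = [
    ("EMAIL_SYNC_ENABLED",
     "Turn off e-mail sync on your device, then try syncing again."),
    ("ACCOUNT_ACCESS_DENIED",
     "Click on 'setup', then on 'Request Account Access', and follow the instructions."),
]

NO_ERROR_MESSAGE = "No errors have been logged for your device."
SUPPORT_ADVICE = "We could not diagnose this error. Please contact our support team."


def diagnose(error_code: Optional[str]) -> str:
    """Return advice for the most recent logged error, if there is one."""
    if error_code is None:
        return NO_ERROR_MESSAGE
    for code, advice in RULES:
        if code == error_code:
            return advice
    # Unknown error: fall back to human support.
    return SUPPORT_ADVICE
```

The useful property of a table like this is that adding a newly understood error is a one-line change, and anything not in the table falls through to the support team.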

Thursday, August 21, 2008

Nokia Configuration Instructions

Although we don't have any Nokia devices ourselves, our server logs indicate that several users have been using them with the service. Until now it was something of a mystery how they'd configured their phones. I'm very happy to see that one of our users has spent the time to write up the details in this article. One thing we're not sure about is whether all Nokia phones are similar, so if you have experience with other models we'd be interested to hear about it.

Tuesday, August 19, 2008

Sync for all Google Apps users

After a few weeks in which some users reported that sync didn't work, and mentioned in passing that they were Google Apps users, we have finally tracked down the cause. Thanks to one of our more persistent users, who diagnosed the key issue and let us know his findings. Google Apps users who were seeing a 'red light' in the status page on our site should be able to get to green by simply requesting access to their Google account again. Click on 'setup', then on 'Request Account Access', and follow the instructions. If you have more than one Google account, please make sure you have logged in to Google using the right one (our red/green test checks this, so if you have green you're good to go).

Wednesday, August 13, 2008

Calendar status now available on the web site

One of the major causes of confusion for our users, and consequently of support e-mails sent to us, is that we don't sync all your Google calendars. More accurately, we sync only calendars that you have permission to write to and that have timezone information.
The rationale for not syncing calendars that you can't update was that this could lead to devices becoming out of sync with Google. You could change an event from one of these calendars on your phone, but we would not be able to update the event in Google's system. We recently added code to support one-way sync for read-only calendars but found in trials that users didn't always want those events on their devices. So the feature is on hold until we have the capability for users to select which calendars are synced.
We don't sync calendars that lack timezone data for the simple reason that we need this information in order to create the responses we need to send to your phone. Without the timezone we'd either need to guess it, or not sync the calendar. The choice made was to skip those calendars. You can't create a calendar without timezone data using Google's web site, but we have seen cases where third-party applications and sync tools have done so.
A new feature on our site today gives users the ability to see which calendars will sync and the reason for any non-syncing calendars. Simply visit this page to see your list.
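
The two rules above are simple enough to sketch in a few lines of Python. This is a hedged illustration rather than the site's implementation: the `Calendar` class, its field names, and the reason strings are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Calendar:
    # Hypothetical, simplified view of a Google calendar.
    name: str
    writable: bool
    timezone: Optional[str]


def sync_status(cal: Calendar) -> Tuple[bool, str]:
    """Return (will_sync, reason) for one calendar."""
    if not cal.writable:
        return False, "read-only: changes made on the device could not be saved to Google"
    if not cal.timezone:
        return False, "no timezone information"
    return True, "will sync"


def calendar_report(calendars: List[Calendar]) -> List[Tuple[str, bool, str]]:
    """Build the per-calendar list a status page could display."""
    return [(cal.name, *sync_status(cal)) for cal in calendars]
```

Returning the reason alongside the yes/no answer is what lets a status page explain each non-syncing calendar instead of just hiding it.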

Monday, August 11, 2008

The Case of the Missing Phone #'s - Closed

Over the last few days we'd had several reports from users that while their contacts were synced over to their phones, they arrived sans any phone numbers.  Other attributes seemed to be communicated OK.

We have very good logging of the conversion process in our system, and though for a few reasons we don't retain logs for long, we were able to look through and spot the issue pretty quickly.

The devices have a common set of attributes which they support.  The basic set of phone attributes is two home phone numbers, one mobile phone number, two work numbers, and then a handful of less common numbers such as radio phone, assistant's phone, etc.

Google, on the other hand, supports fewer categories of phone but a greater total number; someone could have five home phones, for example.  This difference is where our phone mapping system comes in.  The problem these users had encountered was that phones which lacked any category at all, or which were marked as 'Other' at Google, were being skipped entirely.

We are still working with these users to track down how these contacts were created. It isn't possible to add a phone number with no category through Google's own site, so it is pretty curious. In the meantime we have deployed a fix where these phones are held over in a special list until we finish mapping all the categorized numbers.  If there are any free slots left in the main five (mobile, home 1 & 2, and work 1 & 2) at that time, we start using them to hold these uncategorized numbers.
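
The hold-over approach can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the deployed code: the slot names, the category strings, and the order in which free slots are filled are hypothetical, the less common device attributes (radio phone, assistant's phone, etc.) are left out, and a categorized number that finds no free slot of its own category is simply dropped here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical names for the five main device slots.
MAIN_SLOTS = ["mobile", "home1", "home2", "work1", "work2"]
SLOTS_BY_CATEGORY = {
    "mobile": ["mobile"],
    "home": ["home1", "home2"],
    "work": ["work1", "work2"],
}


def map_phones(phones: List[Tuple[Optional[str], str]]) -> Dict[str, str]:
    """Map (category, number) pairs from Google onto device slots."""
    slots: Dict[str, str] = {}
    held_over: List[str] = []

    # Pass 1: place categorized numbers, holding over uncategorized ones.
    for category, number in phones:
        if category is None or category == "other":
            held_over.append(number)
        elif category in SLOTS_BY_CATEGORY:
            for slot in SLOTS_BY_CATEGORY[category]:
                if slot not in slots:
                    slots[slot] = number
                    break
        # Any other category is outside the scope of this sketch.

    # Pass 2: fill whatever main slots are still free with held-over numbers.
    free_slots = [slot for slot in MAIN_SLOTS if slot not in slots]
    for slot, number in zip(free_slots, held_over):
        slots[slot] = number
    return slots
```

For example, a contact with one home number, one mobile number, one number with no category, and one marked 'other' ends up with the home and mobile numbers in their usual slots and the other two in the second home slot and the first work slot.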

The fix has been deployed as of yesterday (08/10), and so far things are looking good, as affected users have started to report they now see the phone numbers they expect.

In testing, we did find one curious related circumstance worth mentioning. On an Apple device, if one sets a phone number to the Apple 'other' category (not the same as the Google 'other' category) it isn't sent to us at all, and consequently isn't synced to Google.  We'll dig into that one more deeply, but for now we recommend not using the 'other' category on your iPhone or iPod, or at least understanding that if you do, the number won't be synced over to Google.

Friday, August 8, 2008

Support ticket system

We've deployed RT to manage the support e-mail workload. We hope this will give users more consistent support service because every request is automatically assigned a ticket number and logged in our system when it's received. Any updates made to the ticket result in an e-mail sent to the requester. We can see at a glance any ticket that has not been resolved. This was not easy to do when we were looking at a busy mailbox.
Having done this we realized that we needed a new mailbox for non-support messages such as when users send us suggestions for new features, or just when they want to say hello. Therefore we've created the mailbox

Event calendar move consistency error fixed

One of the issues with data consistency mentioned a couple of days ago has been fixed. It turned out that if an event was moved from one calendar to another at Google, and at roughly the same time an event was changed on the user's device, the reconciliation processing was not done correctly. This resulted in a fatal error that blocked sync for the user. This problem is now resolved.

New server deployed last night

Our main sync server died yesterday evening. We haven't yet had time to analyze the failure but it seems likely that increasing load put the machine into a state where one or more kernel bugs were triggered. Rather than spend time trying to fix it, we instead deployed a new much bigger, faster server in its place. This took a little longer but we think it was worth the wait because this morning the service looks very healthy.

Wednesday, August 6, 2008

What are we doing today?

We have a few 'interesting' blog articles in the works, but since none of those are ready yet I thought I'd begin the blog by simply describing what we're up to today. Today is an important day because it's essentially the first day since the iPhone 2.0 launch that we haven't had some kind of developing crisis to deal with. No servers melting under load; no exponentially growing Internet traffic; we're not creating duplicate events in users' calendars. We've also been successful in reducing the support e-mail load by making improvements to our web site and configuration process.
So today we're focusing on the top three reasons users are currently not able to sync successfully. We know what the reasons are, and how many users are affected because we write a detailed problem record into a database whenever our servers encounter an error. In the past, we would find this information in our server log files. As our user population grew this became impractical (our servers generate 20 Gbytes of log data per day). The new database view we have onto user errors has allowed us to get a much better understanding of the range and frequency of problems.
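
To make the idea concrete, here is a small Python sketch of a problem-record table and a "top problems by affected users" query, using an in-memory SQLite database. The schema, the problem names, and the function names are all assumptions for illustration; the post does not say which database or schema is actually used.

```python
import sqlite3
from typing import List, Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical problem-record table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sync_errors (user_id TEXT, problem TEXT, logged_at TEXT)"
    )
    return conn


def record_problem(conn: sqlite3.Connection, user_id: str, problem: str) -> None:
    """Write one problem record at the moment the server hits an error."""
    conn.execute(
        "INSERT INTO sync_errors (user_id, problem, logged_at) "
        "VALUES (?, ?, datetime('now'))",
        (user_id, problem),
    )


def top_problems(conn: sqlite3.Connection, limit: int = 3) -> List[Tuple[str, int]]:
    """Return the most common problems, ranked by number of distinct users affected."""
    return conn.execute(
        "SELECT problem, COUNT(DISTINCT user_id) AS users "
        "FROM sync_errors GROUP BY problem "
        "ORDER BY users DESC, problem ASC LIMIT ?",
        (limit,),
    ).fetchall()
```

Counting distinct users rather than raw rows matters here, because a single device that retries a failing sync every few minutes would otherwise dominate the ranking.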
So what do we see? The number one problem that leads to persistent inability to sync turns out to be when the user has enabled e-mail sync on their phone. We don't support e-mail sync yet (perhaps these users really wish we did!) but we're working on it. At present our server just can't deal with a device that says it wants to sync e-mail. It throws an error and no further syncing can happen. Later in the day we are planning to compile a list of all the affected users and send them an e-mail asking them to turn e-mail sync off on their phones. They should then find that sync begins working.
The second and third most common problems that lead to non-working sync are to do with inconsistent data. Our server code performs consistency checks in a number of places. For example, it checks that when Google sends us calendar event changes, they don't give us two different changes for the same event. It turns out that sometimes they do! When this happens our server can't continue processing the sync request, and the result is no sync for YOU! We have some good clues about the cause and are working on a fix which we hope to deploy soon.
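
A consistency check of this kind can be sketched in Python as below. This is a minimal illustration under assumptions: changes are represented as plain dictionaries with a hypothetical `event_id` key, and the exception name is made up. It does not reflect the actual server code or the format Google uses.

```python
from typing import Dict, List


class InconsistentChangesError(Exception):
    """Raised when one batch holds two different changes for the same event."""


def check_changes(changes: List[dict]) -> Dict[str, dict]:
    """Index a batch of changes by event id, rejecting conflicting duplicates."""
    seen: Dict[str, dict] = {}
    for change in changes:
        event_id = change["event_id"]
        if event_id in seen and seen[event_id] != change:
            # Two changes that disagree: there is no safe way to pick one.
            raise InconsistentChangesError(
                f"conflicting changes for event {event_id}"
            )
        seen[event_id] = change
    return seen
```

In this sketch an exact duplicate is tolerated and only a genuine disagreement stops processing, which mirrors the "two changes (that are different)" condition described above.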
The last of the three most common problems also relates to data consistency. What happens is that the user's device sends us changes to events, but there is no 'server id' for those events. This is logically impossible, yet it happens for a few users. We have a few theories about this one and hope to understand the problem in more detail soon.