Saturday, December 20, 2008

BlackBerry Support

No, not an early April Fools' joke. It's possible to use a BlackBerry device with Nuevasync. An intrepid user told us that they were able to sync (apparently without trouble) using Astrasync. Now, we're not exactly sure if this is a useful capability for BlackBerry users, since there's already a supported Google sync solution, but we thought it was at least worth mentioning here.

Friday, December 12, 2008

Google Deletion Fix

A few users noticed trouble deleting entries, particularly appointments, since yesterday.  We tracked it down to Google's new library and found a spot where, if their HTTP server issues a redirect, some necessary information is lost when it retries, causing the operation to fail.  It is pretty rare, as it only affects deletion and only if there is also a redirect.

We made a fix and built the Google libraries ourselves, which we deployed a little bit ago.  So far we have not seen any similar messages in the logs and we are no longer able to reproduce the problem.
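For the curious, here is a minimal Python sketch of the general idea behind a redirect-safe retry. It is not Google's library code and not our actual patch; the `Request` type and the `follow_redirect` function are hypothetical names made up for illustration. The point is simply that the retry should copy everything from the original request and change only the URL.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; not a type from Google's library."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def follow_redirect(original: Request, location: str) -> Request:
    """Build the retry request for a redirect.

    Only the URL changes. The method, headers, and body are carried over
    from the original, so a DELETE stays a DELETE and keeps whatever it
    was originally sent with.
    """
    return replace(original, url=location, headers=dict(original.headers))
```

A retry built this way cannot silently drop part of the original request, which is the class of failure described above.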

Happy syncing!

Google contact sync problem affecting some users

As noted in the comments on the article below, there is a problem that affects some users that arose when the new contact code was deployed last night. We're still investigating but it appears that users who set up their contact sync some time ago were assigned authentication tokens that do not work with the new Google contacts API. The problem is fixed by logging into our web site and re-requesting access to your Google contacts data. This results in a new token being assigned.
This problem does not affect all users; we suspect it affects only those who signed up several months ago. The real problem here is that our web site does not yet detect and diagnose the problem, and the 'green light' checker says that everything is ok. We will be fixing that later today once we figure out exactly how to detect the problem.
At present, if you had working contact sync yesterday, and now it's not working, and you see 'contact support' in the sync status page, and your account was set up a while ago, then please perform the request access procedure again.

Update: the web site now does detect and display this problem.

Thursday, December 11, 2008

Nix on the Suggested Contacts

Syncing only your "real" contacts has been one of our most requested features, and today it is here!  Google released version 2.0 of their API for contacts today.  With this new version we are able to discriminate between your real contacts and your suggested contacts (which include basically anyone you have ever sent an e-mail).

We worked hard to get this out to all our waiting users, and we are happy to say that as of about twenty minutes ago, suggested contacts are no more.

The feature is automatic for any suggested contacts which Google creates from now on.  However, if you already have suggested contacts on your phone we won't actively seek them out to delete them.  If you do want to get rid of them, you will need to resync your contacts (instructions), and in just a few seconds you'll be free of your suggested contacts.

Happy syncing!

Sunday, November 23, 2008

Service Restored

We believe that the stability issues we've been experiencing today have been resolved.

Service Instability Tonight

We're working on a thread leak bug that cropped up after the new SSL connection performance issues were fixed. This results in the web site becoming very slow, and syncing for any device that attempts to make a new connection will also be unreliable. This issue should be resolved before the morning.

Friday, November 21, 2008

New Connection Delay Problem Fixed

Over the past few weeks we've noticed a problem where new SSL connections to the service sometimes took tens of seconds to become established. This issue also affected the web site, since it requires https.
The problem has been identified as a bug in Apache Tomcat's APR connector. A workaround was deployed tonight. The result should be slightly quicker syncing and smooth web site page loads at the busier times of the day.

iPhone 2.2 Firmware Released

Apple released their iPhone/iPod Touch 2.2 firmware update today. We're excited about this because it includes a fix for the 'freezing iPhone' problem that some users have been experiencing. We'd like to hear from users who have updated their devices, particularly users who previously had experienced the freezing syndrome.

Monday, November 17, 2008

Plaxo Back Up

Plaxo have been kind enough to white-list our IP addresses on their high traffic blocker, while we resolve why we're overloading their API. We need to determine what level of traffic is acceptable to Plaxo and dial down our traffic to suit. So Plaxo service is back up at present.

Saturday, November 15, 2008

Plaxo Again Not Working

Plaxo sync has been down for several days, since we've not been able to resolve the issue with Plaxo. They say we're using their API excessively, but after a fair bit of work on our end (we added special rate-limiting code to our service, and used packet sniffing and log file data to analyze our API use) we don't see any evidence of excessive traffic. Until we can correlate what Plaxo are seeing with something we have control over on our end that we can fix, unfortunately Plaxo sync will be down.
Google contact sync is working well, and it's not hard to export contacts from Plaxo and import to GMail.
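For readers wondering what rate-limiting code can look like, here is a minimal Python sketch of a token bucket, one common way to cap outgoing API calls. This is an illustration only and not the code running in our service; the class name `TokenBucket` and its parameters are hypothetical, and the real limits that would satisfy Plaxo are not known to us yet.

```python
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if a call may go out now."""
        now = self.clock()
        # Refill in proportion to the time elapsed, never past capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check `allow()` before each API request and delay or queue the request when it returns False.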

Thursday, November 13, 2008

New Data Center

Service is back up, running in our new Silicon Valley colocation facility. The move to the coast gives us access to vastly increased bandwidth, redundant network paths, highly reliable power and plenty of space for expansion. Plaxo service is also currently working, since we have new IP addresses that Plaxo are not currently blocking.

Wednesday, November 12, 2008

Server Maintenance

We are making some improvements to our network.  The service will be down temporarily while they are completed.  The down time should be brief, and everything is scheduled to be back up shortly.

Sunday, November 9, 2008

Plaxo Sync Back Up

We've resolved the problems with Plaxo sync (which related to the high volume of traffic we were sending to their API). Plaxo users should be syncing now.
UPDATE: it's back down again. It looks like we tripped the excessive traffic limit at Plaxo again. We're investigating...

Monday, November 3, 2008

Plaxo Sync Down

Plaxo sync is currently not functioning.  Our servers are unable to communicate with their servers.  We are investigating the cause.  So far we do know that it is not our code and it is not our general Internet connectivity, but we are still looking into what is happening, and what we can do.  Stay tuned.

Sunday, October 26, 2008

iPhone Battery Life Tests

Last night I was able, for the first time, to test the theory that the iPhone consumes significantly more power when doing 'push' sync with our service. I left a fully charged iPhone on a shelf in a location with 5 bars ATT 3G service, and push enabled (which is the default configuration). This morning, after 12 hours on the shelf, the iPhone's battery level indicator was still at 'full'. At first I suspected that it wasn't actually syncing so I pulled the server log records for the device. These indicated that the phone had completed a 'ping' sync operation roughly every 7 minutes all night, which is exactly what we'd expect it to do. In previous investigations, when users have reported increased battery drain over night, the server logs showed exactly the same thing: normal regular pinging.
Although this is a single data point, it does tell me that whatever is leading to the reports we see of significantly increased battery drain is probably a function of the cell radio in the phone, rather than the sync client or some strange interaction with our service. Perhaps when it's in a location with marginal service it burns much more power sending packets or flipping back and forth between towers, for example.
I'm keen to dig deeper into this mystery and to do so we'll need reports from users who either do or do not see major battery drain from an otherwise idle iPhone that's using push sync. Please send any reports to support@nuevasync.com with subject 'iPhone Battery Drain Investigation'.

Tuesday, October 21, 2008

Update: Service Restored (was: ISP Down Hence Service is Down)

Update: our Internet service was restored a few minutes ago. Service is running normally now.

The ISP that provides connectivity to our servers is currently totally down. Yes, the entire multi-state ISP, affecting many cities and businesses. The outage began a couple of hours ago. They don't know the cause, nor how long it will take to restore service. My own investigation suggests that the problem is at their peering point with the outside world, not anywhere local to here. Needless to say we will be changing ISP as soon as we can. Apologies to our users. Service will be restored as soon as our ISP gets their act together.

Saturday, October 11, 2008

iPhone 2.1 Freezing Update

I'm sure many users are wondering what's been happening with this issue. Although we're not yet able to say that the problem is 100% resolved, some significant progress has been made. After a detailed analysis of server log records from devices belonging to users who reported freezing had happened, our engineering team were able to reliably reproduce the freezing problem.
There are two parts to the freezing syndrome: why the device freezes; and the conditions that led to it getting into the frozen state. On the first part, we believe that there is the potential for deadlock in the iPhone 2.1 sync software. We're also confident that the deadlocking problem will be fixed in the next iPhone software update. We don't know when that will be released.
Freezing seems to occur when a particular set of circumstances arises: a change is pending from Google, but the iPhone times out reading the change from our servers; then later, before the device has caught up with that missed change, a second change is made on the device. Having discovered the set of conditions that can lead to the device deadlock, we wondered if we could make changes to the service that would reduce or even eliminate the potential to trigger it. As a result new service code was deployed this past Wednesday. It makes sure that any changes from Google are flushed to the device soon after they are seen. The result is that any device that might have got into the pre-freezing state, where a change was missed due to a timeout, will no longer do so. Unfortunately devices that were already in that state before the new code was deployed can still freeze up. This is because our change only addresses the first stage towards freezing, not the second, which happens outside our control, on the device.
So far the results are encouraging. The number of users reporting new freezing episodes has dropped significantly. Evidence we are able to gather from server logs is also positive.
However, I don't feel that we can declare complete victory yet. There may be other conditions that can trigger the deadlock than the ones we have studied.
We'd like to determine the best method to un-freeze a device. So far only the 'Reset All Settings' method works reliably for us, although users have reported other methods working for them here (change Nuevasync password, turn on flight mode, etc). If you have thoughts on this please post comments.

Monday, September 29, 2008

iPhone Configuration Article on theiLife.com

Nuevasync user Keith Hobin from theiLife posted an excellent article today that has a step-by-step Nuevasync-on-iPhone configuration guide, including the first documentation we've seen on how to preserve your existing contacts and calendar events (the iPhone rather unhelpfully deletes them when you enable sync). We're grateful to Keith for taking the time to research and write the article. Thanks, Keith!

Sunday, September 28, 2008

Service was down for an hour or two

UPDATE: Service was restored at 12:21 MDT.

Service (including the web site) is currently down. The cause is a defective ethernet card on the router that feeds the servers. The backup router isn't responding so we need to get physical access to the server room to fix the problem. That will take a couple of hours, at which point service will be restored.
Update: actually the Ethernet port is fine, but the Linux kernel on the router has a bug that led to the interface being mis-named when the router booted. For reasons not yet understood, the router rebooted this morning and it appears to have picked up a new kernel at that point, with the bug in it. I'm updating its kernel now and that should fix the port problem.

Tuesday, September 23, 2008

Fixing Duplicates

Though not common, over time there have been users who have reported entries duplicated either on their device or at Google/Plaxo.  As we've made continued improvements to squash duplicates, we've kept in touch with those users.  But, we want to broaden access to that information and make sure all our users know first, where duplicates can come from, and second, how to deal with them.

By far the most common cause actually isn't a NuevaSync problem at all, but a bug in the Apple synchronization client.  The precise details are rather convoluted, but the quick version is that if an error ever happened when syncing, it was possible for the device to get confused and start adding entries it already had.

In response to that, we’ve written and deployed code that actively intercepts duplicates before they get committed.  It isn't possible to intercept 100% of duplicates this way, but the results so far have been excellent.  And since the code is in our server, it applies equally to devices of all types and versions: iPod, iPhone, Windows Mobile, Nokia, etc.
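As a rough illustration of what intercepting duplicates before commit can mean, here is a minimal Python sketch. It is not our server code: the function names `fingerprint` and `filter_duplicates` are hypothetical, and the matching rule (every field equal, ignoring case and extra whitespace) is an assumption chosen to keep the example short.

```python
def fingerprint(entry: dict) -> tuple:
    """Reduce an entry to a comparable key, ignoring case and extra spaces."""
    return tuple(
        (key, " ".join(str(value).split()).lower())
        for key, value in sorted(entry.items())
    )


def filter_duplicates(existing: list, incoming: list) -> list:
    """Return the incoming entries that match nothing already stored."""
    seen = {fingerprint(e) for e in existing}
    accepted = []
    for entry in incoming:
        key = fingerprint(entry)
        if key in seen:
            continue  # intercepted: looks like an entry we already hold
        seen.add(key)
        accepted.append(entry)
    return accepted
```

A device that re-sends an entry it already has would see that entry dropped here instead of committed a second time. A strict comparison like this one misses near-duplicates, which fits the caveat above that not every duplicate can be caught.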

Apple apparently noticed the problem as well, and another major step came with the release of the 2.1 firmware.  When we began testing the 2.1 beta releases a few weeks ago, we noticed an improvement in the devices' behavior; they did not create duplicates in the same circumstances as the 2.0 devices.  This is a great enhancement for Apple users, and though we are still a little reserved about 2.1 (cf. some of our earlier posts), we are very pleased with this particular fix.

While those changes (and several other, smaller ones we've made) work to prevent duplicates, what do you do if you already have them?  To answer that we have created some experimental tools to "dedupe" your account.  The tools for Google Contacts and Calendar have been available for some time, but today marks the release for Plaxo.  All these tools are specifically targeted at removing duplicates which might have been created through us.  If your service offers its own tool, you may want to use that instead.  Plaxo, for example, offers an advanced duplicate merging and removal tool which is well worth a look.  Our tools are experimental and by their nature designed to delete entries.  Make a backup first and exercise caution when using them!

They can be accessed at:
https://www.nuevasync.com/PublicSite/user/tools/dedupe.htm

Detection and removal can take a few minutes, so be patient when it runs.  If a lot of entries are removed, it is probably a good idea to resync your device afterward (cf. https://www.nuevasync.com/PublicSite/user/troubleshooting.htm), but that is entirely up to you.

By raw numbers, duplicates have never been common, but nevertheless they have been one of the most serious issues to us, and perhaps the most serious. With these new measures, we’re hoping most users will never have a problem.  If you do, e-mail us at support@nuevasync.com and we'll do everything we can to get it fixed and you back syncing.

As always, many thanks to our users!

Thursday, September 18, 2008

2.1 Firmware Investigations

Just a quick update, but following information provided by diligent users, we have some lines of investigation which we are pursuing on the freezing issue which can sometimes occur with iPhones or iPods running the new 2.1 firmware.  We've also managed to reproduce this ourselves on a test device, and are working on a consistent reproduction case to nail it down.

If you encounter this problem, please do e-mail support with the subject 'iPhone Freezing Returned'.  We've had a number of users send in reports, but more information is always good.  Please include some details about--as best you can recall--the last operations you were performing before the freeze.  That information is especially helpful.  Things like, were you adding an entry,  changing one, or not changing anything?  If you were changing one, was there anything unique about the entry?  We may not respond individually to each e-mail, but we are studying them.

And our thanks again to all the users who have sent us information.  Thank you!

Wednesday, September 17, 2008

More on the iPhone Freezing Syndrome

So far, unfortunately, we have not been able to identify any specific data returned by the service that might be upsetting the iPhone sync client. The best workaround we have is to use 'Settings\General\Reset\Reset All Settings' to clear the device sync state. This will also reset various other settings such as your icon positions and stock tickers. Most users who have used this method report success, although some have reported the freezing subsequently returns.
If you have had freezing return after device reset, please send us e-mail and mention 'iPhone Freezing Returned' in the subject so we can identify your message quickly. We want to check the server logs for those users to see if we can correlate us sending a specific event to their device with the onset of freezing.
Also, if anyone has the freezing problem and is also an iPhone developer, we'd be interested to see the log from your device (can be viewed using the XCode utility with the device docked).

Tuesday, September 16, 2008

iPhone 2.1 Freezing Syndrome

It would seem that all is not well with the 2.1 firmware update. We've had reports from several users (although by no means all users with the 2.1 update) that their phone freezes while attempting to sync. Reading the comments on the previous article and the support e-mail, it does appear that the problem can be resolved. We're working to determine the exact correct set of steps to un-freeze the 2.1 iPhone and will post it when it's ready. In the meantime it might be prudent to not update your phone/iPod to 2.1 just yet.
We were a little asleep on this issue unfortunately because although there were a few support tickets over the weekend that mentioned freezing, we had seen similar reports trickle in for the past couple of months (from users with earlier firmware) and so for a while we assumed this was just more of the same random iPhone hangs, and not something related to the use of our service. There were more reports on the problem, and more detail in the blog comments here than in the support e-mail, but we were focusing on the support e-mail.
Check back here for updates on this issue.

Saturday, September 13, 2008

iPhone 2.1 Firmware Update

A number of users have written to report that they had problems after updating to 2.1. There are no known problems with the 2.1 firmware and Nuevasync. However, the firmware update process seems to reset the sync state on the device which results in a 'resync' the next time the phone contacts our service. The resync operation is quite expensive and can take a while. Also at times a resync can fail if Google is busy. The users who reported problems subsequently let us know that their iPhones picked up sync again after a while. During the time between the firmware update and the next successful resync, your iPhone will lose its calendar events and/or contacts. Therefore we recommend updating your firmware at a time when losing sync for a time will not cause major inconvenience. If in doubt, check your device sync status.

Saturday, September 6, 2008

Scheduled Maintenance Complete

As mentioned in an update to the earlier notice, scheduled maintenance has been completed for all servers and the service has been turned back on to full capacity.

A Post on Service Quality Issues

In the interests of preserving the useful comments on this post (which was originally about a scheduled downtime required to replace defective hardware and to deploy a faster database machine), I'm leaving this post up forever.

The hardware upgrade appears to have been a great success. Server load is quite low and so far there are no reports of serious problems.


The post originally said :

Nuevasync service will be unavailable for a time today while a major hardware upgrade is done. The planned outage time begins at 11am MDT (18:00 UTC) and is expected to last two hours. During this time the web site will redirect to a maintenance page and no syncing will be done. iPhone/iPod users : PLEASE DO NOT DO ANYTHING. Don't 'resync'. Just leave your phones alone and they will pick up sync again when the service is restarted. If you resync, all your data will be deleted from the iPhone and it may take a long time for it to be synced back.
Update : the upgrade went well, and it turned out we had a bad hard drive (bad in a bad way in that it didn't report any errors but silently hung the system when certain sectors were read). Dealing with the drive issue (which means re-installing the OS on the machine since the old drive couldn't be imaged) will take a little longer than the expected down time. We should be up in about 45 mins time (2:30 MDT, 21:30 UTC).
In response to the people who commented that the old service status posts should not be deleted : well I agree in so far as I'd like the comments preserved, but on balance it seemed messy to leave all those rather quickly written and often in hindsight incorrect posts on the blog. What we really need is a page that a) stays up when our connection or servers are down and b) isn't a blog. Right now we don't have that in place so the blog got abused with the status posts. It's not ideal but we've taken the approach that you the users would rather we fix the problems than make things all pretty and neat.

Saturday, August 23, 2008

Sync Status Page Goes Live

One of the challenges for us supporting iPhone and iPod Touch users is that the devices give absolutely no feedback to the user about their sync. They don't even get to know if it worked or not! The whirring sun thing rotates a bit, and then all is silent. We wanted to provide a higher quality user experience. Since our web site has access to any error information logged when your device tries to sync (we already use this information when answering support tickets), the obvious next step was to provide a way to see it on our web site. This feature has been in development for a while and finally made it onto the live site earlier today. Here's the page. If an error has been logged, the page will display information about how to fix it. There is a simple expert system that understands how to diagnose the most common errors. If it doesn't understand the error then you will be advised to contact our support team.

Thursday, August 21, 2008

Nokia Configuration Instructions

Although we don't have any Nokia devices ourselves, our server logs indicated that several users have been using them with the service. Until now it was something of a mystery how they'd configured their phones. I'm very happy to see that one of our users has spent the time to write up the details in this article. One thing we're not sure about is whether all Nokia phones are similar, so if you have experiences with other models we'd be interested to hear about them.

Tuesday, August 19, 2008

Sync for all Google Apps users

After a few weeks where some users reported that sync didn't work, and by the way they were a Google Apps user, we have finally tracked down the cause. Thanks to one of our more persistent users who diagnosed the key issue and let us know his findings. Google Apps users who were seeing a 'red light' in the status page on our site should be able to get to green by simply requesting access to their Google account again. Click on 'setup' then on 'Request Account Access' and follow the instructions. If you have more than one Google account please make sure you have logged in to Google using the right one (our red/green test checks this so if you have green you're good to go).

Wednesday, August 13, 2008

Calendar status now available on the web site

One of the major causes of confusion for our users, and consequently of support e-mails sent to us, is that we don't sync all your Google calendars. More accurately, we sync calendars that you have permission to write to, and that have timezone information.
The rationale for not syncing calendars that you can't update was that this could lead to devices becoming out of sync with Google. You could change an event from one of these calendars on your phone, but we would not be able to update the event in Google's system. We recently added code to support one-way sync for read-only calendars but found in trials that users didn't always want those events on their devices. So the feature is on hold until we have the capability for users to select which calendars are synced.
We don't sync calendars that lack timezone data for the simple reason that we need this information in order to create the responses we need to send to your phone. Without the timezone we'd either need to guess it, or not sync the calendar. The choice made was to skip those calendars. You can't create a calendar without timezone data using Google's web site, but we have seen cases where third-party applications and sync tools have done so.
A new feature on our site today gives users the ability to see which calendars will sync and the reason for any non-syncing calendars. Simply visit this page to see your list.
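The two rules described above (the calendar must be writable, and it must carry timezone data) can be sketched in a few lines of Python. This is an illustration and not the code behind the status page; the function name `sync_status` and the field names `writable` and `timezone` are hypothetical.

```python
def sync_status(calendar: dict) -> tuple:
    """Return (will_sync, reason) for one calendar record.

    The keys "writable" and "timezone" are made-up field names used
    only for this example.
    """
    if not calendar.get("writable", False):
        return (False, "read-only: you do not have permission to write to it")
    if not calendar.get("timezone"):
        return (False, "no timezone information")
    return (True, "ok")
```

Running a check like this over each of a user's calendars gives the kind of list the new page shows: which calendars sync, and a reason for each one that does not.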

Monday, August 11, 2008

The Case of the Missing Phone #'s - Closed

Over the last few days we'd had several reports from users that while their contacts were synced over to their phones, they arrived sans any phone numbers.  Other attributes seemed to be communicated OK.

We have very good logging of the conversion process in our system, and though for a few reasons we don't retain logs for long, we were able to look through and spot the issue pretty quickly.

The devices have a common set of attributes which they support.  The basic set of phone attributes is two home phone numbers, one mobile phone number, two work numbers, and then a handful of less common numbers such as radio phone, assistant's phone, etc.

Google, on the other hand, supports fewer categories of phone but a greater total number; someone could have five home phones, for example.  This difference is where our phone mapping system comes in.  The problem these users had encountered was that phones which lacked any category at all, or which were marked as 'Other' at Google, were being skipped entirely.

We are still working with these users to track down how these contacts were created--it isn't possible to add a phone number with no category through Google's own site, so it is pretty curious--but we have deployed a fix where these phones are held over in a special list until we finish mapping all the categorized numbers.  If there are any free slots left in the main five (mobile, home 1 & 2, and work 1 & 2) at that time, we start using them to hold these uncategorized numbers.
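The two-pass approach reads more easily as code. Here is a minimal Python sketch of it, under stated assumptions: the names `MAIN_SLOTS`, `SLOTS_BY_CATEGORY`, and `map_phones` are hypothetical, only the five main slots are modelled (the less common ones such as radio phone are left out), and a categorized number that finds its slots full is simply dropped, which may not be what the real service does.

```python
MAIN_SLOTS = ["mobile", "home1", "home2", "work1", "work2"]

# Device slots each Google category may occupy, in order of preference.
SLOTS_BY_CATEGORY = {
    "mobile": ["mobile"],
    "home": ["home1", "home2"],
    "work": ["work1", "work2"],
}


def map_phones(phones: list) -> dict:
    """Assign (category, number) pairs to the five main device slots."""
    slots = {}
    held_over = []
    # Pass 1: place categorized numbers, holding back uncategorized ones.
    for category, number in phones:
        key = (category or "other").lower()
        if key == "other":
            held_over.append(number)  # no category, or marked 'Other'
            continue
        candidates = SLOTS_BY_CATEGORY.get(key, [])
        free = next((s for s in candidates if s not in slots), None)
        if free is not None:
            slots[free] = number
    # Pass 2: use any slots still free for the held-over numbers.
    for number in held_over:
        free = next((s for s in MAIN_SLOTS if s not in slots), None)
        if free is None:
            break
        slots[free] = number
    return slots
```

For example, a contact with one home number, one mobile number, and two uncategorized numbers would end up with the uncategorized pair in the second home slot and the first work slot.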

The fix has been deployed as of yesterday (08/10), and so far things are looking good, as affected users have started to report they now see the phone numbers they expect.

In testing, we did find one curious related circumstance worth mentioning. On an Apple device, if one sets a phone number to the Apple 'other' category (not the same as the Google 'other' category) it isn't sent to us at all, and consequently isn't synced to Google.  We'll dig into that one more deeply, but for now, we recommend not using the 'other' category on your iPhone or iPod, or at least understanding that if you do, the number won't be synced over to Google.

Friday, August 8, 2008

Support ticket system

We've deployed RT to manage the support e-mail workload. We hope this will give users more consistent support service because every request is automatically assigned a ticket number and logged in our system when it's received. Any updates made to the ticket result in an e-mail sent to the requester. We can see at a glance any ticket that has not been resolved. This was not easy to do when we were looking at a busy mailbox.
Having done this we realized that we needed a new mailbox for non-support messages such as when users send us suggestions for new features, or just when they want to say hello. Therefore we've created the mailbox info@nuevasync.com.

Event calendar move consistency error fixed

One of the issues with data consistency mentioned a couple of days ago has been fixed. It turned out that if an event was moved from one calendar to another at Google, and at roughly the same time an event was changed on the user's device, the reconciliation processing was not done correctly. This resulted in a fatal error that blocked sync for the user. This problem is now resolved.

New server deployed last night

Our main sync server died yesterday evening. We haven't yet had time to analyze the failure but it seems likely that increasing load put the machine into a state where one or more kernel bugs were triggered. Rather than spend time trying to fix it, we instead deployed a new much bigger, faster server in its place. This took a little longer but we think it was worth the wait because this morning the service looks very healthy.

Wednesday, August 6, 2008

What are we doing today?

We have a few 'interesting' blog articles in the works, but since none of those are ready yet I thought I'd begin the blog by simply describing what we're up to today. Today is an important day because it's essentially the first day since the iPhone 2.0 launch that we haven't had some kind of developing crisis to deal with. No servers melting under load; no exponentially growing Internet traffic; we're not creating duplicate events in users' calendars. We've also been successful in reducing the support e-mail load by making improvements to our web site and configuration process.
So today we're focusing on the top three reasons users are currently not able to sync successfully. We know what the reasons are, and how many users are affected because we write a detailed problem record into a database whenever our servers encounter an error. In the past, we would find this information in our server log files. As our user population grew this became impractical (our servers generate 20 Gbytes of log data per day). The new database view we have onto user errors has allowed us to get a much better understanding of the range and frequency of problems.
So what do we see? The number one problem that leads to persistent inability to sync turns out to be when the user has enabled e-mail sync on their phone. We don't support e-mail sync yet (perhaps these users really wish we did!) but we're working on it. At present our server just can't deal with a device that says it wants to sync e-mail: it throws an error and no further syncing can happen. Later in the day we are planning to compile a list of all the affected users and send them an e-mail asking them to turn e-mail sync off on their phones. They should then find that sync begins working.
The second and third most common problems that lead to non-working sync are to do with inconsistent data. Our server code performs consistency checks in a number of places. For example it checks that when Google sends us calendar event changes, that they don't give us two changes (that are different) for the same event. It turns out that sometimes they do ! When this happens our server can't continue processing the sync request, and the result is no sync for YOU! We have some good clues about the cause and are working on a fix which we hope to deploy soon.
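To make the check concrete, here is a minimal Python sketch of spotting two different changes for the same event in one batch. It is an illustration and not our server code; the function name `find_conflicting_changes` and the shape of the input (pairs of event id and change payload) are assumptions.

```python
def find_conflicting_changes(changes: list) -> set:
    """Return the ids of events that appear with two different changes."""
    first_seen = {}
    conflicts = set()
    for event_id, payload in changes:
        if event_id not in first_seen:
            first_seen[event_id] = payload
        elif first_seen[event_id] != payload:
            # Same event, different content: the batch is inconsistent.
            conflicts.add(event_id)
    return conflicts
```

An identical change repeated for the same event is harmless and is not flagged; only a pair of changes that disagree counts as a conflict.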
The last of the three most common problems also relates to data consistency. What happens is that the user's device sends us changes to events, but there is no 'server id' for those events. This is logically impossible, yet it happens for a few users. We have a few theories about this one and hope to understand the problem in more detail soon.