Issues this Morning: What, Why, and What-now

Disqus experienced a database error this morning that left a number of people unable to access the service. The outage lasted about 45 minutes, until we were able to get everything consistent again. While we don’t have a cute aquatic mammal for this situation, we do have this comparatively mundane (brief, but hopefully thorough) post describing what’s up.

Why it happened

Some of the heaviest hits on the database come from the widgets, particularly the “Popular Threads” widget. We use memcached for this relatively expensive query, but when the cache entry expires during a spike in traffic, every request that misses the cache runs the query at the same time. That is the dog-pile effect.

From highscalability.com:

Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. Now, what happens when your cached data expires and a 1000 requests simultaneously try to recalculate the expensive to calculate data? Database load spikes and the world nearly ends.

The world didn’t end, but it doesn’t make for a happy Friday morning.
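
As a rough illustration of the dog-pile effect (a sketch under stated assumptions, not Disqus's actual code), the Python below uses a plain dict as a stand-in for memcached and a hypothetical `popular_threads_query` function as a stand-in for the expensive query. The barrier models a query slow enough that every request has already checked the empty cache before the first recalculation finishes.

```python
import threading

cache = {}                      # stand-in for memcached (assumption)
query_count = 0                 # how many times the "database" was hit
count_lock = threading.Lock()


def popular_threads_query():
    """Hypothetical stand-in for the expensive database query."""
    global query_count
    with count_lock:
        query_count += 1
    return ["thread-a", "thread-b"]


def handle_request(barrier):
    # Naive read-through cache: check, and recalculate on a miss.
    value = cache.get("popular_threads")
    # Model a slow query: all requests have looked at the cache
    # before any of them has finished recalculating.
    barrier.wait()
    if value is None:
        value = popular_threads_query()
        cache["popular_threads"] = value
    return value


def simulate(num_requests):
    """Run num_requests concurrent requests against an expired cache."""
    global query_count
    cache.clear()
    query_count = 0
    barrier = threading.Barrier(num_requests)
    workers = [
        threading.Thread(target=handle_request, args=(barrier,))
        for _ in range(num_requests)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return query_count
```

Calling `simulate(50)` in this model runs the query 50 times, once per request, even though a single recalculation would have been enough to refill the cache.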

Why it won’t happen again

Ok, that’s a bit presumptuous. Let me restate that.

What we’re doing to prevent it from happening again

Scaling is not a problem we’re only beginning to address. It has always been a priority ahead of everything else. We’re building out a service that you can depend on, not just a product with fancy features. Fortunately, we know how to address the issues.

We are implementing MintCache, which will alleviate some of the woes around the dog-pile effect. This past week was spent scrutinizing and improving the core architecture. And by next week, we hope to finish scaling out the database servers.
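
For readers curious about the idea, here is a minimal Python sketch of a MintCache-style cache as it is commonly described: entries are kept a little longer than their nominal timeout, the first request to arrive after that timeout is told to recalculate, and everyone else keeps receiving the slightly stale value in the meantime. The class name `MintStyleCache`, the `grace_seconds` parameter, and the in-memory dict standing in for memcached are all assumptions made for illustration; this is not the MintCache source or the Disqus implementation.

```python
import time


class MintStyleCache:
    """Sketch of a stale-while-recalculating cache over a plain dict."""

    def __init__(self, grace_seconds=30, clock=time.time):
        self._store = {}            # key -> (value, soft_expiry, hard_expiry)
        self._grace = grace_seconds
        self._clock = clock         # injectable so the sketch is testable

    def set(self, key, value, timeout):
        now = self._clock()
        # Keep the entry past its nominal (soft) expiry so that stale
        # data is still available while one caller recalculates.
        self._store[key] = (value, now + timeout, now + timeout + self._grace)

    def get(self, key):
        entry = self._store.get(key)
        now = self._clock()
        if entry is None or now >= entry[2]:
            # Never cached, or past even the grace period: a real miss.
            self._store.pop(key, None)
            return None
        value, soft_expiry, _hard_expiry = entry
        if now >= soft_expiry:
            # First caller after the soft expiry: re-store the stale value
            # for one grace period so later callers are served from cache,
            # and report a miss so only this caller recalculates.
            self._store[key] = (value, now + self._grace, now + self._grace)
            return None
        return value
```

A caller treats `None` as the signal to run the expensive query and call `set` again. Against a real memcached server the get-then-set step is not atomic, so a handful of requests may still slip through together; the technique shrinks the pile rather than guaranteeing a single recalculation.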

With the new help of Andrew and Devin, we’ll be building out Disqus faster than ever. I’m very proud of what the team has been up to lately and I hope you all will be able to directly experience the results.

Communication

You rely on us to provide a vital component of your blog or website. We don’t ever forget that and we realize we let many of you down this morning. I apologize. As always, if you don’t feel comfortable with Disqus, you can export your data from us. We are working hard to make sure you feel comfortable with our service, so please voice your concerns.

We’re listening. We monitor tweets, email, and blog posts. So let us know the good and bad. Just like you, we want Disqus to be great.


Daniel on June 6th 2008 in disqus

  • wallace530
    great post sir..
    thanks for sharing. really helped a lot here.
    --------------------------------------------------
    Ugg Boots | Uggs
  • Dear Disqus,

    Thank you for the very useful commenting system you've made.
  • linda999
  • warsaw
    good
  • ditto!
  • Amit
    Just wanted to make sure as to how the comments appear.
  • I have seen this happen on blogger too. Had to disable comments completely to take care of it.
  • Excellent Communication Guys! Keep it up!
  • Excellent communication! More business could take a lesson from you.
  • Hi, personally, when I first joined I already expected downtimes. I've already thought about well, what you've explained. It's almost impossible not to experience those spikes and not to cause downtimes especially for a service like yours (which is growing exponentially), and a startup.

    I'd be afraid if you don't experience that :p you know what I mean? hehe.

  • Good to hear the update. Cheers @Niclas, and his comment.
  • Disqus is doing a great job, I totally understand and sympathize with your scaling dilemma. In fact, please keep us informed about your solutions; scaling is a common problem and a lot of us would like to learn more about how different products deal with it.
  • thanks for the quick turnaround and detailed explanation. fwiw, i'm keeping my disqus comments.
  • Question: does the API plugin revert to standard WP comments when Disqus is down? I ask only because all of a sudden I've got WP comments again on new posts, and I haven't touched the plugin?
  • Hi Duncan -- could you show me where this happened? I don't see a case of this on Inquisitr.com after a cursory glance.

    In the case of Disqus not being responsive, WP comments do appear. We'd like to handle this more gracefully with Version 2, consolidating comments from both systems when possible.
  • Mighty fine job guys, good look!
  • These things happen. Thanks for figuring it out and working on it so it's less likely to happen again. And thanks for replying to my email.

    How come J. Phil gets a T-Shirt? ;-)
  • playitcool
    Dear Disqus,

    Thank you for not sucking.
  • I'm as impressed with your explanation as I have been your quick responses to questions posed. It's good to hear you guys have the situation firmly in hand.
  • Sorry, that was me above. :)
  • Tom_Fishman
    Great job getting back up and running quickly guys, and thanks for the communication!
  • Nice job communicating the situation. I have confidence that your team will take the necessary steps to prevent this from happening again.
  • Thanks, Daniel.
  • You're still cool with me Disqus!
  • Since Daniel is too nice to say it, I will:

    Remember it's only beta, and that Disqus has just recently started.

    And a good beta at that.
  • next time you have a problem like this, just twitter about it. haha.
  • I'm still sticking with you guys, for many reasons including the hopes that, one day, I will get a T-Shirt.
    By the way, MintCache looks sweet.
  • Thanks for the explanation, from a very near-future customer :)
  • Awesome, thanks for the recap & reassurances!