Saturday, January 15, 2011

How Klout plans to fight spammers

Last November I found a flaw in the Klout service and wrote the blog post "Can you trust Klout's accuracy?" Since then I have been in touch with Klout's CEO, Joe Fernandez, who was kind enough to grant me an interview. The interview is rather extensive, so thank you in advance for taking the time to read it. As always, any feedback you have for me is welcome in the comment section at the bottom.

 

Wesley:  I have a couple of questions, but before I get into those, do you want to talk about the changes and improvements you've made to Klout since my original post?

Joe:  Yeah, definitely. We put ourselves out there and we want to be the standard. That's a lofty claim, but we put our scores out there in public so that people can give us feedback. We see it as a positive that at least we're relevant enough for people to care, and we've built the team around wanting to be the best and to keep working on this.

In terms of your feedback specifically, which was about clearly spam accounts with high scores, that obviously doesn't make us look great, and it's something we take extremely seriously. And it's a challenge, because there are so many variations on what a bot is and what a spammer is. It's hard for us as a small team to handle, and even Twitter, with a big team, still struggles with it. But we definitely have somebody on the team with a background at Myspace, and Myspace is certainly no stranger to spammers. We're building classifiers that look at behavior, we've come up with a bunch of signals that we think identify spammers better than what we had before, and we've been slowly rolling those out.

I wish all of that had made it into production between our last conversation and now, but we have to do a lot of testing on our end to make sure, basically, that the false positives are acceptable.

Wesley:  Can you talk about any specifics of what you're doing to combat spammers, or what's in the works or being tested?

Joe:  We really rely on machine learning, so what we've been trying to do is dig through Klout, and through Twitter more broadly, to see what spammers we can find ourselves, feed them into our classification rules, and pick up more and more patterns of what Twitter spam looks like so that we get better at identifying it. If you're interested in a more technical discussion, I could definitely get one of our scientists to talk about it.

I would say that between the beginning and the middle of January you'll start seeing much more of that actually hit production. We even hope to have an API that, say, a thousand companies could call: they send us a name, and we tell them whether we know it's a spammer or not.

That would let other people leverage the work we're doing, too.

Wesley:  Well, Twitter obviously has mechanisms to identify spam. There's a button to report someone as a spammer, and they also suspend accounts. Are you working with them directly to leverage systems they already have in place, so you're not reinventing the wheel? Or, more generally, what kind of relationship do you have with Twitter to help make your process a little more efficient?

Joe:  It's funny: we're actually in the same building as Twitter. They're right above us, so we certainly know those guys and chat with them. We've talked to them about this in the past, and it's an ongoing thing for them and an ongoing thing for us. We've had discussions and gotten some guidance, but there's no official working relationship where they share their insights with us right now. Part of the reason is that they were stretched really thin and so were we, and formal intercompany collaboration isn't something anyone's very good at yet.

So, at least right now, we're a little bit on our own here.

Wesley:  Even so, what about the issue I had specifically? The account was actually suspended by Twitter, yet it was still up on Klout. Are you pinging the service more, or do you have access to the back end to know when an account has been suspended? Is that part of your daily refresh? Since you recalculate scores daily, is that something you check for?

Joe:  There are multiple API feeds from Twitter. There's the streaming API, which is just the messages being sent out, and then there's the REST API for social graph information.

The way our service works, we take the streaming API and look at all the retweets and interactions, the messages and content people create on a daily basis, and calculate daily scores from that. The problem is that a suspended account just isn't going to create new data, and there's no way for us to tell whether someone didn't create new data because their account was suspended or because they're on vacation without hitting Twitter's other API. We do that, but on a rolling basis, because they won't let us hit the whole user base every single day.

Wesley:  Right, right. That's an internal restriction I know they have.

Joe:  Yeah, so we hit those accounts on a rolling basis, and as we hit each one, we get info on whether it's been suspended, whether the name has changed, and all these other factors, and then it gets handled accordingly.

Wesley:  Uh‑huh.

Joe:  Cycling through that has actually always been one of our biggest challenges, and that's basically what the issue comes down to. We're working with them, and internally, to make it faster, but that's why the issue exists right now.
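Joe's description of the rolling check explains why a suspended account can linger on Klout. Below is a minimal Python sketch, assuming a simple round-robin queue and a fixed daily budget of REST lookups; `rolling_status_check`, `lookup_status`, and the budget are hypothetical illustrations, not Klout's code or Twitter's API. With N accounts and B lookups per day, each account is looked at only about once every N / B days, so a suspension can stay invisible for a while when N is large relative to B.

```python
from collections import deque
from typing import Callable, Deque, Dict


def rolling_status_check(
    queue: Deque[str],
    daily_budget: int,
    lookup_status: Callable[[str], str],
) -> Dict[str, str]:
    """Check up to `daily_budget` accounts from the front of the queue.

    Each checked account is rotated to the back, so every account is
    revisited once the whole queue has been cycled through.
    `lookup_status` is a hypothetical stand-in for a rate-limited REST call.
    """
    results: Dict[str, str] = {}
    for _ in range(min(daily_budget, len(queue))):
        account = queue.popleft()
        results[account] = lookup_status(account)
        queue.append(account)  # rotate to the back for a later pass
    return results


if __name__ == "__main__":
    accounts = deque(["alice", "bob", "carol", "dave", "erin"])
    statuses = {"carol": "suspended"}  # made-up data for the example

    def fake_lookup(name: str) -> str:
        return statuses.get(name, "active")

    for day in range(1, 4):
        print(day, rolling_status_check(accounts, 2, fake_lookup))
```

In this toy run, carol's suspension only surfaces on day 2, and alice is not checked again until day 3.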

Wesley:  This is a bit of an inside-baseball question, but what's the obstacle to getting a business agreement for the firehose from Twitter? I know they have formal agreements with other companies that give them access to the firehose. What is the restriction, or the difficulty, in getting that for Klout?

Joe:  There's a lot. All I can say is ‑‑ well, there's an agreement that means you can't really talk about it. Basically, if you're doing something valuable in the ecosystem and your business shows that there's a real need for that level of data (and most companies actually don't need that much data), they're very open to conversation and will try to work with you. Every case is different, and they're actually pretty strict. If I were advising another company on how to get access to the Twitter firehose, I would say that if you're doing something that has enough traction in the market, is interesting enough, and benefits the overall ecosystem, they'll talk to you. But generally, I think a lot of people think they need firehose access and they really don't.

Wesley:  Okay. 

Joe:  In a lot of ways, this access is actually a burden, just because of how much freaking data it is, and it's like ‑‑

Wesley:  It's basically replicating their database onto yours, so I know that's a lot to deal with. I can understand the restrictions on both sides: not just getting the access, but handling it once you do. Going back to your earlier point about how important accuracy is, I sent a message to Phil Hotchkiss, who I believe is your chief product officer, and I have yet to hear from him. Do you know if he has an official stance on what accuracy is and how your company plans to improve the product to increase it?

Joe:  Phil is much more intense about getting to the bottom of every single issue. He has the team running reports and doing huge analyses to understand the impact of things, and he's definitely made that a bigger part of everyone's daily routine. Now's the time for it, because we're much more in the public spotlight and people expect that, and we wanted that. Since he came on, it's been great.

Phil:  We are always trying to make our score better. Regarding your question about bots in particular, this is an area that we are focused on and we will soon be releasing a new algorithm designed to deal with bots. Interestingly, not all bot accounts are bad actors. Our research uncovered bot accounts that are intelligently designed to curate content that people find useful and that drive genuine action. People follow these accounts, re-tweet and comment on content that is bot generated and click on associated links. So the issue of bots is not as black and white as it may appear on the surface. We take the accuracy of our score very seriously and we will continually iterate our algorithm to combat bad actors in whatever form they take.

Wesley:  Speaking of Klout and scores and reputation, by any chance do you monitor your own Klout score?

Joe:  Yes, I check every day, mainly from a QA standpoint, because I know my own data really well and I know how the algorithm works. I'll even complain, "I had a big day yesterday, why did my score only go up this much?" So from that standpoint, I definitely monitor it every day, and then there's the competitive side in the office: if we go out for a drink or something, we try to make the guy with the lowest score pay, things like that. So it's fun from that standpoint too.

Wesley:  I know that in the past you would give some users with high Klout scores access to extra features ‑‑ I'm not sure if you still do that now. I'm curious what that magic number is, and what those extra features are.

Joe:  Yeah, we haven't done that in a really long time.  I can't remember the last time we did that, but we used to do it all the time.  It would be interesting to bring that back now.  I wonder how people would react.

In terms of what score it was, I don't know that we ever set one. We used it to roll out new features and scale them based on feedback, so it was a sliding scale: I would start high and keep inching the bar down.

We abandoned it because it was an extra layer in getting stuff out and managing that process, and we just wanted to move faster.
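The sliding-scale rollout can be pictured as a list of descending score thresholds, where each stage grants access to anyone newly above the bar. This is a minimal Python illustration; the function name, user names, and threshold values are hypothetical, since Joe says they never fixed a number.

```python
from typing import Dict, List, Set


def rollout_stages(scores: Dict[str, float], thresholds: List[float]) -> List[List[str]]:
    """Return, for each successively lower threshold, the users newly granted access."""
    granted: Set[str] = set()
    stages: List[List[str]] = []
    for bar in thresholds:
        newly_granted = sorted(
            user for user, score in scores.items() if score >= bar and user not in granted
        )
        granted.update(newly_granted)
        stages.append(newly_granted)
    return stages


if __name__ == "__main__":
    scores = {"ann": 80, "bob": 65, "cat": 40, "dan": 72}  # made-up scores
    print(rollout_stages(scores, [75, 60, 0]))
```

Each stage needs its own gating controls, admin oversight, and messaging, which is the overhead Joe cites for dropping the approach.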

Wesley:  I can understand that; it sounds like it wasn't very automated.

Joe:  Yeah, you have to build controls to do it, then you've got to have admins handling it, and then you have to message it. The changes we've made in the last, I don't know, six or eight months have been impactful enough that we wanted to get them out to everyone as quickly as possible.

In the new year we're spending time on two things: the first is our accuracy, and the second is really the consumer site. It is what it is right now, and I think people enjoy it, but we've never put much attention on it.

We've always had to spend so much of our time on the platform, on questions like how to handle this much data, rather than on how to make it fun to play with, engaging, and useful. So we're excited to circle back. There will be more features on Klout.com, and it might make sense to do fun stuff and roll it out in interesting ways.

Wesley:  Speaking of features and scores, and moving on to Klout Perks, I've noticed that there seems to be a layer of data that isn't exposed to users but may be exposed to brands, and that's just me looking at it as an engineer.

Joe:  Yeah.

Wesley:  For instance, looking at the most recent Tangle promotion from Klout Perks, I noticed that the perk was available to some people with lower scores but not to some people with higher scores, so I'm guessing an extra layer of data was factored in. I was wondering if you could share a little about what kinds of data aren't exposed and how you use them for your promotions.

Joe:  So, a couple of things. There's definitely data there, but we're the ones who use it. The brands actually don't pick who gets in and who doesn't; we do. The brands just trust us to get the best group of people for whatever the campaign is. The things we look at are three factors.

One is geography. The Virgin campaign was a great example: Virgin came to us and only wanted people in San Francisco, L.A., or Toronto, so we got all these messages from people who were pissed off because they had high Klout scores but lived in Boston or wherever and just didn't fit the campaign. Disney actually picked cities that they wanted us to focus on. Everything always has this geographic layer, and we probably need to message that better. We keep trying to figure that out.

The second is what type of person it is, like a conversationalist or a curator.

We look at that more and more. Somebody who's a curator, like the New York Times, who just shares lots of links, doesn't really make sense for the Disney movie. We want people who are conversationalists, networkers, or specialists: people who talk. There are multiple ways to get to a score of 50, let's say. It could be because you share a lot of links and people love them, because you share really great content, or because you engage in a lot of conversation. For perks, conversation is what's valuable, so we use that as a layer of targeting.

The third thing is topics. Internally we have topic scores. Take the Audi campaign right now: they want people who are influential about cars, design, technology, that kind of thing. So your score might be a 20 overall, but when it comes to design you're influential, and you would be a perfect person for the Audi campaign.

That's not shown on the site at all, but again, we're putting more emphasis on the public site and we're excited to expose more of it. Our ability to do that is what has these brands really excited; the quality of that data is why they're coming back and doing multiple campaigns.
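The three targeting layers Joe lists (geography, type of person, topic) can be read as successive filters. Here is a minimal Python sketch under that assumption; the `Candidate` fields, the style labels, the topic threshold, and the sample people are all hypothetical, and nothing here reflects how Klout actually weighted these factors.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Candidate:
    name: str
    overall_score: float
    city: str
    style: str  # e.g. "conversationalist", "curator"
    topic_scores: Dict[str, float] = field(default_factory=dict)


def eligible_for_perk(
    candidates: Iterable[Candidate],
    cities: Set[str],
    styles: Set[str],
    topics: Set[str],
    min_topic_score: float,
) -> List[str]:
    """Apply three filters in turn: geography, influencer style, topic score.

    The overall score is deliberately not checked, so someone with a modest
    overall score can still qualify on topic influence.
    """
    picked: List[str] = []
    for c in candidates:
        if c.city not in cities:
            continue  # geography
        if c.style not in styles:
            continue  # type of person
        best_topic = max((c.topic_scores.get(t, 0.0) for t in topics), default=0.0)
        if best_topic < min_topic_score:
            continue  # topic influence
        picked.append(c.name)
    return picked


if __name__ == "__main__":
    people = [  # made-up candidates
        Candidate("dana", 20, "Los Angeles", "conversationalist", {"design": 70}),
        Candidate("eli", 75, "Boston", "conversationalist", {"cars": 80}),
        Candidate("fay", 68, "Toronto", "curator", {"technology": 65}),
    ]
    print(eligible_for_perk(
        people,
        cities={"San Francisco", "Los Angeles", "Toronto"},
        styles={"conversationalist", "networker", "specialist"},
        topics={"cars", "design", "technology"},
        min_topic_score=50,
    ))
```

Only dana qualifies: eli has a higher overall score but is in the wrong city, and fay is a curator, which mirrors the complaints Joe describes.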

Wesley:  You mentioned topic scores, meaning how influential someone is on a specific topic, and you also mentioned location. Do you have Klout scores for different regions? For instance, someone might be very influential in Austin but not very influential nationally, especially if we're talking about, say, a local paper or a local blogger who only writes about local events. Is that a score you keep track of at all?

Joe:  So maybe ‑‑ and it's funny, we were just having a huge conversation about this yesterday. Right now we'll look at your location, and your score might be a 62, but for Austin that's super high; there aren't a lot of people above that. What we don't do is re-normalize the score so that you're a 62 across the world but a 100 in Austin.

It's "you're a 62, and that's high in Austin" versus "you're a 100 in Austin and a 62 overall" ‑‑ and we were debating the right way to handle that. More scores are more confusing in some ways, but it's also kind of cool. We're taking in LinkedIn data, so we could have a score for how influential you are in your company, at your school, or in your town. There are lots of ways to layer this on, and we're thinking about it that way, but how we construct it and present it externally is still in progress.
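To make the "62 overall but 100 in Austin" idea concrete, one plausible way to re-normalize is a percentile rank within the region. This is a hypothetical Python sketch: Joe doesn't say which method Klout was considering (a min-max rescale would be another option), and the sample Austin scores are invented.

```python
from bisect import bisect_right
from typing import Sequence


def regional_score(score: float, regional_scores: Sequence[float]) -> float:
    """Re-express a global score as a 0-100 percentile within one region.

    Percentile rank = share of the region's users whose global score is
    at or below `score`, scaled to 100.
    """
    if not regional_scores:
        raise ValueError("need at least one score for the region")
    ranked = sorted(regional_scores)
    at_or_below = bisect_right(ranked, score)
    return 100.0 * at_or_below / len(ranked)


if __name__ == "__main__":
    austin = [12, 25, 30, 41, 62]  # made-up global scores of Austin users
    print(regional_score(62, austin))  # 100.0
    print(regional_score(30, austin))  # 60.0
```

The top user in the region maps to 100 regardless of their global score, which is exactly the extra, potentially confusing number Joe is weighing.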

Wesley:  Okay.  So it's under consideration basically.

Joe:  Yeah, it's fun because we're right in the middle of it all. We have feedback from customers, feedback from brands, and our own internal thinking, and we're trying to mesh those together and make decisions. There are a lot of nights of beer and pizza in the office arguing about all the directions we could take. We get such great feedback from people like you and all these blogs, but we also have to balance putting out fires all day, every day, against pushing towards the longer vision of making our service really important. So it's an interesting time. It's cool.


Wesley:  Awesome. It sounds like a really good environment, and that kind of creativity and pushing the ball forward is what moves the whole company forward; it's something I actually admire. Earlier we were talking about tools, some that you have internally and some external. I'm curious: what tools do you show brands to help them increase their Klout score?

Joe:  To help brands increase their Klout score? We really don't have much. We pretty much direct them to Klout.com. When we do the campaigns, we give the brand a report showing how the campaign went and what happened, and we close the loop as much as possible there, but I wouldn't say we have a pro version of Klout that gives special insight at this point.

It's funny, though. I think if you Google "Klout phone number," my cell phone comes up, so I get these really crazy calls. A company called me the other day and said, "We just got our Twitter account; I want to run our first tweet by you to make sure it won't lower our Klout score." So brands and companies definitely want to know how to be more effective and have a better Klout score, but we haven't done a great job, or really any job, at coaching them.

Wesley:  The reason why I ask is because on your website, when you click under the section for Klout for brands and business, it kind of gives that impression.

Joe:  Yeah, I think there was some aspirational writing there.

Wesley:  Okay, all right. Speaking of brands, I was talking to a friend who works at a pretty big brand here. I was looking at his Twitter stream just a few minutes before our call, and I noticed that he sent out a message about Klout and that there seems to be some back and forth between you and him. I gave him a call and talked to him for a bit, and his question has to do with Klout members, people who have actually registered an account with Klout, versus non-Klout members. He was curious because it seems that people who register with Klout have higher Klout scores than people who don't. I was wondering if you could speak to whether that's true.

Joe:  It's hard to say, because there are people with high scores and low scores in both buckets, I suppose. What does happen is that people who are registered with Klout get processed more on the social graph side. Everyone gets the same amount of streaming processing, but we spend more of our REST rate limit on people who are registered, because they're coming back to the site and interacting with it, and we want them to have a better experience.

What we found in the past is that there was some skewing of scores because we were simply getting more information about those people. It wasn't that they were favored; it's that their scores were more accurate. We get more information about them, so we're constantly working to balance a good user experience against accurate scoring, always leaning towards accuracy, but that's what they might be seeing.
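The skew Joe describes follows from how a limited REST budget gets divided. Here is a minimal Python sketch, assuming a fixed per-cycle lookup budget and an 80/20 split in favor of registered users; the function name and the 0.8 share are hypothetical, since Joe gives no figures. Registered users' social-graph data is simply refreshed more often, so their scores rest on more complete data.

```python
from typing import List, Sequence


def allocate_rest_budget(
    registered: Sequence[str],
    unregistered: Sequence[str],
    budget: int,
    registered_share: float = 0.8,  # hypothetical split, not a Klout figure
) -> List[str]:
    """Pick which accounts get a social-graph (REST) refresh this cycle.

    Registered users get up to `registered_share` of the budget; whatever
    one group cannot use is handed to the other group.
    """
    reg_quota = min(len(registered), int(budget * registered_share))
    unreg_quota = min(len(unregistered), budget - reg_quota)
    # Give registered users whatever the unregistered group left unused.
    reg_quota = min(len(registered), budget - unreg_quota)
    return list(registered[:reg_quota]) + list(unregistered[:unreg_quota])


if __name__ == "__main__":
    reg = ["r1", "r2", "r3", "r4", "r5"]
    unreg = ["u1", "u2", "u3", "u4", "u5"]
    print(allocate_rest_budget(reg, unreg, 5))
```

With ten candidates and five lookups, four registered users and one unregistered user get refreshed in this cycle.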

Wesley:  Speaking of Twitter, since a registered user has linked their Klout account to their Twitter account, do you factor direct messages into your calculation of Klout? For instance, publicly I might not speak with someone who has an extremely high Klout score, but through the back channel, via DM, I might speak with them a lot. Does that factor into Klout calculations at all?

Joe:  Not at all, and we've talked about this a lot internally. I think it would be a great signal for us, but even though we get access to DMs through Twitter, we were always worried about how people would feel about us looking at their private DMs, and we decided it wasn't worth the risk of pissing people off. We thought through how the whole thing would be handled, and we just haven't circled back.

Wesley:  Okay.  So that was just something that was a design decision after deliberation and at this point you just haven't revisited it yet.

Joe:  Yeah, I think it can be done if it's handled on the front end, where we let people opt in or opt out. In the past we've pinged our friends, had informal and some deep discussions, and gotten feedback from other people. It's a mix of people who want us to do it because they think it'll help their score and people who don't want us to because those are their private messages. We just made the decision not to deal with it.

Wesley:  Well, that's a safe choice.  Going back to Twitter and other tools and registered members and not registered members, you mentioned previously that you're going to be rolling out a LinkedIn connection also?

Joe:  Yeah, so if you sign in and go to your dashboard, it'll prompt you to add LinkedIn, and if you register now, you get prompted for LinkedIn at sign-up. We've been doing that for several months, and with Facebook too, before rolling out the scores, so it's in the data collection stage.

Wesley:  Do you have some of the same API limitations with LinkedIn as you do with Twitter? Meaning that you can only access the types of profiles that registration allows, and that's why you require people not just to have a LinkedIn account but to attach it?

Joe:  Both LinkedIn and Facebook have way more speed bumps before you can get access to anyone's data. We need that permission granted, and then there are limits on how much we can pull, so it's definitely tougher. But we're building relationships with those companies that we hope will open those doors a little wider. The way we think about things here is that we want to be the standard. We want to be very public-facing, both to consumers and to partners like Facebook and LinkedIn, so we act somewhat conservatively in terms of always being a good ecosystem player: we want to minimize any surprise about what we're doing, and we want our partners to know we're very careful about the terms of service and about building personal relationships. That's why we moved into Twitter's building. You'll see some interesting news in the coming months that'll put these relationships in a bit more context; we're just very conscious of those kinds of things.

Wesley:  What would you say to someone who feels they're very influential but has never registered with Klout, and has a seemingly lower Klout score than someone who may be less influential but has linked all their accounts through Klout and has a higher score? What would your response to that person be about their score and why it's lower?

Joe:  There are definitely scores that I see where I say, no way. This person's score should be either higher or lower, and I'll send it right away to the data team and ask why it's so low or so high. We have a model-based approach here, so that stuff gets fed into the model, and feedback, whether it's from me or anyone else, is great.

I know that the idea that your score is lower because you haven't hooked up your Facebook and LinkedIn accounts could be confusing, I suppose, or even worse, could call into question the quality of what we're doing, but unfortunately it's just the reality of where everything is. We haven't been beaten up too badly on that so far, and maybe that's coming, but we're working on the relationships with those companies: Facebook, LinkedIn, YouTube, Foursquare. Those limitations are going to disappear over time, and not over a long period of time. There's definitely a push on our side to make things better.

Those companies are coming to us now; they want us to crawl their data, and they'll work with us on special business relationships that don't box us into having to make deals with every single person in the world.

Wesley:  I've noticed that a lot of companies are coming on board, especially a lot of the client software for monitoring the social space. I'm curious because the data for Klout seems to be based primarily on material sourced from the person themselves. One example that was brought up: Oprah Winfrey has a relatively low Klout score, and she's a very influential person. Do you plan on bringing in offline data, or data not directly associated with someone's Twitter, Facebook, and LinkedIn profiles, to factor into the Klout score?

Joe:  Internally we actually call it the Warren Buffett problem: you put Warren Buffett's name into Klout and it's a zero, but obviously he's super influential, and we look dumb. We talk about it a lot, and it's definitely something we plan to address, probably in the second quarter, I would say. Right now our focus is the data issues we're dealing with and perfecting that, but the second quarter of next year is definitely when we'll get really serious about it.

Wesley:  Well, it sounds like the service is going to keep evolving in giant leaps over the next year or so, so I look forward to writing another story a year from now about how far Klout has come.

Joe:  It's funny, because in March/April of this year we were basically three people, and we're considerably more than that now, but a lot of the work from all these new guys hasn't even hit production yet, so there's this backlog of innovation that hasn't been seen outside our office.

We're continuing to ramp up who we're bringing in, the way we think about this stuff, and our own internal expectations, so I'm excited to see where we'll be in a few months. Our score is everywhere, we're trying to be the standard, and we put ourselves out there. We know it's not perfect, that it's a work in progress, and we take heat for that. At the same time, I think there's a lot of value in getting the feedback and being out there, so I just hope people won't necessarily give us an easy pass right now, but will let us prove that we can keep growing.

Wesley:  Well, just out of curiosity, with all the features you're rolling out, and speaking of influencers, do you have anything planned for South By Southwest specifically?

Joe:  South By's the most fun week of the year and everyone always rolls a bunch of stuff out.  I don't know specifically yet, but we have to do something cool and it's coming up so quick.  We'll do something.  I don't know whether it'll be an event or a product thing or a partnership but we'll definitely do something.  I have no clue yet what that will be.

Posted via email from Wesley83's Posterous
