Hi all!
Today I was working on my data mining class homework: we had to stream tweets through the Twitter API and then do some analysis on them (I'll write a separate post about that... some day).
I had already worked with Twitter streaming before, so I went straight to "recycling" my old code... but then an error showed up. This particular error, to be precise:

requests.exceptions.ChunkedEncodingError: IncompleteRead
Ugly, isn't it?
Well then, it turned out that the problem was that I used the string "twitter" as my filter keyword. It grabbed every tweet containing a URL shortened automatically by Twitter (the t.co links). Apparently, due to some kind of Twitter bug, when you stream tweets through the API you occasionally receive some that aren't well formatted; they don't match the expected format, and when Python tries to parse the streamed response it throws this error.
This seems to happen most frequently with the filter 'twitter', I suppose because that way you grab more "trash tweets" than usual. I searched online for a while before coming up with a solution, and I found that some people hit this error after hours and hours of streaming, no matter which library they used (I use Twython, by the way).
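If you can't avoid the error entirely, a common workaround is simply to catch it and reconnect. Here is a minimal sketch of that retry logic; the function name, parameters, and backoff strategy are my own illustration, not part of any library's API (with Twython, `run_stream` would be whatever call starts consuming your stream).

```python
import time

try:
    from requests.exceptions import ChunkedEncodingError
except ImportError:
    # Stand-in so the sketch stays self-contained without requests installed.
    class ChunkedEncodingError(Exception):
        pass


def stream_with_retry(run_stream, max_retries=5, backoff=2.0):
    """Call run_stream(); on a ChunkedEncodingError, pause and reconnect.

    run_stream is assumed to block while consuming the stream and to raise
    ChunkedEncodingError when Twitter sends a malformed chunk.
    """
    for attempt in range(max_retries):
        try:
            run_stream()
            return True  # stream ended normally
        except ChunkedEncodingError:
            time.sleep(backoff * attempt)  # brief pause before reconnecting
    return False  # gave up after max_retries consecutive failures
```

The growing pause between attempts is just a polite default so you don't hammer the API when the connection keeps dying.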
Well then, at the end of the day (literally), what was the solution?
Easy: I changed the filter. For my purposes I found it really useful to track trending topics, but an even better solution was to use a stopword (yes, they're not totally evil), so as to get a lot of tweets in a short time covering different topics (most of them will be about One Direction or Taylor Swift, but that's another story). For example, I used '#50factsaboutme' and 'any', and in about 15 minutes I reached my goal of 10,000 tweets.
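To make the "collect until a goal" part concrete, here is a sketch of the callback logic. The `TweetCollector` class is purely illustrative: with Twython you would put the same logic in a `TwythonStreamer` subclass's `on_success` method, call `self.disconnect()` once the goal is reached, and start the stream with something like `streamer.statuses.filter(track='any,#50factsaboutme')`.

```python
class TweetCollector:
    """Collect tweet texts until a goal count is reached.

    Mimics a streaming on_success callback; on_success returns True when
    the stream should be disconnected (an illustrative convention, not
    Twython's actual API).
    """

    def __init__(self, goal=10000):
        self.goal = goal
        self.tweets = []

    def on_success(self, data):
        # Some stream messages (limit notices, malformed chunks) have no
        # 'text' field; skip them instead of crashing.
        if 'text' in data:
            self.tweets.append(data['text'])
        return len(self.tweets) >= self.goal
```

Skipping messages without a 'text' field is worth doing regardless of which keywords you track, since the stream mixes in limit notices and other non-tweet payloads.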
I hope this post helps you!
Let me know in the comments whether it solved your problem! I'll be glad to help!