Topic: Google Testing Popularity Poll

Imosa

  • Any sufficiently advanced technology
  • is indistinguishable from magic
Google Testing Popularity Poll
« on: December 31, 2012, 08:53:11 AM »
I remembered a program I had for automated Google testing and tried testing Touhou popularity. I've heard of problems with the official poll: people tend to pick Reimu and other main characters because they are put on the spot. I figured testing a database is probably a better indicator of the most popular character.
A Google test gauges the popularity of something by how many hits it gets on Google.
To clarify, I used the names I got from the 2012 popularity poll on the Touhou wiki: http://en.touhouwiki.net/wiki/THWiki_Popularity_Poll#9th_.282012.29
I did make some changes, like renaming the "Unnamed book-reading youkai" to Tokiko, removing UFOs and Alice's Dolls, and removing the clarifying parentheses from the end of all names. Google searches were done on "touhou " followed by the given name.
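A minimal Python sketch of the name clean-up described above, assuming the poll entries are available as a plain list of strings. The `RENAMES` and `REMOVED` tables and the sample entries are illustrative guesses, not the exact spellings on the wiki page, and this is not the program used in the post.

```python
import re

# Illustrative tables; the exact wiki spellings are assumptions.
RENAMES = {"Unnamed book-reading youkai": "Tokiko"}
REMOVED = {"UFOs", "Alice's Dolls"}

def clean_name(raw):
    """Drop a trailing clarifying parenthetical: 'Reisen (Silent Sinner in Blue)' -> 'Reisen'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", raw).strip()

def build_queries(raw_names):
    """Turn poll entries into search strings of the form 'touhou <name>'."""
    queries = []
    for raw in raw_names:
        name = clean_name(raw)
        if name in REMOVED:
            continue  # not a character, so skip it
        name = RENAMES.get(name, name)
        queries.append("touhou " + name)
    return queries

print(build_queries(["Cirno", "Reisen (Silent Sinner in Blue)",
                     "Unnamed book-reading youkai", "UFOs"]))
# ['touhou Cirno', 'touhou Reisen', 'touhou Tokiko']
```

Note that stripping the parentheses is exactly what makes entries like the two Reisens collapse into one ambiguous search.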
Ignoring the first few winners, since they are common words or names, the top character is Cirno at number 10, which doesn't surprise me much. I haven't had that much exposure to the fandom, but at the last convention I went to, the room wasn't making jokes about Reimu's inability to count. Next come Marisa, Flandre, Eirin, Remilia, Elis (yeah, ok), Yukari, Alice, Reisen (too, not the original), and Miko... and then Reimu.
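Ranking the results while skipping entries judged to be common words could look like the sketch below. It assumes the hit counts are already collected in a dictionary by whatever tool does the searching; the `excluded` set is picked by hand, as in the post, and the numbers in the usage line are placeholders, not real search results.

```python
def rank(hits, excluded=frozenset()):
    """Return (name, hits) pairs sorted from most to fewest hits, minus excluded names."""
    kept = [(name, count) for name, count in hits.items() if name not in excluded]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Usage with placeholder numbers (not real search results):
for place, (name, count) in enumerate(rank({"Cirno": 3, "Marisa": 2, "Orange": 9}, {"Orange"}), start=1):
    print(place, name, count)
```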
Some things do concern me about the test. Mainly, a character's first or common name returns more search results than the full name, which is why "Reisen" is more popular than "Reisen Udongein Inaba". A cursory test also showed "touhou reimu" to be more popular than "touhou reimu hakurei" and "touhou cirno". "Eiki Shiki, Yamaxanadu" also gave me some trouble: after verifying it manually, I found the result to be lower than what the program thought, which makes me wonder what the program found. The comma might have been an issue here. In any case, I swapped in my manual number, since that's what I wanted it to search. Also, the last time I tried this, judging My Little Pony shippings, I think I failed miserably.
I've been thinking of trying a similar test on Danbooru, but then I would have to convert names and make sure they actually work, and that sounds like a lot of work. The point of this is kinda for me to do as little manual work as possible. Otherwise, I may try searching for first names only and cutting out minor characters. Other suggestions would also be taken into consideration.
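If a program builds search URLs by hand, an unescaped comma or quotation mark in a name like "Eiki Shiki, Yamaxanadu" can garble the request. A sketch using Python's standard `urllib.parse.urlencode`, assuming a plain `https://www.google.com/search?q=...` URL, percent-encodes the query safely. The `search_url` helper is hypothetical; it does not settle how Google itself treats the comma, and how the original program built its queries is unknown.

```python
from urllib.parse import urlencode

def search_url(query, domain="www.google.com"):
    """Build a search URL with the query percent-encoded (space -> '+', ',' -> '%2C', '"' -> '%22')."""
    return f"https://{domain}/search?" + urlencode({"q": query})

print(search_url("touhou Eiki Shiki, Yamaxanadu"))
# https://www.google.com/search?q=touhou+Eiki+Shiki%2C+Yamaxanadu
```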

https://docs.google.com/spreadsheet/ccc?key=0Andq4bIa7CW8dHE0ZkRseDlmNkpNMTRMRVVIMHhhcmc
« Last Edit: December 31, 2012, 04:19:18 PM by Imosa »

KFCbbQ

  • Taking over the world.
Re: Google Testing Popularity Poll
« Reply #1 on: December 31, 2012, 03:34:59 PM »
The first thing I notice from your chart is that you need to fix your program, because it is quite clear that you aren't getting the results you are supposed to get. With characters like Reimu, Marisa and Remilia being the most conventional representatives of Touhou, they are expected to be ranked in the top five, and certainly not 23, 12, and 17. If that's not obvious enough, I can assure you that googling "touhou Eirin Yagokoro" will produce a much lower hit count than "touhou Reimu Hakurei".

Anyway, the biggest issue with this, as opposed to a popularity poll, is that people don't normally refer to a character by their full name. This puts characters with multiple words in their name at a disadvantage. Also, some characters have variant names (Yuuka and Yuka are different spellings of the same character, for example), and Google will fail to pick that up. There are things you could do to optimize your search and improve the overall consistency of your results, such as putting quotation marks around character names or searching by their most commonly used name, but in the end the results you're getting are still going to be somewhat biased.
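Quotation marks and variant spellings can be combined into a single query with Google's `OR` operator, so a page that mentions both "Yuuka" and "Yuka" is counted once rather than twice, as it would be if two separate counts were added together. A minimal sketch; the `alias_query` helper and its `prefix` parameter are hypothetical, and Google's hit totals remain rough estimates either way.

```python
def alias_query(aliases, prefix="touhou"):
    """Quote each spelling and join them with OR, e.g. touhou ("Yuuka" OR "Yuka")."""
    quoted = [f'"{alias}"' for alias in aliases]
    if len(quoted) == 1:
        return f"{prefix} {quoted[0]}"
    return f"{prefix} ({' OR '.join(quoted)})"

print(alias_query(["Yuuka", "Yuka"]))  # touhou ("Yuuka" OR "Yuka")
print(alias_query(["Cirno"]))          # touhou "Cirno"
```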

Nevertheless, I think this is an interesting experiment, and there are additional things you could do to make it even more interesting. You could consider running the tests in Japanese or other languages to see how popularity varies across regions. Or perhaps use Google Trends to investigate how people's interest has changed over time. This is a 15-minute TED talk explaining why trendiness can be fascinating.

Tengukami

  • Breaking news. Any season.
  • *
  • I said, with a posed look.
Re: Google Testing Popularity Poll
« Reply #2 on: December 31, 2012, 03:45:41 PM »
I think it might be less cumbersome to just set this up as a GoogleDocs poll.

"Human history and growth are both linked closely to strife. Without conflict, humanity would have no impetus for growth. When humans are satisfied with their present condition, they may as well give up on life."

Imosa

Re: Google Testing Popularity Poll
« Reply #3 on: December 31, 2012, 05:13:02 PM »
The first thing I notice from your chart is that you need to fix your program, because it is quite clear that you aren't getting the results you are supposed to get. With characters like Reimu, Marisa and Remilia being the most conventional representatives of Touhou, they are expected to be ranked in the top five, and certainly not 23, 12, and 17. If that's not obvious enough, I can assure you that googling "touhou Eirin Yagokoro" will produce a much lower hit count than "touhou Reimu Hakurei".
Not sure I understand. I agree the program needs work, but not for the reasons you mentioned. We can't go in expecting Reimu, Marisa, and Remi to be the most popular characters; that would defeat the point of the experiment. Also, you can be a little more critical in viewing the poll and see that they did not get 23, 12, and 27. You can pretty confidently eliminate the top 8 results as being more random words than actual characters. Yuki in particular doesn't even appear on page 15 of her Google search, while Cirno still fills page 30. Also, what do you mean by that last sentence?

Anyway, the biggest issue with this, as opposed to a popularity poll, is that people don't normally refer to a character by their full name. This puts characters with multiple words in their name at a disadvantage. Also, some characters have variant names (Yuuka and Yuka are different spellings of the same character, for example), and Google will fail to pick that up. There are things you could do to optimize your search and improve the overall consistency of your results, such as putting quotation marks around character names or searching by their most commonly used name, but in the end the results you're getting are still going to be somewhat biased.
I agree. I did not think about quotation marks. A quick test on Reimu reveals her hits doubling. That's interesting.

Nevertheless, I think this is an interesting experiment, and there are additional things you could do to make it even more interesting. You could consider running the tests in Japanese or other languages to see how popularity varies across regions. Or perhaps use Google Trends to investigate how people's interest has changed over time. This is a 15-minute TED talk explaining why trendiness can be fascinating.
Forgot about Google Trends too. I'll have to look into that, but "Reimu Hakurei" produced no results right off the bat.

I think it might be less cumbersome to just set this up as a GoogleDocs poll.
Yeah

Tengukami

Re: Google Testing Popularity Poll
« Reply #4 on: December 31, 2012, 05:41:12 PM »
Setting up a poll is super-easy, too. I did one a couple of months ago for the Café to determine the Gensokyo religious alignment of the board, based on hypothetical questions with answers ranging from Strongly Agree to Strongly Disagree. A popularity poll should be a piece of cake. Let me know if you need tips or whatever, but it looks like you have the resources to do it yourself. I'd be very curious to see the results myself!

KFCbbQ

Re: Google Testing Popularity Poll
« Reply #5 on: December 31, 2012, 06:21:37 PM »
Not sure I understand. I agree the program needs work, but not for the reasons you mentioned. We can't go in expecting Reimu, Marisa, and Remi to be the most popular characters; that would defeat the point of the experiment. Also, you can be a little more critical in viewing the poll and see that they did not get 23, 12, and 27. You can pretty confidently eliminate the top 8 results as being more random words than actual characters. Yuki in particular doesn't even appear on page 15 of her Google search, while Cirno still fills page 30. Also, what do you mean by that last sentence?
I wrote 23, 12, and 17, not 23, 12, and 27.

Anyway, it appears that the inconsistency I'm getting is due to Google's weird regional filtering rather than a fault in your program. In fact, if you search touhou Eirin Yagokoro on google.com, it yields 808,000 hits, while google.com.au gives only 172,000. On the other hand, searching touhou Reimu Hakurei on google.com gives 648,000 hits, while google.com.au gives 1,560,000. Is this an indication that Australians idolize Reimu so much that they talk about her twice as much as the rest of the world, while feeling absolutely appalled to talk about Eirin?
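The regional gap can be quantified directly from the hit counts quoted above. Reimu's google.com.au count is about 2.4 times her google.com count, while Eirin's is about a fifth of hers, so the two domains even disagree on which of the two is ahead. A small sketch of the arithmetic; the `au_to_com_ratio` helper is hypothetical and the counts are only the ones reported in the post.

```python
# Hit counts as reported in the post above (December 2012); Google's totals are rough estimates.
HITS = {
    "touhou Eirin Yagokoro": {"google.com": 808_000, "google.com.au": 172_000},
    "touhou Reimu Hakurei": {"google.com": 648_000, "google.com.au": 1_560_000},
}

def au_to_com_ratio(hits):
    """Ratio of the google.com.au count to the google.com count for each query."""
    return {q: d["google.com.au"] / d["google.com"] for q, d in hits.items()}

for query, ratio in au_to_com_ratio(HITS).items():
    print(f"{query}: {ratio:.2f}")
# touhou Eirin Yagokoro: 0.21
# touhou Reimu Hakurei: 2.41
```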

Imosa

Re: Google Testing Popularity Poll
« Reply #6 on: December 31, 2012, 06:55:27 PM »
Setting up a poll is super-easy, too. I did one a couple of months ago for the Café to determine the Gensokyo religious alignment of the board, based on hypothetical questions with answers ranging from Strongly Agree to Strongly Disagree. A popularity poll should be a piece of cake. Let me know if you need tips or whatever, but it looks like you have the resources to do it yourself. I'd be very curious to see the results myself!
A popularity poll would be interesting, but that's not what I'm looking at here. I don't doubt the results of the wiki's poll; I doubt the people who took it. I also can't get their opinions any time I want. It's a megalomaniac's dream.

Anyway, it appears that the inconsistency I'm getting is due to Google's weird regional filtering rather than a fault in your program. In fact, if you search touhou Eirin Yagokoro on google.com, it yields 808,000 hits, while google.com.au gives only 172,000. On the other hand, searching touhou Reimu Hakurei on google.com gives 648,000 hits, while google.com.au gives 1,560,000. Is this an indication that Australians idolize Reimu so much that they talk about her twice as much as the rest of the world, while feeling absolutely appalled to talk about Eirin?
Oh, yeah, I guess it would be silly to expect the regions to give the same result. The difference is striking, though. Honestly, Google probably isn't the best place to make these kinds of judgments.

Tengukami

Re: Google Testing Popularity Poll
« Reply #7 on: December 31, 2012, 07:41:38 PM »
Ah, I think I was unclear: by "poll" I don't mean the thing itself would be a poll, so much as you would use a GoogleDocs poll to deal with the data better.

Goldom

  • Whee
Re: Google Testing Popularity Poll
« Reply #8 on: January 01, 2013, 04:06:56 AM »
Yeah, something's certainly a little off here if Orange is in 7th place. Doing the search myself shows why, and it's no real surprise: the results have nothing to do with the character Orange. Many of the results that seem higher than they should be have common words in them inflating their result counts, including names with common words like Scarlet.
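Names that contain everyday words could be flagged automatically before trusting their counts. This sketch assumes a tiny, hand-made word list built only from the examples in this thread; a real check would need a proper dictionary, and the `common_word_tokens` helper is hypothetical.

```python
# Illustrative word list from the examples in this thread; a real check would need a dictionary.
COMMON_WORDS = {"orange", "scarlet"}

def common_word_tokens(name, common_words=COMMON_WORDS):
    """Return the parts of a name that are also everyday words and may inflate hit counts."""
    return [token for token in name.split() if token.lower() in common_words]

for name in ["Orange", "Remilia Scarlet", "Cirno"]:
    print(name, common_word_tokens(name))
# Orange ['Orange']
# Remilia Scarlet ['Scarlet']
# Cirno []
```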