After airing my modest proposal for "Google Kids," a search engine for children, I quickly realized that I had whacked a beehive. The Internet poses all sorts of new risks for the young and connected, and all sorts of opportunities for those who can devise ways to keep them safe. I, for one, had not heard of the practice of typosquatting. This illegal activity involves buying a URL that's a misspelling of a popular kids' site, e.g. bobthebiulder.com, and then serving hardcore porn ads. One guy had to give back $164,000 in revenue he earned through this tactic.
The typosquatting trap is lesson one in why building a search engine for kids is a challenge: They are curious, creative, exploratory beings. They're going to try and type in URLs for the stuff they want, whether it's Scoop, Muck, and Dizzy or "justin bieber kissing selena gomez." The unending ability of children to test limits is why I am initially comforted by the notion of dedicated kid browsers like KidZui (which some of you pointed me to as a solution). All of the sites have been vetted. In practice, though, I find KidZui dispiriting. With its emphasis on games, major movies, and sponsored content, it feels like I am dropping my kid off at the Internet mall.
Plus, I want my children to learn basic computer literacy. KidZui and similar products have their own navigation buttons and kid-smiley user interfaces, but part of looking up stuff on the computer is really learning how to use a computer: the standards of scroll bars, search fields, closing windows, and the like. Also, kids are smart. They know there is a "real" Internet. And it's impossible for any filter or dedicated browser to keep up with the freshness and variety of the Web. There are also many things on "adult" sites, such as newspapers or Flickr, that would be interesting to children. So, what are the best approaches to sorting the Web by age? The most interesting publicly available research that I came across is being done by a European consortium called PuppyIR.
I don't like to leave my son alone on YouTube because the path to crap can be short. Carsten Eickhoff and Arjen P. de Vries, at TU Delft in the Netherlands, examined YouTube videos to see if an algorithm could identify the ones that are suitable for children. They focused on a video's comments, a space not known for conspicuous displays of erudition. Yet previous research had shown that comments are more useful in figuring out the actual content of a video than a title or tags. The researchers also looked to see if the comments contained "controversy" by examining the frequency of posts and their tone (a technique called "sentiment analysis"). The theory is that a Hello Kitty video would not spark a troll-baiting flame war.
The top five indicators that helped uncover kid-friendly videos were the number of views, average rating, how many times a video had been favorited, the number of views on the other videos the YouTube user has uploaded, and a low level of "controversy" in the comments. The researchers were surprised that the number of views came out on top, but my own experience has been that view count is a good way to distinguish good kid videos. The Thomas the Tank Engine episode that someone uploaded in high quality typically has more views than the Thomas mash-ups, tributes, parodies, etc.
The bigger implication of this experiment is that community interaction can be very helpful in classifying videos. Consider that the fourth-best indicator was if a YouTube user has posted several videos with lots of views, suggesting they are a reliable source of worthy, watchable content. In effect, by leaving comments and ratings, we are already doing a form of policing, giving algorithm builders clues upon which to base their formulas. YouTube is now experimenting with "reaction buttons"—LOL, OMG, EPIC, CUTE, WTF, FAIL—that could be taken in a more child-centered direction.
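To make the idea concrete, here is a minimal sketch of how those five indicators could be folded into a single score. This is my own illustration, not the researchers' classifier: the `Video` fields, the weights, and the log scaling are all assumptions, chosen only so that the weights echo the ranking described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical record holding the five indicators from the study."""
    views: int
    avg_rating: float          # assumed scale: 1.0 to 5.0
    favorites: int
    uploader_other_views: int  # views on the uploader's other videos
    controversy: float         # assumed scale: 0.0 (calm) to 1.0 (flame war)


# Illustrative weights only; they follow the article's ordering, not the paper.
WEIGHTS = {
    "views": 0.30,
    "avg_rating": 0.25,
    "favorites": 0.20,
    "uploader_other_views": 0.15,
    "calm_comments": 0.10,
}


def log_scale(count: int, cap: int = 10_000_000) -> float:
    """Squash a raw count into [0, 1] on a log scale, saturating at cap."""
    return min(math.log1p(count) / math.log1p(cap), 1.0)


def kid_friendly_score(video: Video) -> float:
    """Weighted sum of normalized indicators; higher means more kid-friendly."""
    features = {
        "views": log_scale(video.views),
        "avg_rating": (video.avg_rating - 1.0) / 4.0,
        "favorites": log_scale(video.favorites),
        "uploader_other_views": log_scale(video.uploader_other_views),
        "calm_comments": 1.0 - video.controversy,
    }
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


# Made-up numbers: a well-watched episode versus a contentious mash-up.
episode = Video(2_000_000, 4.8, 5_000, 30_000_000, 0.05)
mashup = Video(40_000, 3.1, 50, 100_000, 0.60)
print(kid_friendly_score(episode) > kid_friendly_score(mashup))
```

A real system would presumably learn the weights from labeled examples rather than have someone like me pick them by hand.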
But what about the rest of the Web? Karl Gyllstrom and Marie-Francine Moens, at Katholieke Universiteit Leuven in Belgium, wrote a fascinating paper where they tried to assess what the "Children's Web" looked like. Their starting assumption is that "a page designed for children is more likely to link to and be linked from other pages designed for children than to link to or be linked from pages designed for adults." They call their system AgeRank, a play off the classic PageRank signal that propelled Google to fame and fortune.
They discovered that the Children's Web does exhibit a high degree of "locality," i.e. kids' sites do indeed interlink, or, "they are more likely to link to pages that are themselves more likely to link to children's pages." To radically simplify their work: The limitation of PageRank is that it values pages of general usefulness while what we want are pages that will be more useful to children. Gyllstrom and Moens propose ways that AgeRank could supplement PageRank by indicating when a site has a more suitable reading level or seems to exist in a neighborhood with a lot of other kids' sites.*
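The "neighborhood" intuition can be sketched in a few lines. What follows is a generic label-propagation toy, not the AgeRank algorithm from the paper: the function name, the damping value, the neutral 0.5 prior, and the page names are all assumptions of mine. A handful of pages are hand-labeled as kid sites (1.0) or adult sites (0.0), and every other page drifts toward the average of the pages it links to or is linked from.

```python
from collections import defaultdict


def propagate_age_scores(links, seeds, iterations=20, damping=0.85):
    """Spread kid-likeness scores across a link graph.

    links: list of (source, target) page pairs.
    seeds: dict of page -> score in [0, 1] for hand-labeled pages.
    Unlabeled pages start at a neutral 0.5 (an assumed prior).
    """
    # Treat links as two-way: linking to and being linked from both count.
    neighbors = defaultdict(set)
    for source, target in links:
        neighbors[source].add(target)
        neighbors[target].add(source)

    scores = {page: seeds.get(page, 0.5) for page in neighbors}
    for _ in range(iterations):
        updated = {}
        for page, linked in neighbors.items():
            if page in seeds:
                updated[page] = seeds[page]  # hand labels stay fixed
                continue
            neighbor_avg = sum(scores[n] for n in linked) / len(linked)
            # Blend the neighborhood average with the neutral prior.
            updated[page] = damping * neighbor_avg + (1 - damping) * 0.5
        scores = updated
    return scores


# Hypothetical mini-Web: one unlabeled page sits among kid sites,
# the other among news sites.
links = [
    ("kidsite-a", "kidsite-b"),
    ("kidsite-b", "unknown-1"),
    ("unknown-1", "kidsite-a"),
    ("news-a", "unknown-2"),
    ("unknown-2", "news-b"),
    ("news-b", "news-a"),
]
seeds = {"kidsite-a": 1.0, "kidsite-b": 1.0, "news-a": 0.0, "news-b": 0.0}
scores = propagate_age_scores(links, seeds)
print(scores["unknown-1"] > scores["unknown-2"])
```

In this toy, the page surrounded by kid sites ends up with the higher score. A score like that could then be one signal alongside reading level, which is roughly the supplementing role the researchers describe.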
They haven't pulled the sword from the stone, of course. But their work and similar projects show the way forward for more specialized kinds of search. The now-defunct kid search engine Piffany had an informal catchphrase that sums up the dilemma nicely: "Why should a 9-year-old and a 29-year-old get exactly the same search results?" I suppose this is where the lawyers step in. The moment a company markets something as "for kids," it opens itself up to criticism and lawsuits. It just takes one wrong result to slip through at the wrong time to the wrong person. My colleague Farhad Manjoo told me that back in the early 2000s, Google went so far as to mock up a Google Kids that featured a clip-art dinosaur on the classic Google home page.
That didn't happen, and the quest for safe-search continues.
* I discovered an extremely kid-helpful site from this paper: There is a version of Wikipedia called Simple English Wikipedia that uses shorter sentences and basic words. The other day I was looking up trilobites with my son: The main English Wikipedia page had too much detail, but the Simple English page is just enough.