As with verbal communication, people often give off subtle hints on social media posts of things they are planning to do, such as committing suicide, carrying out an act of terror, or engaging in other antisocial behavior. Ferreting out those potential antisocial actors on social media before they can act has become an important goal of law enforcement agencies worldwide. And now, a machine learning-based sentiment analysis system, developed by a Technion student, could end up being used for just that purpose.
The system, developed by Technion computer science student Eden Saig, is described in a paper called “Sentiment Classification of Texts in Social Networks,” which won the recently completed Amdocs Best Project Contest, an annual event for university students. According to Saig, the intent of those posting on social media can be detected from the subtle hints they give off in the way they present their ideas, far more accurately than from the smiley and frown emoticons posters use to express their feelings.
Those emoticons can’t be used to gauge sentiment, the real meaning behind a message. “These icons are superficial cues at best,” said Saig. “They could never express the subtle or complex feelings that exist in real-life verbal communication.” They are too crude, too general, unlike the subtle signals people use when they communicate verbally: the tone of voice, the inflection of a word, the raised eyebrow, the taken-aback look, and much more.
Is there a text equivalent of those signals – the ones that often tell what’s on a person’s mind far more accurately than their words? To find out, Saig analyzed posts on Hebrew-language Facebook pages that are almost pure opinion, called “superior and condescending people” and “ordinary and sensible people.” The pages are basically forums for people to let off steam about things and events that get them mad, a substitute for actually confronting the offending person. Between them, the two pages have about 150,000 “likes,” and active traffic full of snarky, sarcastic, and sometimes sincere comments on politics, food, drivers, and much more.
As a vehicle for people to express thoughts about all and sundry, the two Facebook pages were a good source for material to test his theories, said Saig. “The content on these pages could provide a good database for collecting homogeneous data that could, in turn, help ‘teach’ a computerized learning system to recognize patronizing sounding semantics or slang words and phrases in text.”
Taking about 5,000 posts from the pages, Saig ran them through specially designed algorithms to identify patterns in text – checking cues, references, syntax, colloquial usages and foul language, and evaluating the level of irony or sarcasm.
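The article does not specify which learning algorithm Saig used. As a rough illustration of the kind of supervised text classification involved, here is a minimal naive Bayes classifier in Python; the example posts, labels, and test sentence are invented English stand-ins, not items from Saig’s Hebrew-language corpus:

```python
import math
from collections import Counter, defaultdict

# Toy labeled data standing in for the ~5,000 Facebook posts
# (invented English examples; the real corpus was Hebrew).
TRAIN = [
    ("people like that should just get a real job", "condescending"),
    ("honestly how hard is it to merge lanes properly", "condescending"),
    ("hope everyone stuck in the storm gets home safe", "caring"),
    ("let me know if you need help moving this weekend", "caring"),
]

def tokenize(text):
    return text.lower().split()

# Count word frequencies per class (the "learning" step).
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in TRAIN:
    class_counts[label] += 1
    word_counts[label].update(tokenize(text))

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the most likely class under a naive Bayes model
    with add-one smoothing."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(TRAIN))
        for w in tokenize(text):
            score += math.log(
                (word_counts[label][w] + 1) / (total + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("hope you get home safe"))  # → caring
```

A real system would also need the sarcasm and slang cues the article mentions, which a plain bag-of-words model like this cannot capture; this sketch only shows the basic train-then-classify loop.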
For example, one post, phrased as if addressed to a beggar, “I’m happy to buy you food, but not give you money,” generated 1,093 likes and dozens of comments. Based on those comments and similar ones, Saig’s system can identify the intent of the person posting. Was he being condescending? Sincere? Helpful? Was the intent to put down the beggar, implying that if he wanted money he should get a job? Was the prospective donor concerned that the beggar, who is unlikely to have money management skills, would spend the money on the wrong things, such as drugs or alcohol instead of food? Or was it simple concern that the beggar get a square meal, lest he lose the money or be robbed?
After checking thousands of such messages, Saig said that his system was now able to successfully figure out the motivation and sentiment buried in the text. “Now, the system can recognize patterns of either condescending or caring sentiment, and can even send a text message to the user if the system thinks the post may be arrogant,” he said.
While examining chatty social media posts for snarkiness is all well and good, Saig sees a much more practical and productive purpose for his research. In preparation for this year’s Boston Marathon, for example, police combed through social media posts looking for indications that a terrorist would try to repeat the attack that killed three people and injured over 250 at the 2013 race. Authorities hired a big data company that searched social media for words related to violence, conflict, bombs, and the like. But basing mitigation on keywords opens up the possibility that officials will come across false positives, where social media users “jokingly” plant threats or clues about “terror plots.”
While most people are unlikely to do that nowadays, there’s always the possibility that some dumb kids trying to be “funny” might do so, eating up important resources as police check out a false alarm, and perhaps allowing actual terrorists to pull off an attack while authorities are distracted. By including sentiment analysis, the possibility of false positives could be greatly reduced, Saig’s study concludes.
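The gap between bare keyword matching and a context-aware second pass can be sketched as follows. This is a toy contrast, not the system the authorities or Saig actually used; the posts, word lists, and joke-marker heuristic are all invented, and a real second pass would use a trained sentiment model rather than a hand-written list:

```python
# Toy contrast between plain keyword matching and a second-pass
# context check. Posts and word lists are invented examples.
THREAT_KEYWORDS = {"bomb", "attack", "explode"}
JOKE_MARKERS = {"lol", "jk", "kidding", "joking", "haha"}

posts = [
    "lol i'm totally going to bomb this exam, jk",
    "the attack will happen at the finish line",
]

def keyword_flag(post):
    """First pass: flag any post containing a threat keyword."""
    return bool(THREAT_KEYWORDS & set(post.lower().split()))

def refined_flag(post):
    """Second pass: keep the flag only if no joking cues appear.
    A trained sentiment model would replace this crude word list."""
    words = set(post.lower().replace(",", " ").split())
    return bool(THREAT_KEYWORDS & words) and not (JOKE_MARKERS & words)

for p in posts:
    print(keyword_flag(p), refined_flag(p), "-", p)
```

Here the joking exam post trips the keyword filter but not the refined one, while the genuine-sounding threat trips both; that is the false-positive reduction the study points to.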
The same goes for issues such as depression, teen suicide, and on-line intimidation and bullying. How serious is the cry for help? Is someone just having a good time at the expense of others? How can mental health authorities prevent suicides among kids who don’t necessarily post their intentions or suicide notes on-line? Sentiment analysis provides a window into intent even when that intent is unspoken. Saig feels it holds great promise, for both society and individuals. “I hope that ultimately I can develop a mechanism that would demonstrate to the writer how his or her words could be interpreted by readers,” said Saig, “thereby helping people to better express themselves and avoid being misunderstood.”