Behind the data
- Rasha Kashef
- Oct 1, 2016
In the last decade, social networking has gained huge momentum. People have become more interested in it and have even begun relying on it to gather information, keep up with recent news, and hear other users’ opinions on diverse topics. Such reliance generates massive amounts of data – so much, in fact, that it’s difficult to analyze using traditional data analysis methods.
That’s where Assistant Professor Rasha Kashef and her research in machine learning come in.
Machine-learning techniques have the capacity to analyze big data and can be used to detect useful knowledge like trends, patterns, and rules.
“Let’s say you have 1,000 Facebook friends and you’re seeking expert advice on an outfit,” Kashef explains. “You want to know which of your friends works in fashion without having to look through all the web documents – in this case, Facebook profiles. You just want to ask that small, specific group. We provide the tool to do this.”
Kashef's research focuses on learning patterns, trends, and structures in data through automated procedures known as machine learning algorithms, and on applying those tools in big-data analytics, software analytics, social media, operations research, and bioinformatics.
“The best tool to use for big data is ours,” she says. “It’s like if you give a human one million transactions, can they process it? No way. You have to give it to a machine.”
It’s the same way search engines work. A quick Google search can bring up thousands of websites, but which site do you choose to visit? Most of us place our trust in Google, and hit one of the first few links. And that tends to be where the best information is.
“When you type a keyword into a search engine, that’s where my tools are,” Kashef says. “We categorize all the documents on the Internet by keywords. The user doesn’t have to do the categorization – we do that, to make sure it’s faster and more accurate.”
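As a rough illustration of categorizing documents by keyword, the idea can be sketched as a small inverted index: every word points to the documents that contain it, so a lookup never has to scan each document. This is a minimal sketch, not Kashef's tool; the function names and the sample profiles are invented for the example.

```python
from collections import defaultdict

def build_keyword_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip simple trailing/leading punctuation before filing the word.
            index[word.strip(".,!?")].add(doc_id)
    return index

def lookup(index, keyword):
    """Return the sorted ids of documents filed under the keyword."""
    return sorted(index.get(keyword.lower(), set()))

# Hypothetical profile snippets, invented for illustration.
profiles = {
    "ana": "Fashion buyer at a retail chain.",
    "ben": "Software developer and runner.",
    "cho": "Freelance fashion stylist.",
}
index = build_keyword_index(profiles)
print(lookup(index, "fashion"))  # ['ana', 'cho']
```

The categorization is done once, up front; each later question is a single dictionary lookup, which is why the user does not have to do the sorting themselves.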
How it works
There are four stages to the machine-learning process Kashef specializes in.
Collect the data
Data is collected in one of two ways: real collection or synthetic collection. “Wherever your application is, this is where you collect your data,” she says. “So if you’re working on social media, you collect your data through Facebook profiles, messages, tweets, and more.”
Filter the data
Also known as “cleaning,” filtering is the stage where you deal with noise and with incomplete or missing data.
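A cleaning step of this kind might look like the sketch below, which drops records that are incomplete, have missing values, or carry an implausible value. The field names, the sample records, and the "plausible range" rule for spotting noise are all assumptions made for the example; real cleaning rules depend on the application.

```python
def clean_records(records, required_fields, numeric_field, valid_range):
    """Keep only complete records whose numeric field is in a plausible range."""
    low, high = valid_range
    cleaned = []
    for record in records:
        # Incomplete or missing data: a required field is absent or None.
        if any(record.get(field) is None for field in required_fields):
            continue
        # Noise: here, simply a value outside the plausible range.
        if not low <= record[numeric_field] <= high:
            continue
        cleaned.append(record)
    return cleaned

# Hypothetical raw records, invented for illustration.
raw = [
    {"user": "a", "age": 34},
    {"user": "b", "age": None},  # missing value
    {"user": "c"},               # incomplete record
    {"user": "d", "age": 420},   # noisy value
    {"user": "e", "age": 27},
]
cleaned = clean_records(
    raw, required_fields=("user", "age"), numeric_field="age", valid_range=(0, 120)
)
print([record["user"] for record in cleaned])  # ['a', 'e']
```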
Analyze the data
Data analysis using machine learning is broken down into three methods: supervised learning, when one has prior knowledge of the data, such as known labels, and is trying to confirm it; unsupervised learning, when one has no prior knowledge; and semi-supervised learning, a balance between the two.
“It’s always preferable to use unsupervised learning as a pre-stage to everything,” Kashef says. “We use both unsupervised and supervised for big data analytics.”
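One way to picture unsupervised learning as a pre-stage is to cluster the data first and only then bring in the little that is known. The sketch below is an assumption about how that could look, not a description of Kashef's method: a plain one-dimensional k-means groups the values, and a handful of labeled examples then give each group a name. The data, labels, and function names are invented for the example.

```python
from collections import Counter

def nearest(centroids, value):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - value))

def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised step: plain 1-D k-means (k >= 2), evenly spaced start."""
    low, high = min(values), max(values)
    centroids = [low + (high - low) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            clusters[nearest(centroids, value)].append(value)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def name_clusters(centroids, labeled_examples):
    """Supervised step: name each cluster by majority vote of known labels."""
    votes = [Counter() for _ in centroids]
    for value, label in labeled_examples:
        votes[nearest(centroids, value)][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def predict(centroids, names, value):
    """Label a new value with the name of its nearest cluster."""
    return names[nearest(centroids, value)]

# Hypothetical posts-per-week counts; only two of them come with a label.
posts_per_week = [1, 2, 3, 2, 40, 42, 45, 38]
labeled = [(2, "casual"), (41, "heavy")]

centroids = kmeans_1d(posts_per_week)
names = name_clusters(centroids, labeled)
print(predict(centroids, names, 35))  # heavy
```

The clustering does most of the work without any labels; the two labeled examples are only needed to attach names to groups the data had already revealed.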
Make a decision
Based on what you’ve learned, the next step is to make a decision. If there’s a degree of uncertainty, Kashef uses quantitative tools and measures to make better decisions.
“I don’t say things like ‘there’s a good chance,’” she explains. “I’ll say ‘there’s a good chance by 90 per cent.’ I use the quantitative measures to criticize my analysis and to give a better, clearer understanding of what’s going on.”
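To make "a good chance by 90 per cent" concrete, one common quantitative measure is accuracy on held-out examples, reported together with a margin of error. The sketch below uses the normal approximation for that margin, which is rough for small samples; the measure, the sample data, and the function name are assumptions for illustration, since the article does not say which measures Kashef uses.

```python
import math

def accuracy_with_interval(predictions, truths, z=1.96):
    """Accuracy plus an approximate 95% interval (normal approximation)."""
    n = len(truths)
    correct = sum(p == t for p, t in zip(predictions, truths))
    accuracy = correct / n
    margin = z * math.sqrt(accuracy * (1 - accuracy) / n)
    # Clamp the interval to the valid range for a proportion.
    return accuracy, max(0.0, accuracy - margin), min(1.0, accuracy + margin)

# Hypothetical held-out results: 20 profiles, 2 of them misclassified.
truths = ["fashion"] * 10 + ["other"] * 10
predictions = ["fashion"] * 9 + ["other"] + ["fashion"] + ["other"] * 9

accuracy, low, high = accuracy_with_interval(predictions, truths)
print(f"accuracy {accuracy:.0%}, roughly between {low:.0%} and {high:.0%}")
```

With only 20 examples the interval is wide, which is exactly the kind of information a bare "there's a good chance" would hide.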
Ensemble methods in data analytics
“There are millions of algorithms to analyze data,” Kashef says, “and different algorithms work on different data.”
That’s why Kashef and her team use ensemble methods. The goal is to have a combination of algorithms that together perform measurably better than any individual one, but still take a reasonable amount of time. Kashef is working on perfecting the algorithms so people can get high-quality results with reasonable overhead.
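The simplest ensemble is a majority vote: several imperfect algorithms each give an answer, and the most common answer wins. The sketch below shows only that basic idea; the three rule-based classifiers and the sample profiles are invented for the example and are not the algorithms Kashef's team combines.

```python
from collections import Counter

def by_keyword(text):
    return "fashion" if "fashion" in text.lower() else "other"

def by_job_title(text):
    titles = ("stylist", "designer", "buyer")
    return "fashion" if any(t in text.lower() for t in titles) else "other"

def by_hashtag(text):
    return "fashion" if "#ootd" in text.lower() else "other"

def majority_vote(classifiers, text):
    """Return the label chosen by most classifiers."""
    votes = Counter(classify(text) for classify in classifiers)
    return votes.most_common(1)[0][0]

# An odd number of voters avoids ties between two labels.
ensemble = [by_keyword, by_job_title, by_hashtag]

# The keyword rule alone misses the first profile and is fooled by the second.
print(majority_vote(ensemble, "Stylist sharing looks daily #ootd"))       # fashion
print(majority_vote(ensemble, "Software developer who dislikes fashion"))  # other
```

Each extra classifier adds running time, which is the trade-off between quality and overhead described above.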
Kashef joined Ivey this year after completing her PhD at the University of Waterloo in 2008 and working at the University of Waterloo and the Academy for Science and Technology.