This project is based on work of Riedl, M.
The results revealed that Slack is doing well, with most reviews speaking positively about the company. We also performed aspect-based sentiment analysis on the reviews to understand which aspects people are praising or complaining about.
Text Classification Applications
The results showed that users love things like its ease of use, integrations, and file sharing system, but hate things like the search tool, the notifications system, the pricing, and the performance and reliability.
Building a good customer experience is one of the foundations of a sustainable and growing company.
Text classification can help support teams provide a stellar experience by automating tasks that are better left to computers, saving precious time that can be spent on more important things. For instance, text classification is often used for automating ticket routing and triaging. Done manually, this means a person has to assign each ticket to the correct team, one that can understand and reply to the customer in the right language. With text classification, you can use a language detection classifier to do this task for you instead.
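As a rough illustration of routing tickets by language, here is a minimal sketch in Python. It is not a trained language detection classifier: it swaps in a simple stopword-overlap heuristic to keep the example self-contained. The stopword lists, the team names, and the `detect_language` and `route_ticket` helpers are all hypothetical and exist only for this example.

```python
import re

# Hypothetical, tiny stopword lists. A real detector would be trained on
# far more data and cover many more languages.
STOPWORDS = {
    "en": {"the", "and", "is", "my", "to", "i", "not", "for", "you", "of"},
    "es": {"el", "la", "y", "es", "mi", "no", "para", "que", "de", "un"},
    "fr": {"le", "la", "et", "est", "mon", "ne", "pas", "pour", "je", "un"},
}

# Hypothetical mapping from a detected language to a support queue.
TEAM_BY_LANGUAGE = {
    "en": "support-english",
    "es": "support-spanish",
    "fr": "support-french",
}


def detect_language(text):
    """Return the language whose stopwords appear most often, or None."""
    # Extract runs of letters (including accented ones), ignoring digits.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    # On a tie, max() keeps the first language in dictionary order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def route_ticket(text, default_team="triage"):
    """Pick a team for the ticket, falling back to manual triage."""
    return TEAM_BY_LANGUAGE.get(detect_language(text), default_team)


print(route_ticket("I can not log in to my account"))             # support-english
print(route_ticket("No puedo entrar en mi cuenta y es urgente"))  # support-spanish
print(route_ticket("12345"))                                      # triage
```

Tickets that match no known language fall back to a manual triage queue, so a person only handles the cases the heuristic cannot decide.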
Text classification can also be used for routing support tickets to a teammate with specific product expertise. For instance, if a customer writes in asking about refunds, you can automatically assign the ticket to the teammate with permission to perform refunds. This will ensure the customer gets a quality response more quickly.
Without the need for triaging every single ticket, support teams can work more efficiently and reduce response times. Support teams can also use text classification to automatically detect the urgency of a support ticket and prioritize accordingly. By using machine learning to set priorities, you can ensure your team is always working on the most urgent tickets, every time. Companies are also leveraging text classification for getting insights from support conversations, thus improving their reporting and analytics.
Example: Analyzing customer support interactions on Twitter. There are different trends around how to deal with customers on social media. Some support teams try to appear hip and cool while others project a more professional appearance. But which approach is better received by customers? First, we analyzed the most relevant keywords in all these tweets and found that each carrier has its own approach to interacting with customers.
For instance, T-Mobile has a friendlier and more personal approach, with every support representative signing each message with their name, while Verizon tweets are very dry and professional. Then, we performed sentiment analysis on the data, and the results suggest that a friendlier take on social media elicits more positive responses.
When companies leverage surveys such as Net Promoter Score (NPS) to gather feedback from customers continuously, they start to drive their business decisions based on the results. To be able to do this, the information gathered, which usually includes open-ended responses, must be processed.
By manually annotating responses into different categories, product teams can identify valuable insights and trends over time. The problem is that this manual process is tedious and very time-consuming. Instead of relying on humans to do this task, you can quickly process customer feedback with machine learning. Classification models can help you analyze survey results to discover patterns and insights.
By combining the quantitative results with this qualitative but structured analysis, product teams can make more informed decisions without having to put so much time or resources into reading every single open-ended response. In their effort to obtain actionable insights for roadmap improvements, Retently wanted to figure out what was driving their NPS. But manually sorting through all the feedback was a time-consuming task that they preferred to avoid. So, they turned to machine learning to automate this process and trained a classifier that was able to classify NPS open-ended responses into a set of tags.
Excited about the results of the classifier, Retently decided to implement a new reporting system that can showcase customer priorities based on customers' own words. This new reporting system allowed Retently to discover actionable insights about their customers that now drive strategic decisions to provide a better user experience.
So, you want to start using text classification?
Great idea! Machines process text much faster than humans do.
You can begin to automate manual and repetitive tasks so that you can focus on more important and fulfilling activities. But… how the heck do you get started? Building your first text classifier can be simple and straightforward. You just need two things: training data and a machine learning algorithm to learn from it. Sounds good? A text classifier is worthless without accurate training data to power it. Just like humans, machine learning algorithms can make predictions by learning from previous examples. By telling the algorithm that you expect a specific set of tags as output for a particular text, it can learn to recognize patterns in text, like the sentiment expressed by a tweet, or the topic mentioned in a customer review.
An accurate classifier depends entirely on getting the right training data, which means gathering examples that best represent the outcomes you want to predict. If you train your model with another type of data, the classifier will provide poor results.
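To make the idea of learning from tagged examples concrete, here is a minimal sketch of a multinomial Naive Bayes classifier written with only the Python standard library. Naive Bayes is just one possible algorithm, chosen here because it is short. The training sentences are made up for illustration, and the `NaiveBayesTextClassifier` class and `tokenize` helper are hypothetical names, not part of any library.

```python
import math
import re
from collections import Counter, defaultdict

# Made-up training examples for illustration only: (text, expected tag).
TRAINING_EXAMPLES = [
    ("I love how easy the app is to use", "positive"),
    ("Great integrations and easy file sharing", "positive"),
    ("The search tool is slow and unreliable", "negative"),
    ("Notifications are annoying and pricing is too high", "negative"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.tag_counts = Counter()              # training texts per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()                  # every word seen in training

    def fit(self, examples):
        for text, tag in examples:
            self.tag_counts[tag] += 1
            for word in tokenize(text):
                self.word_counts[tag][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        # Words never seen during training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        total_texts = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n_texts in self.tag_counts.items():
            total_words = sum(self.word_counts[tag].values())
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(n_texts / total_texts)
            for word in words:
                count = self.word_counts[tag][word]
                score += math.log(
                    (count + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


classifier = NaiveBayesTextClassifier().fit(TRAINING_EXAMPLES)
print(classifier.predict("easy to use and great"))               # positive
print(classifier.predict("search is slow and pricing is high"))  # negative
```

With only four training sentences the model simply memorizes which words go with which tag. That also illustrates the point about training data: the predictions are only as good as the examples the classifier has seen.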
You can use internal data generated from the apps and tools that you use every day, such as CRMs (e.g., Salesforce, Hubspot), chat apps (e.g., Slack, Drift, Intercom), help desk software (e.g., Zendesk, Freshdesk, Front), survey tools (e.g., SurveyMonkey, Typeform, Google Forms), and customer satisfaction tools. These tools usually provide an option to export data in a CSV file that you could use for training your classifier. Another option is using external data available on the web, either by using web scraping, APIs, or public datasets.
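As a sketch of how a CSV export could be turned into training examples, the snippet below reads rows with Python's built-in `csv` module. The column names `text` and `tag`, the `load_training_examples` helper, and the sample rows are assumptions for illustration; a real export will have its own column names and will usually need more cleaning.

```python
import csv
import io


def load_training_examples(csv_file, text_column="text", tag_column="tag"):
    """Read (text, tag) pairs from an open CSV file, skipping incomplete rows."""
    examples = []
    for row in csv.DictReader(csv_file):
        # Missing cells come back as None, so normalize them to "".
        text = (row.get(text_column) or "").strip()
        tag = (row.get(tag_column) or "").strip()
        if text and tag:
            examples.append((text, tag))
    return examples


# In-memory stand-in for an exported file. With a real export you would pass
# open("export.csv", newline="", encoding="utf-8") instead (hypothetical path).
sample_export = io.StringIO(
    "text,tag\n"
    "How do I get a refund?,billing\n"
    "The app crashes on startup,bug\n"
    ",bug\n"
    "Please add dark mode,\n"
)

examples = load_training_examples(sample_export)
print(examples)
# [('How do I get a refund?', 'billing'), ('The app crashes on startup', 'bug')]
```

Rows with a missing text or a missing tag are dropped, since an untagged example cannot teach a supervised classifier anything.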
The following are some publicly available datasets that you can use to build your first text classifier and start experimenting right away.
Reuters news dataset: probably one of the most widely used datasets for text classification, it contains 21, news articles from Reuters labeled with categories according to their topic, such as Politics, Economics, Sports, and Business.
You can get an alternative dataset for Amazon product reviews here.
Twitter Airline Sentiment: this dataset contains around 15, tweets about airlines labeled as positive, neutral, and negative.
Spambase: a dataset with 4, emails labeled as spam and not spam.
Hate speech and offensive language: this dataset contains 24, labeled tweets organized into three categories: clean, hate speech, and offensive language.
Now that you have training data, it's time to feed it to a machine learning algorithm and create a text classifier.
Luckily, many resources can help you during the different phases of the process.
Broadly speaking, these tools can be classified into two different categories. One of the reasons machine learning is becoming mainstream is the myriad of open source libraries available for developers interested in applying it. Although they still require machine learning knowledge for building and deploying models, these libraries offer a fair level of abstraction and simplification. Python, Java, and R all offer a wide selection of machine learning libraries that are actively developed and provide a diverse set of features, performance, and capabilities.
Python is often the programming language of choice for developers and data scientists who need to work on machine learning models. Its simple syntax, its massive community, and its mature scientific-computing libraries are some of the reasons why Python is so prevalent in the field. Scikit-learn is one of the go-to libraries for general purpose machine learning. It supports many algorithms and provides simple and efficient features for working with text classification, regression, and clustering models. If you are a beginner in machine learning, scikit-learn is one of the friendliest libraries for getting started with text classification, with lots of tutorials and step-by-step guides all over the web.
NLP libraries are also super handy for text classification, as they provide all kinds of useful tools for helping a machine understand text, such as splitting paragraphs into sentences, splitting sentences into words, and recognizing the part of speech of those words. SpaCy, for example, also has integrated word embeddings, which can help boost accuracy in text classification. Once you are ready to experiment with more complex algorithms, you should check out deep learning libraries like Keras, TensorFlow, and PyTorch.
Keras is probably the best starting point, as it's designed to simplify the creation of recurrent neural networks (RNNs) and convolutional neural networks (CNNs). TensorFlow is one of the most popular open source libraries for implementing deep learning algorithms.
Developed by Google and used by companies such as Dropbox, eBay, and Intel, this library is optimized for setting up, training, and deploying artificial neural networks with massive datasets. Another programming language that is broadly used for implementing machine learning models is Java.
Like Python, it has a big community, an extensive ecosystem, and a great selection of open source libraries for machine learning and NLP.