Our research, knowledge, thoughts, and recommendations about building and leading businesses on the Internet.
With the recent advent of GPUs and increased computational power, machine learning and neural networks have risen from the grave and are now at the forefront of efforts to automate tasks a human would normally do. One of the biggest areas of research for this approach has been in understanding the nuances of language. Computers have traditionally struggled to learn languages because of the thousands of rules involved and the even greater number of exceptions to each rule. Simple rule-based approaches fail to take context and interpretation into account and are rarely able to accurately interpret sentences and paragraphs.
In the past decade, researchers have begun applying recurrent neural networks to understand text. Neural networks are combinations of artificial neurons loosely modeled on the human brain. These networks can change the strength of the connections between neurons based on the training data given to them. For example, if a neural network receives pictures of apples and oranges along with labels for each picture, over time it can tune these connections and learn to distinguish the two objects.
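To make the idea of "tuning connections" concrete, here is a minimal sketch of a single artificial neuron (a perceptron) learning to separate apples from oranges. The two numeric features, the sample values, and the learning rate are all assumptions made up for illustration; a real network would work from image pixels and have many more neurons.

```python
# Toy training data. The (redness, orangeness) features are hypothetical
# stand-ins for what a real network would extract from image pixels.
# Label 1 = apple, label 0 = orange.
samples = [
    ((0.9, 0.1), 1),
    ((0.8, 0.2), 1),
    ((0.2, 0.9), 0),
    ((0.1, 0.8), 0),
]

def predict(weights, bias, features):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, learning_rate=0.1):
    """Perceptron rule: nudge each connection weight in proportion to the error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train(samples)
print(predict(weights, bias, (0.85, 0.15)))  # a mostly red fruit
print(predict(weights, bias, (0.15, 0.85)))  # a mostly orange fruit
```

The weights start at zero and only change when the neuron gets a labeled example wrong, which is the "learning from training data" described above in its simplest form.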
Recurrent neural networks, frequently abbreviated to RNNs, are an extension of this idea and take input from previous iterations. So if an RNN were run on a sentence, it would carry forward an internal state computed from the previous words and use that as additional information for the current word. This makes RNNs particularly effective at handling sequential and time-correlated data. In this case, since sentences are sequential constructions and previous words impact the interpretation of the current word, RNNs can better pick up context and the nuances of language.
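As a rough sketch of what "taking input from previous iterations" means, the snippet below runs a one-unit RNN over a short sequence. The weights are arbitrary illustrative values rather than trained ones, and the function name `rnn_forward` is our own, not part of any library.

```python
import math

def rnn_forward(inputs, w_input=0.5, w_hidden=0.8, bias=0.0):
    """Run a one-unit RNN over a sequence and return the state after each step.

    The weights are arbitrary illustrative values, not trained ones.
    """
    hidden = 0.0  # no context exists before the first element
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states

# The same final input (1.0) produces a different state depending on
# what the network saw before it.
after_zeros = rnn_forward([0.0, 0.0, 1.0])[-1]
after_ones = rnn_forward([1.0, 1.0, 1.0])[-1]
print(after_zeros, after_ones)
```

Because the state is fed back in at every step, the final value depends on the whole sequence and not just on the last input.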
However, there are still some issues with this idea. First, a basic RNN carries only a single state forward, which often isn’t enough to retain information from much earlier in a sequence. Most modern architectures actually use LSTMs (Long Short-Term Memory networks), a variant of RNNs that maintains a longer-lived memory and learns to decide which information is important enough to keep. Another common modification is the use of BRNNs (bidirectional RNNs). These systems combine two opposing RNNs in order to extract contextual information from both before and after a target word. This way, if the network is looking at a noun, it can get descriptive information such as adjectives, which usually come before the noun, and information about its current state and actions, which usually come after the noun. For example, if the network read “A red cat sits here,” the bidirectional approach would allow it to extract what the object (cat) looked like (red) and what it was doing (sitting).
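The bidirectional idea can be sketched with the same kind of toy recurrence: read the sequence once left to right and once right to left, then pair the two states at each position. The numeric encoding of the sentence and the weights below are assumptions for illustration only; real systems use learned word vectors and trained weights.

```python
import math

def run_rnn(values, w_input=0.5, w_hidden=0.8):
    """One-unit recurrence with arbitrary, untrained weights."""
    hidden, states = 0.0, []
    for x in values:
        hidden = math.tanh(w_input * x + w_hidden * hidden)
        states.append(hidden)
    return states

def bidirectional_states(values):
    """Pair each position's left-to-right state with its right-to-left state."""
    forward = run_rnn(values)
    # Read the sequence in reverse, then flip the result to realign positions.
    backward = run_rnn(values[::-1])[::-1]
    return list(zip(forward, backward))

# Hypothetical numeric encoding of "A red cat sits here".
sentence = [0.1, 0.9, 0.5, 0.7, 0.2]
states = bidirectional_states(sentence)

# At "cat" (index 2), the forward state summarizes "A red cat" and the
# backward state summarizes "cat sits here".
cat_forward, cat_backward = states[2]
print(cat_forward, cat_backward)
```

Changing a word after "cat" alters only the backward half of its state, and changing a word before it alters only the forward half, which is why the pair captures context from both sides.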
So now we have a tool that can potentially learn and understand text. But what exactly can we do with it? How can we use this information? It turns out that while we haven’t been able to fully create a system that understands everything about language, we can build specific structures to extract certain characteristics.
For example, RNNs can determine the part of speech of a word, separating them into categories such as noun, verb, and adjective. This serves as the foundation for grammatical analysis and other insights. Google’s Cloud Natural Language API builds on this and is able to find all the different entities from a sentence, along with their relative importance and connotation. This kind of information can help identify key parts of a piece of writing and separate them out automatically.
Another approach has been in encoding words and sentences. Machine learning techniques such as word2vec convert words to vectors, allowing computers to represent words in mathematical terms. From this, computers can automatically learn relationships and patterns, such as the similarity between “man” and “woman” compared to “king” and “queen,” since the vectors between these pairs of points will be of similar length and direction. In this way, computers can symbolically represent some of the same information about these words that we have in our brains.
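Here is a small sketch of that vector arithmetic. The two-dimensional vectors are hand-made for illustration (one axis loosely meaning "royalty" and the other "gender"); real word2vec vectors are learned from text and typically have hundreds of dimensions. The `analogy` helper is our own name, not a word2vec API.

```python
import math

# Hand-made 2-D vectors: axis 0 loosely means "royalty", axis 1 "gender".
# These values are invented for illustration, not learned from data.
vectors = {
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "apple": (-1.0, 0.1),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to b - a + c, excluding the three input words."""
    target = tuple(vb - va + vc
                   for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# "man" is to "king" as "woman" is to ...?
result = analogy("man", "king", "woman")
print(result)
```

In this toy space the step from "man" to "king" is the same vector as the step from "woman" to "queen," which is the relationship the paragraph above describes.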
This approach to encoding information has been extended to other applications, such as translation. The idea is that if you can encode and map different languages to the same vector space, then that vector space can be used as a universal translator. One RNN can map a sentence to this space and another can take this mapping and convert it into a different language. This actually turns out to be very similar to how Google Translate functions.
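A heavily simplified sketch of the shared-space idea follows. It works on single words with hand-made coordinates, and a nearest-neighbor lookup stands in for the decoding RNN, so it should be read as an illustration of "encode into a common space, decode into another language" and not as how any production translator is implemented. All vocabularies and coordinates are invented.

```python
import math

# Hypothetical shared space: words with the same meaning sit at nearly
# the same point, regardless of language. Coordinates are invented.
english = {"cat": (0.9, 0.1), "dog": (0.1, 0.9), "house": (-0.8, -0.3)}
spanish = {"gato": (0.88, 0.12), "perro": (0.12, 0.85), "casa": (-0.82, -0.28)}

def encode(word, vocabulary):
    """Map a word to its point in the shared space (stand-in for an encoder RNN)."""
    return vocabulary[word]

def decode(point, vocabulary):
    """Pick the word nearest to the point (stand-in for a decoder RNN)."""
    return min(vocabulary, key=lambda w: math.dist(vocabulary[w], point))

def translate(word):
    """English to Spanish by way of the shared vector space."""
    return decode(encode(word, english), spanish)

print(translate("cat"), translate("dog"), translate("house"))
```

The point of the design is that neither side needs to know about the other language directly; both only need to agree on the shared space.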
From all these different applications, higher-level features and characteristics of the text can be extracted and greater insight can be gained into its content. This is essential to a variety of problems, from chatbots to translators to text editors and much more, and can greatly help in automating complex, repetitive work for efficient scaling.
During our most recent webinar, Rahul Singh covered a few of our favorite tools for team collaboration: Slack, Airtable, and Trello. The first two tools are both relatively new to our company, whereas the last one has come to the forefront of our company’s processes after a bit of time in the dark.
Slack was initially a tool that we tested out in our company to improve our internal communication. At the time, we turned away from it because we were afraid that it would become a major time suck that would slowly but surely decrease the efficiency of our team. We stuck with Google Hangouts / Chat for a bit but decided to give Slack a more solid go about a year and a half ago. Our experience thus far has been nothing but positive. The standard tier of Slack is more than enough for a small business to operate on, and the maximum number of integrations allowed in the free tier is optimal for a company of our size. We use Slack with a variety of integrations, some of which are:
We have most often used Trello as our “Master Backlog” project management system. There are many tasks we are unable to complete in the current time window but would nonetheless like to keep in the backlog for when future opportunities to address them arise. Recently, we have started relying more heavily on Trello to assist us with Agile Scrum Planning for some of our external projects (clients) and internal initiatives (Research & Development, Sales & Marketing, etc.).
The clean interface, the ease with which a user can move cards around, the overall ability to get a bird’s eye view of the project at any time, and the ability to add burndown charts and estimates to cards via plugins such as Scrum for Trello and Burndown for Trello make this tool an indispensable part of our workflow.
Below you can find a screenshot of a template for Agile Scrum Planning in Trello. Additionally, here is the link to it.
Airtable is what you would get if you mixed Google Spreadsheets, Microsoft Excel, Trello, and a lightweight online database. We’ve covered this particular software tool more extensively in a blog post you can find here. We’ve used Airtable for a number of things, ranging from job applicant tracking to sales territory planning. Below you can find a screenshot of the “Advertising Campaigns” Airtable template found here.
Additionally, some of our favorite features of all three of these tools are:
At the end of the day, all of these tools can work in conjunction with one another and help your team get a boost. Nonetheless, it is important that your process is ironed out and clear for everyone to understand. No amount of stellar software tools will fix issues that stem from faulty processes, and, well, most issues stem from faulty processes.
If you’re interested in finding out more you can access the slides and a recording of the webinar below!
Why should an organization undertake such a project?
It comes down to how the project affects the organization’s perpetuity. Some of the questions a business should be able to answer are:
What other solutions provide the same end user results?
Tools like Domo, Tableau, or, more recently, Periscope (in the SaaS world) can be useful for gaining basic insights without having to do ETL if the data is ready. Open-source tools such as Kibana, Metabase, and Redash can be used as well, as long as the data is available.
What are the trade-offs between the various solutions?
Ultimately, if the data isn’t ready, ETL (extract, transform, load) may be required to get it clean enough for those tools to let users visualize and explore it properly.
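For readers less familiar with ETL, here is a minimal sketch of the three steps using only Python's standard library. The sample CSV, the column names, and the `sales` table are all hypothetical; a real pipeline would read from your actual sources and load into whatever store your visualization tool queries.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with the usual problems: stray whitespace,
# inconsistent casing, and a row with a missing amount.
RAW_CSV = """region,amount
 East ,100
west,250
EAST,
North,75
"""

def extract(text):
    """Extract: read the raw rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names and drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip rows that cannot be charted
        cleaned.append((row["region"].strip().lower(), float(amount)))
    return cleaned

def load(rows, connection):
    """Load: write the cleaned rows into a table a BI tool could query."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()

connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
totals = dict(connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

Once the data is in this cleaned shape, any of the tools mentioned above can be pointed at it without each dashboard having to repeat the cleanup.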
Rahul Singh, CEO of Anant, had the opportunity to co-organize and MC the May Meetup of Data Wranglers DC where the speakers, John Clune and Timothy Hathaway, covered two topics related to open government and public data for data processing and visualization. We had a great turnout at the event and had the chance to do some networking after.
Our next two meetups will focus on Search & Knowledge Management (June Meetup) and Machine Learning for Data Processing (July Meetup). Check out the Meetup page for more details when they become available.
A big thank you to John Clune and Timothy Hathaway for taking the time to present to the group. If you have any interest in speaking, please don’t hesitate to reach out to Rahul at email@example.com.
Below you can find a recording of both presentations.
Assuming you have already created a SearchStax account and do not already have a deployment set up, click on the Deployments tab and then click on the Add Deployment button at the top. Enter a Deployment name, and select the most appropriate Region, Plan, and Solr Version for your needs. In the example below, we will be using Solr Version 6.4.2.