Anant Corporation Blog

Our research, knowledge, thoughts, and recommendations about building and leading businesses on the Internet.

Teaching Computers to Understand Language

With the recent advent of GPUs and increased computational power, machine learning and neural networks have risen from the grave and are now at the forefront of efforts to automate tasks that once required a human. One of the biggest areas of research for this approach has been in understanding the nuances of language. Computers have traditionally struggled to learn languages because of the thousands of rules involved and the even more numerous exceptions to each rule. Simple rule-based approaches fail to take context and interpretation into account and are rarely able to accurately interpret sentences and paragraphs.

In the past decade, researchers have begun applying recurrent neural networks to understand text. Neural networks are combinations of artificial neurons loosely modeled on the human brain. These networks adjust the strength of the connections between neurons based on the training data given to them. For example, if a neural network receives pictures of apples and oranges along with labels for each picture, over time it can tune these connections and learn to distinguish the two objects.
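As a toy illustration of that training loop, here is a sketch using a small scikit-learn neural network; the fruit features (redness and weight) and every number in it are made up purely for illustration.

  # Toy sketch: "train" a tiny neural network to tell apples from oranges.
  # Features and labels are invented for illustration; a real pipeline
  # would use far more data and proper feature scaling.
  from sklearn.neural_network import MLPClassifier

  X = [[0.9, 0.15], [0.8, 0.17], [0.2, 0.14], [0.1, 0.16]]  # [redness, weight kg]
  y = ["apple", "apple", "orange", "orange"]

  clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
  clf.fit(X, y)                         # training tunes the connection strengths
  print(clf.predict([[0.85, 0.15]]))    # likely ['apple']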

Recurrent neural networks, frequently abbreviated as RNNs, are an extension of this idea that takes input from previous iterations. If an RNN were run on a sentence, it would take its interpretation of the previous word and use that as additional information for the current word. This makes RNNs particularly effective at handling sequential and time-correlated data. Because sentences are sequential constructions in which previous words affect the interpretation of the current word, RNNs can better pick up context and the nuances of language.
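In code, that recurrence is a single line. Here is a minimal sketch with NumPy, where the dimensions and random weights are stand-ins for a trained model:

  # Minimal sketch of an RNN step: the hidden state h carries information
  # from previous words into the current one. Sizes are arbitrary toy values.
  import numpy as np

  rng = np.random.default_rng(0)
  W_x = rng.normal(size=(16, 8))    # input-to-hidden weights
  W_h = rng.normal(size=(16, 16))   # hidden-to-hidden weights
  b = np.zeros(16)

  def rnn_step(x_t, h_prev):
      # Combine the current word vector with the previous hidden state.
      return np.tanh(W_x @ x_t + W_h @ h_prev + b)

  h = np.zeros(16)                       # empty memory at sentence start
  for x_t in rng.normal(size=(5, 8)):    # five "words", each an 8-dim vector
      h = rnn_step(x_t, h)               # h now summarizes the sentence so far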

However, there are still some issues with this idea. Firstly, a plain RNN carries only a single hidden state forward, which often isn’t enough to remember information from many steps back. Most modern structures actually use LSTMs (Long Short-Term Memory networks), a variant of RNNs that can store multiple pieces of state and decide which ones are important enough to keep. Another common modification is the use of BRNNs (bidirectional RNNs). These systems stack two opposing RNNs together in order to extract contextual information from both before and after a target word. This way, if the network is looking at a noun, it can pick up descriptive information such as adjectives, which usually come before the noun, and information about its current state and actions, which usually comes after the noun. For example, if the network read “A red cat sits here,” the bidirectional approach would allow it to extract what the object (cat) looked like (red) and what it was doing (sitting).
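To make this concrete, here is a sketch of how these pieces are commonly stacked in Keras; the vocabulary size, layer sizes, and binary classification task (say, sentiment) are assumptions for illustration:

  # Sketch of a bidirectional LSTM text classifier in Keras.
  from tensorflow.keras.models import Sequential
  from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

  model = Sequential([
      Embedding(input_dim=10000, output_dim=64),  # word IDs -> vectors
      Bidirectional(LSTM(64)),      # read the sentence forwards and backwards
      Dense(1, activation="sigmoid"),  # e.g. positive vs. negative
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["accuracy"])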

So now we have a tool that can potentially learn and understand text. But what exactly can we do with it? How can we use this information? It turns out that while we haven’t been able to fully create a system that understands everything about language, we can build specific structures to extract certain characteristics.

For example, RNNs can determine the part of speech of each word, separating words into categories such as noun, verb, and adjective. This serves as the foundation for grammatical analysis and other insights. Google’s Cloud Natural Language API builds on this and is able to find all the different entities in a sentence, along with their relative importance and connotation. This kind of information can help identify the key parts of a piece of writing and separate them out automatically.
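Google’s API requires an account and credentials, so as a stand-in, here is the same part-of-speech idea in a few lines of NLTK; this is a substitution for illustration, not the API named above:

  # Part-of-speech tagging with NLTK. One-time setup:
  #   nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
  import nltk

  tokens = nltk.word_tokenize("A red cat sits here")
  print(nltk.pos_tag(tokens))
  # roughly: [('A', 'DT'), ('red', 'JJ'), ('cat', 'NN'), ('sits', 'VBZ'), ('here', 'RB')]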

Another approach has been to encode words and sentences. Certain machine learning techniques, such as word2vec, convert words to vectors, allowing computers to represent words in mathematical terms. From this, computers can automatically learn relationships and patterns, such as the relationship between “man” and “woman” matching the one between “king” and “queen”: the vectors between these pairs of points are of similar size and direction. In this way, computers can symbolically represent the same information about these words that we carry in our brains.
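The classic demonstration of this, sketched with gensim and one of its downloadable pretrained models (the model name is an assumption, and the exact nearest neighbour can vary by model):

  # king - man + woman ~= queen, using pretrained GloVe vectors via gensim.
  import gensim.downloader as api

  vectors = api.load("glove-wiki-gigaword-100")   # downloads on first use
  print(vectors.most_similar(positive=["king", "woman"],
                             negative=["man"], topn=1))
  # expected (approximately): [('queen', ...)]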

This approach of encoding information has been extended to other applications, such as translation. The idea is that if you can encode and map different languages into the same vector space, that vector space can be used as a universal translator: one RNN maps a sentence into the space and another takes that mapping and converts it back out into a different language. This turns out to be very similar to how Google Translate functions.
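Here is a sketch of that encoder-decoder arrangement in Keras; the token counts and dimensions are placeholders, and the training data and decoding loop are omitted:

  # Sketch of a sequence-to-sequence translation model: one LSTM encodes the
  # source sentence into a vector space, a second decodes into the target
  # language.
  from tensorflow.keras.models import Model
  from tensorflow.keras.layers import Input, LSTM, Dense

  num_src_tokens, num_tgt_tokens, latent_dim = 5000, 6000, 256

  encoder_inputs = Input(shape=(None, num_src_tokens))
  _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

  decoder_inputs = Input(shape=(None, num_tgt_tokens))
  decoder_seq, _, _ = LSTM(latent_dim, return_sequences=True,
                           return_state=True)(decoder_inputs,
                                              initial_state=[state_h, state_c])
  decoder_outputs = Dense(num_tgt_tokens, activation="softmax")(decoder_seq)

  model = Model([encoder_inputs, decoder_inputs], decoder_outputs)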

From all these different applications, higher level features and characteristics of the text can be extrapolated and greater insight can be made into the content of the text. This is essential to a variety of problems, from chatbots to translators to text editors and much more and can greatly help in automating complex, repetitive work for efficient scaling.

Scaling Cloud Web & Data Technologies

Long gone are the days when software had to be self-hosted, needed hundreds if not thousands of servers to run at a proper speed, and your enterprise had to hire an entire IT department of specialized staff to configure, maintain, and update the various pieces of technology powering your business. Now, more than ever before, modern enterprises can benefit from distributing the various parts of their tech infrastructure without breaking their budget.


With Docker and its “containers,” it’s much easier than ever before to create applications that are environment-agnostic. If you’re looking for a database that is fault-tolerant, scales rapidly, and can be queried very quickly, a distributed NoSQL database such as Cassandra is a good option.
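As a minimal sketch of that combination, the snippet below assumes Cassandra running in a Docker container with the default ports and connects to it with the DataStax Python driver (pip install cassandra-driver):

  # First, on the host (one line of shell):
  #   docker run --name cassandra -d -p 9042:9042 cassandra:latest
  from cassandra.cluster import Cluster

  cluster = Cluster(["127.0.0.1"], port=9042)   # the port mapped above
  session = cluster.connect()
  print(session.execute("SELECT release_version FROM system.local").one())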


We recently gave a presentation covering this topic to one of our clients and have attached the slides below. Depending on interest, we may conduct a webinar covering this topic in more depth. If you’d like to hear more about this topic, shoot me an email at arturs@anant.us!


Alternatively, if you’d like us to come give this presentation at your company and show you how to build your roadmap to cloud software, you can sign up for a free, no-hassle 30-minute introductory roadmap conversation with one of our experts here!


Please see the slides below:


Customer Relationship Management Systems (CRM) – Nutshell, Salesforce, & Pipedrive

Long gone are the days when companies had only a few choices to pick from for their CRM system. Nowadays, we have the luxury of being able to pick the system that best fits our particular process, that corresponds with our budgetary constraints, and that gives us the choice of where to host the system (on-premises or cloud) without any headaches.


According to Forbes, the CRM market was valued at $23.2 billion in 2014, and approximately 50% of that market was dominated by big players such as Salesforce (of course), SAP, Oracle, Microsoft, and IBM. That still leaves about $11.6 billion unaccounted for. In this webinar, we will cover the leading CRM, Salesforce, as well as two contenders from the remaining 50% of the market: Nutshell, which we use, and Pipedrive.


Both Nutshell and Pipedrive are smaller-scale applications that concentrate on a specific set of features and deliver them all relatively well. Both are affordably priced, with Nutshell’s lowest tier coming in at $19 per user per month and Pipedrive’s at $10 per user per month. Salesforce CRM, on the other hand, is a much more robust, enterprise-ready system that benefits from a slew of integrations as well as thousands (if not millions) of Salesforce experts and developers around the world who can configure it best for your organization.


If you’re interested in learning more about the different types of CRMs available for your modern enterprise, make sure to sign up for the event here! You can find additional details below.

_____

This month’s theme focuses on how to choose and leverage a CRM for your modern enterprise. We’ll be featuring two tools we’ve used ourselves and with clients, and one that we’ve heard about and are evaluating for this webinar.

Our goal is to have interviews of users as part of the webinar, so it will be slightly different. Some of the questions and segments we have planned are:

  1. What are the important steps of implementing a CRM?
  2. What features should you look for in your next CRM?
  3. Quick demos and, potentially, user interviews.


Slides will be uploaded near the date of the webinar to Slideshare.

Team Collaboration – Slack, Airtable, and Trello: What Makes Them Good

During our most recent webinar, Rahul Singh covered a few of our favorite tools for team collaboration: Slack, Airtable, and Trello. The first two tools are relatively new to our company, whereas the last has come to the forefront of our company’s processes after a bit of time in the dark.

Slack

Slack was initially a tool that we tested out to improve our internal communication. At the time, we turned away from it because we were afraid it would become a major time suck that would slowly but surely decrease the efficiency of our team. We stuck with Google Hangouts / Chat for a while but decided to give Slack a more serious go about a year and a half ago. Our experience thus far has been nothing but positive. The free tier of Slack is more than enough for a small business to operate on, and the number of integrations it allows is sufficient for a company of our size. We use Slack with a variety of integrations, some of which are listed below (a minimal sketch of one such integration follows the list):


  1. Sending automated weekly reports to our managers from Metabase, an awesome open-source BI tool, to seamlessly get critical data in the hands of leadership.
  2. Receiving instantaneous updates on commits and deployments from our software development tools in our #_devops channel:
    1. GitHub
    2. CodebaseHQ
    3. DeployHQ
    4. AppVeyor
  3. Sending tasks to either of our project management tools through our #_chief channel:
    1. Trello
    2. active.collab
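As promised above, here is a minimal sketch of one such integration: posting a message into a channel through a Slack incoming webhook. The webhook URL is a placeholder; Slack issues a real one when you add the integration to a workspace.

  # Post a plain-text message to a Slack channel via an incoming webhook.
  import requests

  WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

  def post_to_slack(text):
      # The destination channel is determined by the webhook itself.
      requests.post(WEBHOOK_URL, json={"text": text}, timeout=10)

  post_to_slack("Weekly report is ready.")  # e.g. triggered from Metabase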

Trello

We have most often used Trello as our “Master Backlog” project management system: there are many tasks we can’t complete in the current time window but would nonetheless like to keep in the backlog for future opportunities to address them. Recently, we have started relying more heavily on Trello to assist us with Agile Scrum planning for some of our external projects (clients) and internal initiatives (Research & Development, Sales & Marketing, etc.).


The clean interface, the ease with which a user can move cards around, the overall ability to get a bird’s eye view of the project at any time, and the ability to add burndown charts and estimates to cards via plugins such as Scrum for Trello and Burndown for Trello make this tool an indispensable part of our workflow.


Below you can find a screenshot of a template for Agile Scrum Planning in Trello. Additionally, here is the link to it.


Airtable

Airtable is what you would get if you mixed Google Spreadsheets, Microsoft Excel, Trello, and a lightweight online database. We’ve covered this particular software tool more extensively in a blog post you can find here. We’ve used Airtable for a range of things, from job applicant tracking to sales territory planning. Below you can find a screenshot of the “Advertising Campaigns” Airtable template found here.


Additionally, some of our favorite features of all three of these tools are:

  • Their ability to work with other systems in our process
  • The fact that they are SaaS products
  • Their focus on team collaboration and activity tracking
  • The speed at which they operate
  • Good native mobile applications for iOS and Android


At the end of the day, all of these tools can work in conjunction with one another and help your team get a boost. Nonetheless, it is important that your process is ironed out and clear for everyone to understand. No amount of stellar software tools will fix issues that stem from faulty processes… and, well, most issues stem from faulty processes.


If you’re interested in finding out more you can access the slides and a recording of the webinar below!


What Makes a Good ETL Project?

Bad

  1. Bad ETL (extract, transform, load) projects are ones that have no strategy for handling different types of information, or that lack knowledge management around how to add/remove data sources, add/remove processors and translators, and add/remove sinks of information.
  2. An ETL project doesn’t necessarily have to be on any particular platform; it just needs structure, just as any software should have an architecture.


Good

  1. Simple systems that separate E / T / L into composable blocks that are scriptable or configurable (see the sketch after this list).
  2. Compiled systems are good too if the volume is extreme.
  3. A good bash pipeline is just as good as anything else, as long as it’s well documented.
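As a minimal sketch of what “composable blocks” can mean in practice, the snippet below separates E, T, and L into small functions that can be swapped or reconfigured independently; the file names and the cleanup rule are placeholders for illustration.

  # E, T, and L as independent, swappable blocks chained together.
  import csv, json

  def extract(path):                        # E: read rows from a source
      with open(path, newline="") as f:
          yield from csv.DictReader(f)

  def transform(rows):                      # T: normalize each row
      for row in rows:
          yield {k.strip().lower(): v.strip() for k, v in row.items()}

  def load(rows, path):                     # L: write rows to a sink
      with open(path, "w") as f:
          for row in rows:
              f.write(json.dumps(row) + "\n")

  load(transform(extract("source.csv")), "sink.jsonl")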


Ugly

  1. Using ESB (Enterprise Service Bus) for ETL.
  2. Using Spark for ETL.
  3. Basically, using tools whose advanced features are meant for business logic to do simple transformations that don’t belong in those computing environments: conjoining simple data movement (ETL) with advanced message delivery (ESB) or advanced computing (Spark).


Why should an organization undertake such a project?

To meet one or more business goals. Sometimes it’s to gain intelligence, sometimes it’s to create data products that create value, sometimes it’s to show predictions.

It comes down to how the project affects the organization’s perpetuity. Some of the questions a business should be able to answer are:


What other solutions provide the same end user results?

Tools like Domo, Tableau, or, more recently, something like Periscope (in the SaaS world) can be useful for gaining basic insights without having to do ETL, provided the data is ready. Open-source tools such as Kibana, Metabase, and Redash can be used as well, as long as the data is available.


What are the trade-offs between the various solutions?

Ultimately, if the data isn’t ready, ETL may be required to get it clean enough for those tools to let users visualize and explore it properly.