Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.
I had the pleasure this past Wednesday of introducing Eric Pugh (@dep4b) to the Data Wranglers DC Meetup group. He spoke about using Solr and Zeppelin in data processing and, specifically, the ways big data can easily be processed and displayed as visualizations in Zeppelin. He also touched on Docker, an application Anant uses, and its role in setting up environments for data processing and analysis. Unfortunately, no actual blimps or zeppelins were seen during the talk, but the application of data analysis to events they usually fly over was covered last month in a discussion about Spark, Kafka, and the English Premier League.
Rather than completely rehash Eric’s presentation here, I encourage you to check out his material for yourself (available below). In short, he showed how multiple open-source tools can be used to process, import, manipulate, visualize, and share your information. More specifically, Spark is a fast data processing engine that you can use to prepare your data for presentation and analysis, whereas Zeppelin is a mature, enterprise-ready application, as shown by its recent graduation from Apache’s Incubator Program, and a great tool for manipulating and visualizing processed data.
Please don’t hesitate to reach out with any questions or if you are interested in participating or speaking at a future Data Wranglers DC event. Each event is recorded, livestreamed on the Data Community DC Facebook page, and attended by 50 or more individuals interested in data wrangling, data processing, and the outcomes these efforts can produce. After the monthly event, many members continue their discussions at a local restaurant or bar.
I hope to see you at an event in the near future!
The short answer is “To make it easy to package and ship code.”
Docker can be your assembly line for software production. If you’re building software with a complex architecture, a tool like Docker can significantly reduce the time spent on development, testing, and deployment through the use of “containers”. For the client, this approach can significantly reduce software development costs, accelerate delivery cycles and launch times for new ideas, and potentially decrease the coding errors that hinder your services and hurt the bottom line. Docker is used by thousands of companies as part of their DevOps processes, and its adoption is expected to continue to grow. A few examples: Red Hat, Rackspace, Spotify, and more!
Here at Anant we are very interested in data wrangling (aka data munging), which basically means we want to help people take data in one format and convert it to the form that best suits their needs. One way we keep up to date is through the excellent Data Wranglers DC group that meets monthly here in Washington.
At the most recent meeting, the group tackled the challenge of integrating real-time video and data streams. Mark Chapman, who is a Solutions Engineering Manager at Talend, explained how his company used Spark and Kafka in its product to analyze real-time data in the English Premier League (EPL). In addition to the video inputs at 25 frames per second from cameras throughout the stadium, the stream was correlated with data on players’ heart rates and other measurements. The EPL is then able to overlay this information onto replays to improve presentation and analysis, as well as send data to companies offering in-game wagers.
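To make the correlation step a little more concrete, here is a minimal sketch in plain Python of pairing each video frame with the nearest sensor reading by timestamp. This is not Talend's implementation and it does not use Spark or Kafka; the function names, the `(timestamp, value)` reading format, and the sample heart-rate values are all assumptions made for illustration. Only the 25 frames per second figure comes from the talk.

```python
from bisect import bisect_left

FRAME_RATE = 25  # frames per second, the figure quoted in the talk


def frame_timestamps(n_frames, fps=FRAME_RATE):
    """Return the capture time, in seconds, of each frame index."""
    return [i / fps for i in range(n_frames)]


def correlate(frame_times, readings):
    """Pair every frame time with the value of the closest reading.

    `readings` is a non-empty list of (timestamp, value) tuples sorted
    by timestamp (a hypothetical format chosen for this sketch).
    """
    times = [ts for ts, _ in readings]
    paired = []
    for ft in frame_times:
        i = bisect_left(times, ft)
        # Only the reading just before and just after the frame can be nearest.
        candidates = readings[max(i - 1, 0): i + 1]
        _, value = min(candidates, key=lambda r: abs(r[0] - ft))
        paired.append((ft, value))
    return paired


# Hypothetical sample: two seconds of video and one heart-rate reading per second.
heart_rate = [(0.0, 120), (1.0, 130)]
paired = correlate(frame_timestamps(50), heart_rate)
```

A real pipeline would do this join continuously over streams rather than over in-memory lists, but the underlying idea of aligning two feeds on a shared clock is the same.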
The presentation was very interesting and Mark graciously shared his slides:
If you are in any way interested in data wrangling – just like it sounds, getting data under control and to work for you – we would love to hear from you and let you know what might be possible with your data streams. If you are in DC and are interested in the technical side of data munging, please come out to the next event and meet us. This past presentation was hosted by ByteCubed (@bytecubed), in Crystal City, but the gatherings have been in Foggy Bottom as well.
Thank you to everyone who recently joined us for the final installment of our autumn B2B webinar series focused on data integration and data management basics!
We plan to put together more informational webinars in the new year. Before winter settles in, though, imagine yourself on a deserted, tropical island with only a boat and just a few hours until sundown. In the distance, you can see other islands and you know each of them has just one thing you need — food, water, fuel, shelter, etc. The problem you now face is how to get to all of the islands in an efficient manner before sundown. However, without knowing which island has the fuel to power your boat, you are left guessing and hoping you can get everything. If only you had a map – or all of the supplies could be obtained from a central location – this process would be so much quicker, simpler, and easier. In the workplace, this problem plays out every day when you need to search for reports, documents, and other digital assets that help you efficiently run a business or project.
Do you find yourself using multiple systems to get your work done – Salesforce for lead tracking, G Suite for collaboration, Box for digital storage, SharePoint for site management, etc. – and feel like the technology gets in the way as you bounce from one application and window to another? Each time you sign up for an online service or install an app, you create a data island or data silo. These semi-isolated islands can rapidly grow in size and number and, without a plan to manage this growth, can quickly lead to wasted time and money as you look for things or redo a previously completed report. Last week’s webinar touched on these issues and, more importantly, shared how you can take control and bring these islands together with a knowledge management system and enterprise search.
Knowledge management is the process of creating, sharing, using and managing the knowledge and information generated by your company. Martin White, an enterprise search and information management strategy consultant, has defined enterprise search as, “a managed search environment that enables employees to find information they can rely on in making decisions that will achieve organization and personal objectives.” These two concepts complement each other and the webinar goes into far greater detail about their importance to your company and how you can think about them as you set up or revamp your knowledge management processes.
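As a rough illustration of what bringing the islands together can mean in practice, the sketch below builds one small inverted index over documents pulled from several systems and answers a query across all of them at once. It is a toy under stated assumptions, not a description of any particular enterprise search product: the silo names (`crm`, `storage`), the document ids, and the sample text are hypothetical, and a production system would add connectors, permissions, and relevance ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(silos):
    """Build an inverted index mapping each term to (system, doc_id) pairs.

    `silos` maps a system name to a dict of document id -> text
    (a hypothetical shape chosen for this sketch).
    """
    index = defaultdict(set)
    for system, docs in silos.items():
        for doc_id, text in docs.items():
            for term in tokenize(text):
                index[term].add((system, doc_id))
    return index


def search(index, query):
    """Return the documents, from any system, that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Hypothetical content from two separate "islands".
silos = {
    "crm": {"lead-17": "Q3 renewal report for Acme"},
    "storage": {"file-2": "Acme Q3 report draft", "file-9": "Holiday schedule"},
}
index = build_index(silos)
results = search(index, "acme report")
```

The point of the single index is that the person searching no longer needs to know which system a document lives in; the result carries that information back to them.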
Remember, just like with a car or bike, there are systems to meet all sizes and budgets. We highly recommend you take the time to assess the systems you are considering and to define your objectives before spending the money to connect your islands and silos of information. Having these goals and objectives in mind beforehand will save you countless hours during setup and limit the time and expense of future changes you might want to make as your company grows. You might ask yourself, “Why should I go through this headache in the first place? It sounds dreadful.” To this, I would reply that not leveraging technology will leave you at a competitive disadvantage. With a proper knowledge management and search system in place, you will be nimbler and gain more insightful analysis of your operations and clients at both a tactical and strategic level.
You can find the slides from the presentation below:
This webinar was the last in a series that also included three others:
Where is it?!
You know you have it and just can’t find it right away.
With the myriad online and on-premises systems available today, it is very easy to get frustrated and delayed as your organization drowns in a steady flow of generated data. What is even more aggravating is that you know this issue costs you not just money, but opportunities as well. Your operational efficiency is negatively impacted and, according to various reports and our own experience with clients, this unnecessary restriction impedes your ability to maximize the value you can provide to your customers and partners.
Today, we have Google, Bing, and other search engines to comb the world’s information. What these engines do not see is everything on private networks, your corporate devices, and enterprise servers, and there really is no great off-the-shelf solution to pick up the search slack. Very few companies have an efficient search engine for their own internal systems, yet employees can often be found asking how it can be so hard to find an internal document. The solution to this problem begins with something called enterprise search, which Martin White defines expertly in his book (a highly recommended buy): “Enterprise search is a managed search environment that enables employees to find information they can rely on in making decisions that will achieve organization and personal objectives.”
In this webinar, we will cover actionable steps that you can take in order to pluck your business information out of the depths of your many disconnected business systems such as Salesforce, WordPress, and more. We will also look at potential ways you can implement enterprise search best practices in order to get a hold of your business information and deliver what you do best in a faster and more efficient manner.
You can sign up for this Friday’s webinar here. We will start at 10am with a short introduction, followed by a presentation of the topic and demo and, if time allows, finish off with an open-forum style discussion. We look forward to seeing you there!
If you would like to check out some light reading prior to the webinar, this post by Martin White is a primer on what to consider when undertaking the development of an enterprise-wide search strategy.
Come see us present in person at one of our upcoming events. If you are in the DC area next week, Rahul will be moderating the monthly Data Wranglers DC event on Wednesday evening (11/9) at 6:30pm. Presenter Mark Chapman will speak about integrating real-time data by using Apache Spark and Kafka for video and data stream analysis.