Anant Corporation Blog

Our research, knowledge, thoughts, and recommendations about building and leading businesses on the Internet.

Tag Archives: big data


The Swarm of Sources

Reactive Manifesto, The Next VisiCalc, and Future of Business Technology

Thanks to some of our great partnerships, our firm has recently consulted at the University of Michigan, Cisco, Intuit, and Kroger, as well as at several government agencies, on business information and enterprise technology. Even though we don’t directly create consumer technology or applications, every consumer technology ultimately has a backend enterprise technology that makes it work, and a consumer technology company backed by crappy enterprise technology is bad for business.
I’ve been sensing a shift in business information for a while. Business information, the frequency at which it’s created, and the number of sources it comes from are all increasing, exponentially in some cases. This means that businesses, and subsequently end users, need to rely on real-time processing and analysis of this information. The businesses that embrace the “reactive manifesto” of how to build software and technology are going to succeed in a new world where data is coming from millions of people through their mobile devices, from processes through applications and software, from information through global data sources and APIs, and from systems in the form of servers and things all over the globe. The “swarm” of sources is mind-boggling.

The first business response to all this business information is: let’s bring it all together to analyze it and visualize it. That’s horseshit. Even with the big data technologies out there today, it is wasteful to try to process all of it at the same time. That’s like trying to understand how the entire universe works at every second. The better response is to understand what’s happening and react to it in the moment, in the context where it matters.
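To make that contrast concrete, here is a minimal sketch in plain Python; the event source, threshold, and alert handler are all hypothetical. Instead of hoarding every reading for one giant batch analysis, a small reactive loop evaluates each event as it arrives and acts only when it matters.

```python
import itertools
import random
import time

def sensor_events():
    """Hypothetical stand-in for a live feed of readings from devices, apps, and APIs."""
    while True:
        yield {"source": f"device-{random.randint(1, 5)}",
               "value": random.gauss(100, 15),
               "ts": time.time()}

def on_anomaly(event):
    """React right now, in context, instead of waiting for a batch report."""
    print(f"[alert] {event['source']} reported {event['value']:.1f}")

THRESHOLD = 130  # assumed business rule: only readings above this matter at this moment

# Evaluate each event as it arrives and act only on the ones that are relevant,
# rather than collecting everything for one giant analysis later.
for event in itertools.islice(sensor_events(), 1000):  # bounded only so the sketch terminates
    if event["value"] > THRESHOLD:
        on_anomaly(event)
```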
This reactive approach to building large infrastructure can help businesses respond to new IoT initiatives, integrate with the numerous pieces of business software that run the modern enterprise, and partner with other modern enterprises. Whatever you see out there in apps, devices, sites, and APIs has to be managed on the back end. The case for silicon brains gets stronger when you simply can’t do it with carbon brains: technology has to get better, faster, through iterative machine learning in order to keep up with the amount of data being created.
Vendors such as Oracle, Cloudera, MapR, and Databricks are handing commercial organizations sledgehammers to solve their problems. Although these products are great, they are more like personal computers without the real “killer app”: they aren’t solving industry-specific, vertical problems. Consulting companies burn inordinate time-and-materials costs getting it “right.” What people need is “Lego block” software, so that non-technical folks can self-serve their information needs without hiring a data analyst, data engineer, data architect, data scientist, data visualizer, and of course a project manager. (If you do need a team today, Anant provides an elastic team with all of those skills for the same investment per month as a part-time or full-time employee. Message me or my team.)
I believe the major breakthrough that will change the experience for business technology users is going to be system design tools that help them get what they want without knowing how to program. I don’t know what it will look like, but we need a VisiCalc for the new age, and no, it’s not Google Spreadsheets. It’s something else altogether: a tool that helps people mash up and leverage the various gradients between structured and unstructured data in dynamic knowledge pages that always keep us up to date on what we care about. A machine that learns what we need to know and summarizes it for us, but also lets us manipulate that knowledge even when it is being created in ten different systems.

DC Data Wranglers: It’s a Balloon! A Blimp! No, a Dirigible! Apache Zeppelin: Query Solr via Spark

I had the pleasure this past Wednesday of introducing Eric Pugh (@dep4b) to the Data Wranglers DC Meetup group. He spoke about using Solr and Zeppelin for data processing and, specifically, about the ways big data can easily be processed and displayed as visualizations in Zeppelin. He also touched on Docker, a tool Anant uses, and its role in setting up environments for data processing and analysis. Unfortunately, no actual blimps or zeppelins were seen during the talk, but the application of data analysis to the events they usually fly over was covered last month in a discussion about Spark, Kafka, and the English Premier League.

 

Instead of trying to completely rehash Eric’s presentation, please check out his material for yourself (available below). In short, he showed how multiple open-source tools can be used to import, process, manipulate, visualize, and share your information. More specifically, Spark is a fast data processing engine you can use to prepare your data for presentation and analysis, while Zeppelin, a mature, enterprise-ready application (as shown by its recent graduation from Apache’s Incubator program), is a great tool for manipulating and visualizing the processed data.
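For readers who want a feel for the pattern Eric demonstrated, here is a minimal sketch (not his actual notebook) of reading a Solr collection into Spark from a Zeppelin %pyspark paragraph. It assumes the Lucidworks spark-solr connector jar is on the Spark classpath; the ZooKeeper host, collection name, query, and field names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("zeppelin-solr-demo").getOrCreate()

# Read a Solr collection as a Spark DataFrame via the spark-solr data source.
# zkhost, collection, and query are placeholders, not values from the talk.
events = (spark.read.format("solr")
          .option("zkhost", "localhost:9983")    # ZooKeeper ensemble for SolrCloud
          .option("collection", "events")        # hypothetical collection name
          .option("query", "type:pageview")      # narrow to the documents of interest
          .load())

# Expose the results to later notebook paragraphs so they can be queried with SQL
# and charted using Zeppelin's built-in visualizations. The "date" field is assumed.
events.createOrReplaceTempView("events")
spark.sql("""
    SELECT date, COUNT(*) AS views
    FROM events
    GROUP BY date
    ORDER BY date
""").show()
```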


Please don’t hesitate to reach out with any questions, or if you are interested in participating or speaking at a future Data Wranglers DC event. Each event is recorded, livestreamed on the Data Community DC Facebook page, and attended by 50 or more individuals interested in data wrangling, data processing, and the possible outcomes of these efforts. After the monthly event, many members continue their discussions at a local restaurant or bar.

 

I hope to see you at an event in the near future!

How to use Kafka to understand the English Premier League

Here at Anant we are very interested in data wrangling (aka data munging), which basically means we want to help people take data in one format and convert it into a form that best suits their needs. One way we keep up to date is through the excellent Data Wranglers DC group, which meets monthly here in Washington.

 

At the most recent meeting, the group tackled the challenge of integrating real-time video and data streams. Mark Chapman, a Solutions Engineering Manager at Talend, explained how his company used Spark and Kafka in their product to analyze real-time data for the English Premier League (EPL). In addition to video inputs at 25 frames per second from cameras throughout the stadium, the stream was correlated with data on players’ heart rates and other measurements. The EPL is then able to overlay this information on replays to improve presentation and analysis, as well as send data to companies offering in-game wagers.
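Talend’s actual pipeline wasn’t shown in code, but the general shape of consuming a live telemetry feed from Kafka with Spark looks roughly like the PySpark Structured Streaming sketch below. The broker address, topic, event schema, and heart-rate threshold are assumptions for illustration, and the spark-sql-kafka package has to be available to Spark.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("epl-telemetry-sketch").getOrCreate()

# Hypothetical shape of a player-telemetry event; the real feed's fields were not shown.
schema = StructType([
    StructField("player_id", StringType()),
    StructField("heart_rate", DoubleType()),
    StructField("frame_ts", StringType()),
])

telemetry = (spark.readStream
             .format("kafka")                                      # needs the spark-sql-kafka package
             .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
             .option("subscribe", "player-telemetry")              # placeholder topic
             .load()
             .select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Flag elevated heart rates as events arrive, e.g. to enrich a replay overlay downstream.
alerts = telemetry.filter(col("heart_rate") > 180)

query = alerts.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```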

 

The presentation was very interesting, and Mark graciously shared his slides:

 

If you are in any way interested in data wrangling (just like it sounds: getting data under control and putting it to work for you), we would love to hear from you and let you know what might be possible with your data streams. If you are in DC and are interested in the technical side of data munging, please come out to the next event and meet us. This past presentation was hosted by ByteCubed (@bytecubed) in Crystal City, but the gatherings have been held in Foggy Bottom as well.

Strategy Breakfast – Wins and Failures in Big Data Projects

Last week we hosted a monthly Strategy Breakfast at the brand new Social Tables office near Metro Center. The group discussed issues that hinder enterprises when they attempt solutions centered on big data, lessons that can be drawn from implementation mistakes, and the best steps to ensure success in data analysis efforts. A special thank you goes out to Social Tables and its staff for hosting and participating, as well as to Mike Seigel (@mikejsiegel), who made this event possible.

 

Many companies and enterprises already understand, or are beginning to look at, the importance of harnessing value from their data. A critical point addressed at the breakfast was determining what types of insight you want to extract from your information, and recognizing that you may not know what questions to ask when starting out on a “big data” analysis or integration endeavor. One attendee suggested listing the questions you believe the project will potentially answer and establishing a baseline for success: your desired outcome for the project. These questions and that success threshold will vary by company, but they will help you focus on the critical business problems or challenges you are attempting to overcome. Basically, these are questions you want answered and can clearly explain why you want answered. For example, will the answers provide a greater understanding of project costs? Ways to improve operational efficiency? Will they better or more quickly inform decision-making? This exercise should initially be done internally by company stakeholders from various departments before engaging an outside firm to help hone the questions and construct the infrastructure for a solution.


Over the course of the discussion, a consensus emerged that most big data projects aren’t really big data projects. Most of these projects are actually business analytics projects with a high level of complexity caused by the integration of both new and legacy systems. Many companies rush toward a “big data” solution without clearly defining the business problems they want to address or identifying the benefits they wish to reap from their business information. Essentially, for a project to be successful, a company needs to undertake a thorough assessment of its internal processes and desired outcomes to fully realize the benefits of data collection and analysis. Most “big data” projects don’t revolve around data sets being too big so much as they revolve around connecting isolated data troves, appropriately framing the business benefits that will arise from answering data-driven questions, and finding the right talent to build the infrastructure.

 

Join us later this month as we continue to look at technological issues facing today’s businesses and modern enterprises. The discussion will most likely focus on open-source data analysis tools you can use to help your company become more data-driven without making a significant monetary outlay. Please do not hesitate to contact us with any suggestions!

 

If you wish to learn more, you can attend our webinar on “Unifying Business Information with Portals and Dashboards (B2B)” on October 14 at 10 AM.

 

Strategy Breakfast – Sweet Spot for Big Data

It was great to get together with Big Data enthusiasts last week at the Metro Offices co-working space. If you attended, please send a thank you to Robert Katz (@Robert_Katz) for opening up the space. The conversation was an open-forum discussion among professionals with backgrounds in digital marketing, law, and finance. There was general agreement that companies need to use data to make decisions; however, participants noted basic impediments such as understanding or identifying the underlying importance of a data point or metric, separating correlation from causation, and wading through unknown-unknowns.

 


What was generally agreed on is that there is value hidden within the information being collected, and that it is necessary to understand how to access and apply that data to help ourselves and our clients stay ahead of the competition. At a basic level, spreadsheets and tools such as Google Analytics provide information about what is happening, but it is important to also understand why the numbers are changing and whether they are changing in an anticipated manner that indicates a stronger company going forward. This latter point touches on measuring the effects of actions taken by a company, primarily through marketing, and the ultimate return on investment of those actions, whether intended or unintended. The final point, exploring the ‘unknown-unknowns’, or ‘what is possible’, is what seemed to capture the greatest attention in the room. Never has there been such an opportunity to compare and analyze so many data points, and it is sometimes overwhelming, or even arduous, to do so, as information is siloed in separate databases, a challenge Anant is helping businesses overcome.

 

Big Data is a big topic and this group conversation will continue at the monthly breakfasts through the end of the year. While the focus for the next breakfast on September 30 has yet to be set, it will most likely delve into both successful and unsuccessful use cases of big data for data-driven decisions in a business environment. Please do not hesitate to contact us with other topic suggestions for either September or subsequent breakfasts. In the interim, you can sign up for the 30th or join us for a webinar on connecting online business software on September 16. We look forward to seeing you again or meeting and talking with you at one of our upcoming events.