Rahul Singh, CEO of Anant, had the opportunity to co-organize and MC the May Meetup of Data Wranglers DC, where speakers John Clune and Timothy Hathaway covered two topics related to open government and the use of public data for data processing and visualization. The event had a great turnout, and we had the chance to do some networking afterward.
Our next two meetups will focus on Search & Knowledge Management (June Meetup) and Machine Learning for Data Processing (July Meetup); check out the Meetup page for more details as they become available.
A big thank you to John Clune and Timothy Hathaway for taking the time to present to the group. If you are interested in speaking, please don't hesitate to reach out to Rahul at firstname.lastname@example.org.
Below you can find a recording of both presentations.
As technology has matured over the last two decades, many challenges have been overcome, obstacles faced, and solutions crafted. A recurring theme among those obstacles (more specifically, self-imposed obstacles) has been the tendency of software companies to either 1) develop applications from the ground up for a particular problem, or 2) take existing software that works perfectly well for its specific use case and tailor it to a different (though sometimes similar) use case.
Software Algebra is essentially a best practice in software development for making sure that we 1) use the tools best suited to a particular problem, while 2) avoiding the trap of reinventing the wheel by starting from scratch or forcing a tool to solve a problem it was never intended to address.
There are many cases in which this best practice is ignored entirely, most commonly by inexperienced software architects with a "my hammer can solve all problems" mindset. One of the best ways to avoid this trap is to focus relentlessly on shipping a Minimum Viable Product (MVP) within a time-boxed period, then iterating on that MVP several times to steadily bring it up to support all use cases.
We recently spoke about this topic at the WebTech Conference in Washington, DC, and will do so again on Tuesday, April 11th at 6PM at the University of Maryland, Baltimore County. You can find additional details and register for the event here.
I had the pleasure this past Wednesday of introducing Eric Pugh (@dep4b) to the Data Wranglers DC Meetup group. He spoke about using Solr and Zeppelin in data processing, specifically the ways big data can easily be processed and displayed as visualizations in Zeppelin. He also discussed Docker, which Anant uses, and its role in setting up environments for data processing and analysis. Unfortunately, no actual blimps or zeppelins were seen during the talk, but data analysis applied to the kind of events they usually fly over was the subject of last month's discussion about Spark, Kafka, and the English Premier League.
Instead of trying to rehash Eric's entire presentation, please check out his material for yourself (available below). In short, he showed how multiple open-source tools can be used to process, import, manipulate, visualize, and share your information. More specifically, Spark is a fast data processing engine that you can use to prepare your data for presentation and analysis, whereas Zeppelin, a mature, enterprise-ready application (as shown by its recent graduation from Apache's Incubator Program), is a great tool for manipulating and visualizing processed data.
Please don’t hesitate to reach out with any questions or if you are interested in participating or speaking at a future Data Wranglers DC event. Each event is recorded, livestreamed on the Data Community DC Facebook page and attended by 50 or more individuals interested in data wrangling, data processing, and possible outcomes from these efforts. After the monthly event, many members continue their discussions at a local restaurant or bar.
I hope to see you at an event in the near future!
It was great to get together with Big Data enthusiasts last week at the Metro Offices co-working space. If you attended, please send a thank you to Robert Katz (@) for opening it up. The conversation was an open forum discussion among professionals with backgrounds in digital marketing, law, and finance. There was general agreement that companies need to use data to make decisions; however, participants noted basic impediments such as understanding or identifying the underlying importance of a data point or metric, separating correlation from causation, and wading through unknown-unknowns.
Participants also agreed that there is value hidden within the information being collected, and that it is necessary to understand how to access and apply that data to help ourselves and our clients stay ahead of the competition. At a basic level, spreadsheets and tools such as Google Analytics show what is happening, but it is also important to understand why numbers are changing and whether they are changing in an anticipated manner that indicates a stronger company going forward. This latter point touches on measuring the effects of actions a company takes, primarily through marketing, and the ultimate return on investment of those actions, whether intended or unintended. The final point, exploring the 'unknown-unknowns', or 'what is possible', seemed to capture the most attention in the room. Never before has there been the opportunity to compare and analyze so many data points, and doing so can be overwhelming or even arduous when information is siloed in separate databases, a challenge Anant is helping businesses overcome.
Big Data is a big topic, and this group conversation will continue at the monthly breakfasts through the end of the year. While the focus for the next breakfast on September 30 has yet to be set, it will most likely delve into both successful and unsuccessful use cases of big data for data-driven decisions in a business environment. Please do not hesitate to contact us with other topic suggestions for either September or subsequent breakfasts. In the interim, you can sign up for the 30th or join us for a webinar on connecting online business software on September 16. We look forward to seeing you again or meeting and talking with you at one of our upcoming events.
The state of the world's information systems is changing, and so should your data processing habits. As the cloud takes precedence in IT environments, the different systems that run the modern enterprise no longer sit on the same network or system. These systems hold data that business users need to leverage day to day, and sometimes immediately.
As big data evolves, we have seen a movement from batch processing to micro-batches to stream processing. All of this is great, but we still need some way to connect these systems across the Internet in order to access their data.
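To make the three processing styles concrete, here is a minimal sketch in Python. It assumes records are plain integers and that `handle` is a hypothetical callback standing in for real processing; real micro-batch engines usually cut batches by time window rather than by a fixed count, which this sketch simplifies to a count-based `size` parameter.

```python
from typing import Callable, Iterable, List

Record = int  # hypothetical: records are plain integers for illustration
Handler = Callable[[List[Record]], None]


def batch(records: Iterable[Record], handle: Handler) -> None:
    """Batch: collect everything first, then process it all in one pass."""
    handle(list(records))


def micro_batch(records: Iterable[Record], size: int, handle: Handler) -> None:
    """Micro-batch: process small chunks as soon as each one fills up."""
    chunk: List[Record] = []
    for r in records:
        chunk.append(r)
        if len(chunk) == size:
            handle(chunk)
            chunk = []  # start a fresh chunk rather than mutating the handled one
    if chunk:  # flush the final, partially filled chunk
        handle(chunk)


def stream(records: Iterable[Record], handle: Handler) -> None:
    """Streaming: process each record the moment it arrives."""
    for r in records:
        handle([r])
```

For example, passing `range(7)` and a list's `append` method as `handle` yields one call of seven records for `batch`, chunks of `[0, 1, 2]`, `[3, 4, 5]`, `[6]` for `micro_batch` with `size=3`, and seven single-record calls for `stream`. The trade-off is latency versus per-call overhead: the smaller the unit of work, the sooner each record is handled.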
This presentation was delivered by CEO Rahul Singh at The George Washington University to the Data Wranglers DC Meetup on data processing. It outlines the challenges posed by the current state of business systems and explains why asynchronous processing is the way to manage the growing sources and volume of business information.
The discussion outlined four main points:
• Thoughts of why “Asynchronous” is the Future
• Discussion about Batch, Micro-Batch, Streaming
• Difference between a Queue and an Enterprise Service Bus
• Proposed Architecture for Asynchronous Data Processing
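As a rough illustration of the queue versus bus distinction, the Python sketch below uses `asyncio.Queue` to contrast a point-to-point queue (each message is handled by exactly one worker) with a publish/subscribe fan-out (every subscriber receives every message). This is not the architecture proposed in the talk: `queue_demo`, `Bus`, and `bus_demo` are hypothetical names, and a real Enterprise Service Bus typically also provides routing, message transformation, and protocol mediation, which this sketch omits.

```python
import asyncio
from typing import Dict, List


async def queue_demo(messages: List[str], workers: int = 2) -> Dict[str, List[str]]:
    """Point-to-point queue: each message is handled by exactly one worker."""
    q: asyncio.Queue = asyncio.Queue()
    handled: Dict[str, List[str]] = {f"worker-{i}": [] for i in range(workers)}

    async def worker(name: str) -> None:
        while True:
            msg = await q.get()          # wait asynchronously for the next message
            handled[name].append(msg)    # stand-in for real processing
            q.task_done()

    tasks = [asyncio.create_task(worker(name)) for name in handled]
    for m in messages:
        await q.put(m)                   # the producer never waits on a consumer
    await q.join()                       # wait until every message is marked done
    for t in tasks:
        t.cancel()                       # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return handled


class Bus:
    """Hypothetical minimal publish/subscribe bus: every subscriber gets every message."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, asyncio.Queue] = {}

    def subscribe(self, name: str) -> asyncio.Queue:
        self.inboxes[name] = asyncio.Queue()
        return self.inboxes[name]

    async def publish(self, msg: str) -> None:
        for inbox in self.inboxes.values():  # fan out a copy to each subscriber
            await inbox.put(msg)


async def bus_demo(messages: List[str], subscribers: List[str]) -> Dict[str, List[str]]:
    bus = Bus()
    for name in subscribers:
        bus.subscribe(name)
    for m in messages:
        await bus.publish(m)
    # Drain each subscriber's inbox to see what it received.
    return {
        name: [inbox.get_nowait() for _ in range(inbox.qsize())]
        for name, inbox in bus.inboxes.items()
    }
```

Running `asyncio.run(queue_demo(["a", "b", "c", "d"]))` spreads the four messages across the two workers with no duplicates, while `asyncio.run(bus_demo(["a", "b"], ["billing", "analytics"]))` delivers both messages to both subscribers. In either case the producer hands off work without blocking on processing, which is the core property asynchronous designs rely on.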
Take a look at the slides presented below.