Office Hours With A Data Consultant And More - Our Newsletter's Official Day One
Data Science And Engineering
The CTO’s Synopsis
I have been sending random one-off articles to some of you for years now.
Hopefully, no more.
Welcome to my first official newsletter.
My day one, so to speak.
I wanted to create a clearer, more concise newsletter that truly provides value.
So I am developing a more structured newsletter with some clear benefits.
First, I wanted to create open office hours for my readers.
With every newsletter, I will provide the ability for you to sign up and talk with me about anything data related.
You might be the owner of a company looking to discuss your data strategy or a student looking for advice on your data science career. Whatever it might be.
There will be a set date with limited slots. For our first attempt, I will be setting up some time on January 19th.
Another change: instead of sending you entire articles, I will have a section dedicated to articles I have written, as well as ones I have enjoyed, over the past month. I won’t include the whole article, just a bit of the intro or a synopsis and the link.
I will also likely include some companies I have found interesting and perhaps the occasional market analysis.
This is just the start and I will see what works best for you, the reader.
So let’s dive into office hours, great articles, and interesting companies in the data world.
Open Office With A Data Consultant
As explained above, I want to set up a date or two every few weeks for you to ask me questions personally. To start, I will test out one-on-ones.
Maybe you have a question about data strategy, developing a data product, or just working in the data field.
Then you can set up some time with me on the date below.
There are only 4 slots open for this first round.
Articles Worth Reading
Why Run A/B Experiments?
Ryan spent some time earlier this year orchestrating a massive experiment for Firefox. He and his team launched a bunch of new features with Firefox 80, and they wanted to understand whether those features improved their metrics.
In the process, they ended up talking with a bunch of Firefox engineers and explaining why a controlled experiment was needed. A few questions came up repeatedly, so they figured they were worth answering in the article.
The article is the first in a series they are writing on building data intuition. It is targeted at new data scientists and engineers interested in data, and they hope it becomes a useful resource that data scientists can point their stakeholders to.
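The core intuition behind a controlled experiment is worth a quick sketch: randomly split users into control and treatment groups, then check whether the difference in a metric is larger than chance alone would explain. Here is a minimal, hypothetical example (the numbers and the two-proportion z-test are illustrative, not taken from the article):

```python
import math

# Hypothetical results: conversions / users in each randomly assigned group.
control_conv, control_n = 200, 10_000
treat_conv, treat_n = 260, 10_000

p1, p2 = control_conv / control_n, treat_conv / treat_n
p_pool = (control_conv + treat_conv) / (control_n + treat_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treat_n))
z = (p2 - p1) / se  # standard two-proportion z-test statistic

print(round(z, 2))  # 2.83 — |z| > 1.96 is significant at the 5% level
```

Without the random split, you cannot tell whether the lift came from the feature or from something else that changed at the same time, which is exactly the point the article makes to its engineers.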
How We Helped Our Clients Reduce Their Costs And Increase Profits In 2020
Technology has been speeding forward through 2020. The impacts of Covid have only sped up the adoption of change for companies.
Companies small and big, start-ups and corporations alike, have been pushed to innovate and improve processes.
For example, small, medium, and large companies are migrating to the cloud to ensure their staff can work from anywhere. This has been a decent amount of the work I have taken on this year.
Many of the changes I saw this year were made possible by technology that has existed for the past 5-10 years. It’s just that, under pressure from all the economic changes, companies have had to take a serious look at the tools they use while also optimizing their processes.
In this article, I wanted to talk about how our clients reduced their costs and increased their profits in 2020, as well as discuss some of the trends I have seen over the past year as I worked with our various clients, and where I see them accelerating.
How Do I Become A Data Engineer?
Unlike some of the other technical roles that have degrees and, generally speaking, a defined path, data engineering is a little less straightforward. Many of us might have never even heard of data engineers when we were taking our college courses. Yet companies like Facebook, Amazon, PayPal, and Walmart all have data engineering roles open right now, and there are also plenty of startups looking for data engineers.
But how do you go from college student to data engineer? What degrees do data engineers have? How does one become a data engineer? What skills do data engineers have? What do data engineers do on a day-to-day basis?
These are just some of the questions I have gotten over the past year. I wanted to write an article to help answer many of them.
Emerging: How New Technologies Move from Obscurity to Ubiquity
Frontier tech, deep tech, hard tech, emerging tech. All of these terms attempt to describe technologies that are enabled by novel technical or scientific breakthroughs. Sometimes, these technologies are so consequential that they gain general-purpose technology status — a category which includes the steam power engine, the printing press, electricity, the automobile, the computer, the Internet, artificial intelligence, nanotechnology, and synthetic biology.
Others are less consequential or play an enabling role, but still have the power to shape society. Whatever terminology you prefer, the development of these technologies is increasingly vital as we face problems that require solutions well beyond our current capabilities. The adoption of these technologies has also created some of the world’s most valuable companies.
“Frontier tech is no longer at the frontier.” – Deena Shakir
The author spent the last several years analyzing emerging technologies as a graduate student at UC Berkeley, as an investor with SamsungNext and Playground Global, and as a research fellow at the World Economic Forum, the United Nations’ ITU, and Stanford’s Future of Digital Currency Lab. The article outlines what they learned about emerging technologies and their path from obscurity to ubiquity, starting from Oxford Languages’ definition of the word “emerging.”
Companies We Are Watching
In every issue, we will look into start-ups and companies we are curious about. We will also put time into actually interviewing some of their teams.
Website Link: WhyLabs
WhyLabs is an AI observability platform built to enable every enterprise, no matter how large or small, to run AI with certainty.
The team is made up of AI practitioners who have helped build ML platforms like SageMaker, as well as ML experts who know the problems facing ML deployments.
The WhyLabs platform is specifically built for data science workflows, incorporating methods and features that they pioneered based on analogous best practices in DevOps. Furthermore, it is easy to install, easy to deploy, and easy to operate.
They focus on areas such as eliminating manual troubleshooting, logging and profiling data, tracking the general model life cycle, and connecting your model’s performance to product KPIs so your team can tie results to that performance.
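“Logging and profiling data” here means capturing lightweight statistical summaries of the data a model sees, so drift can be detected later by comparing summaries over time. A library-free sketch of the idea (an illustration of the concept, not WhyLabs’ actual API):

```python
from statistics import mean, stdev

def profile(column):
    """Summarize one feature column: counts, nulls, and basic stats."""
    values = [v for v in column if v is not None]
    return {
        "count": len(column),
        "nulls": len(column) - len(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }

# Profile the same feature at training time and again in production,
# then compare the summaries to spot drift or data-quality issues.
baseline = profile([10, 12, 11, None, 13])
print(baseline["count"], baseline["nulls"], baseline["mean"])  # 5 1 11.5
```

The appeal of this approach is that the profiles are tiny compared to the raw data, so they can be stored for every batch and monitored continuously.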
Website Link: Airbyte
Airbyte is a new open-source (MIT) EL+T platform that started in July 2020. It has a fast-growing community, and it distinguishes itself through several significant choices:
Airbyte’s connectors are usable out of the box through a UI and an API, with monitoring, scheduling, and orchestration. Their ambition is to support 50+ connectors by EOY 2020. These connectors run as Docker containers so they can be built in the language of your choice. Airbyte components are also modular and you can decide to use subsets of the features to better fit in your data infrastructure (e.g., orchestration with Airflow or K8s or Airbyte’s…)
Similar to Fivetran, Airbyte integrates with DBT for the transformation piece, hence the EL+T. In contrast to Singer, Airbyte uses one single open-source repo to standardize and consolidate all development from the community, leading to higher-quality connectors. They also built a compatibility layer with Singer so that Singer taps can run within Airbyte.
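Because each connector is just a Docker image speaking a common protocol, connectors written in different languages interoperate: they exchange newline-delimited JSON messages over standard output. A simplified sketch of what one record looks like on the wire (field names follow the Airbyte protocol; the stream and row data here are made up):

```python
import json

# A simplified Airbyte RECORD message: one row from a source stream,
# serialized as a single JSON line on the connector's stdout.
record = {
    "type": "RECORD",
    "record": {
        "stream": "users",                 # source table / stream name
        "data": {"id": 1, "name": "Ada"},  # the row itself
        "emitted_at": 1609459200000,       # epoch milliseconds
    },
}
line = json.dumps(record)          # what the source writes
parsed = json.loads(line)          # what the destination reads
print(parsed["record"]["stream"])  # users
```

This is what makes the “build connectors in the language of your choice” claim work: as long as a container emits and consumes these JSON lines, the platform can schedule and orchestrate it.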
Airbyte’s goal is to commoditize ELT, by addressing the long tail of integrations. They aim to support 500+ connectors by the end of 2021 with the help of their community.
Website Link: DataKitchen
One of the most popular DataOps tools, DataKitchen is best for automating and coordinating people, environments, and tools in data analytics across the entire organization. DataKitchen handles it all, from testing to orchestration to development and deployment.
Using this platform, your organization can achieve virtually zero errors and deploy new features faster. DataKitchen lets organizations spin up repeatable work environments in a matter of minutes so teams can experiment without breaking production cycles.
The quality pipeline of DataKitchen is based on three core sections: data, production, and value. With this tool, you can access the pipeline with Python code, transform it via SQL, design models in R, visualize in a workbook, and produce reports in Tableau.
The End Of Day One
With that, we will end day one.
We will be working toward a more consistent format built around these sections: Office Hours, Articles Worth Reading, and Companies We Are Watching.
We also might find some new sections worth adding in, but for now, this is where we are starting.
We want to thank all of our readers for sticking with us for so long, and we hope this new format provides far more value than the random articles we used to send.