Player FM - Internet Radio Done Right
24 subscribers
Checked 7M ago
Added four years ago
Content provided by Travis Lawrence. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Travis Lawrence or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.
Transform Your Object Storage Into a Git-like Repository With Paul Singman @ LakeFS
In this episode we speak with Paul Singman, Developer Advocate at Treeverse / LakeFS. LakeFS is an open source project that lets you transform your object storage into a Git-like repository.
Top 3 takeaways
- LakeFS enables use cases such as debugging, by quickly viewing historical versions of your data at a specific point in time, and running ML experiments over the same set of data with branching.
- The current data landscape is very fragmented, with many tools available. Over the coming years there will most likely be consolidation toward tools that are more open and integrated.
- Data quality and observability, including visibility into job runs, continue to be key components of successful data lakes.
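To make the "Git-like repository" idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the lakeFS API: the `ToyRepo` class, its method names, and the sample file are all hypothetical. It only illustrates how commits can act as immutable snapshots and how a branch can be a cheap pointer rather than a copy of the data.

```python
import hashlib


class ToyRepo:
    """Toy Git-like layer over an in-memory object store (not the lakeFS API)."""

    def __init__(self):
        self.objects = {}   # content hash -> bytes (stands in for object storage)
        self.commits = {}   # commit id -> {"parent": id or None, "tree": {path: hash}}
        self.branches = {"main": None}  # branch name -> commit id

    def _tree(self, branch):
        head = self.branches[branch]
        return dict(self.commits[head]["tree"]) if head else {}

    def commit(self, branch, changes):
        """Record a new snapshot on `branch`; `changes` maps path -> bytes."""
        tree = self._tree(branch)
        for path, data in changes.items():
            digest = hashlib.sha256(data).hexdigest()
            self.objects[digest] = data  # identical content is stored once
            tree[path] = digest
        parent = self.branches[branch]
        payload = repr((parent, sorted(tree.items()))).encode()
        commit_id = hashlib.sha256(payload).hexdigest()[:12]
        self.commits[commit_id] = {"parent": parent, "tree": tree}
        self.branches[branch] = commit_id
        return commit_id

    def branch(self, name, source):
        """Create a branch pointing at the same commit; no data is copied."""
        self.branches[name] = self.branches[source]

    def read(self, ref, path):
        """Read `path` as of a branch name or a commit id."""
        commit_id = self.branches.get(ref, ref)
        return self.objects[self.commits[commit_id]["tree"][path]]


repo = ToyRepo()
v1 = repo.commit("main", {"events.csv": b"id,amount\n1,10\n"})
repo.branch("experiment", "main")  # isolated line of work for an ML experiment
repo.commit("experiment", {"events.csv": b"id,amount\n1,10\n2,99\n"})
v2 = repo.commit("main", {"events.csv": b"id,amount\n1,11\n"})
```

Reading with `repo.read(v1, "events.csv")` returns the data as it was at the first commit, which is the debugging use case from the first takeaway, while the `experiment` branch changes independently of `main`.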
43 episodes
All episodes
Building the Backend: Data Solutions that Power Leading Organizations

The Analytics Engine for All Your Data with Justin Borgman @ Starburst (36:12)
In this episode we speak with Justin Borgman, Chairman & CEO at Starburst, which is based on open source Trino (formerly PrestoSQL) and was recently valued at $3.35 billion after securing its Series D funding. We discuss the convergence of data warehouses (DWs) and data lakes (DLs), why data lakes fail, and much more.
Top 3 takeaways
- The data mesh architecture is gaining adoption more quickly in Europe due to GDPR.
- Data lakes had two main limitations compared to DWs: performance and CRUD operations. Performance has been resolved with query engines like Starburst, and tools like Apache Iceberg, Apache Hudi and Delta Lake are starting to close the gap on CRUD operations.
- The principle of a single source of truth, storing everything in a single DL or DW, is not always feasible or possible depending on regulations. Starburst is bridging that gap and enabling data mesh and data fabric architectures.…

Transform Your Object Storage Into a Git-like Repository With Paul Singman @ LakeFS (27:23)
In this episode we speak with Paul Singman, Developer Advocate at Treeverse / LakeFS. LakeFS is an open source project that lets you transform your object storage into a Git-like repository.
Top 3 takeaways
- LakeFS enables use cases such as debugging, by quickly viewing historical versions of your data at a specific point in time, and running ML experiments over the same set of data with branching.
- The current data landscape is very fragmented, with many tools available. Over the coming years there will most likely be consolidation toward tools that are more open and integrated.
- Data quality and observability, including visibility into job runs, continue to be key components of successful data lakes.…

Enable Faster Data Processing and Access with Apache Arrow with Matt Topol @ Factset (49:15)
In this episode we speak with Matt Topol, Vice President, Principal Software Architect @ FactSet, and dive deep into how they are taking advantage of Apache Arrow for faster processing and data access.
Below are the top 3 value bombs:
- Apache Arrow is an open-source in-memory columnar format that creates a standard way to share and process data structures.
- Apache Arrow Flight eliminates serialization and deserialization, which enables faster access to query results compared to traditional JDBC and ODBC interfaces.
- Don't put all your eggs in one basket: whether you're using commercial products or open source, make sure you design a modular architecture that does not tie you down to any one piece of technology.…
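As a rough illustration of what "in-memory columnar format" means, the sketch below contrasts a row-oriented layout with a column-oriented one using only the Python standard library. It does not use Apache Arrow or its API; the tickers, prices, and variable names are made up for the example.

```python
from array import array

# Row-oriented: one dict per record; scanning "price" touches every record.
rows = [
    {"ticker": "AAA", "price": 10.0, "volume": 100},
    {"ticker": "BBB", "price": 12.5, "volume": 250},
    {"ticker": "CCC", "price": 7.5, "volume": 50},
]

# Column-oriented: one contiguous, typed buffer per field.
columns = {
    "ticker": [r["ticker"] for r in rows],
    "price": array("d", (r["price"] for r in rows)),    # 64-bit floats
    "volume": array("q", (r["volume"] for r in rows)),  # 64-bit integers
}

# An aggregate reads a single buffer instead of walking every record.
avg_price = sum(columns["price"]) / len(columns["price"])

# A typed buffer can be handed to another consumer without copying or re-encoding.
view = memoryview(columns["price"])
```

The `memoryview` line hints at why a shared columnar layout helps avoid serialization: the consumer reads the same bytes the producer wrote, with no conversion step in between.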

Implementing Amundsen @ Convoy with Chad Sanderson (35:52)
In this episode we speak with Chad Sanderson, head of data and early-stage startup advisor focused on data innovation @ Convoy, and uncover their journey to implementing Amundsen, an open source data catalog.
Below are the top 3 value bombs:
- Data scientists should not be spending the majority of their time trying to find the data they are interested in.
- Amundsen is a powerful open source data catalog that integrates across your data landscape to provide visibility into your data assets and lineage.
- We often get lost in the features within data teams. It's important to take a step back and understand how you're impacting the bottom line of the business.…

The Importance of Treating Your Data Initiatives as Products with Murali Bhogavalli (26:33)
Your data team should not just be keeping the lights on, but should be building and creating data products to support the business. In this episode we speak with Murali Bhogavalli, a data product manager, and explore what a data product manager is and how the role differs from a traditional product manager.
Below are the top 3 value bombs:
- Data should be looked at as a product and treated as such within the organization (e.g. agile methodologies, continuous improvement…)
- Organizations need to be not just data driven but also data informed. For that to happen, you need to build data literacy into your ecosystem by helping everybody understand what the data means, where it is coming from, and what its quality is.
- Product managers typically use data to deliver outcomes. But for a data PM, data is the deliverable and it is also the outcome.…

Open-Source Data Catalog Amundsen with Mark Grover @ Stemma (41:11)
In this episode of Building The Backend we hear from Mark Grover, founder @ Stemma and co-creator of Amundsen. Stemma is a fully managed data catalog, powered by the leading open-source data catalog, Amundsen.
Below are top 3 value bombs:
- Automated data catalogs are critical to help wrangle the growing data across organizations (e.g. being able to identify that, out of 150 columns on a table, only 10 are being used downstream).
- Tribal knowledge and context cannot be automated, so data catalogs cannot be 100% automated.
- Amundsen is an open-source data catalog originally created at Lyft. Stemma has created a managed version of Amundsen.
Help me improve the podcast by completing this 60 second survey: https://buildingthebackend.com/survey…

Architecting a Modern Data Lake with Dipti Borkar from Ahana (39:32)
In this episode of Building The Backend we hear from Dipti Borkar, cofounder @ Ahana, a managed service for Presto on AWS. We talk all about the data lake, how it should be structured, and where the industry is going.
Below are top 3 value bombs:
- Presto is an open source distributed SQL query engine originally created by Facebook, mainly used to run SQL queries on data lakes but able to connect to relational data stores as well. Ahana is a managed Presto service on AWS with 3x price/performance.
- When optimizing your data lake, it's normally best to store the data in Parquet or ORC format rather than JSON or CSV, as they are columnar formats that can have indexes built in.
- Data lakehouses are continuing to gain popularity by bringing the benefits of your data lake and data warehouse together with the help of tools like Databricks Delta Lake and Apache Hudi.…

What tools are you using for data viz? Are they low cost? One option is Apache Superset. In this episode we speak with Robert Stolz to learn more about Superset and other open source data tools.
Top 3 Value Bombs:
- One popular use case for Apache Superset is embedding it within applications: because it's open source, there is a wide range of flexibility to integrate it with existing systems.
- Apache Superset supports any data source supported by SQLAlchemy, the Python SQL toolkit.
- DBT encourages a set of best practices around data development (e.g. source control and test-driven development).…

Edge Computing and Continuous Intelligence with Swim (34:17)
In this episode of Building The Backend we hear from Simon Crosby, CTO @ Swim, an open source edge computing operating system. We talk all about edge computing, event streaming and much more.
Below are top 3 value bombs:
- Edge means more than just being physically located somewhere; it could also mean in the cloud. It really is the closest point to where your source data is being generated.
- Continuous intelligence is a design pattern where streaming data is directly tied into business operations.
- Kafka is continuing to hold its strong position in the event streaming space.…

12 Modern Data Architecture Principles That Should Be Implemented in 2022 (20:24)
This episode is a little different than the usual format. Instead of interviewing a data leader, I share what I consider to be the 12 most important principles when designing a modern data architecture. Please message me on LinkedIn with your thoughts on this show.

The Keys to Good Data Quality With Prukalpa Sankar from Atlan (37:21)
In this episode of Building The Backend we hear from Prukalpa Sankar, co-founder of Atlan. We talk all about data quality and governance, common issues organizations face when implementing data quality, and much more.
Below are top 3 value bombs:
- Data governance has a bad reputation. It should not be a bureaucratic, controlling process that is pushed from the top down.
- Active metadata is key to modern data architectures; essentially, it means bringing together all the human-generated and machine-generated metadata to derive insights.
- One of the most difficult metadata attributes to capture is the context for the data, as this almost always requires input from humans, and tribal knowledge is often lost and not documented.…

Designing a Modern Data Architecture – Teradata (44:29)
This is a podcast episode you do not want to miss with Stephen Brobst, CTO @ Teradata. We discuss all things data warehouses, the shift to the distributed cloud, and key principles for implementing successful DWs.
Top 3 Value Bombs:
- Large organizations are shifting more to a distributed / inter-cloud architecture for many reasons, including data sovereignty, increasing residency and reducing costs.
- Just because your DW does not support indexes does not mean you do not need them.
- One of the most common reasons DWs fail is that they are led by IT and not the business. The DW should be led directly by business needs and the most important initiatives.…

Exploring Open-Source Data Integration With Airbyte (35:42)
“The hardest part of ETL is not building the connectors, it is maintaining them.” Truer words were never spoken. Really enjoyed this episode with Michel Tricot, CEO & Co-Founder of Airbyte, where we discuss all things data integration and connectors.
Top 3 value bombs:
- The future of ETL/ELT integration connectors may lie with open source. Many closed source data integration tools only create connectors if the ROI is there, but this leaves many tools out, and speed to market can be slow. Airbyte has created a modular open source framework that allows the community to quickly build reliable data connectors.
- As Airbyte starts to monetize, it has some innovative methods, one of which is that a developer from the open source community who creates and maintains a connector could potentially get a small percentage of the revenue associated with that connector.
- Data governance and logging will become increasingly important in the coming years.…

How To Effectively Reduce Data Quality Incidents 10x with Datafold (39:12)
This episode features Gleb Mezhanskiy, Co-Founder & CEO @ Datafold. During our discussion we talk all about data observability and how to improve your data quality. Before Datafold, Gleb was a founding member of data teams at Lyft and Autodesk, where he built sophisticated data platforms and developed tooling to improve productivity and data quality.
Top 3 Value Bombs:
- The foundation of any data observability platform is the data catalog.
- Data observability becomes increasingly difficult the more data sets you have if you do not define your process to track and monitor your data.
- Do not surprise your report consumers. With the right data observability process and regression testing, you can know how your metrics will change in prod before your deployment.…

Applying Transformations to Streaming Data with Materialize (32:55)
This episode features Arjun Narayan, Co-Founder & CEO @ Materialize. During our discussion we talk all about transforming streaming data, the dos and don'ts, and how Materialize is changing the landscape of streaming.
Top 3 Value Bombs:
- When making schema changes, organizations should always strive to make only forward-compatible changes. This means consumers will be able to keep consuming your data model without being impacted; they may just be missing your newly added column.
- Materialized computations are bound to change in the future, either due to bugs or requirement changes. Kafka allows you to replay all your previous messages to update the calculation.
- The cloud is still young; over the coming years we will see many more technologies that are specifically built with a cloud focus.…
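The first two takeaways can be sketched in a few lines of Python. This is a simplified illustration that uses JSON strings in a list as a stand-in for a retained Kafka topic; it is not Materialize or Kafka client code, and the field names (`order_id`, `amount`, `currency`) are hypothetical.

```python
import json

# Fields this (older) consumer was written against; a hypothetical schema.
KNOWN_FIELDS = ("order_id", "amount")


def parse_order(message: str) -> dict:
    """Keep the fields this consumer knows about and ignore any others."""
    record = json.loads(message)
    return {field: record[field] for field in KNOWN_FIELDS}


def total_amount(log: list[str]) -> float:
    """Recompute the result from scratch by replaying the whole log."""
    return sum(parse_order(message)["amount"] for message in log)


# A retained, append-only log (standing in for a Kafka topic).
log = [
    json.dumps({"order_id": 1, "amount": 20.0}),
    # The producer later added a "currency" column.
    json.dumps({"order_id": 2, "amount": 5.0, "currency": "USD"}),
]
```

Because the old consumer ignores fields it does not know, the added `currency` column does not break it, and because the log is retained, a corrected `total_amount` can be recomputed by replaying every message.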

Optimizing Spark in the Cloud - with Jean-Yves Stephan (32:26)
This episode features Jean-Yves Stephan, Co-Founder & CEO @ Data Mechanics (recently acquired by Spot by NetApp). During our discussion we talk about optimizing Spark to run in the cloud at a low cost.
Top 3 Value Bombs:
- Running Spark CAN be expensive, but there are ways to reduce your current operating costs by 50-75% through smart automations (e.g. tuning for node type, memory and CPU).
- Spot instances can lower your costs by utilizing unused instances.
- Creating serverless architectures and using containers will allow for more flexibility with deployment models and scalability.…

How To Achieve Better Observability and Control Over Your Data Pipelines with Josh Benamram (37:03)
This episode features Josh Benamram, the co-founder of Databand. Databand is a company that helps engineering teams achieve better observability and control over their tech stack.
Top 3 Value Bombs:
- When observing our data, we should be looking at both our data and our pipelines.
- Don't wait for an incorrect metric at the board meeting to make data quality (DQ) a priority.
- Having clear SLAs on just what data quality means across the organization is essential…

Travis welcomes to his podcast Saket Saurabh, who provides a window into the world of data management and the self-service options that are democratizing it. Co-founder and CEO of Nexla, Saket has a passion for data and infrastructure and how to improve its flow among partners, customers and vendors. Nexla automates various data engineering tasks, intelligently creates an abstraction of data and enables collaboration among people at different skill levels. Named a 2021 Cool Vendor by Gartner, Nexla is a leader in data preparation, integration and tracking.
Top 3 value bombs:
- Data architectures overall need to be more abstract to enable future flexibility.
- The first stumbling block for most organizations is not knowing where to locate their data.
- ETL is dead. The ELT model has become central, while streaming and real-time use cases are becoming prevalent.…

A Powerful Open Source Database That Supports Many Storage Needs (MariaDB) (27:33)
In this episode, we speak with Rob Hedgpeth, a director of developer relations at MariaDB. We explore all things MariaDB, the capabilities it has, and when you should consider it for your next project.
Top 3 value bombs:
- MariaDB follows a shared-nothing architecture and supports distributed SQL for unlimited scaling on demand.
- MariaDB can handle many types of storage (e.g. document store, graph and spatial).
- When deciding on your next relational database, do not just look at the options available within your cloud service provider; include databases as a service in your analysis (e.g. SkySQL, MariaDB's commercial product).…

Increase the Quality and Reliability of Your Data (31:12)
In this episode, we speak with Lior Gavish, the co-founder of Monte Carlo, to explore all things data quality. Monte Carlo is a data lineage and observability tool that lowers your data downtime.
Top 3 Value Bombs:
- Data products should be thought of in their entirety, from the source to the consumer.
- No one data stakeholder can solve data quality issues; it takes collaboration among the data engineers, the business, the data consumers, and even software that helps automate certain aspects of cataloging and capturing meaningful metadata.
- Good data quality processes should alert you to anomalies in your metrics before your data consumers do.…

Build Real-Time Data Pipelines in Minutes Not Months with Meroxa (36:33)
In this episode, we speak with DeVaris Brown, CEO and co-founder of Meroxa, a data platform that enables organizations to build real-time data pipelines in minutes, not months. Prior to founding Meroxa, DeVaris was a product leader at Twitter, Heroku, and Zendesk. In this episode we talk about all things data ingestion.
Top 3 Value Bombs:
- Data ingestion should be in real time to provide the most flexibility across your use cases. Real-time ingestion is not as complex as it used to be; many tools, e.g. Meroxa, simplify this process.
- Stand on the shoulders of giants: take advantage of open source technologies and build on top of them.
- Make your data extendable so that it can be consumed from various destination types.…

Launch, Monitor, and Share Data Pipelines In a Matter of Minutes (32:07)
In this episode, we speak with Blake Burch, co-founder of Shipyard, a data orchestration tool that allows you to create powerful workflows in a matter of minutes.
Top 3 Value Bombs:
- Data tests often cover only the assumptions we already know. There are a lot of unknowns that can crop up and cause issues that tests are not catching. Start analyzing job metadata to alert on potential anomalies.
- Store your raw data to allow the most flexibility when it comes to re-transforming the data.
- Don't settle for scattershot troubleshooting. Have a clear lineage of how your data is being used, from the source to the various consumers.…
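One simple way to "analyze job metadata to alert on potential anomalies" is a z-score check on a per-run metric such as row count or duration. The sketch below is one possible approach, not how Shipyard works; the function name, the threshold of 3 standard deviations, and the sample row counts are all assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough runs to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical row counts written by the last few runs of one job.
row_counts = [10_120, 9_980, 10_050, 10_200, 9_940]
```

A call such as `is_anomalous(row_counts, 1_200)` would flag a run that wrote far fewer rows than usual, even though no explicit data test was written for that case.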

The Data Warehouse for Distributed Clouds - Yellowbrick (37:57)
In this episode, we speak with Mark Cusack, CTO at Yellowbrick. Yellowbrick is a data warehouse platform that was built from the ground up for performance and cost, and that can be deployed across clouds and on-prem.
Top 3 Value Bombs:
- Yellowbrick DW was recently named a contender in Cloud Data Warehouses by Forrester Research, and they are able to achieve 100X performance at 1/5th the price against many competitors.
- As data production increases exponentially at the “edge”, the need to pre-process the data and keep it where it is generated is becoming critical. The distributed cloud model helps solve this growing problem.
- Yellowbrick was created from the ground up with a focus on performance and cost. A few of its technical features include a custom Linux-based OS kernel, reading data directly from primary storage into the CPU cache, and custom network drivers.…

What You Should Know Before Getting Started With Data Science with DATA SCIENCE I N F I N I T Y (43:42)
In this episode, we speak with Andrew Jones, who has spent 13 years in data science at companies including Amazon and, more recently, Sony PlayStation, where he developed and prototyped machine learning based features for the PlayStation 5, several of which have been patented by Sony. Since then he has created the DATA SCIENCE I N F I N I T Y community to support folks on their data science journey.
Top 3 Value Bombs:
- 85% of AI projects fail; one of the reasons is going too complex too soon. When solving problems with data science, you should always start with the business problem first.
- Having a strong understanding of these foundational data science models will help you solve the majority of data science problems: linear regression, logistic regression, decision trees, and random forest.
- Learning is a journey, not a destination :)
Find out more here: https://data-science-infinity.teachable.com/…

In this episode, we speak with Tejas Manohar, Co-Founder of Hightouch, a leading Reverse ETL platform that syncs data from your warehouse or lake back into the tools your business teams rely on.
Top 3 Value Bombs:
- Organizations should be sending more holistic customer data back into their marketing solutions.
- Reverse ETL is the process of creating pipelines to extract data from the warehouse/lake and move it back into operational components.
- Utilize CDC when extracting data to minimize the impact on your source system.…

Become a Data Driven Organization with Christina Stathopoulos, an Analytical Lead at Waze @Google (33:44)
In this episode, we speak with Christina Stathopoulos, who works at Google as an Analytical Lead with Waze, a crowdsourced mobile navigation app. She is also an adjunct professor at IE Business School and a guest lecturer at ISDI, where she teaches analytics courses in the MBA programs. We discuss the current landscape of data, the challenges organizations face when becoming data driven, and the top reasons AI projects fail.
Top 3 Value Bombs:
- One of the main blockers for organizations to become data driven is lack of C-suite support.
- Your AI projects will fail. Expect that and learn from your mistakes.
- Hire a team that focuses just on data quality and governance. Reports are only as good as the data they are built on.…

Designing Scalable Data Architects with Dr. Mark Tabladillo from Microsoft (40:46)
In this episode, we speak with Dr. Mark Tabladillo. Mark is a thought leader in the AI/ML space at Microsoft where he creates technical architectures for artificial intelligence and data science solutions.
Top 3 Value Bombs:
- Many organizations struggle to understand their current data landscape. Azure Purview can help you manage and govern your data.
- Technology and best practices are changing rapidly. Even the pros iterate, iterate and iterate when building out new architectures to best meet the use cases.
- Check out Azure’s AI business school, a free resource to learn how to drive lasting business impact.…

Reduce Data Movement and Decrease Processing Times with a Machine Scale Feature Store (Molecula) (46:14)
In this episode, we speak with H.O. Maycotte. H.O. is the CEO/founder of Molecula, an enterprise feature store that simplifies, accelerates, and controls big data access to power machine-scale analytics and AI. Molecula is powered by Pilosa, an open source project created by H.O. and team. Pilosa eliminates the need to copy data between systems in order to make it accessible for analytical and machine learning purposes at scale. Leading companies like Spotify, Hulu, Uber, Zillow and ESPN are all utilizing Pilosa.
Top 3 Value Bombs:
- Large-volume or complex workloads on a lake or DW can take hours to process; feature stores can reduce that to seconds in some scenarios.
- On average, over 80% of an organization's data consists of copies of the original data. Reduce the movement and duplication of data across your organization.
- Organizations do not have a data volume problem but rather a data readiness problem. Enable the business with “real time” data.…

Why You Should Be Using (CDC) Change Data Capture for Ingestion with Datacoral (40:47)
In this episode, we speak with Raghu Murthy. He is the founder of Datacoral, which provides serverless architectures that support data pipelines and orchestration to facilitate ELT into your data warehouse. Prior to founding Datacoral he was at Yahoo and Facebook, and was part of the initial team that developed Hive. We explore the best patterns for ingesting operational data into your data warehouse, creating metadata-first architectures, and the role Datacoral serves.
Top 3 Value Bombs:
- If you're migrating relational data from a source that supports CDC, you should be using CDC to migrate it for the majority of use cases.
- ELT/ETL pipelines should be orchestrated by a metadata-first architecture.
- Consumers of the DW should be notified if data is incomplete.…
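For readers new to change data capture, the sketch below shows the core idea in Python: instead of re-reading whole tables, the destination applies an ordered stream of insert, update, and delete events. It is a toy model and not Datacoral's implementation; the event shape (`op`, `key`, `row`) and the sample records are hypothetical.

```python
def apply_change(table: dict, event: dict) -> None:
    """Apply one change event to `table`, a dict keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        table[key] = event["row"]  # upsert, so replaying an event is harmless
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# Hypothetical change events, in commit order, as read from a source database's log.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in events:
    apply_change(replica, event)
```

After the loop, `replica` holds only the current state of the source table, and the source system was never asked to run a full-table query.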

Integrating Large Scale Microservices Architectures with your Data Platform with Sunu Sasidharan (25:48)
In this episode, we speak with Sunu Sasidharan. Sunu is the technology lead at Cuologic, where he helps global brands implement large-scale architectures, DevOps automation and data engineering across multiple technology stacks.
Top 3 Value Bombs:
- When creating distributed databases, you can only achieve two of the following three: consistency, availability, partition tolerance (CAP theorem).
- Design data pipelines that support schema evolution. Teams should be able to add new functionality without causing downstream impacts.
- The entire deployment and creation of data objects/pipelines should be managed by code and version controlled.…

Transportation Modeling and Autonomous Vehicles With Matt Battifarano (35:04)
In this episode, we speak with Matt Battifarano. Matt is a data scientist focusing on transportation modeling. He started his career as a data scientist at a startup called Bridj, where they created a smart micro-bus platform for urban transit similar to Uber Pool. Currently he's working towards his PhD at Carnegie Mellon at their Mobility Data Analytics Center.
Top 3 Value Bombs:
- K.I.S.S: keep it super simple. Do not overcomplicate the process, especially when just starting out prototyping.
- Understanding the purpose and history of methods in AI/ML is important.
- There are many challenges to developing autonomous vehicles; one in particular is creating an environment where you can test them safely. Unlike many AI/ML tests, you cannot perform A/B tests with end users.…

The Importance of Self Service BI with 5xData (26:30)
In this episode, we speak with Tarush Aggarwal. Tarush is the founder of 5xData, where he helps companies build a strong data foundation with self-service BI to enable the business. Prior to starting 5xData, he was one of the first data engineers on the analytics team at Salesforce and helped scale the data team at WeWork from 5 to 100+. Top 3 Value Bombs: (1) Ingest all your raw data into a central location and build your data models on top of that. (2) When organizations are first building out a data platform, the first item they should focus on is a self-service BI tool. (3) The use case for data lakes may be on the decline now that data warehouses can separate storage and compute…

The Next Wave of AI and Creating Intelligent Cognitive Assistants with aigo.ai (32:34)
In today’s episode, we speak with Peter Voss and discuss the current landscape of AI, the next wave of AI called Artificial General Intelligence, and how organizations today can level up their chatbots to create satisfied customers. Peter Voss is a serial entrepreneur and a pioneer in artificial intelligence who coined the term "AGI" (Artificial General Intelligence) with fellow luminaries in the space. At the age of 25, Peter IPO’ed a company he started that grew to over 400 people. Since then he has been focusing on AI and recently launched aigo.ai, which delivers highly intelligent and hyper-personalized conversational assistants at scale for the enterprise. Top 3 Value Bombs: (1) AI solutions today are considered "narrow". Artificial General Intelligence is the next wave, where AI solutions will be more autonomous. (2) Chatbots can either frustrate customers or satisfy them. Create chatbots with a "brain" that can automatically kick off processes to meet customers' needs. (3) Graph DBs are critical to enabling artificial general intelligence…

Learn How LinkedIn is Future-Proofing Their Data Architecture (41:18)
In today’s episode, we speak with Kapil Surlaker, the vice president of engineering at LinkedIn. Kapil has been with LinkedIn for over 10 years and has played an instrumental role in shaping the data architecture that LinkedIn is built on. We cover a wide range of data architecture topics, including how metadata is captured and served up, future-proofing the data architecture, the shift from on-prem to Azure, and how LinkedIn monitors the quality of their data in real time.…

DataOps Is Not Just DevOps for Data with DataKitchen (28:14)
In today’s episode, we speak with Chris Bergh, a pioneer in the DataOps landscape and the CEO of DataKitchen, a DataOps platform that simplifies complex data toolchains and environments. Top 3 Value Bombs: (1) DataOps is not just DevOps for data. (2) Any organization can get started today and begin implementing DataOps practices. Start small and prioritize quick wins. (3) When implementing DataOps, the people and process are just as important as the tools used, if not more so. If you enjoy this episode and want to learn more, please head over to DataKitchen.io to download your free copy of the DataOps Cookbook.…

Getting Started with AI While Avoiding R&D Failures (37:38)
In today’s episode, we speak with Manny Bernabe and discuss the current landscape of AI, how to get started implementing AI solutions, and what organizations should be doing today to set themselves up for AI success in the future. Manny is the founder of BigPlasma.ai and has 10+ years of experience creating and deploying AI and machine learning solutions and products for industries ranging from financial services to semiconductor manufacturing. He helps innovation leaders build AI and analytics products and services while avoiding R&D failures. Top 3 Value Bombs: (1) If your organization wants to be in business in 10 years, you need to start prototyping and creating AI solutions. (2) AI should be top of mind when designing modern data architectures. (3) Start collecting data today for the questions you want to be able to answer three years from now.…

Cleaning Dirty Data with the Classification Guru (17:00)
In today’s episode, we speak with Susan Walsh and learn why organizations struggle with creating and maintaining high-quality data and the steps she takes to resolve data issues. Susan Walsh has nearly a decade of experience fixing data and founded The Classification Guru. She is a specialist in data classification and data cleansing. She is passionate about helping you find the value in cleaning your dirty data and raises awareness of the consequences of ignoring issues through her blog, webinars, and speaking engagements. Top 3 Value Bombs: (1) One reason organizations struggle with data quality is that they treat data as a cost rather than an investment. (2) You must be able to track the lineage of your data back to the source of truth; this is critical to resolving data quality issues. (3) Don’t just fix bad data; identify the root cause and resolve the issue there.…

Data Teams: A Unified Management Model for Successful Data-Focused Teams (40:33)
In today’s episode, we speak with Jesse Anderson and learn how to run successful big data projects and how to resource your teams. Jesse is a big data expert at Big Data Institute who has worked with companies ranging from startups to the Fortune 100. He has taught over 30,000 people the skills to become data engineers and has been published in prestigious publications such as The Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. Top 3 Value Bombs: (1) It’s not easy to retool your traditional DW team to support big data technologies. (2) As a general ratio, you should have 2-5 data engineers for every data scientist. (3) It is important to have three properly staffed data teams: operations, data engineering, and data science…

DataSecOps - Increase the Security of Data While Making it Simple to Manage with eXate (41:57)
In today’s episode, you will hear from the co-founders of eXate, Peter Lancos and Sonal Rattan. eXate streamlines, automates, and simplifies the processes of storing, interpreting, and extracting value from data assets. It democratizes data privacy for organizations by providing a simple, embedded platform that automates the technical enforcement of data policies. Top 3 Value Bombs: (1) DataSecOps is the collaboration on, and automation of, policy enforcement by the various teams across the organization: Dev, Legal, Security, Governance, Risk, Compliance, Data Stewards, and Data Owners. (2) Data policies should be centrally managed and automatically applied; you should not rely on a federated, distributed way of applying data policies. (3) Privacy-enhancing technologies allow you to minimize the use of personal data without losing the functionality of the data…

How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage (31:04)
In today's episode, you will hear from Doug Laney, a best-selling author and recognized authority on data and analytics strategy. Doug’s book, Infonomics: How to Monetize, Manage, and Measure Information for Competitive Advantage, was selected by CIO Magazine as the “Must-Read Book of the Year” and one of the “Top 5 Books for Business Leaders and Tech Innovators.” Top 3 Value Bombs: (1) “You can’t manage what you don’t measure”: companies should be measuring the quality characteristics of their data, the economic value it provides, and how complete and accurate it is. (2) Organizations should be sharing more data across business units. (3) The quickest way to get your data scientists to find another job is to make them curate and harvest data. There should be a team or person in charge of curating and procuring the data.…

Disrupting Data Governance - Everybody Should be a Data Steward (27:20)
In today’s episode, you will hear from Laura Madsen, who has 20+ years of experience in data and analytics, has authored books on data governance and healthcare analytics, and co-founded the Minneapolis-based consulting firm Via Gurus. Top 3 Value Bombs: (1) Data governance should be democratized throughout the organization. (2) Data governance is a journey, not a destination. Most organizations are not prioritizing governance or putting the dollars behind it to ensure it is successful. (3) Everybody in the organization should be a data steward. If you're looking at a report or metric that seems wrong, it’s your responsibility to escalate it, and that process should be welcoming and efficient.…

Azure Exam DP-203: Data Engineering on Microsoft (44:50)
In today’s episode, you will hear from Chris Testa-O'Neill, a thought leader in the Microsoft Data and AI space. He is currently part of the World Wide Learning team at Microsoft, scaling his knowledge to thousands of people through his official Microsoft Learn content and Microsoft courses. Top 3 Value Bombs: (1) Why Microsoft is merging the DP-200 and DP-201 exams into one exam (DP-203). (2) Automation and flexibility are key to building out modern data architectures. (3) One of the top reasons cloud projects fail is the lack of stakeholder buy-in from upper management…

TRAILER: Welcome to Building the Backend - EP0 (2:06)
Welcome to the Building the Backend Podcast! We’re a data podcast focused on uncovering the data technologies, processes, and patterns that are driving today’s most successful companies. In this trailer episode, you will get a glimpse of what to expect from episodes of the show.