Data Archives - Software Engineering Daily
 
Databases underpin almost every user experience on the web, but scaling a database is one of the most fundamental infrastructure challenges in software development. PlanetScale offers a managed, highly scalable MySQL platform. Sam Lambert is the CEO of PlanetScale and he joins the show to talk about why he started the platform, scaling …
 
Anaconda is a popular platform for data science, machine learning, and AI. It provides trusted repositories of Python and R packages and has over 35 million users worldwide. Rob Futrick is the CTO at Anaconda, and he joins the show to talk about the platform, the concept of an OS for AI, and more. This episode is hosted by Lee Atchison. Lee Atchiso…
 
Apache Iceberg is an open source high-performance format for huge data tables. Iceberg enables the use of SQL tables for big data, while making it possible for engines like Spark and Hive to safely work with the same tables, at the same time. Iceberg was started at Netflix by Ryan Blue and Dan Weeks, and was open-sourced and donated to the Apache S…
 
Starburst is a data lake analytics platform. It’s designed to help users work with structured data at scale, and is built on the open source platform, Trino. Adam Ferrari is the SVP of Engineering at Starburst. He joins the show to talk about Starburst, data engineering, and what it takes to build a data lake. Full Disclosure: Starburst is a sponso…
 
Vercel provides a cloud platform to rapidly deploy web projects, and they develop the highly successful Next.js framework. The company recently made headlines when they announced v0, a generative AI tool that creates React code from text prompts. The generated code uses open-source tools like Tailwind CSS and shadcn/ui. Lee Robinson is the VP …
 
Algolia is a platform that provides search as a service. The company was founded in 2012, was part of Y Combinator’s Winter 2014 class, and has become highly popular for integrating modern search functionality into web-facing services. Sean Mullaney is the CTO of Algolia and has worked at Google X, Stripe, and Zalando. He joins the show today to ta…
 
Jodie Burchell is the Data Science Developer Advocate at JetBrains, which makes integrated development environments, or IDEs, for many major languages. After observing the rapid growth of the AI coding assistant landscape, the company recently announced the integration of an AI assistant into their IDEs. Jodie joins the show today to talk about why the…
 
This episode of Software Engineering Daily is part of our on-site coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today’s interview, host Jordi Mon Companys speaks with Ankur Mehrotra who is the Director and GM of Amazon SageMaker. Jordi Mon Companys is a product manager and marketer that sp…
 
An embedding is a concept in machine learning that refers to a particular representation of text, images, audio, or other information. Embeddings are designed to make data consumable by ML models. However, storing embeddings presents a challenge to traditional databases. Vector databases are designed to solve this problem. Pinecone has developed on…
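
The core query a vector database answers is "which stored embeddings are closest to this one?" As a rough illustration, here is a brute-force cosine-similarity search, assuming tiny hand-written 3-dimensional vectors in place of real model embeddings. The names `cosine_similarity`, `nearest`, and `store` are hypothetical and are not Pinecone's API; production vector databases generally rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], store: dict[str, list[float]], k: int = 1) -> list[str]:
    """Brute-force k-nearest-neighbor search: score every stored vector, keep the top k."""
    ranked = sorted(
        store,
        key=lambda item_id: cosine_similarity(query, store[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings"; a real system would get these from an embedding model.
store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([0.85, 0.15, 0.05], store, k=2))
```

The linear scan costs one similarity computation per stored vector per query, which is exactly the cost that dedicated vector indexes are built to avoid at scale.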
 
Building scalable software applications can be complex and typically requires dozens of different tools. The engineering often involves handling many arcane tasks that are distant from actual application logic. In addition, a lack of a cohesive model for building applications can lead to substantial engineering costs. Nathan Marz is the creator of …
 
Vespa is a fully featured search engine and vector database, and it has integrated ML model inference. The project was open-sourced in 2017 and has since grown to become a prominent platform for applying AI to big data sets at serving time. Vespa began as a project to solve Yahoo’s use cases in search, recommendation, and ad serving. The company …
 
SurrealDB is the result of a long-time collaboration between brothers Tobie and Jaime Morgan Hitchcock. The project has modest origins and started merely to support other projects the brothers were working on. However, over time the project grew and in 2021 they started working on it full-time. Since then the project has gained serious adoption. Wh…
 
GitHub Copilot is an AI tool developed by GitHub and OpenAI to assist software developers by autocompleting code. Copilot kicked off a revolution in software engineering, and AI assistants are now considered essential tools to many developers. Joseph Katsioloudes is a cyber security specialist and works at the GitHub Security Lab. He joins the show…
 
Machine learning model research requires running expensive, long-running experiments where even a slight miscalibration can cost millions of dollars in underutilized compute resources. Once a model is trained, deployment, production monitoring, and observability requirements all present unique operational challenges. Chris Van Pelt is the Chief Informa…
 
Maritime logistics is the process of organizing the movement of goods across the ocean. Historically, this has been a challenging problem because of the multinational nature of shipping, as well as piracy, smuggling, and legacy technology. It’s also profoundly important for security reasons, and because 90% of what we buy travels over the oceans. Ocea…
 
Hugging Face was founded in 2016 and has grown to become one of the most prominent ML platforms. It’s commonly used to develop and disseminate state-of-the-art ML models and is a central hub for researchers and developers. Sayak Paul is a Machine Learning Engineer at Hugging Face and a Google Developer Expert. He joins the show today to talk about …
 
Data breaches at major companies are now so common that they hardly make the news. The Wikipedia page on data breaches lists over 350 between 2004 and 2023. The Equifax breach in 2017 was especially notable because over 160 million records were leaked, and much of the data was acquired by Equifax without individuals’ knowledge or consent. Data brea…
 
If you’re a sports fan and like to track sports statistics and results, you’ve probably heard of Sofascore. The website started in 2010 and ran on a modest single server. It now has 25 million monthly active users, covers 20 different sports, 11,000 leagues and tournaments, and is available in over 30 languages. Josip Stuhli has been with Sofascore…
 
Cloud-based software development platforms such as GitHub Codespaces continue to grow in popularity. These platforms are attractive to enterprise organizations because they can be managed centrally with security controls. However, many, if not most, developers prefer a local IDE. Daytona is aiming to bridge that gap. It’s a layer between a local ID…
 
Knowledge graphs are an intuitive way to define relationships between objects, events, situations, and concepts. Their ability to encode this information makes them an attractive database paradigm. Hume is a graph-based analysis solution developed by GraphAware. It represents data as a network of interconnected entities and provides analysis capabi…
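
To make the graph model concrete, relationships can be stored as (subject, predicate, object) triples and traversed one hop at a time. This is a hypothetical in-memory sketch, not how Hume or GraphAware actually store or query data; `TripleStore` and `colleagues` are invented names, and the people and company are made-up sample data.

```python
from collections import defaultdict
from typing import Optional


class TripleStore:
    """Hypothetical in-memory graph of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        # subject -> list of (predicate, object) edges leaving that subject
        self.outgoing: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.outgoing[subject].append((predicate, obj))

    def related(self, subject: str, predicate: Optional[str] = None) -> list[str]:
        """Objects linked from `subject`, optionally filtered by relationship type."""
        edges = self.outgoing.get(subject, [])  # .get avoids inserting empty entries
        return [o for p, o in edges if predicate is None or p == predicate]


def colleagues(graph: TripleStore, person: str) -> list[str]:
    """Two-hop query: other subjects that work_for any employer of `person`."""
    employers = set(graph.related(person, "works_for"))
    return sorted(
        subject
        for subject, edges in graph.outgoing.items()
        if subject != person and any(p == "works_for" and o in employers for p, o in edges)
    )


graph = TripleStore()
graph.add("Alice", "works_for", "Acme")
graph.add("Alice", "knows", "Bob")
graph.add("Bob", "works_for", "Acme")
print(graph.related("Alice"))
print(colleagues(graph, "Alice"))
```

The two-hop `colleagues` query is the kind of question that needs a join in a relational schema but is a short traversal over explicit edges in a graph.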
 
Observability software helps teams to actively monitor and debug their systems, and these tools are increasingly vital in DevOps. However, it’s not uncommon for the volume of observability data to exceed the amount of actual business data. This creates two challenges – how to analyze the large stream of observability data, and how to keep down the …
 
Speech technology has been around for a long time, but in the last 12 months it’s undergone a quantum leap. New speech synthesis models are able to produce speech that’s often indistinguishable from real speech. I’m sure many listeners have heard deep fakes where computer speech perfectly mimics the voice of famous actors or public figures. A major…
 
If you’re a developer, there’s a good chance you’ve experimented with coding assistants like GitHub Copilot. Many developers have even fully integrated these tools into their workflows. One way these tools accelerate development is by autocompleting entire blocks of code. The AI achieves this by having awareness of the surrounding code. It understa…
 
The importance of data teams is undeniable. Most companies today use data to drive decision-making on anything from software feature development to product strategy, hiring and marketing. In some companies data is the product, which can make data teams even more vital. But there’s a common problem – analyzing data is hard and time consuming. Lots o…
 
Today it’s estimated there are over 1 billion websites on the internet. Much of this content is optimized to be viewed by human eyes, not consumed by machines. However, creating systems to automatically parse and structure the web greatly extends its utility, and paves the way for innovative solutions and applications. The industry of web scraping …
 
There are hundreds of observability companies out there, and many ways to think about observability, such as application performance monitoring, server monitoring, and tracing. In a production application, multiple tools are often needed to get proper visibility on the application. This creates some challenges. Applications can produce lots of diff…
 
It’s now clear that the adoption of AI will continue to increase, with nearly every industry working to rapidly incorporate it into their systems and applications to provide greater value to their users. Business analytics is a key domain that promises to be radically reshaped by AI. Alembic is an AI platform that integrates web data, product conve…
 
When Stack Overflow launched in 2008, it lowered the barrier to writing complex software. It solved the longstanding problem of accessing accurate and reliable programming knowledge by offering a collaborative space where programmers could ask questions, share insights, and receive high-quality answers from a community of experts. Generative AI has i…
 
ScyllaDB is a fast and highly scalable NoSQL database designed to provide predictable performance at a massive cloud scale. It can handle millions of operations per second at a scale of gigabytes or petabytes. It’s also designed to be compatible with Cassandra and DynamoDB APIs. Scylla is used by Zillow, Comcast, and for Discord’s 350M+ users, and …
 
AI-assisted software delivery refers to the utilization of artificial intelligence to assist, enhance, or automate various phases of the software development lifecycle. AI can be used in numerous aspects of software development, from requirements gathering to code generation to testing and monitoring. The overarching aim is to streamline software d…
 
Database caching is a fundamental challenge in database management and there are hundreds of techniques to satisfy different caching scenarios. PolyScale is a fully automated database cache. It offers an innovative approach to database caching, leveraging AI and automated configuration to simplify the process of determining what should and should n…
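
For contrast with the automated approach described above, here is about the simplest manual caching policy: a read-through cache that keeps each query result for a fixed time-to-live (TTL). This is a sketch under stated assumptions (queries are keyed by their exact SQL text, and results up to one TTL old are acceptable); it is not how PolyScale works, and `TTLQueryCache` and `fake_db` are hypothetical names. The injectable `clock` exists only so expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable


class TTLQueryCache:
    """Read-through cache: serve repeated queries from memory until their entry expires."""

    def __init__(
        self,
        run_query: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.run_query = run_query
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: dict[str, tuple[float, Any]] = {}  # sql -> (expires_at, result)

    def get(self, sql: str) -> Any:
        now = self.clock()
        entry = self.entries.get(sql)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: entry has not expired yet
        result = self.run_query(sql)  # miss or stale: go to the database
        self.entries[sql] = (now + self.ttl, result)
        return result

    def invalidate(self, sql: str) -> None:
        """Drop a cached result, e.g. after a write that changes it."""
        self.entries.pop(sql, None)


calls: list[str] = []


def fake_db(sql: str) -> list[tuple[int]]:
    calls.append(sql)  # record every real round trip to the "database"
    return [(1,)]


fake_now = [0.0]
cache = TTLQueryCache(fake_db, ttl_seconds=30, clock=lambda: fake_now[0])
cache.get("SELECT 1")
cache.get("SELECT 1")  # served from memory
fake_now[0] = 31.0
cache.get("SELECT 1")  # expired, so this goes to the database again
print(len(calls))
```

Picking the TTL is the hard part: too short and the cache rarely helps, too long and readers see stale data. That per-query tuning is the decision the episode describes automating.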
 
Generative pre-trained transformer models, or GPT models, have countless applications and are being rapidly deployed across a wide range of domains. However, using GPT models without appropriate safeguards can lead to leakage of sensitive data. This concern underscores the critical need for privacy and data protection. Skyflow LLM Privacy Vault pre…
 
Companies have high hopes for machine learning and AI to support real-time product offerings, prevent fraud, and drive innovation. But there’s a catch: training models requires labeled data that machines can digest. As data volumes increase, the opportunity to get great ML results rises, but so does the problem of labeling all the data to get that…
 
RudderStack is a warehouse-native customer data platform (CDP) that helps businesses collect, unify, and activate customer data from all their different sources. In today’s episode, we’re talking to Soumyadeb Mitra, the founder and CEO of RudderStack. We discuss the importance of activating all your data, how RudderStack can help you activate your …
 
The state of data inside most companies is chaotic. It takes significant time and investment to tame this chaos. When you are a platform provider, you are gathering tons of data from the developers using your platform. These developers building products on your platform need insight into that data to better understand how their application is perfor…
 
As companies depend more on data to improve digital products and make informed decisions, it’s crucial that the data they use be accurate and reliable. Monte Carlo, the data reliability company, is the creator of the industry’s first end-to-end data observability platform. Barr Moses and Lior Gavish are the founders of Monte Carlo and they join us t…
 
In this podcast episode, we take a look at the intricacies of low-code data pipelines with Raj Bains, the founder of Prophecy.io. Raj shares valuable insights into how performant low-code data pipelines are revolutionizing industries and transforming everyday operations. Raj discusses the founding story of Prophecy.io, the company’s mission, and its…
 
Chroma is an open source embedding database that is designed to make it easy to build large language model applications by making knowledge, facts and skills pluggable. Anton Troynikov is the co-founder of Chroma and he is our guest today. This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cl…
 
Data Activation is the method of unlocking the knowledge stored within your data warehouse and making it actionable for your business users in the end tools they use every day. In doing so, Data Activation helps bring data people toward the center of the business, directly tying their work to business outcomes. Hightouch is the simplest and fa…
 
A data catalog provides an index into the data sets and schemas of a company. Data teams are growing in size, and more companies than ever have a data team, so the market for data catalogs is larger than ever. Mark is the CEO of Stemma and the co-creator of Amundsen, a data catalog that came out of Lyft. In today’s show Mark shares how his history as…
 
Streaming analytics refers to the process of analyzing real-time data that is generated continuously and rapidly from various sources, such as sensors, applications, social media, and other internet-connected devices. Streaming analytics platforms enable organizations to extract business value from data in motion, similar to how traditional analyti…
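
A common building block in streaming analytics is the tumbling window: events are grouped into fixed, non-overlapping time buckets and aggregated per bucket as data flows in. A minimal sketch, assuming integer timestamps in seconds and events that arrive in timestamp order (real stream processors also have to handle late and out-of-order events); `tumbling_counts` and the sensor names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional


def tumbling_counts(
    events: Iterable[tuple[int, str]], window_seconds: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start, counts_per_key) for fixed, non-overlapping windows.

    Assumes events arrive in timestamp order: a window is emitted as soon as
    the first event belonging to a later window is seen.
    """
    current_window: Optional[int] = None
    counts: defaultdict[str, int] = defaultdict(int)
    for timestamp, key in events:
        window_start = timestamp - timestamp % window_seconds
        if current_window is not None and window_start != current_window:
            yield current_window, dict(counts)  # close the previous window
            counts = defaultdict(int)
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window


events = [(1, "sensor-a"), (3, "sensor-b"), (4, "sensor-a"), (6, "sensor-a"), (9, "sensor-b")]
for window_start, window_counts in tumbling_counts(events, window_seconds=5):
    print(window_start, window_counts)
```

Because it is a generator, the function holds only the current window's counts in memory, which is what lets this pattern run over an unbounded stream.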
 
Distributed databases are necessary for storing and managing data across multiple nodes in a network. They provide scalability, fault tolerance, improved performance, and cost savings. By distributing data across nodes, they allow for efficient processing of large amounts of data and redundancy against failures. They can also be used to store data …
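
One widely used way to spread data across nodes while keeping redundant copies is consistent hashing. The sketch assumes string keys, SHA-256 as a stable hash, and a ring with virtual nodes; `HashRing` and `nodes_for` are hypothetical names, and individual distributed databases may use range partitioning or other placement schemes instead.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping each key to `replicas` distinct nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each physical node owns `vnodes` points on the ring to even out load.
        self.ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.points = [point for point, _ in self.ring]
        self.node_count = len(set(nodes))

    def nodes_for(self, key: str, replicas: int = 2) -> list[str]:
        """Walk clockwise from the key's hash, collecting distinct nodes."""
        if not self.ring:
            return []
        replicas = min(replicas, self.node_count)
        i = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        chosen: list[str] = []
        while len(chosen) < replicas:
            node = self.ring[i][1]
            if node not in chosen:  # skip further virtual points of a node already chosen
                chosen.append(node)
            i = (i + 1) % len(self.ring)
        return chosen


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.nodes_for("user:42", replicas=2))
```

The first node returned acts as the key's primary owner and the rest hold copies. Because ownership depends only on ring positions, removing a node changes the primary only for keys that node used to own, rather than reshuffling every key as `hash(key) % node_count` would.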
 
DataSet is a log analytics platform provided by SentinelOne that helps DevOps, IT engineering, and security teams get answers from their data across all time periods, both live streaming and historical. It’s powered by a unique architecture that uses a massively parallel query engine to provide actionable insights from the data available. John Har…
 
There are many types of early-stage funding available, from friends and family to seed to Series A. Some firms invest across a wide set of technologies and seek only to provide capital. Others are in it for the long haul – they focus on specific areas of technology and develop both long-term relationships and deep expertise over time. Today, we are …
 
ChatGPT is an artificial intelligence language model developed by OpenAI. It is part of the GPT (Generative Pre-trained Transformer) family of models, which are designed to generate human-like text based on input prompts. ChatGPT is specifically trained to carry out conversational tasks, such as answering questions, completing sentences, and engagi…
 
The Presto/Trino project makes distributed querying easier across a variety of data sources. As the need for machine learning and other high volume data applications has increased, the need for support, tooling, and cloud infrastructure for Presto/Trino has increased with it. Starburst helps your teams run fast queries on any data source. With Star…
 
Building and managing data-intensive applications has traditionally been costly and complex, and has placed an operational burden on developers to maintain as their organization scales. Today’s developers, data scientists, and data engineers need a streamlined, single cloud data platform for building applications, pipelines, and machine learning mo…
 