The cost of rolling back transactions (Postgres/MySQL)

9:25
 
Content provided by Hussein Nasser. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Hussein Nasser or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.
The cost of a long-running update transaction that eventually failed in Postgres (or any other database, for that matter). In Postgres, any DML transaction touching a row creates a new version of that row. If the row is referenced in indexes, those need to be updated with the new tuple id as well. There are exceptions with optimizations such as heap-only tuples (HOT), where the index doesn't need to be updated, but that doesn't always happen.

If the transaction rolls back, the new row versions created by this transaction (millions in my case) are now invalid and should NOT be read by any new transaction. You have two ways to address this: do you clean all dead rows eagerly on transaction rollback, or do you do it lazily as a post-process? Postgres takes the lazy approach: a command called VACUUM runs periodically, in which Postgres attempts to remove those dead rows and free up space in the pages.

What's the harm of leaving those dead rows in? It isn't really a correctness issue at all; transactions know not to read dead rows by checking the state of the transaction that created them. That check is expensive, though: for every row version you have to find out whether the creating transaction committed or rolled back. Also, the fact that dead rows live in disk pages alongside live rows makes IO inefficient, because the database has to filter the dead rows out. For example, a page may contain 1000 rows but only 1 live row and 999 dead ones; the database still makes that IO yet gets only a single useful row out of it. Repeat that and you end up making more IOs, and more IOs means slower performance. The sketches below illustrate these points.

Other databases take the eager approach and won't even let you start the database before the rollback completes successfully, using undo logs. Which one is right and which one is wrong? Here is the fun part: nothing is right or wrong, it's all decisions that we engineers make. It's all fundamentals, and it's up to you to understand them and pick. Anything can work; you can make anything work if you know what you are dealing with. If you want to learn the fundamentals of databases and demystify them, check out my Udemy course: https://database.husseinnasser.com
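To make the visibility idea concrete, here is a minimal Python sketch (a toy model, not Postgres internals) of how a reader skips row versions whose creating transaction rolled back; the transaction ids, statuses, and page contents are made up for illustration.

```python
# Toy model of MVCC visibility (illustration only, not Postgres internals).
# Each UPDATE appends a new row version stamped with the id of the creating
# transaction (xmin in Postgres terms). A reader decides whether a version is
# visible by checking that creator's final status: versions created by a
# rolled-back transaction are dead and must be skipped.

COMMITTED, ABORTED = "committed", "aborted"

# Final status of each transaction, keyed by a made-up transaction id.
txn_status = {100: COMMITTED, 101: ABORTED}  # txn 101 rolled back

# Row versions sitting together in one heap page: (creating_txn_id, payload).
page = [
    (100, {"id": 1, "balance": 50}),  # original version, creator committed
    (101, {"id": 1, "balance": 75}),  # version from the rolled-back UPDATE
]

def visible(creating_txn):
    # This lookup is the per-version cost mentioned above: every candidate
    # version requires checking whether its creator committed or rolled back.
    return txn_status.get(creating_txn) == COMMITTED

live_rows = [payload for creating_txn, payload in page if visible(creating_txn)]
print(live_rows)  # [{'id': 1, 'balance': 50}] -- the dead version was filtered out
```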
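Next, a back-of-the-envelope calculation of the IO amplification, using the 1-live-row-per-1000-slot page example quoted above; the amount of live data fetched is an arbitrary number chosen for illustration.

```python
# Rough arithmetic for the IO amplification described above: if a page holds
# 1000 row slots but 999 of them are dead versions, each page read returns
# only one useful row, so you pay far more IOs for the same live data.
rows_per_page = 1000
live_rows_needed = 10_000  # arbitrary amount of live data to fetch

pages_read_healthy = live_rows_needed / rows_per_page  # every slot is live
pages_read_bloated = live_rows_needed / 1              # 1 live row per page

print(pages_read_healthy)  # 10.0 page reads
print(pages_read_bloated)  # 10000.0 page reads -> ~1000x more IO
```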
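Finally, a hedged sketch of how you could watch dead tuples pile up after a rolled-back UPDATE and then get reclaimed by VACUUM. The connection string, the accounts table, and the choice of the psycopg2 driver are assumptions for illustration; n_dead_tup comes from the Postgres statistics collector and can lag a little behind reality.

```python
# Observe dead tuples after a rolled-back UPDATE, then reclaim them with VACUUM.
# Assumptions: a local database named "test", a populated table "accounts",
# and the psycopg2 driver. n_dead_tup is maintained by the stats collector,
# so the numbers may take a moment to refresh.
import time
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # hypothetical DSN
conn.autocommit = True  # lets us run VACUUM later (it refuses transaction blocks)
cur = conn.cursor()

def dead_tuples():
    cur.execute(
        "SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'accounts'"
    )
    row = cur.fetchone()
    return row[0] if row else 0

# Explicitly open a transaction, create lots of new row versions, and roll back.
cur.execute("BEGIN")
cur.execute("UPDATE accounts SET balance = balance + 1")  # new version per row
cur.execute("ROLLBACK")  # every one of those new versions is now a dead tuple

time.sleep(1)  # give the stats collector a moment to catch up
print("dead tuples after rollback:", dead_tuples())

cur.execute("VACUUM accounts")  # lazy cleanup: remove dead versions, free space
time.sleep(1)
print("dead tuples after VACUUM:", dead_tuples())
```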


