Nicely written 👌👌. In the past, when I’ve had to introduce binary search to students, I’ve used the same example of the books.

It’s worth noting that in real life we humans don’t really do binary search but randomised binary search, since it’s hard for us to pick the exact middle.
We could think of it as picking the book to check at random. It’s not hard to prove that, this way, the expected number of steps until only a single book remains is logarithmic with respect to the number of books.
If we just pick uniformly at random, that still guarantees a logarithmic expected number of steps.
A rather informal explanation could be this:
We could think of the sequence as divided into 3 contiguous pieces of length N/3 each. Then, with probability 1/3, the picked element belongs to the middle piece and we get rid of at least 1/3 of the elements.
Hence, with probability 1/3, we go from N elements to at most 2/3 * N elements, so N is reduced by a factor of 3/2.
Reducing N by a constant factor c > 1 with a constant probability p > 0 makes the expected number of steps logarithmic.
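To make the random-pivot idea concrete, here is a minimal sketch in Python (my own illustrative code, not from the post): the only change with respect to textbook binary search is that the probe index is drawn uniformly from the current range instead of always being its exact middle.

```python
import random

def randomized_binary_search(books, target):
    """Search a sorted list by probing a uniformly random position
    in the current range instead of the exact middle."""
    lo, hi = 0, len(books) - 1
    while lo <= hi:
        mid = random.randint(lo, hi)   # random pivot instead of (lo + hi) // 2
        if books[mid] == target:
            return mid
        elif books[mid] < target:
            lo = mid + 1               # discard everything left of the pivot
        else:
            hi = mid - 1               # discard everything right of the pivot
    return -1                          # target not present
```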
Can this be shown with the Master theorem?
I’m not sure. It seems that in order to formulate it formally we need to include the answer as part of the state (something like T(n, s) being the time complexity of the state (n, s), where n is the cardinality of the search space and s is the number of elements in it that are smaller than the answer). In any case, running Monte Carlo simulations is convincing enough.
Let me know if you succeed in proving it with the Master theorem.
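For what it’s worth, a tiny Monte Carlo sketch along those lines (my own Python, with an illustrative steps_to_find helper) is enough to see the logarithmic growth; empirically the averages it prints stay close to 2·ln(n), i.e. roughly 1.39·log2(n).

```python
import math
import random

def steps_to_find(n):
    """Probes used by random-pivot binary search to locate a uniformly
    random target among the sorted keys 0..n-1."""
    target = random.randrange(n)
    lo, hi, probes = 0, n - 1, 0
    while True:
        mid = random.randint(lo, hi)
        probes += 1
        if mid == target:
            return probes
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1

for n in (10**3, 10**4, 10**5, 10**6):
    trials = 2000
    avg = sum(steps_to_find(n) for _ in range(trials)) / trials
    # The average stays close to 2*ln(n): a worse constant than the
    # exact-middle version, but still logarithmic.
    print(n, round(avg, 2), round(2 * math.log(n), 2))
```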
Cool! When you say randomly, is it uniformly distributed or biased towards the median? Does the answer change the expected cost?
Around 2006-2009 I was working on my own database engine, a graph DB before they were cool. For the prototype I mostly only needed one real on-disk data structure in addition to the blobs for attribute values, and that was the B-tree. Some structures I was using are like B-trees of B-trees, or even B-trees of B-trees of B-trees, but still, yes, this concept is central to the whole shebang. I eventually migrated to using Lucene, and wrote my own query language and server for it, just in time to close my company down and get a real job. But it was indeed the binary search that was central to the whole thing. I still have a soft spot in my heart for the algorithm, and I see binary trees, or the potential to build them, everywhere. When it comes to data, I see it in two ways now, a 'pay now' or 'pay later' for consuming it: you either sort your data now when you're about to store it, or spend time sorting it on the way out when you're fetching it, at least if it has comparable keys!
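As a toy illustration of that 'pay now or pay later' framing (my own sketch in Python, nothing to do with the original engine), the cost of keeping keys searchable lands either on the write path or on the read path:

```python
import bisect
import random

def contains(sorted_keys, key):
    """Plain binary search via the bisect module."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key

keys = random.sample(range(1_000_000), 10_000)

# 'Pay now': keep the store sorted while writing, so reads are O(log n).
store = []
for k in keys:
    bisect.insort(store, k)            # cost paid at write time
print(contains(store, keys[0]))        # True, cheap lookup

# 'Pay later': append unsorted, then sort (or index) on the way out.
raw = list(keys)                       # writes are cheap
snapshot = sorted(raw)                 # cost paid at read time
print(contains(snapshot, keys[0]))     # True
```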
I only played with B-trees in college, but yes, they are incredible (and incredibly hard to code compared to AVL trees, red-black trees, or any other classic in-memory data structure). Now I'm teaching Algorithm Design and binary search appears everywhere, in the middle of a DP or a flow algorithm. It's one of the cleverest little ideas in CS; that's why it's so beautiful.
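To show the kind of 'binary search inside another algorithm' I mean, here is a minimal sketch of the classic 'binary search on the answer' pattern (my own illustrative example, not from the post): split an array into at most k contiguous parts while minimising the largest part sum, where a simple greedy feasibility check stands in for the DP or flow check a harder problem would use.

```python
def min_largest_sum(nums, k):
    """Smallest S such that nums can be split into at most k contiguous
    parts, each with sum <= S. Binary search over S, greedy check inside."""
    def feasible(limit):
        # Greedily pack elements into parts of sum <= limit; count parts used.
        parts, running = 1, 0
        for x in nums:
            if running + x > limit:
                parts += 1
                running = x
            else:
                running += x
        return parts <= k

    lo, hi = max(nums), sum(nums)      # answer lies in [max element, total sum]
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid                   # mid works, try smaller
        else:
            lo = mid + 1               # mid too small
    return lo

print(min_largest_sum([7, 2, 5, 10, 8], 2))  # 18, i.e. [7, 2, 5] and [10, 8]
```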
Beautiful piece, Alejandro!
I am deeply curious, though, to hear you talk about how the search and sorting systems of search engines such as Google work.