Design and Analysis of Algorithms

MIT 6.046

Lecture notes. All notes + your marks in one place here

Lecture videos

Overview of course:

  • Divide and Conquer (Merge Sort classic example, Fast Fourier transform, algorithm for convex hull, van Emde Boas trees)
  • Amortized analysis, randomized algorithms, quickselect, quicksort, skip lists, perfect and universal hashing
  • Optimization (greedy (Dijkstra), dynamic programming, shortest paths)
  • Network Flow (network, capacities, max flow - min cut problem)
  • Linear Programming
  • Intractability (polynomial vs exponential time problems, approximation problems)
  • Synchronous, Asynchronous algos, Cryptography, RSA
  • Cache, External Memory Model, Cache Oblivious Model, Complexity in terms of memory transfers

Lecture 1. Interval Scheduling

Very similar problems can have very different complexity. Wiki problem definition.

Polynomial: Shortest path between two vertices. O(V^2)

NP: Determine if a graph has a Hamiltonian cycle: given a directed graph, find a simple cycle that contains each vertex of V.

NP-complete problems are the hardest problems in NP. Solve one NP-complete problem in polynomial time and you can solve all NP problems in polynomial time.


Given an array of intervals [(s1,f1), (s2,f2), ...] - left-closed, right-open. Select the maximum number of non-overlapping intervals. This problem should not be confused with the meeting-rooms problem, where we find the minimum number of conference rooms needed (the time with maximum overlap). The latter is a line sweep.

Claim We can solve this problem using a greedy algorithm.

Definition A greedy algorithm is a myopic algorithm that processes the input one piece at a time with no apparent look ahead.

Non-working greedy heuristics:

  • pick shortest intervals
  • pick intervals with least amount of overlaps. Counter example:
---- ---- ---- ----
  ----- ---- ----
  -----      ----
  -----      ----
  -----      ----

Working greedy heuristic:

  • pick earliest finish time
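
Below is a minimal sketch of the earliest-finish-time greedy; the function name and the sample intervals are illustrative.

```python
# Earliest-finish-time greedy for (unweighted) interval scheduling.
# Intervals are (start, finish) pairs, half-open [start, finish).
def max_compatible_intervals(intervals):
    chosen = []
    last_finish = float('-inf')
    for s, f in sorted(intervals, key=lambda iv: iv[1]):  # sort by finish time
        if s >= last_finish:           # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen

# Example: picks (1, 3) and (4, 7), skipping the long middle interval.
print(max_compatible_intervals([(1, 3), (2, 6), (4, 7)]))
```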

"Proof by intimidation, proof because the lecturer said so"

Proof by induction

Claim: Given a list of intervals L, the greedy algorithm with earliest finish time produces k* intervals, where k* is the maximum possible number of compatible intervals.

Induction on k*! Induction on the number of optimal intervals.

  1. Base case k* = 1: any single interval the greedy picks is optimal.
  2. Suppose the claim holds for k*, and we are given a list L whose optimal schedule has k*+1 intervals (s_1,f_1), ..., (s_{k*+1},f_{k*+1}), ordered by finish time.

Run our greedy algorithm and get intervals (a_1,b_1), ..., (a_k,b_k). We do not yet know how k compares to k*. By construction b_1 ≤ f_1, so S = (a_1,b_1), (s_2,f_2), ..., (s_{k*+1},f_{k*+1}) is another optimal solution of size k*+1. Let L' be the set of intervals with s_i ≥ b_1. Since S is optimal for L, S' = (s_2,f_2), ..., (s_{k*+1},f_{k*+1}) is an optimal solution for L' and has size k*. By the induction hypothesis, greedy on L' produces an optimal schedule for L'; but greedy on L' produces exactly (a_2,b_2), ..., (a_k,b_k), of size k-1. Hence k-1 = k*, which proves that the greedy output (a_1,b_1), ..., (a_k,b_k) on L is also an optimal solution.


Problem. Weighted interval scheduling. Each interval has a weight w_i. Find a schedule with maximum total weight.

Greedy does not work here, need to use DP

O(n^2), O(n log n)

dp_weighted_intervals.png

dp_weighted_intervals_2.png

  1. Subproblem space: jobs[i:], dp(i) is the max profit given jobs[i:]
  • need NOT necessarily include job[i]
  • note: need to sort by start/end time
  • sort by start time and define the subproblem as jobs[i:], or sort by end time and define the subproblem as jobs[:i]
  2. O(n^2) is easy
  3. improve to O(n log n) with binary search

```python
from functools import cache
from typing import List

# O(n^2): for each job i, either skip it, take it alone, or pair it with a later compatible job j
class Solution:
    def jobScheduling(self, start: List[int], end: List[int], profit: List[int]) -> int:
        @cache
        def dp(i):
            if i == len(jobs): return 0
            val = max(dp(i+1),jobs[i][2])
            for j in range(i+1,len(jobs)):
                if jobs[i][1] <= jobs[j][0]:
                    val = max(val,dp(j)+jobs[i][2])
            return val
        jobs = [(s,e,p) for s,e,p in zip(start,end,profit)]
        jobs.sort(key=lambda x:x[0])
        return dp(0)

# O(n log n): jobs sorted by start time; binary search finds the first later job starting at or after the end of job i
class Solution:
    def jobScheduling(self, start: List[int], end: List[int], profit: List[int]) -> int:
        def bs(l,r,num):
            while l<r:
                m = l+r>>1
                if jobs[m][0]>=num:
                    r = m
                else:
                    l = m+1
            return l
        @cache
        def dp(i):
            if i == len(jobs): return 0
            j = bs(i+1,len(jobs),jobs[i][1])
            return max(dp(i+1),dp(j)+jobs[i][2])
        jobs = [(s,e,p) for s,e,p in zip(start,end,profit)]
        jobs.sort(key=lambda x:x[0])
        return dp(0)
    
# O(n log n): jobs sorted by end time; binary search finds the last earlier job ending at or before the start of job j
class Solution:
    def jobScheduling(self, start: List[int], end: List[int], profit: List[int]) -> int:
        def bs(l,r,num):
            while l<r:
                m = l+r>>1
                if jobs[m][1]>num:
                    r=m
                else:
                    l=m+1
            return l-1
        @cache
        def dp(j):
            if j < 0: return 0
            i = bs(0,j,jobs[j][0])
            return max(dp(j-1),dp(i)+jobs[j][2])
        
        jobs = [(s,e,p) for s,e,p in zip(start,end,profit)]
        jobs.sort(key=lambda x:x[1])
        return dp(len(jobs)-1)
```


NP-complete problem. A generalization of the problem considers k > 1 machines/resources. Here the goal is to find k compatible subsets whose union is the largest. The first example had k = 1.

In an upgraded version of the problem, the intervals are partitioned into groups, and a subset of intervals is compatible if no two of them overlap.

Lecture 2. Divide & Conquer: Convex Hull, Median Finding

Divide and Conquer Paradigm: Given a problem of size n:

  1. Divide it into a subproblems of size n/b, where b > 1 so that the subproblems are smaller.
  2. Solve each subproblem recursively.
  3. Combine the solutions of the subproblems into the overall solution (this is where the smarts of the algorithm is).

Run-time: T(n) = a·T(n/b) + MergeWork. Often you can compute T(n) using the Master Theorem.

master_theorem.png


Convex Hull Problem

Given n points in the plane S = {(x_i, y_i) | i = 1, 2, ..., n}, assume no two have the same x coordinate, no two have the same y coordinate, and no three lie on the same line. The convex hull is the smallest polygon which contains all points in S.

convex_hull.png

Simple algorithm:

  • For each pair of points draw a line.
  • This line separates the plane in two half-planes.
  • Check if all points lie in one side on the half-plane. If yes, this line is part of the convex hull (we call it a segment of the convex hull), otherwise it is not a segment.

Run-time: O(n^2) pairs of points, O(n) to test each = O(n^3) total.

Divide and Conquer algo:

  1. Sort the points by x coordinates.
  2. For input set S, divide into left half A and right half B by x coordinate.
  3. Compute the convex hull for A and for B.
  4. Combine solutions.

The obvious merge looks at all a_i, b_j pairs and takes T(n) = 2T(n/2) + O(n^2) = O(n^2) runtime.

merge_convex_hulls.png

Find the upper and lower tangents using the "two fingers and a string" idea. For candidate segments (a, b), compute the intercept with the vertical dividing line; the upper tangent maximizes it, the lower tangent minimizes it. Algo demo.

Merge step: after you find the tangents (a_i, b_j) and (a_k, b_m), you link a_i to b_j, go down the b's until you reach b_m, then link to a_k, then go up through the a's until you reach a_i.

Run time

T(n) = 2T(n/2) + O(n) = O(n log n)

Other Convex Hull solutions:


Median Finding Algorithm

It is just the quickselect algorithm. A randomized pivot gives O(n) expected run time and O(n^2) worst case. A smart deterministic pivot (groups of 5 elements, median of medians) gives O(n) worst-case time. See CLRS for the detailed deterministic O(n) median algorithm.
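
A minimal sketch of the randomized version (the deterministic median-of-medians pivot is what gives the worst-case O(n) bound); names are illustrative.

```python
import random

# Randomized quickselect: expected O(n), worst case O(n^2).
# Returns the k-th smallest element (k is 0-indexed).
def quickselect(arr, k):
    arr = list(arr)
    while True:
        if len(arr) == 1:
            return arr[0]
        pivot = random.choice(arr)
        less = [x for x in arr if x < pivot]
        equal = [x for x in arr if x == pivot]
        if k < len(less):
            arr = less
        elif k < len(less) + len(equal):
            return pivot
        else:
            k -= len(less) + len(equal)
            arr = [x for x in arr if x > pivot]

def median(arr):
    return quickselect(arr, (len(arr) - 1) // 2)   # lower median

print(median([7, 1, 5, 3, 9]))   # 5
```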

Lecture 3. Divide & Conquer: Fast Fourier Transform

Polynomial of degree n-1: A(x) = a_0 + a_1·x + a_2·x^2 + ... + a_{n-1}·x^{n-1}

Operations on polynomials:

  • evaluation A(x_0) = ? Use Horner's rule to do it in O(n): A(x) = a_0 + x(a_1 + x(a_2 + ... + x·a_{n-1})) - n multiplications and additions
  • addition: O(n)
  • multiplication A(x)·B(x) = C(x). The coefficients of C are c_k = sum_{j=0}^{k} a_j·b_{k-j}, each taking O(k) time, so in total brute-force multiplication of polynomials takes O(n^2).

Polynomial multiplication is the same as doing convolution between vectors.

u = [1 1 1];
v =  [1 1 0 0 0 1 1];
1  1  1 -> 1
   1  1 1 -> 2
      1 1 1 -> 2
conv(u,v) =  [1     2     2     1     0     1     2     2     1]

u moves as a filter through v. Think convolutional neural networks! In polynomial multiplication a = u, b = v.

Polynomial representation

  • coefficient vector (a_0, a_1, ..., a_{n-1})
  • roots r_0, r_1, ..., r_{n-1} (allowing multiplicity): A(x) = c(x - r_0)(x - r_1)...(x - r_{n-1})
  • samples/points (x_k, y_k) for k = 0, 1, ..., n-1 with A(x_k) = y_k; this determines A uniquely by the Fundamental Theorem of Algebra

By the FTA, the n roots (allowing multiplicity) define the polynomial uniquely.

The root representation is not good: it is hard to go from the polynomial to its roots, and addition is also very hard. Multiplication is easy.

poly_ops.png

No representation is perfect! Our aim is to convert from the coefficient to the sample representation in O(n log n) time using the Fourier transform.

Coefficient repr -> samples repr can be done in O(n^2) trivially.

matrix_view.png

  • V·A gets you the sample representation in O(n^2)
  • Samples to coefficients (V and y are known, how would you find a?):
    • Gaussian elimination: O(n^3)
    • Multiply by the inverse, V^{-1}·y: O(n^2), computing the inverse once and reusing it for free.

Divide & Conquer Algorithm

We have the coefficient representation and want to get samples in O(n log n) time.

A(x) = (a_0, a_1, a_2, ..., a_{n-1})

  1. Divide into even and odd coefficients.

A_even(y) = sum_{k < n/2} a_{2k}·y^k = (a_0, a_2, ...)

A_odd(y) = sum_{k < n/2} a_{2k+1}·y^k = (a_1, a_3, ...)

  2. Conquer: recursively compute A_even(z) and A_odd(z) for z in {x^2 : x in X}.

  3. Combine: A(x) = A_even(x^2) + x·A_odd(x^2)
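
A minimal sketch of this recursion when X is the set of n-th roots of unity (the collapsing set), assuming n is a power of two:

```python
import cmath

# Evaluate the polynomial with coefficient list `a` at the n-th roots of unity.
def fft(a):
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2])        # A_even at the (n/2)-th roots of unity
    odd = fft(a[1::2])         # A_odd  at the (n/2)-th roots of unity
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)      # x = w_n^k, so x^2 = w_{n/2}^k
        out[k] = even[k] + w * odd[k]             # A(x)  = A_even(x^2) + x*A_odd(x^2)
        out[k + n // 2] = even[k] - w * odd[k]    # A(-x) = A_even(x^2) - x*A_odd(x^2)
    return out

print([round(abs(v)) for v in fft([1, 1, 1, 1])])   # [4, 0, 0, 0]
```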

Run time

T(n, |X|) = 2·T(n/2, |X|) + O(n + |X|)

dc_FFT.png

To get a collapsing set X:

roots_of_unity.png

dbabichev FFT implementation.

np.convolve is much slower as it does not use the FFT. scipy has an FFT-based convolve (scipy.signal.fftconvolve).
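
A quick comparison, assuming numpy and scipy are installed:

```python
import numpy as np
from scipy.signal import fftconvolve

u = [1, 1, 1]
v = [1, 1, 0, 0, 0, 1, 1]

print(np.convolve(u, v))                        # direct convolution
print(np.rint(fftconvolve(u, v)).astype(int))   # FFT-based, O(n log n)
# Both print [1 2 2 1 0 1 2 2 1]; fftconvolve wins for large inputs.
```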

Lecture 4. Divide & Conquer: van Emde Boas Trees

Emde Boas Tree.

Goal: Maintain n elements among {0, 1, ..., u-1} subject to insert, delete, successor (given a value, find the next greater value in the set). We know how to do all of these in O(log n) time using balanced BSTs (AVL, Red-Black). Van Emde Boas trees can do them in O(log log u), which is an exponential improvement.

Intuition: Binary search over the levels of the tree, where the number of levels is log(u).

Can we do better than O(log n) with a structure that does not depend on the universe of numbers {0, 1, ..., u-1}? No.

In each of the previous dynamic tree data structures, at least one important operation took O(log n) time, either worst case or amortized. In fact, because each of these data structures bases its decisions on comparing keys, the Ω(n log n) lower bound for sorting tells us that at least one operation will have to take Ω(log n) time. Why? If we could perform both the INSERT and EXTRACT-MIN operations in o(log n) time, then we could sort n keys in o(n log n) time by first performing n INSERT operations, followed by n EXTRACT-MIN operations.

First attempt to store a set of n elements drawn from {0, 1, ..., u-1}:

bit_vec.png

Insert and Delete are constant, Successor is linear.

Second attempt:

tree_emde.png

clusters.png

Think of each cluster as a binary tree built bottom-up using the OR operation. The roots of the different clusters form a 'summary' vector, which tells you whether each cluster contains a one.

Insert is constant - need to change the value in the clusters and mark in the summary vector.

Successor is:

successor.png
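
A minimal sketch of this two-level idea (clusters of size about √u plus a summary vector), with insert O(1) and successor O(√u); the class and names are illustrative:

```python
import math

# Two-level bit vector: clusters of size ~sqrt(u) plus a summary vector
# recording which clusters contain at least one element.
class TwoLevelBitVector:
    def __init__(self, u):
        self.c = max(1, math.isqrt(u))               # cluster size ~ sqrt(u)
        self.n_clusters = (u + self.c - 1) // self.c
        self.bits = [0] * (self.n_clusters * self.c)
        self.summary = [0] * self.n_clusters

    def insert(self, x):                             # O(1)
        self.bits[x] = 1
        self.summary[x // self.c] = 1

    def successor(self, x):                          # O(sqrt(u))
        i = x // self.c
        for y in range(x + 1, (i + 1) * self.c):     # rest of x's cluster
            if self.bits[y]:
                return y
        for j in range(i + 1, self.n_clusters):      # summary: next non-empty cluster
            if self.summary[j]:
                for y in range(j * self.c, (j + 1) * self.c):
                    if self.bits[y]:
                        return y
        return None

v = TwoLevelBitVector(16)
for x in (2, 9, 13):
    v.insert(x)
print(v.successor(3))   # 9
```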

We have not yet improved the runtime enough. The van Emde Boas idea is to make the clusters (and the summary) recursive structures of the same kind.

emde_recurse.png

For code implementation details, look at the lecture and your notes here.

Lecture 5: Amortization

So far learning fancy cool data structures. This lecture is on fancy cool techniques of computing complexity of data structures.

Amortized analysis is used to compute the complexity of data structures. It computes the total run time of a data structure over a sequence of n executed operations. Amortized analysis is not used for computing the run time of (single-shot) algorithms.

Table doubling

That's how dynamic arrays work in Python.

Assume a table of size m. We double the size of the table whenever it is full. For n insertions the total running time is:

O(2^0 + 2^1 + ... + 2^{log n}) = O(n). Thus amortized (i.e. per insertion) it is O(n)/n = O(1).
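
A tiny sketch that just counts element copies over n appends, illustrating the O(1) amortized bound (class name is illustrative):

```python
# Count how many element copies table doubling performs over n appends.
class DoublingTable:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.copies = 0               # total work spent re-copying elements

    def append(self, x):
        if self.size == self.capacity:
            self.copies += self.size  # copy everything into a table twice the size
            self.capacity *= 2
        self.size += 1

t = DoublingTable()
for i in range(1_000_000):
    t.append(i)
print(t.copies / t.size)   # ~1 copy per insertion on average: O(1) amortized
```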

Techniques of amortized analysis:

  • aggregate method
  • accounting method
  • potential method

All these methods give an upper bound on the actual total cost of the operations; this bound is the amortized cost.

Aggregate method

This is what we did in table doubling: add up the total cost of n operations, then divide by n. NB: often, when you have a data structure with a total of n insert and delete operations, you can just think of the n insert operations, because each element can be deleted at most as many times as it has been inserted.

Accounting method

Keep a bank account of stored credit, which must always be non-negative. When you do an operation, you pay for the operation and may deposit coins into your account.

Example. Table Doubling.

  • If an insertion does not trigger table doubling: pay 1 coin for the insertion plus deposit c coins.

  • If an insertion triggers table doubling: pay for the doubling from the bank account.

  • Amortized cost of the doubling when the table grows to size 2n is O(n) - c·(n/2) = O(1) for a large enough constant c. When the table doubles from n to 2n, only the last n/2 slots carry coins.

  • Amortized runtime per insertion: 1 + c = O(1).

Aside. Table expansion and contraction give insert and delete in O(1) amortized if:

  • the table doubles when it is full
  • the table contracts to half its size when only m/4 elements remain in a table of size m

To prove this is indeed O(1) amortized you need the potential method.

Potential method

Define a potential (energy) function Φ which maps each state of the data structure D_i to a nonnegative integer. Check CLRS for more details and for the proof that table doubling and halving is O(1) amortized.

Lecture 6: Randomized Algorithms

A randomized algorithm is one that generates a random number r and makes decisions based on r's value.

On the same input, different executions of a randomized algorithm may:

  • run a different number of steps
  • produce different output

Two types of random algos:

  • Monte Carlo - always runs in polynomial time; the output is correct with high probability
  • Las Vegas - runs in expected polynomial time; the output is always correct

Monte Carlo = probably correct; Las Vegas = probably fast.

Monte Carlo is for estimation stuff to get almost correct values.

A Las Vegas example is quicksort - O(n log n) expected time. 'Almost sorted' does not make sense as an output.

Problem. Matrix Product Checker

Given n × n matrices A, B, C, the goal is to check whether A × B = C. We use a randomized algorithm so that we do not carry out the full multiplication.

Freivalds' algorithm

Procedure:

  1. Generate an n × 1 random 0/1 vector r.
  2. Compute P = A × (B·r) - C·r.
  3. Output "Yes" if P = (0, 0, ..., 0); "No" otherwise.

If A × B = C, then the algorithm always returns "Yes". If A × B ≠ C, then the probability that the algorithm returns "Yes" is at most one half.

By iterating the algorithm k times and returning "Yes" only if all iterations yield "Yes", a runtime of O(k·n^2) and an error probability of ≤ 1/2^k is achieved.

Proof that P(the algorithm wrongly answers "Yes") ≤ 1/2: the idea is that for every bad r which does not catch that AB - C ≠ 0, we can produce a good r, giving a 1-1 mapping. See your notes.
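
A small numpy sketch of the check (matrix sizes and k are arbitrary choices):

```python
import numpy as np

# Freivalds' check: is A @ B == C?  One round costs O(n^2) and wrongly answers
# "Yes" with probability at most 1/2; k rounds push that below 1/2^k.
def freivalds(A, B, C, k=20):
    n = A.shape[0]
    for _ in range(k):
        r = np.random.randint(0, 2, size=(n, 1))     # random 0/1 vector
        if not np.array_equal(A @ (B @ r), C @ r):   # P = A(Br) - Cr != 0
            return False                             # definitely A @ B != C
    return True                                      # probably A @ B == C

A = np.random.randint(0, 10, (100, 100))
B = np.random.randint(0, 10, (100, 100))
C = A @ B
print(freivalds(A, B, C))      # True
C[0, 0] += 1
print(freivalds(A, B, C))      # False with overwhelming probability
```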

Paranoid Quick Sort

paranoid_quick_sort.png

paranoid_qs_analysis.png

Paranoid quicksort is probably fast, with expected running time O(n log n).

Lecture 7 Randomization: Skip Lists

A skip list is a probabilistic data structure that allows O(log n) search, insert and delete within a set of n elements. It maintains a linked-list hierarchy of subsequences, with each successive subsequence skipping over fewer elements than the previous one. In the example below we insert 80.

Compared with a treap or a red-black tree, which offer the same functionality and performance, a skip list implementation can be comparatively short, and the idea behind it is just simple linked lists.

skip_list.png

Design a skip list, leetcode.

Motivation for skip lists.

Step 1. Linked list - search is O(n)

Step 2. Sorted linked list - search is still O(n)

Step 3. Two sorted linked lists, where the second one is a subsequence (express line) that skips elements.

Step 4. Add log(n) layers of linked lists

skip_list_motivation.png

  • Insert is randomized using a fair coin
  • Search and Delete are deterministic

Why skip lists are good?

Warm-up Lemma. The number of levels in an n-element skip list is O(log n) with high probability; that is, as n grows the probability converges to 1. This is stronger than an expected running time, where you have no guarantee of how likely the worst case is.

With this lemma we can say things like:

  • the number of levels in the skip list is at most 2·log(n) with 90% probability
  • at most 4·log(n) with 99.9% probability, etc.

Proof. P(≥ c·log(n) levels) = P(some element got ≥ c·log(n) promotions) ≤ n·(1/2)^{c·log(n)} = n/n^c = 1/n^{c-1}

Search

Theorem. Any search in an n-element skip list costs O(log n) with high probability.

Steps:

  1. We trace the search path backwards.
  2. The backwards search makes up and left moves, each with probability 1/2.
  3. The number of up moves is at most the number of levels, ≤ c·log(n) with high probability.
  4. So the total number of moves = the number of coin flips until you get c·log(n) heads (up moves).

Claim: the number of coin flips until we see c·log(n) heads is O(log n) with high probability.

We need to prove that, to see c·log(n) heads, there exists a constant d such that if Y is a random variable counting the number of heads after d·log(n) coin flips, then P(Y < c·log(n)) ≤ 1/n^α. The idea is to use a Chernoff bound.

Lecture 8: Randomization: Universal & Perfect Hashing

Buzz words: hashing with chaining, open addressing, load factor n/m, simple uniform hashing assumption, hash functions, linear probing, quadratic probing, universal hashing, perfect hashing

Dictionary problem. Abstract Data Type (Math definition, interface):

  • maintain a dynamic set of items
  • each item has a key, item is a key value pair
  • insert(item)
  • delete(item)
  • search(key)

The goal is to run all 3 operations in O(1) expected time (amortized).

In MIT 6.006 - Introduction to Algorithms you saw hashing with chaining and open addressing. You proved that insert, delete and search take O(1 + n/m), where n is the number of elements in the table and m is the number of slots (table size, number of buckets). So as long as you choose m = O(n) (which you can maintain dynamically using table doubling and shrinking) you get O(1) operations.

However, you assumed simple uniform hashing: each key is mapped to each slot of the table with probability 1/m. So you had to choose a smart hash function that maps the keys uniformly. What you really want is a hash function that works well no matter what the keys are; that is, even in the worst case over keys you still want O(1) operations. The analysis in 6.006 considered the average case, where for random keys we would have simple uniform hashing.

Want to avoid the assumption that the keys are random.

Universal Hashing

Works for dynamic sets - allows insert and delete

  • choose h randomly from a hash family H.
  • assume H to be a universal hash family:
    • for all keys k ≠ k': P(h(k) = h(k')) ≤ 1/m, the probability being over the choice of h.

You can prove using indicator variables that E(number of keys hashing to the same place as k_i) ≤ 1 + n/m.

Dot product hash family

  • assume m is prime
  • assume u = m^r for an integer r
  • view the key k in base m: k = (k_0, k_1, ..., k_{r-1})
  • for a key a = (a_0, ..., a_{r-1}) define h_a(k) = (a · k) mod m (dot product)

H = {h_a | a ∈ {0, ..., u-1}}. To choose a random h_a, choose a random a.
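
A small sketch of this family (m and r are arbitrary example values):

```python
import random

# Dot-product hash family: m prime, universe u = m**r,
# key k written in base m as (k_0, ..., k_{r-1}),
# h_a(k) = (sum_i a_i * k_i) mod m for a randomly chosen a.
def make_dot_product_hash(m, r):
    a = [random.randrange(m) for _ in range(r)]    # choosing h_a = choosing a
    def h(key):
        total = 0
        for i in range(r):
            total += a[i] * (key % m)              # next base-m digit of the key
            key //= m
        return total % m
    return h

m, r = 101, 4                  # table size m (prime), universe u = 101**4
h = make_dot_product_hash(m, r)
print(h(123456), h(123457))    # two keys, each hashed to a slot in [0, m)
```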

Another universal hashing family:

universal_hashing.png

We achieved O(1) expected time for all operations.

Perfect Hashing

This works for a static set of keys and supports search only. It is called perfect hashing because it achieves O(1) worst-case search; the keys are stored perfectly, with no collisions.

  • polynomial build time with high probability
  • worst case O(1) search time
  • worst case O(n) memory

Idea: Use 2-level hashing.

2-level-hashing.png

Lecture 9: Augmentation. Range Trees

Easy Tree Augmentation.

The goal here is to store x.f at each node x, where f is a function of the node, namely f(subtree rooted at x). If x.f is computable locally from its children, then updates take O(h) time, where h is the height of the tree: to update x.f, we walk up the tree to the root.

Order-statistic trees.

ADT (interface of the data structure):

  • insert(x), delete(x), successor(x)
  • rank(x)
  • select(i): find me the element of rank i

Want all of these in O(log n).

We can implement the above ADT using easy tree augmentation on AVL trees (or 2-3 trees, or any balanced BST) to store the subtree size: f(subtree) = number of nodes in it.

rank_select.png

NB: choose augmentation functions that can be maintained cheaply, such as subtree size. Above, we could think of storing the rank of each node directly (rank and select would then be trivial), but insertions would become O(n): e.g. inserting a minimum element forces you to update every node's rank.

#TODO Finger Search Trees (this is tree augmentation on 2-3 trees)

Range Trees

Solves the problem orthogonal range search.

Suppose we have n points in a d-dimensional space. We would like a data structure that supports range queries on these points: find all the points in a given axis-aligned box. An axis-aligned box is simply an interval in 1D, a rectangle in 2D, and a box in 3D.

The goal is to run in O(log^d(n) + output size).

NB: Our data structure would be static and support only search points in box query.

1D case

We have an array [a_1, a_2, ..., a_n], and for a query search(x, y) we want to output all numbers in the array that lie in the interval [x, y). A simple solution is to keep a sorted array and do binary searches in O(log n).
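
A sketch of the sorted-array version with Python's bisect:

```python
from bisect import bisect_left

# 1D range search on a static, sorted array: report all values in [x, y).
def range_search(sorted_arr, x, y):
    lo = bisect_left(sorted_arr, x)   # first index with value >= x
    hi = bisect_left(sorted_arr, y)   # first index with value >= y
    return sorted_arr[lo:hi]          # O(log n + k), k = output size

a = sorted([7, 1, 5, 12, 9, 3])
print(range_search(a, 3, 10))   # [3, 5, 7, 9]
```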

Sorted arrays are inefficient for insertion and deletion. For a dynamic data structure that supports range queries, use a balanced BST such as an AVL or Red-Black tree.

However, neither of the above approaches generalizes to high dimensions.

1D range trees

range_tree.png

That is like doing a right rank search for node a and a left rank search for node b.

Analysis. O(lg n) to implicitly represent the answer (reporting just the subtree roots). O(lg n + k) to output all k answers. O(lg n) to report k via subtree-size augmentation.

range_tree_pic.png

2D range trees

Create a 1D range tree on the x coordinates. Do a search to find the O(lg n) subtree roots that satisfy the interval given by the x coordinate. Data augmentation: for each of these nodes we store another range tree on the y coordinate. So we have a dictionary whose keys are nodes of the primary range tree and whose values are range trees on the y coordinate; there is a lot of data duplication. Then you do a small search in each y-coordinate range tree. Run time: O(log^2 n).

Space complexity is O(n lg n). The primary tree is O(n). Each point is duplicated up to O(lg n) times in secondary subtrees, one per ancestor.

Aside

Range trees are used in database searches. For example, if you have 4 columns in a database and you run queries like "col1 inside one interval, col2 inside another interval, etc.", range trees allow fast queries. This is called indexing on database columns.

Lecture 10: Dynamic Programming. Advanced DP

We consider 3 problems:

  • longest palindromic subsequence
  • optimal binary search tree
  • alternating coin game

DP steps:

  1. Determine Subspace of problems
  2. Define Recursive relation based on optimal solutions of subproblem
  3. Compute value of an optimal solution (bottom-up, top down)
  4. Construct optimal solution from computed information (typically involves some backtracing)

Longest Palindromic Subsequence. Given a string s, find the longest subsequence which is a palindrome (not necessarily contiguous).

Solution:

```python
from functools import cache

# O(n**2)
def solve(s):
    @cache
    def dp(i,j):
      if i == j: return 1
      elif  i > j: return 0
      if s[i] == s[j]: return 2 + dp(i+1,j-1)
      return max(dp(i+1,j),dp(i,j-1))
    return dp(0, len(s) - 1)
```

Run time = number of subproblems × time per subproblem (assuming lookups are O(1)).

If you use arrays for lookup you would have constant access, in hash tables you get it constant amortized (collisions).

Optimal Binary Search Trees

Given keys K_1, K_2, ..., K_n with K_1 < ... < K_n (WLOG K_i = i). There are many different BSTs over this set of keys. We assign a weight to each key: W_1, W_2, ..., W_n. You can think of the weights as the probabilities of searching for each key (search probabilities). Find/construct a BST that minimizes:

sum_i W_i × (depth(K_i) + 1).

The root has depth 0. Application: needs a structure which would minimize the expected search cost.

optimal_bst.png

Before doing DP, try greedy!

We choose the root K_r to be the key with the largest weight. Then you know which keys go to the left and which to the right; continue the greedy approach recursively. But this does not work:

optimal_bst_greedy.png

DP solution

Subproblem space: e(i, j) = cost of an optimal BST on K_i, K_{i+1}, ..., K_j

optimal_bst_dp.png
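
A minimal sketch of the e(i, j) recurrence: try every key r as the root; making r the root pushes all of K_i..K_j one level deeper, which is accounted for by adding their total weight once. This is the O(n^3) version (0-indexed keys; names are illustrative).

```python
from functools import cache

# Optimal BST over keys 0..n-1 with search weights W.
# e(i, j) = min cost of a BST on keys i..j, cost = sum of W_k * (depth_k + 1).
def optimal_bst_cost(W):
    n = len(W)
    prefix = [0] * (n + 1)
    for i, w in enumerate(W):
        prefix[i + 1] = prefix[i] + w
    def weight(i, j):                     # total weight of keys i..j
        return prefix[j + 1] - prefix[i]

    @cache
    def e(i, j):
        if i > j:
            return 0
        # try every key r as the root of the subtree on keys i..j
        return weight(i, j) + min(e(i, r - 1) + e(r + 1, j) for r in range(i, j + 1))

    return e(0, n - 1)

print(optimal_bst_cost([1, 10, 1]))   # 14: put the heavy middle key at the root
```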

Alternating Coin Game

Row of n coins of values V_1, ..., V_n, where n is even. In each turn, a player selects either the first or the last coin of the row, removes it, and receives its value.

The first player never loses (he wins or ties)! He can arrange to pick either all the values at odd indices or all the values at even indices, whichever has the larger sum.

How to maximize the amount of money won assuming you move first?

Subproblem space: V(i, j) is the max value we can definitely win if it is our turn and only coins V_i, ..., V_j remain.

coin_game.png
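
A sketch of the recurrence in an equivalent form: if we take a coin, the opponent then plays optimally on the rest, so we end up with the total of the rest minus what the opponent can guarantee (names are illustrative).

```python
from functools import cache

# V(i, j): most money the player to move can guarantee when coins v[i..j] remain.
def coin_game(v):
    prefix = [0]
    for x in v:
        prefix.append(prefix[-1] + x)
    def total(i, j):
        return prefix[j + 1] - prefix[i]

    @cache
    def V(i, j):
        if i > j:
            return 0
        take_left = v[i] + total(i + 1, j) - V(i + 1, j)
        take_right = v[j] + total(i, j - 1) - V(i, j - 1)
        return max(take_left, take_right)

    return V(0, len(v) - 1)

coins = [5, 3, 7, 10]
print(coin_game(coins), sum(coins))   # first player can guarantee 15 out of 25
```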

leetcode

Lecture 11. Dynamic Programming. All-Pairs Shortest Paths

Two types of shortest path problems:

  • single source (from one source s find the shortest path to all vertices v ∈ V)
  • all pairs shortest path

The problem with one source and one destination cannot be solved faster than the single source shortest path, so when you solve it you would use the single source shortest path solutions such as Dijkstra and Bellman Ford.

single_source.png

Note that Dijkstra works only for non-negative weights. It is a type of greedy algorithm.

Bellman Ford works for general graphs. It is a DP algo.

APSP = All pairs shortest path

One way to solve this problem is to run Dijkstra or Bellman-Ford V times (once per source).

DP First attempt.

If your subproblem space is dp[u][v], the shortest path from u to v, then the subproblems do not form a DAG: you enter an infinite recursion when trying to resolve them.

A natural improvement of the subproblem space is dp[u][v][m] = weight of the shortest path from u to v using ≤ m edges.

Below are the 5 steps of DP thinking from Eric:

dp_shortest_path_1.png

```
# Bottom-up via relaxation steps
for m = 1 to n-1:
  for u in V:
    for v in V:
      for x in V:                          # this is the min step
        if d[u][v] > d[u][x] + d[x][v]:    # can use w[x][v] here too
            d[u][v] = d[u][x] + d[x][v]
```

The runtime is O(V^4), which is the same as running Bellman-Ford V times.

Matrix Multiplication

Given n × n matrices A and B, compute their product C = A × B.

  • O(n^3) standard algorithm
  • O(n^{2.807}) via Strassen

Matrix multiplication has the same shape as the above recurrence relation:

c_{ij} = sum_k a_{ik} · b_{kj} is similar to d_{uv} = min_x (d_{ux} + w(x, v)) over x ∈ V.

Define the summation operation to be min, and the multiplication operation to be +.

We can redefine the DP problem using Matrix multiplication language.

matrix_mult_short_path.png

The shortest-distance matrix we want to compute, D^m, equals W^m, where the power is taken in circle land (the min-plus algebra).

The all-pairs shortest path problem requires computing W^n in circle land. A single matrix multiplication is n^3, hence the total complexity is O(V^4) - the same algorithm as above, just expressed in another language. However, with matrix multiplication you can use the repeated squaring trick and get running time O(n^3 lg n).
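
A minimal sketch of the min-plus ("circle land") product plus repeated squaring (the small graph is illustrative; INF marks a missing edge and the diagonal is 0, so higher powers only improve):

```python
INF = float('inf')

# Circle-land product: (A o B)[u][v] = min over x of (A[u][x] + B[x][v]).
def min_plus(A, B):
    n = len(A)
    C = [[INF] * n for _ in range(n)]
    for u in range(n):
        for x in range(n):
            if A[u][x] == INF:
                continue
            for v in range(n):
                C[u][v] = min(C[u][v], A[u][x] + B[x][v])
    return C

# APSP by repeated squaring of the weight matrix: W -> W^2 -> W^4 -> ...
# O(n^3 log n) instead of n-1 successive products.
def apsp(W):
    D, power, n = W, 1, len(W)
    while power < n - 1:
        D = min_plus(D, D)
        power *= 2
    return D

W = [[0, 3, INF],
     [INF, 0, 4],
     [INF, INF, 0]]
print(apsp(W))   # [[0, 3, 7], [inf, 0, 4], [inf, inf, 0]]
```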

Floyd-Warshall

floyd_warshall.png

```
C = (w(u, v))            # initialize with the edge weights
for k = 1 to n by 1:
  for u in V:
    for v in V:
      if c[u][v] > c[u][k] + c[k][v]:
        c[u][v] = c[u][k] + c[k][v]
```

Run time: O(V^3)

The loop on k must be the outermost one, otherwise the algorithm is wrong; k is the phase counter (the set of allowed intermediate vertices grows with k).
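
A runnable sketch of the same triple loop (the small weight matrix is illustrative):

```python
INF = float('inf')

# Floyd-Warshall, O(V^3). After phase k, c[u][v] is the shortest u -> v distance
# using only intermediate vertices from {0, ..., k}.
def floyd_warshall(w):
    n = len(w)
    c = [row[:] for row in w]          # start from the weight matrix
    for k in range(n):                 # k must be the outermost loop
        for u in range(n):
            for v in range(n):
                if c[u][k] + c[k][v] < c[u][v]:
                    c[u][v] = c[u][k] + c[k][v]
    return c

w = [[0, 5, INF],
     [INF, 0, 2],
     [1, INF, 0]]
print(floyd_warshall(w))   # [[0, 5, 7], [3, 0, 2], [1, 6, 0]]
```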

Johnson's algorithm

Idea is to do graph re-weighting so that we have nonnegative weights and run Dijkstra. Shortest paths are preserved.

johnson.png

How do we find such a function h?

We want h to satisfy h(v) - h(u) ≤ w(u, v) for every edge (u, v) ∈ E. This is called a system of difference constraints.

Theorem. If the graph has a negative-weight cycle, then there exists no solution to the above system.

Theorem. If (V, E, w) has no negative-weight cycle, then we can find a solution to the difference constraints.

Proof idea. Add a new source vertex s, connect it to every other vertex with weight-0 edges, and compute single-source shortest paths from s. Set h(v) = dist[s][v] for every v ∈ V. The constraints hold by the triangle inequality.

Time complexity

  1. The first step involves running Bellman-Ford from s, which takes O(VE) time. We also pay a pre-processing cost of O(E) to re-weight all the edges.

  2. We then run Dijkstra's algorithm from each of the V vertices in the graph; the total time complexity of this step is O(VE + V^2 lg V).

  3. We then need to re-weight the shortest paths for each pair; this takes O(V^2) time.

The total running time of this algorithm is O(VE + V^2 lg V).

Lecture 12: Greedy Algorithms. Minimum Spanning Tree

  • Prim's algorithm
  • Kruskal's algorithm

Recall that a greedy algorithm repeatedly makes a locally best choice or decision, but ignores the effects of the future.

Minimum spanning tree problem.

A spanning tree of a graph G is a subset of the edges of G that forms a tree and includes all vertices of G. Given an undirected graph G = (V, E) and edge weights w : E → R, find a spanning tree T of minimum total weight sum_{e ∈ T} w(e). We take some edges of the graph, hit all vertices, and minimize the weight sum.

Properties of a greedy algorithm:

  • Optimal Substructure: the optimal solution to a problem incorporates the optimal solution to subproblem(s)
  • Greedy choice property: locally optimal choices lead to a globally optimal solution

In DP you would do guessing, unlike greedy where you are greedy and take the best local option.

Lemma 1. If T' is a minimum spanning tree of G/e, then T' ∪ {e} is an MST of G.

Contract edge e - the idea is to merge its two endpoints into one node and solve the smaller problem.

edge_contraction.png

The statement can be used as the basis for a dynamic programming algorithm, in which we guess an edge e that belongs to the MST, contract the edge, and recurse. At the end, we decontract the edge and add e to the MST. The lemma proves the correctness of the algorithm, but it would be exponential: at each step you guess one edge out of all possible edges.

We need an intelligent way to choose the edge e.

Lemma 2 (Greedy-Choice Property for MST). For any cut (S, V \ S) in a graph G = (V, E, w), any least-weight crossing edge e = {u, v} with u ∈ S and v ∈ V \ S is in some MST of G.

This lemma is your golden ticket to use a greedy algorithm.

Prim's algorithm

It is Dijkstra like.

The idea is to start with one vertex s; the initial cut is {s} versus the rest.

```python
# Prim's algorithm sketch on 2D points with Manhattan distances
# (LeetCode 1584 "Min Cost to Connect All Points" style); `points` is the input list.
from heapq import heappop, heappush
def dist(a,b):
    return abs(a[0]-b[0]) + abs(a[1]-b[1])
h = [(0,0)] # dist,s
edges = {s:float('inf') for s in range(len(points))}
edges[0] = 0
parent = {}
visited = set() # keeps track of your cut, in Dijkstra you do not need that
while h:
    d,s = heappop(h)
    if s in visited: continue  # skip stale heap entries
    visited.add(s)
    for u in range(len(points)):
        if u not in visited and dist(points[u],points[s]) < edges[u]:
            edges[u] = dist(points[u],points[s])
            parent[u] = s
            heappush(h,(edges[u],u))
# edges has the weights of the MST tree
```

run_time_prim.png

Runtime just like Dijkstra.

Kruskal's Algorithm

Kruskal constructs an MST by taking the globally lowest-weight edge and contracting it.

sort the edges by nondecreasing weight
for edge in edges:
     add the edge if its endpoints are in different DSU components (keep it a forest)
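
A compact sketch with a union-find (the example edge list is illustrative):

```python
# Kruskal's algorithm with a union-find (DSU).  edges: list of (weight, u, v).
def kruskal(n, edges):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):           # nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # keep the forest acyclic
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return total, mst

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))   # (7, [(0, 1, 1), (1, 2, 2), (2, 3, 4)])
```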

runtime_kruskal.png

Lecture 13: Incremental Improvement. Max Flow, Min Cut

All about Ford-Fulkerson max-flow algorithm and Max Flow - Min Cut Theorem.

Flow network

Definition. A flow network is a directed graph G = (V, E) with two distinguished vertices: a source s and a sink t. Each edge (u, v) ∈ E has a nonnegative capacity c(u, v). If (u, v) ∉ E, then c(u, v) = 0.

flow_network.png

Maximum-flow problem. Given a flow network G, find a flow of maximum value on G = the max flow rate you can send from the source to the sink.

Net Flow

A net flow on G is a function f : V × V → R satisfying:

  • capacity constraint: for all u, v ∈ V: f(u, v) ≤ c(u, v)
  • skew symmetry: for all u, v ∈ V: f(u, v) = -f(v, u)
  • flow conservation: for all u ∈ V - {s, t}: sum_v f(u, v) = 0

Definition. The value of a flow f, denoted |f|, is given by |f| = sum_{v ∈ V} f(s, v) = f(s, V).

We use implicit summation notation. Example for writing flow conservation:

f(u, V) = 0 for all u ∈ V - {s, t}.

Theorem. The value of a flow satisfies |f| = f(V, t): what goes out of the source equals what enters the sink. Proof.

|f| = f(s, V) = f(V, V) - f(V - s, V) = 0 + f(V, V - s) = f(V, t) + f(V, V - s - t) = f(V, t) (the second step uses f(V, V) = 0 and skew symmetry; the last step uses the conservation law).

Cuts

Definition. A cut (S, T) of a flow network G = (V, E) is a partition of V such that s ∈ S and t ∈ T. If f is a flow on G, then the flow across the cut is f(S, T).

cut.png

Lemma. For any flow f and any cut (S, T) we have |f| = f(S, T). Proof: f(S, T) = f(S, V) - f(S, S) = f(S, V) = f(s, V) + f(S - s, V) = f(s, V) = |f|.

Whatever flow you have, its value equals the flow across any cut, as long as the source is on one side of the cut and the sink on the other.

Note that f(S - s, V) = 0 because S does not contain t, so flow conservation applies to every vertex of S - s.

Definition. The capacity of a cut (S, T) is c(S, T).

capacity_cut.png

Upper bound on the maximum flow value: Theorem. The value of any flow is bounded above by the capacity of any cut.

The residual network shows you where in the network you have free capacity to push flow through. It has the same vertices as the original graph but different edges.

residual_network.png

Last two lines above say that your residual network might introduce extra edges which are not in the original network. Residual networks depend on the flow f.

residual_network_example.png

Definition. Any path from s to t in G_f is an augmenting path in G with respect to f. If an augmenting path exists, your flow f is not a maximum flow. The flow value can be increased along an augmenting path p by c_f(p) = min{c_f(u, v) : (u, v) ∈ p}.

The augmenting path tells you which edges of the original graph G (given your flow f) to change, and by how much, along that path.

augmenting_path.png

Lecture 14: Incremental Improvement. Matching

Max-flow, min-cut theorem

Theorem. The following are equivalent:

  1. |f| = c(S, T) for some cut (S, T).
  2. f is a maximum flow.
  3. f admits no augmenting paths.

Ford-Fulkerson max-flow algo.

initialize f(u,v) = 0 for all u,v in V
while an augmenting path p in G wrt f exists:
  do augment f by c_f(p)

To prove correctness of Ford-Fulkerson we need to prove that 3 implies 2.

We will prove the theorem by proving 3 implies 1, 1 implies 2, and 2 implies 3.

1 implies 2. |f| ≤ c(S, T) for any cut. The assumption that |f| = c(S, T) shows f is a maximum flow, as it cannot be increased further.

2 implies 3. If there was an augmenting path then the flow value could be increased, hence contradicts that ff is a maximum flow.

3 implies 1: proof

Ford-Fulkerson depends a lot on which augmenting paths you choose each time. Depending on the order of your edges, the DFS would choose different augmenting paths. Some of them have tiny residual capacities, so each augmentation adds only a very small amount of flow.

If you do BFS augmenting-path search (treating each edge as weight 1), the number of augmentations is proven to be O(VE), hence the final running time of the algorithm is O(VE(V + E)).
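
A self-contained sketch of this BFS-based variant (Edmonds-Karp); the small graph at the bottom is illustrative.

```python
from collections import defaultdict, deque

# Edmonds-Karp: Ford-Fulkerson with BFS augmenting paths.
def max_flow(cap, s, t):                       # cap[u][v] = capacity of edge (u, v)
    flow = 0
    residual = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            residual[u][v] += c

    while True:
        # BFS in the residual network for an s -> t augmenting path.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                    # no augmenting path: flow is maximum
            return flow
        # Bottleneck residual capacity along the path, then augment.
        bottleneck, v = float('inf'), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck       # reverse residual edge
            v = u
        flow += bottleneck

cap = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 3}}
print(max_flow(cap, 's', 't'))   # 5
```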

Baseball elimination

A team survives if it has the largest number of wins. Given a table of standings we want to know which teams still have a chance of surviving.

Check the notes for this example.

Lecture 15: Linear Programming. LP, reductions, Simplex.

You can formulate the max flow problem as an LP problem. LP is more general than Max Flow.

Linear programming (LP) is a method to achieve the optimum outcome under some requirements represented by linear relationships.

LP is polynomial-time solvable. Integer LP is NP-hard (that is, when we add the extra constraint that the variables x are integers).

In general, the standard form of LP consists of

• Variables: x = (x_1, x_2, ..., x_d)^T

• Objective function: c · x

• Inequalities (constraints): Ax ≤ b, where A is an n × d matrix

and we maximize the objective function subject to the constraints and x ≥ 0.
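
A tiny example of the standard form using scipy.optimize.linprog (which minimizes, so we negate c to maximize; the numbers are an arbitrary toy instance):

```python
import numpy as np
from scipy.optimize import linprog

# Standard form: maximize c.x subject to Ax <= b, x >= 0.
c = np.array([3, 2])                  # maximize 3*x1 + 2*x2
A = np.array([[1, 1],                 # x1 + x2 <= 4
              [1, 0]])                # x1      <= 2
b = np.array([4, 2])

res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                # optimum at x = (2, 2), value 10
```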

The natural LP formulation of a problem may not result in the standard LP form. Do these transformations:

  • Minimize an objective function: Negate the coefficients and maximize.
  • Variable x_j does not have a non-negativity constraint: replace x_j with x'_j − x''_j, where x'_j, x''_j ≥ 0.
  • Equality constraints: split into two constraints; x = b becomes x ≤ b, x ≥ b.
  • Greater than or equal to constraints: Negate the coefficients, and translate to less than or equal to constraint.

Linear Programming Duality

Gives you a certificate of optimality. If I get a solution of the LP problem, I can get a certificate that it is the optimal one (only if indeed it is the optimal one).

For every primal LP problem in the form of:

Maximize c · x
Subject to Ax ≤ b, x ≥ 0,

there exists an equivalent dual LP problem

Minimize b · y
Subject to A^T y ≥ c, y ≥ 0.

The max-flow min-cut theorem can be proven by formulating the max-flow problem as the primal LP problem.

Maximum Flow. Given G = (V, E), the capacity c(e) for each e ∈ E, the source s, and the sink t:

Maximize sum_v f(s, v)

Subject to f(u, v) = −f(v, u) ∀ u, v ∈ V (skew symmetry)

sum_v f(u, v) = 0 ∀ u ∈ V − {s, t} (conservation)

f(u, v) ≤ c(u, v) ∀ u, v ∈ V (capacity).

LP could be used for multi-commodity problems.

Shortest Paths

Given G = (V, E), weights w(e) for each e ∈ E, and the source s, we want to find the shortest paths from s to all v ∈ V, denoted d(v).

Maximize sum_{v ∈ V} d(v)

Subject to d(v) − d(u) ≤ w(u, v) ∀ (u, v) ∈ E - the triangle inequality

d(s) = 0.

Note the maximization above, so that the distances don't all end up being zero; the inequality constraints already push each d(v) down towards the minimum.

LP Algorithms

  1. Simplex algorithm
  2. Ellipsoid algorithm
  3. Interior Point Method

Simplex. The simplex algorithm works well in practice, but runs in exponential time in the worst case. Steps:

• Represent LP in “slack” form.

• Convert one slack form into an equivalent slack form, while likely increasing the value of the objective function, and ensuring that the value does not decrease.

• Repeat until the optimal solution becomes apparent.

Slackness is a measure of how tight our constraints are. See the lecture notes for an example run of the algorithm.

At each step of the simplex algo, you increase the objective value, while maintaining correctness of constraints.

In general, the simplex algorithm is guaranteed to converge in at most (n+m choose m) iterations, where n is the number of variables and n + m is the number of constraints.

Lecture 16. Complexity: P, NP, NP-completeness, Reductions

P

P = {problems solvable in polynomial time, i.e. O(n^{O(1)})}

NP

NP = {decision problems (answer is yes or no) solvable in nondeterministic polynomial time}

Nondeterministic refers to the fact that a solution can be guessed out of polynomially many options in O(1) time. If any guess is a YES instance, then the nondeterministic algorithm will make that guess. NP is biased towards YES.

P vs NP

Does being able to quickly recognize if a solution is correct, means that you can quickly solve the problem?

If yes, then P = NP.

Sudoku is NP-complete when generalized to an n × n grid.

youtube

Example NP problem. 3SAT

SAT = satisfiability

SAT is the satisfiability problem: you have a Boolean expression written using only AND, OR, NOT, variables, and parentheses. The SAT problem is: given the expression, is there some assignment of TRUE and FALSE values to the variables that makes the entire expression true?

3SAT is a special case of SAT where the Boolean expression is split into clauses such that every clause consists of three literals.

Given a boolean formula of the form (x_1 ∨ x_3 ∨ ¬x_6) ∧ (x_2 ∨ x_3 ∨ x_7) ∧ ..., where ∧ is AND and ∨ is OR: can you set the variables x_1, x_2, ... so that the boolean formula evaluates to True (is satisfiable)?

This is an NP problem because:

  • guess x_1 = T or F
  • guess x_2 = T or F

If the answer to the SAT3 problem is YES, you can start guessing and then you can check in polynomial time (polynomial verification) if the answer from the guesses is a YES.
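
A sketch of such a polynomial-time verifier, with the certificate being an assignment (encoding conventions are illustrative):

```python
# A 3SAT certificate is an assignment; verifying it is a linear-time scan.
# A clause is a list of literals; literal +i means x_i, -i means NOT x_i.
def verify_3sat(clauses, assignment):          # assignment: dict var -> bool
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 OR x3 OR NOT x6) AND (x2 OR x3 OR x7)
clauses = [[1, 3, -6], [2, 3, 7]]
assignment = {1: True, 2: False, 3: False, 6: True, 7: True}
print(verify_3sat(clauses, assignment))        # True
```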

NP problems allow you to check the answers in polynomial time.

NP = {decision problems with poly-size certificates and poly-time verifiers for YES outputs}

NP-complete

X is NP-complete if X ∈ NP ∩ NP-hard.

X is NP-hard if every problem Y ∈ NP reduces to X; that is, X is at least as hard as every NP problem.

p_np_line.png

How to prove X is NP-complete.

  1. Show X ∈ NP (come up with a polynomial-time verification).
  2. Show X is NP-hard by reducing a known NP-complete problem Y to X.

Super Mario Brothers

We show that Super Mario Brothers is NP-hard by giving a reduction from 3SAT.

3-Dimensional Matching (3DM)

Definition 3. 3DM: Given disjoint sets X, Y, and Z, each of n elements, and triples T ⊆ X × Y × Z, is there a subset S ⊆ T such that each element of X ∪ Y ∪ Z is in exactly one triple s ∈ S?

3DM is NP. Given a certificate, which lists a candidate list of triples, a verifier can check that each triple belongs to T and every element of X ∪ Y ∪ Z is in one triple.

3DM is also NP-complete, via a reduction from 3SAT.

Subset Sum

Definition 4. Subset Sum: Given n integers A = {a_1, a_2, ..., a_n} and a target sum t, is there a subset S ⊆ A such that

sum(S) = t?

The lecture proves it NP-complete via a reduction from 3DM, and notes that it is only weakly NP-hard.

Partition

Definition 5. Partition: Given n integers A = {a_1, a_2, ..., a_n}, is there a subset S ⊆ A such that

sum(S) = sum(A)/2?

Reduce Subset Sum to Partition (the harder direction) to prove that Partition is NP-complete.

Lecture 17. Complexity: Approximation Algorithms

Consider optimization problems. Instead of finding the right answer we approximate the optimal answer.

Definition. An algorithm for a problem of size n has an approximation ratio ρ(n) if, for any input, the algorithm produces a solution with cost C such that max(C/C_opt, C_opt/C) ≤ ρ(n), where C_opt is the cost of the optimal solution.

We take the max because the optimization problem can be maximization or minimization.

This says we are within a factor of ρ(n) of an optimal solution.

Definition. An approximation scheme takes as input ε > 0 and produces a solution such that C = (1 + ε)·C_opt for any fixed ε; it is a (1 + ε)-approximation algorithm.

A PTAS (polynomial-time approximation scheme) runs in time polynomial in n for each fixed ε, e.g. O(n^{2/ε}).

An FPTAS (fully polynomial-time approximation scheme) runs in time polynomial in both n and 1/ε, e.g. O(n/ε).

Vertex Cover

For an undirected graph G, find a smallest subset of vertices such that all edges are covered. An edge is covered if one of its endpoints is in the chosen subset of vertices. (NP-complete)

Heuristics:

  • pick maximum degree vertex
  • pick a random edge (u, v), then remove all edges incident to u or v

vertex_cover_max_degree.png

  • Pick a random edge (u, v), take both endpoints, then remove all edges incident to u or v: this is a factor of 2 from the optimal solution. All edges we pick are pairwise disjoint (they form a matching), so if we end up picking A edges, any vertex cover needs at least A vertices, while we output 2A vertices. A sketch follows below.
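
A minimal sketch (the edge list is illustrative):

```python
# 2-approximation for Vertex Cover: repeatedly pick an uncovered edge and take
# BOTH endpoints.  The picked edges form a matching, so OPT >= #picked edges,
# while we output 2 * #picked vertices.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge still uncovered
            cover.add(u)
            cover.add(v)
    return cover

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]
print(vertex_cover_2approx(edges))   # e.g. {1, 2, 3, 4}; an optimum here is {2, 4}
```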

Set Cover set_cover.png

Heuristic: pick subset which covers the most uncovered elements.

Algo:

Start by initializing the set P to the empty set. While there are still elements in X, pick the largest remaining set S_i and add i to P; then remove all elements of S_i from X and from all other subsets S_j. Repeat until there are no more elements in X.

Claim: This is an (ln(n) + 1)-approximation algorithm (where n = |X|).

Partition

Given a set of elements, split it into two subsets and minimize the max of the sums of the two subsets. This is an NP-Complete problem.

Partition problem

Multiway partition problem

Approximation algo.

Phase 1. For a smaller subset of m < n elements, find the optimal solution by brute force in O(2^m). You get two subsets A' and B'.

Phase 2. For the rest of the elements, add them greedily one by one.

If m = 1/ε, then this is a (1 + ε)-approximation algorithm.

Lecture 18. Complexity: Fixed-Parameter Algorithms

Last 3 lectures:

Pick any two:

  1. solve hard problems
  2. solve them fast (poly-time)
  3. exact solution

Idea: Aim for an exact algorithm, but confine the exponential dependence to a parameter.

Parameter: A parameter is a nonnegative integer k(x), where x is the problem input. The parameter is a measure of how hard the given instance is.

Parameterized Problem: A parameterized problem is simply the problem plus the parameter or the problem as seen with respect to the parameter.

Goal: an algorithm that is polynomial in the problem size n and exponential only in the parameter k.

k-Vertex Cover Problem. Given a graph G = (V, E) and a non-negative integer k: is there a vertex cover S ⊆ V of size at most k?

This is a decision problem for Vertex Cover and is also NP-hard.

The obvious choice of parameter is k (the natural parameter).

Brute force

Try all (n choose k) subsets of k vertices and test each for coverage. The running time is O(E·V^k).

The exponent depends on k, so this is slow: I cannot say that for any fixed k the algorithm is, say, quadratic.

Fixed Parameter Tractability

A parameterized problem is fixed-parameter tractable (FPT) if there is an algorithm with running time ≤ f(k) · n^{O(1)}, where f : N → N (non-negative), k is the parameter, and the O(1) degree of the polynomial is independent of k and n.

Question: Why f(k) · n^{O(1)} and not f(k) + n^{O(1)}? Theorem: an f(k) · n^c algorithm exists iff an f'(k) + n^{c'} algorithm exists.

The k-vertex cover problem is FPT (a code sketch follows the analysis below):

  • pick any edge e = (u, v)
  • either u or v (or both) is in the cover S
  • guess each one:
    • add u to S, delete u and all its incident edges from G, recurse with k' = k - 1
    • do the same with v instead of u
    • return the OR of the two outcomes

Recursion tree:

                (n,k)
      (n-1,k-1)       (n-1,k-1)
(n-2,k-2) (n-2,k-2) (n-2,k-2) (n-2,k-2)

Base case: when k = 0, return true if there are no edges left, else false. At each node of the recursion we do linear work to delete incident edges. The total runtime is O(2^k (|V| + |E|)).
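
A minimal sketch of this bounded search tree (edges given as a list of pairs; names are illustrative):

```python
# Bounded search tree for k-Vertex Cover: O(2^k * (V + E)).
def has_vertex_cover(edges, k):
    if not edges:
        return True                               # nothing left to cover
    if k == 0:
        return False                              # edges remain but no budget
    u, v = edges[0]                               # pick any remaining edge
    # either u or v must be in the cover; branch on both choices
    without_u = [(a, b) for a, b in edges if u not in (a, b)]
    without_v = [(a, b) for a, b in edges if v not in (a, b)]
    return has_vertex_cover(without_u, k - 1) or has_vertex_cover(without_v, k - 1)

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]
print(has_vertex_cover(edges, 2))   # True  ({2, 4} is a cover)
print(has_vertex_cover(edges, 1))   # False
```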

Kernelization is a polynomial-time algorithm that converts an input (x, k) into a small, equivalent input (x', k'). Equivalent means the final yes/no answer is the same. We want |x'| ≤ f(k). The algorithm you then run on the smaller input no longer depends on n; its cost is a function of f(k) only, hence the running time is O(kernelization) + O(run on the smaller input) = O(n^c) + (a function of f(k) only).

Theorem. A problem is FPT iff there exists a kernelization.

Kernelize -> FPT is trivial. kernalization_proof.png

kernel_p1.png kernel_p2.png

Lecture 19. Synchronous Distributed Algorithms: Symmetry-Breaking. Shortest-Paths Spanning Trees

What are Distributed Algorithms?

Algorithms that run on networked processors, or on multiprocessors that share memory.

Most computing is on distributed systems.

Difficulties:

  • concurrent activities
  • uncertainty of timing

You need a different framework for thinking about problems here. You cannot just solve graph problems the usual way, e.g. by storing everything in an adjacency list or matrix. All nodes are alike and are only allowed to send messages; you solve problems by sending messages across the network.

We consider two distributed computing models:

• Synchronous distributed algorithms:
  • Leader Election
  • Maximal Independent Set
  • Breadth-First Spanning Trees
  • Shortest Paths Trees

• Asynchronous distributed algorithms:
  • Breadth-First Spanning Trees
  • Shortest Paths Trees

Distributed Network

Based on an undirected graph G = (V, E), associate:

  • a process with each graph vertex
  • two directed communication channels with each edge

Processes at nodes communicating using messages.

Each process has output ports, input ports that connect to communication channels.

Select a leader node in a distributed system.

Theorem 2: Let G = (V, E) be an n-vertex clique. Then there is an algorithm consisting of deterministic processes with UIDs that is guaranteed to elect a leader in G. The algorithm takes only 1 round and uses only n^2 point-to-point messages.

• Algorithm:

  • Everyone sends its UID on all its output ports, and collects UIDs received on all its input ports.
  • The process with the maximum UID elects itself the leader.

Maximal Independent Set

Problem: Select a subset SS of the nodes, so that they form a Maximal Independent Set.

  • Independent: No two neighbors are both in the set.
  • Maximal: We can’t add any more nodes without violating independence.

The set need not be of globally maximum size; you just cannot add any more nodes. A graph can have more than one MIS.

Distributed MIS?

You have a graph representing the distributed system. Finding an MIS in a distributed system is not the same as gathering all nodes and edges and running a centralized algorithm: here each node must learn whether it is in the MIS or not by exchanging messages.

Assume:

  • No UIDs
  • Processes know a good upper bound on n.

Require:

  • Compute an MIS S of the entire network graph.
  • Each process in S should output "in"; the others output "out".

Luby’s MIS Algorithm

• Executes in 2-round phases.

• Initially all nodes are active.

• At each phase, some active nodes decide to be in, others decide to be out, algorithm continues to the next phase with a smaller graph.

• Repeat until all nodes have decided.

• Behavior of an active node u at phase ph:

• Round 1:

  • Choose a random value r in {1, 2, ..., n^5}, and send it to all neighbors.
  • Receive the values from all active neighbors.
  • If r is strictly greater than all received values, then join the MIS and output "in".

• Round 2:
  • If you joined the MIS, announce it in messages to all (active) neighbors.
  • If you receive such an announcement, decide not to join the MIS and output "out".
  • If you decided one way or the other at this phase, become inactive.

Termination

• With probability 1, Luby’s MIS algorithm eventually terminates.

Theorem 7: With probability at least 1 - 1/n, all nodes decide within 4·log n phases.

• The proof uses a lemma similar to before. Lemma 8: With probability at least 1 - 1/n^2, in each of the phases 1, ..., 4·log n, all nodes choose different random values.

• So we can essentially pretend that, in each phase, all the random numbers chosen are different.

• Key idea: Show the graph gets sufficiently “smaller” in each phase.

• Lemma 9: For each phase ph, the expected number of edges that are live (connect two active nodes) at the end of the phase is at most half the number that were live at the beginning of the phase.

Formal proof with probability bounds and expectation in slides. Animation of Luby algo in slides.

Breadth-First Spanning Trees

That's the tree you get from BFS.

• New problem, new setting.

• Assume graph G = (V, E) is connected.

• V includes a distinguished vertex v_0, which will be the origin (root) of the BFS tree.

• Generally, processes have no knowledge about the graph.

• Processes have UIDs.

  • Each process knows its own UID.
  • i_0 is the UID of the root v_0.
  • The process with UID i_0 knows it is located at the root.

• We may assume (WLOG) that processes know the UIDs of their neighbors, and know which input and output ports are connected to each neighbor.

• Algorithms will be deterministic (or nondeterministic), but not randomized.

Output: Each process i ≠ i_0 should output parent(j), meaning that j's vertex is the parent of i's vertex in the BFS tree.

The strategy is very similar to standard BFS; you just need to add the sending and receiving of messages.

Termination

• Q: How can processes learn when the BFS tree is completed?

• If they knew an upper bound on diam, then they could simply wait until that number of rounds have passed.

• Q: What if they don’t know anything about the graph?

When a subtree finishes, it propagates upwards that it is done; this starts from the leaves and moves up. So you need to send information up the tree.

Lecture 20. Asynchronous Distributed Algorithms: Shortest-Paths Spanning Trees

Synchronous Network Model

  • Processes at graph vertices, communicate using messages.
  • Each process has output ports, input ports that connect to communication channels.
  • Algorithm executes in synchronous rounds.
  • In each round:
    • Each process sends messages on its ports.
    • Each message gets put into the channel, delivered to the process at the other end.
    • Each process computes a new state based on the arriving messages.

The way of thinking in distributed systems is fundamentally different. There is a strong limitation: every node in the system only knows about itself and its neighbourhood; it is not aware of the whole system.

Each node of the graph runs a process which is not aware of what the graph looks like. Nodes are connected through channels.

Asynchronous Network Model

  • Complications so far:

    • Processes act concurrently.
    • A little nondeterminism.
  • Now things get much worse:

    • No rounds---process steps and message deliveries happen at arbitrary times, in arbitrary orders.
    • Processes get out of synch.
    • Much more nondeterminism.

Asynchronous networks complexity:

  • Message complexity is the number of messages sent by all processes during the entire execution.

  • Time complexity. Cannot measure rounds like in synchronous networks.

  • A common approach:

  • Assume real-time upper bounds on the time to perform basic steps:

    • d for a channel to deliver the next message, and
    • l for a process to perform its next step (local processing time).

Infer a real-time upper bound for solving the overall problem.

Now we can think of a big machine (a queue) where each process sits somewhere in the queue. As the processes execute, we enqueue and dequeue them from this queue.

BFS in asynchronous model

If you simply run BFS as in the synchronous model, you might end up with a tree which is not a BFS tree and does not have the shortest paths:

bfs_asynchronous.png

Message complexity:

Number of messages sent by all processes during the entire execution: O(E).

Time complexity:

Time until all processes have chosen their parents (neglecting local processing time): O(diam · d).

  • Q: Why diam, when some of the tree paths are longer? Because a long path can only arise if the messages along it travel faster than those along the shortest path; in any case, every process hears its first message within diam · d time.

To fix this (i.e., to actually get a BFS tree) you need a relaxation algorithm, in the style of synchronous Bellman-Ford.

Strategy:

  • Each process keeps track of the hop distance, changes its parent when it learns of a shorter path, and propagates the improved distances.
  • Eventually stabilizes to a breadth-first spanning tree.
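A minimal sketch of this relaxation strategy, simulating arbitrary message-delivery order with a random choice (the representation and names are assumptions, not the lecture's code):

import random

def async_bfs_relaxation(adj, root):
    """Deliver "distance" messages in an arbitrary order; a process re-adopts
    a parent whenever it hears of a strictly smaller hop distance, and
    re-sends its improved distance to its neighbors."""
    dist = {root: 0}
    parent = {root: None}
    channel = [(root, v, 0) for v in adj[root]]        # (sender, receiver, sender's dist)
    while channel:
        i = random.randrange(len(channel))             # arbitrary delivery order
        u, v, d = channel.pop(i)
        if d + 1 < dist.get(v, float("inf")):          # relaxation step
            dist[v] = d + 1
            parent[v] = u
            channel.extend((v, w, d + 1) for w in adj[v])  # propagate the correction
    return parent, dist

This is the same relaxation idea as Bellman-Ford, here with unit (hop-count) edge weights.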

Shortest Paths

  • Use a relaxation algorithm, once again. (same as synchronous but would be slower)
  • Asynchronous Bellman-Ford.
  • Now, it handles two kinds of corrections:
    • Because of long, small-weight paths (as in synchronous Bellman-Ford).
    • Because of asynchrony (as in asynchronous Breadth-First search).

• The combination leads to surprisingly high message and time complexity, much worse than either type of correction alone (exponential).

Lecture 21. Cryptography: Hash Functions

A hash function h maps arbitrary strings of data to a fixed-length output. We want it to look random. In practice, hash functions are used for “digesting” large data.

h: {0,1}^* → {0,1}^d; the star says that the input can be an arbitrarily long string of bits.

No secret keys, all is public. Anyone can compute h(x).

Goal: Make h run fast (polynomial time) and make it hard to find collisions. Simple constructions with mod are O(1) but it is easy to find collisions.

We want to approximate/create a random oracle h with these properties:

  • For input x ∈ {0,1}^*, if h has not seen x before, it outputs a random value h(x); else h(x) returns its previously output value.
  • The random oracle gives a random value for a fresh input (obtained by flipping a coin 256 times, as in SHA-256), and gives deterministic answers to all inputs it has seen before.
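A toy Python model of such a random oracle, assuming d = 256 output bits (purely illustrative; not a real hash function):

import secrets

class RandomOracle:
    """Toy model: a fresh input gets a uniformly random d-bit answer,
    a repeated input gets the same answer as before."""
    def __init__(self, d=256):
        self.d = d
        self.table = {}
    def query(self, x: bytes) -> int:
        if x not in self.table:
            self.table[x] = secrets.randbits(self.d)   # "flip a coin d times"
        return self.table[x]

Real hash functions like SHA-256 try to approximate this behavior with a fixed, public algorithm.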

Desirable Properties

  1. One-way (pre-image resistance): Given y ∈ {0, 1}^d, it is hard to find an x such that h(x) = y.
  2. Strong collision-resistance: It is hard to find any pair of inputs x, x' such that h(x) = h(x').
  3. Weak collision-resistance (target collision resistance, 2nd pre-image resistance): Given x, it is hard to find x' ≠ x such that h(x) = h(x').
  4. Pseudo-randomness: The function behaves indistinguishably from a random oracle.
  5. Non-malleability: Given h(x), it is hard to generate h(f(x)) (without using h) for any function f.

Lecture 22. Cryptography: Encryption

Symmetric key encryption

  • A secret key k, assume it is 128 bits. It is shared between Alice and Bob.

c = e_k(m)

m = d_k(c)

c stands for ciphertext, m is the message (plaintext), e is the encryption function, d is the decryption function.

e and d need to be built from reversible operations (addition, subtraction, permutation, shift left/right).
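As a toy illustration of e and d being inverse operations under a shared key (e(k, m) stands for e_k(m) above), here is a completely insecure XOR-based sketch:

def e(k: int, m: int) -> int:
    """Toy "encryption": XOR the message with the key."""
    return m ^ k

def d(k: int, c: int) -> int:
    """Toy "decryption": XOR again with the same key recovers m."""
    return c ^ k

k = 0x5A5A5A5A                  # shared secret key
m = 0x12345678                  # plaintext
assert d(k, e(k, m)) == m       # e and d are inverses under the shared key k

Real symmetric ciphers chain many such reversible rounds; a single XOR with a reused key is not secure.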

Key exchange

How does the secret key k get exchanged/shared?

Puzzle: Alice wants to send Bob a box with money, but it needs to be locked, otherwise the contents would get stolen.

Solution: Alice locks the box and sends it to Bob. Bob adds his own lock and sends it back to Alice. Alice removes her lock and sends the box back to Bob, who can now open it.

Math assumption: commutativity between the locks. Locking a box within a box would not work.

Diffie-Hellman Key Exchange

key_exchange.png

g^a is Alice's locker that hides a, and g^b is Bob's locker that hides b. The goal is for Alice and Bob to share the key k secretly. A middle man Mal might insert their own locker g^c, thus we need asymmetric encryption: Alice needs a way to authenticate that she is communicating with Bob. Authentication uses public/private key encryption.
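A sketch of the Diffie-Hellman exchange with tiny textbook parameters (p = 23, g = 5 are illustrative toy values; real deployments use much larger groups):

import random

p, g = 23, 5                       # toy prime modulus and generator

a = random.randrange(2, p - 1)     # Alice's secret
b = random.randrange(2, p - 1)     # Bob's secret

A = pow(g, a, p)                   # g^a mod p: Alice's "locker" hiding a
B = pow(g, b, p)                   # g^b mod p: Bob's "locker" hiding b

k_alice = pow(B, a, p)             # (g^b)^a mod p
k_bob = pow(A, b, p)               # (g^a)^b mod p
assert k_alice == k_bob            # shared secret k = g^(ab) mod p

Mal can still mount the man-in-the-middle attack described above unless the exchange is authenticated.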

Public Key encryption

Bob comes up with a public and a private key. Alice sends Bob the ciphertext.

message + public key = ciphertext

ciphertext + private key = message

The two keys need to be linked in a mathematical way. Knowing the public key should tell you nothing about the private key.

RSA encryption algorithm.

  • Alice picks two large secret primes p and q.
  • Alice computes N = p·q.
  • Alice chooses an encryption exponent e s.t. gcd(e, (p−1)(q−1)) = 1.
  • Alice's public key is the tuple (N, e).
  • The decryption exponent d is obtained (secretly, by Alice) using the Extended Euclidean Algorithm, such that e·d ≡ 1 (mod (p−1)(q−1)).
  • Alice's private key is (d, p, q).

Encryption and Decryption with RSA

  • encryption: c = m^e mod N
  • decryption: m = c^d mod N

To prove it works we need to show that m^{ed} ≡ m (mod N).
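A toy numeric check of the whole RSA pipeline with the classic small primes p = 61, q = 53 (illustrative only; pow(e, -1, phi) needs Python 3.8+):

# Toy RSA; real keys use primes with hundreds of digits.
p, q = 61, 53
N = p * q                          # 3233
phi = (p - 1) * (q - 1)            # 3120
e = 17                             # gcd(e, phi) = 1
d = pow(e, -1, phi)                # decryption exponent: e*d ≡ 1 (mod phi)

m = 65
c = pow(m, e, N)                   # encryption: c = m^e mod N
assert pow(c, d, N) == m           # decryption recovers m, i.e. m^(ed) ≡ m (mod N)

The final assertion is exactly the statement m^{ed} ≡ m (mod N) from above.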

Hardness of RSA

• Factoring: given N, it is hard to factor it into p and q.

• RSA Problem: given e such that gcd(e, (p − 1)(q − 1)) = 1 and c, find m such that m^e ≡ c (mod N). This is breaking a particular ciphertext c.

The factoring problem (given N = pq, find p and q) is not known to be NP-complete. Weird.

Other NP-complete problems, such as k-colorability and knapsack, have been used in crypto algorithms, but those schemes have all been broken and do not work in practice.

This is because NP-completeness only tells you about the worst case. For cryptography we want problems that are hard on average.

Lecture 23. Cache-Oblivious Algorithms: Medians & Matrices

In all algorithms so far we assumed that accessing different data costs the same.

Modern Memory Hierarchy

memory_hierarchy.png

The closer to the CPU the faster you can transfer data.

Two notion of speed:

  • latency (how quickly you can fetch data)
  • bandwidth (how fat are the pipes between different hierarchies)

Longer latency is due to the longer distance the data needs to travel.

A common technique to mitigate latency is blocking: when fetching a word of data, get the entire block containing it.

Amortize latency over a whole block.

Cost for getting one data point over a block:

latency/block size + 1/bandwidth
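For example (with purely illustrative numbers): if latency is 10 ms, bandwidth is 1 GB/s, and a block is 1 MB, then fetching a block costs 10 ms + 1 ms; amortized over the ~10^6 bytes in the block, each byte effectively costs about 11 ns instead of paying the full 10 ms latency.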

Block should have useful elements. We require the algorithm to use all words in a block (spatial locality) and reuse blocks in cache (temporal locality).

External Memory Model

external_model.png

Assume:

  • cache access is free
  • disk is huge (infinitely large)

This model handles blocks explicitly.

We count number of memory transfers between cache and the disk.

Cache-oblivious Model

The only change from the external memory model is that the algorithm no longer knows B and M. Accessing a word in memory automatically fetches the entire block containing it into the cache, and evicts the least recently used (LRU) block from the cache if the cache is full.

Different computers might have different B and M.

External and Cache-oblivious models give you a way to measure the data transfer complexity of algorithms.

Scanning

total = 0
for i in range(N):      # sequential scan over A[0..N-1]
    total += A[i]

Assume A is stored contiguously in memory. Arrays are stored in contiguous parts of memory, dictionaries are not. This is why dictionary reads and writes are much slower.

The external memory model can align A with a block boundary, so it needs N/B memory transfers.

A cache-oblivious algorithm cannot control alignment (because it does not know B), so it needs N/B + 1 memory transfers. For example, N could be 2 and those two elements could straddle a block boundary, so you would need to access 2 blocks of memory.

Divide and Conquer

Base case:

  • the problem fits in cache: size ≤ M
  • the problem fits in O(1) blocks: MT(O(B)) = O(1)

Median Finding

  1. view the array as partitioned into columns of 5 (each column is O(1) size)
  2. sort each column
  3. recursively find the median x of the column medians
  4. partition the array by x
  5. recurse on one side
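A plain Python sketch of these five steps (the standard median-of-medians selection; it ignores the cache/block bookkeeping, which only matters for the analysis below):

def select(A, k):
    """Return the k-th smallest element of A (0-indexed)."""
    if len(A) <= 5:
        return sorted(A)[k]
    # 1-2. split into columns of 5 and sort each column
    columns = [sorted(A[i:i + 5]) for i in range(0, len(A), 5)]
    # 3. recursively find the median x of the column medians
    medians = [col[len(col) // 2] for col in columns]
    x = select(medians, len(medians) // 2)
    # 4. partition around x (scans: less, equal, greater)
    less = [a for a in A if a < x]
    equal = [a for a in A if a == x]
    greater = [a for a in A if a > x]
    # 5. recurse on the side that contains the k-th element
    if k < len(less):
        return select(less, k)
    if k < len(less) + len(equal):
        return x
    return select(greater, k - len(less) - len(equal))

Calling select(A, len(A) // 2) returns the median; the recurrence below counts only memory transfers, not comparisons.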

We will now analyze the number of memory transfers in each step. Let MT(N) be the total number of memory transfers.

  1. free
  2. a scan, O(N/B + 1)
  3. MT(N/5), plus a pre-processing step that coalesces the N/5 column medians into a consecutive array
  4. 3 parallel scans, O(N/B + 1): read all elements, collect those less than x, collect those greater than x
  5. MT(7N/10)

MT(N) = MT(N/5) + MT(7N/10) + O(N/B + 1)

MT(N) = O(N/B + 1): since 1/5 + 7/10 < 1, the per-level cost shrinks geometrically and the top level dominates (together with the base case MT(O(B)) = O(1)).

Lecture 24. Cache-Oblivious Algorithms: Searching & Sorting

Why LRU block replacement strategy?

LRU_M ≤ 2 · OPT_{M/2}

The subscript is the cache size: LRU gets a cache of size M, while OPT gets a cache of size M/2.

LRU_M is the number of block evictions (i.e., the number of block reads) that LRU incurs.

OPT knows the future and is the optimal eviction policy, minimizing the number of evictions.

Proof in lecture notes.

Final notes on other classes.

  • 6.047 - Computational Biology
  • 6.854 - Advanced Algorithms (natural class to take after this one)
  • 6.850 - Computational Geometry
  • 6.849 - Folding algorithms
  • 6.851 - Advanced Data Structures
  • 6.852 - Distributed Algorithms
  • 6.853 - Algorithmic game theory
  • 6.855 - Network optimization
  • 6.856 - Randomized algorithms
  • 6.857 - Applied cryptography
  • 6.875 - Theoretical cryptography
  • 6.816 - Multicore programming
  • 6.045 - Theory of computation classes (NP complete stuff, complexity)
  • 6.840 - Theory of computation grad class
  • 6.841 - Advanced complexity theory
  • 6.842 - Randomized complexity theory
  • 6.845 - Quantum complexity theory
  • 6.440 - Coding theory