Since graduating Flatiron School in November of last year, I have had the opportunity to take my education in any direction that I choose.
When I built and launched my first web applications, I became obsessed with creating user accounts, authenticating, and assigning cookies. I spent hours unpacking OAuth and how that protocol actually creates a secure environment on the web. From this spark, I have made the decision to learn the basics of cybersecurity.
This new pursuit requires that I wrap my head around Python so that I can fully engage in the world authentication, authorization, and overall security. I’ll share my progress learning security through hack-the-box in a post next week. Today, I want to share a problem that I have been working on to learn Python.
Instead of enrolling in a Udemy course or following a tutorial, I chose to learn Python by solving problems directly. I prefer using https://projecteuler.net/, because it’s challenging and most of the problems cannot be solved using brute-force. I also learned that Python is not so different from Ruby. The syntax is different; but the ideas around iterating, looping, arrays (in Python they’re called lists), and objects are almost identical. Enough talk, here’s the problem I’m most proud of and how I refactored it to be much faster:
This is a brute-force solution that presents a linear pattern as the data set is increased. Using a simple python timer I was able to graph execution time for various data sets.
The puzzle now is to write a solution that doesn’t have this linear growth pattern. I refactored this solution to utilize a dictionary instead of a list, but this saved marginal amounts of processing time. My next attempt at speeding things along was to omit the list entirely. Instead of keeping an entire list, I just kept a counter and reassigned my ‘most recent prime’. This uses way less memory and it did speed things up by five seconds on the largest data set, but I’m not making a huge dent in the execution time. Here’s what that looks like:
I wasn’t satisfied so I did some more digging. I know that prime numbers are not divisible by two, so I changed my incrementor to increment from 1 to 3 to 5, thus avoiding half of the numbers in any data set. This cut down my calculation time by 3 seconds. This was not the improvement that I was looking for.
I decided that I was attacking this problem incorrectly. Iterating through each number; 1 through 10,001 is a necessary evil. I can shorten that a bit by only iterating over odd numbers, but the real meat of this problem is in my is_prime function. Take the number 196,111 for example. Using my current algorithm; I have to iterate 196,111 different times to determine that this number is in fact not divisible by any number between 2 and 196,111. My first refactor cuts the solve time in half by breaking each iteration at the half way point:
While I was digging, I learned about the Sieve of Eratosthenes. I applied the first seven primes to my algorithm and I was able to shave another 30 seconds off of my solve time. It looks clunky, but a 50% reduction in execution time is huge:
To make this algorithm even faster, I would need to continue to expand the sieve as my data set grows. To do so, I would need to create an iterator within an iterator. This raises a whole new set of problems and like all code, this solution is not finished. It works for now and I made significant progress. Stepping away is not submission, just a break for now.
I hope that my struggle for efficient code sheds some light on how much work a few seconds of efficiency can take (for a new engineer). It makes a well oiled web application that much more inspiring.
Happy coding :)