
Automating operations across so many different sites has been a huge learning opportunity for me. I used Python Selenium WebDriver for a project wherein a client needs a program that will log into around 25 different web sites and download a total of 750-1000 different documents. I had a passing familiarity with Selenium at the start of this, but my knowledge was dated. I had used BeautifulSoup more recently, but not… recently. So, this was very slow going at first, but now that I’m over the hump of getting reacquainted with it, and learning quite a number of new things about it, I thought I’d post the notes here for anyone else who might find them useful (including future me, because my memory is terrible).

Please note that in this project, the point of the work is automating sites with no API in order to download data and documents. I was not constrained to emulating the behaviors that actual end users are expected to take, as web QA automation often demands. If I wanted to do something really hacky that a real person would never do on the site, I was free to do that so long as I got the data as a result.

This might be the last thing you need to hear when you’re trying to make forward progress fast, but I found that this kind of automation scripting seems to demand a different structure from, say, writing backend queue workers that update data or cache systems, or writing API endpoints that perform simple CRUD operations. Because a Selenium program controls a browser that is meant to be used by a human, the flow can feel pretty strange when you’re used to the usual back end coding constructs. I wound up in a slightly foreign-looking structure through trial and error. The general pattern I’ve found success with looks like this:

go_from_listing_to_detail_page(driver, item)
go_from_invoice_page_to_listing_page(driver)

So, at a high level, there is a function for each possible navigation path, unless there are static URLs. If there are static URLs, you don’t need the ‘go_from_x_to_y’ functions; they can just be ‘go_to_y’ functions instead, which is great if it’s feasible.

In my experience, it is NOT always feasible.
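To make the navigation-function pattern described above a little more concrete, here is a minimal sketch, not the actual project code. The URL, the link text, and the CSS selector are made-up placeholders, and `RecordingDriver` is a hypothetical stand-in that only logs calls so the sketch runs without a browser. The functions rely only on `driver.get(url)` and `driver.find_element(by, value)`, which a real Selenium WebDriver also provides.

```python
# Locator strategy strings; these are the same values as selenium's
# By.LINK_TEXT and By.CSS_SELECTOR constants.
LINK_TEXT = "link text"
CSS_SELECTOR = "css selector"

# Hypothetical static URL, for illustration only.
LISTING_URL = "https://example.com/invoices"


def go_to_listing_page(driver):
    """Static URL available: jump straight there, no 'from' page needed."""
    driver.get(LISTING_URL)


def go_from_listing_to_detail_page(driver, item):
    """No static URL: click through from the listing page.

    Assumes (hypothetically) that each item is a link whose text is `item`.
    """
    driver.find_element(LINK_TEXT, item).click()


def go_from_invoice_page_to_listing_page(driver):
    """No static URL: follow the page's own 'back to list' link.

    The selector is a placeholder; a real site needs its own locator.
    """
    driver.find_element(CSS_SELECTOR, "a.back-to-list").click()


class RecordingDriver:
    """Hypothetical stand-in for a WebDriver that just records calls."""

    def __init__(self):
        self.actions = []

    def get(self, url):
        self.actions.append(("get", url))

    def find_element(self, by, value):
        self.actions.append(("find_element", by, value))
        return self  # lets .click() be chained, like a found element

    def click(self):
        self.actions.append(("click",))


# Dry run of one navigation sequence.
driver = RecordingDriver()
go_to_listing_page(driver)
go_from_listing_to_detail_page(driver, "Invoice 1001")
go_from_invoice_page_to_listing_page(driver)
```

With a real browser, the same functions should work unchanged if `driver` is, for example, a `selenium.webdriver.Chrome()` instance, since they only use `get` and `find_element`. Keeping one function per navigation path means a site redesign tends to touch a single locator in a single function.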
