Got Thickness, Slant, and Width data by scraping website. Put sliders in Google Font App.

To scrape the contents of the website, I first got all the elements with the attribute ‘gf-font-style’.
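
A minimal sketch of that kind of selection, not the original snippet. The function name collectFontStyleValues is hypothetical, and it assumes the root it is given (for example document in the browser console) supports querySelectorAll.

```javascript
// Sketch: collect the raw 'gf-font-style' attribute values under a root node.
// `root` is anything exposing querySelectorAll, e.g. `document` in the console.
// The function name is hypothetical, chosen for illustration only.
function collectFontStyleValues(root) {
  // The attribute selector matches every element that carries the attribute.
  const elements = root.querySelectorAll('[gf-font-style]');
  // Array.from converts the NodeList and maps each element to its attribute value.
  return Array.from(elements, (el) => el.getAttribute('gf-font-style'));
}
```

In the console this would be called as collectFontStyleValues(document).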

The value of the attribute was a string containing the font family, weight, and style information. I parsed the string to extract these as three separate variables, using a helper function for the parsing.
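
As an illustration of the parsing step, here is a sketch that assumes the attribute value looks like a list of CSS declarations, for example font-family: "Open Sans"; font-weight: 400; font-style: italic;. That format is an assumption, as is the name parseFontStyle; the real string may be laid out differently.

```javascript
// Sketch: split an assumed CSS-like string into family, weight and style.
// Assumed input shape: 'font-family: "Open Sans"; font-weight: 400; font-style: italic;'
function parseFontStyle(value) {
  const result = { family: null, weight: null, style: null };
  for (const declaration of value.split(';')) {
    const separator = declaration.indexOf(':');
    if (separator === -1) continue; // skip empty or malformed pieces
    const property = declaration.slice(0, separator).trim().toLowerCase();
    const raw = declaration.slice(separator + 1).trim();
    if (property === 'font-family') {
      // Drop one surrounding quote character on each side, if present.
      result.family = raw.replace(/^['"]|['"]$/g, '');
    } else if (property === 'font-weight') {
      result.weight = raw; // kept as a string, e.g. '400'
    } else if (property === 'font-style') {
      result.style = raw;
    }
  }
  return result;
}
```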

I then built up a set of functions that I could paste into the console to extract the weight, family, and style of every font on the page. I ran them on each page for all the thicknesses, widths, and slants, so all this data is now saved in arrays.

Using this new data, I built sliders in the extension to filter the fonts by thickness, width, and slant.
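
The slider filtering could be sketched roughly as follows. The shape of the scraped data (slider name, then level, then a list of family names), the sample values, and the name filterFamilies are all assumptions made for illustration; the extension's real data layout may differ.

```javascript
// Hypothetical scraped data: slider name -> level -> families seen at that level.
const scraped = {
  thickness: { 1: ['Roboto', 'Lato'], 2: ['Roboto', 'Oswald'] },
  width: { 1: ['Oswald'], 2: ['Roboto', 'Lato'] },
  slant: { 1: ['Roboto', 'Lato', 'Oswald'] },
};

// Keep only the families listed at the selected level of every active slider.
// A slider whose level is null is treated as "no filter".
function filterFamilies(families, data, selection) {
  return families.filter((family) =>
    Object.entries(selection).every(([slider, level]) => {
      if (level === null) return true;
      const allowed = (data[slider] && data[slider][level]) || [];
      return allowed.includes(family);
    })
  );
}
```

With this sample data, selecting thickness 2 and width 2 and leaving slant unset keeps only the families present in both lists.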

Details page fully working, and trying some web scraping

The details page is fully working. The app reloads to the details view if that was the last view before the popup closed. To do this I had to move more state into the applicationState object, since this is the object that is sent to the content page. The showDetails value is now stored in it, so the content page can update the popup accordingly.
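
The idea of restoring the last view can be sketched without the extension messaging itself. Below, createContentPageStore stands in for the content page and initialPopupState for the popup start-up logic; both names are hypothetical, and how the state actually travels between popup and content page is not shown.

```javascript
// Stand-in for the content page: it keeps a copy of the last state the popup sent.
function createContentPageStore() {
  let lastState = null;
  return {
    receive(state) {
      lastState = { ...state }; // shallow copy so later popup changes do not leak in
    },
    latest() {
      return lastState;
    },
  };
}

// When the popup opens, start from the defaults and overlay any stored state.
function initialPopupState(store, defaults) {
  return { ...defaults, ...(store.latest() || {}) };
}
```

Because showDetails lives inside the stored object, a popup that was closed on the details view would start again with showDetails set to true.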

Have hidden the overflow of the detail variants text. Before, it wrapped onto a new line if it couldn't fit. I resolved this by adding the following CSS:

  • text-overflow: ellipsis;
  • white-space: nowrap;
  • overflow: hidden;

The nowrap stops it moving to a new line, and the overflow: hidden hides any text that goes beyond the div. Ideally I want any text that overflows to be truncated with a …, which is what text-overflow: ellipsis is supposed to do, but for some reason that is not working, so I have left it for now. The nowrap was an important enough find and will do for now.

Am now loading the full set of Roboto fonts on load, because they are needed by Material UI. When switching to the detail page, I noticed some text in the apply div was going bolder and causing a flash of unstyled fonts. This only happened in the Roboto detail, so basically when the Roboto variants were downloaded, the buttons and other icons were updating too. So I have preloaded the full set of Roboto weights now, so these load correctly on start. Material UI didn't specify which weights were needed, at least nowhere I could see, so I have just included them all. This makes a much smoother transition between the detail and main views, since the applyDiv, which is shared by both views, no longer reloads, so it looks a lot cleaner.

Cleaned up the code more, and removed functions I am not using. Moved the Google filter data (categories, variants, languages, etc.) to a separate file and am importing it on load, so App.js looks cleaner.

Have pushed this new version to the store.

Will now work on scraping the Google Fonts page to get updated filter sets for weight, width, and slant, and also to get the font pairs.

The remaining thing to do on the UI is to get the ellipsis working on the font overflow.

How the require method loads modules in Node.

Spent a while trying to crawl the Google Fonts page. It is actually not that simple, because the fonts on the page are rendered using JavaScript, so if we download and then crawl the page, the fonts won't be included in the page source.

I tried this initially using requestJs, but this only gets the page source, so it wasn't working. I also tried using a headless browser, PhantomJS, to render the page, but the Google Fonts page blocks headless browsers as it realises they are not real browsers. I also tried a Python scraper based on PyQt4, but the tutorial seemed outdated and I couldn't get it to work.

What also makes it tough is that the fonts only load on scroll, and the Google Fonts page will only ever have up to 20 font tiles in the DOM at any one time. It removes and adds the font tags from the DOM as you scroll. However, the <style> tags for the different fonts remain.
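
One way to cope with tiles being removed on scroll is to accumulate whatever is visible after each scroll step. The sketch below shows only that bookkeeping; createFontCollector is a hypothetical name, and the console usage in the trailing comment assumes the tiles can be found by their ‘gf-font-style’ attribute.

```javascript
// Sketch: remember every value seen so far, even after its tile leaves the DOM.
function createFontCollector() {
  const seen = new Set();
  return {
    // Record the values currently visible; returns how many were new.
    add(values) {
      let added = 0;
      for (const value of values) {
        if (!seen.has(value)) {
          seen.add(value);
          added += 1;
        }
      }
      return added;
    },
    // Everything collected so far, in first-seen order.
    all() {
      return Array.from(seen);
    },
  };
}

// Hypothetical console usage, repeated after each scroll step:
//   const collector = createFontCollector();
//   collector.add(Array.from(document.querySelectorAll('[gf-font-style]'),
//     (el) => el.getAttribute('gf-font-style')));
```

A Set is used so that tiles still visible from the previous scroll position are not counted twice.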

At the end I was testing ways to extract the fonts from the page using the inspect console. Will try more on this tomorrow.