Scholar.py requests blocked!

The most basic version of the Scholar Search React app will have a search field for finding journals, plus a filter field. The search will query the server, which will use scholar.py to return results for the query.

Querying the server will be done via Ajax requests. React does not have any built-in Ajax/networking functionality, since it is a ‘View’ library, so we will need to use another library for this.

The scholar.py search can be run from the project's Python script as a shell command, e.g. os.system('scholar.py --options'). Note that os.system runs the command as if it were typed into the shell but only returns its exit status, so to get the results back into our Python script the output has to be captured, for example with the subprocess module.
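A minimal sketch of how the server could run scholar.py and read what it prints, using the standard subprocess module rather than os.system (which hands back only an exit status). The helper names `scholar_command` and `run_command` are made up for illustration, and the script path is an assumption; the `-c`, `--author` and `--phrase` options are the ones shown in the sample command from the scholar.py repository.

```python
import subprocess
import sys


def scholar_command(author, phrase, count=1, script="scholar.py"):
    """Build the argument list for one scholar.py query.

    `script` is an assumed path; point it at wherever scholar.py lives.
    """
    return [sys.executable, script, "-c", str(count),
            "--author", author, "--phrase", phrase]


def run_command(cmd, timeout=30):
    """Run a command without a shell and return (exit_code, stdout_text)."""
    completed = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # do not hang forever if the request stalls
        check=False,          # let the caller inspect the exit code
    )
    return completed.returncode, completed.stdout
```

Passing the arguments as a list avoids shell quoting problems when the user's search text contains quotes or spaces. Calling `run_command(scholar_command("albert einstein", "quantum theory"))` would then give the exit code and the raw text output to send back to the React front end.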

So basically, after running the scholar script a few times, it no longer seems to be giving out results. I looked at the ‘Issues’ posts on the GitHub page, and as I suspected, it seems that Google blocks the requests after a certain number. It must think the requests are coming from some kind of robot or crawler, so it is blocking them. Apparently a captcha comes up after several requests, which scholar.py cannot get through, so it returns empty results.

As such, it does not seem like a stable enough script to use, unfortunately. Perhaps I will revert to building a YouTube-based app using React.

Scholar Search and ReactJS

Am planning to build a Single Page Application that can search Google Scholar, as well as customize the results.

Scraping Google Scholar will be easy thanks to Christian Kreibich, who has written a Python library (scholar.py) which returns results from Google Scholar based on query variables such as title, author and date. Here is a sample command from scholar.py and the output it returns (excerpt taken from the scholar.py git repository):

$ scholar.py -c 1 --author "albert einstein" --phrase "quantum theory"
         Title On the quantum theory of radiation
           URL http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
          Year 1917
     Citations 184
      Versions 3
    Cluster ID 17749203648027613321
      PDF link http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
Citations list http://scholar.google.com/scholar?cites=17749203648027613321&as_sdt=2005&sciodt=0,5&hl=en
 Versions list http://scholar.google.com/scholar?cluster=17749203648027613321&hl=en&as_sdt=0,5
       Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]
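Since the front end will want structured data rather than this text dump, here is a rough sketch of turning one result in the format above into a dictionary. The function name `parse_scholar_output` is made up, and the list of labels is simply copied from the sample output, so it is an assumption that scholar.py prints no other fields. It handles a single result only (as with `-c 1`).

```python
# Labels as they appear in the sample output above (assumed to be complete).
FIELD_LABELS = [
    "Title", "URL", "Year", "Citations", "Versions", "Cluster ID",
    "PDF link", "Citations list", "Versions list", "Excerpt",
]


def parse_scholar_output(text):
    """Parse one scholar.py text result into a {label: value} dict."""
    # Try longer labels first so "Citations list" is not mistaken
    # for "Citations" followed by the value "list ...".
    labels = sorted(FIELD_LABELS, key=len, reverse=True)
    record = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()  # labels are right-aligned with padding
        for label in labels:
            if line.startswith(label + " "):
                record[label] = line[len(label):].strip()
                break
    return record
```

Values come back as strings, so something like `int(record["Year"])` would be needed before sorting or filtering by year on the server.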

The back end of the application will be built in Flask and used to query Google Scholar. The front end will be built in React.

This is my first application being built with React. To get started I built the skeleton for a Flask and React based application following a guide from Real Python.

Social Login working, databases and admin view setup

So made good progress on the site today.

Had some problems yesterday with the popup for the Facebook and Google social login not closing, and instead reloading the ‘redirect URL’ within the popup window. Realized the popup was not closing for the social logins because I was accessing the site on a httpdomain. Must have been a mistake to put it there, because the site is supposed to load on http:// only. When accessing via http it works fine, and the popup closes. That took me a couple to figure out, because I tried a series of other fixes before working out that this was the issue.

The social login is done via Angular using a library called ‘Satellizer’ by Sahat Yalkabov. It uses JWT tokens for authentication, and can be used to log in through lots of different social sites.

Now that the social login is working, I have also created a model for the videos and connected the add/remove video buttons to the back end, so the database updates for each user. Using the infamous ‘Flask Mega-Tutorial’ to help set up the models and ensure there is a link between the Users and Videos.

Initiated the Flask-Admin view also, so it is easy to see/change the database, which is super important at this early stage of building. Also updated the code so that if you log in via Google or Facebook, as long as you have the same email address in both, it will link the accounts, so you get the same library whichever account you log in with.
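The account linking boils down to looking users up by email instead of by provider. Here is a stripped-down sketch of that idea using an in-memory store in place of the real database models; the `User` and `UserStore` names and fields are made up for illustration and are not the site's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    providers: dict = field(default_factory=dict)  # e.g. {"google": "<id>"}
    videos: list = field(default_factory=list)     # the user's library


class UserStore:
    """Hypothetical stand-in for the users table, keyed by email."""

    def __init__(self):
        self._by_email = {}

    def login(self, provider, provider_id, email):
        """Return the user for this email, creating it on first login."""
        key = email.strip().lower()  # treat Me@x.com and me@x.com alike
        user = self._by_email.get(key)
        if user is None:
            user = User(email=key)
            self._by_email[key] = user
        # Record the provider on the (possibly existing) account.
        user.providers[provider] = provider_id
        return user
```

One thing to keep in mind with this approach is that it trusts the email address each provider reports, so it only makes sense with providers that verify email ownership.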