21 Oct

Scraping Lawyers and Accountants Database

Posted By admin

Project Title: Scraping Lawyers and Accountants Database

Project Description
Lawyers: All possible details of the office and the lawyer need to be collected.
https://zoekeenadvocaat.advocatenorde.nl/zoeken?type=kantoren&weergave=lijst&sortering=naam&q=&locatie%5Badres%5D=&locatie%5Bgeo%5D=false&locatie%5Bhash%5D=&pagina=1

This is a complete paginated list of offices; from each office you can click through to the lawyers in that firm.
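As a rough sketch of how the paginated list could be walked, the snippet below rewrites only the `pagina` query parameter of the listing URL above and leaves everything else untouched. The helper name `page_url` is ours, and we assume (without having verified it) that the site simply accepts increasing `pagina` values until the list runs out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Listing URL taken from the project description.
LISTING_URL = (
    "https://zoekeenadvocaat.advocatenorde.nl/zoeken?type=kantoren&weergave=lijst"
    "&sortering=naam&q=&locatie%5Badres%5D=&locatie%5Bgeo%5D=false"
    "&locatie%5Bhash%5D=&pagina=1"
)


def page_url(base_url: str, page: int) -> str:
    """Return base_url with its 'pagina' parameter set to the given page."""
    parts = urlsplit(base_url)
    # keep_blank_values=True preserves empty parameters such as 'q='.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query = [(k, str(page)) if k == "pagina" else (k, v) for k, v in query]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page_url(LISTING_URL, page))
```

How many pages exist, and how the last page is signalled, would have to be checked against the live site.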

From this you can create two lists: one for the offices and one for the lawyers.

The URL for each office has a unique ID at the end. Let’s save this ID with each office and use it in each lawyer record to link them.

https://zoekeenadvocaat.advocatenorde.nl/kantoren/rotterdam/de-jonge-advocaten/28370304648
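One possible way to pull that ID out of a URL like the one above is to take the last path segment. This is only a sketch: the function name `trailing_id` is ours, and the check that the ID is purely numeric is an assumption based on this single example.

```python
from urllib.parse import urlsplit


def trailing_id(url: str) -> str:
    """Return the last path segment of url, assumed to be a numeric ID."""
    path = urlsplit(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: IDs consist of digits only, as in the example URL.
    if not last_segment.isdigit():
        raise ValueError(f"no numeric ID at the end of URL: {url}")
    return last_segment


if __name__ == "__main__":
    office_url = (
        "https://zoekeenadvocaat.advocatenorde.nl/kantoren/rotterdam/"
        "de-jonge-advocaten/28370304648"
    )
    print(trailing_id(office_url))
```

The ID is kept as a string so that any leading zeros survive.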

Each lawyer has specialised areas (called ‘rechtsgebieden’ in Dutch). Those areas are very important.

There is also a unique ID in the URL per lawyer. This should also be saved.
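To illustrate the two lists and the link between them, here is a minimal sketch of what the records might look like. All field names (`office_id`, `lawyer_id`, `practice_areas`, and so on) are our own placeholders, not names used by the site, and the sample values in the usage part are obviously made up.

```python
from dataclasses import dataclass, field


@dataclass
class Office:
    office_id: str  # unique ID from the end of the office URL
    name: str
    url: str


@dataclass
class Lawyer:
    lawyer_id: str  # unique ID from the lawyer URL
    name: str
    office_id: str  # links the lawyer back to an Office record
    practice_areas: list[str] = field(default_factory=list)  # 'rechtsgebieden'


def lawyers_by_office(lawyers: list[Lawyer]) -> dict[str, list[Lawyer]]:
    """Group lawyer records by the ID of the office they belong to."""
    grouped: dict[str, list[Lawyer]] = {}
    for lawyer in lawyers:
        grouped.setdefault(lawyer.office_id, []).append(lawyer)
    return grouped


if __name__ == "__main__":
    # Placeholder records, not real scraped data.
    lawyers = [
        Lawyer("L-1", "Example Lawyer A", "O-1", ["Example area"]),
        Lawyer("L-2", "Example Lawyer B", "O-1"),
        Lawyer("L-3", "Example Lawyer C", "O-2"),
    ]
    for office_id, members in lawyers_by_office(lawyers).items():
        print(office_id, [m.name for m in members])
```

Storing the office ID on every lawyer record keeps the two lists flat, so each can be exported on its own and joined again later.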

There should be around 18236 lawyers and 5698 offices

Accountants (two sites):

Again, all possible details from both sites need to be collected.

Offices – https://www.nba.nl/kantorenzoeker/
This one is very easy: the data for all 2969 offices is already on the first page as JSON. Check the “Offices” request for this.

Accountants – https://www.nba.nl/register/?name=
You can search this site by two-letter combination (AA, AB, … ZZ).
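The two-letter search terms can be generated rather than typed out. The sketch below builds all 676 combinations and the matching search URLs; it assumes the `name` query parameter from the URL above is what drives the search, and the function names are ours.

```python
from itertools import product
from string import ascii_uppercase
from urllib.parse import urlencode

REGISTER_URL = "https://www.nba.nl/register/"


def two_letter_terms() -> list[str]:
    """Return every two-letter combination from 'AA' to 'ZZ'."""
    return ["".join(pair) for pair in product(ascii_uppercase, repeat=2)]


def search_url(term: str) -> str:
    """Build the register search URL for one search term."""
    return f"{REGISTER_URL}?{urlencode({'name': term})}"


if __name__ == "__main__":
    terms = two_letter_terms()
    print(len(terms), terms[0], terms[-1])
    print(search_url(terms[1]))
```

Because the same accountant can match several terms, results would need de-duplicating on the accountant's unique ID.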

There is no pagination, but you can load more results on the same page by clicking “toon meer” (“show more”).

The site is pretty slow.

There is a unique ID in the URL of the link labelled “Leden kunnen inloggen voor meer informatie” (“Members can log in for more information”). We want that ID as well.

We would like to have the offices and accountants linked. This should be possible by comparing the address under “Kantooradres” (office address) with the street information in the JSON and creating a unique ID from it.
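A minimal sketch of that address matching: normalise both address strings the same way, then hash the result into a short key that both data sets can share. The normalisation rules here (case-folding, dropping punctuation, collapsing whitespace) and the use of a truncated SHA-1 are our assumptions, and the address in the example is invented. Real data will probably need extra rules, for instance for house-number suffixes.

```python
import hashlib
import re


def normalise_address(address: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    text = address.casefold()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a space
    return " ".join(text.split())


def address_key(address: str) -> str:
    """Return a short, stable key derived from the normalised address."""
    digest = hashlib.sha1(normalise_address(address).encode("utf-8"))
    return digest.hexdigest()[:12]


def link_accountants(office_addresses: dict[str, str],
                     accountant_addresses: dict[str, str]) -> dict[str, str]:
    """Map accountant ID -> office ID wherever the address keys match."""
    by_key = {address_key(a): oid for oid, a in office_addresses.items()}
    links = {}
    for accountant_id, address in accountant_addresses.items():
        office_id = by_key.get(address_key(address))
        if office_id is not None:
            links[accountant_id] = office_id
    return links


if __name__ == "__main__":
    # Invented addresses, for illustration only.
    offices = {"O-1": "Kerkstraat 12, 1234 AB  Amsterdam"}
    accountants = {"A-1": "kerkstraat 12 1234 ab amsterdam", "A-2": "Dorpsweg 3"}
    print(link_accountants(offices, accountants))
```

Accountants whose address matches no office are simply left out of the result, so they can be reviewed by hand afterwards.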

For similar work requirements feel free to email us on info@webscrapingexpert.com.
