Last post Mar 05, 2021 02:25 AM by Rednuts72
Mar 04, 2021 06:38 AM|Rednuts72|LINK
Good night or good day,
I asked this question in the Web Forms forum, and a contributor told me I should choose another part of the forum since my question is about HTTP requests. So I come back to you with my problem of loading a list of internet sites (DNS names or URLs). I believe it should be possible to query a public server on the internet network (3WC) with an HTTP request to obtain a list of DNS names or URLs, and my job is then to write a function that selects a certain number of sites according to the request asked, for example by matching the <title> of the page.
A contributor gave me the request address http://www.w3.org/help/search?q=web+socket&search-submit, but this request does not work for me. Are there other public servers?
Thank you for your participation.
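The selection step described above, fetching a page and checking its <title>, could be sketched in Python using only the standard library (a minimal illustration; `extract_title` and `fetch_title` are hypothetical helper names, not anything from the thread):

```python
import re
import urllib.request

def extract_title(html: str) -> str:
    """Return the text of the first <title> tag, or '' if the page has none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""

def fetch_title(url: str) -> str:
    """Download a page and return its <title> text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

With such helpers, "selecting sites by their title" reduces to calling `fetch_title` per URL and keeping the ones whose title matches a keyword.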
Mar 04, 2021 11:51 AM|mgebhard|LINK
Your question does not make sense.
I believe it should be possible to query a public server on the internet network (3WC) with an HTTP request to obtain a list of DNS names or URLs
DNS translates a host name to an IP address. What is a "list of DNS" in your application? Do you want the IP addresses for a URL?
What is 3WC? Do you mean W3C? W3C has a site map where you can get the URLs.
A contributor gave me the request address http://www.w3.org/help/search?q=web+socket&search-submit, but this request does not work for me. Are there other public server requests?
What is wrong? This is a support forum; you must explain the expected results and the actual results.
Mar 04, 2021 01:24 PM|PatriceSc|LINK
Be as clear as possible. I first thought you wanted to crawl an existing list of sites. Do you mean that you want to find sites having something in their title tag? Just explain in plain English (did you think the W3C might have a list of all existing web sites on the internet?)
If yes, you can try the "intitle:" prefix. See http://www.googleguide.com/advanced_operators_reference.html#intitle
Given your other thread, having high goals is fine, but you likely far underestimate the complexity (and resources, if this is not just for a few sites but intended for the general web) of creating your own better search engine from zero (without even talking about getting others to use it).
Edit: Google also has "site:" to restrict a search to a given site or domain. Or do you mean that you want to find all server names in the w3.org domain?
Mar 04, 2021 01:50 PM|Rednuts72|LINK
First, please excuse me for the wrong spelling of 3WC in place of W3C; second, thank you for your answers. What I want is a list of internet (web) sites with their DNS addresses. After that, I will write a function that filters the list of sites by one or several keywords. The problem is where to obtain this list of sites: should I ask a public server, or is there a ".txt" or ".csv" file containing the list of internet (web) sites?
Thank you for your contribution.
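The filtering step described in this post is straightforward once a site list exists as a text file with one site per line. A minimal sketch (`filter_sites` is a hypothetical helper name, not from the thread):

```python
def filter_sites(lines, keywords):
    """Keep lines (one site per line, e.g. from a .txt or .csv file)
    that contain any of the keywords, case-insensitively."""
    kws = [k.lower() for k in keywords]
    return [line.strip() for line in lines
            if any(k in line.lower() for k in kws)]

# Usage sketch: read a downloaded site list and filter it.
# with open("sites.txt", encoding="utf-8") as f:
#     matches = filter_sites(f, ["socket"])
```

The hard part of the question, as the replies below point out, is obtaining the list itself, not filtering it.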
Mar 04, 2021 04:51 PM|PatriceSc|LINK
And do you want a list of sites found in the w3.org domain, or do you want to scan the whole web? According to
https://siteefy.com/how-many-websites-are-there/ there are at least 1,000,000,000 web sites on the internet!
A search engine finds sites by following links found in earlier analyzed sites (Google does run DNS servers; I suspect they offer this service so that they can also "discover" new domains). Also, a webmaster can submit his own site by hand to make sure it is found. Then all this content is preprocessed, stored, etc., so that it can be used to return a search response without scanning sites in "real time".
If you want to find sites from the general internet, to me a realistic solution is to use an existing API such as
https://www.microsoft.com/en-us/bing/apis/bing-web-search-api. It seems Google doesn't have an official API at this time.
Edit: according to another source it can take between 4 days and 4 weeks for Google to find a new website. How many sites do you expect to have in the list you are looking for?
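The Bing Web Search approach suggested here could look roughly like this in Python (a sketch, assuming a v7 subscription key; `build_search_url` and `bing_search` are illustrative names, and the endpoint/header should be checked against Microsoft's current documentation):

```python
import json
import urllib.parse
import urllib.request

# Bing Web Search v7 endpoint (verify against Microsoft's docs).
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def build_search_url(query: str, count: int = 10) -> str:
    """Build the request URL for a query, asking for `count` results."""
    return BING_ENDPOINT + "?" + urllib.parse.urlencode(
        {"q": query, "count": count})

def bing_search(query: str, api_key: str, count: int = 10):
    """Query the API and return (name, url) pairs from the web results."""
    req = urllib.request.Request(
        build_search_url(query, count),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    return [(page["name"], page["url"])
            for page in data.get("webPages", {}).get("value", [])]
```

This sidesteps building your own index entirely: the keyword filtering happens server-side, and you only post-process the returned URLs.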
Mar 04, 2021 06:08 PM|Rednuts72|LINK
For the search engine (motor) project, I would like to have a (complete) list of internet (web) sites. I will do the scan with a filter afterwards. As I understand it, there are two possibilities: first, asking a public server with a special request, and second, having a list of internet (web) sites in a table such as a ".txt" or ".csv" file. I would be happy if you could show me one or both of these methods.
Mar 04, 2021 07:17 PM|PatriceSc|LINK
Hummm... Try perhaps https://whoisds.com/newly-registered-domains which provides a daily list of new domains for free (around 100,000 per day). It seems you have a full list at
https://whoisdatacenter.com/whois-database/ for quite a price, but there are also free samples by country at the end of that page.
It should be more than enough to give a try at whatever you want to do and see if you are still within your time and resource constraints. As pointed out earlier, a search engine can also discover new sites by extracting links from already processed sites...
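The link-extraction discovery step mentioned here can be sketched with Python's standard `html.parser` (`LinkExtractor` and `extract_links` are hypothetical names for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags on one page."""
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL.
                    absolute = urljoin(self.base_url, value)
                    if urlparse(absolute).scheme in ("http", "https"):
                        self.links.append(absolute)

def extract_links(html: str, base_url: str):
    """Return every absolute http(s) link found in the given HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Fed with each downloaded page, this is the core of the "discover new sites from already processed sites" loop: new links go back into the queue of pages to fetch.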
Mar 05, 2021 02:25 AM|Rednuts72|LINK