cURL: passing data to a simple function

Below is all the code to date. I had to make a lot of modifications so it
would run with my program, and it works now. Thank god.
The only problem is I need to know how to pass an int to the write_data
function, so I can load the data into the proper array element,
e.g. so Url_Data_Array[i].Memory gets set to the memory I have been
trying to load.



There are only two lines of code you need to concern yourself with, and
they both have ~~~~~~~~~ in front of them. I need
curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_data); to pass an int
to static size_t write_data(char *ptr, size_t size, size_t nmemb,
void *stream);.


I know what char *ptr, size_t size, and size_t nmemb do, as I wrote the
function that loads the data into memory. However, I do not know what the
void *stream is for, as I have not used it at all. Is this where I could
pass a structure with additional information to the function? I have been
at this for at least three months, and this is the same stumbling block I
come to over and over again, whether I use pthreads or any other form of
multiple file pulling.



If I could just communicate with this function, it would bring me to my
next step. The cURL documentation tells me that static size_t write_data(char
*ptr, size_t size, size_t nmemb, void *stream); has to be written with
those parameters, though I can change the name of the function. Could I
just add another void * or something? God, I am lost; any help would be
great.
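
The callback signature is fixed by libcurl, but that last parameter is exactly the hook you want: whatever pointer you set with curl_easy_setopt(eh, CURLOPT_WRITEDATA, ptr) is handed back to the callback as void *stream on every call. So rather than passing an int, pass a pointer to the element you want filled. A minimal sketch of the mechanism (the struct and names here are illustrative, not taken from the posted code):

#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

struct buffer {
    char  *data;
    size_t size;
};

/* libcurl hands back whatever pointer was set with CURLOPT_WRITEDATA
   as the void *stream argument */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *stream)
{
    struct buffer *buf = (struct buffer *)stream;
    size_t bytes = size * nmemb;
    char *tmp = realloc(buf->data, buf->size + bytes);
    if (!tmp)
        return 0; /* returning less than `bytes` makes libcurl abort the transfer */
    buf->data = tmp;
    memcpy(buf->data + buf->size, ptr, bytes);
    buf->size += bytes;
    return bytes;
}

static void setup(CURL *eh, struct buffer *buf)
{
    curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(eh, CURLOPT_WRITEDATA, buf); /* becomes `stream` above */
}

With one struct buffer per transfer, each callback invocation writes into its own buffer and no globals are needed.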


If you can get me through this, it would be very helpful.

Thank you,

Donald
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#ifndef WIN32
#  include <unistd.h>
#endif
#include <curl/multi.h>

static const char *urls[] = {
  "http://www.microsoft.com",
  "http://www.opensource.org",
  "http://www.google.com",
  "http://www.yahoo.com",
  "http://www.ibm.com",
  "http://www.mysql.com",
  "http://www.oracle.com",
  "http://www.ripe.net",
  "http://www.iana.org",
  "http://www.amazon.com",
  "http://www.netcraft.com",
  "http://www.heise.de",
  "http://www.chip.de",
  "http://www.ca.com",
  "http://www.cnet.com",
  "http://www.news.com",
  "http://www.cnn.com",
  "http://www.wikipedia.org",
  "http://www.dell.com",
  "http://www.hp.com",
  "http://www.cert.org",
  "http://www.mit.edu",
  "http://www.nist.gov",
  "http://www.ebay.com",
  "http://www.playstation.com",
  "http://www.uefa.com",
  "http://www.ieee.org",
  "http://www.apple.com",
  "http://www.sony.com",
  "http://www.symantec.com",
  "http://www.zdnet.com",
  "http://www.fujitsu.com",
  "http://www.supermicro.com",
  "http://www.hotmail.com",
  "http://www.ecma.com",
  "http://www.bbc.co.uk",
  "http://news.google.com",
  "http://www.foxnews.com",
  "http://www.msn.com",
  "http://www.wired.com",
  "http://www.sky.com",
  "http://www.usatoday.com",
  "http://www.cbs.com",
  "http://www.nbc.com",
  "http://slashdot.org",
  "http://www.bloglines.com",
  "http://www.techweb.com",
  "http://www.newslink.org",
  "http://www.un.org",
};

#define MAX 10 /* number of simultaneous transfers */
#define CNT sizeof(urls)/sizeof(char*) /* total number of transfers to do */

// declare char* array for loading in the next url to crawl
char *UrlAddress[50];
// declare the data that each thread will pass to the function to process
struct Url_Data
{
    int  Url_ID;
    char *UrlAddress;
    char *Memory;
    size_t UrlConnectionHtmlBody_size;
    char *RedirectAddress;
    char *IPAddress;
    long HttpResponse;
};
// declare an array of Url_Data so each thread has a different variable
struct Url_Data Url_Data_Array[50];

char* memory;
size_t UrlConnectionHtmlBody_size;

~~~~~~~~~~~~~~~~~~~static size_t write_data(char *ptr, size_t size, size_t nmemb, void *stream);

~~~~~~~~~~~~~~~~~~~static size_t write_data(char *ptr, size_t size, size_t nmemb, void *stream)
{
    // number of bytes delivered in this call
    size_t mem = size * nmemb;
    // track the total length of the buffer so far
    UrlConnectionHtmlBody_size += mem;
    if (mem > 0)
    {
        // realloc(NULL, n) acts like malloc, so this also covers the first call
        memory = (char*) realloc(memory, UrlConnectionHtmlBody_size);
        // append the new data at the end of the buffer
        memcpy(&memory[UrlConnectionHtmlBody_size - mem], ptr, mem);
    }
    return mem;
}

static void init(CURLM *cm, int i)
{
  CURL *eh = curl_easy_init();
  CURLcode res;
~~~~~~~~~~~~~~~~  curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_data);
  curl_easy_setopt(eh, CURLOPT_HEADER, 0L);
  curl_easy_setopt(eh, CURLOPT_URL, urls[i]);
  curl_easy_setopt(eh, CURLOPT_PRIVATE, urls[i]);
  curl_easy_setopt(eh, CURLOPT_VERBOSE, 0L);
  // pointers for the redirect site / IP info
  char *ra;
  char *ip;
  long HttpResponse;
  /* get the CURLINFO_HTTP_CONNECTCODE */
  res = curl_easy_getinfo(eh, CURLINFO_RESPONSE_CODE, &Url_Data_Array[i].HttpResponse);
  /* ask for the RedirectAddress */
  res = curl_easy_getinfo(eh, CURLINFO_REDIRECT_URL, &Url_Data_Array[i].RedirectAddress);
  // get the IP address for the web site
  res = curl_easy_getinfo(eh, CURLINFO_PRIMARY_IP, &Url_Data_Array[i].IPAddress);

  curl_multi_add_handle(cm, eh);
}

int main(void)
{
  CURLM *cm;
  CURLMsg *msg;
  long L;
  unsigned int C=0;
  int M, Q, U = -1;
  fd_set R, W, E;
  struct timeval T;

  curl_global_init(CURL_GLOBAL_ALL);

  cm = curl_multi_init();

  /* we can optionally limit the total amount of connections this multi handle uses */
  curl_multi_setopt(cm, CURLMOPT_MAXCONNECTS, (long)MAX);

  for (C = 0; C < MAX; ++C) {
    init(cm, C);
  }

  while (U) {
    while (CURLM_CALL_MULTI_PERFORM == curl_multi_perform(cm, &U));

    if (U) {
      FD_ZERO(&R);
      FD_ZERO(&W);
      FD_ZERO(&E);

      if (curl_multi_fdset(cm, &R, &W, &E, &M)) {
        fprintf(stderr, "E: curl_multi_fdset\n");
        return EXIT_FAILURE;
      }

      if (curl_multi_timeout(cm, &L)) {
        fprintf(stderr, "E: curl_multi_timeout\n");
        return EXIT_FAILURE;
      }
      if (L == -1)
        L = 100;

      if (M == -1) {
#ifdef WIN32
        Sleep(L);
#else
        sleep(L / 1000);
#endif
      } else {
        T.tv_sec = L/1000;
        T.tv_usec = (L%1000)*1000;

        if (0 > select(M+1, &R, &W, &E, &T)) {
          fprintf(stderr, "E: select(%i,,,,%li): %i: %s\n",
              M+1, L, errno, strerror(errno));
          return EXIT_FAILURE;
        }
      }
    }

    while ((msg = curl_multi_info_read(cm, &Q))) {
      if (msg->msg == CURLMSG_DONE) {
        char *url;
        CURL *e = msg->easy_handle;
        curl_easy_getinfo(msg->easy_handle, CURLINFO_PRIVATE, &url);
        fprintf(stderr, "R: %d - %s <%s>\n",
                msg->data.result, curl_easy_strerror(msg->data.result),
url);
        curl_multi_remove_handle(cm, e);
        curl_easy_cleanup(e);
      }
      else {
        fprintf(stderr, "E: CURLMsg (%d)\n", msg->msg);
      }
      if (C < CNT) {
        init(cm, C++);
        U++; /* just to prevent it from remaining at 0 if there are more
                URLs to get */
      }
    }
  }

  curl_multi_cleanup(cm);
  curl_global_cleanup();

  return EXIT_SUCCESS;
} 
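
Applied to the code above, the same idea means giving each easy handle its own Url_Data element. A hedged sketch of how write_data() and init() could be rewritten under that assumption (an adaptation, not the code as posted):

static size_t write_data(char *ptr, size_t size, size_t nmemb, void *stream)
{
    /* stream is the &Url_Data_Array[i] set below with CURLOPT_WRITEDATA */
    struct Url_Data *ud = (struct Url_Data *)stream;
    size_t mem = size * nmemb;
    char *tmp = realloc(ud->Memory, ud->UrlConnectionHtmlBody_size + mem);
    if (!tmp)
        return 0;
    ud->Memory = tmp;
    memcpy(ud->Memory + ud->UrlConnectionHtmlBody_size, ptr, mem);
    ud->UrlConnectionHtmlBody_size += mem;
    return mem;
}

static void init(CURLM *cm, int i)
{
    CURL *eh = curl_easy_init();
    Url_Data_Array[i].Url_ID = i;
    Url_Data_Array[i].UrlAddress = (char *)urls[i];
    curl_easy_setopt(eh, CURLOPT_WRITEFUNCTION, write_data);
    /* this pointer is what arrives in write_data as void *stream */
    curl_easy_setopt(eh, CURLOPT_WRITEDATA, &Url_Data_Array[i]);
    curl_easy_setopt(eh, CURLOPT_HEADER, 0L);
    curl_easy_setopt(eh, CURLOPT_URL, urls[i]);
    /* store the struct rather than the url string, so the done-handler
       can recover it; main()'s fprintf would then print ud->UrlAddress */
    curl_easy_setopt(eh, CURLOPT_PRIVATE, &Url_Data_Array[i]);
    curl_easy_setopt(eh, CURLOPT_VERBOSE, 0L);
    curl_multi_add_handle(cm, eh);
}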

These functions are not working either:
        char *ra;
        char *ip;
        long HttpResponse;
        /* get the CURLINFO_HTTP_CONNECTCODE */
        res = curl_easy_getinfo(eh, CURLINFO_RESPONSE_CODE, &Url_Data_Array[i].HttpResponse);
        /* ask for the RedirectAddress */
        res = curl_easy_getinfo(eh, CURLINFO_REDIRECT_URL, &Url_Data_Array[i].RedirectAddress);
        // get the IP address for the web site
        res = curl_easy_getinfo(eh, CURLINFO_PRIMARY_IP, &Url_Data_Array[i].IPAddress);

All I want to do is pull this information from the net. Does anybody know a good lib to work with, or something? This cURL is driving me nuts. I need to pull multiple files down at once because DNS resolution can slow down the program.
curlpp (a C++ API) may make the code easier. That's the only API I've used and it worked well for my needs, though I've not attempted to load multiple sites like you are.
This is pretty much what I use:

#include <string>
#include <iostream>
#include <sstream>
#include <curl/curl.h>

static size_t http_write(void* buf, size_t size, size_t nmemb, void* userp)
{
	if(userp)
	{
		std::ostringstream* oss = static_cast<std::ostringstream*>(userp);
		std::streamsize len = size * nmemb;
		oss->write(static_cast<char*>(buf), len);
		return static_cast<size_t>(len); // report bytes handled; nmemb alone would under-report if size != 1
	}

	return 0;
}

std::string get_html_page(const std::string& url, long timeout = 0)
{
	CURL* curl = curl_easy_init();

	std::ostringstream oss;

	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &http_write);
	curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 1L);
	curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
	curl_easy_setopt(curl, CURLOPT_FILE, &oss);
	curl_easy_setopt(curl, CURLOPT_TIMEOUT, timeout);
	curl_easy_setopt(curl, CURLOPT_URL, url.c_str());

	curl_easy_perform(curl);
	curl_easy_cleanup(curl);

	return oss.str();
}

int main()
{
	std::string html = get_html_page("http://www.google.com");

	std::cout << html << std::endl;

	return 0;
}
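
One note on the snippet above: CURLOPT_FILE is simply the older name for CURLOPT_WRITEDATA, so the &oss passed there is exactly what arrives in http_write() as void *userp. It is the same mechanism the original question is after, just carrying a std::ostringstream instead of a struct.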