Date: Mon, 15 Mar 2004 02:28:17 -0800 (PST)
From: Adam Bozanich
To: Zhang Weiwu
Cc: questions@freebsd.org
Subject: Re: [OT?] write C program with UTF16LE

On Mon, 15 Mar 2004, Zhang Weiwu wrote:
> Hello. Although I write some PHP/Perl scripts, I don't write C programs. Now
> I have a very large text file in UTF16LE format; the rule is that strings are
> separated by numbers. Say
>
> 0300 6100 6200 6300 0400 6700 5400 9800 7400 0300 ....
>
> Leading 0300 means the following 3 characters (6 bytes) are a string, and
> the next 0400 means the following 4 characters make another string.

Here's an example using the fgets function (see 'man fgets'). There are probably a bunch of ways to go about this, but this one is nice and simple.
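#include <stdio.h>
#include <stdlib.h>

#define CHUNKSIZE 5 /* 4 characters and a space */
/* max number of encoded chars if you are using 2 hex digits for the count */
#define MAX_CHUNK_COUNT 255

int main(int argc, char **argv)
{
    char delbuf[CHUNKSIZE + 1];   /* +1 for the terminating '\0' */
    char chunks[CHUNKSIZE * MAX_CHUNK_COUNT + 1];
    long chunk_count;

    while (fgets(delbuf, sizeof delbuf, stdin) != NULL) {
        /* you may not want to destroy this: keep only the first two
           hex digits (the low byte of the little-endian count) */
        delbuf[2] = '\0';
        chunk_count = strtol(delbuf, NULL, 16);
        if (chunk_count == 0)
            continue;   /* empty string, or the newline at end of file */
        if (chunk_count < 0 || chunk_count > MAX_CHUNK_COUNT) {
            fprintf(stderr, "bad string length: %s\n", delbuf);
            break;
        }
        if (fgets(chunks, (int)(CHUNKSIZE * chunk_count) + 1, stdin) == NULL) {
            fprintf(stderr, "can't read all of the string\n");
            break;
        }
        fprintf(stdout, "\n%s", chunks);
    }
    exit(0);
}

This works for the numbers you gave. Note that it expects the input as text, in the same space-separated hex form you posted, and it only looks at the first two hex digits of each count, so no string can be longer than 255 characters. You'll still want to add better error handling and whatnot, and you probably don't want to trash the buffer holding the string length. Try running it like this:

./a.out < inputfile > outputfile

> but I am the kind of newbie who doesn't know if I am using glibc at all.
> When I just write
> #include <stdio.h>
> am I using the stdio.h from glibc?

No. FreeBSD has its own BSD-derived C library, not GNU's glibc, so
#include <stdio.h> picks up /usr/include/stdio.h from the FreeBSD base system.

I hope this gives you some ideas, good luck!
-Adam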