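Compiler Principles: Design and Implementation of a Lexical Analyzer for C

2021-01-04 10:29:25 Author: Internet

Lexical Analyzer

I. Lab Topic

Design and implementation of a lexical analyzer for the C language

II. Lab Requirements

Recognize every lexeme in a source program written in C and output each one as a token.
Recognize and skip the comments in the source program.
Count the lines, the lexemes of each kind, and the total number of characters in the source program, and output the statistics.
Detect lexical errors in the source program and report where they occur.
Recover from each error so that lexical analysis can continue; a single scan of the source program should detect and report every lexical error it contains.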

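III. Program Design

1. Language description
(1) Identifiers: strings that begin with a letter or an underscore and consist of letters, digits, and underscores.
(2) Keywords: the 32 keywords commonly used in C.

(3) Unsigned numbers: an integer part followed by an optional fractional part and an optional exponent part.
(4) Relational operators: <, <=, >, >=, ==, !=, &&, ||
(5) Arithmetic operators: +, +=, -, -=, *, *=, /, /=, %
(6) Logical operators: &, |, !, ^, ~, ?
(7) Punctuation: ( ) { } [ ] : ; ,
(8) Assignment operator: =
(9) Comment markers: a block comment begins with "/*" and ends with "*/"; a line comment begins with "//".
(10) Separators: tab, newline, space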
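
The identifier and unsigned-number rules above can be checked in isolation. The following is a minimal sketch, not part of the original program: `is_identifier` and `is_unsigned_number` are hypothetical helper names, and they test a whole string rather than scanning a character stream as the analyzer does.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if `s` starts with a letter or '_' and
// continues with letters, digits, or '_' only.
bool is_identifier(const std::string& s) {
    if (s.empty()) return false;
    auto uc = [](char c) { return static_cast<unsigned char>(c); };
    if (!(std::isalpha(uc(s[0])) || s[0] == '_')) return false;
    for (char c : s)
        if (!(std::isalnum(uc(c)) || c == '_')) return false;
    return true;
}

// Hypothetical helper: true if `s` has the form
// digits [ '.' digits ] [ ('E' | 'e') [ '+' | '-' ] digits ].
bool is_unsigned_number(const std::string& s) {
    std::size_t i = 0, n = s.size();
    // Consume a run of digits; report whether at least one was present.
    auto digits = [&]() {
        std::size_t start = i;
        while (i < n && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
        return i > start;
    };
    if (!digits()) return false;                      // integer part
    if (i < n && s[i] == '.') {                       // optional fraction
        ++i;
        if (!digits()) return false;
    }
    if (i < n && (s[i] == 'E' || s[i] == 'e')) {      // optional exponent
        ++i;
        if (i < n && (s[i] == '+' || s[i] == '-')) ++i;
        if (!digits()) return false;
    }
    return i == n;
}
```

With these definitions, "1.23e+34" and "1E3" are accepted, while "1." and "1.23E" are rejected because a digit is required after the decimal point and after the exponent marker.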

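2. Starting from the state transition diagram (see the figure below), each state is mapped to a short piece of code, and the lexical analyzer is assembled from these pieces.
(1) In the start state, a character is read first. If it is whitespace (a blank or a tab), it is skipped and reading continues until a non-blank character is found. The analyzer then branches to the routine that handles that character.
(2) In the identifier state, once an identifier has been recognized and assembled, further actions are needed: the keyword table is consulted to decide whether the lexeme is a keyword or a user-defined identifier, and the corresponding token and attribute are output.
(3) In the unsigned-number state, the various numeric constants are recognized: integers, decimals, and numbers with an exponent.
(4) In the '<' state, if the next character is '=', the relational operator "<=" is output; if the next character is '<', the operator "<<" is output; otherwise the relational operator "<" is output.
(5) In the '/' state, if the next character is '*', the analyzer enters the comment state and skips the comment: it keeps reading characters until it meets "*/", then returns to the start state to recognize the next lexeme. If the next character is '/', a line comment has started, and everything up to the newline that ends the line is skipped before reading resumes. If the next character is '=', the lexeme is "/="; otherwise a slash "/" is output.
(6) In the '!' state, if the next character is '=', the inequality operator "!=" is output; otherwise "!" is output.
(7) In the '=' state, if the next character is '=', the equality operator "==" is output; otherwise the assignment operator "=" is output.
(8) In the '+' state, if the next character is '=', "+=" is output; otherwise "+" is output.
(9) In the '-' state, if the next character is '=', "-=" is output; otherwise "-" is output.
(10) In the '*' state, if the next character is '=', "*=" is output; otherwise "*" is output.
(11) In the '\n' state, the line count is incremented and the analyzer returns to the start state to read the next character.
(12) In the '"' state, characters are read until the matching '"' is found.
(13) In the ''' state, characters are read until the matching ''' is found.
(14) In the '&' state, if the next character is '&', "&&" is output; otherwise "&" is output.
(15) In the '|' state, if the next character is '|', "||" is output; otherwise "|" is output.
(16) In the '#' state, if the string that follows is "include" or "define", the analyzer enters directive handling. The program currently recognizes only these two directives, and a header name must have the form "header.h"; anything else is treated as an invalid header, and its line number is recorded as an error.
(17) In the states for the remaining punctuation marks, the corresponding token is simply output.
(18) The error state is entered when the analyzer reads an illegal character, that is, a character that does not begin any lexeme of the language. Error handling records the line of the error, prints an error message, skips the offending character, and returns to the start state to recognize the next lexeme.
(19) To decide whether the right end of a lexeme has been reached, the analyzer sometimes has to read one character too many, for example in the identifier and unsigned-number states. Before returning to the caller, the forward pointer must be moved back one character.
(20) If the character read is the end-of-file marker, the analyzer exits and the analysis ends.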
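
The one-character lookahead used by the operator states can be illustrated on its own. The sketch below is an assumption-laden simplification: `scan_less` is a hypothetical name, it works on a `std::string` instead of the program's double buffer, and it peeks at the next character before consuming it, which has the same effect as reading one character too many and then retracting.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: `pos` points at a '<' in `src`. Returns the operator
// that starts there and leaves `pos` just past it.
std::string scan_less(const std::string& src, std::size_t& pos) {
    ++pos;                                             // consume '<'
    if (pos < src.size() && src[pos] == '=') { ++pos; return "<="; }
    if (pos < src.size() && src[pos] == '<') { ++pos; return "<<"; }
    return "<";                 // the lookahead character is left unread
}
```

For the input "<a" the function returns "<" and leaves `pos` at the 'a', so the next call to the scanner starts a fresh lexeme there; this is the situation in which the original program calls `retract`.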

3. State transition diagram (automaton)
[Figure: state transition diagram of the lexical analyzer]

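4. Input and output format
(1) Input is read from a file and output is written to a file, one record per lexeme in the form <line number, token, lexeme>. A keyword needs no attribute, because its token alone identifies the keyword.
The lexeme-to-token table is in the attached file "Makert_table.txt".
(2) The output includes the total number of characters (bytes), the count of each kind of lexeme, and the number of lines.
(3) The output lists the line numbers where errors occur.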
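
A minimal sketch of the two I/O pieces described above. It assumes the table file holds whitespace-separated (lexeme, token) pairs, which is how the program's reading loop consumes it; `load_token_table`, `format_record`, and the sample token name "relop" are hypothetical and do not come from the attachment.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using TokenTable = std::vector<std::pair<std::string, std::string>>;

// Read (lexeme, token) pairs. The extraction is the loop condition, so a
// trailing newline or an incomplete last line does not add a bogus entry.
TokenTable load_token_table(std::istream& in) {
    TokenTable table;
    std::string lexeme, token;
    while (in >> lexeme >> token) table.emplace_back(lexeme, token);
    return table;
}

// Format one output record; "-" stands for "no attribute".
std::string format_record(long line, const std::string& token,
                          const std::string& attribute = "-") {
    std::ostringstream out;
    out << "< " << line << ", " << token << ", " << attribute << " >";
    return out.str();
}
```

Passing a `std::istream&` rather than a file name keeps the loader testable with an in-memory `std::istringstream`.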

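5. Global variables and functions
(1) state: integer variable that holds the current state.
(2) ch: character variable that holds the character just read.
(3) constexpr auto Buffer_len = 2048;
(4) constexpr auto Half_Buffer_len = 1024;
(5) // Table mapping lexemes to tokens
vector<pair<string, string> > MaketTable;
(6) // Count of each lexeme
map<string, int> NumOfWord;
(7) // Line numbers where errors occurred
set<int> Errorlocation;
(8) // Total number of characters in the program
long int char_num;
(9) // Number of lines in the source program
long int statement_lines;
(10) // Symbol table
map<string,int> SymbolTable;
(11) // Next free symbol-table entry
int SymbolPtr = 0;
(12) // Fetch the next character
char ExtractingWords(int& StartPointer, int& ForwardPointer, int& cnt, char* Buffer);
(13) // Append a character to token
void cat(string& token, const char& ch);
(14) // Is the character a letter (or underscore)?
bool Is_letter(const char& ch);
(15) // Is the character a digit?
bool Is_digit(const char& ch);
(16) // Move the forward pointer back one character
void retract(const char* Buffer, int& ForwardPointer);
(17) // Look up a lexeme in the token table
string CheckKey(const string &token);
(18) // Write an analyzed lexeme to the output file
void FileOutResult(const string& str,string attribute = "-");
(19) // Error handling: record the current line number and skip the current character
void error();
(20) // Write the statistics
void ShowResult();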

IV. Source Program
(The attachment contains the .c file with the code below.)

#define _CRT_SECURE_NO_WARNINGS
#include<iostream>
#include<fstream>
#include<iomanip>
#include<algorithm>
#include<string>
#include<vector>
#include<cstring>
#include<cstdio>
#include<cctype>
#include<map>
#include<set>
using namespace std;

constexpr auto Buffer_len = 2048;
constexpr auto Half_Buffer_len = 1024;

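// Table mapping lexemes to tokens
vector<pair<string, string> > MaketTable;
// Count of each lexeme
map<string, int> NumOfWord;
// Line numbers where errors occurred
set<int> Errorlocation;  
// Total number of characters in the program
long int char_num;
// Number of lines in the source program
long int statement_lines;
// Symbol table
map<string,int> SymbolTable;
// Next free symbol-table entry
int SymbolPtr = 0;
// Fetch the next character
char ExtractingWords(int& StartPointer, int& ForwardPointer, int& cnt, char* Buffer);
// Append a character to token
void cat(string& token, const char& ch);
// Is the character a letter (or underscore)?
bool Is_letter(const char& ch);
// Is the character a digit?
bool Is_digit(const char& ch);
// Move the forward pointer back one character
void retract(const char* Buffer, int& ForwardPointer);
// Look up a lexeme in the token table ("" if it is absent)
string CheckKey(const string &token);
// Write an analyzed lexeme to the output file
void FileOutResult(const string& str,string attribute = "-");
// Error handling: record the current line number and skip the current character
void error();
// Write the statistics
void ShowResult();   
// Analysis results are written to this file
ofstream F_Out("result.txt", ios::out);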

int main()
{
	// Initialize the token table by reading it from file
	ifstream F_MaketTable("Makert_table.txt", ios::in);

	string expression = "", mark = "";
	if (!F_MaketTable.is_open()) {
		cout << "Open file failure" << endl;
		return 1;
	}
	// Use the extraction itself as the loop condition, so that the last
	// entry is not pushed twice when the file ends with a newline
	while (F_MaketTable >> expression >> mark) {
		MaketTable.push_back(make_pair(expression, mark));
	}
	F_MaketTable.close();   // close the file

	char_num = 0, statement_lines = 1;
	char ch;

	/** Variable declarations
	* cnt : next write position when the buffer is filled
	* StartPointer : start pointer
	* ForwardPointer : forward pointer
	* state : current state
	*/
	int cnt = 0, StartPointer = 0, ForwardPointer = Buffer_len - 1, state = 0;
	char Buffer[Buffer_len];  // buffer array
	memset(Buffer, '\0', sizeof(Buffer));

	// Redirect standard input to the source file
	if (NULL == freopen("test.txt", "r", stdin)) {
		cout << "Failed to open test.txt" << endl;
		cout << "FAILED!" << endl;
		return 0;
	}

	F_Out << "Lexical analysis results:" << endl;
	F_Out << "[Line " << statement_lines << "]:" << endl;
	string token;	// holds the lexeme currently being assembled
	do {
		switch (state) {
		case 0:{
			token = "";
			// Call ExtractingWords repeatedly to skip spaces and tabs
			do {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			} while (ch == ' ' || ch == '\t');
			switch (ch) {
				// Identifier state
				case 'a':case 'b':case 'c':case 'd':case 'e':case 'f':case 'g':case 'h':case 'i':case 'j':
				case 'k':case 'l':case 'm':case 'n':case 'o':case 'p':case 'q':case 'r':case 's':case 't':
				case 'u':case 'v':case 'w':case 'x':case 'y':case 'z':case '_':
				case 'A':case 'B':case 'C':case 'D':case 'E':case 'F':case 'G':case 'H':case 'I':case 'J':
				case 'K':case 'L':case 'M':case 'N':case 'O':case 'P':case 'Q':case 'R':case 'S':case 'T':
				case 'U':case 'V':case 'W':case 'X':case 'Y':case 'Z':state = 1; break;
				// Numeric-constant state
				case '0':case '1':case '2':case '3':case '4':case '5':case '6':case '7':case '8':case '9':
				state = 2; break;
				// Operator states
				case '<':state = 8; break;
				case '>':state = 9; break;
				/*  ?/:/^/&/|/%/(/)/[/]/{/}/,/;  */
				case'?':case':':case'^':case'%':case'(':case')':case'[':case']':case'{':
				case'}':case',':case';':case '~':{
					string temp_ch = "";
					temp_ch += ch;
					NumOfWord[temp_ch] += 1;   // count the lexeme
					FileOutResult(temp_ch);
					state = 0;
					break;
				}
				case '=':state = 10; break;    // '=' or '=='
				case '+':state = 11; break;    // '+', '+=' or '++'
				case '-':state = 12; break;    // '-', '-=' or '--'
				case '*':state = 13; break;    // '*' or '*='
				case '/':state = 14; break;    // comment start or division
				case '!':state = 17; break;    // '!' or '!='
				case '\n':state = 18; break;   // newline
				case '"':state = 19; break;    // string literal
				case '&':state = 20; break;    // '&&' or '&' (bitwise AND / address-of)
				case '|':state = 21; break;    // '|' or '||'
				case '#':state = 22; break;    // preprocessor directive
				case '\'':state = 23; break;   // character literal
				case EOF:state = 0; break;     // end of input is not an error; the main loop exits
				default:state = 24; break;     
			}
			break;
		}
		case 1: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			if (Is_letter(ch) || Is_digit(ch)) state = 1;
			else {
				// Move the forward pointer back one character
				retract(Buffer,ForwardPointer);
				// Go back to the start state for the next lexeme
				state = 0;
				// Check whether the lexeme is a keyword
				string ans = CheckKey(token);
				// Keyword: write it to the output file
				if (ans != "") {
					NumOfWord[token] += 1;   // count the lexeme
					FileOutResult(ans);
				}
				// Not a keyword, so it is an identifier
				else {
					NumOfWord["id"] += 1;   // count the lexeme
					string attribute = "";
					auto mptr = SymbolTable.find(token);
					// Already present: fetch its attribute (entry index)
					if (mptr != SymbolTable.end()) {
						attribute = to_string(mptr->second);
					}
					// Not present: insert it and assign an entry index
					else {
						SymbolTable.insert(pair<string,int>(token, SymbolPtr));
						attribute = to_string(SymbolPtr);
						SymbolPtr++;
					}
					FileOutResult(CheckKey("id"),token);
				}
			}
			break;
		}
		case 2: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
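				// Followed by another digit
				case '0':case '1':case '2':case '3':case '4':case '5':case '6':case '7':case '8':case '9':state = 2; break;
				// Fractional part
				case '.':state = 3; break;
				// Exponent (scientific notation), upper or lower case
				case 'E':case 'e':state = 5; break;
				default: {
					NumOfWord["num"] += 1;   // count the lexeme
					// Move the forward pointer back one character
					retract(Buffer, ForwardPointer);
					// Go back to the start state for the next lexeme
					state = 0;
					FileOutResult(CheckKey("num"),token);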
					break;
				}
			}
			break;
		}
		case 3: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			if (Is_digit(ch)) { state = 4; break; }   // a digit: go to state 4
			// anything else is an error
			else {
				error();
				state = 0;
			}
			break;
		}
		case 4: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				case '0':case '1':case '2':case '3':case '4':case '5':case '6':case '7':case '8':case '9':state = 4; break;
				case 'E':case 'e':state = 5; break;    // exponent marker E or e
				default: {
					retract(Buffer, ForwardPointer);
					state = 0;
					NumOfWord["num"] += 1;   // count the lexeme
					FileOutResult(CheckKey("num"),token);
					break;
				}
			}
			break;
		}
		case 5: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				case '0':case '1':case '2':case '3':case '4':case '5':case '6':case '7':case '8':case '9':state = 7; break;
				case '+':case'-':state = 6; break;    // sign of the exponent
				default: {
					retract(Buffer, ForwardPointer);   // step back one character
					error();   // record the error line
					state = 0;
					break;
				}
			}
			break;
		}
		case 6: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			if (Is_digit(ch)) { state = 7; break; }   // a digit: go to state 7
			else {
				retract(Buffer, ForwardPointer);    // step back one character
				error();       // record the error line
				state = 0;
				break;
			}
			break;
		}
		case 7: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			if (Is_digit(ch)) { state = 7; break; }
			else {
				NumOfWord["num"] += 1;   // count the lexeme
				FileOutResult(CheckKey("num"),token);
				retract(Buffer, ForwardPointer);
				state = 0;
			}
			break;
		}
		case 8: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				// "<<" or "<="; otherwise a plain "<"
				case '<':cat(token, ch); NumOfWord[token] += 1; state = 0; FileOutResult(CheckKey("<<")); break;
				case '=':cat(token, ch); NumOfWord[token] += 1; state = 0; FileOutResult(CheckKey("<=")); break;
				default: {
					NumOfWord[token] += 1;
					FileOutResult(CheckKey("<"));
					retract(Buffer, ForwardPointer);
					state = 0;
					break;
				}
			}
			break;
		}
		case 9: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				// ">>" or ">="; otherwise a plain ">"
				case '>':cat(token, ch); NumOfWord[token] += 1; state = 0; FileOutResult(CheckKey(">>")); break;
				case '=':cat(token, ch); NumOfWord[token] += 1; state = 0; FileOutResult(CheckKey(">=")); break;
				default: {
					NumOfWord[token] += 1;
					FileOutResult(CheckKey(">"));
					retract(Buffer, ForwardPointer);
					state = 0;
					break;
				}
			}
			break;
		}
		case 10: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			// "==" or a plain "="
			if (ch == '=') {
				cat(token, ch); NumOfWord[token] += 1;state = 0; FileOutResult(CheckKey("=="));
			}
			else {
				NumOfWord["assign-op"] += 1;
				FileOutResult(CheckKey("="));
				retract(Buffer, ForwardPointer);
				state = 0;
			}
			break;
		}
		case 11: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				// "++" or "+="; otherwise a plain "+"
				case '+':cat(token, ch);state = 0; FileOutResult(CheckKey("++")); break;
				case '=':cat(token, ch);state = 0; FileOutResult(CheckKey("+=")); break;
				default: {
					FileOutResult(CheckKey("+"));
					retract(Buffer, ForwardPointer);
					state = 0;
					break;
				}
			}
			NumOfWord[token] += 1;
			break;
		}
		case 12: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				// "--" or "-="; otherwise a plain "-"
				case '-':cat(token, ch); state = 0; FileOutResult(CheckKey("--")); break;
				case '=':cat(token, ch); state = 0; FileOutResult(CheckKey("-=")); break;
				default: {
					FileOutResult(CheckKey("-"));
					retract(Buffer, ForwardPointer);
					state = 0;
					break;
				}
			}
			NumOfWord[token] += 1;
			break;
		}
		case 13:{
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			// "*=" or a plain "*"
			if (ch == '=') {
				cat(token, ch); state = 0; FileOutResult(CheckKey("*="));
			}
			else {
				FileOutResult(CheckKey("*"));
				retract(Buffer, ForwardPointer);
				state = 0;
			}
			NumOfWord[token] += 1;
			break;
		}
		case 14: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			switch (ch) {
				case'*':state = 15; break;   // "/*": block comment
				case'/':state = 16; break;   // "//": line comment
				case'=':cat(token, ch); NumOfWord[token] += 1; state = 0; FileOutResult(CheckKey("/=")); break;  // "/="
				default: {
					NumOfWord[token] += 1;
					FileOutResult("/");
					retract(Buffer, ForwardPointer);
					state = 0;
					break;
				}
			}
			break;
		}
		case 15: {
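			// Skip the comment body up to the closing "*/"
			do {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
				if (ch == '\n') {    // newline inside the comment: advance the line count
					NumOfWord["enter"] += 1;
					statement_lines++;
					F_Out << "[Line " << statement_lines << "]:" << endl;     // print the header of the new line
				}
			} while (ch != '*' && ch != EOF);
			// End of file inside the comment
			if (ch == EOF) {
				error();
				break;
			}
			else {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
				if (ch == '/') state = 0;
				else {
					// Not "*/": re-examine this character, so that "**/" and a
					// newline right after '*' are handled correctly
					retract(Buffer, ForwardPointer);
					state = 15;
				}
				break;
			}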
		}
		case 16: {
			// Skip to the end of a "//" comment
			do {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			} while (ch != '\n' && ch != EOF);
			retract(Buffer, ForwardPointer);
			state = 0;
			break;
		}
		case 17: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			// "!="
			if (ch == '=') {
				cat(token, ch); state = 0; FileOutResult(CheckKey("!="));
			}
			// a plain "!"
			else {
				FileOutResult(CheckKey("!"));
				retract(Buffer, ForwardPointer);
				state = 0;
			}
			NumOfWord[token] += 1;   // count the lexeme
			break;
		}
		case 18: {
			// Newline
			NumOfWord["enter"] += 1;
			++statement_lines;
			F_Out << "[Line " << statement_lines << "]:" << endl;
			state = 0;
			break;
		}
		case 19: {
			// String literal: read up to the closing double quote
			string str = "";
			do {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
				str += ch;
			} while (ch != '"' && ch != '\n' && ch != EOF);
			if (ch == '"') {
				str.erase(str.end() - 1);
				FileOutResult(CheckKey("\"\""),str);
				NumOfWord["literal"] += 1;
			}
			else {
				Errorlocation.insert(statement_lines);    // record the error line
				retract(Buffer, ForwardPointer);  // step back so the newline or EOF is handled by the start state
			}
			state = 0;
			break;
		}
		case 20: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			// "&&"
			if (ch == '&') {
				cat(token, ch); state = 0; FileOutResult(CheckKey("&&"));
			}
			// a plain "&"
			else {
				FileOutResult(CheckKey("&"));
				retract(Buffer, ForwardPointer);
				state = 0;
			}
			NumOfWord[token] += 1;
			break;
		}
		case 21: {
			cat(token, ch);
			ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
			// "||"
			if (ch == '|') {
				cat(token, ch); state = 0; FileOutResult(CheckKey("||"));
			}
			// a plain "|"
			else {
				FileOutResult(CheckKey("|"));
				retract(Buffer, ForwardPointer);
				state = 0;
			}
			NumOfWord[token] += 1;
			break;
		}
		case 22: {
			// Preprocessor directive
			string str = "";
			cat(token, ch);
			NumOfWord[token] += 1;
			FileOutResult(CheckKey(token));
			do {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
				str += ch;
			} while (Is_letter(ch) && ch != EOF);
			if(ch == EOF){
				retract(Buffer, ForwardPointer);
				state = 0;
				break;
			}
			str.erase(str.end() - 1);
			retract(Buffer, ForwardPointer);
			// An #include or a #define
			int flag = 0;
			if (str == "include" || str == "define") {
				flag = 1;
				NumOfWord[str] += 1;
				FileOutResult(CheckKey(str));
				// Skip spaces
				do {
					ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
				} while (ch == ' ');

				// Header name in double quotes or angle brackets
				if (str == "include" && (ch == '"' || ch == '<')) {
					str = "";
					do {
						ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
						str += ch;
					} while (ch != '\n' && ch != EOF);
					if (ch == EOF) {
						retract(Buffer, ForwardPointer);
						state = 0;
						break;
					}
					retract(Buffer, ForwardPointer);
					str.erase(str.end() - 1);
					str.erase(std::remove(str.begin(), str.end(), ' '), str.end());
					auto p = str.find('.');
					if (p == str.npos) {
						flag = 0;
					}
					else {
						if (p + 2 < str.size() && str[p + 1] == 'h' && (str[p + 2] == '\"' || str[p + 2] == '>')) {
							str.erase(str.end() - 1);
							NumOfWord["header_file"] += 1;
							FileOutResult("header_file",str);
						}
						else
							flag = 0;
					}
				}
				else if (str == "define") {
					retract(Buffer, ForwardPointer);
					state = 0;
					break;
				}
				else {
					flag = 0;
				}
			}
			// Neither of the two: an error
			if(flag == 0) {
				error();
			}
			state = 0;
			break;
		}
		case 23:{
			// Character literal: read up to the closing single quote
			string str = "";
			do {
				ch = ExtractingWords(StartPointer, ForwardPointer, cnt, Buffer);
				str += ch;
			} while (ch != '\'' && ch != '\n' && ch != EOF);
			if (ch == '\'') {
				str.erase(str.end() - 1);
				FileOutResult(CheckKey("''"), str);
				NumOfWord["ch"] += 1;
			}
			else {
				Errorlocation.insert(statement_lines);    // record the error line
				// Step back so that the newline (or EOF) is handled by the start state
				retract(Buffer, ForwardPointer);
			}
			state = 0;
			break;
		}
		case 24: {
			// Error handling
			error();
			state = 0;
			break;
		}
		}
	} while (ch != EOF);
	fclose(stdin);// close the input file

	// Write the statistics to the output file
	ShowResult(); 
	F_Out.close();
	cout << "Lexical Analysis Succeed!" << endl;
	return 0;
}

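// Set by retract() when the forward pointer steps back across a half boundary,
// so that the next read does not reload a half that is already in the buffer
static bool SkipReload = false;

// Fill Buffer[begin, end) from stdin; if the input ends first, store EOF as a sentinel
static void FillHalf(char* Buffer, int& cnt, int begin, int end)
{
	int c = 0;   // int rather than char, so that EOF differs from every valid byte
	cnt = begin;
	while (cnt < end && (c = getchar()) != EOF) {
		Buffer[cnt++] = static_cast<char>(c);
	}
	if (cnt < end) {
		Buffer[cnt] = static_cast<char>(EOF);
	}
}

char ExtractingWords(int& StartPointer, int& ForwardPointer, int& cnt, char* Buffer)
{
	// Forward pointer at the end of the left half: load the right half
	if (ForwardPointer == Half_Buffer_len - 1) {
		if (!SkipReload) FillHalf(Buffer, cnt, Half_Buffer_len, Buffer_len);
		SkipReload = false;
		++ForwardPointer;
	}
	// Forward pointer at the end of the right half: load the left half and wrap around
	else if (ForwardPointer == Buffer_len - 1) {
		if (!SkipReload) FillHalf(Buffer, cnt, 0, Half_Buffer_len);
		SkipReload = false;
		ForwardPointer = 0;
	}
	else {
		++ForwardPointer;
	}
	if(Buffer[ForwardPointer] == '\n' || Buffer[ForwardPointer] == '\t')
		char_num += 2;  // a newline or tab counts as two characters
	else
		char_num ++;   // every other character counts as one
	return Buffer[ForwardPointer];
}

 
void cat(string& token, const char& ch)
{
	token += ch;
}

bool Is_letter(const char& ch)
{
	// Cast first: passing a negative char to isalpha is undefined behavior
	return (isalpha(static_cast<unsigned char>(ch)) || ch == '_');
}

bool Is_digit(const char& ch)
{
	return isdigit(static_cast<unsigned char>(ch)) != 0;
}

void retract(const char* Buffer,int &ForwardPointer)
{
	// Undo the character count: a newline or tab was counted as two characters
	if (Buffer[ForwardPointer] == '\n' || Buffer[ForwardPointer] == '\t')
		char_num -= 2;
	else
		--char_num;  // one character fewer
	// Stepping back across a half boundary: the half ahead is still loaded
	if (ForwardPointer == 0 || ForwardPointer == Half_Buffer_len)
		SkipReload = true;
	ForwardPointer = (ForwardPointer == 0) ? Buffer_len - 1 : ForwardPointer - 1;
	return;
}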

string CheckKey(const string & token)
{
	string str = "";
	for (auto it : MaketTable) {
		if (it.first == token)
			return it.second;
	}
	return str;
}

void FileOutResult(const string &str, string attribute)  
{
	// Records are written as "< line number , ' token ' , attribute >"
	F_Out <<"	< "<< statement_lines <<" , ' "<< str <<" '  ,  "<< attribute <<" >" << endl;
}

void error()
{
	Errorlocation.insert(statement_lines);    // record the line of the error
	return;
}

void ShowResult()
{
	F_Out << endl << "-------Statistics for the source program analyzed above-------" << endl;
	F_Out << "Number of lines: " << statement_lines << endl << endl;
	F_Out << "Total characters: " << char_num << endl << endl;
	F_Out << "Lexeme counts (lexeme, count):" << endl;
	for (auto it : NumOfWord) {
		F_Out << "( “ "<< it.first <<" ” , " << it.second <<" ) "<< endl;
	}
	F_Out << endl;
	if (Errorlocation.empty())
		F_Out << "No lexical errors were found in the source program" << endl;
	else {
		F_Out << "Lexical errors were found in the source program on the following lines:" << endl << "Line: ";
		for (auto it : Errorlocation)
			F_Out << "  " << it;
	}
	return;
}
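
`CheckKey` scans the whole table for every lexeme. If lookup cost matters, a hash map gives the same answer in expected constant time. The following is only a sketch of that alternative: `TokenMap` and `check_key` are hypothetical names, and the original program uses a vector of pairs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical alternative to the vector-based table: lexeme -> token name.
using TokenMap = std::unordered_map<std::string, std::string>;

// Same contract as CheckKey: the token name, or "" if the lexeme is absent.
std::string check_key(const TokenMap& table, const std::string& lexeme) {
    auto it = table.find(lexeme);
    return it == table.end() ? std::string() : it->second;
}
```

The empty-string result keeps the caller's test `ans != ""` unchanged, so an identifier such as "state" still falls through to the symbol-table branch.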

V. Executable
See the attachment.

VI. Test Report

1. Input

#include<stdio.h> 
#include "123.h"  
#include "123.p"  
#define ERROR -1
#define ACCEPT 5
#define START 0
#define Q1 1
#
#define Q2 2
#define Q3 3
#define Q4 4
main()
{
	ch@ar ch; //读入的字符
	int state=START; 
	int test1 = 1E3a;
	int test2 = 1.23e+34;
	int test3 = 1.23E-2A;
	int test4 = 1.a;
	int test5 = 1.23Ew;		 
	while(state!=ACCEPT && state!=ERROR)
	{
	    scanf("%c",&ch);
	    switch(state)
	    {
	       case START: 
		  if(ch=='a') state=Q1;
		  else if(ch=='b') state=Q2; 
		       else state=ERROR; 
		  break;
	     case Q1: 
                             if(ch=='b') state=Q3;
                             else state=ERROR; 
                             break;
	
  case Q2: 
     if(ch=='a') state=Q4;
     else state=ERROR; break;
  case Q3: 
     if (ch=='b') state=Q3; 
     else if(ch=='a') state=ACCEPT; 
            else state=ERROR; break;
  case Q4: 

     if (ch=='a) state=Q4 + 1.34E-'123;/* Here
	is a test point,you can see here and how output?* */
     else if(ch=='b') state=ACCEPT; 
             else state=ERROR; 
     break;
 }
}
if(state==ACCEPT)
   printf("valid string");
else
   printf("invalid string);
	//return 0;
	/*int a = 3;
}

2. Output
Lexical analysis results:

[Line 1]:
	< 1 ，‘ # '  ,  - >
	< 1 ，‘ include '  ,  - >
	< 1 ，‘ header_file '  ,  stdio.h >
[Line 2]:
	< 2 ，‘ # '  ,  - >
	< 2 ，‘ include '  ,  - >
	< 2 ，‘ header_file '  ,  123.h >
[Line 3]:
	< 3 ，‘ # '  ,  - >
	< 3 ，‘ include '  ,  - >
[Line 4]:
	< 4 ，‘ # '  ,  - >
	< 4 ，‘ define '  ,  - >
	< 4 ，‘ id '  ,  ERROR >
	< 4 ，‘ - '  ,  - >
	< 4 ，‘ num '  ,  1 >
[Line 5]:
	< 5 ，‘ # '  ,  - >
	< 5 ，‘ define '  ,  - >
	< 5 ，‘ id '  ,  ACCEPT >
	< 5 ，‘ num '  ,  5 >
[Line 6]:
	< 6 ，‘ # '  ,  - >
	< 6 ，‘ define '  ,  - >
	< 6 ，‘ id '  ,  START >
	< 6 ，‘ num '  ,  0 >
[Line 7]:
	< 7 ，‘ # '  ,  - >
	< 7 ，‘ define '  ,  - >
	< 7 ，‘ id '  ,  Q1 >
	< 7 ，‘ num '  ,  1 >
[Line 8]:
	< 8 ，‘ # '  ,  - >
[Line 9]:
	< 9 ，‘ # '  ,  - >
	< 9 ，‘ define '  ,  - >
	< 9 ，‘ id '  ,  Q2 >
	< 9 ，‘ num '  ,  2 >
[Line 10]:
	< 10 ，‘ # '  ,  - >
	< 10 ，‘ define '  ,  - >
	< 10 ，‘ id '  ,  Q3 >
	< 10 ，‘ num '  ,  3 >
[Line 11]:
	< 11 ，‘ # '  ,  - >
	< 11 ，‘ define '  ,  - >
	< 11 ，‘ id '  ,  Q4 >
	< 11 ，‘ num '  ,  4 >
[Line 12]:
	< 12 ，‘ main '  ,  - >
	< 12 ，‘ ( '  ,  - >
	< 12 ，‘ ) '  ,  - >
[Line 13]:
	< 13 ，‘ { '  ,  - >
[Line 14]:
	< 14 ，‘ id '  ,  ch >
	< 14 ，‘ id '  ,  ar >
	< 14 ，‘ id '  ,  ch >
	< 14 ，‘ ; '  ,  - >
[Line 15]:
	< 15 ，‘ int '  ,  - >
	< 15 ，‘ id '  ,  state >
	< 15 ，‘ assign-op '  ,  - >
	< 15 ，‘ id '  ,  START >
	< 15 ，‘ ; '  ,  - >
[Line 16]:
	< 16 ，‘ int '  ,  - >
	< 16 ，‘ id '  ,  test1 >
	< 16 ，‘ assign-op '  ,  - >
	< 16 ，‘ num '  ,  1E3 >
	< 16 ，‘ id '  ,  a >
	< 16 ，‘ ; '  ,  - >
[Line 17]:
	< 17 ，‘ int '  ,  - >
	< 17 ，‘ id '  ,  test2 >
	< 17 ，‘ assign-op '  ,  - >
	< 17 ，‘ num '  ,  1.23e+34 >
	< 17 ，‘ ; '  ,  - >
[Line 18]:
	< 18 ，‘ int '  ,  - >
	< 18 ，‘ id '  ,  test3 >
	< 18 ，‘ assign-op '  ,  - >
	< 18 ，‘ num '  ,  1.23E-2 >
	< 18 ，‘ id '  ,  A >
	< 18 ，‘ ; '  ,  - >
[Line 19]:
	< 19 ，‘ int '  ,  - >
	< 19 ，‘ id '  ,  test4 >
	< 19 ，‘ assign-op '  ,  - >
	< 19 ，‘ ; '  ,  - >
[Line 20]:
	< 20 ，‘ int '  ,  - >
	< 20 ，‘ id '  ,  test5 >
	< 20 ，‘ assign-op '  ,  - >
	< 20 ，‘ id '  ,  w >
	< 20 ，‘ ; '  ,  - >
[Line 21]:
	< 21 ，‘ while '  ,  - >
	< 21 ，‘ ( '  ,  - >
	< 21 ，‘ id '  ,  state >
	< 21 ，‘ != '  ,  - >
	< 21 ，‘ id '  ,  ACCEPT >
	< 21 ，‘ && '  ,  - >
	< 21 ，‘ id '  ,  state >
	< 21 ，‘ != '  ,  - >
	< 21 ，‘ id '  ,  ERROR >
	< 21 ，‘ ) '  ,  - >
[Line 22]:
	< 22 ，‘ { '  ,  - >
[Line 23]:
	< 23 ，‘ scanf '  ,  - >
	< 23 ，‘ ( '  ,  - >
	< 23 ，‘ literal '  ,  %c >
	< 23 ，‘ , '  ,  - >
	< 23 ，‘ & '  ,  - >
	< 23 ，‘ id '  ,  ch >
	< 23 ，‘ ) '  ,  - >
	< 23 ，‘ ; '  ,  - >
[Line 24]:
	< 24 ，‘ switch '  ,  - >
	< 24 ，‘ ( '  ,  - >
	< 24 ，‘ id '  ,  state >
	< 24 ，‘ ) '  ,  - >
[Line 25]:
	< 25 ，‘ { '  ,  - >
[Line 26]:
	< 26 ，‘ case '  ,  - >
	< 26 ，‘ id '  ,  START >
	< 26 ，‘ : '  ,  - >
[Line 27]:
	< 27 ，‘ if '  ,  - >
	< 27 ，‘ ( '  ,  - >
	< 27 ，‘ id '  ,  ch >
	< 27 ，‘ == '  ,  - >
	< 27 ，‘ ch '  ,  a >
	< 27 ，‘ ) '  ,  - >
	< 27 ，‘ id '  ,  state >
	< 27 ，‘ assign-op '  ,  - >
	< 27 ，‘ id '  ,  Q1 >
	< 27 ，‘ ; '  ,  - >
[Line 28]:
	< 28 ，‘ else '  ,  - >
	< 28 ，‘ if '  ,  - >
	< 28 ，‘ ( '  ,  - >
	< 28 ，‘ id '  ,  ch >
	< 28 ，‘ == '  ,  - >
	< 28 ，‘ ch '  ,  b >
	< 28 ，‘ ) '  ,  - >
	< 28 ，‘ id '  ,  state >
	< 28 ，‘ assign-op '  ,  - >
	< 28 ，‘ id '  ,  Q2 >
	< 28 ，‘ ; '  ,  - >
[Line 29]:
	< 29 ，‘ else '  ,  - >
	< 29 ，‘ id '  ,  state >
	< 29 ，‘ assign-op '  ,  - >
	< 29 ，‘ id '  ,  ERROR >
	< 29 ，‘ ; '  ,  - >
[Line 30]:
	< 30 ，‘ break '  ,  - >
	< 30 ，‘ ; '  ,  - >
[Line 31]:
	< 31 ，‘ case '  ,  - >
	< 31 ，‘ id '  ,  Q1 >
	< 31 ，‘ : '  ,  - >
[Line 32]:
	< 32 ，‘ if '  ,  - >
	< 32 ，‘ ( '  ,  - >
	< 32 ，‘ id '  ,  ch >
	< 32 ，‘ == '  ,  - >
	< 32 ，‘ ch '  ,  b >
	< 32 ，‘ ) '  ,  - >
	< 32 ，‘ id '  ,  state >
	< 32 ，‘ assign-op '  ,  - >
	< 32 ，‘ id '  ,  Q3 >
	< 32 ，‘ ; '  ,  - >
[Line 33]:
	< 33 ，‘ else '  ,  - >
	< 33 ，‘ id '  ,  state >
	< 33 ，‘ assign-op '  ,  - >
	< 33 ，‘ id '  ,  ERROR >
	< 33 ，‘ ; '  ,  - >
[Line 34]:
	< 34 ，‘ break '  ,  - >
	< 34 ，‘ ; '  ,  - >
[Line 35]:
[Line 36]:
	< 36 ，‘ case '  ,  - >
	< 36 ，‘ id '  ,  Q2 >
	< 36 ，‘ : '  ,  - >
[Line 37]:
	< 37 ，‘ if '  ,  - >
	< 37 ，‘ ( '  ,  - >
	< 37 ，‘ id '  ,  ch >
	< 37 ，‘ == '  ,  - >
	< 37 ，‘ ch '  ,  a >
	< 37 ，‘ ) '  ,  - >
	< 37 ，‘ id '  ,  state >
	< 37 ，‘ assign-op '  ,  - >
	< 37 ，‘ id '  ,  Q4 >
	< 37 ，‘ ; '  ,  - >
[Line 38]:
	< 38 ，‘ else '  ,  - >
	< 38 ，‘ id '  ,  state >
	< 38 ，‘ assign-op '  ,  - >
	< 38 ，‘ id '  ,  ERROR >
	< 38 ，‘ ; '  ,  - >
	< 38 ，‘ break '  ,  - >
	< 38 ，‘ ; '  ,  - >
[Line 39]:
	< 39 ，‘ case '  ,  - >
	< 39 ，‘ id '  ,  Q3 >
	< 39 ，‘ : '  ,  - >
[Line 40]:
	< 40 ，‘ if '  ,  - >
	< 40 ，‘ ( '  ,  - >
	< 40 ，‘ id '  ,  ch >
	< 40 ，‘ == '  ,  - >
	< 40 ，‘ ch '  ,  b >
	< 40 ，‘ ) '  ,  - >
	< 40 ，‘ id '  ,  state >
	< 40 ，‘ assign-op '  ,  - >
	< 40 ，‘ id '  ,  Q3 >
	< 40 ，‘ ; '  ,  - >
[Line 41]:
	< 41 ，‘ else '  ,  - >
	< 41 ，‘ if '  ,  - >
	< 41 ，‘ ( '  ,  - >
	< 41 ，‘ id '  ,  ch >
	< 41 ，‘ == '  ,  - >
	< 41 ，‘ ch '  ,  a >
	< 41 ，‘ ) '  ,  - >
	< 41 ，‘ id '  ,  state >
	< 41 ，‘ assign-op '  ,  - >
	< 41 ，‘ id '  ,  ACCEPT >
	< 41 ，‘ ; '  ,  - >
[Line 42]:
	< 42 ，‘ else '  ,  - >
	< 42 ，‘ id '  ,  state >
	< 42 ，‘ assign-op '  ,  - >
	< 42 ，‘ id '  ,  ERROR >
	< 42 ，‘ ; '  ,  - >
	< 42 ，‘ break '  ,  - >
	< 42 ，‘ ; '  ,  - >
[Line 43]:
	< 43 ，‘ case '  ,  - >
	< 43 ，‘ id '  ,  Q4 >
	< 43 ，‘ : '  ,  - >
[Line 44]:
[Line 45]:
	< 45 ，‘ if '  ,  - >
	< 45 ，‘ ( '  ,  - >
	< 45 ，‘ id '  ,  ch >
	< 45 ，‘ == '  ,  - >
	< 45 ，‘ ch '  ,  a) state=Q4 + 1.34E- >
	< 45 ，‘ num '  ,  123 >
	< 45 ，‘ ; '  ,  - >
[Line 46]:
[Line 47]:
	< 47 ，‘ else '  ,  - >
	< 47 ，‘ if '  ,  - >
	< 47 ，‘ ( '  ,  - >
	< 47 ，‘ id '  ,  ch >
	< 47 ，‘ == '  ,  - >
	< 47 ，‘ ch '  ,  b >
	< 47 ，‘ ) '  ,  - >
	< 47 ，‘ id '  ,  state >
	< 47 ，‘ assign-op '  ,  - >
	< 47 ，‘ id '  ,  ACCEPT >
	< 47 ，‘ ; '  ,  - >
[Line 48]:
	< 48 ，‘ else '  ,  - >
	< 48 ，‘ id '  ,  state >
	< 48 ，‘ assign-op '  ,  - >
	< 48 ，‘ id '  ,  ERROR >
	< 48 ，‘ ; '  ,  - >
[Line 49]:
	< 49 ，‘ break '  ,  - >
	< 49 ，‘ ; '  ,  - >
[Line 50]:
	< 50 ，‘ } '  ,  - >
[Line 51]:
	< 51 ，‘ } '  ,  - >
[Line 52]:
	< 52 ，‘ if '  ,  - >
	< 52 ，‘ ( '  ,  - >
	< 52 ，‘ id '  ,  state >
	< 52 ，‘ == '  ,  - >
	< 52 ，‘ id '  ,  ACCEPT >
	< 52 ，‘ ) '  ,  - >
[Line 53]:
	< 53 ，‘ printf '  ,  - >
	< 53 ，‘ ( '  ,  - >
	< 53 ，‘ literal '  ,  valid string >
	< 53 ，‘ ) '  ,  - >
	< 53 ，‘ ; '  ,  - >
[Line 54]:
	< 54 ，‘ else '  ,  - >
[Line 55]:
	< 55 ，‘ printf '  ,  - >
	< 55 ，‘ ( '  ,  - >
[Line 56]:
[Line 57]:
[Line 58]:

-------Statistics for the source program analyzed above-------
Number of lines: 58

Total characters: 1261

Lexeme counts (lexeme, count):
( “ != ” , 2 )
( “ # ” , 11 )
( “ & ” , 1 )
( “ && ” , 1 )
( “ ( ” , 15 )
( “ ) ” , 13 )
( “ , ” , 1 )
( “ - ” , 1 )
( “ : ” , 5 )
( “ ; ” , 27 )
( “ == ” , 9 )
( “ assign-op ” , 18 )
( “ break ” , 5 )
( “ case ” , 5 )
( “ ch ” , 8 )
( “ define ” , 7 )
( “ else ” , 9 )
( “ enter ” , 57 )
( “ header_file ” , 2 )
( “ id ” , 65 )
( “ if ” , 9 )
( “ include ” , 3 )
( “ int ” , 6 )
( “ literal ” , 2 )
( “ main ” , 1 )
( “ num ” , 11 )
( “ printf ” , 2 )
( “ scanf ” , 1 )
( “ switch ” , 1 )
( “ while ” , 1 )
( “ { ” , 3 )
( “ } ” , 2 )

Lexical errors were found in the source program on the following lines:
Line: 14 19 20 55 58

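3. Analysis
1. The table that maps keywords and symbols to tokens is read from a file, so the table file must be placed in the same directory as the .c file before the program is run.
2. When keywords and user-defined identifiers are output, a keyword is written as its token only, with '-' in place of the attribute. For a string literal or a character literal, the attribute field holds the content between the quotes.
3. The character count covers every character in the file, including the full content of comments. (Note that if a comment is written in Chinese, this program counts each Chinese character as one character.)
4. A newline and a tab are each counted as two characters.
5. Error handling
(1) On line 14 of the sample input, "ch@ar" contains the illegal character '@'. The analyzer skips '@' and goes on to process "ar", which is treated as a user-defined identifier. Line 14 is recorded as an error line.
(2) Lines 19 and 20 contain malformed numbers: on line 19 the error is "1.a", and on line 20 the exponent marker is followed by a letter. As the output for line 20 shows, 'w' is then treated as an identifier and entered into the symbol table. Lines 19 and 20 are recorded as error lines.
(3) The comment that starts on line 45 is handled normally.
(4) The comment opened on line 57 is never closed: the end of the source program is reached without meeting "*/". In this case everything after "/*" is ignored and the error is then recorded. The recorded line number is 58, the line on which the source program ends, not the line that contains "/*".
(5) On line 55 the double quotes are unbalanced: the closing quote is missing, which is a lexical error. The analyzer resumes reading new lexemes only after it reaches the newline or the end-of-file marker. Line 55 is recorded as an error line.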
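
Case (4) can be sketched as a standalone routine that skips a block comment and reports whether it was closed. This is an illustrative simplification under stated assumptions: `skip_block_comment` is a hypothetical name, and it scans a `std::string` rather than the program's buffered input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: `pos` points just past an opening "/*" in `src`.
// Advances `pos` past the closing "*/" and adds the newlines it crosses to
// `line`. Returns false if the input ends before the comment is closed.
bool skip_block_comment(const std::string& src, std::size_t& pos, long& line) {
    while (pos < src.size()) {
        if (src[pos] == '\n') ++line;
        if (src[pos] == '*' && pos + 1 < src.size() && src[pos + 1] == '/') {
            pos += 2;
            return true;
        }
        ++pos;
    }
    return false;   // unterminated: the caller records `line` as an error line
}
```

Because each '*' is tested together with the character after it, a terminator written as "**/" is still recognized, and an unterminated comment leaves `line` at the last line of the input, which matches the line number 58 reported above.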

Source: https://blog.csdn.net/qq_44116998/article/details/112168756